Large language models aren’t people. Let’s stop testing them as if they were.

An insightful article from MIT Technology Review observes that large language models (which power AI chatbots such as Windows Copilot, ChatGPT, and Google Bard) get compared to human intelligence, and that's not all to the good:

Last month Webb and his colleagues published an article in Nature, in which they describe GPT-3’s ability to pass a variety of tests devised to assess the use of analogy to solve problems (known as analogical reasoning). On some of those tests GPT-3 scored better than a group of undergrads. “Analogy is central to human reasoning,” says Webb. “We think of it as being one of the major things that any kind of machine intelligence would need to demonstrate.”

These kinds of results are feeding a hype machine predicting that these machines will soon come for white-collar jobs, replacing teachers, doctors, journalists, and lawyers. Geoffrey Hinton has called out GPT-4's apparent ability to string together thoughts as one reason he is now scared of the technology he helped create.

But there’s a problem: there is little agreement on what those results really mean. Some people are dazzled by what they see as glimmers of human-like intelligence; others aren’t convinced one bit.

It’s a great article. Read it here.