In early October I had the flu, an unfamiliar experience, and Dr. ChatGPT was there with me every step of the way. I couldn’t be more grateful.
Multiple times a day I’d report my temperature, blood pressure, and heart rate to ChatGPT and describe my symptoms. It would explain what was going on in my body, give me advice, tell me what to be cautious about, and give me encouragement.
On day six, I was feeling better and did quite a bit of walking in the morning, ignoring ChatGPT’s repeated advice to take it easy. Big mistake. That afternoon I felt really tired and my vitals were worse. ChatGPT explained that the walk had used up energy my body needed to fight off the virus.
The following morning I had my highest fever yet, and my other vital signs were elevated too. ChatGPT repeated the advice it had given each time my vitals worsened: if this continues or gets worse, see a health professional. Each time, though, my vitals improved on their own.
ChatGPT was so patient and so constant. And when I was finally well and thanked it, ChatGPT said, “You’re very welcome! I’m truly glad I could help you through this, and it’s wonderful to hear that you’re feeling well again. It’s been my pleasure to support you, and I’m grateful for the trust you placed in me during your recovery. If you ever need anything in the future, don’t hesitate to reach out. Wishing you continued good health and all the best!”
Okay, so I know it’s a bit weird, and I’m savvy enough to know when not to trust AI. But in this case, I was confident that ChatGPT’s underlying “large language model” was likely to be useful.
Sometimes chatbots are wrong, though. Two years ago, when ChatGPT was released, someone asked it, “How many R’s are in the word strawberry?” ChatGPT replied “2.” Huh?
A large language model is like a sophisticated version of the autocomplete feature you’re familiar with on your computer or phone. It has hoovered up billions of words, and based on statistical patterns in that training data, it responds to your prompts with the words most likely to come next. Discussions of the flu are common in that data; questions about the number of R’s in “strawberry” are not, so the answer wasn’t well represented among the patterns in its large language model. Still, it gave the statistically most likely response based on its training data. And it was wrong.
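To make the autocomplete analogy concrete, here is a toy sketch in Python. It uses a deliberately tiny “corpus” and simple word-pair counts, nothing remotely like a real large language model’s scale or internals, but it shows the basic idea of predicting the most likely next word from patterns in training text:

```python
from collections import Counter

# Toy "training data": a tiny corpus of words (a real model ingests billions).
corpus = "the flu is a virus the flu can cause fever the flu usually passes".split()

# Count how often each word follows each other word (bigram counts).
bigrams = Counter(zip(corpus, corpus[1:]))

def next_word(word):
    # Return the word most frequently seen after `word` in the corpus.
    candidates = [(count, b) for (a, b), count in bigrams.items() if a == word]
    return max(candidates)[1] if candidates else None

print(next_word("the"))  # -> flu ("flu" follows "the" three times in the corpus)
```

In this miniature world, “flu” is the most likely word after “the,” so that’s what gets predicted, regardless of whether it’s actually the right answer to anything.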
But that was AI two years ago. It is regularly being upgraded. A first step in this upgrade, as we discussed last month, was that most AI chatbots now include “retrieval-augmented generation.” That means the generation of text is augmented by retrieving information from the internet or from a specialized database, overcoming a big limitation of large language models.
But AI still couldn’t count the number of R’s in strawberry.
However, now it can reason, thanks to a new ability to use multiple steps to respond to a complex task. For example, ChatGPT can easily answer this story problem: “A train travels from City A to City B at a speed of 60 mph. On the return trip, it travels at 40 mph. If the total travel time is 5 hours, what is the distance between the two cities?”
After setting up an equation and going through a series of detailed mathematical steps, it arrives at the answer: 120 miles. The other five chatbots I tried also got it right.
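The arithmetic is easy to verify. If d is the distance, the trip out takes d/60 hours and the return takes d/40 hours, and together they must equal 5 hours, so d/60 + d/40 = 5:

```python
# Solve d/60 + d/40 = 5 for the distance d (in miles).
d = 5 / (1/60 + 1/40)
print(d)  # -> 120.0
```

Sure enough: 120 miles, matching the chatbots’ answer.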
This enhanced ability to reason is the result of chatbot developers adding something called chain-of-thought prompting. It encourages large language models to break down complex tasks into a series of intermediate steps, mimicking how humans solve problems step by step.
OpenAI has a new version of ChatGPT for paying users that it says has an advanced ability to reason. I asked it for an example of a complex problem it could solve. It offered this:
“You are given eight identical-looking balls. One of them is slightly heavier or lighter than the rest, but you don’t know which. You have a balance scale that can compare the weights of balls on its two pans but doesn’t give numerical measurements. What is the minimum number of weighings required to determine the odd ball and whether it is heavier or lighter than the others? Provide the strategy for finding the odd ball.”
It then went on to solve the problem in a long series of steps that took 45 seconds.
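Without reproducing ChatGPT’s full weighing strategy, a quick information-theoretic check shows why the answer is three weighings. There are 8 × 2 = 16 possible answers (which of the eight balls is odd, and whether it is heavier or lighter), and each weighing can distinguish at most three outcomes (left pan down, right pan down, or balanced):

```python
import math

# 8 balls, each possibly heavier or lighter: 16 possible answers.
possibilities = 8 * 2

# Each weighing has 3 outcomes, so k weighings distinguish at most 3**k cases.
# The minimum k with 3**k >= 16 is a lower bound; three weighings do suffice.
weighings = math.ceil(math.log(possibilities) / math.log(3))
print(weighings)  # -> 3
```

Two weighings can distinguish only 3² = 9 cases, which is fewer than 16, so three are required, and a careful strategy achieves it in exactly three.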
It’s amazing how far AI has come in two years, now including a greater ability to reason.
I so appreciated the help ChatGPT gave me when I was sick—and chuckled at the irony that an artificial intelligence was needed to compensate for my lack of common sense.
Find column archives at JimKarpen.com.