On Speaking Terms: Voice Conversations with a Bot

It’s now even easier to be on speaking terms with AI.

The frontier of computers continues to be artificial intelligence, and my most astonishing recent experience has been having voice conversations with a bot.

I click an icon of headphones to begin the conversation, then start speaking. The computer understands what I’m saying and responds at the appropriate times in a voice that sounds very much like that of a human.

I don’t need to do anything to indicate I’m ready to have it respond. Instead, it listens for a pause and then starts speaking.

In some ways, I’ve been talking to a computer for years, such as when I interact with Siri on my iPhone. But this is different—this partner in conversation knows just about everything, even when I ask about the fundamental nature of matter.

ChatGPT-4o (ChatGPT.com). This new frontier comes courtesy of OpenAI, the company that launched the AI frenzy back in late 2022. In May, the company released its next big thing: GPT-4o. The “o” stands for “omni” and is indicative of the “multimodal” nature of this new model. Plus, it’s said to be three times faster and much more efficient than the previous GPT-4.

Perhaps most amazing is that the free version now gives you access to this powerful new model. It also includes access to the web (which the free GPT-3.5 version lacks), as well as advanced features such as data analysis, file uploads, vision, and custom GPTs. The vision feature, for example, can describe what’s in an image, analyze charts and graphs, and more.

Simply go to ChatGPT.com and create an account. The free version limits the number of interactions during a three-hour period. If you exceed that limit, it drops down to the older, less capable 3.5 version until the next three-hour window begins.

As I write this, the voice feature is available for free in the app for iPhone and Android phones, but it isn’t yet available on desktop computers.

To take advantage of some of the multimodal capabilities, such as creating images, you’ll need to subscribe to the Plus version at $20 per month.

OpenAI is leading the way, but the other industry leaders are also moving ahead fast. Let’s look at two new offerings that were also announced in late April and early May: Meta AI and Google’s AI Overview.

Meta AI (Meta.ai). Facebook’s parent company Meta has long been working on its own large language model chatbot and finally released Meta AI in late April. It’s available within Facebook, Instagram, WhatsApp, and Messenger, as well as on the web at Meta.ai (log in with your Facebook account). Meta AI is yet another amazing AI offering. In addition to creating text, you can use Meta AI to create animations and high-quality images. It’s fast and generates images as you type. I asked it to create an animation of a mouse jumping over a pond. The result was delightful, though the animation lasted a second at most.

AI Overview (Google.com). It’s satisfying to know that Google read my April column, in which I suggested that Googling now feels antiquated and recommended you instead use Microsoft Copilot, Google Gemini, or Perplexity. The reason, of course, is that it’s so much more convenient when asking a question to receive a well-organized summary, courtesy of artificial intelligence, instead of a simple list of links. Plus, each of these AI tools gives you the option of clicking on links to see the sources of information.

In May, Google introduced AI Overview: when you ask a question, Google returns an overview that uses AI to summarize all the information, just as the other new AI tools do. Certainly they got the idea to do that after reading my column.

Note, though, that the AI Overview feature appears only when you ask a more complex question. If you simply want the link to the Amazon website, you won’t see an AI Overview at the top.

It did well when I asked it about the fundamental nature of matter, but AI Overview got off to a rocky start in May when one user asked, “How many rocks should I eat?” It came up with some bizarre answers that quickly went viral: “According to geologists at UC Berkeley, you should eat at least one small rock per day. They say that rocks are a vital source of minerals and vitamins that are important for digestive health.”

Turns out, this response was based on a humorous article on The Onion website. Google has been working hard to make sure this doesn’t happen again.

Note: AI Overview and GPT-4o’s ability to carry on a conversation are gradually being rolled out but should be universally available by the time this column appears in print.

Find column archives at JimKarpen.com.