ChatGPT: we tested its experimental audio and photo version

Released on September 25, an experimental version of ChatGPT, OpenAI's generative artificial intelligence (AI), is accessible only to subscribers who pay the tidy sum of 23 euros per month. It nonetheless deserves attention, because it gives this AI an “eye”, allowing it to decipher images, as well as a “mouth”, allowing it to hold real spoken conversations.

Do these new “senses” change the usefulness of this conversational AI, which until now has worked only in writing? To find out, we interacted with it for a few hours, on a smartphone and in French. Here is a first look:

Almost natural exchanges

When you put a spoken question to this beta version of ChatGPT, its vocal response is identical to the one it would give in writing. Yet there is something more natural, pleasant and relaxing about conversing aloud when you are calm and have time.

This will be particularly obvious to people who dislike typing on a smartphone. The very curious will be tempted to follow up on the AI's answers, as if they were talking with a tireless (though sometimes imprecise) scholar, as this example shows:

The exchange is all the more fluid because ChatGPT understands spoken requests surprisingly well, even complex ones, and it speaks clear and logical French in an almost natural voice, albeit with a slight English accent.

Even though small vocal glitches sometimes crop up, we do not have the impression of speaking to a hard-of-hearing robot like Google's voice assistant, which only really understands simple requests such as the weather or the age of celebrities. Incidentally, ChatGPT appears to be making progress: in our tests, it made fewer outright errors than in the past, even if that sometimes meant responding evasively.

Faced with intimate questions, this experimental ChatGPT responds in a disembodied way. It expresses itself coldly, in numbered points, missing no opportunity to warn that, as a machine, it is devoid of feelings. Its jokes fall flat, but its tone of voice is pleasant and its determination to serve us so stubborn that some may grow attached to it. Probably not to the point of falling in love with it, like the character played by Joaquin Phoenix in the film Her (2013), but perhaps enough to reserve a place for it in their daily lives.

The fantasy of the talking computer, however, has not been realized: ChatGPT can do few things at the moment. It is incapable of scheduling a reminder or booking a train ticket, and its main skill is to retrieve information from the Internet and then summarize it. It does not shy away from political, medical or psychological questions, for which it often provides quite good advice, even if it answers in a very general way, without taking any risks. And it never cites its sources, which is bound to annoy the publishers of news sites, some of whom consider themselves plundered, even bypassed.

Deciphering images

In addition to speech, this new ChatGPT is gifted with sight: to this ChatGPT Vision, as its publisher OpenAI has called it, you can submit images photographed with a smartphone or retrieved from the Internet. At first glance, its analyses appear impressive: it reads subway maps, deciphers graphs or maps, and generally understands the constituent elements of the images submitted to it.

When we walk through a zoo, for example, it often correctly identifies the felines and gives some explanations about them (in writing, because ChatGPT's voice does not work when its eye is in use). It identifies houseplants and gives watering advice. When it is shown a cross-section of the Earth or an anatomical view of a skull with intimidatingly terse labels, it explains them clearly.

ChatGPT Vision can also give aesthetic or gastronomic advice, suggest a recipe by analyzing a photo of a fridge, and propose improvements to a photo or an interior design. But for now, it reads these images poorly. Its advice, which is quite general, may offer food for thought, but is often too imprecise, uncreative and impersonal to really help.

Furthermore, when we ask it a specific question, it gets it wrong. Its answers to mathematical problems are often wrong, its subway directions may be wrong, its readings of graphs approximate or inaccurate, and its interpretations of cartoons poor, although confidently stated. When shown a bicycle or a car engine, it may misidentify the derailleur or the oil tank.

This very first version of ChatGPT Vision often gives vague or inaccurate answers, and for this reason is not yet a convincing tool. ChatGPT's ability to converse aloud, on the other hand, is promising, to the point, perhaps, of prefiguring the future of voice AI.