ChatGPT can now see, hear and speak

September 29, 2023

If you read only one thing about AI/LLMs this week, make it this: "ChatGPT can now see, hear and speak" by OpenAI.

TL;DR


• ChatGPT can now interact via images and audio, in addition to text.
• Take a minute and watch the demos. If you have to pick one, watch the bike seat demo. It’s worth your time.
• The demos are shown on mobile phones.

My take:

Don’t overthink it. This changes everything.


• The most interesting thing to me is that these use cases are mobile first — this isn’t writing code, or blog posts, or productivity, it’s magic in your pocket. My first instinct was to downplay the voice piece — who cares whether you can actually hear it or speak to it — but it’s true that as intuitive as the chat UX is, a voice or image UX may be even more intuitive, and open up use cases that text doesn’t serve well. This also pairs with the recent announcement that DALL-E 3 is coming to ChatGPT in October, so that it can respond with images too.
• We at Stride have been emphasizing how important design and UX are when you are rethinking workflows, both with and without LLMs — and this just shows how crucial it is that you meet people where they are. Prompt engineering, as cool as it is, was just for early adopters — UXes are now evolving to make it clear to users how they should interact with and add context to this kind of product, which paves the way for deeper adoption. The power is still there, behind the polish.
• OpenAI got here first, but others are coming — Google at minimum, in the very near future. I can’t wait to see how open-source models respond, and whether the current delta between the best cloud models and the best self-hosted models (there are some real differences!) narrows or widens when you go multimodal. Either way, buckle up! The pace of change continues to accelerate.

Link to original post by Dan Mason on LinkedIn