Key Takeaways
- Google’s Project Astra was a big focus at Google I/O 2024, but it may not beat OpenAI’s GPT-4o.
- Project Astra has cool features like memory and mini-games, but OpenAI is ahead in multimodal AI interface development.
- While Project Astra showed potential in a demo, it may still need more work and time to compete with GPT-4o.
Artificial intelligence was the main focus of Google I/O 2024, and Google even poked fun at itself by counting the number of times it said “AI” in the keynote (121). Of the countless AI features announced at Google I/O, the big one was Project Astra, a multimodal AI interface you can interact with using vision and audio. We saw a pre-recorded demo of what Project Astra can do during the Google I/O keynote, but I spent about 10 minutes hands-on with Project Astra afterward. My takeaway was this: Project Astra looks really cool, but I’m doubtful it’s going to replace the Google Assistant, or even the classic Gemini experience on Android phones, anytime soon.
Google has been playing catch-up in the AI race from the start, and that’s still true here. Just days ago, OpenAI unveiled GPT-4o, a new multimodal AI model that can handle simultaneous vision, audio, and text inputs. While the full multimodal functionality hasn’t rolled out to ChatGPT yet, the model’s core feature set and speed improvements are already live in ChatGPT and the API. Though I haven’t spent hands-on time with GPT-4o and its multimodal interface yet, I strongly suspect that OpenAI’s model is ahead of Google’s Project Astra.
What I learned from my Project Astra demo
Project Astra appears limited right now, and it could be a while before it rolls out
The Project Astra demo comes in two variants: a Pixel 8 Pro running Astra, and a large touchscreen display with an overhead camera and microphone running Astra. Google told me there was no special hardware in the bigger demo unit; it was essentially a blown-up version of what Project Astra would look like running on a smartphone. There are at least three ways to interact with Project Astra: through the touchscreen, with your voice via the microphone, and through the camera feed.
The coolest part of the demo was the “freeform” version of Project Astra, where you can interact with the AI interface casually and conversationally. Project Astra has memory, and as of now it can remember things it sees and hears for about a minute. For example, after being shown stuffed animals and told their names, Project Astra could identify what breed of dog was on the table and remember its name. A Google rep said AI memory is something its research teams are still exploring, and added that Project Astra’s memory could likely be scaled further to remember more things.
Beyond the freeform component, the Project Astra demo included a few mini-games to show off what the multimodal AI interface can do. One of them was Pictionary, which was essentially playing the classic game against AI. After a player drew something on the touchscreen, Project Astra would try to guess what it was, and it got it right every time, though not always immediately; it used back-and-forth voice banter with the Pictionary player to narrow down the answer. In one instance, Project Astra correctly named the movie that a Pictionary drawing was based on after receiving audio instructions. It was the perfect example of multimodal AI: it used vision to detect the drawing, audio to receive the user’s instructions, and its knowledge base to match that drawing with a movie’s storyline.
Other demos I saw included an alliteration mode, which composed an alliterative story based on the objects it could see, and a storytelling mode, which made up a comprehensive story about whatever was in the frame.
The entire experience would have been impressive in a vacuum, but it seemed far less groundbreaking than OpenAI’s GPT-4o, or even Google’s own pre-recorded Project Astra demo. In the video Google showed during the I/O keynote, Project Astra came across as a full-fledged AI interface with a ton of knowledge; it could even detect lines of code and tell the user exactly what they do.
To say the hands-on demo fell short of that would be an understatement. Some things were the same, like the visual comparison elements and the alliteration component. But the really groundbreaking parts of Project Astra, such as the ability to name and explain specific parts of a speaker or dissect lines of code, weren’t part of the demo.
How does it compare to GPT-4o?
OpenAI confidently showed off more working features than Google
It seems like every new AI feature is announced with lofty use cases, only to underwhelm when it actually ships. As such, I’m hesitant to say that GPT-4o will definitely be better than Project Astra; for example, Project Astra’s response time in the demo seemed on par with GPT-4o’s. What I can say is that OpenAI appears to be ahead of Google in the development of multimodal AI interfaces.
OpenAI showed off countless real-time demos of GPT-4o powering ChatGPT during its announcement, which is a sign of its confidence in the model. The idea that Google is behind OpenAI isn’t really speculation, either: ChatGPT will get vision, audio, and text input in the coming weeks, while “some of these capabilities” in Project Astra will come to the Gemini app later this year, according to Google. Until we see Project Astra doing more on real consumer phones, it’s hard to believe it’s ready to beat GPT-4o just yet.