ChatGPT4 Vision: What is it? And what can it do?
The latest evolution of everyone’s fave AI assistant is here. ChatGPT4 has a raft of improvements, the most remarkable being the introduction of multi-modality. More or less, it can “speak”, “see” and “hear”.
Users can now talk to ChatGPT using their voice, and ChatGPT4 speaks back. You can also upload images and use them in your interactions with ChatGPT4. This feature is called ChatGPT4 Vision, and it lets you ask ChatGPT all about an image you have uploaded.
What is ChatGPT4 Vision?
ChatGPT4 Vision can “read” an image and tell a user what it is. ChatGPT can also generate images from natural language descriptions, so users can “speak” an image into life, although that is handled by the separate DALL·E 3 integration rather than by Vision itself.
The addition of image capabilities expands the ways in which ChatGPT can enhance your daily life. For instance, when travelling, you can capture an image of a landmark and engage in a live conversation to explore its noteworthy aspects.
Upon returning home, you can photograph your refrigerator and pantry to help decide what to prepare for dinner, even requesting step-by-step recipe guidance through follow-up questions. After dinner, you can assist your child with a maths problem by snapping a photo, circling the problem, and receiving hints from ChatGPT.
Image Discussion with ChatGPT4 Vision
It is not just about landmarks. You can use the Vision feature to troubleshoot issues – say, a grill that won’t start, or a bike that needs fixing.
To start an image discussion, tap the photo button to capture or select an image. If you’re using iOS or Android, tap the plus button first. You can also discuss multiple images at once, or use the drawing tool to guide your assistant’s attention to a particular spot on the image.
Vision-based models present a unique set of challenges, including the risk that the model generates false information and that users rely too heavily on its interpretation of an image. The AI assistant is still prone to hallucinations, so keep this in mind.
Expansion of Access
Voice and image features will be accessible first to Plus and Enterprise users, with plans to expand to other user groups, including developers, in the near future. Some lucky people have been given early access, and have been Tweeting about all of the wow factors.
ChatGPT4 Vision: The CRAAAZY Things It Can Do
In daily use, this is how it could look. AI expert and influencer Rowan Cheung has 322,600 followers on X (formerly Twitter). He was recently given early access to ChatGPT Vision, and said in a post: “It’s officially been two weeks since ChatGPT4 Vision started rolling out. My prediction was right, Multimodal AI is about to disrupt SO many sectors.”
Here are some things that blew Cheung’s socks off.
ChatGPT4 Vision can decode X-rays
The AI can now be a “doctor in your pocket” of sorts. Users can upload X-rays, prescriptions or other medical reports. Then, ask questions and get answers about the images or the text. But of course, always check with your human doctor!
Uncover things in redacted government documents
Here, Cheung is impressed by an X post by Brian Roemmele, in which Roemmele discusses how ChatGPT4 Vision could decode a redacted government document from NASA about a UFO sighting. Roemmele says, “I have tested this on 100s of redacted documents and I can say we are in a new world.”
Create a workout plan by “seeing” an image of your home gym
Cheung says: “ChatGPT Vision turned a picture of my home gym equipment into a full 8-week workout program. This is better than 99% of any programs I’ve ever bought.”
The level of homework help is amazing
ChatGPT4 Vision can now offer a more in-depth level of help with homework. AI aficionado McKay Wrigley pretended to be a kid doing his homework, and uploaded a diagram of a human cell from a biology textbook. Wrigley said: “Look at the ability of this model to list out all 18 of these labels and provide accurate explanations. It’s incredible. Every single student in the world now has an expert tutor.”
The AI also gave him a quiz to ensure he understood the diagram. The mind boggles.
Weaving pictures together to write a movie script
This is where ChatGPT4 Vision gets creative, something people have complained was lacking before. Ethan Mollick, a professor at the Wharton School, uploaded four pictures he’d created with another AI and asked it to construct a plotline. And it did pretty well! Not as well as humans, but it seems pretty close!
Understand ridiculously complicated signs
Tech fan Peter Yang had the idea of uploading a jaw-droppingly confusing bunch of signs that made no sense. And he asked ChatGPT4 Vision whether he could park in front of the signs or not.
Give you a recipe from any picture of food
An AI entrepreneur uploaded a picture of a dish and asked ChatGPT4 to give him the recipe. Not only did the AI throw out a recipe, it gave an estimated calorie count!
ChatGPT4 Vision: Conclusion
This multimodal AI is already disrupting multiple sectors. With its ability to break down complex information, interpret redacted documents, and help with various creative and practical tasks, ChatGPT4 Vision promises to reshape our interactions with the world. It isn’t perfect yet, and it is still prone to saying wild things that aren’t true. However, a future with such AI assistants in it is surely a bright one.