Yesterday morning, OpenAI released GPT-4, the latest version of its AI chatbot.
GPT-4 is a significant step up from ChatGPT: OpenAI says the chatbot “surpasses ChatGPT in its reasoning capabilities.” But what exactly does it do, and how does it differ from GPT-3.5?
Bigger, better, stronger
According to OpenAI, GPT-4 can handle over 25,000 words of text, “allowing for use cases like long form content creation, extended conversations, and document search and analysis”, like this example they displayed about Rihanna’s Super Bowl.
GPT-3.5, the version that the ‘OG’ ChatGPT was powered by, could only handle around 8,000 words of text. This means that the latest version triples that.
Moments before GPT-4 went live, OpenAI CEO Sam Altman shared a cheeky tweet hinting at the new model’s launch. “Excited 4 today,” he wrote. Get it? 4, as in GPT-4?
GPT-4 is a smart student
GPT-4 is described as a “large multimodal model”. This means that it has the capacity to handle and process information in different formats including texts and images, as we’ll elaborate below.
Since ChatGPT’s launch, researchers have put it through medical school, the bar exam, and an MBA exam to test out its intelligence. It didn’t disappoint and nearly passed those exams.
OpenAI is seemingly aware of this and similarly tested GPT-4’s performance in simulated exams. The team behind it spent six months “iteratively aligning GPT-4 using lessons from our adversarial testing program” — this means they spent time teaching GPT-4 how to follow instructions more faithfully. The outcome, OpenAI says, is “best-ever results (although far from perfect) on factuality, steerability, and refusing to go outside of guardrails.”
In other words, it’s a good student. Have a look at the results it scored in different exams like AP and SAT compared to GPT-3.5:
From the graph above, you can see that GPT-4 landed around the top 10 percentile in the GRE Verbal test, a paper that forms part of an entry exam for MBA programs worldwide. GPT-3.5, in comparison, had a ~60 percentile.
GPT-4 also “passes a simulated bar exam with a score around the top 10% of test takers; in contrast, GPT-3.5’s score was around the bottom 10%.”
Try uploading images
You can probably forget about AI art generators like Stable Diffusion and Midjourney for now, because GPT-4 now includes visual input. Users are now able to upload an image to GPT-4 alongside a query, and GPT-4 will produce “captions, classifications, and analyses.”
For example, simply upload a random funny photo asking GPT-4 what it is, and it will first describe the photo then detect which part of it is funny:
GPT-4’s impressive feats (so far)
In the 48 hours immediately following launch, users had already used GPT-4 to deliver some impressive tasks. For example, a director at crypto exchange Coinbase dumped code for an Ethereum smart contract into GPT-4, and the chatbot picked out a number of vulnerabilities, including that: it’s a Ponzi scheme, and it’s not secure enough to protect itself against hacks.
The boss of a company that provides legal services via a chatbot shared that he got his new hire to draft a lawsuit. Joshua Browder, the CEO of DoNotPay, says they plan to use ChatGPT to deliver customers “one-click lawsuits” to sue robocallers.
Browder says “GPT-3.5 was not good enough, but GPT-4 handles the job extremely well”, as you can see from the video he shared below. Would you consult an AI lawyer?
A design manager at financial products company Brex, Ammaar Reshi, even got it to write code for the classic Snake game – game devs be aware!
Does it have limitations?
Of course. OpenAI’s researchers stressed in their technical report that although GPT-4 is a significant upgrade from its predecessors, it can still “suffer from ‘hallucinations’:”
This tendency can be particularly harmful as models become increasingly convincing and believable, leading to overreliance on them by users… Counterintuitively, hallucinations can become more dangerous as models become more truthful, as users build trust in the model when it provides truthful information in areas where they have some familiarity.
OpenAI
Presumably, they mean that it remains capable of making shit up, like crucial medical information.
How to use GPT-4
Wanna hop on and try it? Sure, but it’ll cost you: Only users who are subscribed to ChatGPT Plus – OpenAI’s monthly subscription service – currently have access to GPT-4. That service sets users back US$20 (~AU$30) per month..
But is paying for an AI chatbot assistant worth it? We discussed it here.
For developers who want to use it as an API, a waitlist is available to join.