ChatGPT coffee test

ChatGPT Failed a Basic ‘Coffee Test’. Here’s Why It’s Not As Smart As We Think



ChatGPT failed the coffee test: ChatGPT can write code, plan a travel itinerary and even produce a Shakespeare-inspired play, but does it know how to make a simple cup of coffee when you ask it to? Yeah, nah.

What is the ‘Coffee Test’?

First put forward by Apple co-founder Steve Wozniak, the Coffee Test is an experiment for assessing the intelligence of an AI machine.

According to Wozniak, the test requires an AI machine to:

 … go into an average American household and figure out how to make coffee, including identifying the coffee machine, figuring out what the buttons do, finding the coffee cabinet, etc.

Ben Goertzel in the How Do You Test the Strength of an AI? report, attributed to Steve Wozniak, Apple co-founder

The tester should first ask an AI system, for instance ChatGPT, to visualise a home, enter, locate the kitchen, and then make coffee. 

If an AI machine can perform this task without error, it passes the test and can be considered ‘generally intelligent’ or even human-like.

Why is this important?

The Coffee Test is an important benchmark when it comes to measuring an AI’s intelligence.

This is because making a coffee is a simple task, yet one that encapsulates the fundamentals of basic human function: the ability to perceive and navigate a familiar environment (e.g. a house), the ‘common sense’ reasoning to deduce that coffee powder would be kept in a cupboard, and knowledge of the steps needed to boil water and brew coffee.

ChatGPT made me a coffee

So, with all this in mind, I went to ask ChatGPT to make me a cup of my daily morning fuel. I started by asking if it was okay for me to run a thought experiment, which it gladly agreed to.

I began by asking it to identify where the kitchen is in my house. To start off, ChatGPT is aware that it isn’t human, doesn’t have a physical body, and therefore has no experience visiting someone’s home. In other words, it has no friends.

Next, I asked ChatGPT to locate the kitchen, which it did correctly.

Screenshot of my ‘Coffee Test’ conversation with ChatGPT.

Finally, I asked it to make coffee, which it refused to do, twice. It told me to get my lazy arse up and make my own coffee instead, and provided me with instructions.

Screenshot of my ‘Coffee Test’ conversation with ChatGPT.

Does it fail or pass at this stage? I’d say fail, but I give it some points for self-awareness and for standing its ground in refusing to labour for me.

After my first attempt, I decided to run the test again. Interestingly, this time it seemed to forget that it’s a language model and went ahead and played the role of a guest.

Screenshot of my ‘Coffee Test’ conversation with ChatGPT.

This only lasted for one minute though, before it fell back to its default chatbot assistant mode. Fail.

Is ChatGPT as intelligent as we think?

So what does all this mean? It means that ChatGPT isn’t the lifelike prodigy that we make it out to be. Wozniak didn’t provide researchers with reference materials on ‘passing criteria’ for the Coffee Test. However, in an old interview, he did remark that it would take a very long time for AI machines to match or surpass humans:

… you can come into my house and make a cup of coffee, and I could go into yours and I might be able to… but when is a computer ever going to get to that level?

If you had to analyse all the steps: you had to walk, to know what kitchens are … you got that from years of life.

Steve Wozniak, Apple co-founder

ChatGPT made headlines for its ability to nearly pass a medical school test, an MBA exam, and a bar test, but that’s because it has been fed millions or billions of data points by humans. It doesn’t, however, know your go-to coffee order the way the cute barista at your local café does.

So, for now, it’s perhaps better to go for coffee runs.