Why is AI so bad at the Spelling Bee?

Answer: It just wasn’t made for it.

A human face made up of teal dots connected by lines on a black background, with letters over the image. (Image: Shutterstock)

You would think that finding all the words that can be formed from a given set of letters would be a walk in the park for generative artificial intelligence (GenAI), but you would be wrong. Engadget’s Pranav Dixit assumed as much when he turned to OpenAI’s ChatGPT to help him solve the New York Times’ Spelling Bee puzzle.

But when he gave the bot the game’s detailed parameters and asked it for a list of every English word that could be spelled with the letters G, Y, A, L, P, O and N, what he got back was a list of nonsensical strings like GLNPAYO, YPNL, PGNOYL and ONAPYLG. Even when he refined his instructions, clarifying, for example, that the words had to appear in the dictionary, the results did not improve. Other GenAI chatbots, including Microsoft’s Copilot, Google’s Gemini, Anthropic’s Claude and Meta AI, fared no better.
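
For contrast, the word-finding task itself takes only a few lines of conventional code. The sketch below is a hypothetical illustration, not anything from Dixit’s experiment: it filters a word list (the file path is an assumption; any one-word-per-line dictionary would do) for entries built solely from the seven puzzle letters. The real Spelling Bee also enforces a minimum word length and a mandatory center letter; the minimum length is a parameter here, and the center-letter rule is left out because the article does not say which letter it was.

    # Minimal sketch: list dictionary words spellable from the puzzle letters.
    # The word-list path below is an assumption; any file with one word per line works.
    ALLOWED = set("gyalpon")  # the seven puzzle letters: G, Y, A, L, P, O, N

    def spelling_bee_words(word_file, min_length=4):
        """Return words of at least min_length whose letters all come from ALLOWED."""
        matches = []
        with open(word_file) as f:
            for line in f:
                word = line.strip().lower()
                # Keep alphabetic words long enough whose letters are a subset of ALLOWED.
                if len(word) >= min_length and word.isalpha() and set(word) <= ALLOWED:
                    matches.append(word)
        return matches

    if __name__ == "__main__":
        for word in spelling_bee_words("/usr/share/dict/words"):
            print(word)

A deterministic filter like this cannot invent letter strings that break the rules, which is exactly what the chatbots did.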

So Dixit turned to experts to try to figure out why these programs failed so spectacularly at something we humans thought would be easy for them. While the reasons are numerous and get a bit technical, the bottom line seems to be that large language models like these just weren’t made for such things. “The bots suck because they weren’t designed for this,” concluded Chirag Shah, a professor of AI and machine learning at the University of Washington.