Consumer-grade AI and decoding simple ciphers

I was watching a TV show where one of the plot devices was decoding an encrypted message. As with most TV shows, the solution was silly complicated. But it brought up memories of my childhood.

Believe it or not, there are magazines for people who like to do logic puzzles and simple cryptography. They still exist today. I would spend a lot of time counting letters and looking for patterns in ciphertext to decode them. Yes, probably not how you spent your childhood, but it’s how I spent part of mine. I found the logic puzzles especially calming.

I started wondering if the current round of generally available AI systems would be able to solve simple encryption. I’m sure that the NSA and other spook agencies have amazingly tuned AI/ML to crack encryption, but my thoughts were about systems that the average person has access to.

I went to cryptii.com, where I could try a lot of simple encryption with a variety of methods. The simplest kind of cipher is known as a Caesar cipher, where you shift each letter in the alphabet by a set number. For example, with a shift of 4, a becomes e, b becomes f, etc. Claude got this right in short order. I decided to compare the various AI systems to see their capabilities.
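For anyone who wants to see the shift spelled out, here is a minimal Python sketch of a Caesar cipher. It is my own illustration, not what cryptii.com runs internally, and the function name `caesar_shift` is made up for this example.

```python
def caesar_shift(text, shift):
    """Shift every ASCII letter by `shift` places, wrapping around the alphabet."""
    result = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            result.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            # Spaces, digits, and punctuation pass through unchanged.
            result.append(ch)
    return "".join(result)


ciphertext = caesar_shift("ab", 4)        # a becomes e, b becomes f
plaintext = caesar_shift(ciphertext, -4)  # shifting back by 4 undoes it
print(ciphertext, plaintext)
```

Since there are only 25 useful shifts, trying every one of them and reading the results is enough to crack this by hand.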

I tried the most basic alphabetic substitution cipher, where a becomes z, b becomes y, etc.
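This reversed-alphabet substitution is often called the Atbash cipher. A rough Python sketch, assuming plain ASCII letters only (the function name `atbash` is just for illustration):

```python
def atbash(text):
    """Mirror each ASCII letter across the alphabet: a<->z, b<->y, and so on."""
    result = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            offset = ord(ch) - base          # 0 for a, 25 for z
            result.append(chr(base + 25 - offset))
        else:
            # Leave spaces and punctuation alone.
            result.append(ch)
    return "".join(result)


print(atbash("abc"))       # zyx
print(atbash("kindness"))  # prmwmvhh
```

The cipher is its own inverse, so running the same function on the ciphertext gives back the plaintext.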

Only Claude got it correct. ChatGPT, Mistral, and Meta AI hallucinated. Gemini and Deepseek (online) paraphrased the answer but did not do a direct translation. I ran Deepseek:8B (locally) with llama and it could not complete it.

ChatGPT, Mistral, and Meta AI all told me they had solved it, but were way off.

The interesting thing is that while Gemini got it wrong, it got some of the basics of the original quote right in the end.

“This is my simple religion. There is no need for temples; no need for complicated philosophy. Our own brain, our own heart is our temple; the philosophy is kindness.” is the original plaintext quote.

Deepseek R1 had similar results of kinda getting the gist of the plaintext.

My friend Leonard tried it on some of the higher-powered LLMs that he has access to. His report:

let’s see o3-mini-high gets it. o3-mini does as well. o1 does. obviously o1-pro has no problem w/ it. the most surprising thing: my 4o does as well – it wrote a python script to do it (i have custom instructions that tell it to use python for math which probably encourages it to code, default version might not jump directly to write a python script to do it).

The problem seems to come from tokenization: the LLM has to make the leap to tokenize the letters themselves instead of the ciphertext as a whole, decode the message, and then do the output tokenization so that it matches the actual decrypted text.

Look at the plaintext phrase “our own heart is our temple; the philosophy is kindness.”, which Gemini transforms into “our own heart is the decoder; the protocols are kindness.” and Deepseek transforms into “our own heart is our secret; the cautionary tale is kindness.”

It’s almost like it understands the meaning, but uses synonym words to rephrase it (kinda).

The most interesting part is that the models appear to have done the actual decryption, but have trouble making the output match the plaintext exactly at the output tokenization step.

I’m not sure how the input tokenization is done on the models. I’d expect they would need to do it at the character level, not the word or subword level. For output it looks like word tokenization, hence the synonym-type answers.

I tried again with a more complicated alphabetical substitution cipher, where the letter substitutions are randomized.
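To show what a randomized substitution looks like, here is a small Python sketch. The helper names `make_key` and `substitute` and the seed value are my own assumptions for illustration. This is not the key or the tool I actually used.

```python
import random
import string


def make_key(seed):
    """Build a random letter-to-letter mapping (a shuffled alphabet)."""
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    random.Random(seed).shuffle(shuffled)  # seeded so the key is repeatable
    return dict(zip(letters, shuffled))


def substitute(text, key):
    """Replace each lowercase letter using `key`; other characters pass through."""
    return "".join(key.get(ch, ch) for ch in text.lower())


key = make_key(seed=42)                        # hypothetical seed
inverse = {v: k for k, v in key.items()}       # decryption key
message = "the philosophy is kindness."
ciphertext = substitute(message, key)
print(ciphertext)
print(substitute(ciphertext, inverse))         # back to the original message
```

Unlike the Caesar cipher, there are 26! possible keys here, so trying them all is not an option. You have to work from patterns in the text.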

In this case, only Claude got the decryption correct and even recognized the quote.

ChatGPT and Deepseek:8B both failed to complete it.

Gemini, Meta, and Mistral all came up with gibberish that they assured me was the right decryption.

Deepseek thought it was a Bible quote.

Leonard helped test the alphabetical substitution cipher with more powerful models.

Deepseek-V3 came back with an incorrect passage; Deepseek-R1 also failed and simply gave up.

ChatGPT o3-mini-high (Reasoned for 3m 2s) got caught up on thinking it was a Tolkien passage.

ChatGPT o1-Pro didn’t get it right either, but was heading down the same track that a human would try, with letter frequency and common small words. It also got caught trying to match one of its known quotes, and convinced itself that it was a quote by Francis Bacon, which is incorrect.
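The human approach mentioned above, counting letters and looking at short words, can be roughed out in a few lines of Python. This is only a sketch of the first step a puzzle solver would take, not a full solver, and the function names and sample string are hypothetical.

```python
from collections import Counter


def letter_frequencies(ciphertext):
    """Count how often each letter appears, most common first."""
    letters = [ch for ch in ciphertext.lower() if ch.isalpha()]
    return Counter(letters).most_common()


def short_words(ciphertext, max_len=3):
    """Count the short words, which often stand for 'a', 'is', 'the', etc."""
    cleaned = "".join(ch if ch.isalpha() else " " for ch in ciphertext.lower())
    words = cleaned.split()
    return Counter(w for w in words if len(w) <= max_len).most_common()


sample = "gsv jfrxp yildm ulc; gsv ozab wlt."  # made-up ciphertext for the demo
print(letter_frequencies(sample)[:5])
print(short_words(sample))
```

In English the most common ciphertext letter is a good first guess for e, and a repeated three-letter word is a good first guess for "the".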

ChatGPT 4o went “on a wild chase writing lots of Python analysis code but gets nowhere, even with hints. It just can’t do it no matter how hard it tries” and failed as well.

It appears that most of the commonly available LLMs have a lot of difficulty with tasks that involve the actual letters in a prompt as opposed to the words, like the strawberry issue (miscounting the r’s in “strawberry”).

Presumably, cryptography-focused AI models would be specifically trained to handle ciphertext and leverage brute-force computing to solve it.

The big caveat here is that I’m a rank amateur with AI/LLMs and this is all just me playing around with some tools, not serious work.

If you want to see what serious people are doing, check out Leonard or Simon; they are fantastic people.

Author

Michael