My grade school had a game club at lunch hour. Mr. Newton took time out of his day, every day, for us. I still think about him and the many games he brought from his personal collection. Not sure what the point of this message is, other than that he will be remembered until the last time I think about Mastermind, Pacific Typhoon, Star Fleet Battles, Axis & Allies, and many others.
That reminds me, I wrote it for iPhone when it was released in 2007 - back then there was no App Store, so apps could only be written for browsers. I think I implemented it whenever I learned a new language. By the way, I noticed that Claude, ChatGPT, and DeepSeek - none of those LLMs can solve Mastermind. They get lost after a few iterations, no matter how good the prompt instructions are. Source: https://github.com/muquit/iphonemm - there is a link to play on that page.
I love stuff like this. It always tickles my brain to try and find the optimal way (or as optimal a way as I can) of solving puzzles. Sometimes it's easy, sometimes it's really hard. Oftentimes you can get something decent without too much effort, and the dopamine hit is great when you see it working.
Really great article. Internalizing that a good, realizable Mastermind strategy is to always eliminate about the same number of possible solutions is a great way to appreciate the value of information-theoretic thinking. Getting hung up on whether it is exactly "plan k steps forward" optimal (i.e. on the fine details of the remaining possible cases) can be counter-productive.
Really liked it.
I'm curious about games in which the guesses are not always a valid solution but can contain special operators. For instance, in Secret Code, add a value which matches either of two different colors: a "1 or 2" peg. The code won't contain it, but it can be useful for getting more information at each step.
The Kodak Research Labs (like Bell Labs) let their researchers play. In the 1960s, my father (who later devised the Bayer filter for digital cameras) coded this algorithm for "Jotto", the 5-letter-word version of Mastermind.
Computers were so slow that one couldn't consider every word in the dictionary as a potential guess. He decided empirically on a sample size that played well enough.
I became a mathematician. From this childhood exposure, entropy was the first mathematical "concept" beyond arithmetic that I understood.
I play Mastermind with my kids. It hasn’t quite clicked with them yet, but I’ve shown them my strategy of eliminating colors by successively making an entire row a single color. Either that color’s not present, or you now know how many of it there are. Then you just need to figure out the ordering. Again, you can use a known “bad” color to avoid the ambiguity of multiple white pegs.
(I mention this mostly so people know the list exists, and will hopefully email us more nominations when they see an unusually great and interesting comment, like this one.)
Similarly, I explained to someone how to systematically solve it like this, but then when I suggested they now apply it in a new game, I chose all the same color for the code and watched them second guess themselves until all guesses were exhausted. They just couldn’t accept the possibility that the code could simply be all the same color. Was a good opportunity to quote Sherlock Holmes.
I've never been much for word games but have been a Wordle regular. But a friend of mine made the very astute observation that Wordle is more like Mastermind (which I played and even wrote a Windows version of as an exercise once upon a time) than any traditional word game. OK. You need a solid English 5 letter vocabulary. But it's really pretty different from crosswords and a lot of other word games including those on the NYT site.
I think you can distinguish between word games where it helps to know what the words mean (like crosswords) and word games where the meaning of the words is irrelevant, only that they are or are not words (like Wordle).
Weirdly, Scrabble resembles crosswords graphically but fits in the second group.
Sudoku is kind of the same, in that the numerical values of the numbers don't come into play at all. You can replace the digits with fruits and the game plays the same.
Although I must say I do like Connections where even subtle meaning very much factors in.
How you've just played optimal Mastermind
Mastermind is a game all about information. The Code Master selects one of \( 6^4 = 1\,296 \) secret codes. Each incorrect guess gives us information by eliminating some of these; the more codes that are ruled out, the more information that guess has provided. Let's quantify this insight! Suppose a guess gets some response that reduces the number of possible codes from some number \(n\) to a smaller \(n'<n\). The convention in information theory, a branch of mathematics and computer science dealing with just this type of question, is to define the information \(I\) provided by this guess and response as \[ I := \log_2 \frac{n}{n'}. \] The unit of information (with respect to the base-2 logarithm; other bases can be used) is called the bit. One bit of information corresponds to reducing the number of possible codes by half; a guess-response pair providing two bits of information reduces it by \( (1/2)^2 = 1/4 \). There are good theoretical reasons for adopting this admittedly unintuitive apparatus. For our purposes it is enough to note that information is additive. If a first guess provides one bit of information, and a second two bits, the number of possible codes has been reduced by \( 1/2 \times 1/4 = 1/8\); in total the two guesses have provided \( 1 + 2 = 3\) bits of information!
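This bookkeeping is easy to check directly. A quick Python sketch (my own illustration, not code from this project):

```python
from math import log2

def information_bits(n: int, n_remaining: int) -> float:
    """Information (in bits) gained when the number of possible
    codes drops from n to n_remaining."""
    return log2(n / n_remaining)

# One bit halves the possibilities...
first = information_bits(1296, 648)    # 1296 -> 648 gives 1.0 bit
# ...and information from successive guesses adds up:
second = information_bits(648, 162)    # 648 -> 162 gives 2.0 bits
overall = information_bits(1296, 162)  # 1296 -> 162 gives 3.0 bits in total
```

Note that additivity is just the logarithm turning the product of the reduction factors into a sum.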
We can now come up with a simple strategy for playing Mastermind: always make that guess which provides the most information. But we don't know the information of a guess in advance, since it depends on which response the Code Master will give us, and therefore on what the true secret code is. Let's suppose that the Code Master selects a secret code uniformly at random, with no code being more likely than any other; this is in fact what the program does under the hood. We then want to make the guess which provides the most information on average.
The probability of getting a certain response to a guess is \( n' / n \), since this is exactly the proportion of all possible codes which would give that response if it were the true secret code. Let's call this probability \(p\); we can then rewrite the above expression for the information of this guess-response pair as \[ I = \log_2 \frac{1}{p}. \]
Our strategy should be to make the guess with the highest expected value of information over all possible responses; this expected information is called the entropy of the guess and is denoted by \( H \). (If you're familiar with entropy from thermodynamics, the connection is mostly historical; it is probably unhelpful to look for a conceptual connection.) Let \(p_i = n'_i / n\) be the probability of a guess getting response \(i\), where \(n'_i \) is the number of secret codes that would remain possible after that response. The entropy of a guess is defined as
\[ H := \sum_{\text{response } i} p_i \log_2 \frac{1}{p_i}, \]
which again has the unit of bits. We have now landed on our final strategy: start by figuring out the number of possible secret codes \(n\). For each guess, calculate the number \(n_i'\) of codes that will still be viable if the Code Master gives response \(i\) in return. Do this for all possible responses. Finally, calculate the entropy of each guess; pick the one with the highest entropy. Surprise, surprise — this is exactly what is done above! The number in bits listed under Best Guesses is the guess's entropy.
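The whole procedure fits in a few lines of Python. The sketch below is my own, not this project's code; the color encoding (0 to 5) and the helper names are assumptions:

```python
from collections import Counter
from itertools import product
from math import log2

COLORS = range(6)
ALL_CODES = list(product(COLORS, repeat=4))  # all 6**4 = 1296 codes

def feedback(guess, code):
    """Return (black, white) pegs for a guess against a secret code."""
    black = sum(g == c for g, c in zip(guess, code))
    gc, cc = Counter(guess), Counter(code)
    # Whites: total per-color overlap, minus the exact matches.
    white = sum(min(gc[c], cc[c]) for c in gc) - black
    return black, white

def entropy(guess, candidates):
    """Expected information (in bits) of a guess, taken over the
    still-possible secret codes."""
    counts = Counter(feedback(guess, code) for code in candidates)
    n = len(candidates)
    # Each response i occurs with probability p_i = n'_i / n and
    # contributes p_i * log2(1 / p_i).
    return sum((k / n) * log2(n / k) for k in counts.values())

def best_guess(candidates):
    """Brute force: the guess (from all codes) with the highest entropy."""
    return max(ALL_CODES, key=lambda g: entropy(g, candidates))
```

After each real response, filter `candidates` down to the codes whose feedback against your guess matches the response, and repeat. Note that `best_guess` searches all 1 296 codes, not just the remaining candidates; a guess that can no longer be the secret can still split the candidates well.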
So how well does this strategy perform? On average over all 1 296 secret codes, it beats the Code Master in 4.47 guesses (with a standard deviation of 0.75). Take the last significant digit with a grain of salt; there are usually many guesses with equal entropy, of which the program chooses one arbitrarily. Is this performance any good? For starters, the amount of information needed to narrow all \(n\) possible codes down to one is \(\log_2 n\) bits; this is displayed next to the number of possible moves. If you play around for a while, you'll notice that the best guess usually has an entropy of two to three bits. We would then expect it to take around four guesses to get the \(\log_2 1\,296 = 10.3\) bits of information needed at the start of a new game.
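That back-of-the-envelope estimate is easy to verify; here 2.5 bits per guess is just an assumed midpoint of the two-to-three-bit range:

```python
from math import log2

total_bits = log2(6 ** 4)            # ~10.34 bits identify one of 1296 codes
bits_per_guess = 2.5                 # assumed: a typical good guess yields 2-3 bits
print(total_bits / bits_per_guess)   # ~4.1 guesses, near the observed 4.47
```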
This performance is also in line with other Mastermind algorithms. To the best of my knowledge, Donald Knuth published the first analysis of the game in his 1976 paper The Computer as Mastermind. In the decades since, many other strategies have been suggested, all achieving an expected number of guesses of around 4.4. (See e.g. Neuwirth, Some Strategies for Mastermind, 1982.) It is not surprising that maximizing entropy is a successful strategy; it solves the game by brute force. Calculating the numbers \(n'\) of still-possible codes after each guess-response pair turns out to be expensive; indeed, the related problem of deciding whether any code is consistent with a given set of guesses and responses is NP-complete. (Stuckman & Zhang, Mastermind is NP-Complete, 2005.) The necessary lookup table quickly becomes unmanageable as the length of the code and the number of colors grow. At the end of the day, a bit more than four guesses seems to be as good as you can do on average. You simply don't get enough information to narrow down the options more quickly. This argument, of course, is informal; if anyone knows of a more rigorous one, please be in touch!
The idea of making the guess with the highest entropy is taken wholesale from 3Blue1Brown's excellent video on Wordle. Wordle is of course a variant of Mastermind, albeit more engaging. All code I have written for this project, including for this webpage, is available on GitHub. Published on June 11, 2025.