Friday, January 11, 2019

Update On Deck Shuffling Randomness Audits

All the way back in 2010, I wrote a post on our company blog about the shuffling used in the game, and the audits we conduct to help make sure the game is fair to everyone, all of the time. Over the past few months, we have regularly received emails and other messages from Cribbage Pro players who feel that something has changed in the way the decks are shuffled, and in the overall fairness of the game in that respect. We audit the system regularly to make sure it has not been compromised in any way (for example, by a bug we somehow introduced, or by a malicious act from someone else). The results are generally very boring and show exactly what we posted about back in 2010. Still, it has been a while since that post, and I felt it was time both to expand our audit and to report on it publicly again. I understand that for many this will not change their feelings about the fairness of the game, but for those who like to see the data it can be helpful. The hypothesis we test in these audits is: if the deck is shuffled fairly, then a given card will be equally represented (have an equal chance of appearing) in every position in the deck when sampled over a sufficiently large number of decks, and its representation across positions will become more balanced as the sample grows. In other words, if that doesn't happen in our testing, we have a problem.
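Stated a little more formally (the notation here is ours, not taken from the original audits): let $X$ be the position of a given card in a fairly shuffled deck, and $C_k$ the number of the $N$ sampled decks that have that card at position $k$. Then

$$
P(X = k) = \frac{1}{52} \quad (k = 1, \dots, 52), \qquad \mathbb{E}[C_k] = \frac{N}{52}.
$$

A fair shuffle therefore predicts the same expected count at every position, and the observed $C_k$ should cluster ever more tightly around $N/52$, in relative terms, as $N$ grows.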

For this most recent audit, I decided to expand the amount of data we pulled in (the sample size) and to present the results in improved, easier-to-understand terms. On the first point, instead of looking at hundreds of thousands of decks (which is still quite a lot), I wanted to get to the "millions" level. This sample was pulled from all decks used in online multiplayer games from the end of 2018 into the start of 2019, and represents around 3 million shuffled decks (2,902,476 to be exact). The shuffling used in multiplayer is identical to single player; we sample multiplayer games because we don't have your device upload your shuffled deck to us every time you play single player. To do all that work, I had to rebuild how we pull in and process the data for the audit, as this is a large amount of data to sift through and analyze, and the old method simply couldn't handle it.

The idea is the same as before. We take a "marker card" and analyze how it shows up in each position in the deck across the sample. We then compare that against what a random and fair shuffle would produce. This sounds harder than it is in practice: once you have all the data at hand, it is just a matter of doing a lot of basic math. You can of course do more complicated math, but honestly I have not found it helpful (if you have something you want done, let me know). This time around we stuck with the Ace of Hearts (abbreviated "AH") as our marker card, because we have used it before and, in this type of study, any card is technically as good as any other (we also spot-checked several other cards to make sure there was no strange anomaly specific to the AH).
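As a rough illustration of the tallying step, here is a minimal Python sketch. It assumes a hypothetical card format (rank followed by suit letter, such as "AH" or "10S") and, since the game's real deck data isn't available here, generates stand-in decks with Python's `random.shuffle` (a Fisher-Yates shuffle). That simulation is only a placeholder; it is not the game's own shuffler. The names `simulated_decks` and `tally_marker_positions` are illustrative.

```python
import random
from collections import Counter

# Hypothetical card codes: rank followed by suit letter, e.g. "AH", "10S", "KD".
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["H", "D", "C", "S"]
FULL_DECK = [rank + suit for suit in SUITS for rank in RANKS]  # 52 distinct cards
MARKER = "AH"  # the Ace of Hearts marker card


def simulated_decks(n, seed=2019):
    """Yield n decks shuffled with random.shuffle (a Fisher-Yates shuffle).

    Stand-in for real game data only; this is not the game's own shuffler.
    """
    rng = random.Random(seed)
    for _ in range(n):
        deck = FULL_DECK[:]  # copy so each deck starts from the same order
        rng.shuffle(deck)
        yield deck


def tally_marker_positions(decks, marker=MARKER):
    """Return a 52-item list: how often `marker` sat at positions 1..52."""
    counts = Counter()
    for deck in decks:
        # list.index() is 0-based, so add 1 to report positions 1 through 52
        counts[deck.index(marker) + 1] += 1
    # Counter returns 0 for positions that never occurred
    return [counts[pos] for pos in range(1, len(FULL_DECK) + 1)]


counts = tally_marker_positions(simulated_decks(20_000))
print(len(counts), sum(counts))  # 52 20000
```

The resulting list has one count per position, which is exactly what gets plotted and averaged in an audit like this one; with real data you would feed in the recorded decks instead of `simulated_decks`.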

The first thing we do in these audits is total up the number of times a card is found in each possible position in the deck. That means asking how many times the AH shows up at each of deck positions 1 through 52, which gives us a count for each position. We can do several things with those counts, but the most helpful and easiest thing to do first is a simple average. If you average the positions, weighting each position by its count, you should get a number close to the middle of the deck (26.5 for positions 1 through 52) if the cards were shuffled in a balanced and fair way. Since this is an average over a sample, and the number of possible deck orderings is enormous (52 factorial, written mathematically as "52!", which means 52 * 51 * 50 * ... * 3 * 2 * 1), the resulting average will almost certainly not land exactly on the middle of the deck; any realistic sample covers only a vanishingly small fraction of those orderings. Since I have never written that number out longhand in a blog post, I will do it here, because it is important to understand how big it is. 52! is:

80,658,175,170,943,878,571,660,636,856,403,766,975,289,505,440,883,277,824,000,000,000,000

Click here for a decent further explanation of exactly how amazingly huge that number is.
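If you want to check that number yourself, Python's integers have arbitrary precision, so `math.factorial` computes it exactly. This short sketch also compares it with the size of this audit's sample; treating every sampled deck as a distinct ordering gives an upper bound on the fraction of all orderings the sample could cover.

```python
import math

deck_orderings = math.factorial(52)  # 52 * 51 * ... * 2 * 1
sample_size = 2_902_476              # decks in this audit's sample
# At most this fraction of all orderings can appear in the sample
sample_fraction = sample_size / deck_orderings

print(deck_orderings)            # 80658175170943878571660636856403766975289505440883277824000000000000
print(len(str(deck_orderings)))  # 68 digits
print(sample_fraction)           # roughly 3.6e-62
```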

OK, so we have established that although 3 million decks may sound like a lot, it is tiny in comparison to that much larger number. Still, the result is very encouraging. The average position for this sample came out to 26.5022, which is very close to the 26.5 we expected, and honestly I could probably just leave the audit there and call it done. Still, differences and variances are sometimes easier to spot in a graph, and a graph also shows each individual position in the deck and its total count. An average alone doesn't reveal, for example, a problem where this card always falls in the middle of the deck, or always at either end. Below, then, is the graph of each possible position in the deck (1 through 52) and the number of times the marker card (the AH) was found in each position. The yellow line marks the "middle": the count we would expect at every position with a perfectly even spread, which is the total number of decks divided by 52 (2,902,476 / 52 = 55,816.85). It gives a reference point on this large scale.
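One way to put "very close" in context is the standard error of the mean, which is our addition rather than part of the original audit. The sketch below assumes each sampled deck is an independent fair shuffle, under which the marker card's position is uniform on 1 to 52 with variance (52² - 1) / 12. The helper names are illustrative, and since 26.5022 is the rounded reported average, the resulting z-score is approximate.

```python
import math


def mean_position(counts):
    """Average position (1-based), weighting each position by its count."""
    total = sum(counts)
    return sum(pos * c for pos, c in enumerate(counts, start=1)) / total


def fair_mean_and_std_error(num_decks, deck_size=52):
    """Expected average position, and its sampling spread, for a fair shuffle."""
    expected_mean = (deck_size + 1) / 2  # 26.5 for a 52-card deck
    # A uniform position on 1..n has variance (n^2 - 1) / 12
    position_sd = math.sqrt((deck_size ** 2 - 1) / 12)
    # Standard error of the mean shrinks with the square root of the sample size
    return expected_mean, position_sd / math.sqrt(num_decks)


expected_mean, std_error = fair_mean_and_std_error(2_902_476)
observed_mean = 26.5022  # the (rounded) average reported for this audit
z = (observed_mean - expected_mean) / std_error
print(expected_mean, round(std_error, 4), round(z, 2))  # 26.5 0.0088 0.25
print(mean_position([1] * 52))  # an exactly even spread averages to 26.5
```

A z-score of about 0.25 means the observed average sits roughly a quarter of a standard error from 26.5, well within ordinary sampling noise.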

I think that graph pretty much summarizes the entire audit. Feel free to zoom in and spot the fluctuations. Across the entire set, there is no statistically meaningful difference between the count at any position in the deck and the "middle", which is of course why the average is so close to the middle of the deck. There are certainly still small movements from position to position, as you would expect from random sampling. The final takeaway can be summarized as follows: on average, there is an equal chance of any card appearing in any position in the deck. This is how we define a fair and random shuffle, and our hypothesis has been confirmed by our experiment. If anyone would like to see something different about this data, feel free to drop us an email at with your suggestion and I will see if we can find a way to get it to you or update this post here.
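For readers who want a single number behind "no statistically meaningful difference", a Pearson chi-square goodness-of-fit test against equal counts is one standard option. It is a technique we are suggesting here, not one the audit describes using. If SciPy is available, `scipy.stats.chisquare` computes the same statistic along with an exact p-value; this sketch stays within the standard library and approximates the 5% critical value with the Wilson-Hilferty transformation. The counts in the usage lines are hypothetical, not audit data.

```python
from statistics import NormalDist


def chi_square_uniform(counts):
    """Pearson chi-square statistic against an equal count in every position."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def chi_square_critical(df, alpha=0.05):
    """Approximate chi-square critical value (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(1 - alpha)
    h = 2 / (9 * df)
    return df * (1 - h + z * h ** 0.5) ** 3


def looks_uniform(counts, alpha=0.05):
    """Return (statistic, critical value, passed) for the position counts."""
    df = len(counts) - 1  # 51 degrees of freedom for 52 positions
    stat = chi_square_uniform(counts)
    critical = chi_square_critical(df, alpha)
    return stat, critical, stat <= critical


# Hypothetical counts for illustration, not audit data:
print(looks_uniform([1000] * 52))           # statistic 0.0, passes
print(looks_uniform([1300] + [1000] * 51))  # position 1 over-represented, fails
```

To run this on real data you would pass the 52 per-position counts behind the graph (not reproduced here). Like the graph, it checks only that the marker card is equally represented across positions, not every property of a shuffle.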