Monday, February 10, 2014

How likely are you to get 12 wins in the Hearthstone Arena, given your skill level?

Blizzard recently released Hearthstone, a TCG-style video game similar to Magic: The Gathering.  Hearthstone has a play mode called the Arena, where the player assembles a deck out of random cards and uses it to play against other, randomly selected players.  The player plays games until they lose 3 times (or win a maximum of 12 games), and is rewarded depending on the number of wins they get.

A few weeks ago on reddit, there was a post titled "How hard is the Arena? The answer, with Math (TM)", which showed how likely the different arena outcomes would be if each game was decided randomly.  You should read the full post  -- there are some great points about how unlikely 8+ wins is, and how outcomes that some players view as devastating, like 1-3, are actually common.

However, there is something important missing from that analysis -- some decks are better than others!  Here, I'm going to extend that analysis by incorporating the strength of each deck and the player's skill.

Let's represent each player’s power using a number between 0 and 1.  A player’s power is the fraction of players that player is stronger than.  The weakest player has power 0.  The strongest player has power 1.  The average (median) player has power 0.5, and so on.  Note that this power incorporates both the player's skill and the power of their deck.  Therefore whatever your skill level, you will have a different power level each time you run the arena.

We’re going to make three assumptions about the game:

1) In the arena, you always play someone with the same win-loss record as you.  Blizzard has said they try to perform matchmaking to make this the case, although it is not true all the time.

2) The advantage a player has is proportional to the difference in their power values.  The best player (power 1) has the same advantage over the average player (power 0.5) as the average player has over the worst player (power 0).

3) Prob[A beats B] = Logistic( X * (Pow(A) - Pow(B)) ).  The Logistic function converts a number into a probability value: Logistic(0) = 0.5, and Logistic(Y) gets closer to 1 the larger Y is, and closer to 0 the more negative Y is (see plot below).  (Note that Logistic is symmetric, so Prob[A beats B] = 1 - Prob[B beats A], as we would expect.)

The value X determines how important power is in deciding the outcome of a game.  If we think Hearthstone is totally luck-based (like the card game “War”), we would set X to 0, meaning that the outcome of every game is 50-50, regardless of the players’ skills.
If we think Hearthstone is very skill-based (like Chess, say), we would set X to a large number, so that if A is even a slightly stronger player than B, A has a very high chance of winning.  From my intuition, I think X=5 is a reasonable value -- the results below use it.  However, I computed all the arena outcome probabilities for values of X between 0 and 100.  Here is a table of win probabilities for different power differences at X=5:

Power difference    Win probability
0.00                50.00%
0.01                51.25%
0.10                62.25%
0.25                77.73%
0.50                92.41%
0.75                97.70%
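Assumption 3 is simple to put into code.  Here's a minimal Python sketch (the function name is my own, not taken from the code linked below) that reproduces the win-probability table above:

```python
import math

def win_prob(power_a, power_b, X=5.0):
    """Prob[A beats B] = Logistic(X * (Pow(A) - Pow(B)))."""
    return 1.0 / (1.0 + math.exp(-X * (power_a - power_b)))

# Win probability as a function of power difference, with X = 5
for diff in [0.0, 0.01, 0.1, 0.25, 0.5, 0.75]:
    print(f"{diff:<5} {win_prob(diff, 0.0):.2%}")
```

Note the symmetry: win_prob(a, b) + win_prob(b, a) is always 1, as required by assumption 3.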

Given just these assumptions, we can compute exactly how likely each arena outcome is for a deck of a particular power.  We start with all players at 0-0, with players of all powers equally frequent (in my calculations, I group the players into 1000 bins).  Then, for players of a given power, we compute the chance that they encounter an opponent of each other power, and their likelihood of beating that opponent.  That gives us the fraction of the players at a particular power who will move to 1-0, and the fraction who will move to 0-1.  We repeat that process, calculating how many get to 2-0, 1-1, and 0-2, and so on, all the way up to 12-2.
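The repeated-update procedure above can be sketched in a few lines of Python (the function name, binning details, and dictionary layout are my own; see the linked GitHub code for the actual implementation):

```python
import numpy as np

def logistic(y):
    return 1.0 / (1.0 + np.exp(-y))

def arena_outcomes(player_power, X=5.0, bins=1000):
    """Chance of each final arena record (wins, losses) for a player of
    the given power, under assumptions 1-3 above."""
    powers = (np.arange(bins) + 0.5) / bins        # power-bin midpoints
    pop = {(0, 0): np.full(bins, 1.0 / bins)}      # population by power, per record
    me = {(0, 0): 1.0}                             # prob. tracked player has this record
    i = min(int(player_power * bins), bins - 1)    # tracked player's power bin
    finals = {}
    # Records are reached in order of total games played (w + l); 12-2 is last.
    for total in range(15):
        for w in range(min(total, 12) + 1):
            l = total - w
            if l < 0 or l > 3 or (w, l) not in pop:
                continue
            if w == 12 or l == 3:                  # terminal record: bank the probability
                finals[(w, l)] = me.get((w, l), 0.0)
                continue
            opp = pop[(w, l)] / pop[(w, l)].sum()  # same-record opponent distribution
            # pwin[j]: chance a power-bin-j player beats a random same-record opponent
            pwin = logistic(X * (powers[:, None] - powers[None, :])) @ opp
            pop[(w + 1, l)] = pop.get((w + 1, l), 0.0) + pop[(w, l)] * pwin
            pop[(w, l + 1)] = pop.get((w, l + 1), 0.0) + pop[(w, l)] * (1 - pwin)
            p = me.get((w, l), 0.0)
            me[(w + 1, l)] = me.get((w + 1, l), 0.0) + p * pwin[i]
            me[(w, l + 1)] = me.get((w, l + 1), 0.0) + p * (1 - pwin[i])
    return finals
```

For example, arena_outcomes(0.5) returns a dictionary mapping each final record to its probability; with enough bins it should closely reproduce the tables below, up to binning details.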

From that calculation, we can see what the chances of different arena outcomes are for players of different power levels.  First, let's look at the outcomes for the average player (power 0.5):
Record    Probability
1-3       16.69%
2-3       28.24%
3-3       27.17%
4-3       15.52%
5-3       5.63%
6-3       1.42%
7-3       0.28%
8-3       0.04%
9-3       0.01%
10-3      0.00%
11-3      0.00%
12-2      0.00%
12-1      0.00%
12-0      0.00%

As we can see, the average player usually gets between 2 and 4 wins.  It's worth noting that, unlike the case where all games are decided randomly, the average player is very unlikely to get 0 wins, and it is virtually impossible for them to get 12.  This is because, as the player performs poorly (or well) in the first few games, they get paired with weaker (or stronger, respectively) players, pushing their outcome back toward the average.
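As a quick arithmetic check on that claim, we can compute the expected number of wins implied by the table above:

```python
# Outcome table for the average player (power 0.5), from the post
outcomes = {
    1: 0.1669, 2: 0.2824, 3: 0.2717, 4: 0.1552, 5: 0.0563,
    6: 0.0142, 7: 0.0028, 8: 0.0004, 9: 0.0001,
    10: 0.0, 11: 0.0, 12: 0.0,
}
# The listed rows sum to 95%; the remaining ~5% is presumably the 0-3 row,
# which contributes zero wins and so doesn't change the expectation.
expected_wins = sum(w * p for w, p in outcomes.items())
print(f"expected wins: {expected_wins:.2f}")   # about 2.56
```

An expected value of roughly 2.6 wins is consistent with "between 2 and 4 wins" above.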

Now let's look at the chances of each outcome for a strong player (power 0.9):
Record    Probability
0-3       0.14%
1-3       0.96%
2-3       3.62%
3-3       8.83%
4-3       14.72%
5-3       17.92%
6-3       17.10%
7-3       13.62%
8-3       9.49%
9-3       6.00%
10-3      3.53%
11-3      1.96%
12-2      1.62%
12-1      0.42%
12-0      0.06%

The stronger player generally gets between 4 and 9 wins.  However, even strong players rarely reach 12 wins.  This is because virtually all of the decks you face at 8+ wins are also very strong.

It's worth noting that extreme outcomes (0-3 or 12 wins) are somewhat more common in the real game than in this analysis, because you aren't always matched with someone with an identical arena record.  This probably doesn't make much difference for common records (like 1-1), but it could make a big difference for rare records like 10-0.  In that case, you're likely to be matched with a deck with a worse record than yours, and therefore have a higher chance of winning and going on to 12 wins.  Common outcomes (e.g. 3-3) are very slightly less likely for the same reason.

You can view all the results in this spreadsheet.  The spreadsheet shows the full outcome probabilities for many different skill levels, and the skill levels you're likely to encounter at different arena records, all for multiple values of X.

You can see the Python code I used to do the calculations on GitHub.

What do you think of these statistics?  Do they influence the way you play the Arena, or the way you feel about your results?  Leave a comment below.  Also, see more comments on reddit.

Monday, October 8, 2012

On ENCODE's results regarding junk DNA



After I took part in an AMA ("Ask Me Anything") on reddit, there has been some discussion elsewhere (such as by Ryan Gregory and in the comments of Ewan Birney's blog) of what I and the other ENCODE scientists meant.  In response, I'd like to echo what many others have said regarding the significance of ENCODE on the fraction of the genome that is "junk" (or nonfunctional, or unimportant to phenotype, or evolutionarily unconserved).

In its press releases, ENCODE reported finding 80% of the genome with "specific biochemical activity", which turned into (through some combination of poor presentation on the part of ENCODE and poor interpretation on the part of the media) reports that 80% of the genome is functional.  This claim is unlikely given what we know about the genome (here is a good explanation of why), so this created some amount of controversy. 

I think very few members of ENCODE believe that the consortium proved that 80% of the genome is functional; no one claimed as much in the reddit AMA, and Ewan Birney has made it clear on his blog that he would not make this claim either.  In fact, I think the importance of ENCODE's results for the question of what fraction of DNA is functional is very small; that question is much better answered with other analyses, like those of evolutionary conservation.  Lacking proof either way from ENCODE, there was some disagreement in the AMA about the most likely true fraction, but I think this stemmed from disagreements about definitions and willingness to hypothesize about undiscovered function, not from misinterpretation of ENCODE's results.

I think many members of the consortium (including Ewan Birney) regret the choice of terminology that led to the misinterpretations of the 80% number.  Unfortunately, such misinterpretations are always a danger in scientific communication (both among the scientific community and to the public).  Whether the consortium could have done a better job explaining the results, and whether we should expect the media to more accurately represent scientific results, is hard to say.

I think the contribution of ENCODE lies not in determining what DNA is functional but rather in determining what the functional DNA actually does.  This was the focus of the integration paper and the companion papers, and I would have preferred for this to be the focus of the media coverage.