On Statistics and Prediction Markets
Thoughts on what the percentages really mean
Intro
I was listening to the Odd Lots: Nate Silver podcast today and whenever Tracy Alloway brought up some conceptual questions and comments around statistics, Nate Silver responded that those questions were “above his pay grade”. I felt they were not above mine so I was inspired.
Statistics is a human invention.
It’s important to first understand that statistics is a human invention. It is a tool devised to help us predict situations and understand data at a more useful level. To Tracy’s point, what does it mean for something to have an X% probability? The world is deterministic: everything either has a 100% chance of happening or a 0% chance of happening. When you flip a coin, the universe has already determined what it will land on, and this is further supported by the Berkeley study that quantified all of the variables influencing how a coin lands. So why do we say that a coin has a 50% chance of landing on heads or tails?
The idea behind statistical claims is that we can’t predict with certainty what will happen due to a lack of knowledge. An omniscient entity would have no use for statistics. We don’t use statistics to predict how far an apple will fall if dropped from some height; that is just called measurement. We do use statistics to predict how much rain will fall over a certain area. Typically, the more input variables there are to a system, or the more rapidly those variables change, the more we lean on statistical solutions as opposed to classical ones.
What is statistics?
The classic statistics problem can be envisioned as follows: a person must reach into a hat with some combination of red and blue balls. They want to know if they will grab a blue ball. From a statistical perspective this depends on the proportion of blue balls in the hat. If all the balls are blue, then there is a 100% chance. Likewise, if all the balls are red, there is a 0% chance. If 7 of the 10 balls are blue, there is a 70% chance. Of course, when the person’s arm is moving towards the hat the universe has already determined what ball they will grab, but the human can’t know that.
Statistics allows us to compromise with the universe. We can’t know what will happen, but we have some idea of the distribution of what can happen. We can make useful statements like “they will probably grab a blue ball” or “they are more than twice as likely to grab a blue ball as a red ball” but we can’t make statements like “the next ball you grab will be red” and know that we are correct beforehand.
Based on this idea, classic statistics is about repetition. You typically get taught something like, “If you were to pull a ball out of the hat 100 times, putting it back after each pull, how many red and blue balls would you expect to pull out?” The statistical answer to this would be 70 blue balls and 30 red balls. Of course, one may ask a further question, like what the probability is of grabbing exactly 50 blue balls and 50 red balls, but I intentionally want to stay away from more advanced statistical concepts because I don’t think they are necessary to understand prediction markets (which I promise we will get to!).
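For readers who want to see the repetition idea in action, here is a minimal Python sketch of the hat with 7 blue and 3 red balls. It assumes each ball goes back into the hat after it is pulled, and the function names (`draw_counts`, `expected_blue`, `prob_exactly`) are made up for illustration. The last function answers the "further question" about 50 blue and 50 red using the standard binomial formula.

```python
import random
from math import comb


def draw_counts(n_blue=7, n_red=3, n_draws=100, seed=0):
    """Pull a ball n_draws times, putting it back after each pull."""
    rng = random.Random(seed)  # fixed seed so a rerun gives the same draws
    hat = ["blue"] * n_blue + ["red"] * n_red
    draws = [rng.choice(hat) for _ in range(n_draws)]
    blue = draws.count("blue")
    return blue, n_draws - blue


def expected_blue(n_blue=7, n_red=3, n_draws=100):
    """Long-run average number of blue balls in n_draws pulls."""
    return n_draws * n_blue / (n_blue + n_red)


def prob_exactly(k, n_draws=100, p_blue=0.7):
    """Binomial probability of exactly k blue balls in n_draws pulls."""
    return comb(n_draws, k) * p_blue**k * (1 - p_blue) ** (n_draws - k)


blue, red = draw_counts()
print("one simulated run:", blue, "blue,", red, "red")
print("expected blue:", expected_blue())
print("chance of exactly 50 blue:", prob_exactly(50))
```

A single simulated run will usually land near 70 blue without hitting it exactly, which is the whole point: the 70/30 answer describes the long run, not any one trip to the hat.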
This idea of repetition enables classic statistics. Meteorologists can say an area has a 30% chance of rain because they’ve studied past behavior of rainclouds. Astrophysicists can say an asteroid has a 0.00001% chance of hitting us this year because they’ve studied millions of asteroid trajectories. Bookies can say that the Dodgers have a 55% chance of winning the World Series because they’ve analyzed thousands of baseball games. Each of these predictions is enabled by repetition and the observation of similar events in the past.
Statistics in prediction markets
Prediction markets typically involve assessing the probability of a one-off event. “What is the chance of a Russian invasion of Ukraine by the end of this year?” is one of my favorite types of questions. This year is different than all the other years. Russia has different resources this year than other years, as does Ukraine. What is the motivation behind the invasion? All of these are great questions to ask, and they highlight the challenge of trying to use statistics for one-off events. However, this article isn’t about how to do this, so I will save that for another day. Let’s proceed with the idea that most interesting prediction market events are somewhat unique. They are certainly less repeatable than reaching into a hat with 7 blue balls and 3 red ones.
So, what are we actually doing? We know the world is deterministic. Statistics is based on the concept of repeated tests. We have moved a couple layers deep at this point. What does it really mean that there is a 12% chance of Trump saying the word “Squirrel” during his next rally?
It’s good to envision predicting the probability of a one-off event less as trying to converge on the long-run, “repeat try” probability and more as the following: envision you have 101 different buckets, each with a probability on it. One bucket has 0%, another 1%, all the way up to 100%. When you predict an event, you are actually trying to place the event into one of these 101 buckets such that in the future, if you were to grab one bucket and check on it, the events you placed inside that bucket would correspond with the probability label on the bucket. If you investigate your 70% bucket and pull out 20 events, you would want 14 of those events to have happened and 6 to not have happened. If you think Kamala Harris has a 40% chance of becoming president, you’d drop that prediction in your 40% bucket. If you think there is a 13% chance of a meteor hitting a population center in the next 100 years, you’d drop that in your 13% bucket. Doing this with a large number of predictions will generate a calibration curve, and a good predictor strives to have one that follows the diagonal: the events in the X% bucket happen about X% of the time.
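The bucket picture translates almost directly into code. Below is a rough Python sketch, not a standard scoring tool: the function name `calibration_table` and the sample predictions are invented for illustration. It groups predictions by their stated percentage and reports how often the events in each bucket actually happened.

```python
from collections import defaultdict


def calibration_table(predictions):
    """Group (stated_percent, happened) pairs into percentage buckets.

    Returns {bucket_percent: (n_events, n_happened, observed_percent)}.
    """
    buckets = defaultdict(list)
    for percent, happened in predictions:
        # Round to the nearest whole percent: one of the 101 buckets.
        buckets[round(percent)].append(bool(happened))

    table = {}
    for percent in sorted(buckets):
        outcomes = buckets[percent]
        n = len(outcomes)
        hits = sum(outcomes)
        table[percent] = (n, hits, 100 * hits / n)
    return table


# Made-up track record: twenty 70% calls and five 40% calls.
predictions = (
    [(70, True)] * 14 + [(70, False)] * 6
    + [(40, True)] * 2 + [(40, False)] * 3
)

for percent, (n, hits, observed) in calibration_table(predictions).items():
    print(f"{percent}% bucket: {hits}/{n} happened ({observed:.0f}% observed)")
```

Plotting the bucket label against the observed percentage gives the calibration curve; in this invented example both buckets sit exactly on the diagonal, which real track records rarely do.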
Conclusion
The clever amongst you may have noticed what I just did. “Hey, Mason! You just reframed the setting to make it so that unique, one-off events are repeatable!” YES, you are correct. And of course I can reframe it this way; statistics is a human concept. Statistics makes no sense from the universe’s perspective. It’s fluid like gender or love and I can reframe concepts to be useful and have different meaning if needed. There is math behind it (and some may argue that math is a language) but fundamentally, using language to make a concept statistically valid is how statistics works. Every event in the universe is unique and it is only by using language that we can enable statistics. The molecular interactions between the moist air contacting a cloud are little different than the fan in my room pushing air over the sweat on my arm, but we would never group these two events together for statistical predictions. And to give you something to ponder on, this fundamental relationship of using language to guide statistics is why statistics can never be used to explain causation; it is just a mathematical grouping of different concepts brought together by language.
To close, when you are trying to piece together what’s happening when you put a probability to an event, consider that you are grouping that event with other events you think will happen about as often as the one in consideration. When you grab your 15% bucket and pull out the 200 predictions that are in there, you want 30 of them to have happened and 170 of them to not have happened.
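As a small worked check on that closing example, here is a Python sketch of the arithmetic. The spread calculation is my addition, not something from the article: it assumes the predictions in a bucket are independent of each other and that each one truly has the stated probability, which real predictions may not satisfy. The function names are hypothetical.

```python
from math import sqrt


def expected_hits(n_predictions, probability):
    """Number of events you would want to have happened in a bucket."""
    return n_predictions * probability


def hit_spread(n_predictions, probability):
    """Standard deviation of the hit count.

    Assumes independent predictions that each truly have this probability.
    """
    return sqrt(n_predictions * probability * (1 - probability))


n, p = 200, 0.15
print("expected hits:", expected_hits(n, p))
print("typical spread:", hit_spread(n, p))
```

Under those assumptions the target for the 15% bucket is 30 hits, with a typical spread of about 5 either way, so a perfectly calibrated forecaster should not expect to land on exactly 30 every time.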

Related article about 538’s calibration: https://projects.fivethirtyeight.com/checking-our-work/
People should definitely put out numerous predictions and grade themselves on a curve relative to the chance they gave to various predictions - then we can come up with a General Predictiveness Factor.
In the movie Longlegs, special agent Lee Harker is put through a series of tests where agents sit alone in a dark room and are tasked with guessing a number (no hints are given). This apparently is to determine who is best at picking correctly.
Lee demonstrates a preternatural ability to pick the correct number, and she goes on to pick the exact house the suspect was staying in based on a gut feeling. She definitely made for a great FBI agent!
I wonder if Philip E. Tetlock has the same ability? Some people just weigh information in exactly the right proportions while retaining unclouded judgement. It’s inspiring what superforecasters can do.