Generating Simpson's Paradox with Z3
The article illustrates Simpson's Paradox with two baseball players: Player A has the higher overall batting average even though Player B outperforms A against both left-handed and right-handed pitchers, a reminder of how easily aggregated statistics can mislead.
The article discusses an example of Simpson's Paradox using a scenario involving two baseball players, A and B. Player A has a higher overall batting average than player B, yet player B outperforms A against both left-handed and right-handed pitchers. The article uses the Z3 Theorem Prover to generate a model that satisfies these conditions. In that model, A has an overall batting average of approximately 0.235 and B about 0.231. However, B has a batting average of 0.5 against left-handed pitchers and 0.182 against right-handed pitchers, compared to A's averages of 0.4 and 0.167, respectively. The key takeaway is that the players faced different numbers of left-handed and right-handed pitchers, which produces the paradoxical result. The article concludes with a tabular representation of the batting averages for clarity.
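The article's actual Z3 constraints are not reproduced in this summary, so the following is only a rough stand-in: a plain brute-force search in Python, not Z3, over small hit and at-bat counts for a record where B wins both splits while A wins overall. The function name `find_simpson_instance` and the `max_at_bats` bound are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import product


def find_simpson_instance(max_at_bats=8):
    """Return the first (A, B) pair where B wins each split but A wins overall.

    Each player's record is ((hits_left, at_bats_left), (hits_right, at_bats_right)).
    This is an exhaustive search, not the Z3 encoding used in the article.
    """
    # Every (hits, at_bats) pair with at least one at-bat and hits <= at_bats.
    splits = [(h, n) for n in range(1, max_at_bats + 1) for h in range(n + 1)]
    for a_left, a_right, b_left, b_right in product(splits, repeat=4):
        # B must be strictly better against each pitcher type.
        if not Fraction(*b_left) > Fraction(*a_left):
            continue
        if not Fraction(*b_right) > Fraction(*a_right):
            continue
        # A must still be strictly better once both splits are pooled.
        a_overall = Fraction(a_left[0] + a_right[0], a_left[1] + a_right[1])
        b_overall = Fraction(b_left[0] + b_right[0], b_left[1] + b_right[1])
        if a_overall > b_overall:
            return (a_left, a_right), (b_left, b_right)
    return None


if __name__ == "__main__":
    print(find_simpson_instance())
```

Like a solver, this returns whichever satisfying assignment it reaches first, so the result may look more contrived than the figures quoted above.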
- Player A has a higher overall batting average than player B.
- Player B has better averages against both left-handed and right-handed pitchers.
- The Z3 Theorem Prover is used to illustrate the paradox.
- The players faced different sets of pitchers, affecting their averages.
- The example highlights the complexities of statistical interpretation in sports.
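The figures in the summary can be checked with a few lines of Python. The at-bat counts (A: 5 lefties and 12 righties; B: 2 and 11) are the ones quoted from the article in the discussion; the hit counts are an assumption, inferred as the smallest whole numbers that reproduce the rounded averages.

```python
from fractions import Fraction

# (hits, at_bats) per player and pitcher handedness.
# At-bat counts are quoted from the article; hit counts are inferred
# from the rounded averages and are therefore an assumption.
RECORDS = {
    "A": {"left": (2, 5), "right": (2, 12)},
    "B": {"left": (1, 2), "right": (2, 11)},
}


def split_average(player, side):
    """Batting average for one player against one pitcher type."""
    hits, at_bats = RECORDS[player][side]
    return Fraction(hits, at_bats)


def overall_average(player):
    """Batting average pooled over both pitcher types."""
    hits = sum(h for h, _ in RECORDS[player].values())
    at_bats = sum(n for _, n in RECORDS[player].values())
    return Fraction(hits, at_bats)


if __name__ == "__main__":
    print("player  left   right  overall")
    for player in ("A", "B"):
        left = float(split_average(player, "left"))
        right = float(split_average(player, "right"))
        total = float(overall_average(player))
        print(f"{player:<7} {left:.3f}  {right:.3f}  {total:.3f}")
```

With these inferred counts, A's overall average is 4/17 and B's is 3/13, which round to the 0.235 and 0.231 given in the summary.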
Related
Banach–Tarski Paradox
The Banach–Tarski paradox challenges geometric intuition by dividing a ball into subsets that can form two identical copies without changing volume. Axiom of choice and group actions play key roles.
The Point of the Banach Tarski Theorem
The Banach-Tarski theorem challenges common sense by showing a solid ball can be split into pieces to form two identical balls in 3D space. It questions measurement principles and the Axiom of Choice's role in resolving mathematical paradoxes.
Win your fantasy league using operations research
Operations research techniques are applied to excel in fantasy football leagues by treating player selection as a knapsack problem. Cost-effective players delivering high scores are chosen within budget constraints, optimizing team value effectively.
The algebra (and calculus) of algebraic data types
The relationship between algebraic data types (ADTs) and mathematical algebra is explored, emphasizing similarities in operations. Examples like Choice and binary trees illustrate how algebraic rules apply to ADTs, despite challenges with structures like Nat. Poking holes in data structures is introduced as a way to understand calculus on data types.
The Myth and Magic of Deliberate Practice
The article explores deliberate practice using Joe DiMaggio's baseball skills, emphasizing that success combines rigorous practice with innate abilities, as genetics significantly influence performance and potential.
- Many commenters emphasize the complexity and potential misleading nature of statistical interpretations, particularly with varying levels of granularity.
- Several users express dissatisfaction with the baseball example used to illustrate the paradox, suggesting it may not be relatable to all audiences.
- There is a consensus on the importance of visualizing data to better understand statistical phenomena, with references to related concepts like Anscombe's Quartet.
- Some commenters highlight the need for multifactor analysis to avoid misinterpretations that can arise from examining single variables.
- Questions about causation and the appropriateness of the term "paradox" are raised, indicating a desire for deeper exploration of the topic.
Along the same lines of "visualize your data to see what's really going on" is Anscombe's Quartet: https://en.wikipedia.org/w/index.php?title=Anscombe%27s_quar...
And then there's the Datasaurus [Dozen], which has some fun with the idea behind Anscombe's Quartet: https://en.wikipedia.org/wiki/Datasaurus_dozen (you can see it animated here: https://blog.revolutionanalytics.com/2017/05/the-datasaurus-... )
If you take the example of Treatment A vs Treatment B for tumors, you can get infinite layers of seemingly contradictory statements:
- Overall, Treatment A has better average results
- But if you add tumor size, Treatment B is always better
- But if you add gender to size, Treatment B is always better
- But if you add age category to gender and size, Treatment A is always better
- etc...
It totally contradicts our instincts, and shows statistics can be profoundly misleading (intentionally or not).
[1]: https://robertheaton.com/2019/02/24/making-peace-with-simpso...
Proper multifactor analysis that accounts for all variations simultaneously is required to learn about complex phenomena.
If you get paranoid about its presence it can lead you to second guess pretty much every statistic. "I know that 4 out of 5 dentists recommend chewing X Brand gum but what if I slice the dentists by number of eyes? Maybe both one-eyed dentists and two-eyed dentists aren't so enthusiastic."
Tools like this make me feel a lot better about all the time I wasted playing with predicate logic.
In any case, I would explain the paradox differently than the author. The author says: "The key to understanding the paradox is that the players did not bat against the same set of pitchers. A batted against 5 lefties and 12 righties; B against 2 and 11."
I would say instead that the key to understanding the paradox is to observe that both players are much better when batting against lefties and that player A batted against lefties much more often, both in absolute and relative terms. In other words, A is not as good against lefties as B but he faced a lot more of these comparatively easy pitchers.
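This explanation can be made concrete by writing each overall average as a weighted mix of the two split averages, where the weight is the share of at-bats against lefties. A minimal sketch follows, using the at-bat counts quoted above and hit counts inferred from the rounded averages (an assumption); the function name `mixed_average` is hypothetical.

```python
from fractions import Fraction


def mixed_average(left_avg, right_avg, left_at_bats, right_at_bats):
    """Overall average as a weighted mix of the two split averages."""
    left_share = Fraction(left_at_bats, left_at_bats + right_at_bats)
    return left_share * left_avg + (1 - left_share) * right_avg


# Split averages; hit counts are inferred from the rounded figures.
A_LEFT, A_RIGHT = Fraction(2, 5), Fraction(2, 12)
B_LEFT, B_RIGHT = Fraction(1, 2), Fraction(2, 11)

# Each player against the mix of pitchers he actually faced.
a_actual = mixed_average(A_LEFT, A_RIGHT, 5, 12)
b_actual = mixed_average(B_LEFT, B_RIGHT, 2, 11)

# Counterfactual: B's split averages applied to A's mix of pitchers.
b_with_a_mix = mixed_average(B_LEFT, B_RIGHT, 5, 12)

if __name__ == "__main__":
    print("A share of lefties:", Fraction(5, 17), "B share of lefties:", Fraction(2, 13))
    print("A actual:", a_actual, "B actual:", b_actual)
    print("B with A's mix:", b_with_a_mix)
```

Holding the pitcher mix fixed removes the reversal: with A's 5-to-12 mix, B's split averages would give a higher overall average than A's, so the gap in the pooled numbers comes from the weights rather than from the split averages.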
This seems to be a pretty fundamental question but I have never seen it addressed.
This is misleading. I am baseball ignorant but I feel this is a contrived and bad example for Simpson's paradox.
(UC Berkeley gender bias example is much better)