Professional Poker Players Know the Optimal Strategy but Don't Always Use It
Professional poker players balance game theory optimal strategies with exploitative play to outsmart opponents. AI advancements challenge traditional approaches, pushing players to blend defensive and aggressive tactics for competitive success in evolving poker dynamics.
Professional poker players often know the optimal strategy, rooted in game theory, but may choose not to use it in favor of exploiting opponents' weaknesses. The concept of "game theory optimal versus exploitative play" is central in high-level poker discussions. Despite poker's basis in randomness and psychology, mathematicians have found ways to model optimal strategies, with John Nash's Nash equilibrium being a key concept. The AI revolution in poker has led to the development of superhuman algorithms that can play nearly perfectly, challenging traditional human strategies. Players now use software tools like "solvers" to study and improve their game, transitioning poker from an art to a science. While optimal play is defensive, exploitative play is more aggressive, and top players blend both strategies to stay competitive. The emergence of AI in poker has both detractors who feel it diminishes the game's magic and supporters who see it as adding a new layer of complexity. The poker landscape continues to evolve as players navigate between optimal and exploitative strategies to find success in the game.
Related
Analysing 16,625 papers to figure out where AI is headed next (2019)
MIT Technology Review analyzed 16,625 AI papers, noting deep learning's potential decline. Trends include shifts to machine learning, neural networks' rise, and reinforcement learning growth. AI techniques cycle, with future dominance uncertain.
'Superintelligence,' Ten Years On
Nick Bostrom's book "Superintelligence" from 2014 shaped the AI alignment debate, highlighting risks of artificial superintelligence surpassing human intellect. Concerns include misalignment with human values and skepticism about AI achieving sentience. Discussions emphasize safety in AI advancement.
Prepare for AI Hackers
DEF CON 2016 hosted the Cyber Grand Challenge where AI systems autonomously hacked programs. Bruce Schneier warns of AI hackers exploiting vulnerabilities rapidly, urging institutions to adapt to AI-devised attacks efficiently.
"Superhuman" Go AIs still have trouble defending against these simple exploits
Researchers at MIT and FAR AI found vulnerabilities in top AI Go algorithms, allowing humans to defeat AI with unorthodox strategies. Efforts to improve defenses show limited success, highlighting challenges in creating robust AI systems.
Game Designers Run Your Life
Games have a long history influencing human behavior, teaching lessons, and shaping interactions with technology. Designers use rewards to control player behavior, impacting society significantly.
It's interesting how this affects play. If a player is bluffing slightly less than they should be, the adjustment is drastic: you should never call with hands that do not beat their value hands. If they are bluffing optimally, you are supposed to call using what is known as "minimum defense frequency". What's interesting is that the minimum defense frequency is based only on the strongest hands you can possibly hold in that situation; the opponent's possible hands do not factor into it at all. It's required to prevent your opponent from profitably bluffing with any two cards.
To do the math, if you are on the river and the opponent bets 100 into 100, for this to be profitable for them they need to win 50% of the time or more. If your opponent is bluffing optimally, you need to call with 50% of the strongest hands you have in that specific situation (if you don't know what hands you have in a specific situation that's a problem) and sometimes they can be dogshit like King high.
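The arithmetic above fits in a few lines. This is just a sketch of the two fractions involved; the helper names are mine, not from any poker library:

```python
def bluff_breakeven_fold_frequency(bet: float, pot: float) -> float:
    """How often a pure bluff must get a fold to break even: bet / (bet + pot)."""
    return bet / (bet + pot)

def minimum_defense_frequency(bet: float, pot: float) -> float:
    """Fraction of your range you must continue with so that bluffing
    any two cards is not automatically profitable: pot / (pot + bet)."""
    return pot / (pot + bet)

# Betting 100 into a 100 pot: the bluff needs a fold half the time,
# so you must continue with the top 50% of the hands you can hold there.
print(bluff_breakeven_fold_frequency(100, 100))  # 0.5
print(minimum_defense_frequency(100, 100))       # 0.5

# A smaller bet needs folds less often, but must be defended more:
print(minimum_defense_frequency(50, 100))        # ~0.667
```

Note that the bet size alone pins down both numbers, which is exactly why the opponent's holdings don't enter into the minimum defense frequency.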
But, very important to note, very few players actually bluff enough and if they bluff less than they should you should only ever call when your hand actually beats their range of possible value hands. (value vs bluff is kind of a difficult thing to communicate, generally it's value if you want your opponent to call)
Most players don't bluff enough as a result of most players calling too much! When they call too much you should obviously not bluff! This leads to very boring games of poker.
I've actually been working slowly on https://github.com/krukah/robopoker, an open-source Rust implementation of Pluribus, the SOTA poker AI. What I've found interesting is the difference in how I approach actually playing poker versus how I approach building a solver. Playing the game naturally consists of reasoning about narratives and incorporating information like hand history, play style, live tells. Whereas solving the game is about evaluating tradeoffs between the guarantees of imperfect-information game theory and the constraints of Texas Hold'em, finding a balance between abstract and concrete reasoning.
Basically, play one level - exactly one level - beyond where you peg your opponents at.
Poker is not about playing cards. It's about playing people. Cards is just how we keep it civil and not too personal.
https://www.livepokertheory.com
I do personally dislike that GTO became the nomenclature, as I prefer "theory-based", since it causes this confusion, but trying to fight it at this point is hopeless because GTO is the search term people are using. And when people say they "play GTO" they usually mean "equilibrium" rather than "optimal against my specific opponents", which is "exploitative".
If you actually watch what the top players advocate, everyone suggests you want to play exploitatively. However, there's one equilibrium solution and effectively infinite exploitative solutions, so equilibrium is a reasonable starting point for developing a baseline understanding of the mechanics of the game. It's tough to know how much bluffing is "too much" unless you know the baseline.
Furthermore, if you "exploit" people by definition you are opening yourself up to being exploited so you need to be very careful your assumptions are true.
Also, with solvers like piosolver, you can "node lock" (tell a node in the game tree to play the way your opponent does, rather than the equilibrium way), but there are many pitfalls, such as the solver compensating in very unnatural ways at other nodes, and the impracticality of "locking" a strategy at every node in the tree. There's a newer idea called "incentives", which gives the solver an incentive to play more like a human would (e.g. calling too much), but these approaches are still being actively explored.
Rock paper scissors is frequently used to explain GTO but it's not the best example because equilibrium in rock paper scissors will break even against all opponents, but equilibrium poker strategy will actually beat most human poker players, albeit not as much as a maximally exploitative one.
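The rock-paper-scissors point is easy to verify numerically. This toy sketch (my own, not from the article) shows that the uniform mix has zero expected value against any opponent strategy, while a targeted exploit does better:

```python
# Payoff matrix: row = our throw, column = opponent's throw;
# +1 for a win, -1 for a loss, 0 for a tie. Order: rock, paper, scissors.
PAYOFF = [
    [ 0, -1,  1],  # rock
    [ 1,  0, -1],  # paper
    [-1,  1,  0],  # scissors
]

def ev(ours, theirs):
    """Expected payoff of our mixed strategy against theirs."""
    return sum(ours[i] * theirs[j] * PAYOFF[i][j]
               for i in range(3) for j in range(3))

uniform = [1/3, 1/3, 1/3]
print(ev(uniform, [1, 0, 0]))        # 0.0 — only breaks even vs always-rock
print(ev(uniform, [0.5, 0.3, 0.2]))  # ~0.0 — breaks even vs any mix
print(ev([0, 1, 0], [1, 0, 0]))      # 1 — always-paper exploits always-rock
```

In poker, by contrast, most humans play far enough from equilibrium that the equilibrium strategy itself shows a profit against them, which is why the RPS analogy understates its practical value.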
There's two other huge pieces this article glosses over:
1) It's as impossible for a human to play like a computer in poker as it is in chess, and in fact even more so, because in poker you need to implement mixed strategies. In chess there's usually a single best move, but in poker the optimal solution often involves doing one thing 30% of the time and something else 70% of the time. The problem is that not only are there too many situations to memorize all the solutions, but actually implementing the correct frequencies is impossible for a human. Some players like to use "randomizers" at the table, such as dice or glancing at a clock, but I find that somewhat silly since it's still so unlikely you are anywhere near equilibrium.
2) Reading someone's "tells" live is still a thing. While solvers and widespread "real time assistance" have contributed to online poker's decline, live poker is booming (the 2024 World Series of Poker Main Event just broke the record yet again), and in live poker people still give off information about their hand via body language. From the 70s to the early 2000s, people were somewhat obsessed with "tells" as a way to win at poker. Since computers have advanced so much, that has fallen out of favor, but the truth is, both are useful. It's totally mistaken to think that advances in poker AI, GTO, and solvers have rendered live reads obsolete. In fact, in 2023, Tom Dwan won the biggest pot in televised poker history ($3.1 million) and credited a live read for his decision, in a spot where the solver would randomize between a call and a fold.
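The mixing problem in point 1 is trivial for a machine and hard for a person. A toy randomizer (names and frequencies are illustrative, not from any solver) that plays a 30/70 mix might look like:

```python
import random

def mixed_action(strategy: dict, rng: random.Random) -> str:
    """Sample one action from solver-style frequencies, e.g. {'bluff': 0.3, 'check': 0.7}."""
    actions, weights = zip(*strategy.items())
    return rng.choices(actions, weights=weights, k=1)[0]

rng = random.Random(7)  # seeded only so the demo is reproducible
picks = [mixed_action({"bluff": 0.3, "check": 0.7}, rng) for _ in range(10_000)]
print(picks.count("bluff") / len(picks))  # close to 0.3 over many samples
```

The machine hits the target frequency effortlessly over thousands of hands; a human sampling by feel drifts toward patterns, which is exactly the gap exploitative opponents look for.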