The Unreasonable Effectiveness of Monte Carlo

Nov 28, 2023

Monte Carlo methods are used in almost every branch of science: to evaluate risk in finance, to generate realistic lighting and shadows in 3D graphics, to do reinforcement learning, to forecast weather, and to solve complex games in game theory.

There are many types of Monte Carlo methods, but they all follow the same general pattern: using random sampling to model complex systems.

A simple example: imagine a complex shape whose area you want to know.

  1. Place the shape on a dartboard.
  2. Randomly throw darts at the dartboard.
  3. Count the number of darts that land inside the shape and the total number of darts that hit the dartboard.
  4. The estimated area of the shape = (number of darts inside the shape / total number of darts on the dartboard) * the area of the dartboard.
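
The dartboard steps above can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation: the names `estimate_area` and `in_disc` are made up for this example, the board is assumed to be a rectangle, and the "complex shape" is stood in for by a disc of radius 1 so the answer can be checked against pi. The estimate is the fraction of darts that hit the shape, out of all darts thrown at the board, times the board's area.

```python
import random

def estimate_area(inside, board_width, board_height, n_darts, seed=0):
    """Estimate a shape's area by throwing random darts at a rectangular board.

    inside(x, y) returns True when the point (x, y) lies within the shape.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_darts):
        # Each dart lands uniformly at random somewhere on the board.
        x = rng.uniform(0.0, board_width)
        y = rng.uniform(0.0, board_height)
        if inside(x, y):
            hits += 1
    # Fraction of all darts that hit the shape, scaled by the board's area.
    return hits / n_darts * (board_width * board_height)

def in_disc(x, y):
    # Hypothetical test shape: a disc of radius 1 centered on a 2 x 2 board.
    return (x - 1.0) ** 2 + (y - 1.0) ** 2 <= 1.0

area = estimate_area(in_disc, 2.0, 2.0, n_darts=100_000)
print(area)  # should land close to pi, the true area of the disc
```

Any shape works as long as you can answer "is this point inside?", which is why the approach is useful when no closed-form area is available.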

(This computes a definite integral numerically, with a convergence rate that doesn’t depend on the number of dimensions! You can also easily estimate the error from the number of samples N: it shrinks in proportion to 1/sqrt(N).)
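
To make the error estimate concrete, here is a rough sketch that applies the same dart-throwing idea in five dimensions. The function name `ball_volume_estimate` and the choice of test problem (the volume of a unit ball inside the cube [-1, 1]^5) are assumptions made for this example. Each sample is a hit-or-miss trial, so the standard error is taken from the usual formula for a sample proportion, sqrt(p * (1 - p) / N), scaled by the cube's volume.

```python
import math
import random

def ball_volume_estimate(dim, n_samples, seed=0):
    """Estimate the volume of the unit ball in `dim` dimensions, with a standard error."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        point = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        if sum(c * c for c in point) <= 1.0:
            hits += 1
    cube_volume = 2.0 ** dim
    p = hits / n_samples  # fraction of samples that landed inside the ball
    estimate = p * cube_volume
    # Standard error of a proportion, scaled to volume units.
    std_error = cube_volume * math.sqrt(p * (1.0 - p) / n_samples)
    return estimate, std_error

estimate, std_error = ball_volume_estimate(dim=5, n_samples=200_000)
exact = math.pi ** 2.5 / math.gamma(3.5)  # closed-form volume of the unit 5-ball
print(estimate, std_error, exact)
```

The 1/sqrt(N) rate holds in any dimension, but the constant in front does not: as the dimension grows, a smaller fraction of the cube lies inside the ball, so more samples are needed for the same relative accuracy.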

Monte Carlo Tree Search (MCTS). You can also use Monte Carlo methods to play a game like Blackjack (or Chess, Go, Scrabble, and many other turn-based games) with Monte Carlo Tree Search. AlphaGo and its successors (AlphaGo Zero and AlphaZero) used versions of Monte Carlo Tree Search combined with reinforcement learning and deep learning.

The idea is fairly simple: add a policy (i.e., a strategy to follow) to the random sampling process. You might start with a simple one (play randomly, or hit on any hand under 18 and stay otherwise). For every move in a game, add a node to a tree that describes the game. For Blackjack, that might be a series of hits or stays. When a game is won or lost, go back and update all of the nodes visited in that game (the “backpropagation” step).
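
Here is a small sketch of the record-and-backpropagate loop described above, using a heavily simplified Blackjack. Everything in it is an assumption made for illustration: the `Node` class, the infinite deck, a dealer who hits below 17, ties counted as losses, and the two example policies. It is not a full MCTS (there is no selection rule such as UCT); it only shows how played games build the tree and how each result is pushed back up the visited path.

```python
import random

class Node:
    """One point in the game tree: a hand total or an action taken from it."""
    def __init__(self):
        self.visits = 0
        self.wins = 0.0
        self.children = {}

    def child(self, key):
        return self.children.setdefault(key, Node())

def draw(rng):
    # Infinite-deck assumption: ranks 1..13, face cards count as 10, ace as 1.
    return min(rng.randint(1, 13), 10)

def hand_total(cards):
    total = sum(cards)
    # Count one ace as 11 when that does not bust the hand.
    if 1 in cards and total + 10 <= 21:
        total += 10
    return total

def random_policy(total, rng):
    return rng.choice(["hit", "stay"])

def threshold_policy(total, rng):
    # Alternative simple policy: hit on anything under 18.
    return "hit" if total < 18 else "stay"

def play_game(root, policy, rng):
    player = [draw(rng), draw(rng)]
    dealer = [draw(rng), draw(rng)]
    node = root.child(hand_total(player))
    path = [root, node]
    while hand_total(player) < 21:
        action = policy(hand_total(player), rng)
        node = node.child(action)
        path.append(node)
        if action == "stay":
            break
        player.append(draw(rng))
        node = node.child(hand_total(player))
        path.append(node)
    player_total = hand_total(player)
    if player_total > 21:
        won = 0.0  # player busts
    else:
        while hand_total(dealer) < 17:
            dealer.append(draw(rng))
        dealer_total = hand_total(dealer)
        won = 1.0 if dealer_total > 21 or player_total > dealer_total else 0.0
    # Backpropagation: update every node visited during this game.
    for visited in path:
        visited.visits += 1
        visited.wins += won
    return won

rng = random.Random(0)
root = Node()
for _ in range(20_000):
    play_game(root, random_policy, rng)

# Observed win rate for each first action when the starting hand totals 16.
node_16 = root.children[16]
for action, child in sorted(node_16.children.items()):
    print(action, child.visits, child.wins / child.visits)
```

Swapping `random_policy` for `threshold_policy` in the loop shows how the choice of policy changes which parts of the tree get explored.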

After many games, you have a tree of expected utility for each move, which means you can sample the next move much more effectively. The value says something like: “given this current hand and set of actions, I won X% of the time”. You can get more advanced with the reward and update function. For example, you might discount wins that take many turns and prioritize quicker wins.
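
As a rough sketch of those last two ideas, the snippet below keeps per-move statistics in a plain dictionary, applies a discount during the update, and then uses the averages to pick the next move. The names `backpropagate` and `choose_action`, the discount of 0.9, and the epsilon-greedy rule (mostly pick the best-looking move, occasionally explore at random) are all assumptions chosen for illustration; other reward and selection schemes are equally valid.

```python
import random

def backpropagate(stats, path, reward, discount=0.9):
    """Update (visits, total value) for every (state, action) pair on the path.

    The final move gets the full reward; each earlier move gets it
    multiplied by `discount` once more, so long games earn less credit.
    """
    value = reward
    for key in reversed(path):
        visits, total = stats.get(key, (0, 0.0))
        stats[key] = (visits + 1, total + value)
        value *= discount

def choose_action(stats, state, actions, rng, epsilon=0.1):
    """Pick the action with the best average value, exploring with probability epsilon."""
    if rng.random() < epsilon:
        return rng.choice(actions)

    def mean_value(action):
        visits, total = stats.get((state, action), (0, 0.0))
        return total / visits if visits else 0.0

    return max(actions, key=mean_value)

stats = {}
# A won game that took three moves, then a lost game that took one.
backpropagate(stats, [(12, "hit"), (16, "hit"), (19, "stay")], reward=1.0)
backpropagate(stats, [(19, "hit")], reward=0.0)

rng = random.Random(0)
best = choose_action(stats, 19, ["hit", "stay"], rng, epsilon=0.0)
print(best, stats)
```

With `epsilon=0.0` the choice is purely greedy; a small positive value keeps some randomness so that moves with few samples still get tried.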