The Monte Carlo Method, maps and probability paradoxes

We have a bit of a mix here, example code using std::maps, the so called Monte Carlo method and a famous probability paradox.

 To kick us off lets illustrate the Monte Carlo method. In a nutshell this simulates a problem multiple times using random numbers to arrive at an approximate probability for an event.

A classic example is an approximation of PI.

If we pick random positions in a square with a circle filling it, some of the positions will be in the circle some of them outside of it.

Our total number of positions will bear the same ratio to the area of the square as our positions in the circle will to the area of the circle. Somewhere in that ratio we aught to find PI playing a role! Let's start by expressing this paragraph as an equation. The Circle area proportion  equalling the square are proportion. Where R is the ration of the circle.

(PI * R*R) /InCircle = (R*2) * (R*2)/InSquare

We see the R's cancel out. 

PI / InCircle = 4/InSquare

So we end with.

PI = 4 * InCircle/InSquare

I have chosen to do the following program using mainly integer maths to encourage playing with other types to improve the results.

github link

/ / gcc pi.c -lm
#include <stdio.h>
#include <math.h>
#include <stdlib.h>

#define ATTEMPTS 100000
#define DIAMETER 20000
#define RADIUS (DIAMETER/2)

static int getRandom(int low, int high)
{
return rand() % (high + 1 - low) + low;
}

int main(int args, char **argc)
{
int InTheCircle = 0;
int OnTheEdge = 0;
int OutOfCircle = 0;
for (int i = 0; i < ATTEMPTS; i++)
{
int x = getRandom(0, DIAMETER);
int y = getRandom(0, DIAMETER);
int x1 = x - DIAMETER / 2;
int y1 = y - DIAMETER / 2;

// we are sticking with integer arithmetic for the simulation
int hypotenuse = (int)sqrt((double)((x1 * x1) + (y1 * y1)));
// printf("%d %d %d\n",x,y,hypotenuse);
if (hypotenuse == RADIUS)
{
OnTheEdge++;
InTheCircle++;
}
else if (hypotenuse > RADIUS)
{
OutOfCircle++;
}
else
{
InTheCircle++;
}
}
int square_area = DIAMETER * DIAMETER;
float pi1 = 4 * (float)(InTheCircle - OnTheEdge)/(float)ATTEMPTS;
float pi2 = 4 * (float)(InTheCircle + OnTheEdge)/(float)ATTEMPTS;
printf("In %d, out %d edge %d pi %f %f\n", InTheCircle, OutOfCircle, OnTheEdge, pi1, pi2);
}

So what do we get?

In 78589, out 21411 edge 12 pi 3.143080 3.144040

We get a range for PI which is not too far off, but as I say this code is muddled by using int where double would be by far the more sensible option for a really good result.

Now for something more interesting

There is a famous probability novelty/paradox. You are on a game show and the host asks you to choose between 3 boxes one with a prize in it. You make your guess. The presenter then shows one of the other boxes is empty so there are now two boxes, one with the prize in it. The presenter says you can change your guess, should you?

Most people think it is now 50/50 where the prize is and hence it will make no difference.

Let's try a simulation and we will use C++ std::unordered_map as a bit of a tutorial. This is going to be a brick by brick approach rather than simplifying the problem as the idea is to show using the std::unordered_map!


//gcc box.cpp - lstdc++
//gcc unordered_map.cpp -lstdc++
#include <stdlib.h>
#include <stdio.h>
#include <unordered_map>
#include <iterator>

#define NUMBER_OF_BOXES 3
#define ATTEMPTS 400

static int getRandom(int low, int high)
{
return rand() % (high + 1 - low) + low;
}

int main(int args, char **argc)
{
int wins_if_stick = 0;
int wins_if_change = 0;

for (int attempts = 0; attempts < ATTEMPTS; attempts++)
{
std::unordered_map<int, bool> boxes;

// create some empty boxes
for (int box = 1; box <= NUMBER_OF_BOXES; box++)
{
boxes.insert(std::make_pair(box, false));
}

int prize_box = getRandom(1, NUMBER_OF_BOXES);

std::unordered_map<int, bool>::iterator it = boxes.find(prize_box);

it->second = true;

int picked_box_number = getRandom(1, NUMBER_OF_BOXES);

// host removes boxes
for (it = boxes.begin(); it != boxes.end();)
{
// We remove anything that is neither out box or contains the prize
if (it->first != picked_box_number && it->second == false)
{
it = boxes.erase(it);
continue;
}
it++;
}

it = boxes.find(picked_box_number);

if (it->second == true)
{
wins_if_stick++;
}
else
{
wins_if_change++;
}
}
printf("Wins on sicking to chosen box %d\n", wins_if_stick);
printf("Wins on changing box %d\n", wins_if_change);
}


Before looking at the output node with way we remove boxes. The erase function returns the next iterator to the loop. So we cannot use "it++" in the for statement itself. How to erase when looping is a constant problem for container classes!
So the output:

Wins on sicking to chosen box 136
Wins on changing box 264

We win about twice as often when changing boxes!
Lets have the program do the thinking for us, lets up the number of boxes, still leaving just two to pick from.

#define NUMBER_OF_BOXES 25

Wins on sicking to chosen box 9
Wins on changing box 391


Now the penny starts to drop, if there were 25 boxes, it's clearly only one change in 25 our original choice was the right one! So 24 out of 25 times we would be correct in changing our pick. This (N-1)/N probability extends all they way down to 3 boxes where there will be 2 chance in 3 the other box contains the prize. 

If you phrase the question another way and have the host not remove the boxes, but stack the boxes into two columns, one containing your solitary picked box and the other containing two boxes and asking you not to pick a box, but to pick a column, then few people would have any difficultly in making the right choice.  Removing the boxes is really just a misdirection from seeing the sum of the chances as 1 for your pick but 2 in the alternate box or column.














Comments

Popular posts from this blog

A reliable tuya authentication in node-red

node-red a full home automation example

Arduino boards - what I wish I'd known when starting