## Quantophile

### Financial programming for Quants

#### Category: Classical Probability

A random variable $$X$$ is a function on a sample space. Typical random variables are the result of tossing a coin, the value on rolling a die, the number of aces in a Poker hand, of multiple birthdays in a company of $$n$$ people, number of successes in $$n$$ Bernoulli trials. The classical theory of probability was devoted mainly to the study of a gambler’s gain, which is again a random variable.

The position of a particle under diffusion, the energy, temperature etc. of physical systems are random variables, but they are defined in non-discrete sample spaces. The source of randomness in the random variable is the experiment itself, in which events $${X=x_{j}}$$ are observed according to the probability function $$P$$.

In a large number of repetitions of an experiment, what is the average value of the random variable? What is the variance of the random variable? The links to my notes and solved problems is given below.

Theorems and Notes

Conditional probability has numerous applications in scientific, medical and legal reasoning, statistical genetics, cryptography etc. Whenever we observe new evidence or information, the odds of an event must be updated.

##### A motivating example.

Example. The Monty Hall problem.

On the game show Let’s Make a Deal, hosted by Monty Hall, there are three doors, randomly one of the doors has a car behind them, the other two doors have goats behind them. You as the contestant have no idea, which one has the car, they are all equally likely. But Monty Hall knows which one has the car.

You as a contestant, pick a door, say door $$1$$. Then, what happens is, Monty opens up either of the remaining doors $$2$$ or $$3$$, revealing a goat. The door he opens always has a goat behind it (he never reveals the car!), for example, Monty opens goat door $$3$$. So, then you know, that the car is behind the door you initially picked, door $$1$$ or the other unopened door $$2$$.

Monty then offers the contestant the option of switching to the other unopened door, or keeping your original choice. The question is, should you stay with with your initial choice or should you switch?

Assumption. Monty always opens a goat door. If he has a choice, of which door to open, he picks with equal probabilities.

For example, if you initially guessed right, so the car is behind the door $$1$$, doors $$2$$ and $$3$$ both have goats. Monty could open either door $$2$$ or door $$3$$. Assume these are equally likely.

###### Should you stay with your initial choice or switch?

Many people, upon seeing the problem for the first time, argue that there is no advantage to switching : “There are two doors remaining, and one of them has the car, so the chances are 50-50”. Controversy has raged about this problem for years and years.

Solution.
Under the standard assumptions above, the answer is, you should switch. Let’s label the doors $$1$$ through $$3$$. We know that, any of the $$3$$ doors are equally likely to be the car-door.

$$P(C_{1})=P(C_{2})=P(C_{3})=1/3$$

Without the loss of generality, we can assume that the contestant picked the door $$1$$. Suppose that Monty opens door $$3$$. Suppose we believe the car is behind door $$2$$, denoted by event $$C_{2}$$. The prior probability that the car is behind door $$2$$, $$P(C_{2})=1/3$$ can be updated in light of the new information about Monty opening door $$3$$.

Bayes’s rule is used to update prior probabilities by incorporating new information.

\begin{align} P(C_{2}|M_{3})&=P(C_{2})\cdot\frac{P(M_{3}|C_{2})}{P(M_{3})}\\ &=(1/3)\cdot\frac{1}{(1/2)}\\ &=2/3 \end{align}

And, the probability that the car is behind door $$1$$ is,

$$P(C_{1}|M_{3})=1-P(C_{2}|M_{3})=1/3$$

It would be an abuse of the naive definition of probability, if you just immediately say, that the two doors are equally likely to have a car behind them. There is a $$1/3$$rd chance that the car is behind the first door, and a $$2/3$$rds chance it is behind the second door.

Thus, the observation that Monty opened door $$3$$ makes us more sure of our belief that the car is behind door $$2$$.

###### Building correct intuition

Let’s consider an extreme case. Suppose that there are a thousand doors, $$999$$ of which contain goats and $$1$$ of which has a car. After the contestant’s initial pick, Monty opens opens $$998$$ doors with goats behind them. Let’s update our prior probabilities as Monty opens doors with goats behind them.

Assume, Monty opens door $$1000$$.

$$P(C_{2}|M_{1000})=P(C_{2})\frac{P(M_{1000}|C_{2})}{P(M_{1000})}=\frac{999}{998}\cdot\frac{1}{1000}$$

Next, Monty opens door $$999$$.

$$P(C_{2}|M_{999}M_{1000})=P(C_{2}|M_{1000})\frac{P(M_{999}|C_{2}M_{1000})}{P(M_{999}|M_{1000})}=\frac{999}{997}\cdot\frac{1}{1000}$$

Continuing in this fashion,

$$P(C_{2}|M_{3}\ldots{M_{1000}})=\frac{999}{1000}$$

In this extreme case, it becomes clear that the probabilities are not $$50-50$$. As Monty eliminates $$998$$ of the $$999$$ doors, we are extremely confident of the belief that the car is behind the remaining unopened door.

###### R Simulation

A simple R simulation demonstrates that the strategy to always switch has a success probability of $$2/3$$.

The experiment is repeated $$1$$ millon times.

The number of games won and lost are :

The links to my notes and the solved problems are given below.

In a typical $$6/49$$ lotto, where $$6$$ numbers are drawn from a range of $$49$$ and if the six numbers drawn match on the ticket, the ticket holder is a jackpot winner. The odds of this event are $$1$$ in $$13,983,816$$.

The study of counting methods dates at least to Gottfried Wilhelm Leibniz’s Dissertatio de Arte Combinatoria in the seventeenth century. Combinatorics has applications in many fields of study. Applications of combinatorics arise, for example in chemistry, in studying the arrangements of atoms in molecules and crystals; biology in questions about the structure of genes and proteins; physics, in problems in statistical mechanics; communication, in the design of codes for encryption, compression, and especially in computer science, for instance in problems of scheduling and allocating resources.

The links to my notes on how to count and the solved problems are given below.

The points of a sample space may be very concrete objects. For example, on tossing a coin, a head or tail is realized, on tossing a coin twice, a head-head, head-tail, tail-head or tail-tail is realized, on rolling die, one of the numbers $$1,2,3,4,5,6$$ is realized, on drawing a card from a standard deck of cards, one of four suits – club, spade, heart, diamond and one of $$13$$ ranks is realized.

However, sometimes the sample space of an experiment can be incredibly complicated or higher dimensional and the outcomes may be non-numeric. For example, in a sequence of $$N$$ coin tosses, the sample space has an enormous $$2^N$$ sample points. In a survey of a random sample of people in a city, various questions may have numeric (e.g. age or height) and non-numeric answers (e.g. favorite political party or favorite movie). Thus, we require a mathematical description of the sample points.

Random variables fulfill this purpose. They assign numerical values to each sample point.

Random variables and their distributions