Bayes’ Theorem – Part 1

In the first post, we discussed the importance of updating a model; now we want to focus on the mathematics that comes into play. To do so, we will make use of Bayes’ theorem. In particular, we will need an alternative formulation of this (hopefully) well-known result.

First of all, we will briefly revise conditional probabilities.

Definition. Given two events A and B in a probability space, with P(B)\neq 0, the conditional probability of A given B is the quantity

    \[P(A|B)=\frac{P(A\cap B)}{P(B)}.\]

The idea is the following: we know that some event (B in the definition above) has happened, and we want to renormalise our notion of probability to the new reality we are living in, the one where B is certain, since it has already happened. One can check that the conditional probability with respect to B is itself a probability measure and, in particular, satisfies all the properties of a probability. At the same time, it is worth noticing, with an example, that P(A|B) is in general not equal to P(B|A). Among inmates in Italy, roughly 90% are male, so we can write P(\text{male}\,|\,\text{inmate})=90\%. However, it is definitely false that the probability that an Italian male is an inmate, that is P(\text{inmate}\,|\,\text{male}), is 90%. Still, the two probabilities are connected with one another, as the following result shows.
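As a minimal numerical sketch of this asymmetry (the counts below are made up for illustration, not real Italian statistics):

```python
# Hypothetical counts, chosen only to illustrate that P(A|B) != P(B|A).
inmates = 50_000        # assumed total number of inmates
male_inmates = 45_000   # 90% of them male
males = 29_000_000      # assumed number of Italian males

p_male_given_inmate = male_inmates / inmates   # P(male | inmate) = 0.9
p_inmate_given_male = male_inmates / males     # P(inmate | male), far smaller

print(p_male_given_inmate)   # 0.9
print(p_inmate_given_male)   # a tiny number, nowhere near 0.9
```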

Theorem (Bayes, probability form). Given two events A and B, both with nonzero probability, the following identity holds:

    \[P(B|A)=\frac{P(A|B)\cdot P(B)}{P(A)}.\]

The proof of this result is an immediate computation, but we are actually not so interested in the proof itself. Truth be told, in this context of mathematical modelling we are not particularly interested in the link between P(B|A) and P(A|B) either. We will therefore restate the same result in a slightly different way.
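For the record, the computation is a single line: by the definition of conditional probability,

    \[P(B|A)\cdot P(A)=P(A\cap B)=P(A|B)\cdot P(B),\]

and dividing both sides by P(A) gives the identity.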

Consider a belief you have about the world, in particular something that you can test. For example, if you just popped up in room A206 during a class of “Mathematical Models”, you might think that this is a course for students in the Master in Mathematics for Models. However, you cannot be sure of this, so you assign some probability to your belief, as a measure of your confidence. Let’s denote by H the hypothesis we believe in (in the example, the course belonging to the program for Models), and by P(H) the strength of our belief, for example P(H)=60\%. We can now run an experiment, E, to test our assumption and improve our knowledge and understanding. For example, we can ask another attending student which program they are part of; say that E is “the attending student is in the Master for Modelling”. As soon as we gather this information, assuming the answer is positive, we can update our belief in the following way:

    \[P(H|E)=\frac{P(E|H)}{P(E)}\cdot P(H).\]

Our probabilities change, and we do not expect a positive answer to close the question once and for all: a student enrolled in one program could also be taking courses from a different program (and, conversely, the student might not be in the program we think, without this ruling out our hypothesis). We want to measure how a positive (or negative) answer changes our belief. To do so, however, we need to assess the probabilities coming into play. On the one hand, we need the probability that a generic student is in the Master in Mathematics for Models, P(E); on the other, we also need the probability that, assuming this course is for the Master in Models, a generic student taking it is in fact enrolled in that program, P(E|H).

So we have updated our belief from the one we held at the beginning, P(H), also called the prior belief or just the prior, to a new one, P(H|E), which takes into account the information we obtained and is called the posterior (belief). This is the way we are going to use Bayes’ theorem in the context of modelling.

There are two remarks here. The first one concerns the denominator, P(E). It can be hard to assess this probability, so one sometimes uses the law of total probability (or factorisation formula) to rewrite it, using any suitable partition of the sample space. A typical choice leads to this identity:

(1)   \[P(E) = P(E|H)\cdot P(H) + P(E|H^c)\cdot P(H^c),\]

where by H^c we denote the complement of the event H. Notice that this choice allows us to recycle the probabilities P(H) and P(E|H), and requires us to assess only P(E|H^c), which might be easier to estimate than P(E), since we are in a specific setting, H^c.
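To make the update concrete, here is a minimal sketch of the classroom example in Python; the prior P(H)=0.6 comes from the post, while the likelihoods 0.8 and 0.3 are assumed, illustrative values:

```python
# Hypothetical numbers for the classroom example.
p_h = 0.6            # prior P(H): our initial 60% confidence
p_e_given_h = 0.8    # assumed P(E|H): chance the student is in the program if H holds
p_e_given_hc = 0.3   # assumed P(E|H^c): the same chance if H fails

# Law of total probability, identity (1):
p_e = p_e_given_h * p_h + p_e_given_hc * (1 - p_h)   # 0.48 + 0.12 = 0.6

# Bayes' theorem: the posterior after a positive answer.
p_h_given_e = p_e_given_h * p_h / p_e
print(round(p_h_given_e, 3))   # 0.8: one positive answer raises our belief from 60% to 80%
```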

As for the second remark, it has to do with something we mentioned above, the sentence “you cannot be sure of this”. If we want to use Bayes’ theorem, we can never assign probability 0 or 1 to an event, that is, be absolutely sure of something. If we did, no amount of information or experiments could ever make us change our minds. To see this, try to plug P(H)=0 or P(H)=1 into (1) and see what happens. This is crucial: it tells us that if we feel completely certain about something (say, that homeopathy does not work) but we want to be good scientists, we cannot assign probability 0 to homeopathy; we have to give it a very small (but nonzero) probability, for example 10^{-100}.
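This can be seen numerically with a small sketch (the likelihoods 0.8 and 0.3 are again assumed values): any prior strictly between 0 and 1 moves with the evidence, while priors of exactly 0 or 1 never move at all.

```python
def update(prior, p_e_given_h, p_e_given_hc):
    """One Bayesian update, with P(E) expanded as in identity (1)."""
    p_e = p_e_given_h * prior + p_e_given_hc * (1 - prior)
    return p_e_given_h * prior / p_e

# Hypothetical likelihoods for a positive answer.
print(update(0.6, 0.8, 0.3))  # the 60% prior increases
print(update(0.0, 0.8, 0.3))  # 0.0: stuck at zero, whatever the evidence
print(update(1.0, 0.8, 0.3))  # 1.0: stuck at certainty, whatever the evidence
```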

We have mentioned, however, that we want to check our assumption multiple times. For example, after asking the first student about the master program they are enrolled in, we might want to ask a second one. In other words, we need to use Bayes’ theorem with repeated experiments. We will see how to do that, and the issues it raises for the probabilistic formulation of the theorem, in another post.
