Let us start with the following problem.

**Problem:** Giovanni is an Italian man, soft-spoken and very calm. He enjoys sharing what he knows and learning new things. He wears glasses and is particularly fond of logic puzzles. Is it more likely that Giovanni is a high school teacher or a metal worker?

Let us start with a gut answer: given the description of Giovanni, we can easily picture him as a teacher, since he fits our stereotype so well. At the same time, we want to go a little beyond the gut answer: is there a way to give a more “mathematical” or “scientific” assessment? We can split the problem into smaller ones: how could we reach an answer, what data do we have, and how do we get from the data to the answer?

So, let us take a look at the actual, hard information we have: Giovanni is an Italian man. Period. That is it. All the remaining information is not something we can put a number on. Sure, it adds flavour to the description, but in terms of actual information, it doesn’t really help us. Ok, maybe the information regarding glasses could be actionable, so let’s keep it in sight for the moment.

Turning to the other end, how can we read the final question? Well, from a mathematical point of view, we can compare the sizes of the two sets “Italian male high school teachers (with glasses)” and “Italian male metal workers (with glasses)”. This uses all the information we have, and it is the best estimate we can give of the relative likelihood.

So we have transformed our problem into an estimate: we have to assess the size of the two sets. Now, since we are only interested in the relative size (that is, how the two sets compare with one another) and not the absolute size (how many people satisfy those characteristics), we can get rid of the information regarding glasses. In fact, we do not really expect people with glasses (that is, with eye problems) to be more frequent among teachers than among metal workers. The proportion is going to be roughly the same in both categories, so let’s forget about that and make our estimate easier.

Now, why do we keep talking about estimates? Can’t we just look the information up? Well, let’s see if we can at least get a good guess at the number. Remember: we are not interested in the precise number, just the order of magnitude is fine.

How many high school teachers are there in Italy? Well, their number is a function of the number of students, so we start with that. Italy has about 60 million people, whom we can assume to be uniformly distributed between ages 0 and 85 (again, not really true, but a good approximation), so the people between ages 14 and 18 are 1/17 of the total, namely about 3.5 million. Not all of them attend school, so we can assume that the students number somewhat fewer, say 3 million. Now let us move on to the number of classes: each class has some 25-30 students, so we can guess 100,000 classes overall (3 million divided by 30). How many teachers do we have per class? Well, a class has roughly 30 hours of lessons per week and a teacher spends 18 hours in class per week (again disregarding part-time teachers and so on), so we can assume a little less than two teachers per class, that is, roughly 200,000 high school teachers in Italy.
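The chain of guesses above can be written out as a short Python sketch. The function name and the `attendance` parameter are illustrative assumptions: the text only says that not everyone in the age bracket attends school, and 0.85 is picked here so that the student count lands near 3 million. All other defaults are the figures quoted in the paragraph.

```python
def estimate_high_school_teachers(
    population=60_000_000,  # people in Italy, as in the text
    max_age=85,             # ages assumed uniform between 0 and max_age
    school_years=5,         # ages 14 to 18, i.e. 5/85 = 1/17 of the population
    attendance=0.85,        # ASSUMPTION: not given in the text
    class_size=30,          # students per class
    class_hours=30,         # weekly hours of lessons per class
    teacher_hours=18,       # weekly classroom hours per teacher
):
    """Return each intermediate step of the Fermi estimate."""
    cohort = population * school_years / max_age
    students = cohort * attendance
    classes = students / class_size
    teachers_per_class = class_hours / teacher_hours
    teachers = classes * teachers_per_class
    return {
        "cohort": cohort,
        "students": students,
        "classes": classes,
        "teachers_per_class": teachers_per_class,
        "teachers": teachers,
    }


result = estimate_high_school_teachers()
for step, value in result.items():
    print(f"{step:>20}: {value:,.2f}")
```

Without rounding 30/18 up to two teachers per class, the result lands somewhat below 200,000. That is still the same order of magnitude, which is all a Fermi estimate aims for.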

Did we do a good job? If we check the figures from the Italian National Institute of Statistics (ISTAT), we can see that in 2014 there were 191,615 high school teachers. Not bad!

Actually, we are interested in the number of male high school teachers, so we take roughly half of our estimate, about 100,000. We do not really expect the ratio of female to male teachers to be as extreme as 10:1, so halving keeps us within the right order of magnitude.

Notice that this number is probably smaller than you would have guessed at a gut level. This has to do with the availability bias: we are more familiar with high school teachers, so we probably overestimate their number.

For the metal workers, an estimate might seem harder to come by, but if we just want to answer the original question, it is sufficient to ask whether the number is smaller or bigger than 100,000. And if we think about it, we can be confident that there are no fewer than 100,000 metal workers in Italy. If we want a more accurate estimate, we can use a trick: we take the minimum number that we deem reasonable (in this case 100,000, as we just said) and the maximum number, one we are completely sure is an overestimate, say 10,000,000. Then we take the geometric mean of the two, in this case 1,000,000 (which a posteriori turns out to be off by almost a factor of 2 from the actual value, around 1.8 million).
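As a quick check of the trick, here is a minimal Python sketch. The helper name `geometric_mean` is just illustrative, and the bounds and the actual value are the ones quoted above.

```python
import math


def geometric_mean(low, high):
    """Geometric mean of a lower and an upper bound (both positive)."""
    return math.sqrt(low * high)


low, high = 100_000, 10_000_000   # bounds from the text
estimate = geometric_mean(low, high)
actual = 1_800_000                # actual value quoted in the text

print(f"estimate: {estimate:,.0f}")
print(f"actual / estimate: {actual / estimate:.1f}")
```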

So we can conclude that with the information we are given, it is more likely that Giovanni is a metal worker, after all.
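To put a number on “more likely”, we can compare the two set sizes directly. The sketch below is only illustrative: it uses the rounded estimates from the text (about 100,000 male teachers and about 1,000,000 metal workers) and assumes, as argued earlier, that the rest of the description tells us nothing about Giovanni's job and that he holds one of the two.

```python
male_teachers = 200_000 / 2   # half of the teacher estimate
metal_workers = 1_000_000     # geometric-mean estimate from the text

# Relative likelihood: how many metal workers there are per male teacher.
odds = metal_workers / male_teachers

# Probability of "metal worker", ASSUMING Giovanni is one of the two.
p_metal_worker = metal_workers / (metal_workers + male_teachers)

print(f"odds (metal worker : teacher) = {odds:.0f} : 1")
print(f"P(metal worker) = {p_metal_worker:.2f}")
```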

Bonus question: why did we consider the geometric mean and not the arithmetic one?
