But First: A Theorem From 1785
All the way back in the year 1785, Marquis de Condorcet expressed a political science theorem about the relative probability of a given group of individuals arriving at a correct majority decision.
Despite its age, the theorem has been applied to a number of different fields, most notably for us the field of machine learning. To see how, let's start with one model, a decision tree with an accuracy of 60 percent.
Condorcet's Jury Theorem says that if each person is more than 50% correct, then adding more people to vote increases the probability that the majority is correct.
True to form, if we ask two more decision trees, each also with 60% accuracy, and decide by the majority vote, then the probability of the vote being right goes up to 65%!
The theorem suggests that the probability of the majority vote being correct can go up as we add more and more models. With eleven models, the majority can reach 75% accuracy!
There is a caveat. The accuracy may not improve if each model produces the same prediction, for example. The mistake of one model would not be caught by the other models. Therefore, we would want to cast a majority vote using the models that are different.
We can add more and more models.
Play with the scrollers yourself! The top controls the number of trees in the ensemble. The bottom controls each tree's accuracy.
In machine learning, this concept of multiple models working together to come to an aggregate prediction is called ensemble learning. It provides the basis for many important machine learning models, including random forests.