How is machine learning accuracy calculated


Evaluation metrics


Not only in machine learning but in everyday life, and especially in business, you will encounter questions like "How accurate is your product?" or "How precise is your machine?" When people answer with "This is the most accurate product in its field!" or "This machine has the highest imaginable precision!", both answers sound good. But should they? If you get both answers at the same time, yes; getting just one at a time could be problematic. Indeed, the terms "accurate" and "precise" are very often used interchangeably, yet they mean different things. We will give precise definitions later in the text, but in short we can say: accuracy is a measure of the closeness of measurements to a certain target value, while precision is the closeness of the measurements to one another.
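The distinction can be illustrated with a small sketch. The two lists of measurements below are made up for demonstration: both claim to measure the same quantity with a true value of 10.0, but one set is accurate without being precise, and the other is precise without being accurate.

```python
true_value = 10.0  # the actual value of the measured quantity (assumed)

# close to the true value on average, but widely scattered
# -> accurate, not precise
accurate_not_precise = [8.5, 11.5, 9.0, 11.0, 10.2]

# tightly clustered, but systematically off the true value
# -> precise, not accurate
precise_not_accurate = [12.1, 12.2, 12.0, 12.1, 12.2]

def mean(values):
    """Average of the measurements: closeness to the true value."""
    return sum(values) / len(values)

def spread(values):
    """Sample standard deviation: closeness of measurements to one another."""
    m = mean(values)
    return (sum((v - m) ** 2 for v in values) / (len(values) - 1)) ** 0.5

for name, data in [("accurate, not precise", accurate_not_precise),
                   ("precise, not accurate", precise_not_accurate)]:
    print(f"{name}: mean = {mean(data):.2f}, spread = {spread(data):.2f}")
```

The first set averages very close to 10.0 but has a large spread; the second set has a tiny spread but averages around 12.1, far from the true value.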

These terms are also of the utmost importance in machine learning. We need them to evaluate ML algorithms or, more precisely, their results.

This chapter of our Python machine learning tutorial introduces four key metrics. These metrics are used to evaluate the results of classifications.

The metrics are:

  • Accuracy
  • Precision
  • Recall
  • F1 score (also called the F-measure)

We'll introduce each of these metrics and discuss their pros and cons. Each metric measures different aspects of a classifier's performance. The metrics are paramount to all of the chapters of our machine learning tutorial.
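As a preview, all four metrics can be computed from the true and predicted class labels of a binary classifier. The sketch below anticipates the definitions given later in the chapter, computing each metric from counts of true/false positives and negatives; the labels are invented for illustration. (In practice, scikit-learn provides these as `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` in `sklearn.metrics`.)

```python
# invented labels of a binary classifier, for illustration only
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # actual classes
y_pred = [1, 0, 1, 0, 0, 1, 1, 1, 1, 0]  # classifier's predictions

# count the four possible outcomes of a prediction
TP = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
TN = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
FP = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
FN = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (TP + TN) / (TP + TN + FP + FN)   # fraction of correct predictions
precision = TP / (TP + FP)                   # how many predicted positives are real
recall = TP / (TP + FN)                      # how many real positives were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of both

print(f"accuracy:  {accuracy:.3f}")
print(f"precision: {precision:.3f}")
print(f"recall:    {recall:.3f}")
print(f"F1 score:  {f1:.3f}")
```

Note that the four values differ for the same predictions, which is exactly why we need more than one metric.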

Accuracy versus Precision

Accuracy is a measure of the closeness of measurements to a certain value (the desired value), while precision is a measure of the closeness of the measurements to one another, i.e. not necessarily related to an actual or desired value. In other words, if we have a set of data points from repeated measurements of the same quantity, the set is said to be accurate if its average is close to the true value. On the other hand, we call the set precise when the readings are close to each other, even though they may be distant from the true value. The two concepts are independent of each other, which means that a dataset can be accurate, precise, both accurate and precise, or neither. We show this in the following diagram: