Most beginner machine learning practitioners get confused by precision and recall, two metrics that are at the heart of building any successful ML model.

Task: Classify cat/not-cat

After classifying a bunch of images, we can ask two different questions about the model's predictions.
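Each prediction lands in one of four buckets: a true positive (TP, a cat predicted as cat), a false positive (FP, a non-cat predicted as cat), a false negative (FN, a cat predicted as not-cat) or a true negative (TN). The minimal Python sketch below tallies them. The string labels `"cat"`/`"not-cat"`, the function name `confusion_counts` and the six example images are hypothetical, chosen only for illustration.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive="cat"):
    """Tally TP, FP, FN and TN for a binary cat/not-cat task (hypothetical helper)."""
    counts = Counter()
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            # The model said "cat": right (TP) or a false alarm (FP).
            counts["TP" if truth == positive else "FP"] += 1
        else:
            # The model said "not-cat": a missed cat (FN) or correct (TN).
            counts["FN" if truth == positive else "TN"] += 1
    return counts

# Hypothetical labels for six images: four cats and two non-cats.
actual    = ["cat", "cat", "cat", "cat", "not-cat", "not-cat"]
predicted = ["cat", "cat", "not-cat", "not-cat", "cat", "not-cat"]
print(dict(confusion_counts(actual, predicted)))
# {'TP': 2, 'FN': 2, 'FP': 1, 'TN': 1}
```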


Precision: take only the images predicted as cat.

How many of those images were actually cats?
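That question translates almost word for word into code. This is a minimal sketch, assuming the same hypothetical `"cat"`/`"not-cat"` string labels; counting the real cats among the images predicted as cat is equivalent to the usual formula TP / (TP + FP). Returning 0.0 when nothing is predicted as cat is an assumed convention for the undefined case.

```python
def precision(actual, predicted, positive="cat"):
    """Of the images predicted as `positive`, the fraction that really are."""
    # Keep the true label of every image the model called a cat.
    truths_for_predicted_cats = [t for t, g in zip(actual, predicted) if g == positive]
    if not truths_for_predicted_cats:
        return 0.0  # assumption: define precision as 0 when nothing is predicted as cat
    correct = sum(t == positive for t in truths_for_predicted_cats)
    return correct / len(truths_for_predicted_cats)

# Hypothetical labels: the model calls 3 images "cat", and 2 of them really are.
actual    = ["cat", "cat", "cat", "cat", "not-cat", "not-cat"]
predicted = ["cat", "cat", "not-cat", "not-cat", "cat", "not-cat"]
print(f"{precision(actual, predicted):.3f}")  # 0.667
```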


Recall: what percentage of the actual cats did the model correctly recognise?
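Recall flips the point of view: it starts from the images that really are cats instead of from the predictions. The sketch below assumes the same hypothetical string labels and computes the equivalent of TP / (TP + FN), returning 0.0 (an assumed convention) when there are no cats at all.

```python
def recall(actual, predicted, positive="cat"):
    """Of the images that really are `positive`, the fraction the model found."""
    # Keep the model's guess for every image that is actually a cat.
    guesses_for_real_cats = [g for t, g in zip(actual, predicted) if t == positive]
    if not guesses_for_real_cats:
        return 0.0  # assumption: define recall as 0 when there are no cats
    found = sum(g == positive for g in guesses_for_real_cats)
    return found / len(guesses_for_real_cats)

# Hypothetical labels: 4 real cats, of which the model recognises 2.
actual    = ["cat", "cat", "cat", "cat", "not-cat", "not-cat"]
predicted = ["cat", "cat", "not-cat", "not-cat", "cat", "not-cat"]
print(recall(actual, predicted))  # 0.5
```

On the same six images precision came out at about 0.667 while recall is 0.5, so the two metrics can disagree about the same model.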

Which one is better?

Both seem like useful metrics for our model. But should precision be high, or should recall be high? Or should both of them be high?
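Looking at either metric alone can be misleading, because each one is easy to max out by itself. The sketch below uses two hypothetical extreme models on a hypothetical dataset of 10 cats and 90 non-cats, computing precision = TP / (TP + FP) and recall = TP / (TP + FN) directly from counts.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from raw counts (0.0 when a denominator is zero)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical model A labels every image "cat":
# it finds all 10 cats but also flags all 90 non-cats.
print(precision_recall(tp=10, fp=90, fn=0))  # (0.1, 1.0)

# Hypothetical model B labels a single image "cat" and is right:
# no false alarms, but it misses 9 of the 10 cats.
print(precision_recall(tp=1, fp=0, fn=9))    # (1.0, 0.1)
```

Model A has perfect recall and model B has perfect precision, yet neither is a useful cat detector, which is why a single score that needs both to be high is attractive.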

That’s where the F1 score comes into play.


It is the average (more precisely, the harmonic mean) of precision and recall: F1 = 2 × precision × recall / (precision + recall). So whichever model has the better F1 score wins.
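The choice of a harmonic mean over a plain average matters: the harmonic mean is dragged toward the smaller of the two values, so a model cannot hide poor recall behind great precision (or vice versa). In the minimal sketch below, the two (precision, recall) pairs are hypothetical and share the same arithmetic mean of 0.55, yet their F1 scores differ sharply. Returning 0.0 when both inputs are 0 is an assumed convention.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0  # assumption: avoid division by zero for a useless model
    return 2 * precision * recall / (precision + recall)

# Hypothetical models with identical arithmetic means but different balance.
for name, p, r in [("lopsided", 0.1, 1.0), ("balanced", 0.6, 0.5)]:
    mean = (p + r) / 2
    print(f"{name}: mean={mean:.3f}, F1={f1_score(p, r):.3f}")
# lopsided: mean=0.550, F1=0.182
# balanced: mean=0.550, F1=0.545
```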



Rabin Poudyal

Software Engineer, Data Science Practitioner. Say "Hi!" via email: or visit my website
