Jonathan Grandperrin
1 min read · Jul 2, 2021

Hi,

That’s a good question, because this scenario is closer to real-world OCR use cases. To make it work with invoices that have no date, we need to redefine which quadrant (tp, fp, tn, fn) each possible case on those invoices lands in.

Let’s consider that when an invoice has no date and the prediction is:

  • empty: it’s a True Negative
  • not empty (it looks unlikely, but it can happen): it’s a False Positive

Let’s check how it impacts the metrics:

  • The recall, tp / (tp + fn), is not impacted, which makes sense: recall measures “how good your model is at extracting positive values”. Here, “positive” can be read as “extracting the right date when one is written”.
  • The precision, tp / (tp + fp), is impacted. When the model outputs a date for an invoice that has none, the precision should go down, since it measures “how correct my model is when it predicts something”.
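
To make this concrete, here is a minimal sketch with made-up counts. The numbers and the helper names `precision` and `recall` are illustrative assumptions, not results from a real benchmark. It adds a batch of dateless invoices to a confusion matrix and compares the two metrics before and after.

```python
def precision(tp: int, fp: int) -> float:
    """Share of non-empty predictions that are correct: tp / (tp + fp)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of written dates that were correctly extracted: tp / (tp + fn)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts on invoices that all have a date.
tp, fp, fn, tn = 80, 10, 10, 0

# Hypothetical extra batch of 20 dateless invoices:
# 15 empty predictions (true negatives), 5 predicted dates (false positives).
tp2, fp2, fn2, tn2 = tp, fp + 5, fn, tn + 15

print(recall(tp, fn), recall(tp2, fn2))        # same value in both cases
print(precision(tp, fp), precision(tp2, fp2))  # lower in the second case
```

The dateless invoices only ever add to tn or fp, and neither appears in the recall formula, which is why only the precision moves.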

Everything looks OK and the proxy still works. If we get back to the definitions:

  • true positive: the OCR correctly extracted the invoice date
  • false positive: the OCR extracted a wrong date, or there is no date written on the invoice but the OCR predicted one
  • true negative: there is no date written on the invoice and the OCR returns an empty response
  • false negative: a date is written on the invoice but the OCR extracted none (i.e. an empty prediction)
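
These four definitions can be sketched as a small function. This is only an illustration under a few assumptions: the name `quadrant` is hypothetical, dates are compared as already-normalized strings, and `None` stands for "no date written" or "empty prediction".

```python
from typing import Optional


def quadrant(truth: Optional[str], prediction: Optional[str]) -> str:
    """Return 'tp', 'fp', 'tn' or 'fn' for one invoice.

    `truth` is the date written on the invoice (None if there is none);
    `prediction` is the OCR output (None if it is empty).
    """
    if prediction is None:
        # Empty prediction: correct only if no date was written.
        return "tn" if truth is None else "fn"
    # Non-empty prediction: correct only if it matches the written date.
    # A predicted date on a dateless invoice never matches, so it is an fp.
    return "tp" if prediction == truth else "fp"


print(quadrant("2021-07-02", "2021-07-02"))  # correct date extracted
print(quadrant(None, "2021-07-02"))          # date predicted on a dateless invoice
print(quadrant(None, None))                  # dateless invoice, empty prediction
print(quadrant("2021-07-02", None))          # date written, empty prediction
```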

Written by Jonathan Grandperrin

CEO Mindee Computer vision & software dev enthusiast
