By Jonathan Grandperrin

1 min read · Jul 2, 2021

Hi,

That’s a good question, because this scenario is closer to real-world OCR use cases. To make it work with invoices that have no date, we need to redefine which quadrant (tp, fp, tn, fn) each possible case on those invoices falls into.

Let’s consider that when an invoice has no date and the prediction is:

empty: it’s a true negative
not empty (it looks unlikely, but it can happen): it’s a false positive

Let’s check how it impacts the metrics:

The recall, tp / (tp + fn), is not impacted. That is expected, because recall scores “how good your model is at extracting positive values”. Here, positive can be interpreted as “extracting the right date when there is one written”.
The precision, tp / (tp + fp), is impacted. When the model outputs a date for an invoice that has none, fp increases and precision drops. That is what we want, since precision scores “how correct my model is when it predicts something”.
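
To make the effect concrete, here is a minimal sketch with made-up counts (the numbers are purely illustrative, not measured results). The helper names `precision` and `recall`, and the choice to return 0.0 on an empty denominator, are assumptions for this example.

```python
def precision(tp: int, fp: int) -> float:
    """tp / (tp + fp); returns 0.0 if the model predicted nothing (assumed convention)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """tp / (tp + fn); returns 0.0 if no invoice had a date (assumed convention)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts on invoices that all contain a date.
tp, fp, fn = 80, 10, 10
baseline = (precision(tp, fp), recall(tp, fn))

# Now add 10 invoices without a date: the model returns an empty
# prediction on 6 of them (true negatives) and a date on 4 (false positives).
tn_added, fp_added = 6, 4
with_no_date = (precision(tp, fp + fp_added), recall(tp, fn))

print(f"precision: {baseline[0]:.3f} -> {with_no_date[0]:.3f}")
print(f"recall:    {baseline[1]:.3f} -> {with_no_date[1]:.3f}")
```

The true negatives appear in neither formula, so only the extra false positives move the numbers: precision goes down while recall stays where it was.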

Everything looks OK and the proxy still works. Going back to the definitions, we now have:

true positive: the OCR correctly extracted the invoice date
false positive: the OCR extracted a wrong date, or there is no date written in the invoice but the OCR predicted one
true negative: there is no date written in the invoice and the OCR returns an empty response
false negative: there is a date written in the invoice but the OCR extracted none (i.e. an empty prediction)
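
These four definitions can be sketched as a small function. This is only an illustration: the name `quadrant`, the use of `None` for "no date", and the exact string comparison between dates are assumptions, and a real pipeline would probably normalize date formats before comparing.

```python
from collections import Counter
from typing import Optional


def quadrant(truth: Optional[str], predicted: Optional[str]) -> str:
    """Map one (ground truth, prediction) pair to tp / fp / tn / fn.

    None stands for "no date": either none is written in the invoice
    (truth) or the OCR returned an empty response (predicted).
    """
    if predicted is None:
        # Empty prediction: correct only if the invoice really has no date.
        return "tn" if truth is None else "fn"
    # Non-empty prediction: correct only if it matches the written date.
    return "tp" if predicted == truth else "fp"


# Hypothetical (truth, prediction) pairs, dates written as ISO strings.
pairs = [
    ("2021-07-02", "2021-07-02"),  # tp: right date
    ("2021-07-02", "2021-02-07"),  # fp: wrong date
    (None, "2021-07-02"),          # fp: date predicted where none is written
    (None, None),                  # tn: no date, empty response
    ("2021-07-02", None),          # fn: date written, empty response
]
counts = Counter(quadrant(truth, predicted) for truth, predicted in pairs)
print(dict(counts))
```

Both kinds of false positive land in the same bucket, which is why an invoice without a date can lower precision but never recall.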
