Hi,
In a binary classification problem, the point on the PR curve at a threshold of 1 is actually undefined. If we go back to the definitions at this threshold value:
- The recall: tp / (tp + fn) is equal to 0. tp is zero, since we never say “yes”, and fn is generally non-zero (it can be zero only when the entire test set is positive, which makes little sense for a binary classification problem).
- The precision: tp / (tp + fp) is undefined. tp is zero, since we never say “yes”, but fp is also zero for the same reason, so we get 0/0.
For a threshold of 1, we therefore have the data point (0, undefined). By convention, this point is set to (0, 1). Why?
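To make this concrete, here is a minimal sketch (with hypothetical labels and scores) that computes precision and recall at a given threshold; at threshold = 1 it returns recall = 0 and an undefined precision:

```python
# Hypothetical ground-truth labels and model scores in [0, 1].
y_true = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.2]

def pr_at_threshold(t):
    """Precision and recall when predicting 'yes' for scores above t."""
    preds = [1 if s > t else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, y_true) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, y_true) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, y_true) if p == 0 and y == 1)
    recall = tp / (tp + fn)
    # Precision is 0/0 when we never say "yes"; return None for "undefined".
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    return precision, recall

print(pr_at_threshold(1.0))  # (None, 0.0): the (0, undefined) data point
```

Note that scikit-learn's `precision_recall_curve` applies exactly this convention: the last (precision, recall) pair it returns is always (1, 0).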
Mathematical explanation: given a test set, you can view the precision as a function of the threshold. The precision is not defined at threshold = 1, but it tends to 1 as the threshold approaches 1 (provided the model's highest-scoring example is a true positive).
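A small sketch of that limit, again with hypothetical data where the most confident prediction is a true positive, so precision climbs toward 1 as the threshold rises:

```python
# Hypothetical labels/scores; the top-scoring example (0.95) is positive.
y_true = [1, 0, 1, 0]
scores = [0.95, 0.7, 0.6, 0.3]

def precision_at(t):
    """Precision when predicting 'yes' for scores above t (None if 0/0)."""
    preds = [s > t for s in scores]
    tp = sum(1 for p, y in zip(preds, y_true) if p and y == 1)
    fp = sum(1 for p, y in zip(preds, y_true) if p and y == 0)
    return tp / (tp + fp) if (tp + fp) > 0 else None

for t in (0.5, 0.8, 0.9):
    print(t, precision_at(t))
# Precision: 2/3 at t=0.5, then 1.0 for every threshold above 0.7,
# consistent with extending the curve to (0, 1) at threshold = 1.
```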
Qualitative explanation: the precision answers the question: how right am I when I say “yes”? When you raise the threshold, you expect the algorithm to be more likely to be right when it predicts something positive. The convention (0, 1) is consistent with this.