ROC Curves
Definition
ROC stands for Receiver Operating Characteristic. An ROC curve is a tool used to analyse classifiers, typically binary classifiers that predict either a True or False outcome. The curve is a graph of the true positive rate against the false positive rate. At first glance this may appear to ignore false negatives and true negatives, but the equations for calculating these rates use all four quantities. See below:
To calculate the true and false positive rates:
True Positive Rate (TPR)
\(\text{True Positives} = TP\) \(\text{False Negatives} = FN\) \(TPR = \frac{TP}{TP + FN}\)
A high TPR is desirable.
False Positive Rate (FPR)
\(\text{True Negatives} = TN\) \(\text{False Positives} = FP\) \(FPR = \frac{FP}{FP + TN}\)
A low FPR is desirable.
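As a quick check of these two formulas, here is a minimal sketch that computes the TPR and FPR for a single set of hard predictions, assuming scikit-learn is available; the labels and predictions are made up for illustration.
import numpy as np
from sklearn.metrics import confusion_matrix
# Hypothetical true labels and hard predictions (a single threshold already applied)
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_pred = np.array([0, 1, 1, 1, 0, 0, 0, 1])
# For binary labels, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
tpr = tp / (tp + fn)  # True Positive Rate
fpr = fp / (fp + tn)  # False Positive Rate
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")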
ROC Plot
Use the following import to calculate the FPR, TPR, and thresholds:
sklearn.metrics.roc_curve
The plot is a good visual tool, but to analyse the classifier numerically you can use the area under the curve (AUC) as a metric. This is calculated using the following import:
sklearn.metrics.auc
An AUC of 1 means the model is a perfect classifier, while an AUC of 0.5 means the model performs no better than random guessing.
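As an aside, if only the AUC value is needed (without the curve itself), sklearn.metrics.roc_auc_score computes it directly from the labels and scores. A minimal sketch, using the same example data as the full code below:
import numpy as np
from sklearn.metrics import roc_auc_score
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.05, 0.9])
# Computes the area under the ROC curve in a single call
print("AUC:", roc_auc_score(y_true, y_scores))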
Example code for ROC Plot
import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics

# Example true labels (0 or 1)
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])

# Example predicted probabilities for the positive class
y_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.05, 0.9])

def calc_fpr_tpr(true, score):
    # Calculate the ROC curve and the area under it
    fpr, tpr, thresholds = metrics.roc_curve(true, score)
    auc = metrics.auc(fpr, tpr)
    return fpr, tpr, thresholds, auc

fpr, tpr, thresholds, auc = calc_fpr_tpr(y_true, y_scores)

# The 'fpr' variable now contains the False Positive Rates at different thresholds.
print("False Positive Rates (FPR):", fpr)
print("True Positive Rates (TPR):", tpr)
print("Thresholds:", thresholds)

def plot_ROC(fpr, tpr, thresholds, auc):
    plt.figure(figsize=(8, 6))
    plt.plot(fpr, tpr, label=f'ROC Curve (AUC = {auc:.2f})')
    plt.plot([0, 1], [0, 1], 'k--')  # Diagonal line for a random classifier
    # The first threshold returned by roc_curve is a placeholder (infinite in
    # recent scikit-learn versions), so skip it when plotting the thresholds
    plt.plot(fpr[1:], thresholds[1:], label='Thresholds')
    plt.xlabel('False Positive Rate (FPR)')
    plt.ylabel('True Positive Rate (TPR)')
    plt.title('ROC Curve')
    plt.legend(loc='lower right')
    plt.grid(True)
    plt.show()

plot_ROC(fpr, tpr, thresholds, auc)
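Running this example produces a curve that sits well above the diagonal, with an AUC close to 1 for this small data set. The thresholds line shows the score cutoff that produces each point on the curve, which is useful when choosing an operating threshold for the classifier.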