Model Evaluation

NovaML provides a range of tools and metrics for evaluating the performance of machine learning models. This page covers the main evaluation techniques and metrics available in NovaML.

Classification Metrics

Accuracy Score

Accuracy is the ratio of correct predictions to total predictions.

using NovaML.Metrics

y = [0, 1, 1, 0, 1];
ŷ = [0, 1, 0, 0, 1]

accuracy = accuracy_score(y, ŷ);
println("Accuracy: $accuracy")
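
To make the definition concrete, the same number can be computed by hand. This is a minimal sketch in plain Julia that does not depend on NovaML; the variable names are illustrative only.

```julia
using Statistics: mean

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# fraction of positions where the prediction equals the true label
manual_accuracy = mean(y_true .== y_pred)
println("Manual accuracy: $manual_accuracy")  # 4 of 5 correct: 0.8
```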

The following example loads the Wisconsin Breast Cancer dataset, splits it into training and test sets, and trains a logistic regression model on the training set. It then computes accuracy_score for both the training and test sets.

using NovaML.Datasets: load_breast_cancer
X, y = load_breast_cancer(return_X_y=true)

using NovaML.ModelSelection
Xtrn, Xtst, ytrn, ytst = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=1)

using NovaML.LinearModel
lr = LogisticRegression()

# train the model
lr(Xtrn, ytrn)

using NovaML.Metrics
ŷtrn, ŷtst = lr(Xtrn), lr(Xtst)

# training data accuracy (true labels first, as in the example above)
accuracy_score(ytrn, ŷtrn)
# test data accuracy
accuracy_score(ytst, ŷtst)

Confusion Matrix

The confusion matrix provides a detailed breakdown of correct and incorrect classifications for each class. You can use confusion_matrix to create the confusion matrix and display_confusion_matrix to display it with labels.

using NovaML.Metrics
confmat = confusion_matrix(ytst, ŷtst)
# 2×2 Matrix{Int64}:
#  71   1
#   5  37
display_confusion_matrix(confmat)
#        1    2
#    ----------
# 1 | 71.0  1.0
# 2 |  5.0 37.0
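
The counting behind a confusion matrix is simple enough to sketch directly. The helper below is hypothetical (it is not part of NovaML) and assumes rows index the true class and columns the predicted class, which matches the "True label" and "Predicted label" axes used on this page.

```julia
# Hypothetical helper, not a NovaML function: count (true, predicted) pairs.
function confusion_counts(y_true, y_pred)
    classes = sort(unique(vcat(y_true, y_pred)))
    index = Dict(c => i for (i, c) in enumerate(classes))
    cm = zeros(Int, length(classes), length(classes))
    for (t, p) in zip(y_true, y_pred)
        cm[index[t], index[p]] += 1   # row = true class, column = predicted class
    end
    return cm
end

cm = confusion_counts([0, 1, 1, 0, 1], [0, 1, 0, 0, 1])
# the single off-diagonal count is the true 1 that was predicted as 0
```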

We can create a better-looking confusion matrix plot with the following function:

using Plots
using Plots.PlotMeasures

function plot_confusion_matrix(confmat::Matrix)
    n = size(confmat, 1)
    
    heatmap(confmat, 
            c=:Blues, 
            alpha=0.3, 
            aspect_ratio=:equal, 
            size=(300, 300),
            xrotation=0,
            xticks=1:n, 
            yticks=1:n,
            xlims=(0.5, n+0.5), 
            ylims=(0.5, n+0.5),
            right_margin=5mm,
            xlabel="Predicted label",
            ylabel="True label",
            xmirror=true, 
            framestyle=:box, 
            legend=nothing)
    
    for i in 1:n, j in 1:n
        annotate!(j, i, text(string(confmat[i,j]), :center, 10))
    end
    
    plot!(yflip=true)
    
    display(current())
end
plot_confusion_matrix(confmat)

Confusion matrix plot

Precision, Recall, and F1 Score

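Precision, recall, and the F1 score are more informative than accuracy when the classes are imbalanced. Precision is the fraction of predicted positives that are truly positive, recall is the fraction of actual positives that the model finds, and F1 is the harmonic mean of the two.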

using NovaML.Metrics

precision_score(ytst, ŷtst)
# 0.9736842105263158
recall_score(ytst, ŷtst)
# 0.8809523809523809
f1_score(ytst, ŷtst)
# 0.925
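
These values can be reproduced from the confusion matrix shown earlier. The sketch below assumes the second class is the positive class, so TP = 37, FP = 1, FN = 5; that reading is an assumption, chosen because it is consistent with the outputs above.

```julia
# Counts read off the confusion matrix [71 1; 5 37], second class taken as positive
tp, fp, fn = 37, 1, 5

prec = tp / (tp + fp)                  # 37/38
rec  = tp / (tp + fn)                  # 37/42
f1   = 2 * prec * rec / (prec + rec)   # harmonic mean of precision and recall

println((prec, rec, f1))
```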

Matthews Correlation Coefficient

The Matthews Correlation Coefficient (MCC) is a single-number summary for binary classification problems. It uses all four cells of the confusion matrix (true and false positives and negatives), so it remains informative even when the classes are of very different sizes.

\[MCC = \frac{TN \times TP - FN \times FP}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}\]

where

  • TP: True Positive
  • FP: False Positive
  • TN: True Negative
  • FN: False Negative

MCC ranges from -1 to +1:

  • +1 represents a perfect prediction
  • 0 represents no better than random prediction
  • -1 indicates total disagreement between prediction and observation

using NovaML.Metrics
matthews_corrcoef(ytst, ŷtst)
# 0.8872442622820285
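
The formula can be checked against the confusion matrix from earlier. The function below is a hypothetical helper, not NovaML's implementation; it assumes the second class is positive and returns 0 when the denominator is zero, which is a common convention rather than something this page specifies.

```julia
# Hypothetical helper: MCC from the four confusion-matrix counts.
function mcc_from_counts(tp, fp, fn, tn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    denom == 0 && return 0.0          # assumed convention for degenerate cases
    return (tp * tn - fp * fn) / denom
end

mcc_example  = mcc_from_counts(37, 1, 5, 71)  # counts from [71 1; 5 37]
mcc_perfect  = mcc_from_counts(5, 0, 0, 5)    # every prediction correct
mcc_inverted = mcc_from_counts(0, 5, 5, 0)    # every prediction wrong
```

The last two calls illustrate the two ends of the range described above.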

ROC Curve and AUC

For binary classification problems, you can compute the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC).

ŷprobs = lr(Xtst, type=:probs)[:, 2]
fpr, tpr, _ = roc_curve(ytst, ŷprobs);
# auc score
roc_auc = auc(fpr, tpr)
# 0.9923941798941799
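
To show what these two steps compute, here is a minimal sketch in plain Julia. The names roc_points and trapezoid_auc are hypothetical, not NovaML functions, and the sketch assumes 0/1 labels with no tied scores; NovaML's own roc_curve may handle thresholds differently.

```julia
# Sweep the decision threshold from high to low, one sample at a time.
function roc_points(y_true, scores)
    order = sortperm(scores, rev=true)
    n_pos = count(==(1), y_true)
    n_neg = length(y_true) - n_pos
    tp, fp = 0, 0
    fpr, tpr = [0.0], [0.0]
    for i in order
        y_true[i] == 1 ? (tp += 1) : (fp += 1)
        push!(fpr, fp / n_neg)
        push!(tpr, tp / n_pos)
    end
    return fpr, tpr
end

# Area under the piecewise-linear curve by the trapezoidal rule.
trapezoid_auc(fpr, tpr) =
    sum((fpr[i+1] - fpr[i]) * (tpr[i+1] + tpr[i]) / 2 for i in 1:length(fpr)-1)

toy_fpr, toy_tpr = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
toy_auc = trapezoid_auc(toy_fpr, toy_tpr)
```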

We can also plot the receiver operating characteristic curve using fpr and tpr values.

plot(fpr, tpr, color=:blue, linestyle=:solid, 
     label="Logistic Regression (auc = $(round(roc_auc, digits=2)))",
     xlabel="False Positive Rate",
     ylabel="True Positive Rate",
     title="Receiver Operating Characteristic Curve")     
plot!([0, 1], [0, 1], color=:gray, linestyle=:dash, linewidth=2, label="Random")

ROC Curve

Regression Metrics

We will use the Boston Housing dataset in this section.

First let's import and prepare the data.

using NovaML.Datasets
X, y = load_boston(return_X_y=true)

Prepare the training and test data sets.

using NovaML.ModelSelection
Xtrn, Xtst, ytrn, ytst = train_test_split(X, y, test_size=0.3, random_state=123)

Create and fit the linear regression model.

using NovaML.LinearModel
lr = LinearRegression()
lr(Xtrn, ytrn)
ŷtrn = lr(Xtrn)
ŷtst = lr(Xtst)

Mean Absolute Error (MAE)

MAE measures the average magnitude of errors in a set of predictions, without considering their direction.

using NovaML.Metrics
maetrn = mae(ytrn, ŷtrn)
maetst = mae(ytst, ŷtst)
maetrn, maetst
#(0.07, 0.09)

Mean Squared Error (MSE)

MSE measures the average squared difference between the predicted values and the actual values. Because the errors are squared, large errors are penalized more heavily than in MAE.

msetrn = mse(ytrn, ŷtrn)
msetst = mse(ytst, ŷtst)
msetrn, msetst
#(0.0093, 0.0128)
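
Both MAE and MSE are short enough to compute by hand. The following is a minimal sketch in plain Julia, independent of NovaML; the toy vectors are made up for illustration.

```julia
using Statistics: mean

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5,  0.0, 2.0, 8.0]

errors = y_true .- y_pred              # [0.5, -0.5, 0.0, -1.0]
manual_mae = mean(abs.(errors))        # average absolute error
manual_mse = mean(errors .^ 2)         # average squared error

println((manual_mae, manual_mse))      # (0.5, 0.375)
```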

R-squared Score

R-squared (R²) provides a measure of how well observed outcomes are replicated by the model, based on the proportion of total variation of outcomes explained by the model.

r2_score(ytrn, ŷtrn)
# 0.9992871136620213
r2_score(ytst, ŷtst)
# 0.9991476060812577
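
R² is commonly defined as 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the observations from their mean. The sketch below applies that definition to made-up toy data in plain Julia; it is not NovaML's implementation.

```julia
using Statistics: mean

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5,  0.0, 2.0, 8.0]

ss_res = sum((y_true .- y_pred) .^ 2)         # unexplained variation: 1.5
ss_tot = sum((y_true .- mean(y_true)) .^ 2)   # total variation: 29.1875
manual_r2 = 1 - ss_res / ss_tot

println(round(manual_r2, digits=3))
```

A model that always predicts the mean of y_true scores 0 under this definition, and a perfect model scores 1.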

Clustering Metrics

Silhouette Score

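The Silhouette Score evaluates the quality of a clustering by comparing, for each sample, how close it is to the other members of its own cluster with how close it is to the nearest other cluster. Values range from -1 to +1; values near +1 indicate compact, well-separated clusters.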
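
For a single sample the coefficient is s = (b - a) / max(a, b), where a is the mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The function below is a hypothetical, unoptimized sketch of that definition in plain Julia with Euclidean distance; it is not NovaML's silhouette_samples and it assumes at least two clusters.

```julia
using Statistics: mean
using LinearAlgebra: norm

# Hypothetical sketch: rows of X are samples, labels[i] is the cluster of row i.
function silhouette_sketch(X::AbstractMatrix, labels::AbstractVector)
    n = size(X, 1)
    clusters = unique(labels)
    s = zeros(n)
    for i in 1:n
        d = [norm(X[i, :] .- X[j, :]) for j in 1:n]
        own = [j for j in 1:n if labels[j] == labels[i] && j != i]
        isempty(own) && continue       # singleton cluster: leave s[i] = 0
        a = mean(d[own])               # cohesion within the sample's own cluster
        b = minimum(mean(d[labels .== c]) for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    end
    return s
end

# Two tight pairs of points placed far apart
toy_X = [0.0 0.0; 0.0 1.0; 10.0 0.0; 10.0 1.0]
toy_s = silhouette_sketch(toy_X, [1, 1, 2, 2])
```

Averaging the per-sample values gives the overall silhouette score for the clustering.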

Let's first create an artificial dataset.

using NovaML.Datasets
X, y = make_blobs(
    n_samples=150,
    n_features=2,
    centers=3,
    cluster_std=0.5,
    shuffle=true,
    random_state=123)

using Plots
begin
    scatter(X[:, 1], X[:, 2],
            color=:white,
            markerstrokecolor=:black,
            markersize=6,
            xlabel="Feature 1",
            ylabel="Feature 2",
            grid=true,
            legend=false)
    
    plot!(size=(600, 400), margin=5Plots.mm)
end

Cluster Data

Next, create the clustering algorithm.

using NovaML.Cluster
km = KMeans(
    n_clusters=3,
    init="k-means++",
    n_init=10,
    max_iter=300,
    tol=1e-04,
    random_state=0)

km(X)

Now we can create a silhouette plot using silhouette_samples from NovaML.Metrics.

using Statistics
using ColorSchemes
using NovaML.Metrics

function plot_silhouette(km::KMeans)
    ykm = km.labels_
    cluster_labels = sort(unique(ykm))
    n_clusters = length(cluster_labels)
    # per-sample silhouette coefficients (X is the global data matrix created above)
    silhouette_vals = silhouette_samples(X, ykm, metric="euclidean")
    # vertical step between bars; the extra 20 slots leave a margin above and below
    δ = 1.0 / (length(silhouette_vals) + 20)
    yval = 10δ

    p = plot(xlabel="Silhouette coefficient", ylabel="Cluster", title="Silhouette Plot",
             legend=false, xlims=(0.0, 1.0), ylims=(0.0, 1.0))
    for (i, c) in enumerate(cluster_labels)
        # one horizontal bar per sample, sorted within each cluster
        c_silhouette_vals = sort(silhouette_vals[ykm .== c])
        color = get(ColorSchemes.jet, i / n_clusters)
        for xval in c_silhouette_vals
            plot!(p, [0, xval], [yval, yval], color=color)
            yval += δ
        end
    end
    # dashed red line marks the mean silhouette coefficient
    silhouette_avg = mean(silhouette_vals)
    vline!(p, [silhouette_avg], color=:red, linestyle=:dash, lw=2)

    # one tick per cluster; the /6 offset assumes three clusters of equal size
    start = (1 - 20δ) / 6
    stop = (1 - 10δ) - (1 - 20δ) / 6
    yticks!(p, range(start, stop, length=n_clusters), string.(cluster_labels))
end

plot_silhouette(km)

Silhouette plot