Classification Metrics¶
F1¶
Bases: BaseMetric
F1 score metric for classification tasks.
This class calculates the F1 score, which is the harmonic mean of precision and recall. It's particularly useful when you need a balance between precision and recall.
Key Features
- Calculates F1 score for binary and multiclass classification
- Supports different averaging methods (micro, macro, weighted, etc.)
- Integrates with framework3's BaseMetric interface
Usage
The F1 metric can be used to evaluate classification models:
from framework3.plugins.metrics.classification import F1
from framework3.base.base_types import XYData
import numpy as np
# Create sample data
y_true = XYData(value=np.array([0, 1, 2, 0, 1, 2]))
y_pred = XYData(value=np.array([0, 2, 1, 0, 0, 1]))
x_data = XYData(value=np.array([1, 2, 3, 4, 5, 6]))
# Create and use the F1 metric
f1_metric = F1(average='macro')
score = f1_metric.evaluate(x_data, y_true, y_pred)
print(f"F1 Score: {score}")
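Because F1 is the harmonic mean of precision and recall, the value returned for each class can be cross-checked by hand. The snippet below is a standalone sanity check using scikit-learn directly (it does not go through framework3's XYData wrapper); the labels are the same toy arrays as in the usage example above.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = np.array([0, 1, 2, 0, 1, 2])
y_pred = np.array([0, 2, 1, 0, 0, 1])

# average=None returns one score per class instead of a single aggregate.
p = precision_score(y_true, y_pred, average=None, zero_division=0)
r = recall_score(y_true, y_pred, average=None, zero_division=0)
f1 = f1_score(y_true, y_pred, average=None, zero_division=0)

# Per class, F1_c = 2 * P_c * R_c / (P_c + R_c), with 0 where P_c + R_c == 0.
reconstructed = np.divide(2 * p * r, p + r, out=np.zeros_like(p), where=(p + r) > 0)

print(f1)             # per-class F1 from scikit-learn
print(reconstructed)  # same values rebuilt from precision and recall
print(f1.mean())      # the 'macro' average is the unweighted mean of per-class F1
```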
Attributes:
| Name | Type | Description |
|---|---|---|
| average | str | The type of averaging performed on the data. Default is 'weighted'. |
Methods:
| Name | Description |
|---|---|
| evaluate | (x_data: XYData, y_true: XYData \| None, y_pred: XYData, **kwargs) -> Float \| np.ndarray: Calculate the F1 score for the given predictions and true values. |
Note
This metric uses scikit-learn's f1_score function internally. Ensure that scikit-learn is properly installed and compatible with your environment.
Source code in framework3/plugins/metrics/classification.py
__init__(average='weighted')¶
Initialize a new F1 metric instance.
This constructor sets up the F1 metric with the specified averaging method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| average | Literal['micro', 'macro', 'samples', 'weighted', 'binary'] | The type of averaging performed on the data. Default is 'weighted'. Other options include 'micro', 'macro', 'samples', 'binary', or None. | 'weighted' |
Note
The 'average' parameter is passed directly to scikit-learn's f1_score function. Refer to scikit-learn's documentation for detailed information on averaging methods.
Source code in framework3/plugins/metrics/classification.py
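To see how the averaging choice affects the reported score, the following standalone snippet (plain scikit-learn, using the same toy labels as the usage example above) evaluates the same predictions under several average settings. Note that 'binary' only applies to binary problems and 'samples' to multilabel data, so they are omitted here.

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([0, 1, 2, 0, 1, 2])
y_pred = np.array([0, 2, 1, 0, 0, 1])

for avg in ("micro", "macro", "weighted"):
    print(avg, f1_score(y_true, y_pred, average=avg, zero_division=0))

# average=None skips aggregation entirely and returns one score per class.
print(None, f1_score(y_true, y_pred, average=None, zero_division=0))
```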
evaluate(x_data, y_true, y_pred, **kwargs)¶
Calculate the F1 score for the given predictions and true values.
This method computes the F1 score, which is the harmonic mean of precision and recall.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x_data | XYData | The input data (not used in this metric, but required by the interface). | required |
| y_true | XYData \| None | The ground truth (correct) target values. | required |
| y_pred | XYData | The estimated targets as returned by a classifier. | required |
| **kwargs | Unpack[PrecissionKwargs] | Additional keyword arguments passed to sklearn's f1_score function. | {} |
Returns:
| Type | Description |
|---|---|
| Float \| np.ndarray | The F1 score, or an array of per-class F1 scores if average is None. |
Raises:
| Type | Description |
|---|---|
| ValueError | If y_true is None. |
Note
This method uses scikit-learn's f1_score function internally with zero_division=0.
Source code in framework3/plugins/metrics/classification.py
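The zero_division=0 setting matters when some class never appears in the predictions: its precision (and hence its F1) is ill-defined, and scikit-learn would normally emit an UndefinedMetricWarning while setting the score to 0. The hypothetical labels below illustrate that case with scikit-learn directly; since evaluate is documented to pass zero_division=0, it yields the same 0 without the warning.

```python
import numpy as np
from sklearn.metrics import f1_score

# Class 2 occurs in y_true but is never predicted, so its precision is 0/0.
y_true = np.array([0, 1, 2, 2])
y_pred = np.array([0, 1, 0, 1])

# With zero_division=0 the ill-defined per-class score is reported as 0
# silently, instead of triggering scikit-learn's UndefinedMetricWarning.
print(f1_score(y_true, y_pred, average=None, zero_division=0))
```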
Precission¶
Bases: BaseMetric
Precision metric for classification tasks.
This class calculates the precision score, which is the ratio tp / (tp + fp) where tp is the number of true positives and fp the number of false positives.
Key Features
- Calculates precision score for binary and multiclass classification
- Supports different averaging methods (micro, macro, weighted, etc.)
- Integrates with framework3's BaseMetric interface
Usage
The Precission metric can be used to evaluate classification models:
from framework3.plugins.metrics.classification import Precission
from framework3.base.base_types import XYData
import numpy as np
# Create sample data
y_true = XYData(value=np.array([0, 1, 2, 0, 1, 2]))
y_pred = XYData(value=np.array([0, 2, 1, 0, 0, 1]))
x_data = XYData(value=np.array([1, 2, 3, 4, 5, 6]))
# Create and use the Precission metric
precision_metric = Precission(average='macro')
score = precision_metric.evaluate(x_data, y_true, y_pred)
print(f"Precision Score: {score}")
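Since precision is tp / (tp + fp), the per-class values can be reproduced from a confusion matrix by hand. The following standalone sketch (scikit-learn and NumPy only, same toy labels as above) does exactly that and compares the result to precision_score.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score

y_true = np.array([0, 1, 2, 0, 1, 2])
y_pred = np.array([0, 2, 1, 0, 0, 1])

cm = confusion_matrix(y_true, y_pred)      # rows: true class, columns: predicted class
tp = np.diag(cm).astype(float)             # true positives per class
predicted = cm.sum(axis=0).astype(float)   # tp + fp per class (column totals)

# Per-class precision = tp / (tp + fp), reported as 0 where a class is never predicted.
manual = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)

print(manual)
print(precision_score(y_true, y_pred, average=None, zero_division=0))  # should match
```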
Attributes:
| Name | Type | Description |
|---|---|---|
| average | Literal['micro', 'macro', 'samples', 'weighted', 'binary'] \| None | The type of averaging performed on the data. Default is 'weighted'. |
Methods:
| Name | Description |
|---|---|
| evaluate | (x_data: XYData, y_true: XYData \| None, y_pred: XYData, **kwargs) -> Float \| np.ndarray: Calculate the precision score for the given predictions and true values. |
Note
This metric uses scikit-learn's precision_score function internally. Ensure that scikit-learn is properly installed and compatible with your environment.
Source code in framework3/plugins/metrics/classification.py
__init__(average='weighted')¶
Initialize a new Precission metric instance.
This constructor sets up the Precission metric with the specified averaging method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| average | Literal['micro', 'macro', 'samples', 'weighted', 'binary'] \| None | The type of averaging performed on the data. Default is 'weighted'. Options are 'micro', 'macro', 'samples', 'weighted', 'binary', or None. | 'weighted' |
Note
The 'average' parameter is passed directly to scikit-learn's precision_score function. Refer to scikit-learn's documentation for detailed information on averaging methods.
Source code in framework3/plugins/metrics/classification.py
evaluate(x_data, y_true, y_pred, **kwargs)¶
Calculate the precision score for the given predictions and true values.
This method computes the precision score, which is the ratio of true positives to the sum of true and false positives.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x_data | XYData | The input data (not used in this metric, but required by the interface). | required |
| y_true | XYData \| None | The ground truth (correct) target values. | required |
| y_pred | XYData | The estimated targets as returned by a classifier. | required |
| **kwargs | Unpack[PrecissionKwargs] | Additional keyword arguments passed to sklearn's precision_score function. | {} |
Returns:
| Type | Description |
|---|---|
| Float \| np.ndarray | The precision score, or an array of per-class precision scores if average is None. |
Raises:
| Type | Description |
|---|---|
| ValueError | If y_true is None. |
Note
This method uses scikit-learn's precision_score function internally with zero_division=0.
Source code in framework3/plugins/metrics/classification.py
Recall¶
Bases: BaseMetric
Recall metric for classification tasks.
This class calculates the recall score, which is the ratio tp / (tp + fn) where tp is the number of true positives and fn the number of false negatives.
Key Features
- Calculates recall score for binary and multiclass classification
- Supports different averaging methods (micro, macro, weighted, etc.)
- Integrates with framework3's BaseMetric interface
Usage
The Recall metric can be used to evaluate classification models:
from framework3.plugins.metrics.classification import Recall
from framework3.base.base_types import XYData
import numpy as np
# Create sample data
y_true = XYData(value=np.array([0, 1, 2, 0, 1, 2]))
y_pred = XYData(value=np.array([0, 2, 1, 0, 0, 1]))
x_data = XYData(value=np.array([1, 2, 3, 4, 5, 6]))
# Create and use the Recall metric
recall_metric = Recall(average='macro')
score = recall_metric.evaluate(x_data, y_true, y_pred)
print(f"Recall Score: {score}")
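Analogously, recall is tp / (tp + fn) and can be reproduced from the confusion matrix using row totals instead of column totals. Again, this is a standalone sketch with scikit-learn and NumPy, not framework3 API code.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

y_true = np.array([0, 1, 2, 0, 1, 2])
y_pred = np.array([0, 2, 1, 0, 0, 1])

cm = confusion_matrix(y_true, y_pred)   # rows: true class, columns: predicted class
tp = np.diag(cm).astype(float)          # true positives per class
actual = cm.sum(axis=1).astype(float)   # tp + fn per class (row totals)

# Per-class recall = tp / (tp + fn).
manual = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)

print(manual)
print(recall_score(y_true, y_pred, average=None, zero_division=0))  # should match
```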
Attributes:
| Name | Type | Description |
|---|---|---|
| average | str \| None | The type of averaging performed on the data. Default is 'weighted'. |
Methods:
| Name | Description |
|---|---|
| evaluate | (x_data: XYData, y_true: XYData \| None, y_pred: XYData, **kwargs) -> Float \| np.ndarray: Calculate the recall score for the given predictions and true values. |
Note
This metric uses scikit-learn's recall_score function internally. Ensure that scikit-learn is properly installed and compatible with your environment.
Source code in framework3/plugins/metrics/classification.py
__init__(average='weighted')¶
Initialize a new Recall metric instance.
This constructor sets up the Recall metric with the specified averaging method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| average | str \| None | The type of averaging performed on the data. Default is 'weighted'. Options are 'micro', 'macro', 'samples', 'weighted', 'binary', or None. | 'weighted' |
Note
The 'average' parameter is passed directly to scikit-learn's recall_score function. Refer to scikit-learn's documentation for detailed information on averaging methods.
Source code in framework3/plugins/metrics/classification.py
evaluate(x_data, y_true, y_pred, **kwargs)¶
Calculate the recall score for the given predictions and true values.
This method computes the recall score, which is the ratio of true positives to the sum of true positives and false negatives.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x_data | XYData | The input data (not used in this metric, but required by the interface). | required |
| y_true | XYData \| None | The ground truth (correct) target values. | required |
| y_pred | XYData | The estimated targets as returned by a classifier. | required |
| **kwargs | Unpack[PrecissionKwargs] | Additional keyword arguments passed to sklearn's recall_score function. | {} |
Returns:
| Type | Description |
|---|---|
| Float \| np.ndarray | The recall score, or an array of per-class recall scores if average is None. |
Raises:
| Type | Description |
|---|---|
| ValueError | If y_true is None. |
Note
This method uses scikit-learn's recall_score function internally with zero_division=0.
Source code in framework3/plugins/metrics/classification.py
Overview¶
The Classification Metrics module in framework3 provides a set of evaluation metrics specifically designed for assessing the performance of classification models. These metrics help in understanding various aspects of a classifier's performance, such as accuracy, precision, recall, and F1-score.
Available Classification Metrics¶
Accuracy Score¶
The Accuracy Score is implemented in the AccuracyScoreMetric class. It computes the accuracy of a classification model by comparing the predicted labels with the true labels.
Usage¶
from framework3.plugins.metrics.classification.accuracy_score import AccuracyScoreMetric
accuracy_metric = AccuracyScoreMetric()
score = accuracy_metric.compute(y_true, y_pred)
Precision Score¶
The Precision Score is implemented in the PrecisionScoreMetric class. It computes the precision of a classification model, which is the ratio of true positive predictions to the total number of positive predictions.
Usage¶
from framework3.plugins.metrics.classification.precision_score import PrecisionScoreMetric
precision_metric = PrecisionScoreMetric(average='weighted')
score = precision_metric.compute(y_true, y_pred)
Parameters¶
- average (str): The averaging method. Options include 'micro', 'macro', 'weighted', 'samples', and None.
Recall Score¶
The Recall Score is implemented in the RecallScoreMetric class. It computes the recall of a classification model, which is the ratio of true positive predictions to the total number of actual positive instances.
Usage¶
from framework3.plugins.metrics.classification.recall_score import RecallScoreMetric
recall_metric = RecallScoreMetric(average='weighted')
score = recall_metric.compute(y_true, y_pred)
Parameters¶
- average (str): The averaging method. Options include 'micro', 'macro', 'weighted', 'samples', and None.
F1 Score¶
The F1 Score is implemented in the F1ScoreMetric class. It computes the F1 score, which is the harmonic mean of precision and recall.
Usage¶
from framework3.plugins.metrics.classification.f1_score import F1ScoreMetric
f1_metric = F1ScoreMetric(average='weighted')
score = f1_metric.compute(y_true, y_pred)
Parameters¶
- average (str): The averaging method. Options include 'micro', 'macro', 'weighted', 'samples', and None.
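The choice of averaging method matters most on imbalanced data. The snippet below is an illustrative sketch with plain scikit-learn (the hypothetical labels are not from framework3); it shows how 'macro', 'weighted', and 'micro' averages can disagree when one class dominates.

```python
import numpy as np
from sklearn.metrics import f1_score

# A deliberately imbalanced toy problem: class 0 dominates.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])

print("per-class:", f1_score(y_true, y_pred, average=None))
print("macro:    ", f1_score(y_true, y_pred, average="macro"))     # unweighted mean of per-class F1
print("weighted: ", f1_score(y_true, y_pred, average="weighted"))  # mean weighted by class support
print("micro:    ", f1_score(y_true, y_pred, average="micro"))     # computed from global tp/fp/fn counts
```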
Comprehensive Example: Evaluating a Classification Model¶
In this example, we'll demonstrate how to use the Classification Metrics to evaluate the performance of a classification model.
from framework3.plugins.filters.classification.svm import ClassifierSVMPlugin
from framework3.plugins.metrics.classification.accuracy_score import AccuracyScoreMetric
from framework3.plugins.metrics.classification.precision_score import PrecisionScoreMetric
from framework3.plugins.metrics.classification.recall_score import RecallScoreMetric
from framework3.plugins.metrics.classification.f1_score import F1ScoreMetric
from framework3.base.base_types import XYData
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create XYData objects
X_train_data = XYData(_hash='X_train', _path='/tmp', _value=X_train)
y_train_data = XYData(_hash='y_train', _path='/tmp', _value=y_train)
X_test_data = XYData(_hash='X_test', _path='/tmp', _value=X_test)
y_test_data = XYData(_hash='y_test', _path='/tmp', _value=y_test)
# Create and train the classifier
classifier = ClassifierSVMPlugin(kernel='rbf')
classifier.fit(X_train_data, y_train_data)
# Make predictions
predictions = classifier.predict(X_test_data)
# Initialize metrics
accuracy_metric = AccuracyScoreMetric()
precision_metric = PrecisionScoreMetric(average='weighted')
recall_metric = RecallScoreMetric(average='weighted')
f1_metric = F1ScoreMetric(average='weighted')
# Compute metrics
accuracy = accuracy_metric.compute(y_test_data, predictions)
precision = precision_metric.compute(y_test_data, predictions)
recall = recall_metric.compute(y_test_data, predictions)
f1 = f1_metric.compute(y_test_data, predictions)
# Print results
print(f"Accuracy: {accuracy}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F1 Score: {f1}")
This example demonstrates how to:
- Load and prepare the Iris dataset
- Create XYData objects for use with framework3
- Train an SVM classifier
- Make predictions on the test set
- Initialize and compute various classification metrics
- Print the evaluation results
Best Practices¶
- Multiple Metrics: Use multiple metrics to get a comprehensive view of your model's performance. Different metrics capture different aspects of classification performance.
- Class Imbalance: Be aware of class imbalance in your dataset. In such cases, accuracy alone might not be a good metric. Consider using precision, recall, and F1-score.
- Averaging Methods: When dealing with multi-class classification, pay attention to the averaging method used in metrics like precision, recall, and F1-score. 'Weighted' average is often a good choice for imbalanced datasets.
- Cross-Validation: Use cross-validation to get a more robust estimate of your model's performance, especially with smaller datasets.
- Confusion Matrix: Consider using a confusion matrix in addition to these metrics for a more detailed view of your model's performance across different classes (see the sketch after this list).
- ROC AUC: For binary classification problems, consider using the ROC AUC score as an additional metric.
- Threshold Adjustment: Remember that metrics like precision and recall can be affected by adjusting the classification threshold. Consider exploring different thresholds if needed.
- Domain-Specific Metrics: Depending on your specific problem, you might need to implement custom metrics that are more relevant to your domain.
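The confusion matrix, ROC AUC, and cross-validation recommendations above can be combined in a few lines of plain scikit-learn. The sketch below reuses the Iris data from the comprehensive example but sidesteps framework3 entirely (an SVC stands in for ClassifierSVMPlugin); treat it as an illustration of the metrics, not as framework3 API code.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# probability=True is needed so the classifier exposes predict_proba for ROC AUC.
clf = SVC(kernel="rbf", probability=True, random_state=42).fit(X_train, y_train)

# Confusion matrix: per-class breakdown of correct and incorrect predictions.
print(confusion_matrix(y_test, clf.predict(X_test)))

# ROC AUC: one-vs-rest extension for this multiclass problem (binary problems
# can pass the positive-class scores directly).
print(roc_auc_score(y_test, clf.predict_proba(X_test), multi_class="ovr"))

# Cross-validation: a more robust performance estimate than a single split.
print(cross_val_score(SVC(kernel="rbf"), X, y, cv=5, scoring="f1_macro"))
```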
Conclusion¶
The Classification Metrics module in framework3 provides essential tools for evaluating the performance of classification models. By using these metrics in combination with other framework3 components, you can gain valuable insights into your model's strengths and weaknesses. The example demonstrates how easy it is to compute and interpret these metrics within the framework3 ecosystem, enabling you to make informed decisions about your classification models.