Classification Metrics

F1

Bases: BaseMetric

F1 score metric for classification tasks.

This class calculates the F1 score, which is the harmonic mean of precision and recall. It's particularly useful when you need a balance between precision and recall.

Key Features
  • Calculates F1 score for binary and multiclass classification
  • Supports different averaging methods (micro, macro, weighted, etc.)
  • Integrates with framework3's BaseMetric interface
Usage

The F1 metric can be used to evaluate classification models:

from framework3.plugins.metrics.classification import F1
from framework3.base.base_types import XYData
import numpy as np

# Create sample data
y_true = XYData(value=np.array([0, 1, 2, 0, 1, 2]))
y_pred = XYData(value=np.array([0, 2, 1, 0, 0, 1]))
x_data = XYData(value=np.array([1, 2, 3, 4, 5, 6]))

# Create and use the F1 metric
f1_metric = F1(average='macro')
score = f1_metric.evaluate(x_data, y_true, y_pred)
print(f"F1 Score: {score}")

Attributes:

    average (str): The type of averaging performed on the data. Default is 'weighted'.

Methods:

    evaluate(x_data: XYData, y_true: XYData | None, y_pred: XYData, **kwargs) -> Float | np.ndarray:
        Calculate the F1 score for the given predictions and true values.

Note

This metric uses scikit-learn's f1_score function internally. Ensure that scikit-learn is properly installed and compatible with your environment.

Source code in framework3/plugins/metrics/classification.py
@Container.bind()
class F1(BaseMetric):
    """
    F1 score metric for classification tasks.

    This class calculates the F1 score, which is the harmonic mean of precision and recall.
    It's particularly useful when you need a balance between precision and recall.

    Key Features:
        - Calculates F1 score for binary and multiclass classification
        - Supports different averaging methods (micro, macro, weighted, etc.)
        - Integrates with framework3's BaseMetric interface

    Usage:
        The F1 metric can be used to evaluate classification models:

        ```python
        from framework3.plugins.metrics.classification import F1
        from framework3.base.base_types import XYData
        import numpy as np

        # Create sample data
        y_true = XYData(value=np.array([0, 1, 2, 0, 1, 2]))
        y_pred = XYData(value=np.array([0, 2, 1, 0, 0, 1]))
        x_data = XYData(value=np.array([1, 2, 3, 4, 5, 6]))

        # Create and use the F1 metric
        f1_metric = F1(average='macro')
        score = f1_metric.evaluate(x_data, y_true, y_pred)
        print(f"F1 Score: {score}")
        ```

    Attributes:
        average (str): The type of averaging performed on the data. Default is 'weighted'.

    Methods:
        evaluate(x_data: XYData, y_true: XYData | None, y_pred: XYData, **kwargs) -> Float | np.ndarray:
            Calculate the F1 score for the given predictions and true values.

    Note:
        This metric uses scikit-learn's f1_score function internally. Ensure that scikit-learn
        is properly installed and compatible with your environment.
    """

    def __init__(
        self,
        average: Literal[
            "micro", "macro", "samples", "weighted", "binary"
        ] = "weighted",
    ):
        """
        Initialize a new F1 metric instance.

        This constructor sets up the F1 metric with the specified averaging method.

        Args:
            average (Literal['micro', 'macro', 'samples', 'weighted', 'binary']): The type of averaging performed on the data. Default is 'weighted'.
                           Other options include 'micro', 'macro', 'samples', 'binary', or None.

        Note:
            The 'average' parameter is passed directly to scikit-learn's f1_score function.
            Refer to scikit-learn's documentation for detailed information on averaging methods.
        """
        super().__init__(average=average)
        self.average = average

    def evaluate(
        self,
        x_data: XYData,
        y_true: XYData | None,
        y_pred: XYData,
        **kwargs: Unpack[PrecissionKwargs],
    ) -> Float | np.ndarray:
        """
        Calculate the F1 score for the given predictions and true values.

        This method computes the F1 score, which is the harmonic mean of precision and recall.

        Args:
            x_data (XYData): The input data (not used in this metric, but required by the interface).
            y_true (XYData | None): The ground truth (correct) target values.
            y_pred (XYData): The estimated targets as returned by a classifier.
            **kwargs (Unpack[PrecissionKwargs]): Additional keyword arguments passed to sklearn's f1_score function.

        Returns:
            Float | np.ndarray: The F1 score or array of F1 scores if average is None.

        Raises:
            ValueError: If y_true is None.

        Note:
            This method uses scikit-learn's f1_score function internally with zero_division=0.
        """
        if y_true is None:
            raise ValueError("Ground truth (y_true) must be provided.")

        kwargs.setdefault(
            "average",
            cast(
                Literal["micro", "macro", "samples", "weighted", "binary"], self.average
            ),
        )
        kwargs.setdefault("zero_division", 0)

        return f1_score(
            y_true.value,
            y_pred.value,
            **kwargs,
        )  # type: ignore

__init__(average='weighted')

Initialize a new F1 metric instance.

This constructor sets up the F1 metric with the specified averaging method.

Parameters:

    average (Literal['micro', 'macro', 'samples', 'weighted', 'binary']): The type of averaging performed on the data. Other options include 'micro', 'macro', 'samples', 'binary', or None. Default: 'weighted'.
Note

The 'average' parameter is passed directly to scikit-learn's f1_score function. Refer to scikit-learn's documentation for detailed information on averaging methods.

Source code in framework3/plugins/metrics/classification.py
def __init__(
    self,
    average: Literal[
        "micro", "macro", "samples", "weighted", "binary"
    ] = "weighted",
):
    """
    Initialize a new F1 metric instance.

    This constructor sets up the F1 metric with the specified averaging method.

    Args:
        average (Literal['micro', 'macro', 'samples', 'weighted', 'binary']): The type of averaging performed on the data. Default is 'weighted'.
                       Other options include 'micro', 'macro', 'samples', 'binary', or None.

    Note:
        The 'average' parameter is passed directly to scikit-learn's f1_score function.
        Refer to scikit-learn's documentation for detailed information on averaging methods.
    """
    super().__init__(average=average)
    self.average = average

evaluate(x_data, y_true, y_pred, **kwargs)

Calculate the F1 score for the given predictions and true values.

This method computes the F1 score, which is the harmonic mean of precision and recall.

Parameters:

    x_data (XYData): The input data (not used in this metric, but required by the interface). Required.
    y_true (XYData | None): The ground truth (correct) target values. Required.
    y_pred (XYData): The estimated targets as returned by a classifier. Required.
    **kwargs (Unpack[PrecissionKwargs]): Additional keyword arguments passed to sklearn's f1_score function. Default: {}.

Returns:

    Float | np.ndarray: The F1 score, or an array of F1 scores if average is None.

Raises:

    ValueError: If y_true is None.

Note

This method uses scikit-learn's f1_score function internally with zero_division=0.
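
To make the zero_division=0 behaviour concrete: when a class never appears in the predictions, its precision is 0/0, and zero_division=0 maps the affected per-class scores to 0 instead of triggering scikit-learn's ill-defined-metric warning. A minimal sketch using f1_score directly (which this method wraps):

from sklearn.metrics import f1_score
import numpy as np

y_true = np.array([0, 0, 1])
y_pred = np.array([0, 0, 0])   # class 1 is never predicted

# Class 0: precision 2/3, recall 1.0, F1 = 0.8; class 1: precision is undefined, so the score is forced to 0
print(f1_score(y_true, y_pred, average=None, zero_division=0))    # [0.8, 0.0]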

Source code in framework3/plugins/metrics/classification.py
def evaluate(
    self,
    x_data: XYData,
    y_true: XYData | None,
    y_pred: XYData,
    **kwargs: Unpack[PrecissionKwargs],
) -> Float | np.ndarray:
    """
    Calculate the F1 score for the given predictions and true values.

    This method computes the F1 score, which is the harmonic mean of precision and recall.

    Args:
        x_data (XYData): The input data (not used in this metric, but required by the interface).
        y_true (XYData | None): The ground truth (correct) target values.
        y_pred (XYData): The estimated targets as returned by a classifier.
        **kwargs (Unpack[PrecissionKwargs]): Additional keyword arguments passed to sklearn's f1_score function.

    Returns:
        Float | np.ndarray: The F1 score or array of F1 scores if average is None.

    Raises:
        ValueError: If y_true is None.

    Note:
        This method uses scikit-learn's f1_score function internally with zero_division=0.
    """
    if y_true is None:
        raise ValueError("Ground truth (y_true) must be provided.")

    kwargs.setdefault(
        "average",
        cast(
            Literal["micro", "macro", "samples", "weighted", "binary"], self.average
        ),
    )
    kwargs.setdefault("zero_division", 0)

    return f1_score(
        y_true.value,
        y_pred.value,
        **kwargs,
    )  # type: ignore

Precission

Bases: BaseMetric

Precision metric for classification tasks.

This class calculates the precision score, which is the ratio tp / (tp + fp) where tp is the number of true positives and fp the number of false positives.

Key Features
  • Calculates precision score for binary and multiclass classification
  • Supports different averaging methods (micro, macro, weighted, etc.)
  • Integrates with framework3's BaseMetric interface
Usage

The Precission metric can be used to evaluate classification models:

from framework3.plugins.metrics.classification import Precission
from framework3.base.base_types import XYData
import numpy as np

# Create sample data
y_true = XYData(value=np.array([0, 1, 2, 0, 1, 2]))
y_pred = XYData(value=np.array([0, 2, 1, 0, 0, 1]))
x_data = XYData(value=np.array([1, 2, 3, 4, 5, 6]))

# Create and use the Precission metric
precision_metric = Precission(average='macro')
score = precision_metric.evaluate(x_data, y_true, y_pred)
print(f"Precision Score: {score}")

Attributes:

    average (Literal['micro', 'macro', 'samples', 'weighted', 'binary'] | None): The type of averaging performed on the data. Default is 'weighted'.

Methods:

    evaluate(x_data: XYData, y_true: XYData | None, y_pred: XYData, **kwargs) -> Float | np.ndarray:
        Calculate the precision score for the given predictions and true values.

Note

This metric uses scikit-learn's precision_score function internally. Ensure that scikit-learn is properly installed and compatible with your environment.

Source code in framework3/plugins/metrics/classification.py
@Container.bind()
class Precission(BaseMetric):
    """
    Precision metric for classification tasks.

    This class calculates the precision score, which is the ratio tp / (tp + fp) where tp is
    the number of true positives and fp the number of false positives.

    Key Features:
        - Calculates precision score for binary and multiclass classification
        - Supports different averaging methods (micro, macro, weighted, etc.)
        - Integrates with framework3's BaseMetric interface

    Usage:
        The Precission metric can be used to evaluate classification models:

        ```python
        from framework3.plugins.metrics.classification import Precission
        from framework3.base.base_types import XYData
        import numpy as np

        # Create sample data
        y_true = XYData(value=np.array([0, 1, 2, 0, 1, 2]))
        y_pred = XYData(value=np.array([0, 2, 1, 0, 0, 1]))
        x_data = XYData(value=np.array([1, 2, 3, 4, 5, 6]))

        # Create and use the Precission metric
        precision_metric = Precission(average='macro')
        score = precision_metric.evaluate(x_data, y_true, y_pred)
        print(f"Precision Score: {score}")
        ```

    Attributes:
        average (Literal["micro", "macro", "samples", "weighted", "binary"]|None): The type of averaging performed on the data. Default is 'weighted'.

    Methods:
        evaluate (x_data: XYData, y_true: XYData | None, y_pred: XYData, **kwargs) -> Float | np.ndarray:
            Calculate the precision score for the given predictions and true values.

    Note:
        This metric uses scikit-learn's precision_score function internally. Ensure that scikit-learn
        is properly installed and compatible with your environment.
    """

    def __init__(
        self,
        average: Literal["micro", "macro", "samples", "weighted", "binary"]
        | None = "weighted",
    ):
        """
        Initialize a new Precission metric instance.

        This constructor sets up the Precission metric with the specified averaging method.

        Args:
            average (Literal["micro", "macro", "samples", "weighted", "binary"]|None): The type of averaging performed on the data. Default is 'weighted'.
                                  Options are 'micro', 'macro', 'samples', 'weighted', 'binary', or None.

        Note:
            The 'average' parameter is passed directly to scikit-learn's precision_score function.
            Refer to scikit-learn's documentation for detailed information on averaging methods.
        """
        super().__init__(average=average)

    def evaluate(
        self,
        x_data: XYData,
        y_true: XYData | None,
        y_pred: XYData,
        **kwargs: Unpack[PrecissionKwargs],
    ) -> Float | np.ndarray:
        """
        Calculate the precision score for the given predictions and true values.

        This method computes the precision score, which is the ratio of true positives to the
        sum of true and false positives.

        Args:
            x_data (XYData): The input data (not used in this metric, but required by the interface).
            y_true (XYData | None): The ground truth (correct) target values.
            y_pred (XYData): The estimated targets as returned by a classifier.
            **kwargs (Unpack[PrecissionKwargs]): Additional keyword arguments passed to sklearn's precision_score function.

        Returns:
            Float | np.ndarray: The precision score or array of precision scores if average is None.

        Raises:
            ValueError: If y_true is None.

        Note:
            This method uses scikit-learn's precision_score function internally with zero_division=0.
        """
        if y_true is None:
            raise ValueError("Ground truth (y_true) must be provided.")
        return precision_score(
            y_true.value,
            y_pred.value,
            zero_division=0,
            average=self.average,
            **kwargs,  # type: ignore
        )  # type: ignore

__init__(average='weighted')

Initialize a new Precission metric instance.

This constructor sets up the Precission metric with the specified averaging method.

Parameters:

    average (Literal['micro', 'macro', 'samples', 'weighted', 'binary'] | None): The type of averaging performed on the data. Options are 'micro', 'macro', 'samples', 'weighted', 'binary', or None. Default: 'weighted'.
Note

The 'average' parameter is passed directly to scikit-learn's precision_score function. Refer to scikit-learn's documentation for detailed information on averaging methods.

Source code in framework3/plugins/metrics/classification.py
def __init__(
    self,
    average: Literal["micro", "macro", "samples", "weighted", "binary"]
    | None = "weighted",
):
    """
    Initialize a new Precission metric instance.

    This constructor sets up the Precission metric with the specified averaging method.

    Args:
        average (Literal["micro", "macro", "samples", "weighted", "binary"]|None): The type of averaging performed on the data. Default is 'weighted'.
                              Options are 'micro', 'macro', 'samples', 'weighted', 'binary', or None.

    Note:
        The 'average' parameter is passed directly to scikit-learn's precision_score function.
        Refer to scikit-learn's documentation for detailed information on averaging methods.
    """
    super().__init__(average=average)

evaluate(x_data, y_true, y_pred, **kwargs)

Calculate the precision score for the given predictions and true values.

This method computes the precision score, which is the ratio of true positives to the sum of true and false positives.

Parameters:

    x_data (XYData): The input data (not used in this metric, but required by the interface). Required.
    y_true (XYData | None): The ground truth (correct) target values. Required.
    y_pred (XYData): The estimated targets as returned by a classifier. Required.
    **kwargs (Unpack[PrecissionKwargs]): Additional keyword arguments passed to sklearn's precision_score function. Default: {}.

Returns:

    Float | np.ndarray: The precision score, or an array of precision scores if average is None.

Raises:

    ValueError: If y_true is None.

Note

This method uses scikit-learn's precision_score function internally with zero_division=0.

Source code in framework3/plugins/metrics/classification.py
def evaluate(
    self,
    x_data: XYData,
    y_true: XYData | None,
    y_pred: XYData,
    **kwargs: Unpack[PrecissionKwargs],
) -> Float | np.ndarray:
    """
    Calculate the precision score for the given predictions and true values.

    This method computes the precision score, which is the ratio of true positives to the
    sum of true and false positives.

    Args:
        x_data (XYData): The input data (not used in this metric, but required by the interface).
        y_true (XYData | None): The ground truth (correct) target values.
        y_pred (XYData): The estimated targets as returned by a classifier.
        **kwargs (Unpack[PrecissionKwargs]): Additional keyword arguments passed to sklearn's precision_score function.

    Returns:
        Float | np.ndarray: The precision score or array of precision scores if average is None.

    Raises:
        ValueError: If y_true is None.

    Note:
        This method uses scikit-learn's precision_score function internally with zero_division=0.
    """
    if y_true is None:
        raise ValueError("Ground truth (y_true) must be provided.")
    return precision_score(
        y_true.value,
        y_pred.value,
        zero_division=0,
        average=self.average,
        **kwargs,  # type: ignore
    )  # type: ignore

Recall

Bases: BaseMetric

Recall metric for classification tasks.

This class calculates the recall score, which is the ratio tp / (tp + fn) where tp is the number of true positives and fn the number of false negatives.

Key Features
  • Calculates recall score for binary and multiclass classification
  • Supports different averaging methods (micro, macro, weighted, etc.)
  • Integrates with framework3's BaseMetric interface
Usage

The Recall metric can be used to evaluate classification models:

from framework3.plugins.metrics.classification import Recall
from framework3.base.base_types import XYData
import numpy as np

# Create sample data
y_true = XYData(value=np.array([0, 1, 2, 0, 1, 2]))
y_pred = XYData(value=np.array([0, 2, 1, 0, 0, 1]))
x_data = XYData(value=np.array([1, 2, 3, 4, 5, 6]))

# Create and use the Recall metric
recall_metric = Recall(average='macro')
score = recall_metric.evaluate(x_data, y_true, y_pred)
print(f"Recall Score: {score}")

Attributes:

    average (str | None): The type of averaging performed on the data. Default is 'weighted'.

Methods:

    evaluate(x_data: XYData, y_true: XYData | None, y_pred: XYData, **kwargs) -> Float | np.ndarray:
        Calculate the recall score for the given predictions and true values.

Note

This metric uses scikit-learn's recall_score function internally. Ensure that scikit-learn is properly installed and compatible with your environment.

Source code in framework3/plugins/metrics/classification.py
@Container.bind()
class Recall(BaseMetric):
    """
    Recall metric for classification tasks.

    This class calculates the recall score, which is the ratio tp / (tp + fn) where tp is
    the number of true positives and fn the number of false negatives.

    Key Features:
        - Calculates recall score for binary and multiclass classification
        - Supports different averaging methods (micro, macro, weighted, etc.)
        - Integrates with framework3's BaseMetric interface

    Usage:
        The Recall metric can be used to evaluate classification models:

        ```python
        from framework3.plugins.metrics.classification import Recall
        from framework3.base.base_types import XYData
        import numpy as np

        # Create sample data
        y_true = XYData(value=np.array([0, 1, 2, 0, 1, 2]))
        y_pred = XYData(value=np.array([0, 2, 1, 0, 0, 1]))
        x_data = XYData(value=np.array([1, 2, 3, 4, 5, 6]))

        # Create and use the Recall metric
        recall_metric = Recall(average='macro')
        score = recall_metric.evaluate(x_data, y_true, y_pred)
        print(f"Recall Score: {score}")
        ```

    Attributes:
        average (str | None): The type of averaging performed on the data. Default is 'weighted'.

    Methods:
        evaluate(x_data: XYData, y_true: XYData | None, y_pred: XYData, **kwargs) -> Float | np.ndarray:
            Calculate the recall score for the given predictions and true values.

    Note:
        This metric uses scikit-learn's recall_score function internally. Ensure that scikit-learn
        is properly installed and compatible with your environment.
    """

    def __init__(
        self,
        average: Literal["micro", "macro", "samples", "weighted", "binary"]
        | None = "weighted",
    ):
        """
        Initialize a new Recall metric instance.

        This constructor sets up the Recall metric with the specified averaging method.

        Args:
            average (str | None): The type of averaging performed on the data. Default is 'weighted'.
                                  Options are 'micro', 'macro', 'samples', 'weighted', 'binary', or None.

        Note:
            The 'average' parameter is passed directly to scikit-learn's recall_score function.
            Refer to scikit-learn's documentation for detailed information on averaging methods.
        """
        super().__init__(average=average)

    def evaluate(
        self,
        x_data: XYData,
        y_true: XYData | None,
        y_pred: XYData,
        **kwargs: Unpack[PrecissionKwargs],
    ) -> Float | np.ndarray:
        """
        Calculate the recall score for the given predictions and true values.

        This method computes the recall score, which is the ratio of true positives to the
        sum of true positives and false negatives.

        Args:
            x_data (XYData): The input data (not used in this metric, but required by the interface).
            y_true (XYData | None): The ground truth (correct) target values.
            y_pred (XYData): The estimated targets as returned by a classifier.
            **kwargs (Unpack[PrecissionKwargs]): Additional keyword arguments passed to sklearn's recall_score function.

        Returns:
            Float | np.ndarray: The recall score or array of recall scores if average is None.

        Raises:
            ValueError: If y_true is None.

        Note:
            This method uses scikit-learn's recall_score function internally with zero_division=0.
        """
        if y_true is None:
            raise ValueError("Ground truth (y_true) must be provided.")
        return recall_score(
            y_true.value,
            y_pred.value,
            zero_division=0,
            average=self.average,
            **kwargs,  # type: ignore
        )  # type: ignore

__init__(average='weighted')

Initialize a new Recall metric instance.

This constructor sets up the Recall metric with the specified averaging method.

Parameters:

    average (str | None): The type of averaging performed on the data. Options are 'micro', 'macro', 'samples', 'weighted', 'binary', or None. Default: 'weighted'.
Note

The 'average' parameter is passed directly to scikit-learn's recall_score function. Refer to scikit-learn's documentation for detailed information on averaging methods.

Source code in framework3/plugins/metrics/classification.py
def __init__(
    self,
    average: Literal["micro", "macro", "samples", "weighted", "binary"]
    | None = "weighted",
):
    """
    Initialize a new Recall metric instance.

    This constructor sets up the Recall metric with the specified averaging method.

    Args:
        average (str | None): The type of averaging performed on the data. Default is 'weighted'.
                              Options are 'micro', 'macro', 'samples', 'weighted', 'binary', or None.

    Note:
        The 'average' parameter is passed directly to scikit-learn's recall_score function.
        Refer to scikit-learn's documentation for detailed information on averaging methods.
    """
    super().__init__(average=average)

evaluate(x_data, y_true, y_pred, **kwargs)

Calculate the recall score for the given predictions and true values.

This method computes the recall score, which is the ratio of true positives to the sum of true positives and false negatives.

Parameters:

    x_data (XYData): The input data (not used in this metric, but required by the interface). Required.
    y_true (XYData | None): The ground truth (correct) target values. Required.
    y_pred (XYData): The estimated targets as returned by a classifier. Required.
    **kwargs (Unpack[PrecissionKwargs]): Additional keyword arguments passed to sklearn's recall_score function. Default: {}.

Returns:

    Float | np.ndarray: The recall score, or an array of recall scores if average is None.

Raises:

    ValueError: If y_true is None.

Note

This method uses scikit-learn's recall_score function internally with zero_division=0.

Source code in framework3/plugins/metrics/classification.py
def evaluate(
    self,
    x_data: XYData,
    y_true: XYData | None,
    y_pred: XYData,
    **kwargs: Unpack[PrecissionKwargs],
) -> Float | np.ndarray:
    """
    Calculate the recall score for the given predictions and true values.

    This method computes the recall score, which is the ratio of true positives to the
    sum of true positives and false negatives.

    Args:
        x_data (XYData): The input data (not used in this metric, but required by the interface).
        y_true (XYData | None): The ground truth (correct) target values.
        y_pred (XYData): The estimated targets as returned by a classifier.
        **kwargs (Unpack[PrecissionKwargs]): Additional keyword arguments passed to sklearn's recall_score function.

    Returns:
        Float | np.ndarray: The recall score or array of recall scores if average is None.

    Raises:
        ValueError: If y_true is None.

    Note:
        This method uses scikit-learn's recall_score function internally with zero_division=0.
    """
    if y_true is None:
        raise ValueError("Ground truth (y_true) must be provided.")
    return recall_score(
        y_true.value,
        y_pred.value,
        zero_division=0,
        average=self.average,
        **kwargs,  # type: ignore
    )  # type: ignore

Overview

The Classification Metrics module in framework3 provides a set of evaluation metrics specifically designed for assessing the performance of classification models. These metrics help in understanding various aspects of a classifier's performance, such as accuracy, precision, recall, and F1-score.

Available Classification Metrics

Accuracy Score

The Accuracy Score is implemented in the AccuracyScoreMetric. It computes the accuracy of a classification model by comparing the predicted labels with the true labels.

Usage

from framework3.plugins.metrics.classification.accuracy_score import AccuracyScoreMetric

accuracy_metric = AccuracyScoreMetric()
score = accuracy_metric.compute(y_true, y_pred)

Precision Score

The precision score is implemented by the Precission class documented above. It computes the precision of a classification model, which is the ratio of true positive predictions to the total number of positive predictions.

Usage

from framework3.plugins.metrics.classification import Precission

precision_metric = Precission(average='weighted')
# x_data is required by the BaseMetric interface but is not used by this metric
score = precision_metric.evaluate(x_data, y_true, y_pred)

Parameters

  • average (str): The averaging method. Options include 'micro', 'macro', 'weighted', 'samples', and None.
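
The choice of averaging method matters most when classes are imbalanced. A minimal sketch contrasting 'macro' and 'weighted' with scikit-learn's precision_score directly (which the Precission class wraps), on a hypothetical imbalanced toy problem:

from sklearn.metrics import precision_score
import numpy as np

# Class 0 dominates: 8 samples of class 0, 2 of class 1
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])

# Per-class precision: class 0 = 7/8 = 0.875, class 1 = 1/2 = 0.5
# 'macro' averages them equally: (0.875 + 0.5) / 2 = 0.6875
print(precision_score(y_true, y_pred, average='macro', zero_division=0))

# 'weighted' weights by class support: 0.8 * 0.875 + 0.2 * 0.5 = 0.8
print(precision_score(y_true, y_pred, average='weighted', zero_division=0))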

Recall Score

The recall score is implemented by the Recall class documented above. It computes the recall of a classification model, which is the ratio of true positive predictions to the total number of actual positive instances.

Usage

from framework3.plugins.metrics.classification import Recall

recall_metric = Recall(average='weighted')
# x_data is required by the BaseMetric interface but is not used by this metric
score = recall_metric.evaluate(x_data, y_true, y_pred)

Parameters

  • average (str): The averaging method. Options include 'micro', 'macro', 'weighted', 'samples', and None.

F1 Score

The F1 score is implemented by the F1 class documented above. It computes the F1 score, which is the harmonic mean of precision and recall.

Usage

from framework3.plugins.metrics.classification import F1

f1_metric = F1(average='weighted')
# x_data is required by the BaseMetric interface but is not used by this metric
score = f1_metric.evaluate(x_data, y_true, y_pred)

Parameters

  • average (str): The averaging method. Options include 'micro', 'macro', 'weighted', 'samples', and None.

Comprehensive Example: Evaluating a Classification Model

In this example, we'll demonstrate how to use the Classification Metrics to evaluate the performance of a classification model.

from framework3.plugins.filters.classification.svm import ClassifierSVMPlugin
from framework3.plugins.metrics.classification import F1, Precission, Recall
from framework3.base.base_types import XYData
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create XYData objects
X_train_data = XYData(_hash='X_train', _path='/tmp', _value=X_train)
y_train_data = XYData(_hash='y_train', _path='/tmp', _value=y_train)
X_test_data = XYData(_hash='X_test', _path='/tmp', _value=X_test)
y_test_data = XYData(_hash='y_test', _path='/tmp', _value=y_test)

# Create and train the classifier
classifier = ClassifierSVMPlugin(kernel='rbf')
classifier.fit(X_train_data, y_train_data)

# Make predictions
predictions = classifier.predict(X_test_data)

# Initialize the metrics documented above
precision_metric = Precission(average='weighted')
recall_metric = Recall(average='weighted')
f1_metric = F1(average='weighted')

# Evaluate metrics; x_data is required by the interface but not used by these metrics
precision = precision_metric.evaluate(X_test_data, y_test_data, predictions)
recall = recall_metric.evaluate(X_test_data, y_test_data, predictions)
f1 = f1_metric.evaluate(X_test_data, y_test_data, predictions)

# Accuracy is computed with scikit-learn's accuracy_score here;
# see the Accuracy Score section above for the framework3 metric
accuracy = accuracy_score(y_test, predictions.value)

# Print results
print(f"Accuracy: {accuracy}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F1 Score: {f1}")

This example demonstrates how to:

  1. Load and prepare the Iris dataset
  2. Create XYData objects for use with framework3
  3. Train an SVM classifier
  4. Make predictions on the test set
  5. Initialize and compute various classification metrics
  6. Print the evaluation results

Best Practices

  1. Multiple Metrics: Use multiple metrics to get a comprehensive view of your model's performance. Different metrics capture different aspects of classification performance.

  2. Class Imbalance: Be aware of class imbalance in your dataset. In such cases, accuracy alone might not be a good metric. Consider using precision, recall, and F1-score.

  3. Averaging Methods: When dealing with multi-class classification, pay attention to the averaging method used in metrics like precision, recall, and F1-score. 'Weighted' average is often a good choice for imbalanced datasets.

  4. Cross-Validation: Use cross-validation to get a more robust estimate of your model's performance, especially with smaller datasets.

  5. Confusion Matrix: Consider using a confusion matrix in addition to these metrics for a more detailed view of your model's performance across different classes (a minimal sketch follows this list).

  6. ROC AUC: For binary classification problems, consider using the ROC AUC score as an additional metric.

  7. Threshold Adjustment: Remember that metrics like precision and recall can be affected by adjusting the classification threshold. Consider exploring different thresholds if needed.

  8. Domain-Specific Metrics: Depending on your specific problem, you might need to implement custom metrics that are more relevant to your domain.
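
As a minimal sketch of the confusion-matrix suggestion in point 5, using scikit-learn directly (this page documents no confusion-matrix metric, so the call below is plain scikit-learn rather than a framework3 class):

from sklearn.metrics import confusion_matrix
import numpy as np

y_true = np.array([0, 1, 2, 0, 1, 2])
y_pred = np.array([0, 2, 1, 0, 0, 1])

# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_true, y_pred))
# [[2 0 0]
#  [1 0 1]
#  [0 2 0]]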

Conclusion

The Classification Metrics module in framework3 provides essential tools for evaluating the performance of classification models. By using these metrics in combination with other framework3 components, you can gain valuable insights into your model's strengths and weaknesses. The example demonstrates how easy it is to compute and interpret these metrics within the framework3 ecosystem, enabling you to make informed decisions about your classification models.