Classification Filters

ClassifierSVMPlugin

Bases: BaseFilter, BasePlugin

A plugin for Support Vector Machine (SVM) classification using scikit-learn's SVC.

This plugin integrates the SVC (Support Vector Classification) implementation from scikit-learn into the framework3 ecosystem, allowing for seamless use of SVM classification in pipelines and supporting hyperparameter tuning through grid search.

Key Features
  • Wraps scikit-learn's SVC for use within framework3
  • Supports various kernel types: linear, polynomial, RBF, and sigmoid
  • Allows customization of regularization parameter (C) and kernel coefficient (gamma)
  • Provides methods for fitting the model, making predictions, and generating parameter grids
Usage

The ClassifierSVMPlugin can be used to perform SVM classification on your data:

from framework3.plugins.filters.classification.svm import ClassifierSVMPlugin
from framework3.base.base_types import XYData
import numpy as np

# Create sample data
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y = np.array([0, 0, 1, 1])
X_data = XYData(_hash='X_data', _path='/tmp', _value=X)
y_data = XYData(_hash='y_data', _path='/tmp', _value=y)

# Create and fit the SVM classifier
svm_plugin = ClassifierSVMPlugin(C=1.0, kernel='rbf', gamma='scale')
svm_plugin.fit(X_data, y_data)

# Make predictions
X_test = XYData(_hash='X_test', _path='/tmp', _value=np.array([[2.5, 3.5]]))
predictions = svm_plugin.predict(X_test)
print(predictions.value)

# Generate parameter grid for hyperparameter tuning
grid_params = ClassifierSVMPlugin.item_grid(C=[0.1, 1, 10], kernel=['linear', 'rbf'], gamma=['scale', 'auto'])
print(grid_params)

Attributes:

  • _model (SVC): The underlying scikit-learn SVC model.

Methods:

  • fit(x: XYData, y: Optional[XYData], evaluator: BaseMetric | None = None) -> Optional[float]: Fit the SVM model to the given data.
  • predict(x: XYData) -> XYData: Make predictions using the fitted SVM model.
  • item_grid(C: List[float], kernel: List[L], gamma: List[float] | List[Literal['scale', 'auto']] = ['scale']) -> Dict[str, List[Any]]: Generate a parameter grid for hyperparameter tuning.

Note

This plugin uses scikit-learn's implementation of SVM, which may have its own dependencies and requirements. Ensure that scikit-learn is properly installed and compatible with your environment.

Source code in framework3/plugins/filters/classification/svm.py
@Container.bind()
class ClassifierSVMPlugin(BaseFilter, BasePlugin):
    """
    A plugin for Support Vector Machine (SVM) classification using scikit-learn's SVC.

    This plugin integrates the SVC (Support Vector Classification) implementation from scikit-learn
    into the framework3 ecosystem, allowing for seamless use of SVM classification in pipelines
    and supporting hyperparameter tuning through grid search.

    Key Features:
        - Wraps scikit-learn's SVC for use within framework3
        - Supports various kernel types: linear, polynomial, RBF, and sigmoid
        - Allows customization of regularization parameter (C) and kernel coefficient (gamma)
        - Provides methods for fitting the model, making predictions, and generating parameter grids

    Usage:
        The ClassifierSVMPlugin can be used to perform SVM classification on your data:

        ```python
        from framework3.plugins.filters.classification.svm import ClassifierSVMPlugin
        from framework3.base.base_types import XYData
        import numpy as np

        # Create sample data
        X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
        y = np.array([0, 0, 1, 1])
        X_data = XYData(_hash='X_data', _path='/tmp', _value=X)
        y_data = XYData(_hash='y_data', _path='/tmp', _value=y)

        # Create and fit the SVM classifier
        svm_plugin = ClassifierSVMPlugin(C=1.0, kernel='rbf', gamma='scale')
        svm_plugin.fit(X_data, y_data)

        # Make predictions
        X_test = XYData(_hash='X_test', _path='/tmp', _value=np.array([[2.5, 3.5]]))
        predictions = svm_plugin.predict(X_test)
        print(predictions.value)

        # Generate parameter grid for hyperparameter tuning
        grid_params = ClassifierSVMPlugin.item_grid(C=[0.1, 1, 10], kernel=['linear', 'rbf'], gamma=['scale', 'auto'])
        print(grid_params)
        ```

    Attributes:
        _model (SVC): The underlying scikit-learn SVC model.

    Methods:
        fit(x: XYData, y: Optional[XYData], evaluator: BaseMetric | None = None) -> Optional[float]:
            Fit the SVM model to the given data.
        predict(x: XYData) -> XYData:
            Make predictions using the fitted SVM model.
        item_grid(C: List[float], kernel: List[L], gamma: List[float] | List[Literal['scale', 'auto']] = ['scale']) -> Dict[str, List[Any]]:
            Generate a parameter grid for hyperparameter tuning.

    Note:
        This plugin uses scikit-learn's implementation of SVM, which may have its own dependencies and requirements.
        Ensure that scikit-learn is properly installed and compatible with your environment.
    """

    def __init__(
        self,
        C: float = 1.0,
        gamma: float | Literal["scale", "auto"] = "scale",
        kernel: L = "linear",
    ) -> None:
        """
        Initialize a new ClassifierSVMPlugin instance.

        This constructor sets up the ClassifierSVMPlugin with the specified parameters and
        initializes the underlying scikit-learn SVC model.

        Args:
            C (float): Regularization parameter. Defaults to 1.0.
            gamma (float | Literal["scale", "auto"]): Kernel coefficient. Defaults to "scale".
            kernel (L): Specifies the kernel type to be used in the algorithm.
                        Can be 'linear', 'poly', 'rbf', or 'sigmoid'. Defaults to "linear".

        Note:
            The parameters are passed directly to scikit-learn's SVC.
            Refer to scikit-learn's documentation for detailed information on these parameters.
        """
        super().__init__(C=C, kernel=kernel, gamma=gamma)
        self._model = SVC(C=C, kernel=kernel, gamma=gamma)

    def fit(
        self, x: XYData, y: Optional[XYData], evaluator: BaseMetric | None = None
    ) -> Optional[float]:
        """
        Fit the SVM model to the given data.

        This method trains the SVM classifier on the provided input features and target values.

        Args:
            x (XYData): The input features for training.
            y (Optional[XYData]): The target values for training.
            evaluator (BaseMetric | None): An optional evaluator for the model. Not used in this method.

        Returns:
            Optional[float]: The score of the fitted model on the training data, or None if y is None.

        Note:
            This method uses scikit-learn's fit method internally.
            The score is calculated using scikit-learn's score method, which computes the mean accuracy.
        """
        if y is not None:
            self._model.fit(x.value, y.value)  # type: ignore
            return self._model.score(x.value, y.value)  # type: ignore
        return None

    def predict(self, x: XYData) -> XYData:
        """
        Make predictions using the fitted SVM model.

        This method uses the trained SVM classifier to make predictions on new input data.

        Args:
            x (XYData): The input features to predict.

        Returns:
            (XYData): The predicted values wrapped in an XYData object.

        Note:
            This method uses scikit-learn's predict method internally.
            The predictions are wrapped in an XYData object for consistency with the framework.
        """
        return XYData.mock(self._model.predict(x.value))

    @staticmethod
    def item_grid(
        C: List[float],
        kernel: List[L],
        gamma: List[float] | List[Literal["scale", "auto"]] = ["scale"],  # type: ignore[assignment]
    ) -> Dict[str, List[Any]]:
        """
        Generate a parameter grid for hyperparameter tuning.

        This static method provides a way to generate a grid of parameters for use in
        hyperparameter optimization techniques like grid search.

        Args:
            C (List[float]): List of regularization parameter values to try.
            kernel (List[L]): List of kernel types to try.
            gamma (List[float] | List[Literal['scale', 'auto']]): List of gamma values to try. Defaults to ["scale"].

        Returns:
            Dict[str, List[Any]]: A dictionary of parameter names and their possible values.

        Note:
            The returned dictionary can be used directly with hyperparameter tuning tools
            that accept parameter grids, such as scikit-learn's GridSearchCV.
        """
        return {
            "ClassifierSVMPlugin__C": C,
            "ClassifierSVMPlugin__kernel": kernel,
            "ClassifierSVMPlugin__gamma": gamma,
        }

__init__(C=1.0, gamma='scale', kernel='linear')

Initialize a new ClassifierSVMPlugin instance.

This constructor sets up the ClassifierSVMPlugin with the specified parameters and initializes the underlying scikit-learn SVC model.

Parameters:

  • C (float): Regularization parameter. Defaults to 1.0.
  • gamma (float | Literal['scale', 'auto']): Kernel coefficient. Defaults to 'scale'.
  • kernel (L): Specifies the kernel type to be used in the algorithm. Can be 'linear', 'poly', 'rbf', or 'sigmoid'. Defaults to 'linear'.

Note

The parameters are passed directly to scikit-learn's SVC. Refer to scikit-learn's documentation for detailed information on these parameters.
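
Because the arguments are forwarded unchanged, the string values of gamma should behave as they do in scikit-learn, where 'scale' resolves to 1 / (n_features * X.var()) and 'auto' to 1 / n_features. The sketch below checks the 'scale' case on a bare SVC rather than on the plugin; the toy data and variable names are illustrative assumptions, not part of framework3. Note also that this plugin defaults to kernel='linear', whereas scikit-learn's SVC defaults to 'rbf'.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (illustrative only, not from framework3).
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0], [1.5, 4.0], [3.5, 2.0]])
y = np.array([0, 0, 1, 1, 0, 1])

# gamma='scale' is resolved by scikit-learn when fit() is called.
scaled = SVC(C=1.0, kernel="rbf", gamma="scale").fit(X, y)

# The same coefficient computed by hand: 1 / (n_features * X.var()).
manual_gamma = 1.0 / (X.shape[1] * X.var())
manual = SVC(C=1.0, kernel="rbf", gamma=manual_gamma).fit(X, y)

# Both models should yield the same decision values on the training data.
same = np.allclose(scaled.decision_function(X), manual.decision_function(X))
print(same)
```

With a linear kernel gamma has no effect, so it only matters once kernel is set to 'rbf', 'poly', or 'sigmoid'.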

Source code in framework3/plugins/filters/classification/svm.py
def __init__(
    self,
    C: float = 1.0,
    gamma: float | Literal["scale", "auto"] = "scale",
    kernel: L = "linear",
) -> None:
    """
    Initialize a new ClassifierSVMPlugin instance.

    This constructor sets up the ClassifierSVMPlugin with the specified parameters and
    initializes the underlying scikit-learn SVC model.

    Args:
        C (float): Regularization parameter. Defaults to 1.0.
        gamma (float | Literal["scale", "auto"]): Kernel coefficient. Defaults to "scale".
        kernel (L): Specifies the kernel type to be used in the algorithm.
                    Can be 'linear', 'poly', 'rbf', or 'sigmoid'. Defaults to "linear".

    Note:
        The parameters are passed directly to scikit-learn's SVC.
        Refer to scikit-learn's documentation for detailed information on these parameters.
    """
    super().__init__(C=C, kernel=kernel, gamma=gamma)
    self._model = SVC(C=C, kernel=kernel, gamma=gamma)

fit(x, y, evaluator=None)

Fit the SVM model to the given data.

This method trains the SVM classifier on the provided input features and target values.

Parameters:

  • x (XYData): The input features for training. Required.
  • y (Optional[XYData]): The target values for training. Required; if None is passed, the model is not fitted.
  • evaluator (BaseMetric | None): An optional evaluator for the model. Not used in this method. Defaults to None.

Returns:

  • Optional[float]: The score of the fitted model on the training data, or None if y is None.

Note

This method uses scikit-learn's fit method internally. The score is calculated using scikit-learn's score method, which computes the mean accuracy.
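
To make "mean accuracy" concrete, the following sketch compares scikit-learn's score with a hand-computed fraction of correct predictions. It uses a bare SVC on the toy data from the usage example instead of the plugin itself, so treat it as an illustration of the metric rather than of framework3 internals.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y = np.array([0, 0, 1, 1])

model = SVC(C=1.0, kernel="linear", gamma="scale").fit(X, y)

# score() is the fraction of samples whose predicted label matches y.
training_accuracy = model.score(X, y)
manual_accuracy = float(np.mean(model.predict(X) == y))
print(training_accuracy, manual_accuracy)
```

Since the value is measured on the same data used for fitting, it tends to be optimistic; a held-out split gives a better estimate of generalization.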

Source code in framework3/plugins/filters/classification/svm.py
def fit(
    self, x: XYData, y: Optional[XYData], evaluator: BaseMetric | None = None
) -> Optional[float]:
    """
    Fit the SVM model to the given data.

    This method trains the SVM classifier on the provided input features and target values.

    Args:
        x (XYData): The input features for training.
        y (Optional[XYData]): The target values for training.
        evaluator (BaseMetric | None): An optional evaluator for the model. Not used in this method.

    Returns:
        Optional[float]: The score of the fitted model on the training data, or None if y is None.

    Note:
        This method uses scikit-learn's fit method internally.
        The score is calculated using scikit-learn's score method, which computes the mean accuracy.
    """
    if y is not None:
        self._model.fit(x.value, y.value)  # type: ignore
        return self._model.score(x.value, y.value)  # type: ignore
    return None

item_grid(C, kernel, gamma=['scale']) staticmethod

Generate a parameter grid for hyperparameter tuning.

This static method provides a way to generate a grid of parameters for use in hyperparameter optimization techniques like grid search.

Parameters:

  • C (List[float]): List of regularization parameter values to try. Required.
  • kernel (List[L]): List of kernel types to try. Required.
  • gamma (List[float] | List[Literal['scale', 'auto']]): List of gamma values to try. Defaults to ['scale'].

Returns:

  • Dict[str, List[Any]]: A dictionary mapping parameter names, each prefixed with "ClassifierSVMPlugin__", to their candidate values.

Note

The returned dictionary can be used directly with hyperparameter tuning tools that accept parameter grids, such as scikit-learn's GridSearchCV.
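
One way to read the prefixed keys is through scikit-learn's `<step>__<param>` convention for nested estimators. The sketch below is an assumption about how such a grid could be consumed outside framework3: a plain SVC stands in for the plugin and is registered in a scikit-learn Pipeline under the step name "ClassifierSVMPlugin". The dictionary is written out by hand in the shape that item_grid returns.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Same shape as the return value of
# ClassifierSVMPlugin.item_grid(C=[0.1, 1, 10], kernel=["linear", "rbf"]).
param_grid = {
    "ClassifierSVMPlugin__C": [0.1, 1, 10],
    "ClassifierSVMPlugin__kernel": ["linear", "rbf"],
    "ClassifierSVMPlugin__gamma": ["scale"],
}

# Assumption: a plain SVC stands in for the plugin. The step name equals the
# key prefix, so "<step>__<param>" resolves to the SVC's parameters.
pipeline = Pipeline([("ClassifierSVMPlugin", SVC())])

X, y = load_iris(return_X_y=True)
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X, y)

n_candidates = len(search.cv_results_["params"])
print(n_candidates, search.best_params_)
```

The grid expands to 3 x 2 x 1 = 6 candidate settings, each evaluated with 3-fold cross-validation.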

Source code in framework3/plugins/filters/classification/svm.py
@staticmethod
def item_grid(
    C: List[float],
    kernel: List[L],
    gamma: List[float] | List[Literal["scale", "auto"]] = ["scale"],  # type: ignore[assignment]
) -> Dict[str, List[Any]]:
    """
    Generate a parameter grid for hyperparameter tuning.

    This static method provides a way to generate a grid of parameters for use in
    hyperparameter optimization techniques like grid search.

    Args:
        C (List[float]): List of regularization parameter values to try.
        kernel (List[L]): List of kernel types to try.
        gamma (List[float] | List[Literal['scale', 'auto']]): List of gamma values to try. Defaults to ["scale"].

    Returns:
        Dict[str, List[Any]]: A dictionary of parameter names and their possible values.

    Note:
        The returned dictionary can be used directly with hyperparameter tuning tools
        that accept parameter grids, such as scikit-learn's GridSearchCV.
    """
    return {
        "ClassifierSVMPlugin__C": C,
        "ClassifierSVMPlugin__kernel": kernel,
        "ClassifierSVMPlugin__gamma": gamma,
    }

predict(x)

Make predictions using the fitted SVM model.

This method uses the trained SVM classifier to make predictions on new input data.

Parameters:

  • x (XYData): The input features to predict. Required.

Returns:

  • XYData: The predicted values wrapped in an XYData object.

Note

This method uses scikit-learn's predict method internally. The predictions are wrapped in an XYData object for consistency with the framework.

Source code in framework3/plugins/filters/classification/svm.py
def predict(self, x: XYData) -> XYData:
    """
    Make predictions using the fitted SVM model.

    This method uses the trained SVM classifier to make predictions on new input data.

    Args:
        x (XYData): The input features to predict.

    Returns:
        (XYData): The predicted values wrapped in an XYData object.

    Note:
        This method uses scikit-learn's predict method internally.
        The predictions are wrapped in an XYData object for consistency with the framework.
    """
    return XYData.mock(self._model.predict(x.value))

KnnFilter

Bases: BaseFilter

A wrapper for scikit-learn's KNeighborsClassifier using the framework3 BaseFilter interface.

This filter implements the K-Nearest Neighbors algorithm for classification within the framework3 ecosystem.

Key Features
  • Integrates scikit-learn's KNeighborsClassifier with framework3
  • Supports various KNN parameters like number of neighbors, weights, and distance metrics
  • Provides methods for fitting the model and making predictions
  • Includes a static method for generating parameter grids for hyperparameter tuning
Usage

The KnnFilter can be used to perform K-Nearest Neighbors classification on your data:

from framework3.plugins.filters.classification.knn import KnnFilter
from framework3.base.base_types import XYData
import numpy as np

# Create sample data
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y = np.array([0, 0, 1, 1])
X_data = XYData(_hash='X_data', _path='/tmp', _value=X)
y_data = XYData(_hash='y_data', _path='/tmp', _value=y)

# Create and fit the KNN filter
knn = KnnFilter(n_neighbors=3, weights='uniform')
knn.fit(X_data, y_data)

# Make predictions
X_test = XYData(_hash='X_test', _path='/tmp', _value=np.array([[2.5, 3.5]]))
predictions = knn.predict(X_test)
print(predictions.value)

Attributes:

  • _clf (KNeighborsClassifier): The underlying scikit-learn KNN classifier.

Methods:

  • fit(x: XYData, y: Optional[XYData], evaluator: BaseMetric | None = None) -> Optional[float]: Fit the KNN model to the given data.
  • predict(x: XYData) -> XYData: Make predictions using the fitted KNN model.
  • item_grid(**kwargs) -> tuple[type[BaseFilter], Dict[str, List[Any]]]: Generate a parameter grid for hyperparameter tuning.

Note

This filter uses scikit-learn's implementation of KNN, which may have its own dependencies and requirements. Ensure that scikit-learn is properly installed and compatible with your environment.

Source code in framework3/plugins/filters/classification/knn.py
@Container.bind()
class KnnFilter(BaseFilter):
    """
    A wrapper for scikit-learn's KNeighborsClassifier using the framework3 BaseFilter interface.

    This filter implements the K-Nearest Neighbors algorithm for classification within the framework3 ecosystem.

    Key Features:
        - Integrates scikit-learn's KNeighborsClassifier with framework3
        - Supports various KNN parameters like number of neighbors, weights, and distance metrics
        - Provides methods for fitting the model and making predictions
        - Includes a static method for generating parameter grids for hyperparameter tuning

    Usage:
        The KnnFilter can be used to perform K-Nearest Neighbors classification on your data:

        ```python
        from framework3.plugins.filters.classification.knn import KnnFilter
        from framework3.base.base_types import XYData
        import numpy as np

        # Create sample data
        X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
        y = np.array([0, 0, 1, 1])
        X_data = XYData(_hash='X_data', _path='/tmp', _value=X)
        y_data = XYData(_hash='y_data', _path='/tmp', _value=y)

        # Create and fit the KNN filter
        knn = KnnFilter(n_neighbors=3, weights='uniform')
        knn.fit(X_data, y_data)

        # Make predictions
        X_test = XYData(_hash='X_test', _path='/tmp', _value=np.array([[2.5, 3.5]]))
        predictions = knn.predict(X_test)
        print(predictions.value)
        ```

    Attributes:
        _clf (KNeighborsClassifier): The underlying scikit-learn KNN classifier.

    Methods:
        fit(x: XYData, y: Optional[XYData], evaluator: BaseMetric | None = None) -> Optional[float]:
            Fit the KNN model to the given data.
        predict(x: XYData) -> XYData:
            Make predictions using the fitted KNN model.
        item_grid(**kwargs) -> tuple[type[BaseFilter], Dict[str, List[Any]]]:
            Generate a parameter grid for hyperparameter tuning.

    Note:
        This filter uses scikit-learn's implementation of KNN, which may have its own dependencies and requirements.
        Ensure that scikit-learn is properly installed and compatible with your environment.
    """

    def __init__(
        self,
        n_neighbors: int = 5,
        weights: Literal["uniform", "distance"] = "uniform",
        algorithm: Literal["auto", "ball_tree", "kd_tree", "brute"] = "auto",
        leaf_size: int = 30,
        p: int = 2,
        metric: str = "minkowski",
        metric_params: Optional[Dict[str, Any]] = None,
        n_jobs: Optional[int] = None,
    ):
        """
        Initialize a new KnnFilter instance.

        This constructor sets up the KnnFilter with the specified parameters and
        initializes the underlying scikit-learn KNeighborsClassifier.

        Args:
            n_neighbors (int): Number of neighbors to use for knn. Defaults to 5.
            weights (Literal["uniform", "distance"]): Weight function used in prediction. Defaults to "uniform".
            algorithm (Literal["auto", "ball_tree", "kd_tree", "brute"]): Algorithm used to compute nearest neighbors. Defaults to "auto".
            leaf_size (int): Leaf size passed to BallTree or KDTree. Defaults to 30.
            p (int): Power parameter for the Minkowski metric. Defaults to 2 (Euclidean distance).
            metric (str): The distance metric to use for the tree. Defaults to "minkowski".
            metric_params (Optional[Dict[str, Any]]): Additional keyword arguments for the metric function. Defaults to None.
            n_jobs (Optional[int]): The number of parallel jobs to run for neighbors search. Defaults to None.

        Note:
            The parameters are passed directly to scikit-learn's KNeighborsClassifier.
            Refer to scikit-learn's documentation for detailed information on these parameters.
        """
        super().__init__(
            n_neighbors=n_neighbors,
            weights=weights,
            algorithm=algorithm,
            leaf_size=leaf_size,
            p=p,
            metric=metric,
            metric_params=metric_params,
            n_jobs=n_jobs,
        )
        self._clf = KNeighborsClassifier(
            n_neighbors=n_neighbors,
            weights=weights,
            algorithm=algorithm,
            leaf_size=leaf_size,
            p=p,
            metric=metric,
            metric_params=metric_params,
            n_jobs=n_jobs,
        )

    def fit(
        self, x: XYData, y: Optional[XYData], evaluator: BaseMetric | None = None
    ) -> Optional[float]:
        """
        Fit the KNN model to the given data.

        This method trains the KNN classifier on the provided input features and target values.

        Args:
            x (XYData): The input features for training.
            y (Optional[XYData]): The target values for training.
            evaluator (BaseMetric | None): An optional evaluator for the model. Not used in this method.

        Returns:
            Optional[float]: The score of the fitted model on the training data.

        Note:
            This method uses scikit-learn's fit method internally.
            The score is calculated using scikit-learn's score method, which computes the mean accuracy.
        """
        if y is None:
            # Without targets there is nothing to fit or score.
            return None
        self._clf.fit(x.value, y.value)  # type: ignore
        return self._clf.score(x.value, y.value)  # type: ignore

    def predict(self, x: XYData) -> XYData:
        """
        Make predictions using the fitted KNN model.

        This method uses the trained KNN classifier to make predictions on new input data.

        Args:
            x (XYData): The input features to predict.

        Returns:
            XYData: The predicted values wrapped in an XYData object.

        Note:
            This method uses scikit-learn's predict method internally.
            The predictions are wrapped in an XYData object for consistency with the framework.
        """
        predictions = self._clf.predict(x.value)
        return XYData.mock(predictions)

    @staticmethod
    def item_grid(
        **kwargs: List[Any],  # annotation applies to each value; kwargs itself is Dict[str, List[Any]]
    ) -> tuple[type[BaseFilter], Dict[str, List[Any]]]:
        """
        Generate a parameter grid for hyperparameter tuning.

        This static method provides a way to generate a grid of parameters for use in
        hyperparameter optimization techniques like grid search.

        Args:
            **kwargs: Keyword arguments mapping parameter names to lists of candidate values.
                They are returned unchanged; the method adds no default ranges.

        Returns:
            tuple[type[BaseFilter], Dict[str, List[Any]]]: A tuple containing the KnnFilter class
            and a dictionary of parameter names and their possible values.

        Note:
            The returned dictionary can be used directly with hyperparameter tuning tools
            that accept parameter grids, such as scikit-learn's GridSearchCV.
        """

        return KnnFilter, kwargs  # type: ignore

__init__(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None)

Initialize a new KnnFilter instance.

This constructor sets up the KnnFilter with the specified parameters and initializes the underlying scikit-learn KNeighborsClassifier.

Parameters:

  • n_neighbors (int): Number of neighbors to use. Defaults to 5.
  • weights (Literal['uniform', 'distance']): Weight function used in prediction. Defaults to 'uniform'.
  • algorithm (Literal['auto', 'ball_tree', 'kd_tree', 'brute']): Algorithm used to compute nearest neighbors. Defaults to 'auto'.
  • leaf_size (int): Leaf size passed to BallTree or KDTree. Defaults to 30.
  • p (int): Power parameter for the Minkowski metric. Defaults to 2 (Euclidean distance).
  • metric (str): The metric to use for distance computation. Defaults to 'minkowski'.
  • metric_params (Optional[Dict[str, Any]]): Additional keyword arguments for the metric function. Defaults to None.
  • n_jobs (Optional[int]): The number of parallel jobs to run for the neighbors search. Defaults to None.

Note

The parameters are passed directly to scikit-learn's KNeighborsClassifier. Refer to scikit-learn's documentation for detailed information on these parameters.
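
As an illustration of how p and metric interact, the sketch below checks that the defaults (metric='minkowski' with p=2) find the same neighbors as an explicit Euclidean metric. It uses scikit-learn's KNeighborsClassifier directly with randomly generated data, which is an assumption made for the example and not something taken from framework3.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Random continuous data (illustrative only), so distance ties are not expected.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_query = rng.normal(size=(10, 3))

minkowski = KNeighborsClassifier(n_neighbors=5, p=2, metric="minkowski").fit(X, y)
euclidean = KNeighborsClassifier(n_neighbors=5, metric="euclidean").fit(X, y)

# Compare the neighbor indices and distances returned for the query points.
dist_m, idx_m = minkowski.kneighbors(X_query)
dist_e, idx_e = euclidean.kneighbors(X_query)
same_neighbors = np.array_equal(idx_m, idx_e)
same_distances = np.allclose(dist_m, dist_e)
print(same_neighbors, same_distances)
```

Setting p=1 instead would correspond to the Manhattan distance.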

Source code in framework3/plugins/filters/classification/knn.py
def __init__(
    self,
    n_neighbors: int = 5,
    weights: Literal["uniform", "distance"] = "uniform",
    algorithm: Literal["auto", "ball_tree", "kd_tree", "brute"] = "auto",
    leaf_size: int = 30,
    p: int = 2,
    metric: str = "minkowski",
    metric_params: Optional[Dict[str, Any]] = None,
    n_jobs: Optional[int] = None,
):
    """
    Initialize a new KnnFilter instance.

    This constructor sets up the KnnFilter with the specified parameters and
    initializes the underlying scikit-learn KNeighborsClassifier.

    Args:
        n_neighbors (int): Number of neighbors to use for knn. Defaults to 5.
        weights (Literal["uniform", "distance"]): Weight function used in prediction. Defaults to "uniform".
        algorithm (Literal["auto", "ball_tree", "kd_tree", "brute"]): Algorithm used to compute nearest neighbors. Defaults to "auto".
        leaf_size (int): Leaf size passed to BallTree or KDTree. Defaults to 30.
        p (int): Power parameter for the Minkowski metric. Defaults to 2 (Euclidean distance).
        metric (str): The distance metric to use for the tree. Defaults to "minkowski".
        metric_params (Optional[Dict[str, Any]]): Additional keyword arguments for the metric function. Defaults to None.
        n_jobs (Optional[int]): The number of parallel jobs to run for neighbors search. Defaults to None.

    Note:
        The parameters are passed directly to scikit-learn's KNeighborsClassifier.
        Refer to scikit-learn's documentation for detailed information on these parameters.
    """
    super().__init__(
        n_neighbors=n_neighbors,
        weights=weights,
        algorithm=algorithm,
        leaf_size=leaf_size,
        p=p,
        metric=metric,
        metric_params=metric_params,
        n_jobs=n_jobs,
    )
    self._clf = KNeighborsClassifier(
        n_neighbors=n_neighbors,
        weights=weights,
        algorithm=algorithm,
        leaf_size=leaf_size,
        p=p,
        metric=metric,
        metric_params=metric_params,
        n_jobs=n_jobs,
    )

fit(x, y, evaluator=None)

Fit the KNN model to the given data.

This method trains the KNN classifier on the provided input features and target values.

Parameters:

  • x (XYData): The input features for training. Required.
  • y (Optional[XYData]): The target values for training. Required.
  • evaluator (BaseMetric | None): An optional evaluator for the model. Not used in this method. Defaults to None.

Returns:

  • Optional[float]: The score of the fitted model on the training data.

Note

This method uses scikit-learn's fit method internally. The score is calculated using scikit-learn's score method, which computes the mean accuracy.
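
A training-set score deserves some caution with KNN. In the sketch below, which uses scikit-learn's KNeighborsClassifier directly on made-up points, a 1-nearest-neighbor model scores perfectly on its own training data because every distinct training point is its own nearest neighbor.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Six distinct points with alternating labels (illustrative only).
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 1], [6, 2]])
y = np.array([0, 1, 0, 1, 0, 1])

# With n_neighbors=1 each training point is matched to itself.
one_nn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
train_score_1 = one_nn.score(X, y)

# With more neighbors the training score is no longer guaranteed to be perfect.
three_nn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
train_score_3 = three_nn.score(X, y)
print(train_score_1, train_score_3)
```

For model selection, a score on held-out data or a cross-validated score is usually more informative than the value returned by fit.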

Source code in framework3/plugins/filters/classification/knn.py
def fit(
    self, x: XYData, y: Optional[XYData], evaluator: BaseMetric | None = None
) -> Optional[float]:
    """
    Fit the KNN model to the given data.

    This method trains the KNN classifier on the provided input features and target values.

    Args:
        x (XYData): The input features for training.
        y (Optional[XYData]): The target values for training.
        evaluator (BaseMetric | None): An optional evaluator for the model. Not used in this method.

    Returns:
        Optional[float]: The score of the fitted model on the training data.

    Note:
        This method uses scikit-learn's fit method internally.
        The score is calculated using scikit-learn's score method, which computes the mean accuracy.
    """
    if y is None:
        # Without targets there is nothing to fit or score.
        return None
    self._clf.fit(x.value, y.value)  # type: ignore
    return self._clf.score(x.value, y.value)  # type: ignore

item_grid(**kwargs) staticmethod

Generate a parameter grid for hyperparameter tuning.

This static method provides a way to generate a grid of parameters for use in hyperparameter optimization techniques like grid search.

Parameters:

  • **kwargs (collected as Dict[str, List[Any]]): Keyword arguments mapping parameter names to lists of candidate values. They are returned unchanged; the method adds no default ranges. Defaults to {}.

Returns:

  • tuple[type[BaseFilter], Dict[str, List[Any]]]: A tuple containing the KnnFilter class and a dictionary of parameter names and their possible values.

Note

The returned dictionary can be used directly with hyperparameter tuning tools that accept parameter grids, such as scikit-learn's GridSearchCV.
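
To show what a grid search does with such a dictionary, the sketch below expands one with scikit-learn's ParameterGrid. The dictionary is written by hand in the shape of the second tuple element returned by item_grid; the chosen parameter values are illustrative. Unlike ClassifierSVMPlugin.item_grid, the keys here carry no class-name prefix.

```python
from sklearn.model_selection import ParameterGrid

# Same shape as the dictionary returned by
# KnnFilter.item_grid(n_neighbors=[3, 5, 7], weights=["uniform", "distance"]).
grid = {"n_neighbors": [3, 5, 7], "weights": ["uniform", "distance"]}

# ParameterGrid enumerates the Cartesian product of all candidate values.
combinations = list(ParameterGrid(grid))
print(len(combinations))
for params in combinations[:2]:
    print(params)
```

Each enumerated dictionary is one candidate configuration that a search would fit and score.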

Source code in framework3/plugins/filters/classification/knn.py
@staticmethod
def item_grid(
    **kwargs: List[Any],  # annotation applies to each value; kwargs itself is Dict[str, List[Any]]
) -> tuple[type[BaseFilter], Dict[str, List[Any]]]:
    """
    Generate a parameter grid for hyperparameter tuning.

    This static method provides a way to generate a grid of parameters for use in
    hyperparameter optimization techniques like grid search.

    Args:
        **kwargs: Keyword arguments mapping parameter names to lists of candidate values.
            They are returned unchanged; the method adds no default ranges.

    Returns:
        tuple[type[BaseFilter], Dict[str, List[Any]]]: A tuple containing the KnnFilter class
        and a dictionary of parameter names and their possible values.

    Note:
        The returned dictionary can be used directly with hyperparameter tuning tools
        that accept parameter grids, such as scikit-learn's GridSearchCV.
    """

    return KnnFilter, kwargs  # type: ignore

predict(x)

Make predictions using the fitted KNN model.

This method uses the trained KNN classifier to make predictions on new input data.

Parameters:

  • x (XYData): The input features to predict. Required.

Returns:

  • XYData: The predicted values wrapped in an XYData object.

Note

This method uses scikit-learn's predict method internally. The predictions are wrapped in an XYData object for consistency with the framework.

Source code in framework3/plugins/filters/classification/knn.py
def predict(self, x: XYData) -> XYData:
    """
    Make predictions using the fitted KNN model.

    This method uses the trained KNN classifier to make predictions on new input data.

    Args:
        x (XYData): The input features to predict.

    Returns:
        XYData: The predicted values wrapped in an XYData object.

    Note:
        This method uses scikit-learn's predict method internally.
        The predictions are wrapped in an XYData object for consistency with the framework.
    """
    predictions = self._clf.predict(x.value)
    return XYData.mock(predictions)

Overview

The Classification Filters module in framework3 provides classification algorithms that plug into your machine learning pipelines. Each filter wraps a scikit-learn estimator behind the framework3 fit and predict interface, so classifiers can be swapped without changing the surrounding pipeline code.

Available Classifiers

SVM Classifier

The Support Vector Machine (SVM) classifier is implemented in the ClassifierSVMPlugin, which wraps scikit-learn's SVC. With the 'linear' kernel it learns a linear decision boundary; the 'poly', 'rbf', and 'sigmoid' kernels handle non-linear classification tasks.

Usage

from framework3.plugins.filters.classification.svm import ClassifierSVMPlugin

svm_classifier = ClassifierSVMPlugin(C=1.0, kernel='rbf', gamma='scale')

Parameters

  • C (float): Regularization parameter. The strength of the regularization is inversely proportional to C.
  • kernel (str): The kernel type to be used in the algorithm. Options include 'linear', 'poly', 'rbf', and 'sigmoid'.
  • gamma (str or float): Kernel coefficient for the 'rbf', 'poly', and 'sigmoid' kernels. Accepts 'scale', 'auto', or a float.
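
The effect of the kernel choice can be seen on data that no straight line separates. The sketch below uses scikit-learn's `SVC` directly, on the assumption that the plugin forwards `C`, `kernel`, and `gamma` to it unchanged; the synthetic dataset is illustrative.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: not separable by a straight line.
X, y = make_circles(n_samples=400, noise=0.05, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the same model with two kernels and compare held-out accuracy.
scores = {}
for kernel in ("linear", "rbf"):
    model = SVC(C=1.0, kernel=kernel, gamma="scale")
    model.fit(X_train, y_train)
    scores[kernel] = model.score(X_test, y_test)

print(scores)
```

On this kind of data the 'rbf' kernel should score well above the 'linear' one, which is why the kernel is usually included in the tuning grid.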

K-Nearest Neighbors Classifier

The K-Nearest Neighbors (KNN) classifier is implemented in the KnnFilter. It predicts the class of a sample from the classes of the K training samples closest to it.

Usage

from framework3.plugins.filters.classification.knn import KnnFilter

knn_classifier = KnnFilter(n_neighbors=5, weights='uniform')

Parameters

  • n_neighbors (int): Number of nearest neighbors (K) consulted for each prediction.
  • weights (str): Weight function used in prediction. Options are 'uniform' (all points in each neighborhood are weighted equally) or 'distance' (weight points by the inverse of their distance).
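
The two `weights` options can disagree on the same query. The following sketch uses scikit-learn's `KNeighborsClassifier` directly, assuming KnnFilter passes `n_neighbors` and `weights` through to it; the three training points are chosen by hand so that the difference is visible.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# One class-0 point close to the query, two class-1 points farther away.
X_train = np.array([[0.0], [3.0], [4.0]])
y_train = np.array([0, 1, 1])
query = np.array([[1.0]])

results = {}
for weights in ("uniform", "distance"):
    clf = KNeighborsClassifier(n_neighbors=3, weights=weights)
    clf.fit(X_train, y_train)
    results[weights] = int(clf.predict(query)[0])

# uniform: class 1 wins by 2 votes to 1.
# distance: class 0 scores 1/1, class 1 scores 1/2 + 1/3, so class 0 wins.
print(results)  # {'uniform': 1, 'distance': 0}
```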

Comprehensive Example: Iris Dataset Classification

This example applies both the SVM and KNN classifiers to the Iris dataset and tunes each one with GridSearchCVPipeline.

from framework3.plugins.pipelines.gs_cv_pipeline import GridSearchCVPipeline
from framework3.plugins.filters.classification.svm import ClassifierSVMPlugin
from framework3.plugins.filters.classification.knn import KnnFilter
from framework3.base.base_types import XYData
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create XYData objects
X_train_data = XYData(_hash='X_train', _path='/tmp', _value=X_train)
y_train_data = XYData(_hash='y_train', _path='/tmp', _value=y_train)
X_test_data = XYData(_hash='X_test', _path='/tmp', _value=X_test)
y_test_data = XYData(_hash='y_test', _path='/tmp', _value=y_test)

# Create a pipeline with SVM classifier
svm_pipeline = GridSearchCVPipeline(
    filterx=[ClassifierSVMPlugin],
    param_grid=ClassifierSVMPlugin.item_grid(C=[0.1, 1, 10], kernel=['linear', 'rbf']),
    scoring='accuracy',
    cv=5
)

# Fit the SVM pipeline
svm_pipeline.fit(X_train_data, y_train_data)

# Make predictions with SVM
svm_predictions = svm_pipeline.predict(X_test_data)
print("SVM Predictions:", svm_predictions.value)

# Create a pipeline with KNN classifier
knn_pipeline = GridSearchCVPipeline(
    filterx=[KnnFilter],
    param_grid=KnnFilter.item_grid(n_neighbors=[3, 5, 7], weights=['uniform', 'distance']),
    scoring='accuracy',
    cv=5
)

# Fit the KNN pipeline
knn_pipeline.fit(X_train_data, y_train_data)

# Make predictions with KNN
knn_predictions = knn_pipeline.predict(X_test_data)
print("KNN Predictions:", knn_predictions.value)

# Evaluate the models
from sklearn.metrics import accuracy_score

svm_accuracy = accuracy_score(y_test, svm_predictions.value)
knn_accuracy = accuracy_score(y_test, knn_predictions.value)

print("SVM Accuracy:", svm_accuracy)
print("KNN Accuracy:", knn_accuracy)

This example demonstrates how to:

  1. Load and prepare the Iris dataset
  2. Create XYData objects for use with framework3
  3. Set up GridSearchCV pipelines for both SVM and KNN classifiers
  4. Fit the models and make predictions
  5. Evaluate the models using accuracy scores

Best Practices

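  1. Data Preprocessing: Ensure your data is properly preprocessed before applying classification filters. This may include scaling, normalization, or handling missing values. Scaling matters in particular for SVM and KNN, because both rely on distances between samples and features with large ranges would otherwise dominate.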

  2. Hyperparameter Tuning: Use GridSearchCVPipeline to find the optimal hyperparameters for your chosen classifier, as demonstrated in the example.

  3. Model Evaluation: Always evaluate your model's performance using appropriate metrics and cross-validation techniques. In the example, we used accuracy, but consider other metrics like precision, recall, or F1-score depending on your specific problem.

  4. Feature Selection: Consider applying feature selection techniques to improve model performance and reduce overfitting, especially when dealing with high-dimensional datasets.

  5. Ensemble Methods: Experiment with combining multiple classifiers to create ensemble models, which can often lead to improved performance.
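
Several of the practices above can be combined in one short script. The sketch below uses plain scikit-learn rather than framework3 components, since the equivalent framework3 filters are not shown here; the choice of three features and of hard voting is an illustrative assumption, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Practice 5: combine SVM and KNN by majority (hard) voting.
ensemble = VotingClassifier(
    estimators=[
        ("svm", SVC(C=1.0, kernel="rbf", gamma="scale")),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    voting="hard",
)

# Practices 1 and 4: scale features, then keep the 3 highest-scoring ones.
# Both steps live in the pipeline, so they are refit inside every CV fold
# and no information from held-out data leaks into preprocessing.
model = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=3), ensemble)

# Practice 3: cross-validate on the training split, then report
# per-class precision, recall, and F1 on the held-out test split.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
print("CV accuracy: %.3f +/- %.3f" % (cv_scores.mean(), cv_scores.std()))

model.fit(X_train, y_train)
report = classification_report(y_test, model.predict(X_test))
print(report)
```

Keeping preprocessing inside the pipeline is the main design choice here: fitting the scaler or the feature selector on the full dataset before cross-validation would make the scores look better than they are.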

Conclusion

The Classification Filters module in framework3 wraps SVM and KNN classifiers as filters that share the fit and predict interface and exchange data as XYData objects. Because each filter also exposes item_grid, it can be combined with other framework3 components such as GridSearchCVPipeline for hyperparameter tuning, as the Iris example shows.