Classification Filters¶
ClassifierSVMPlugin¶
Bases: BaseFilter, BasePlugin
A plugin for Support Vector Machine (SVM) classification using scikit-learn's SVC.
This plugin integrates the SVC (Support Vector Classification) implementation from scikit-learn into the framework3 ecosystem, allowing for seamless use of SVM classification in pipelines and supporting hyperparameter tuning through grid search.
Key Features
- Wraps scikit-learn's SVC for use within framework3
- Supports various kernel types: linear, polynomial, RBF, and sigmoid
- Allows customization of regularization parameter (C) and kernel coefficient (gamma)
- Provides methods for fitting the model, making predictions, and generating parameter grids
Usage
The ClassifierSVMPlugin can be used to perform SVM classification on your data:
from framework3.plugins.filters.classification.svm import ClassifierSVMPlugin
from framework3.base.base_types import XYData
import numpy as np
# Create sample data
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y = np.array([0, 0, 1, 1])
X_data = XYData(_hash='X_data', _path='/tmp', _value=X)
y_data = XYData(_hash='y_data', _path='/tmp', _value=y)
# Create and fit the SVM classifier
svm_plugin = ClassifierSVMPlugin(C=1.0, kernel='rbf', gamma='scale')
svm_plugin.fit(X_data, y_data)
# Make predictions
X_test = XYData(_hash='X_test', _path='/tmp', _value=np.array([[2.5, 3.5]]))
predictions = svm_plugin.predict(X_test)
print(predictions.value)
# Generate parameter grid for hyperparameter tuning
grid_params = ClassifierSVMPlugin.item_grid(C=[0.1, 1, 10], kernel=['linear', 'rbf'], gamma=['scale', 'auto'])
print(grid_params)
Attributes:
Name | Type | Description |
---|---|---|
_model | SVC | The underlying scikit-learn SVC model. |
Methods:
Name | Description |
---|---|
fit | fit(x: XYData, y: Optional[XYData], evaluator: BaseMetric | None = None) -> Optional[float]: Fit the SVM model to the given data. |
predict | predict(x: XYData) -> XYData: Make predictions using the fitted SVM model. |
item_grid | item_grid(C: List[float], kernel: List[L], gamma: List[float | Literal['scale', 'auto']]) -> Dict[str, List[Any]]: Generate a parameter grid for hyperparameter tuning. |
Note
This plugin uses scikit-learn's implementation of SVM, which may have its own dependencies and requirements. Ensure that scikit-learn is properly installed and compatible with your environment.
Source code in framework3/plugins/filters/classification/svm.py
__init__(C=1.0, gamma='scale', kernel='linear')¶
Initialize a new ClassifierSVMPlugin instance.
This constructor sets up the ClassifierSVMPlugin with the specified parameters and initializes the underlying scikit-learn SVC model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
C | float | Regularization parameter. Defaults to 1.0. | 1.0 |
gamma | float | Literal['scale', 'auto'] | Kernel coefficient. Defaults to "scale". | 'scale' |
kernel | L | Specifies the kernel type to be used in the algorithm. Can be 'linear', 'poly', 'rbf', or 'sigmoid'. Defaults to "linear". | 'linear' |
Note
The parameters are passed directly to scikit-learn's SVC. Refer to scikit-learn's documentation for detailed information on these parameters.
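To make the string values of gamma concrete, the sketch below resolves 'scale' and 'auto' into numbers using the formulas given in scikit-learn's SVC documentation. The helper name resolve_gamma is hypothetical and is not part of framework3 or scikit-learn; it only illustrates the arithmetic on the sample data used above.

```python
from statistics import pvariance


def resolve_gamma(gamma, X):
    """Hypothetical helper: turn 'scale' / 'auto' / a float into a numeric gamma."""
    n_features = len(X[0])
    if gamma == "scale":
        # scikit-learn documents 'scale' as 1 / (n_features * X.var()),
        # where the variance is taken over every entry of X.
        flat = [value for row in X for value in row]
        return 1.0 / (n_features * pvariance(flat))
    if gamma == "auto":
        # 'auto' is documented as 1 / n_features.
        return 1.0 / n_features
    return float(gamma)


X = [[1, 2], [2, 3], [3, 4], [4, 5]]
gamma_scale = resolve_gamma("scale", X)  # variance of all entries is 1.5
gamma_auto = resolve_gamma("auto", X)
print(gamma_scale, gamma_auto)
```

With 'scale', the coefficient shrinks as the spread of the features grows, which is one reason feature scaling affects RBF results.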
fit(x, y, evaluator=None)¶
Fit the SVM model to the given data.
This method trains the SVM classifier on the provided input features and target values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | XYData | The input features for training. | required |
y | Optional[XYData] | The target values for training. | required |
evaluator | BaseMetric | None | An optional evaluator for the model. Not used in this method. | None |
Returns:
Type | Description |
---|---|
Optional[float] | The score of the fitted model on the training data, or None if y is None. |
Note
This method uses scikit-learn's fit method internally. The score is calculated using scikit-learn's score method, which computes the mean accuracy.
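For reference, mean accuracy is simply the fraction of samples whose predicted label equals the true label. The following is a minimal pure-Python sketch of that calculation; mean_accuracy is a hypothetical helper for illustration, not the function the plugin calls.

```python
def mean_accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return correct / len(y_true)


# Three of the four training labels are reproduced correctly.
train_score = mean_accuracy([0, 0, 1, 1], [0, 1, 1, 1])
print(train_score)
```

Because the score is computed on the training data, it tends to be optimistic; a held-out set or cross-validation gives a more reliable estimate.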
item_grid(C, kernel, gamma=['scale']) staticmethod¶
Generate a parameter grid for hyperparameter tuning.
This static method provides a way to generate a grid of parameters for use in hyperparameter optimization techniques like grid search.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
C | List[float] | List of regularization parameter values to try. | required |
kernel | List[L] | List of kernel types to try. | required |
gamma | List[float] | List[Literal['scale', 'auto']] | List of gamma values to try. Defaults to ["scale"]. | ['scale'] |
Returns:
Type | Description |
---|---|
Dict[str, List[Any]] | A dictionary of parameter names and their possible values. |
Note
The returned dictionary can be used directly with hyperparameter tuning tools that accept parameter grids, such as scikit-learn's GridSearchCV.
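A grid search evaluates every combination of the listed values. The sketch below shows that expansion with itertools.product; the dictionary keys are illustrative assumptions and may not match the exact keys item_grid returns, and expand_grid is a hypothetical helper.

```python
from itertools import product


def expand_grid(grid):
    """Hypothetical helper: list every parameter combination in a grid dict."""
    names = list(grid)
    value_lists = (grid[name] for name in names)
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Illustrative grid; the key names are an assumption.
grid = {
    "C": [0.1, 1, 10],
    "kernel": ["linear", "rbf"],
    "gamma": ["scale", "auto"],
}
candidates = expand_grid(grid)
print(len(candidates))  # 3 * 2 * 2 combinations
print(candidates[0])
```

The number of candidates is the product of the list lengths, so each extra value multiplies the training cost, and cross-validation multiplies it again by the number of folds.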
predict(x)¶
Make predictions using the fitted SVM model.
This method uses the trained SVM classifier to make predictions on new input data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | XYData | The input features to predict. | required |
Returns:
Type | Description |
---|---|
XYData | The predicted values wrapped in an XYData object. |
Note
This method uses scikit-learn's predict method internally. The predictions are wrapped in an XYData object for consistency with the framework.
KnnFilter¶
Bases: BaseFilter
A wrapper for scikit-learn's KNeighborsClassifier using the framework3 BaseFilter interface.
This filter implements the K-Nearest Neighbors algorithm for classification within the framework3 ecosystem.
Key Features
- Integrates scikit-learn's KNeighborsClassifier with framework3
- Supports various KNN parameters like number of neighbors, weights, and distance metrics
- Provides methods for fitting the model and making predictions
- Includes a static method for generating parameter grids for hyperparameter tuning
Usage
The KnnFilter can be used to perform K-Nearest Neighbors classification on your data:
from framework3.plugins.filters.classification.knn import KnnFilter
from framework3.base.base_types import XYData
import numpy as np
# Create sample data
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y = np.array([0, 0, 1, 1])
X_data = XYData(_hash='X_data', _path='/tmp', _value=X)
y_data = XYData(_hash='y_data', _path='/tmp', _value=y)
# Create and fit the KNN filter
knn = KnnFilter(n_neighbors=3, weights='uniform')
knn.fit(X_data, y_data)
# Make predictions
X_test = XYData(_hash='X_test', _path='/tmp', _value=np.array([[2.5, 3.5]]))
predictions = knn.predict(X_test)
print(predictions.value)
Attributes:
Name | Type | Description |
---|---|---|
_clf | KNeighborsClassifier | The underlying scikit-learn KNN classifier. |
Methods:
Name | Description |
---|---|
fit | fit(x: XYData, y: Optional[XYData], evaluator: BaseMetric | None = None) -> Optional[float]: Fit the KNN model to the given data. |
predict | predict(x: XYData) -> XYData: Make predictions using the fitted KNN model. |
item_grid | Generate a parameter grid for hyperparameter tuning. |
Note
This filter uses scikit-learn's implementation of KNN, which may have its own dependencies and requirements. Ensure that scikit-learn is properly installed and compatible with your environment.
Source code in framework3/plugins/filters/classification/knn.py
__init__(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None)¶
Initialize a new KnnFilter instance.
This constructor sets up the KnnFilter with the specified parameters and initializes the underlying scikit-learn KNeighborsClassifier.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n_neighbors | int | Number of neighbors to use for knn. Defaults to 5. | 5 |
weights | Literal['uniform', 'distance'] | Weight function used in prediction. Defaults to "uniform". | 'uniform' |
algorithm | Literal['auto', 'ball_tree', 'kd_tree', 'brute'] | Algorithm used to compute nearest neighbors. Defaults to "auto". | 'auto' |
leaf_size | int | Leaf size passed to BallTree or KDTree. Defaults to 30. | 30 |
p | int | Power parameter for the Minkowski metric. Defaults to 2 (Euclidean distance). | 2 |
metric | str | The distance metric to use for the tree. Defaults to "minkowski". | 'minkowski' |
metric_params | Optional[Dict[str, Any]] | Additional keyword arguments for the metric function. Defaults to None. | None |
n_jobs | Optional[int] | The number of parallel jobs to run for neighbors search. Defaults to None. | None |
Note
The parameters are passed directly to scikit-learn's KNeighborsClassifier. Refer to scikit-learn's documentation for detailed information on these parameters.
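The interaction between metric='minkowski' and p can be seen in a few lines. This is a minimal sketch of the Minkowski distance itself, not of how scikit-learn computes it internally; the function name minkowski is chosen here for illustration.

```python
def minkowski(a, b, p=2):
    """Minkowski distance between two equal-length points."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


point_a, point_b = [1, 2], [4, 6]
manhattan = minkowski(point_a, point_b, p=1)  # |3| + |4|
euclidean = minkowski(point_a, point_b, p=2)  # sqrt(3^2 + 4^2)
print(manhattan, euclidean)
```

With p=1 the metric reduces to Manhattan distance and with p=2 to Euclidean distance, which is why the default pair (metric='minkowski', p=2) behaves as plain Euclidean KNN.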
fit(x, y, evaluator=None)¶
Fit the KNN model to the given data.
This method trains the KNN classifier on the provided input features and target values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | XYData | The input features for training. | required |
y | Optional[XYData] | The target values for training. | required |
evaluator | BaseMetric | None | An optional evaluator for the model. Not used in this method. | None |
Returns:
Type | Description |
---|---|
Optional[float] | The score of the fitted model on the training data. |
Note
This method uses scikit-learn's fit method internally. The score is calculated using scikit-learn's score method, which computes the mean accuracy.
item_grid(**kwargs) staticmethod¶
Generate a parameter grid for hyperparameter tuning.
This static method provides a way to generate a grid of parameters for use in hyperparameter optimization techniques like grid search.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
**kwargs | Dict[str, List[Any]] | Keyword arguments to override default parameter ranges. | {} |
Returns:
Type | Description |
---|---|
tuple[type[BaseFilter], Dict[str, List[Any]]] | A tuple containing the KnnFilter class and a dictionary of parameter names and their possible values. |
Note
The returned dictionary can be used directly with hyperparameter tuning tools that accept parameter grids, such as scikit-learn's GridSearchCV.
predict(x)¶
Make predictions using the fitted KNN model.
This method uses the trained KNN classifier to make predictions on new input data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | XYData | The input features to predict. | required |
Returns:
Type | Description |
---|---|
XYData | The predicted values wrapped in an XYData object. |
Note
This method uses scikit-learn's predict method internally. The predictions are wrapped in an XYData object for consistency with the framework.
Overview¶
The Classification Filters module in framework3 provides classification algorithms that can be integrated into your machine learning pipelines. Each filter wraps a scikit-learn classifier behind the same framework3 interface (fit, predict, and item_grid), so the classifiers can be used interchangeably within the framework3 ecosystem.
Available Classifiers¶
SVM Classifier¶
The Support Vector Machine (SVM) classifier is implemented in the ClassifierSVMPlugin. This versatile classifier is effective for both linear and non-linear classification tasks.
Usage¶
from framework3.plugins.filters.classification.svm import ClassifierSVMPlugin
svm_classifier = ClassifierSVMPlugin(C=1.0, kernel='rbf', gamma='scale')
Parameters¶
- C (float): Regularization parameter. The strength of the regularization is inversely proportional to C.
- kernel (str): The kernel type to be used in the algorithm. Options include 'linear', 'poly', 'rbf', and 'sigmoid'.
- gamma (str or float): Kernel coefficient for 'rbf', 'poly' and 'sigmoid' kernels.
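To show where gamma enters each kernel, here is a minimal pure-Python sketch of the four kernel functions as they are defined in scikit-learn's documentation. The degree and coef0 arguments are scikit-learn SVC parameters that the plugin's documented constructor does not list; they appear here with scikit-learn's default values as an assumption, and the function names are illustrative.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def linear_kernel(a, b):
    # gamma is not used by the linear kernel.
    return dot(a, b)


def poly_kernel(a, b, gamma, degree=3, coef0=0.0):
    return (gamma * dot(a, b) + coef0) ** degree


def rbf_kernel(a, b, gamma):
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(a, b, gamma, coef0=0.0):
    return math.tanh(gamma * dot(a, b) + coef0)


u, v = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(u, v))
print(rbf_kernel(u, v, gamma=0.5))
```

A larger gamma makes the RBF kernel fall off faster with distance, so each training point influences a smaller neighbourhood.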
K-Nearest Neighbors Classifier¶
The K-Nearest Neighbors (KNN) classifier is implemented in the KnnFilter. This simple yet effective classifier is based on the principle of finding the K nearest neighbors to make predictions.
Usage¶
from framework3.plugins.filters.classification.knn import KnnFilter
knn_classifier = KnnFilter(n_neighbors=5, weights='uniform')
Parameters¶
- n_neighbors (int): Number of neighbors to use for kneighbors queries.
- weights (str): Weight function used in prediction. Options are 'uniform' (all points in each neighborhood are weighted equally) or 'distance' (weight points by the inverse of their distance).
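The difference between the two weighting options is easiest to see on a case where they disagree. The sketch below is a simplified pure-Python KNN vote using Euclidean distance; knn_predict is a hypothetical helper and does not reproduce scikit-learn's tie-breaking or its handling of zero distances exactly.

```python
import math
from collections import defaultdict


def knn_predict(train_X, train_y, query, n_neighbors=3, weights="uniform"):
    """Hypothetical helper: vote among the n_neighbors closest training points."""
    nearest = sorted(
        (math.dist(point, query), label) for point, label in zip(train_X, train_y)
    )[:n_neighbors]
    votes = defaultdict(float)
    for distance, label in nearest:
        if weights == "uniform":
            votes[label] += 1.0
        elif weights == "distance":
            # Inverse-distance weight; an exact match dominates the vote.
            votes[label] += math.inf if distance == 0 else 1.0 / distance
        else:
            raise ValueError("weights must be 'uniform' or 'distance'")
    return max(votes, key=votes.get)


train_X = [[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]]
train_y = [0, 0, 1]
query = [3.5, 0.0]

uniform_label = knn_predict(train_X, train_y, query, weights="uniform")
distance_label = knn_predict(train_X, train_y, query, weights="distance")
print(uniform_label, distance_label)
```

With uniform weights the two distant class-0 points outvote the single nearby class-1 point; with distance weights the nearby point wins.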
Comprehensive Example: Iris Dataset Classification¶
In this example, we'll demonstrate how to use the Classification Filters with the Iris dataset, showcasing both SVM and KNN classifiers, as well as hyperparameter tuning with GridSearchCVPipeline.
from framework3.plugins.pipelines.gs_cv_pipeline import GridSearchCVPipeline
from framework3.plugins.filters.classification.svm import ClassifierSVMPlugin
from framework3.plugins.filters.classification.knn import KnnFilter
from framework3.base.base_types import XYData
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np
# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create XYData objects
X_train_data = XYData(_hash='X_train', _path='/tmp', _value=X_train)
y_train_data = XYData(_hash='y_train', _path='/tmp', _value=y_train)
X_test_data = XYData(_hash='X_test', _path='/tmp', _value=X_test)
y_test_data = XYData(_hash='y_test', _path='/tmp', _value=y_test)
# Create a pipeline with SVM classifier
svm_pipeline = GridSearchCVPipeline(
    filterx=[ClassifierSVMPlugin],
    param_grid=ClassifierSVMPlugin.item_grid(C=[0.1, 1, 10], kernel=['linear', 'rbf']),
    scoring='accuracy',
    cv=5
)
# Fit the SVM pipeline
svm_pipeline.fit(X_train_data, y_train_data)
# Make predictions with SVM
svm_predictions = svm_pipeline.predict(X_test_data)
print("SVM Predictions:", svm_predictions.value)
# Create a pipeline with KNN classifier
knn_pipeline = GridSearchCVPipeline(
    filterx=[KnnFilter],
    param_grid=KnnFilter.item_grid(n_neighbors=[3, 5, 7], weights=['uniform', 'distance']),
    scoring='accuracy',
    cv=5
)
# Fit the KNN pipeline
knn_pipeline.fit(X_train_data, y_train_data)
# Make predictions with KNN
knn_predictions = knn_pipeline.predict(X_test_data)
print("KNN Predictions:", knn_predictions.value)
# Evaluate the models
from sklearn.metrics import accuracy_score
svm_accuracy = accuracy_score(y_test, svm_predictions.value)
knn_accuracy = accuracy_score(y_test, knn_predictions.value)
print("SVM Accuracy:", svm_accuracy)
print("KNN Accuracy:", knn_accuracy)
This example demonstrates how to:
- Load and prepare the Iris dataset
- Create XYData objects for use with framework3
- Set up GridSearchCV pipelines for both SVM and KNN classifiers
- Fit the models and make predictions
- Evaluate the models using accuracy scores
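The cv=5 argument in the example means each candidate is trained and validated five times on different splits of the training data. The sketch below shows a plain, unstratified k-fold partition of sample indices; it is an illustration of the idea only. How GridSearchCVPipeline builds its folds is not stated here, and scikit-learn itself typically stratifies folds for classifiers, which this sketch ignores. k_fold_indices is a hypothetical helper.

```python
def k_fold_indices(n_samples, n_splits=5):
    """Hypothetical helper: contiguous, unshuffled k-fold (train, validation) index pairs."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    base, remainder = divmod(n_samples, n_splits)
    # The first `remainder` folds get one extra sample.
    fold_sizes = [base + (1 if i < remainder else 0) for i in range(n_splits)]
    indices = list(range(n_samples))
    folds = []
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        folds.append((training, validation))
        start += size
    return folds


# 120 samples corresponds to the 80% training split of the 150 Iris rows.
folds = k_fold_indices(120, n_splits=5)
for training, validation in folds:
    print(len(training), len(validation))
```

Every sample is used for validation exactly once, so the averaged score reflects the whole training set rather than one arbitrary split.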
Best Practices¶
- Data Preprocessing: Ensure your data is properly preprocessed before applying classification filters. This may include scaling, normalization, or handling missing values.
- Hyperparameter Tuning: Use GridSearchCVPipeline to find the optimal hyperparameters for your chosen classifier, as demonstrated in the example.
- Model Evaluation: Always evaluate your model's performance using appropriate metrics and cross-validation techniques. In the example, we used accuracy, but consider other metrics like precision, recall, or F1-score depending on your specific problem.
- Feature Selection: Consider applying feature selection techniques to improve model performance and reduce overfitting, especially when dealing with high-dimensional datasets.
- Ensemble Methods: Experiment with combining multiple classifiers to create ensemble models, which can often lead to improved performance.
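As a concrete illustration of the metrics named under Model Evaluation, the following sketch computes precision, recall, and F1-score for one positive class in pure Python. The helper name and the sample labels are made up for illustration; in practice scikit-learn's metrics module provides equivalent functions.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Hypothetical helper: precision, recall and F1 for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Unlike accuracy, these metrics stay informative when one class is much rarer than the others.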
Conclusion¶
The Classification Filters module in framework3 provides SVM and KNN classifiers for classification tasks. Combined with other framework3 components, these filters can be assembled into complete machine learning pipelines. The Iris example shows how to use both classifiers and how to tune their hyperparameters with GridSearchCVPipeline.