F3Pipeline¶
Bases: SequentialPipeline
A flexible sequential pipeline implementation for machine learning workflows.
F3Pipeline allows chaining multiple filters together and applying metrics for evaluation. It supports fitting, predicting, and evaluating data through the pipeline.
Key Features
- Sequential application of multiple filters
- Support for various metrics for evaluation
- Configurable data storage and logging options
Usage
from framework3.plugins.pipelines.sequential import F3Pipeline
from framework3.plugins.filters.transformation import PCAPlugin
from framework3.plugins.filters.classification import ClassifierSVMPlugin
from framework3.plugins.metrics.classification import F1, Precision, Recall
from framework3.base import XYData
import numpy as np

# Create a pipeline with PCA and SVM
pipeline = F3Pipeline(
    filters=[
        PCAPlugin(n_components=2),
        ClassifierSVMPlugin(kernel='rbf', C=1.0)
    ],
    metrics=[F1(), Precision(), Recall()]
)

# Prepare some dummy data
X = XYData(value=np.random.rand(100, 10))
y = XYData(value=np.random.randint(0, 2, 100))

# Fit the pipeline
pipeline.fit(X, y)

# Make predictions
y_pred = pipeline.predict(X)

# Evaluate the pipeline
results = pipeline.evaluate(X, y, y_pred)
print(results)
Attributes:
Name | Type | Description |
---|---|---|
filters | List[BaseFilter] | List of filters to be applied in the pipeline. |
metrics | List[BaseMetric] | List of metrics for evaluation. |
overwrite | bool | Whether to overwrite existing data in storage. |
store | bool | Whether to store intermediate results. |
log | bool | Whether to log pipeline operations. |
Methods:
Name | Signature | Description |
---|---|---|
fit | fit(x: XYData, y: Optional[XYData]) -> None \| float | Fit the pipeline to the input data. |
predict | predict(x: XYData) -> XYData | Make predictions using the fitted pipeline. |
evaluate | evaluate(x_data: XYData, y_true: XYData \| None, y_pred: XYData) -> Dict[str, float] | Evaluate the pipeline using specified metrics. |
start | start(x: XYData, y: Optional[XYData], X_: Optional[XYData]) -> Optional[XYData] | Start the pipeline execution by fitting and optionally predicting. |
Source code in framework3/plugins/pipelines/sequential/f3_pipeline.py
__init__(filters, metrics=[], overwrite=False, store=False, log=False)¶
Initialize the F3Pipeline.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filters | List[BaseFilter] | List of filters to be applied in the pipeline. | required |
metrics | List[BaseMetric] | List of metrics for evaluation. | [] |
overwrite | bool | Whether to overwrite existing data. | False |
store | bool | Whether to store intermediate results. | False |
log | bool | Whether to log pipeline operations. | False |
evaluate(x_data, y_true, y_pred)¶
Evaluate the pipeline using the specified metrics.
This method applies each metric in the pipeline to the predicted and true values, returning a dictionary of evaluation results.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x_data | XYData | The input data used for evaluation. | required |
y_true | XYData \| None | The true target values, if available. | required |
y_pred | XYData | The predicted values. | required |
Returns:
Type | Description |
---|---|
Dict[str, float] | A dictionary containing the evaluation results for each metric. |
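The behaviour described above (apply every metric to the true and predicted values, collect the scores in a dictionary) can be sketched in plain Python. This is a minimal illustration, not the framework3 implementation: `Accuracy`, `Precision`, and `evaluate_all` are hypothetical stand-ins operating on plain lists, and keying the result by class name is an assumption.

```python
from typing import Dict, List, Sequence


class Accuracy:
    """Hypothetical stand-in metric: fraction of labels predicted correctly."""

    def evaluate(self, x_data, y_true: List[int], y_pred: List[int]) -> float:
        return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


class Precision:
    """Hypothetical stand-in metric: share of predicted positives that are correct."""

    def evaluate(self, x_data, y_true: List[int], y_pred: List[int]) -> float:
        # True labels of the samples that were predicted as positive (1).
        hits = [t for t, p in zip(y_true, y_pred) if p == 1]
        return sum(hits) / len(hits) if hits else 0.0


def evaluate_all(metrics: Sequence, x_data, y_true, y_pred) -> Dict[str, float]:
    # Assumption: each result is keyed by the metric's class name.
    return {type(m).__name__: m.evaluate(x_data, y_true, y_pred) for m in metrics}


results = evaluate_all(
    [Accuracy(), Precision()],
    x_data=None,
    y_true=[1, 0, 1, 1],
    y_pred=[1, 0, 0, 1],
)
```

The stand-in metrics ignore `x_data`; it is kept in the signature only to mirror the three-argument shape of `evaluate`.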
fit(x, y)¶
Fit the pipeline to the input data.
This method applies each filter in the pipeline sequentially to the input data, fitting each filter that requires training.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | XYData | The input data to fit the pipeline on. | required |
y | Optional[XYData] | The target data, if available. | required |
Returns:
Type | Description |
---|---|
None \| float | The loss value from the last fitted filter, if any. |
Note
Filters that raise NotTrainableFilterError will be initialized instead of fitted.
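A rough sketch of the fitting loop described above, under stated assumptions: each filter is fitted on the current data, a filter that raises `NotTrainableFilterError` is initialized instead, and the output of each filter feeds the next one. The exception class, the two filters, and `fit_sequential` are hypothetical stand-ins that work on plain lists; the real framework3 classes may differ.

```python
from typing import List, Optional


class NotTrainableFilterError(Exception):
    """Stand-in for the framework3 exception named in the note above."""


class Scale:
    """Hypothetical stateless filter: it has nothing to learn."""

    def __init__(self, factor: float) -> None:
        self.factor = factor
        self.initialized = False

    def init(self) -> None:
        self.initialized = True

    def fit(self, x: List[float], y: Optional[List[int]]) -> Optional[float]:
        raise NotTrainableFilterError("Scale has no trainable state")

    def predict(self, x: List[float]) -> List[float]:
        return [v * self.factor for v in x]


class MeanThreshold:
    """Hypothetical trainable filter: learns the mean and predicts x > mean."""

    def __init__(self) -> None:
        self.mean = 0.0

    def init(self) -> None:
        pass

    def fit(self, x: List[float], y: Optional[List[int]]) -> Optional[float]:
        self.mean = sum(x) / len(x)
        if y is None:
            return None
        # Use the training error rate as a simple loss value.
        preds = self.predict(x)
        return sum(p != t for p, t in zip(preds, y)) / len(y)

    def predict(self, x: List[float]) -> List[int]:
        return [int(v > self.mean) for v in x]


def fit_sequential(filters, x, y) -> Optional[float]:
    loss = None
    for f in filters:
        try:
            loss = f.fit(x, y)      # trainable filters learn from the current data
        except NotTrainableFilterError:
            f.init()                # non-trainable filters are only initialized
        x = f.predict(x)            # the transformed data feeds the next filter
    return loss


scale, clf = Scale(2.0), MeanThreshold()
loss = fit_sequential([scale, clf], [1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])
```

In this sketch the returned value is the loss of the last filter whose `fit` succeeded, which matches the "if any" wording in the Returns table.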
inner()¶
Get the list of filters in the pipeline.
Returns:
Type | Description |
---|---|
List[BaseFilter] | The list of filters in the pipeline. |
predict(x)¶
Make predictions using the fitted pipeline.
This method applies each filter in the pipeline sequentially to the input data to generate predictions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | XYData | The input data to make predictions on. | required |
Returns:
Type | Description |
---|---|
XYData | The predictions made by the pipeline. |
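Sequential prediction amounts to folding the input through each step in order. The sketch below illustrates the idea with plain callables standing in for fitted filters; `predict_sequential` and the two steps are hypothetical names, not part of framework3.

```python
from functools import reduce
from typing import Callable, List, Sequence

Step = Callable[[List[float]], List[float]]


def predict_sequential(steps: Sequence[Step], x: List[float]) -> List[float]:
    # Feed the output of each step into the next one, left to right.
    return reduce(lambda data, step: step(data), steps, x)


def double(x: List[float]) -> List[float]:
    return [v * 2 for v in x]


def add_one(x: List[float]) -> List[float]:
    return [v + 1 for v in x]


scaled_then_shifted = predict_sequential([double, add_one], [1.0, 2.0])
shifted_then_scaled = predict_sequential([add_one, double], [1.0, 2.0])
```

The two calls use the same steps in a different order and produce different outputs, which is why the order of filters in the pipeline matters.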
start(x, y, X_)¶
Start the pipeline execution: fit the pipeline to the input data, then make predictions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | XYData | The input data for fitting and prediction. | required |
y | Optional[XYData] | The target data for fitting, if available. | required |
X_ | Optional[XYData] | Additional input data for prediction, if different from x. | required |
Returns:
Type | Description |
---|---|
Optional[XYData] | The predictions made by the pipeline, or None if an error occurs. |
Raises:
Type | Description |
---|---|
Exception | If an error occurs during the fitting or prediction process. |
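The fit-then-predict flow can be sketched as follows. This is an illustration under an assumption drawn from the parameter descriptions: predictions are made on `X_` when it is given and on `x` otherwise. `MajorityClass` and `start_sketch` are hypothetical stand-ins, and error handling is left out.

```python
from collections import Counter
from typing import List, Optional, Sequence


class MajorityClass:
    """Hypothetical model: always predicts the most frequent training label."""

    def fit(self, x: Sequence, y: List[int]) -> None:
        self.label = Counter(y).most_common(1)[0][0]

    def predict(self, x: Sequence) -> List[int]:
        return [self.label for _ in x]


def start_sketch(model, x: Sequence, y: List[int], x_new: Optional[Sequence] = None) -> List[int]:
    model.fit(x, y)
    # Assumption: predict on the extra data when provided, else on the training input.
    target = x if x_new is None else x_new
    return model.predict(target)


same_data = start_sketch(MajorityClass(), [[0], [1], [2]], [1, 1, 0])
new_data = start_sketch(MajorityClass(), [[0], [1], [2]], [1, 1, 0], x_new=[[5], [6]])
```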
Overview¶
The F3Pipeline is a flexible and powerful pipeline implementation in the framework3 ecosystem. It allows you to chain multiple data processing steps, machine learning models, and evaluation metrics into a single, cohesive workflow.
Key Features¶
- Seamless integration of multiple plugins (filters, transformers, models)
- Built-in support for various metrics
- Caching capabilities for improved performance
- Nested pipeline support for complex workflows
Basic Usage¶
Here's a simple example of how to create and use an F3Pipeline:
from framework3.plugins.pipelines.sequential import F3Pipeline
from framework3.plugins.filters.transformation import PCAPlugin
from framework3.plugins.filters.classification import ClassifierSVMPlugin
from framework3.plugins.metrics.classification import F1, Precision
from framework3.base.base_types import XYData
import numpy as np

# Create a pipeline: PCA reduces the features to 2 components, then an SVM classifies
pipeline = F3Pipeline(
    filters=[
        PCAPlugin(n_components=2),
        ClassifierSVMPlugin(kernel='rbf')
    ],
    metrics=[F1(), Precision()]
)

# Generate some dummy data
X = XYData(value=np.random.rand(100, 10))
y = XYData(value=np.random.randint(0, 2, 100))

# Fit the pipeline
pipeline.fit(X, y)

# Make predictions
y_pred = pipeline.predict(X)

# Evaluate the pipeline
results = pipeline.evaluate(X, y, y_pred)
print(results)
Advanced Usage¶
Nested Pipelines¶
F3Pipeline supports nesting, allowing you to create more complex workflows:
from framework3.plugins.pipelines.sequential import F3Pipeline
from framework3.plugins.filters.transformation import NormalizationPlugin
from framework3.plugins.filters.feature_selection import VarianceThresholdPlugin
from framework3.plugins.filters.classification import ClassifierSVMPlugin
from framework3.plugins.metrics.classification import F1, Precision

# Create a sub-pipeline
feature_engineering = F3Pipeline(
    filters=[
        NormalizationPlugin(),
        VarianceThresholdPlugin(threshold=0.1)
    ],
    metrics=[]
)

# Create the main pipeline; the sub-pipeline is used like any other filter
main_pipeline = F3Pipeline(
    filters=[
        feature_engineering,
        ClassifierSVMPlugin(kernel='linear')
    ],
    metrics=[F1(), Precision()]
)

# Use the main pipeline as before (X and y as defined in Basic Usage)
main_pipeline.fit(X, y)
y_pred = main_pipeline.predict(X)
results = main_pipeline.evaluate(X, y, y_pred)
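Nesting works when a pipeline exposes the same `fit`/`predict` interface as the steps it contains, so it can itself be used as a step (the composite pattern). The sketch below shows that idea with hypothetical stand-in classes on plain lists; it is not the framework3 implementation.

```python
from typing import List, Optional, Sequence


class AddOne:
    """Hypothetical stateless step."""

    def fit(self, x: List[float], y: Optional[List[int]] = None) -> None:
        pass

    def predict(self, x: List[float]) -> List[float]:
        return [v + 1 for v in x]


class Double:
    """Hypothetical stateless step."""

    def fit(self, x: List[float], y: Optional[List[int]] = None) -> None:
        pass

    def predict(self, x: List[float]) -> List[float]:
        return [v * 2 for v in x]


class MiniPipeline:
    """Has the same fit/predict interface as its steps, so it can be nested."""

    def __init__(self, steps: Sequence) -> None:
        self.steps = list(steps)

    def fit(self, x: List[float], y: Optional[List[int]] = None) -> None:
        for step in self.steps:
            step.fit(x, y)
            x = step.predict(x)  # pass transformed data to the next step

    def predict(self, x: List[float]) -> List[float]:
        for step in self.steps:
            x = step.predict(x)
        return x


inner = MiniPipeline([AddOne(), Double()])
outer = MiniPipeline([inner, AddOne()])  # a pipeline used as a step
outer.fit([1.0, 2.0])
nested_output = outer.predict([1.0, 2.0])
```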
Caching¶
F3Pipeline supports caching of intermediate results and fitted models for improved performance:
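from framework3.plugins.pipelines.sequential import F3Pipeline
from framework3.plugins.filters.cached_filter import Cached
from framework3.plugins.filters.transformation import PCAPlugin
from framework3.plugins.filters.classification import ClassifierSVMPlugin
from framework3.plugins.metrics.classification import F1

pipeline = F3Pipeline(
    filters=[
        Cached(
            filter=PCAPlugin(n_components=2),
            cache_data=True,
            cache_filter=True,
            overwrite=False
        ),
        ClassifierSVMPlugin()
    ],
    metrics=[F1()]
)

# The PCA transformation will be cached after the first run (X and y as defined in Basic Usage)
pipeline.fit(X, y)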
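To illustrate the caching idea in general terms, the sketch below wraps a step and stores its output keyed by a hash of the input, so that repeated calls with the same data skip the computation. It is an in-memory illustration only: `CachedStep` and `SlowSquare` are hypothetical, and how framework3's `Cached` wrapper keys or stores its entries is not shown here.

```python
import hashlib
import pickle
from typing import Dict, List


class SlowSquare:
    """Hypothetical expensive step; counts how often it really runs."""

    def __init__(self) -> None:
        self.calls = 0

    def predict(self, x: List[float]) -> List[float]:
        self.calls += 1
        return [v * v for v in x]


class CachedStep:
    """Wraps a step and reuses stored outputs for inputs seen before."""

    def __init__(self, step, overwrite: bool = False) -> None:
        self.step = step
        self.overwrite = overwrite
        self._cache: Dict[str, List[float]] = {}

    def predict(self, x: List[float]) -> List[float]:
        # Key the cache on a hash of the serialized input.
        key = hashlib.sha256(pickle.dumps(x)).hexdigest()
        if self.overwrite or key not in self._cache:
            self._cache[key] = self.step.predict(x)
        return self._cache[key]


slow = SlowSquare()
cached = CachedStep(slow)
first = cached.predict([1.0, 2.0, 3.0])
second = cached.predict([1.0, 2.0, 3.0])  # served from the cache
```

With `overwrite=True` the wrapped step is recomputed on every call and the stored entry is replaced, which is one plausible reading of an overwrite flag.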
API Reference¶
F3Pipeline¶
class F3Pipeline(SequentialPipeline):
    def __init__(
        self,
        filters: List[BaseFilter],
        metrics: List[BaseMetric] = [],
        overwrite: bool = False,
        store: bool = False,
        log: bool = False,
    ) -> None:
        """
        Initialize the F3Pipeline.

        Args:
            filters (List[BaseFilter]): List of filters to be applied in the pipeline.
            metrics (List[BaseMetric], optional): List of metrics for evaluation. Defaults to [].
            overwrite (bool, optional): Whether to overwrite existing data. Defaults to False.
            store (bool, optional): Whether to store intermediate results. Defaults to False.
            log (bool, optional): Whether to log pipeline operations. Defaults to False.
        """
Methods¶
- fit(self, x: XYData, y: Optional[XYData]): Fit the pipeline to the input data.
- predict(self, x: XYData) -> XYData: Make predictions using the fitted pipeline.
- evaluate(self, x_data: XYData, y_true: XYData|None, y_pred: XYData) -> Dict[str, float]: Evaluate the pipeline using the specified metrics.
Best Practices¶
- Order Matters: The order of plugins in the pipeline is crucial. Ensure that your data preprocessing steps come before your model.
- Caching: Use caching for computationally expensive steps, especially when you're iterating on your pipeline design.
- Nested Pipelines: Use nested pipelines to organize complex workflows into logical sub-components.
- Metrics: Include multiple relevant metrics to get a comprehensive view of your pipeline's performance.
- Cross-Validation: Consider using cross-validation techniques in conjunction with F3Pipeline for more robust model evaluation.
- Logging: Enable logging to get insights into the pipeline's operation and to help with debugging.
- Parameter Tuning: Use F3Pipeline in conjunction with hyperparameter tuning techniques to optimize your entire workflow.
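As an illustration of the cross-validation point, here is a minimal k-fold sketch in plain Python. It assumes only that the model offers `fit(x, y)` and `predict(x)`; `MajorityClass`, `k_fold_indices`, and `cross_validate` are hypothetical helpers, not framework3 APIs, and the folds are contiguous with no shuffling.

```python
from collections import Counter
from typing import Callable, Iterator, List, Sequence, Tuple


class MajorityClass:
    """Hypothetical model: always predicts the most frequent training label."""

    def fit(self, x: Sequence, y: List[int]) -> None:
        self.label = Counter(y).most_common(1)[0][0]

    def predict(self, x: Sequence) -> List[int]:
        return [self.label for _ in x]


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validate(make_model: Callable, x: Sequence, y: List[int], k: int) -> List[float]:
    scores = []
    for train, test in k_fold_indices(len(x), k):
        model = make_model()  # a fresh model per fold avoids leaking state
        model.fit([x[i] for i in train], [y[i] for i in train])
        preds = model.predict([x[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))  # per-fold accuracy
    return scores


fold_scores = cross_validate(MajorityClass, list(range(6)), [1, 1, 1, 1, 0, 1], k=3)
```

Building a new model inside each fold matters in practice: reusing a fitted pipeline (or cached results) across folds would let information from the held-out data leak into training.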
Conclusion¶
F3Pipeline provides a powerful and flexible way to build complex data processing and machine learning workflows in framework3. By combining multiple plugins, nested pipelines, and caching capabilities, you can create efficient and maintainable pipelines for a wide range of tasks.