If you have seen the Reuse Data tutorial, you may have noticed that we used a standard scikit-learn optimizer for hyperparameter tuning. This is fine for many use cases, but it might not be the best choice for some others. For those who need a more advanced optimization strategy, Wandb (Weights & Biases) is a great choice.
We will use a simple pipeline for the Iris dataset.
In [1]:
import wandb
wandb.login()
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: manu-couto1k (citius-irlab) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
Out[1]:
True
In [2]:
from framework3.utils.patch_type_guard import patch_inspect_for_notebooks
patch_inspect_for_notebooks()
✅ Patched inspect.getsource using dill.
In [3]:
from sklearn import datasets
from sklearn.model_selection import train_test_split

from framework3.base.base_clases import XYData

iris = datasets.load_iris()

# Hold out 20% of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data,
    iris.target,
    test_size=0.2,
    random_state=42,  # type: ignore
)

# Wrap each split in XYData. The test splits get their own _hash labels
# so they are not confused with the training data.
X_train = XYData(
    _hash="Iris X train data",
    _path="/datasets",
    _value=X_train,
)
y_train = XYData(
    _hash="Iris y train data",
    _path="/datasets",
    _value=y_train,  # type: ignore
)
X_test = XYData(
    _hash="Iris X test data",
    _path="/datasets",
    _value=X_test,
)
y_test = XYData(
    _hash="Iris y test data",
    _path="/datasets",
    _value=y_test,  # type: ignore
)
Then we will configure wandb for hyperparameter tuning and a scikit-learn splitter for cross-validation.
Wandb provides a dashboard to visualize the results of the experiments. For this to work, you need to define a project name and log in to the wandb service.
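To make the cross-validation step more concrete, here is a minimal pure-Python sketch of what a shuffled 2-fold split does. The function name `kfold_indices` is hypothetical, and the shuffling uses Python's `random` module, so the exact indices will differ from those produced by scikit-learn's `KFold`; only the idea is the same. The 120 samples correspond to the 80% training share of the 150 Iris samples.

```python
import random


def kfold_indices(n_samples, n_splits=2, shuffle=True, random_state=42):
    """Yield (train_indices, validation_indices) pairs, one per fold.

    Illustrative only: this mimics the idea of K-fold splitting, not
    scikit-learn's exact shuffling.
    """
    indices = list(range(n_samples))
    if shuffle:
        random.Random(random_state).shuffle(indices)

    # The first (n_samples % n_splits) folds receive one extra sample.
    base, extra = divmod(n_samples, n_splits)
    fold_sizes = [base + (1 if i < extra else 0) for i in range(n_splits)]

    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(kfold_indices(120, n_splits=2, shuffle=True, random_state=42))
for train, validation in folds:
    print(len(train), len(validation))
```

With two folds, each candidate value of a hyperparameter is trained on one half of the training data and scored on the other half, and then the roles are swapped.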
In [4]:
from framework3 import F1, F3Pipeline, KnnFilter, Precission, StandardScalerPlugin
from framework3.plugins.metrics.classification import Recall, XYData
from framework3.plugins.optimizer.wandb_optimizer import WandbOptimizer
from framework3.plugins.splitter.cross_validation_splitter import KFoldSplitter

wandb_pipeline = (
    F3Pipeline(
        filters=[
            StandardScalerPlugin(),
            KnnFilter().grid({"n_neighbors": [2, 3, 4, 5, 6]}),
        ],
        metrics=[F1(), Precission(), Recall()],
    )
    .splitter(
        KFoldSplitter(
            n_splits=2,
            shuffle=True,
            random_state=42,
        )
    )
    .optimizer(
        WandbOptimizer(
            project="test_project",
            sweep_id=None,
            scorer=F1(),
        )
    )
)
In [5]:
wandb_pipeline.fit(X_train, y_train)
_y = wandb_pipeline.predict(x=X_test)
categorical param: n_neighbors: [2, 3, 4, 5, 6]
______________________SWEE CONFIG_____________________
{
    'parameters': {
        'filters': {'parameters': {'KnnFilter': {'parameters': {'n_neighbors': {'values': [2, 3, 4, 5, 6]}}}}},
        'pipeline': {
            'value': {
                'clazz': 'KFoldSplitter',
                'params': {
                    'n_splits': 2,
                    'shuffle': True,
                    'random_state': 42,
                    'pipeline': {
                        'clazz': 'F3Pipeline',
                        'params': {
                            'filters': [
                                {'clazz': 'StandardScalerPlugin', 'params': {}},
                                {
                                    'clazz': 'KnnFilter',
                                    'params': {
                                        'n_neighbors': [2, 3, 4, 5, 6],
                                        'weights': 'uniform',
                                        'algorithm': 'auto',
                                        'leaf_size': 30,
                                        'p': 2,
                                        'metric': 'minkowski',
                                        'metric_params': None,
                                        'n_jobs': None
                                    },
                                    '_grid': {'n_neighbors': [2, 3, 4, 5, 6]}
                                }
                            ],
                            'metrics': [
                                {'clazz': 'F1', 'params': {'average': 'weighted'}},
                                {'clazz': 'Precission', 'params': {'average': 'weighted'}},
                                {'clazz': 'Recall', 'params': {'average': 'weighted'}}
                            ],
                            'overwrite': False,
                            'store': False,
                            'log': False
                        }
                    }
                }
            }
        },
        'x_dataset': {'value': 'Iris X train data'},
        'y_dataset': {'value': 'Iris y train data'}
    },
    'method': 'grid',
    'metric': {'name': 'F1', 'goal': 'maximize'}
}
_____________________________________________________
Create sweep with ID: kr0p2w24
Sweep URL: https://wandb.ai/citius-irlab/test_project/sweeps/kr0p2w24
wandb: Agent Starting Run: qvhp71sa with config:
wandb: filters: {'KnnFilter': {'n_neighbors': 2}}
wandb: pipeline: {'clazz': 'KFoldSplitter', 'params': {'n_splits': 2, 'pipeline': {'clazz': 'F3Pipeline', 'params': {'filters': [{'clazz': 'StandardScalerPlugin', 'params': {}}, {'_grid': {'n_neighbors': [2, 3, 4, 5, 6]}, 'clazz': 'KnnFilter', 'params': {'algorithm': 'auto', 'leaf_size': 30, 'metric': 'minkowski', 'metric_params': None, 'n_jobs': None, 'n_neighbors': [2, 3, 4, 5, 6], 'p': 2, 'weights': 'uniform'}}], 'log': False, 'metrics': [{'clazz': 'F1', 'params': {'average': 'weighted'}}, {'clazz': 'Precission', 'params': {'average': 'weighted'}}, {'clazz': 'Recall', 'params': {'average': 'weighted'}}], 'overwrite': False, 'store': False}}, 'random_state': 42, 'shuffle': True}}
wandb: x_dataset: Iris X train data
wandb: y_dataset: Iris y train data
Tracking run with wandb version 0.19.9
Run data is saved locally in
/home/manuel.couto.pintos/Documents/code/framework3/docs/examples/notebooks/wandb/run-20250416_170907-qvhp71sa
Syncing run swift-sweep-1 to Weights & Biases (docs)
Sweep page: https://wandb.ai/citius-irlab/test_project/sweeps/kr0p2w24
View project at https://wandb.ai/citius-irlab/test_project
Run history:
F1 | ▁ |
Run summary:
F1 | 0.90865 |
View run swift-sweep-1 at: https://wandb.ai/citius-irlab/test_project/runs/qvhp71sa
View project at: https://wandb.ai/citius-irlab/test_project
Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
Find logs at:
./wandb/run-20250416_170907-qvhp71sa/logs
wandb: Agent Starting Run: bv8epurg with config:
wandb: filters: {'KnnFilter': {'n_neighbors': 3}}
wandb: pipeline: {'clazz': 'KFoldSplitter', 'params': {'n_splits': 2, 'pipeline': {'clazz': 'F3Pipeline', 'params': {'filters': [{'clazz': 'StandardScalerPlugin', 'params': {}}, {'_grid': {'n_neighbors': [2, 3, 4, 5, 6]}, 'clazz': 'KnnFilter', 'params': {'algorithm': 'auto', 'leaf_size': 30, 'metric': 'minkowski', 'metric_params': None, 'n_jobs': None, 'n_neighbors': [2, 3, 4, 5, 6], 'p': 2, 'weights': 'uniform'}}], 'log': False, 'metrics': [{'clazz': 'F1', 'params': {'average': 'weighted'}}, {'clazz': 'Precission', 'params': {'average': 'weighted'}}, {'clazz': 'Recall', 'params': {'average': 'weighted'}}], 'overwrite': False, 'store': False}}, 'random_state': 42, 'shuffle': True}}
wandb: x_dataset: Iris X train data
wandb: y_dataset: Iris y train data
Tracking run with wandb version 0.19.9
Run data is saved locally in
/home/manuel.couto.pintos/Documents/code/framework3/docs/examples/notebooks/wandb/run-20250416_170913-bv8epurg
Syncing run driven-sweep-2 to Weights & Biases (docs)
Sweep page: https://wandb.ai/citius-irlab/test_project/sweeps/kr0p2w24
View project at https://wandb.ai/citius-irlab/test_project
Run history:
F1 | ▁ |
Run summary:
F1 | 0.92541 |
View run driven-sweep-2 at: https://wandb.ai/citius-irlab/test_project/runs/bv8epurg
View project at: https://wandb.ai/citius-irlab/test_project
Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
Find logs at:
./wandb/run-20250416_170913-bv8epurg/logs
wandb: Agent Starting Run: y6ebmuh1 with config:
wandb: filters: {'KnnFilter': {'n_neighbors': 4}}
wandb: pipeline: {'clazz': 'KFoldSplitter', 'params': {'n_splits': 2, 'pipeline': {'clazz': 'F3Pipeline', 'params': {'filters': [{'clazz': 'StandardScalerPlugin', 'params': {}}, {'_grid': {'n_neighbors': [2, 3, 4, 5, 6]}, 'clazz': 'KnnFilter', 'params': {'algorithm': 'auto', 'leaf_size': 30, 'metric': 'minkowski', 'metric_params': None, 'n_jobs': None, 'n_neighbors': [2, 3, 4, 5, 6], 'p': 2, 'weights': 'uniform'}}], 'log': False, 'metrics': [{'clazz': 'F1', 'params': {'average': 'weighted'}}, {'clazz': 'Precission', 'params': {'average': 'weighted'}}, {'clazz': 'Recall', 'params': {'average': 'weighted'}}], 'overwrite': False, 'store': False}}, 'random_state': 42, 'shuffle': True}}
wandb: x_dataset: Iris X train data
wandb: y_dataset: Iris y train data
Tracking run with wandb version 0.19.9
Run data is saved locally in
/home/manuel.couto.pintos/Documents/code/framework3/docs/examples/notebooks/wandb/run-20250416_170918-y6ebmuh1
Syncing run vague-sweep-3 to Weights & Biases (docs)
Sweep page: https://wandb.ai/citius-irlab/test_project/sweeps/kr0p2w24
View project at https://wandb.ai/citius-irlab/test_project
Run history:
F1 | ▁ |
Run summary:
F1 | 0.93372 |
View run vague-sweep-3 at: https://wandb.ai/citius-irlab/test_project/runs/y6ebmuh1
View project at: https://wandb.ai/citius-irlab/test_project
Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
Find logs at:
./wandb/run-20250416_170918-y6ebmuh1/logs
wandb: Agent Starting Run: srq9t0tk with config:
wandb: filters: {'KnnFilter': {'n_neighbors': 5}}
wandb: pipeline: {'clazz': 'KFoldSplitter', 'params': {'n_splits': 2, 'pipeline': {'clazz': 'F3Pipeline', 'params': {'filters': [{'clazz': 'StandardScalerPlugin', 'params': {}}, {'_grid': {'n_neighbors': [2, 3, 4, 5, 6]}, 'clazz': 'KnnFilter', 'params': {'algorithm': 'auto', 'leaf_size': 30, 'metric': 'minkowski', 'metric_params': None, 'n_jobs': None, 'n_neighbors': [2, 3, 4, 5, 6], 'p': 2, 'weights': 'uniform'}}], 'log': False, 'metrics': [{'clazz': 'F1', 'params': {'average': 'weighted'}}, {'clazz': 'Precission', 'params': {'average': 'weighted'}}, {'clazz': 'Recall', 'params': {'average': 'weighted'}}], 'overwrite': False, 'store': False}}, 'random_state': 42, 'shuffle': True}}
wandb: x_dataset: Iris X train data
wandb: y_dataset: Iris y train data
Tracking run with wandb version 0.19.9
Run data is saved locally in
/home/manuel.couto.pintos/Documents/code/framework3/docs/examples/notebooks/wandb/run-20250416_170924-srq9t0tk
Syncing run apricot-sweep-4 to Weights & Biases (docs)
Sweep page: https://wandb.ai/citius-irlab/test_project/sweeps/kr0p2w24
View project at https://wandb.ai/citius-irlab/test_project
Run history:
F1 | ▁ |
Run summary:
F1 | 0.91695 |
View run apricot-sweep-4 at: https://wandb.ai/citius-irlab/test_project/runs/srq9t0tk
View project at: https://wandb.ai/citius-irlab/test_project
Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
Find logs at:
./wandb/run-20250416_170924-srq9t0tk/logs
wandb: Agent Starting Run: za54vh64 with config:
wandb: filters: {'KnnFilter': {'n_neighbors': 6}}
wandb: pipeline: {'clazz': 'KFoldSplitter', 'params': {'n_splits': 2, 'pipeline': {'clazz': 'F3Pipeline', 'params': {'filters': [{'clazz': 'StandardScalerPlugin', 'params': {}}, {'_grid': {'n_neighbors': [2, 3, 4, 5, 6]}, 'clazz': 'KnnFilter', 'params': {'algorithm': 'auto', 'leaf_size': 30, 'metric': 'minkowski', 'metric_params': None, 'n_jobs': None, 'n_neighbors': [2, 3, 4, 5, 6], 'p': 2, 'weights': 'uniform'}}], 'log': False, 'metrics': [{'clazz': 'F1', 'params': {'average': 'weighted'}}, {'clazz': 'Precission', 'params': {'average': 'weighted'}}, {'clazz': 'Recall', 'params': {'average': 'weighted'}}], 'overwrite': False, 'store': False}}, 'random_state': 42, 'shuffle': True}}
wandb: x_dataset: Iris X train data
wandb: y_dataset: Iris y train data
Tracking run with wandb version 0.19.9
Run data is saved locally in
/home/manuel.couto.pintos/Documents/code/framework3/docs/examples/notebooks/wandb/run-20250416_170929-za54vh64
Syncing run silvery-sweep-5 to Weights & Biases (docs)
Sweep page: https://wandb.ai/citius-irlab/test_project/sweeps/kr0p2w24
View project at https://wandb.ai/citius-irlab/test_project
Run history:
F1 | ▁ |
Run summary:
F1 | 0.93284 |
View run silvery-sweep-5 at: https://wandb.ai/citius-irlab/test_project/runs/za54vh64
View project at: https://wandb.ai/citius-irlab/test_project
Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
Find logs at:
./wandb/run-20250416_170929-za54vh64/logs
wandb: Sweep Agent: Waiting for job.
wandb: Sweep Agent: Exiting.
wandb: Sorting runs by -summary_metrics.F1
{
    'filters': {'KnnFilter': {'n_neighbors': 4}},
    'pipeline': {
        'clazz': 'KFoldSplitter',
        'params': {
            'shuffle': True,
            'n_splits': 2,
            'pipeline': {
                'clazz': 'F3Pipeline',
                'params': {
                    'log': False,
                    'store': False,
                    'filters': [
                        {'clazz': 'StandardScalerPlugin', 'params': {}},
                        {
                            '_grid': {'n_neighbors': [2, 3, 4, 5, 6]},
                            'clazz': 'KnnFilter',
                            'params': {
                                'p': 2,
                                'metric': 'minkowski',
                                'n_jobs': None,
                                'weights': 'uniform',
                                'algorithm': 'auto',
                                'leaf_size': 30,
                                'n_neighbors': [2, 3, 4, 5, 6],
                                'metric_params': None
                            }
                        }
                    ],
                    'metrics': [
                        {'clazz': 'F1', 'params': {'average': 'weighted'}},
                        {'clazz': 'Precission', 'params': {'average': 'weighted'}},
                        {'clazz': 'Recall', 'params': {'average': 'weighted'}}
                    ],
                    'overwrite': False
                }
            },
            'random_state': 42
        }
    },
    'x_dataset': 'Iris X train data',
    'y_dataset': 'Iris y train data'
}
____________________________________________________________________________________________________
Fitting pipeline...
****************************************************************************************************
*StandardScalerPlugin({})
*KnnFilter({'n_neighbors': 4, 'weights': 'uniform', 'algorithm': 'auto', 'leaf_size': 30, 'p': 2, 'metric': 'minkowski', 'metric_params': None, 'n_jobs': None})
____________________________________________________________________________________________________
Predicting with KFold Splitter......
****************************************************************************************************
F3Pipeline(
    filters=[
        StandardScalerPlugin(),
        KnnFilter(
            n_neighbors=4,
            weights='uniform',
            algorithm='auto',
            leaf_size=30,
            p=2,
            metric='minkowski',
            metric_params=None,
            n_jobs=None
        )
    ],
    metrics=[F1(average='weighted'), Precission(average='weighted'), Recall(average='weighted')],
    overwrite=False,
    store=False,
    log=False
)
____________________________________________________________________________________________________
Predicting pipeline...
****************************************************************************************************
*StandardScalerPlugin({})
*KnnFilter({'n_neighbors': 4, 'weights': 'uniform', 'algorithm': 'auto', 'leaf_size': 30, 'p': 2, 'metric': 'minkowski', 'metric_params': None, 'n_jobs': None})
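The sweep configuration printed at the start of the output above uses W&B's sweep format: a `method`, a `metric` with a `name` and a `goal`, and nested `parameters` whose leaves list the candidate `values`. As a rough sketch of how a grid such as `{"n_neighbors": [2, 3, 4, 5, 6]}` maps onto that shape, consider the snippet below. The helpers `build_sweep_config` and `grid_combinations` are hypothetical names used for illustration; this is not framework3's actual implementation, and the `pipeline`, `x_dataset`, and `y_dataset` entries of the real configuration are omitted.

```python
from itertools import product


def build_sweep_config(filter_name, grid, metric_name="F1", goal="maximize"):
    """Hypothetical helper: nest a parameter grid like the printed sweep config."""
    return {
        "method": "grid",
        "metric": {"name": metric_name, "goal": goal},
        "parameters": {
            "filters": {
                "parameters": {
                    filter_name: {
                        "parameters": {
                            name: {"values": list(values)}
                            for name, values in grid.items()
                        }
                    }
                }
            }
        },
    }


def grid_combinations(grid):
    """Enumerate every parameter combination a grid search would try."""
    names = list(grid)
    return [
        dict(zip(names, combo))
        for combo in product(*(grid[name] for name in names))
    ]


grid = {"n_neighbors": [2, 3, 4, 5, 6]}
config = build_sweep_config("KnnFilter", grid)
combos = grid_combinations(grid)
print(len(combos))  # 5, one per run in the sweep above
```

Because the method is `grid`, the number of runs equals the number of combinations, which is why the agent started exactly five runs before exiting.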
In [6]:
wandb_pipeline.evaluate(X_test, y_test, _y)
____________________________________________________________________________________________________
Evaluating pipeline......
****************************************************************************************************
Out[6]:
{'F1': 1.0, 'Precission': 1.0, 'Recall': 1.0}
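The logs show the three metrics configured with `average='weighted'`. Assuming this has the usual meaning (as in scikit-learn's `f1_score`), the per-class F1 scores are averaged with weights proportional to the number of true samples of each class. The following pure-Python sketch illustrates that formula; `weighted_f1` is a hypothetical helper, not part of framework3, and it expects a non-empty list of labels.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores (illustrative sketch)."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denominator = 2 * tp + fp + fn
        f1 = 2 * tp / denominator if denominator else 0.0
        score += (count / total) * f1
    return score


perfect = weighted_f1([0, 1, 2, 2], [0, 1, 2, 2])
partial = weighted_f1([0, 0, 1, 1], [0, 1, 1, 1])
print(perfect)            # 1.0
print(round(partial, 4))  # 0.7333
```

A score of 1.0 on all three metrics, as in the output above, means every test sample was classified correctly.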
Wandb dashboard
As with Optuna, we can analyze the influence of each parameter on the selected metric. However, unlike Optuna, Wandb offers a paid version with additional, more advanced features.
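The same kind of comparison can also be made offline from the values in the run summaries logged above. The sketch below simply copies the five F1 scores by hand and ranks them; it does not query the W&B service, and the variable names are illustrative.

```python
# F1 per n_neighbors, copied from the "Run summary" lines of the five sweep runs.
f1_by_n_neighbors = {
    2: 0.90865,
    3: 0.92541,
    4: 0.93372,
    5: 0.91695,
    6: 0.93284,
}

# Rank the candidate values from best to worst F1.
ranking = sorted(f1_by_n_neighbors.items(), key=lambda item: item[1], reverse=True)
for n_neighbors, f1 in ranking:
    print(f"n_neighbors={n_neighbors}: F1={f1:.5f}")

best_n_neighbors = ranking[0][0]
spread = ranking[0][1] - ranking[-1][1]
print(best_n_neighbors)  # 4, matching the configuration selected by the sweep
print(round(spread, 5))  # 0.02507
```

The small spread between the best and worst runs suggests that, for this dataset and this range, `n_neighbors` has only a modest effect on the cross-validated F1.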