OptunaOptimizer

`framework3.plugins.optimizer.optuna_optimizer` ¶

`OptunaOptimizer` ¶

Bases: BaseOptimizer

Optuna-based optimizer for hyperparameter tuning.

This class implements hyperparameter optimization using the Optuna framework. It allows for efficient searching of hyperparameter spaces for machine learning models.

Key Features

Supports various types of hyperparameters (categorical, integer, float)
Allows for customizable optimization direction (minimize or maximize)
Can resume previous studies or start new ones
Integrates with the Framework3 pipeline system

Usage

The OptunaOptimizer can be used to optimize hyperparameters of a machine learning pipeline:

from framework3.plugins.optimizer import OptunaOptimizer
from framework3.base import XYData

# Assuming you have a pipeline and data
pipeline = ...
x_data = XYData(...)
y_data = XYData(...)

optimizer = OptunaOptimizer(direction="minimize", n_trials=100)
optimizer.optimize(pipeline)
optimizer.fit(x_data, y_data)

best_pipeline = optimizer.pipeline

Attributes:

Name	Type	Description
`direction`	`str`	The direction of optimization ("minimize" or "maximize").
`n_trials`	`int`	The number of trials for the optimization process.
`load_if_exists`	`bool`	Whether to load an existing study if one exists.
`reset_study`	`bool`	Whether to reset the study before optimization.
`pipeline`	`BaseFilter \| None`	The pipeline to be optimized.
`study_name`	`str \| None`	The name of the Optuna study.
`storage`	`str \| None`	The storage URL for the Optuna study.

Methods:

Name	Description
`optimize`	BaseFilter): Set up the optimization process for a given pipeline.
`fit`	XYData, y: XYData \| None): Perform the optimization and fit the best pipeline.
`predict`	XYData) -> XYData: Make predictions using the optimized pipeline.
`evaluate`	XYData, y_true: XYData \| None, y_pred: XYData) -> Dict[str, Any]: Evaluate the optimized pipeline.

Source code in framework3/plugins/optimizer/optuna_optimizer.py

@Container.bind()
class OptunaOptimizer(BaseOptimizer):
    """
    Optuna-based optimizer for hyperparameter tuning.

    This class implements hyperparameter optimization using the Optuna framework.
    It allows for efficient searching of hyperparameter spaces for machine learning models.

    Key Features:
        - Supports various types of hyperparameters (categorical, integer, float)
        - Allows for customizable optimization direction (minimize or maximize)
        - Can resume previous studies or start new ones
        - Integrates with the Framework3 pipeline system

    Usage:
        The OptunaOptimizer can be used to optimize hyperparameters of a machine learning pipeline:

        ```python
        from framework3.plugins.optimizer import OptunaOptimizer
        from framework3.base import XYData

        # Assuming you have a pipeline and data
        pipeline = ...
        x_data = XYData(...)
        y_data = XYData(...)

        optimizer = OptunaOptimizer(direction="minimize", n_trials=100)
        optimizer.optimize(pipeline)
        optimizer.fit(x_data, y_data)

        best_pipeline = optimizer.pipeline
        ```

    Attributes:
        direction (str): The direction of optimization ("minimize" or "maximize").
        n_trials (int): The number of trials for the optimization process.
        load_if_exists (bool): Whether to load an existing study if one exists.
        reset_study (bool): Whether to reset the study before optimization.
        pipeline (BaseFilter | None): The pipeline to be optimized.
        study_name (str | None): The name of the Optuna study.
        storage (str | None): The storage URL for the Optuna study.

    Methods:
        optimize(pipeline: BaseFilter): Set up the optimization process for a given pipeline.
        fit(x: XYData, y: XYData | None): Perform the optimization and fit the best pipeline.
        predict(x: XYData) -> XYData: Make predictions using the optimized pipeline.
        evaluate(x_data: XYData, y_true: XYData | None, y_pred: XYData) -> Dict[str, Any]:
            Evaluate the optimized pipeline.
    """

    def __init__(
        self,
        direction: str,
        n_trials: int = 2,
        load_if_exists: bool = False,
        reset_study: bool = False,
        pipeline: BaseFilter | None = None,
        study_name: str | None = None,
        storage: str | None = None,
    ):
        """
        Initialize the OptunaOptimizer.

        Args:
            direction (str): The direction of optimization ("minimize" or "maximize").
            n_trials (int): The number of trials for the optimization process.
            load_if_exists (bool): Whether to load an existing study if one exists.
            reset_study (bool): Whether to reset the study before optimization.
            pipeline (BaseFilter | None): The pipeline to be optimized.
            study_name (str | None): The name of the Optuna study.
            storage (str | None): The storage URL for the Optuna study.
        """
        super().__init__(direction=direction, study_name=study_name, storage=storage)
        self.direction = direction
        self.study_name = study_name
        self.storage = storage
        self.pipeline = pipeline
        self.n_trials = n_trials
        self.load_if_exists = load_if_exists
        self.reset_study = reset_study

    def optimize(self, pipeline: BaseFilter):
        """
        Set up the optimization process for a given pipeline.

        This method prepares the Optuna study for optimization.

        Args:
            pipeline (BaseFilter): The pipeline to be optimized.
        """
        self.pipeline = pipeline
        self.pipeline._verbose = False

        if (
            self.reset_study
            and self.study_name is not None
            and self.storage is not None
        ):
            studies = optuna.get_all_study_summaries(storage=self.storage)

            if self.study_name in [study.study_name for study in studies]:
                optuna.delete_study(study_name=self.study_name, storage=self.storage)

        self._study = optuna.create_study(
            study_name=self.study_name,
            direction=self.direction,
            storage=self.storage,
            load_if_exists=self.load_if_exists,
        )

    def get_grid(self, aux: Dict[str, Any], f: Callable):
        """
        Recursively process the grid configuration of a pipeline or filter.

        This method traverses the configuration dictionary and applies the provided
        callable to each grid parameter.

        Args:
            aux (Dict[str, Any]): The configuration dictionary to process.
            f (Callable): A function to apply to each grid parameter.

        Note:
            This method modifies the input dictionary in-place.
        """
        match aux["params"]:
            case {"filters": filters, **r}:
                for filter_config in filters:
                    self.get_grid(filter_config, f)
            case {"pipeline": pipeline, **r}:  # noqa: F841
                self.get_grid(pipeline, f)
            case p_params:
                if "_grid" in aux:
                    for param, value in aux["_grid"].items():
                        value = f(param, value)
                        p_params.update({param: value})

    def build_pipeline(
        self, dumped_pipeline: Dict[str, Any], f: Callable
    ) -> BaseFilter:
        """
        Build a pipeline from a dumped configuration.

        This method processes the dumped pipeline configuration, applies the provided
        callable to the grid parameters, and constructs a new BaseFilter object.

        Args:
            dumped_pipeline (Dict[str, Any]): The dumped pipeline configuration.
            f (Callable): A function to apply to each grid parameter.

        Returns:
            BaseFilter: The constructed pipeline.

        Note:
            This method uses the Container.pif for dependency injection when building
            the pipeline components.
        """
        self.get_grid(dumped_pipeline, f)

        pipeline: BaseFilter = cast(
            BaseFilter, BasePlugin.build_from_dump(dumped_pipeline, Container.pif)
        )
        return pipeline

    def fit(self, x: XYData, y: XYData | None = None):
        """
        Perform the optimization and fit the best pipeline.

        This method runs the Optuna optimization process and fits the best found pipeline.

        Args:
            x (XYData): The input features.
            y (XYData | None): The target values (if applicable).

        Raises:
            ValueError: If the pipeline is not defined before fitting.
        """
        self._print_acction("Fitting with OptunaOptimizer...")
        if self._verbose:
            print(self.pipeline)

        if self.pipeline is not None:
            dumped_pipeline = self.pipeline.item_dump(include=["_grid"])
            print(dumped_pipeline)

            def objective(trial) -> Union[float, Sequence[float]]:
                def matcher(k, v):
                    match v:
                        case list():
                            return trial.suggest_categorical(k, v)
                        case dict():
                            if type(v["low"]) is int and type(v["high"]) is int:
                                return trial.suggest_int(k, v["low"], v["high"])
                            elif type(v["low"]) is float and type(v["high"]) is float:
                                return trial.suggest_float(k, v["low"], v["high"])
                            else:
                                raise ValueError(
                                    f"Inconsistent types in tuple: {k}: {v}"
                                )
                        case (min_v, max_v):
                            if type(min_v) is int and type(max_v) is int:
                                return trial.suggest_int(k, min_v, max_v)
                            elif type(min_v) is float and type(max_v) is float:
                                return trial.suggest_float(k, min_v, max_v)
                            else:
                                raise ValueError(
                                    f"Inconsistent types in tuple: {k}: {v}"
                                )
                        case _:
                            raise ValueError(f"Unsupported type in grid: {k}: {v}")

                pipeline: BaseFilter = self.build_pipeline(dumped_pipeline, matcher)
                pipeline.verbose(False)

                match pipeline.fit(x, y):
                    case None:
                        return float(
                            next(
                                iter(
                                    pipeline.evaluate(
                                        x, y, pipeline.predict(x)
                                    ).values()
                                )
                            )
                        )
                    case float() as loss:
                        return loss
                    case _:
                        raise ValueError("Unsupported type in pipeline.fit")

            self._study.optimize(
                objective, n_trials=self.n_trials, show_progress_bar=True
            )

            best_params = self._study.best_params
            if best_params:
                print(f"Best params: {best_params}")
                pipeline = self.build_pipeline(
                    dumped_pipeline, lambda k, _: best_params[k]
                ).unwrap()
                pipeline.fit(x, y)
                self.pipeline = pipeline
            else:
                self.pipeline.unwrap().fit(x, y)
        else:
            raise ValueError("Pipeline must be defined before fitting")

    def predict(self, x: XYData) -> XYData:
        """
        Make predictions using the optimized pipeline.

        Args:
            x (XYData): The input features.

        Returns:
            XYData: The predictions.

        Raises:
            ValueError: If the pipeline is not fitted before predicting.
        """
        self._print_acction("Predicting with OptunaOptimizer...")
        if self._verbose:
            print(self.pipeline)

        if self.pipeline is not None:
            return self.pipeline.predict(x)
        else:
            raise ValueError("Pipeline must be fitted before predicting")

    def evaluate(
        self, x_data: XYData, y_true: XYData | None, y_pred: XYData
    ) -> Dict[str, Any]:
        """
        Evaluate the optimized pipeline.

        Args:
            x_data (XYData): The input features.
            y_true (XYData | None): The true target values (if applicable).
            y_pred (XYData): The predicted values.

        Returns:
            Dict[str, Any]: A dictionary containing evaluation metrics.

        Raises:
            ValueError: If the pipeline is not fitted before evaluating.
        """
        if self.pipeline is not None:
            return self.pipeline.evaluate(x_data, y_true, y_pred)
        else:
            raise ValueError("Pipeline must be fitted before evaluating")

    def start(
        self, x: XYData, y: XYData | None, X_: XYData | None
    ) -> XYData | None: ...

    def finish(self) -> None: ...

`direction = direction` `instance-attribute` ¶

`load_if_exists = load_if_exists` `instance-attribute` ¶

`n_trials = n_trials` `instance-attribute` ¶

`pipeline = pipeline` `instance-attribute` ¶

`reset_study = reset_study` `instance-attribute` ¶

`storage = storage` `instance-attribute` ¶

`study_name = study_name` `instance-attribute` ¶

`init(direction, n_trials=2, load_if_exists=False, reset_study=False, pipeline=None, study_name=None, storage=None)` ¶

Initialize the OptunaOptimizer.

Parameters:

Name	Type	Description	Default
`direction`	`str`	The direction of optimization ("minimize" or "maximize").	required
`n_trials`	`int`	The number of trials for the optimization process.	`2`
`load_if_exists`	`bool`	Whether to load an existing study if one exists.	`False`
`reset_study`	`bool`	Whether to reset the study before optimization.	`False`
`pipeline`	`BaseFilter \| None`	The pipeline to be optimized.	`None`
`study_name`	`str \| None`	The name of the Optuna study.	`None`
`storage`	`str \| None`	The storage URL for the Optuna study.	`None`

Source code in framework3/plugins/optimizer/optuna_optimizer.py

def __init__(
    self,
    direction: str,
    n_trials: int = 2,
    load_if_exists: bool = False,
    reset_study: bool = False,
    pipeline: BaseFilter | None = None,
    study_name: str | None = None,
    storage: str | None = None,
):
    """
    Initialize the OptunaOptimizer.

    Args:
        direction (str): The direction of optimization ("minimize" or "maximize").
        n_trials (int): The number of trials for the optimization process.
        load_if_exists (bool): Whether to load an existing study if one exists.
        reset_study (bool): Whether to reset the study before optimization.
        pipeline (BaseFilter | None): The pipeline to be optimized.
        study_name (str | None): The name of the Optuna study.
        storage (str | None): The storage URL for the Optuna study.
    """
    super().__init__(direction=direction, study_name=study_name, storage=storage)
    self.direction = direction
    self.study_name = study_name
    self.storage = storage
    self.pipeline = pipeline
    self.n_trials = n_trials
    self.load_if_exists = load_if_exists
    self.reset_study = reset_study

`build_pipeline(dumped_pipeline, f)` ¶

Build a pipeline from a dumped configuration.

This method processes the dumped pipeline configuration, applies the provided callable to the grid parameters, and constructs a new BaseFilter object.

Parameters:

Name	Type	Description	Default
`dumped_pipeline`	`Dict[str, Any]`	The dumped pipeline configuration.	required
`f`	`Callable`	A function to apply to each grid parameter.	required

Returns:

Name	Type	Description
`BaseFilter`	`BaseFilter`	The constructed pipeline.

Note

This method uses the Container.pif for dependency injection when building the pipeline components.

Source code in framework3/plugins/optimizer/optuna_optimizer.py

def build_pipeline(
    self, dumped_pipeline: Dict[str, Any], f: Callable
) -> BaseFilter:
    """
    Build a pipeline from a dumped configuration.

    This method processes the dumped pipeline configuration, applies the provided
    callable to the grid parameters, and constructs a new BaseFilter object.

    Args:
        dumped_pipeline (Dict[str, Any]): The dumped pipeline configuration.
        f (Callable): A function to apply to each grid parameter.

    Returns:
        BaseFilter: The constructed pipeline.

    Note:
        This method uses the Container.pif for dependency injection when building
        the pipeline components.
    """
    self.get_grid(dumped_pipeline, f)

    pipeline: BaseFilter = cast(
        BaseFilter, BasePlugin.build_from_dump(dumped_pipeline, Container.pif)
    )
    return pipeline

`evaluate(x_data, y_true, y_pred)` ¶

Evaluate the optimized pipeline.

Parameters:

Name	Type	Description	Default
`x_data`	`XYData`	The input features.	required
`y_true`	`XYData \| None`	The true target values (if applicable).	required
`y_pred`	`XYData`	The predicted values.	required

Returns:

Type	Description
`Dict[str, Any]`	Dict[str, Any]: A dictionary containing evaluation metrics.

Raises:

Type	Description
`ValueError`	If the pipeline is not fitted before evaluating.

Source code in framework3/plugins/optimizer/optuna_optimizer.py

def evaluate(
    self, x_data: XYData, y_true: XYData | None, y_pred: XYData
) -> Dict[str, Any]:
    """
    Evaluate the optimized pipeline.

    Args:
        x_data (XYData): The input features.
        y_true (XYData | None): The true target values (if applicable).
        y_pred (XYData): The predicted values.

    Returns:
        Dict[str, Any]: A dictionary containing evaluation metrics.

    Raises:
        ValueError: If the pipeline is not fitted before evaluating.
    """
    if self.pipeline is not None:
        return self.pipeline.evaluate(x_data, y_true, y_pred)
    else:
        raise ValueError("Pipeline must be fitted before evaluating")

`finish()` ¶

Source code in framework3/plugins/optimizer/optuna_optimizer.py

def finish(self) -> None: ...

`fit(x, y=None)` ¶

Perform the optimization and fit the best pipeline.

This method runs the Optuna optimization process and fits the best found pipeline.

Parameters:

Name	Type	Description	Default
`x`	`XYData`	The input features.	required
`y`	`XYData \| None`	The target values (if applicable).	`None`

Raises:

Type	Description
`ValueError`	If the pipeline is not defined before fitting.

Source code in framework3/plugins/optimizer/optuna_optimizer.py

def fit(self, x: XYData, y: XYData | None = None):
    """
    Perform the optimization and fit the best pipeline.

    This method runs the Optuna optimization process and fits the best found pipeline.

    Args:
        x (XYData): The input features.
        y (XYData | None): The target values (if applicable).

    Raises:
        ValueError: If the pipeline is not defined before fitting.
    """
    self._print_acction("Fitting with OptunaOptimizer...")
    if self._verbose:
        print(self.pipeline)

    if self.pipeline is not None:
        dumped_pipeline = self.pipeline.item_dump(include=["_grid"])
        print(dumped_pipeline)

        def objective(trial) -> Union[float, Sequence[float]]:
            def matcher(k, v):
                match v:
                    case list():
                        return trial.suggest_categorical(k, v)
                    case dict():
                        if type(v["low"]) is int and type(v["high"]) is int:
                            return trial.suggest_int(k, v["low"], v["high"])
                        elif type(v["low"]) is float and type(v["high"]) is float:
                            return trial.suggest_float(k, v["low"], v["high"])
                        else:
                            raise ValueError(
                                f"Inconsistent types in tuple: {k}: {v}"
                            )
                    case (min_v, max_v):
                        if type(min_v) is int and type(max_v) is int:
                            return trial.suggest_int(k, min_v, max_v)
                        elif type(min_v) is float and type(max_v) is float:
                            return trial.suggest_float(k, min_v, max_v)
                        else:
                            raise ValueError(
                                f"Inconsistent types in tuple: {k}: {v}"
                            )
                    case _:
                        raise ValueError(f"Unsupported type in grid: {k}: {v}")

            pipeline: BaseFilter = self.build_pipeline(dumped_pipeline, matcher)
            pipeline.verbose(False)

            match pipeline.fit(x, y):
                case None:
                    return float(
                        next(
                            iter(
                                pipeline.evaluate(
                                    x, y, pipeline.predict(x)
                                ).values()
                            )
                        )
                    )
                case float() as loss:
                    return loss
                case _:
                    raise ValueError("Unsupported type in pipeline.fit")

        self._study.optimize(
            objective, n_trials=self.n_trials, show_progress_bar=True
        )

        best_params = self._study.best_params
        if best_params:
            print(f"Best params: {best_params}")
            pipeline = self.build_pipeline(
                dumped_pipeline, lambda k, _: best_params[k]
            ).unwrap()
            pipeline.fit(x, y)
            self.pipeline = pipeline
        else:
            self.pipeline.unwrap().fit(x, y)
    else:
        raise ValueError("Pipeline must be defined before fitting")

`get_grid(aux, f)` ¶

Recursively process the grid configuration of a pipeline or filter.

This method traverses the configuration dictionary and applies the provided callable to each grid parameter.

Parameters:

Name	Type	Description	Default
`aux`	`Dict[str, Any]`	The configuration dictionary to process.	required
`f`	`Callable`	A function to apply to each grid parameter.	required

Note

This method modifies the input dictionary in-place.

Source code in framework3/plugins/optimizer/optuna_optimizer.py

def get_grid(self, aux: Dict[str, Any], f: Callable):
    """
    Recursively process the grid configuration of a pipeline or filter.

    This method traverses the configuration dictionary and applies the provided
    callable to each grid parameter.

    Args:
        aux (Dict[str, Any]): The configuration dictionary to process.
        f (Callable): A function to apply to each grid parameter.

    Note:
        This method modifies the input dictionary in-place.
    """
    match aux["params"]:
        case {"filters": filters, **r}:
            for filter_config in filters:
                self.get_grid(filter_config, f)
        case {"pipeline": pipeline, **r}:  # noqa: F841
            self.get_grid(pipeline, f)
        case p_params:
            if "_grid" in aux:
                for param, value in aux["_grid"].items():
                    value = f(param, value)
                    p_params.update({param: value})

`optimize(pipeline)` ¶

Set up the optimization process for a given pipeline.

This method prepares the Optuna study for optimization.

Parameters:

Name	Type	Description	Default
`pipeline`	`BaseFilter`	The pipeline to be optimized.	required

Source code in framework3/plugins/optimizer/optuna_optimizer.py

def optimize(self, pipeline: BaseFilter):
    """
    Set up the optimization process for a given pipeline.

    This method prepares the Optuna study for optimization.

    Args:
        pipeline (BaseFilter): The pipeline to be optimized.
    """
    self.pipeline = pipeline
    self.pipeline._verbose = False

    if (
        self.reset_study
        and self.study_name is not None
        and self.storage is not None
    ):
        studies = optuna.get_all_study_summaries(storage=self.storage)

        if self.study_name in [study.study_name for study in studies]:
            optuna.delete_study(study_name=self.study_name, storage=self.storage)

    self._study = optuna.create_study(
        study_name=self.study_name,
        direction=self.direction,
        storage=self.storage,
        load_if_exists=self.load_if_exists,
    )

`predict(x)` ¶

Make predictions using the optimized pipeline.

Parameters:

Name	Type	Description	Default
`x`	`XYData`	The input features.	required

Returns:

Name	Type	Description
`XYData`	`XYData`	The predictions.

Raises:

Type	Description
`ValueError`	If the pipeline is not fitted before predicting.

Source code in framework3/plugins/optimizer/optuna_optimizer.py

def predict(self, x: XYData) -> XYData:
    """
    Make predictions using the optimized pipeline.

    Args:
        x (XYData): The input features.

    Returns:
        XYData: The predictions.

    Raises:
        ValueError: If the pipeline is not fitted before predicting.
    """
    self._print_acction("Predicting with OptunaOptimizer...")
    if self._verbose:
        print(self.pipeline)

    if self.pipeline is not None:
        return self.pipeline.predict(x)
    else:
        raise ValueError("Pipeline must be fitted before predicting")

`start(x, y, X_)` ¶

Source code in framework3/plugins/optimizer/optuna_optimizer.py

def start(
    self, x: XYData, y: XYData | None, X_: XYData | None
) -> XYData | None: ...

OptunaOptimizer

framework3.plugins.optimizer.optuna_optimizer ¶

OptunaOptimizer ¶

direction = direction instance-attribute ¶

load_if_exists = load_if_exists instance-attribute ¶

n_trials = n_trials instance-attribute ¶

pipeline = pipeline instance-attribute ¶

reset_study = reset_study instance-attribute ¶

storage = storage instance-attribute ¶

study_name = study_name instance-attribute ¶

__init__(direction, n_trials=2, load_if_exists=False, reset_study=False, pipeline=None, study_name=None, storage=None) ¶

build_pipeline(dumped_pipeline, f) ¶

evaluate(x_data, y_true, y_pred) ¶

finish() ¶

fit(x, y=None) ¶

get_grid(aux, f) ¶

optimize(pipeline) ¶

predict(x) ¶

start(x, y, X_) ¶

`framework3.plugins.optimizer.optuna_optimizer` ¶

`OptunaOptimizer` ¶

`direction = direction` `instance-attribute` ¶

`load_if_exists = load_if_exists` `instance-attribute` ¶

`n_trials = n_trials` `instance-attribute` ¶

`pipeline = pipeline` `instance-attribute` ¶

`reset_study = reset_study` `instance-attribute` ¶

`storage = storage` `instance-attribute` ¶

`study_name = study_name` `instance-attribute` ¶

`init(direction, n_trials=2, load_if_exists=False, reset_study=False, pipeline=None, study_name=None, storage=None)` ¶

`build_pipeline(dumped_pipeline, f)` ¶

`evaluate(x_data, y_true, y_pred)` ¶

`finish()` ¶

`fit(x, y=None)` ¶

`get_grid(aux, f)` ¶

`optimize(pipeline)` ¶

`predict(x)` ¶

`start(x, y, X_)` ¶