LabChain API Documentation¶
Welcome to the API documentation for LabChain. This guide covers the modules, classes, and functions that form the backbone of LabChain, enabling you to build, extend, and customize ML experimentation workflows.
Table of Contents¶
- Base Classes
- Container & Dependency Injection
- Persistent Storage & Remote Injection ⚡ New
- Plugins
- Pipelines
- Filters
- Metrics
- Optimizers
- Splitters
- Storage
- Utilities
- Using the API
Base Classes¶
The foundation of LabChain is built on these abstract base classes:
- Types - Core data structures and type definitions.
- Classes - Abstract base class for all components.
- Pipeline - Base class for creating pipelines.
- Filter - Abstract class for all filter implementations.
- Metric - Base class for metric implementations.
- Optimizer - Abstract base for optimization algorithms.
- Splitter - Base class for data splitting strategies.
- Factory - Factory classes for component creation.
- Storage - Abstract base for storage implementations.
Container & Dependency Injection¶
The core of LabChain's component management:
- Container - Main class for dependency injection and component management.
- Overload - Utilities for method overloading in the container.
Persistent Storage & Remote Injection¶
Experimental Feature
Remote Injection is currently an experimental feature. See the Remote Injection Guide for important limitations and best practices.
Classes and systems for persistent class storage with version control:
- Remote Injection Guide ⚡ - Complete guide to deploying pipelines without source code
- PetClassManager - Manager for class serialization and storage operations
- PetFactory - Persistent factory with automatic version tracking and lazy loading
Quick Example¶
```python
from labchain import Container
from labchain.base import BaseFilter

# Enable persistence for custom classes
@Container.bind(persist=True)
class MyCustomFilter(BaseFilter):
    def predict(self, x):
        return x * 2

# Push to storage
Container.ppif.push_all()

# On a remote server (no source code needed!)
from labchain.base import BasePlugin

pipeline = BasePlugin.build_from_dump(config, Container.ppif)
```
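Here `config` is assumed to be the serialized pipeline configuration produced when the pipeline was dumped on the original machine; see the Remote Injection Guide for the full workflow.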
Plugins¶
Concrete, ready-to-use implementations of the base classes above. The filter and metric plugins imported under Import Shortcuts below (e.g. `labchain.plugins.filters`) live in this package.
Pipelines¶
Pipelines orchestrate the data flow through various processing steps (a minimal composition sketch follows the list):
- Parallel Pipelines
    - MonoPipeline - For parallel processing of independent tasks.
    - HPCPipeline - Optimized for high-performance computing environments.
- Sequential Pipelines
    - F3Pipeline - The basic sequential pipeline.
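As an illustration, here is a minimal sketch composing a sequential `F3Pipeline` from plugins listed under Import Shortcuts; it assumes pipelines expose the same `fit`/`predict` interface as filters, and the datasets are placeholders.

```python
from labchain.pipeline import F3Pipeline
from labchain.plugins.filters import StandardScalerPlugin, KnnFilter
from labchain.plugins.metrics import F1

# Filters run in sequence; metrics evaluate the final output
pipeline = F3Pipeline(
    filters=[StandardScalerPlugin(), KnnFilter()],
    metrics=[F1()],
)

# x_train, y_train, x_test are placeholder datasets (e.g. XYData instances)
pipeline.fit(x_train, y_train)
predictions = pipeline.predict(x_test)
```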
Filters¶
Modular processing units that can be composed together within pipelines:
- Classification Filters
- Clustering Filters
- Regression Filters
- Transformation Filters
- Text Processing Filters
- Cache Filters
    - CachedFilter
- Grid Search Filters
    - GridSearchCVFilter
Metrics¶
Metrics evaluate model performance across tasks such as classification and clustering; a minimal usage sketch follows.
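A metric plugin can be instantiated and evaluated directly; this sketch assumes the `evaluate(x_data, y_true, y_pred)` signature shown under Using the API, with placeholder data arrays.

```python
from labchain.plugins.metrics import F1

# x_data, y_true, y_pred are placeholder inputs, labels, and predictions
f1 = F1()
score = f1.evaluate(x_data, y_true, y_pred)
```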
Optimizers¶
Optimizers help fine-tune hyperparameters for optimal performance.
Splitters¶
Splitters divide the dataset for cross-validation and evaluation.
Storage¶
Storage plugins for data persistence and remote class storage (a configuration sketch follows the list):
- Local Storage - Local filesystem storage
- S3 Storage - Amazon S3 cloud storage
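A minimal sketch of pointing the Container at a backend; `S3Storage(bucket=...)` appears in the persistent registration example below, while the `path` argument to `LocalStorage` is an assumed parameter name.

```python
from labchain import Container
from labchain.storage import LocalStorage, S3Storage

# Local filesystem backend; the `path` argument is an assumed parameter name
Container.storage = LocalStorage(path="./labchain-storage")

# Amazon S3 backend, as used in the persistent registration example below
Container.storage = S3Storage(bucket="my-ml-models")
```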
Utilities¶
Additional utility functions and helpers that support the framework:
- PySpark Utilities
- Weights & Biases Integration
- Typeguard for Notebooks
- Scikit-learn Estimator Utilities
- General Utilities
Using the API¶
Standard Component Registration¶
To use any component of LabChain, import it from the relevant module and register it with the Container:
```python
from labchain.container import Container
from labchain.base import BaseFilter, BasePipeline, BaseMetric

@Container.bind()
class MyFilter(BaseFilter):
    def fit(self, x, y):
        pass

    def predict(self, x):
        return x

@Container.bind()
class MyPipeline(BasePipeline):
    # Custom pipeline implementation
    pass

@Container.bind()
class MyMetric(BaseMetric):
    def evaluate(self, x_data, y_true, y_pred):
        return 0.95

# Retrieve components
my_filter = Container.ff["MyFilter"]()
my_pipeline = Container.pf["MyPipeline"]()
my_metric = Container.mf["MyMetric"]()
```
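As the retrieval lines suggest, `ff`, `pf`, and `mf` are the Container's filter, pipeline, and metric factories; components are looked up by class name and returned as constructors.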
Persistent Component Registration (Experimental)¶
For components that need to be deployed remotely or shared across environments:
```python
from labchain.container import Container
from labchain.base import BaseFilter
from labchain.storage import S3Storage

# Configure shared storage
Container.storage = S3Storage(bucket="my-ml-models")

# Register with persistence enabled
@Container.bind(persist=True)
class MyPersistentFilter(BaseFilter):
    def __init__(self, threshold: float = 0.5):
        super().__init__(threshold=threshold)

    def predict(self, x):
        return x.value > self.threshold

# Push to storage
Container.ppif.push_all()

# Later, on any machine with access to the same storage;
# the class is automatically loaded from storage if not in memory
my_filter = Container.ppif["MyPersistentFilter"]()
```
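The persistent class manager (`Container.pcm`) also exposes the version-tracking helpers listed in the Quick Reference below; a minimal sketch, with comments describing the presumed semantics.

```python
# Compare the local class definition against the stored version
status = Container.pcm.check_status(MyPersistentFilter)

# Content hash used for version tracking
class_hash = Container.pcm.get_class_hash(MyPersistentFilter)
```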
API Organization¶
By Functionality¶
- Data Processing: Filters, Transformations
- Model Evaluation: Metrics, Splitters
- Workflow Orchestration: Pipelines, Optimizers
- Persistence & Storage: Storage Backends, Remote Injection
- Infrastructure: Container, Base Classes
By Use Case¶
- Classification Tasks: Classification Filters, Classification Metrics
- Clustering Tasks: Clustering Filters, Clustering Metrics
- Hyperparameter Tuning: GridOptimizer, OptunaOptimizer
- Distributed Computing: HPCPipeline, PySpark Utilities
- Remote Deployment: Remote Injection, S3 Storage
Quick Reference¶
Most Common Operations¶
| Operation | Code |
|---|---|
| Register a filter | `@Container.bind()`<br>`class MyFilter(BaseFilter): ...` |
| Create a pipeline | `F3Pipeline(filters=[...], metrics=[...])` |
| Enable persistence | `@Container.bind(persist=True)` |
| Push to storage | `Container.ppif.push_all()` |
| Load from storage | `Container.ppif["ClassName"]` |
| Reconstruct a pipeline | `BasePlugin.build_from_dump(config, Container.ppif)` |
| Check version status | `Container.pcm.check_status(MyClass)` |
| Get class hash | `Container.pcm.get_class_hash(MyClass)` |
Import Shortcuts¶
```python
# Core functionality
from labchain import Container
from labchain.base import BaseFilter, BasePipeline, BaseMetric, XYData

# Common pipelines
from labchain.pipeline import F3Pipeline, MonoPipeline, HPCPipeline

# Storage
from labchain.storage import LocalStorage, S3Storage

# Common filters
from labchain.plugins.filters import (
    StandardScalerPlugin,
    PCAPlugin,
    KnnFilter,
    ClassifierSVMPlugin,
)

# Common metrics
from labchain.plugins.metrics import F1, Precision, Recall
```
Additional Resources¶
- 📘 Quick Start Guide
- 🎓 Tutorials & Examples
- 🏗️ Architecture Overview
- ⚡ Remote Injection Guide (Experimental)
Contributing to the Documentation
Found an error or want to improve the documentation? Contributions are welcome!
- 📝 Edit on GitHub
- 🐛 Report issues on GitHub Issues