LabChain API Documentation

Welcome to the LabChain API documentation. This guide details the modules, classes, and functions that form the backbone of the framework, so you can build, extend, and customize ML experimentation workflows efficiently.


Table of Contents


Base Classes

The foundation of LabChain is built on these abstract base classes:

  • Types - Core data structures and type definitions.
  • Classes - Abstract base class for all components.
  • Pipeline - Base class for creating pipelines.
  • Filter - Abstract class for all filter implementations.
  • Metric - Base class for metric implementations.
  • Optimizer - Abstract base for optimization algorithms.
  • Splitter - Base class for data splitting strategies.
  • Factory - Factory classes for component creation.
  • Storage - Abstract base for storage implementations.

Container & Dependency Injection

The core of LabChain's component management:

  • Container - Main class for dependency injection and component management.
  • Overload - Utilities for method overloading in the container.

Persistent Storage & Remote Injection

Experimental Feature

Remote Injection is currently an experimental feature. See the Remote Injection Guide for important limitations and best practices.

Classes and systems for persistent class storage with version control:

  • Remote Injection Guide ⚡ - Complete guide to deploying pipelines without source code.
  • PetClassManager - Manager for class serialization and storage operations.
  • PetFactory - Persistent factory with automatic version tracking and lazy loading.

Quick Example

from labchain import Container
from labchain.base import BaseFilter

# Enable persistence for custom classes
@Container.bind(persist=True)
class MyCustomFilter(BaseFilter):
    def predict(self, x):
        return x * 2

# Push all persistent classes to storage
Container.ppif.push_all()

# On a remote server (no source code needed!)
from labchain.base import BasePlugin

# 'config' is the pipeline's serialized configuration (its dump)
pipeline = BasePlugin.build_from_dump(config, Container.ppif)

Plugins

Pipelines

Pipelines orchestrate the data flow through various processing steps:

  • Parallel Pipelines
      • MonoPipeline - For parallel processing of independent tasks.
      • HPCPipeline - Optimized for high-performance computing environments.
  • Sequential Pipeline
      • F3Pipeline - The basic sequential pipeline.
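
For example, a minimal construction sketch based on the F3Pipeline signature listed in the Quick Reference below; the chosen plugins and their default constructors are assumptions:

from labchain.pipeline import F3Pipeline
from labchain.plugins.filters import StandardScalerPlugin, KnnFilter
from labchain.plugins.metrics import F1

# Scale features, then classify with k-NN; evaluate with F1.
# Plugin constructor defaults are assumed here.
pipeline = F3Pipeline(
    filters=[StandardScalerPlugin(), KnnFilter()],
    metrics=[F1()],
)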

Filters

Modular processing units that can be composed together within pipelines:
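
Every filter follows the fit/predict contract defined by BaseFilter (see "Using the API" below). A hedged sketch with one of the bundled plugin filters; x_train, y_train, and x_test stand in for your own data:

from labchain.plugins.filters import KnnFilter

# Plugin filters expose the same fit/predict contract as BaseFilter
knn = KnnFilter()
knn.fit(x_train, y_train)    # x_train, y_train: placeholder training data
preds = knn.predict(x_test)  # x_test: placeholder evaluation data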

Metrics

Metrics evaluate model performance across various tasks:
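
The bundled metrics follow the BaseMetric.evaluate contract shown under "Using the API" below. A minimal sketch assuming that signature, with placeholder data variables:

from labchain.plugins.metrics import F1

f1 = F1()
score = f1.evaluate(x_data, y_true, y_pred)  # returns a numeric score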

Optimizers

Optimizers help fine-tune hyperparameters for optimal performance:

Splitters

Splitters divide the dataset for cross-validation and evaluation:

Storage

Storage plugins for data persistence and remote class storage:
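
Configuring a backend mirrors the persistent-registration example below: assign a storage instance to Container.storage. The LocalStorage constructor arguments here are an assumption:

from labchain import Container
from labchain.storage import LocalStorage, S3Storage

# Shared environments: point every machine at the same bucket
Container.storage = S3Storage(bucket="my-ml-models")

# Local development (constructor arguments are assumed):
# Container.storage = LocalStorage(path="./labchain-store")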

Utilities

Additional utility functions and helpers that support the framework:


Using the API

Standard Component Registration

To use a LabChain component, import the appropriate base class from its module, subclass it, and register the subclass with the Container:

from labchain.container import Container
from labchain.base import BaseFilter, BasePipeline, BaseMetric

@Container.bind()
class MyFilter(BaseFilter):
    def fit(self, x, y):
        pass

    def predict(self, x):
        return x

@Container.bind()
class MyPipeline(BasePipeline):
    # Custom pipeline implementation
    pass

@Container.bind()
class MyMetric(BaseMetric):
    def evaluate(self, x_data, y_true, y_pred):
        return 0.95

# Retrieve components from the Container's factories
# (ff, pf, and mf are the filter, pipeline, and metric factories)
my_filter = Container.ff["MyFilter"]()
my_pipeline = Container.pf["MyPipeline"]()
my_metric = Container.mf["MyMetric"]()

Persistent Component Registration (Experimental)

For components that need to be deployed remotely or shared across environments:

from labchain.container import Container
from labchain.base import BaseFilter
from labchain.storage import S3Storage

# Configure shared storage
Container.storage = S3Storage(bucket="my-ml-models")

# Register with persistence enabled
@Container.bind(persist=True)
class MyPersistentFilter(BaseFilter):
    def __init__(self, threshold: float = 0.5):
        super().__init__(threshold=threshold)

    def predict(self, x):
        return x.value > self.threshold

# Push to storage
Container.ppif.push_all()

# Later, on any machine with access to the same storage
my_filter = Container.ppif["MyPersistentFilter"]()
# Class automatically loaded from storage if not in memory
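
Before pushing, you can check whether the local class definition matches the stored version, using the helpers listed in the Quick Reference below:

from labchain import Container

# Compare the local class against the version held in storage
status = Container.pcm.check_status(MyPersistentFilter)

# Content hash used for version tracking
class_hash = Container.pcm.get_class_hash(MyPersistentFilter)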


API Organization

By Functionality

By Use Case


Quick Reference

Most Common Operations

Operation             Code
Register a filter     @Container.bind()
                      class MyFilter(BaseFilter): ...
Create a pipeline     F3Pipeline(filters=[...], metrics=[...])
Enable persistence    @Container.bind(persist=True)
Push to storage       Container.ppif.push_all()
Load from storage     Container.ppif["ClassName"]
Reconstruct pipeline  BasePlugin.build_from_dump(config, Container.ppif)
Check version status  Container.pcm.check_status(MyClass)
Get class hash        Container.pcm.get_class_hash(MyClass)

Import Shortcuts

# Core functionality
from labchain import Container
from labchain.base import BaseFilter, BasePipeline, BaseMetric, XYData

# Common pipelines
from labchain.pipeline import F3Pipeline, MonoPipeline, HPCPipeline

# Storage
from labchain.storage import LocalStorage, S3Storage

# Common filters
from labchain.plugins.filters import (
    StandardScalerPlugin,
    PCAPlugin,
    KnnFilter,
    ClassifierSVMPlugin
)

# Common metrics
from labchain.plugins.metrics import F1, Precision, Recall

Additional Resources


Contributing to the Documentation

Found an error or want to improve the documentation? Contributions are welcome!