LabChain API Documentation¶
Welcome to the API documentation for LabChain. This guide covers the modules, classes, and functions that form the backbone of LabChain, enabling you to build, extend, and customize ML experimentation workflows.
Table of Contents¶
- Base Classes
- Container & Dependency Injection
- Persistent Storage & Remote Injection ⚡ New
- Plugins
- Pipelines
- Filters
- Metrics
- Optimizers
- Splitters
- Storage
- Utilities
- Using the API
Base Classes¶
The foundation of LabChain is built on these abstract base classes:
- Types - Core data structures and type definitions.
- Classes - Abstract base class for all components.
- Pipeline - Base class for creating pipelines.
- Filter - Abstract class for all filter implementations.
- Metric - Base class for metric implementations.
- Optimizer - Abstract base for optimization algorithms.
- Splitter - Base class for data splitting strategies.
- Factory - Factory classes for component creation.
- Storage - Abstract base for storage implementations.
Container & Dependency Injection¶
The core of LabChain's component management:
- Container - Main class for dependency injection and component management.
- Overload - Utilities for method overloading in the container.
Persistent Storage & Remote Injection¶
Experimental Feature
Remote Injection is currently an experimental feature. See the Remote Injection Guide for important limitations and best practices.
Classes and systems for persistent class storage with version control:
- Remote Injection Guide ⚡ - Complete guide to deploying pipelines without source code
- PetClassManager - Manager for class serialization and storage operations
- PetFactory - Persistent factory with automatic version tracking and lazy loading
Quick Example¶
```python
from labchain import Container
from labchain.base import BaseFilter

# Enable persistence for custom classes
@Container.bind(persist=True)
class MyCustomFilter(BaseFilter):
    def predict(self, x):
        return x * 2

# Push to storage
Container.ppif.push_all()

# On a remote server (no source code needed!)
from labchain.base import BasePlugin

pipeline = BasePlugin.build_from_dump(config, Container.ppif)
```
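Here `config` is assumed to be the serialized pipeline configuration produced when the pipeline was dumped on the original machine; see the Remote Injection Guide for the full workflow.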
Plugins¶
Concrete, ready-to-use implementations of the base classes above. The filter and metric plugins imported under Import Shortcuts below (e.g. `labchain.plugins.filters`) live in this package.
Pipelines¶
Pipelines orchestrate the data flow through various processing steps (a minimal composition sketch follows the list):
- Parallel Pipelines
    - MonoPipeline - For parallel processing of independent tasks.
    - HPCPipeline - Optimized for high-performance computing environments.
- Sequential Pipelines
    - F3Pipeline - The basic sequential pipeline.
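As an illustration, here is a minimal sketch composing a sequential `F3Pipeline` from plugins listed under Import Shortcuts; it assumes pipelines expose the same `fit`/`predict` interface as filters, and the datasets are placeholders.

```python
from labchain.pipeline import F3Pipeline
from labchain.plugins.filters import StandardScalerPlugin, KnnFilter
from labchain.plugins.metrics import F1

# Filters run in sequence; metrics evaluate the final output
pipeline = F3Pipeline(
    filters=[StandardScalerPlugin(), KnnFilter()],
    metrics=[F1()],
)

# x_train, y_train, x_test are placeholder datasets (e.g. XYData instances)
pipeline.fit(x_train, y_train)
predictions = pipeline.predict(x_test)
```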
Filters¶
Modular processing units that can be composed together within pipelines:
- Classification Filters
- Clustering Filters
- Regression Filters
- Transformation Filters
- Text Processing Filters
- Cache Filters
    - CachedFilter
- Grid Search Filters
    - GridSearchCVFilter
Metrics¶
Metrics evaluate model performance across tasks such as classification and clustering; a minimal usage sketch follows.
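A metric plugin can be instantiated and evaluated directly; this sketch assumes the `evaluate(x_data, y_true, y_pred)` signature shown under Using the API, with placeholder data arrays.

```python
from labchain.plugins.metrics import F1

# x_data, y_true, y_pred are placeholder inputs, labels, and predictions
f1 = F1()
score = f1.evaluate(x_data, y_true, y_pred)
```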
Optimizers¶
Optimizers help fine-tune hyperparameters for optimal performance.
Splitters¶
Splitters divide the dataset for cross-validation and evaluation.
Storage¶
Storage plugins for data persistence and remote class storage (a configuration sketch follows the list):
- Local Storage - Local filesystem storage
- S3 Storage - Amazon S3 cloud storage
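A minimal sketch of pointing the Container at a backend; `S3Storage(bucket=...)` appears in the persistent registration example below, while the `path` argument to `LocalStorage` is an assumed parameter name.

```python
from labchain import Container
from labchain.storage import LocalStorage, S3Storage

# Local filesystem backend; the `path` argument is an assumed parameter name
Container.storage = LocalStorage(path="./labchain-storage")

# Amazon S3 backend, as used in the persistent registration example below
Container.storage = S3Storage(bucket="my-ml-models")
```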
Utilities¶
Additional utility functions and helpers that support the framework:
- PySpark Utilities
- Weights & Biases Integration
- Typeguard for Notebooks
- Scikit-learn Estimator Utilities
- General Utilities
Using the API¶
Standard Component Registration¶
To use any component of LabChain, import it from the relevant module and register it with the Container:
```python
from labchain.container import Container
from labchain.base import BaseFilter, BasePipeline, BaseMetric

@Container.bind()
class MyFilter(BaseFilter):
    def fit(self, x, y):
        pass

    def predict(self, x):
        return x

@Container.bind()
class MyPipeline(BasePipeline):
    # Custom pipeline implementation
    pass

@Container.bind()
class MyMetric(BaseMetric):
    def evaluate(self, x_data, y_true, y_pred):
        return 0.95

# Retrieve components
my_filter = Container.ff["MyFilter"]()
my_pipeline = Container.pf["MyPipeline"]()
my_metric = Container.mf["MyMetric"]()
```
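As the retrieval lines suggest, `ff`, `pf`, and `mf` are the Container's filter, pipeline, and metric factories; components are looked up by class name and returned as constructors.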
Persistent Component Registration (Experimental)¶
For components that need to be deployed remotely or shared across environments:
```python
from labchain.container import Container
from labchain.base import BaseFilter
from labchain.storage import S3Storage

# Configure shared storage
Container.storage = S3Storage(bucket="my-ml-models")

# Register with persistence enabled
@Container.bind(persist=True)
class MyPersistentFilter(BaseFilter):
    def __init__(self, threshold: float = 0.5):
        super().__init__(threshold=threshold)

    def predict(self, x):
        return x.value > self.threshold

# Push to storage
Container.ppif.push_all()

# Later, on any machine with access to the same storage;
# the class is automatically loaded from storage if not in memory
my_filter = Container.ppif["MyPersistentFilter"]()
```
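The persistent class manager (`Container.pcm`) also exposes the version-tracking helpers listed in the Quick Reference below; a minimal sketch, with comments describing the presumed semantics.

```python
# Compare the local class definition against the stored version
status = Container.pcm.check_status(MyPersistentFilter)

# Content hash used for version tracking
class_hash = Container.pcm.get_class_hash(MyPersistentFilter)
```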
API Organization¶
By Functionality¶
- Data Processing: Filters, Transformations
- Model Evaluation: Metrics, Splitters
- Workflow Orchestration: Pipelines, Optimizers
- Persistence & Storage: Storage Backends, Remote Injection
- Infrastructure: Container, Base Classes
By Use Case¶
- Classification Tasks: Classification Filters, Classification Metrics
- Clustering Tasks: Clustering Filters, Clustering Metrics
- Hyperparameter Tuning: GridOptimizer, OptunaOptimizer
- Distributed Computing: HPCPipeline, PySpark Utilities
- Remote Deployment: Remote Injection, S3 Storage
Quick Reference¶
Most Common Operations¶
| Operation | Code |
|---|---|
| Register a filter | `@Container.bind()`<br>`class MyFilter(BaseFilter): ...` |
| Create a pipeline | `F3Pipeline(filters=[...], metrics=[...])` |
| Enable persistence | `@Container.bind(persist=True)` |
| Push to storage | `Container.ppif.push_all()` |
| Load from storage | `Container.ppif["ClassName"]` |
| Reconstruct a pipeline | `BasePlugin.build_from_dump(config, Container.ppif)` |
| Check version status | `Container.pcm.check_status(MyClass)` |
| Get class hash | `Container.pcm.get_class_hash(MyClass)` |
Import Shortcuts¶
```python
# Core functionality
from labchain import Container
from labchain.base import BaseFilter, BasePipeline, BaseMetric, XYData

# Common pipelines
from labchain.pipeline import F3Pipeline, MonoPipeline, HPCPipeline

# Storage
from labchain.storage import LocalStorage, S3Storage

# Common filters
from labchain.plugins.filters import (
    StandardScalerPlugin,
    PCAPlugin,
    KnnFilter,
    ClassifierSVMPlugin,
)

# Common metrics
from labchain.plugins.metrics import F1, Precision, Recall
```
Additional Resources¶
- 📘 Quick Start Guide
- 🎓 Tutorials & Examples
- 🏗️ Architecture Overview
- ⚡ Remote Injection Guide (Experimental)
Contributing to the Documentation
Found an error or want to improve the documentation? Contributions are welcome!
- 📝 Edit on GitHub
- 🐛 Report issues on GitHub Issues