HPCPipeline¶
The HPCPipeline class is part of the framework3.plugins.pipelines.parallel module and is designed to facilitate high-performance computing (HPC) tasks within the framework. This pipeline is optimized for parallel processing, enabling efficient execution of complex workflows across distributed systems.
Module Contents¶
framework3.plugins.pipelines.parallel.parallel_hpc_pipeline¶
HPCPipeline¶
Bases: ParallelPipeline
High Performance Computing Pipeline using MapReduce for parallel feature extraction.
This pipeline applies a sequence of filters to the input data using a MapReduce approach, enabling parallel processing and potentially improved performance on large datasets.
Key Features
- Parallel processing of filters using MapReduce
- Scalable to large datasets
- Configurable number of partitions for optimization
Usage

```python
from framework3.plugins.pipelines.parallel import HPCPipeline
from framework3.base import XYData

filters = [Filter1(), Filter2(), Filter3()]
pipeline = HPCPipeline(filters, app_name="MyApp", master="local[*]", numSlices=8)
x_data = XYData(...)
y_data = XYData(...)
pipeline.fit(x_data, y_data)
predictions = pipeline.predict(x_data)
```
Attributes:
Name | Type | Description |
---|---|---|
filters | Sequence[BaseFilter] | A sequence of filters to be applied to the input data. |
numSlices | int | The number of partitions to use in the MapReduce process. |
app_name | str | The name of the Spark application. |
master | str | The Spark master URL. |
Methods:
Name | Description |
---|---|
fit(x: XYData, y: Optional[XYData]) | Fit the filters in parallel using MapReduce. |
predict(x: XYData) -> XYData | Make predictions using the fitted filters in parallel. |
evaluate(x_data: XYData, y_true: Optional[XYData], y_pred: XYData) -> Dict[str, Any] | Evaluate the pipeline using provided metrics. |
Source code in framework3/plugins/pipelines/parallel/parallel_hpc_pipeline.py
__init__(filters, app_name, master='local', numSlices=4)¶
Initialize the HPCPipeline.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filters | Sequence[BaseFilter] | A sequence of filters to be applied to the input data. | required |
app_name | str | The name of the Spark application. | required |
master | str | The Spark master URL. Defaults to "local". | 'local' |
numSlices | int | The number of partitions to use in the MapReduce process. Defaults to 4. | 4 |
Source code in framework3/plugins/pipelines/parallel/parallel_hpc_pipeline.py
evaluate(x_data, y_true, y_pred)¶
Evaluate the pipeline using the provided metrics.
This method applies each metric in the pipeline to the predicted and true values, returning a dictionary of evaluation results.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x_data | XYData | Input data used for evaluation. | required |
y_true | Optional[XYData] | True target data, if available. | required |
y_pred | XYData | Predicted target data. | required |
Returns:
Type | Description |
---|---|
Dict[str, Any] | A dictionary containing the evaluation results for each metric. |
Source code in framework3/plugins/pipelines/parallel/parallel_hpc_pipeline.py
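The evaluation loop described above can be sketched in plain Python. The `accuracy` metric and the list-based data below are hypothetical stand-ins for illustration, not part of framework3's API:

```python
from typing import Any, Callable, Dict, List, Tuple

def evaluate(metrics: List[Tuple[str, Callable]], y_true, y_pred) -> Dict[str, Any]:
    """Apply each named metric to the true and predicted values."""
    return {name: fn(y_true, y_pred) for name, fn in metrics}

def accuracy(y_true, y_pred):
    # Fraction of positions where the prediction matches the target.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

results = evaluate([("accuracy", accuracy)], [1, 0, 1, 1], [1, 0, 0, 1])
print(results)  # {'accuracy': 0.75}
```

The real pipeline returns the same shape of result: one entry per configured metric.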
finish()¶
Finish the pipeline's execution.
This method is called to perform any necessary cleanup or finalization steps.
fit(x, y)¶
Fit the filters in the pipeline to the input data using MapReduce.
This method applies the fit operation to all filters in parallel using MapReduce, allowing for efficient processing of large datasets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | XYData | The input data to fit the filters on. | required |
y | Optional[XYData] | The target data, if available. | required |
Note
This method updates the filters in place with their trained versions.
Source code in framework3/plugins/pipelines/parallel/parallel_hpc_pipeline.py
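The map step of this fit phase can be illustrated with a self-contained sketch: each filter is fitted independently, so the work parallelizes trivially. `MeanFilter` and the thread pool here are illustrative assumptions; the actual pipeline distributes the map via Spark rather than local threads:

```python
from concurrent.futures import ThreadPoolExecutor

class MeanFilter:
    """Toy stand-in for a BaseFilter: learns the mean of its inputs."""
    def __init__(self):
        self.mean = None

    def fit(self, x, y=None):
        self.mean = sum(x) / len(x)
        return self  # return the trained filter, as the map step would

def fit_all(filters, x, y=None, workers=4):
    # Map: fit every filter in parallel; reduce: collect the trained versions,
    # which replace the untrained filters in place in the real pipeline.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda f: f.fit(x, y), filters))

trained = fit_all([MeanFilter(), MeanFilter()], [1.0, 2.0, 3.0])
print([f.mean for f in trained])  # [2.0, 2.0]
```

Because filters do not share state during fitting, the map step scales with the number of partitions (numSlices) rather than the number of filters alone.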
log_metrics()¶
Log metrics for the pipeline.
This method can be implemented to log any relevant metrics during the pipeline's execution.
Source code in framework3/plugins/pipelines/parallel/parallel_hpc_pipeline.py
predict(x)¶
Make predictions using the fitted filters in parallel.
This method applies the predict operation to all filters in parallel using MapReduce, then combines the results into a single prediction.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | XYData | The input data to make predictions on. | required |
Returns:
Name | Type | Description |
---|---|---|
XYData | XYData | The combined predictions from all filters. |
Note
The predictions from each filter are stacked horizontally to form the final output.
Source code in framework3/plugins/pipelines/parallel/parallel_hpc_pipeline.py
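The horizontal stacking mentioned in the note can be sketched without any framework dependencies. `AddN` is a hypothetical fitted filter; each filter contributes one column, and row i of the combined output gathers every filter's prediction for sample i:

```python
def predict_all(filters, x):
    # Map: each filter produces one column of predictions for the input rows.
    columns = [f.predict(x) for f in filters]
    # Reduce: stack the columns horizontally into one row per input sample.
    return [list(row) for row in zip(*columns)]

class AddN:
    """Toy stand-in for a fitted filter."""
    def __init__(self, n):
        self.n = n

    def predict(self, x):
        return [v + self.n for v in x]

combined = predict_all([AddN(1), AddN(2)], [10, 20])
print(combined)  # [[11, 12], [21, 22]]
```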
start(x, y, X_)¶
Start the pipeline execution by fitting the model and making predictions.
This method initiates the pipeline process by fitting the model to the input data and then making predictions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | XYData | The input data for fitting and prediction. | required |
y | Optional[XYData] | The target data for fitting, if available. | required |
X_ | Optional[XYData] | Additional input data for prediction, if different from x. | required |
Returns:
Type | Description |
---|---|
Optional[XYData] | The predictions made by the pipeline, or None if an error occurs. |
Raises:
Type | Description |
---|---|
Exception | If an error occurs during the fitting or prediction process. |
Source code in framework3/plugins/pipelines/parallel/parallel_hpc_pipeline.py
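The fit-then-predict flow of start, including the None-on-error behavior documented above, can be sketched as follows. `ToyPipeline` is a hypothetical stand-in with the same fit/predict shape:

```python
def start(pipeline, x, y=None, X_=None):
    """Fit the pipeline, then predict on X_ (or x), returning None on failure."""
    try:
        pipeline.fit(x, y)
        return pipeline.predict(X_ if X_ is not None else x)
    except Exception as exc:
        print(f"Pipeline execution failed: {exc}")
        return None

class ToyPipeline:
    """Toy stand-in: learns the minimum and subtracts it at prediction time."""
    def fit(self, x, y=None):
        self.offset = min(x)

    def predict(self, x):
        return [v - self.offset for v in x]

print(start(ToyPipeline(), [3, 4, 5]))  # [0, 1, 2]
```

Passing X_ lets you fit on one dataset and predict on another, mirroring the X_ parameter in the table above.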
Class Hierarchy¶
- HPCPipeline
HPCPipeline¶
HPCPipeline extends ParallelPipeline (which builds on BasePipeline) and provides functionality for executing tasks in parallel, leveraging HPC resources to improve performance and scalability.
Key Methods:¶
- __init__(): Initializes the pipeline, setting up necessary resources and configurations for HPC execution.
- log_metrics(): Logs relevant metrics during the pipeline's execution to monitor performance and resource utilization.
Usage Examples¶
Creating and Using HPCPipeline¶

```python
from framework3.plugins.pipelines.parallel import HPCPipeline

# Initialize the HPCPipeline
# (Assuming filters, x_data, and y_data are defined elsewhere)
hpc_pipeline = HPCPipeline(filters, app_name="MyApp", master="local[*]", numSlices=8)

# Fit the pipeline and make predictions in one step
predictions = hpc_pipeline.start(x_data, y_data, X_=None)

# Log metrics
hpc_pipeline.log_metrics()
```
Best Practices¶
- Ensure that your HPC environment is properly configured and accessible before initializing the pipeline.
- Define tasks and dependencies clearly to maximize parallel execution efficiency.
- Monitor resource utilization and adjust configurations as needed to optimize performance.
Conclusion¶
HPCPipeline provides a robust solution for executing high-performance computing tasks within the framework3 ecosystem. By following the best practices and examples provided, you can effectively leverage HPC resources to enhance your machine learning workflows.