Introduction to the Pipelines SDK
Beta
This Kubeflow component has beta status. See the Kubeflow versioning policies. The Kubeflow team is interested in your feedback about the usability of the feature.
The Kubeflow Pipelines SDK provides a set of Python packages that you can use to specify and run your machine learning (ML) workflows. A pipeline is a description of an ML workflow, including all of the components that make up the steps in the workflow and how the components interact with each other.
SDK packages
The Kubeflow Pipelines SDK includes the following packages:
- kfp.compiler includes classes and methods for compiling the pipeline Python DSL into a workflow YAML spec. Methods in this package include, but are not limited to, the following:
  - kfp.compiler.Compiler.compile compiles your Python DSL code into a single static configuration (in YAML format) that the Kubeflow Pipelines service can process. The Kubeflow Pipelines service converts the static configuration into a set of Kubernetes resources for execution.
- kfp.components includes classes and methods for interacting with pipeline components. Methods in this package include, but are not limited to, the following:
  - kfp.components.func_to_container_op converts a Python function to a pipeline component and returns a factory function. You can then call the factory function to construct an instance of a pipeline task (ContainerOp) that runs the original function in a container.
  - kfp.components.load_component_from_file loads a pipeline component from a file and returns a factory function. You can then call the factory function to construct an instance of a pipeline task (ContainerOp) that runs the component container image.
  - kfp.components.load_component_from_url loads a pipeline component from a URL and returns a factory function. You can then call the factory function to construct an instance of a pipeline task (ContainerOp) that runs the component container image.
- kfp.dsl contains the domain-specific language (DSL) that you can use to define and interact with pipelines and components. Methods, classes, and modules in this package include, but are not limited to, the following:
  - kfp.dsl.PipelineParam represents a pipeline parameter that you can pass from one pipeline component to another. See the guide to pipeline parameters.
  - kfp.dsl.component is a decorator for DSL functions that returns a pipeline component (ContainerOp).
  - kfp.dsl.pipeline is a decorator for Python functions that returns a pipeline.
  - kfp.dsl.python_component is a decorator for Python functions that adds pipeline component metadata to the function object.
  - kfp.dsl.types contains a list of types defined by the Kubeflow Pipelines SDK. Types include basic types like String, Integer, Float, and Bool, as well as domain-specific types like GCPProjectID and GCRPath. See the guide to DSL static type checking.
  - kfp.dsl.ResourceOp represents a pipeline task (op) that lets you directly manipulate Kubernetes resources (create, get, apply, …).
  - kfp.dsl.VolumeOp represents a pipeline task (op) that creates a new PersistentVolumeClaim (PVC). It aims to make the common case of creating a PersistentVolumeClaim fast.
  - kfp.dsl.VolumeSnapshotOp represents a pipeline task (op) that creates a new VolumeSnapshot. It aims to make the common case of creating a VolumeSnapshot fast.
  - kfp.dsl.PipelineVolume represents a volume used to pass data between pipeline steps. ContainerOps can mount a PipelineVolume either via the constructor’s pvolumes argument or via the add_pvolumes() method.
  - kfp.dsl.ParallelFor represents a parallel for loop over a static or dynamic set of items in a pipeline. Each iteration of the for loop is executed in parallel.
  - kfp.dsl.ExitHandler represents an exit handler that is invoked upon exiting a pipeline. A typical usage of ExitHandler is garbage collection.
  - kfp.dsl.Condition represents a group of ops that are executed only when a certain condition is met. The condition must be determined at runtime, by incorporating at least one task output or PipelineParam in the boolean expression.
- kfp.Client contains the Python client libraries for the Kubeflow Pipelines API. Methods in this package include, but are not limited to, the following:
  - kfp.Client.create_experiment creates a pipeline experiment and returns an experiment object.
  - kfp.Client.run_pipeline runs a pipeline and returns a run object.
  - kfp.Client.create_run_from_pipeline_func compiles a pipeline function and submits it for execution on Kubeflow Pipelines.
  - kfp.Client.create_run_from_pipeline_package runs a local pipeline package on Kubeflow Pipelines.
  - kfp.Client.upload_pipeline uploads a local file to create a new pipeline in Kubeflow Pipelines.
  - kfp.Client.upload_pipeline_version uploads a local file to create a pipeline version. Follow an example to learn more about creating a pipeline version.
- Kubeflow Pipelines extension modules include classes and functions for specific platforms on which you can use Kubeflow Pipelines. Examples include utility functions for on premises, Google Cloud Platform (GCP), Amazon Web Services (AWS), and Microsoft Azure.
- Kubeflow Pipelines diagnose_me modules include classes and functions that help with environment diagnostic tasks.
  - kfp.cli.diagnose_me.dev_env reports on diagnostic metadata from your development environment, such as your Python library version.
  - kfp.cli.diagnose_me.kubernetes_cluster reports on diagnostic data from your Kubernetes cluster, such as Kubernetes secrets.
  - kfp.cli.diagnose_me.gcp reports on diagnostic data related to your GCP environment.
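As a rough illustration of how the kfp.Client methods listed above fit together, the following sketch wraps create_experiment and run_pipeline in one helper. The helper name submit_package and its parameters are hypothetical. The client argument is assumed to be a kfp.Client instance from an SDK version that provides these two methods; the FakeClient class is a stand-in that exists only so the sketch runs without a cluster.

```python
from types import SimpleNamespace


def submit_package(client, experiment_name, job_name, package_path, params=None):
    """Create an experiment, then run a compiled pipeline package in it.

    `client` is assumed to behave like kfp.Client: create_experiment returns
    an object with an `id` attribute and run_pipeline returns a run object.
    """
    experiment = client.create_experiment(name=experiment_name)
    return client.run_pipeline(
        experiment.id, job_name, package_path, params=params or {}
    )


class FakeClient:
    """Hypothetical stand-in for kfp.Client that records the calls it receives."""

    def __init__(self):
        self.calls = []

    def create_experiment(self, name):
        self.calls.append(("create_experiment", name))
        return SimpleNamespace(id="exp-1", name=name)

    def run_pipeline(self, experiment_id, job_name, pipeline_package_path, params=None):
        self.calls.append(
            ("run_pipeline", experiment_id, job_name, pipeline_package_path)
        )
        return SimpleNamespace(id="run-1", name=job_name)


fake = FakeClient()
run = submit_package(fake, "demo", "my-pipeline", "my-pipeline.zip")
print(run.id, fake.calls)
```

Against a real deployment, you would pass kfp.Client() in place of FakeClient().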
Kubeflow Pipelines CLI tool
The Kubeflow Pipelines CLI tool enables you to use a subset of the Kubeflow Pipelines SDK directly from the command line. The Kubeflow Pipelines CLI tool provides the following commands:
- kfp diagnose_me runs environment diagnostics with the specified parameters.
  - --json: Indicates that this command must return its results as JSON. Otherwise, results are returned in a human-readable format.
  - --namespace TEXT: Specifies the Kubernetes namespace to use. all-namespaces is the default value.
  - --project-id TEXT: For GCP deployments, this value specifies the GCP project to use. If this value is not specified, the environment default is used.
- kfp pipeline <COMMAND> provides the following commands to help you manage pipelines.
  - get: Gets detailed information about a Kubeflow pipeline from your Kubeflow Pipelines cluster.
  - list: Lists the pipelines that have been uploaded to your Kubeflow Pipelines cluster.
  - upload: Uploads a pipeline to your Kubeflow Pipelines cluster.
- kfp run <COMMAND> provides the following commands to help you manage pipeline runs.
  - get: Displays the details of a pipeline run.
  - list: Lists recent pipeline runs.
  - submit: Submits a pipeline run.
- kfp --endpoint <ENDPOINT>: Specifies the endpoint that the Kubeflow Pipelines CLI should connect to.
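If you call the CLI from a script, it can help to assemble the argument list in one place. The sketch below builds a kfp diagnose_me invocation from the options listed above. The function name diagnose_me_args and the endpoint URL are hypothetical, and the sketch only constructs the arguments; it does not run the command.

```python
import shlex


def diagnose_me_args(endpoint=None, as_json=False, namespace=None, project_id=None):
    """Build the argument list for `kfp diagnose_me` from the documented options."""
    args = ["kfp"]
    if endpoint:
        # --endpoint is placed directly after `kfp`, before the subcommand.
        args += ["--endpoint", endpoint]
    args.append("diagnose_me")
    if as_json:
        args.append("--json")
    if namespace:
        args += ["--namespace", namespace]
    if project_id:
        args += ["--project-id", project_id]
    return args


# Hypothetical endpoint and namespace, used only for illustration.
args = diagnose_me_args(
    endpoint="http://localhost:8080", as_json=True, namespace="kubeflow"
)
print(shlex.join(args))
```

The resulting list can be passed to subprocess.run on a machine where the kfp command is installed.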
Installing the SDK
Follow the guide to installing the Kubeflow Pipelines SDK.
Building pipelines and components
This section summarizes the ways you can use the SDK to build pipelines and components:
- Creating components from existing application code
- Creating components within your application code
- Creating lightweight components
- Using prebuilt, reusable components in your pipeline
The diagrams provide a conceptual guide to the relationships between the following concepts:
- Your Python code
- A pipeline component
- A Docker container image
- A pipeline
Creating components from existing application code
This section describes how to create a component and a pipeline outside your Python application, by creating components from existing containerized applications. This technique is useful when you have already created a TensorFlow program, for example, and you want to use it in a pipeline.
Below is a more detailed explanation of the above diagram:
- Write your application code, my-app-code.py. For example, write code to transform data or train a model.
- Create a Docker container image that packages your program (my-app-code.py) and upload the container image to a registry. To build a container image based on a given Dockerfile, you can use the Docker command-line interface or the kfp.compiler.build_docker_image method from the Kubeflow Pipelines SDK.
- Write a component function using the Kubeflow Pipelines DSL to define your pipeline’s interactions with the component’s Docker container. Your component function must return a kfp.dsl.ContainerOp. Optionally, you can use the kfp.dsl.component decorator to enable static type checking in the DSL compiler. To use the decorator, add the @kfp.dsl.component annotation to your component function:

      import kfp

      @kfp.dsl.component
      def my_component(my_param):
          ...
          return kfp.dsl.ContainerOp(
              name='My component name',
              image='gcr.io/path/to/container/image'
          )

- Write a pipeline function using the Kubeflow Pipelines DSL to define the pipeline and include all the pipeline components. Use the kfp.dsl.pipeline decorator to build a pipeline from your pipeline function. To use the decorator, add the @kfp.dsl.pipeline annotation to your pipeline function:

      from kfp.dsl import PipelineParam

      @kfp.dsl.pipeline(
          name='My pipeline',
          description='My machine learning pipeline'
      )
      def my_pipeline(param_1: PipelineParam, param_2: PipelineParam):
          my_step = my_component(my_param='a')

- Compile the pipeline to generate a compressed YAML definition of the pipeline. The Kubeflow Pipelines service converts the static configuration into a set of Kubernetes resources for execution. To compile the pipeline, you can choose one of the following options:
  - Use the kfp.compiler.Compiler.compile method:

        kfp.compiler.Compiler().compile(my_pipeline, 'my-pipeline.zip')

  - Alternatively, use the dsl-compile command on the command line:

        dsl-compile --py [path/to/python/file] --output my-pipeline.zip

- Use the Kubeflow Pipelines SDK to run the pipeline:

      client = kfp.Client()
      my_experiment = client.create_experiment(name='demo')
      my_run = client.run_pipeline(my_experiment.id, 'my-pipeline', 'my-pipeline.zip')
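The compile step above produces a compressed package, and it can be useful to look inside that package before you upload or share it. The following sketch assumes the package is a zip archive that holds a single YAML file; the helper name read_pipeline_yaml is hypothetical, and the member name and YAML content written here are placeholders, not real compiler output.

```python
import os
import tempfile
import zipfile


def read_pipeline_yaml(package_path):
    """Return (member name, text) for the first YAML file in a zip package."""
    with zipfile.ZipFile(package_path) as archive:
        for member in archive.namelist():
            if member.endswith((".yaml", ".yml")):
                return member, archive.read(member).decode("utf-8")
    raise ValueError(f"no YAML file found in {package_path}")


# Build a stand-in package so the sketch runs without compiling a real pipeline.
# The member name and the content are placeholders.
workdir = tempfile.mkdtemp()
package_path = os.path.join(workdir, "my-pipeline.zip")
with zipfile.ZipFile(package_path, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("pipeline.yaml", "# placeholder workflow spec\n")

name, text = read_pipeline_yaml(package_path)
print(name)
print(text)
```

To inspect a real package, point read_pipeline_yaml at the file that your compile step wrote, for example my-pipeline.zip.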
You can also choose to share your pipeline as follows:
- Upload the pipeline zip file to the Kubeflow Pipelines UI. For more information about the UI, see the Kubeflow Pipelines quickstart guide.
- Upload the pipeline zip file to a shared repository. See the reusable components and other shared resources.
More about the above workflow
For more detailed instructions, see the guide to building components and pipelines.
For an example, see the xgboost-training-cm.py pipeline sample on GitHub. The pipeline creates an XGBoost model using structured data in CSV format.
Creating components within your application code
This section describes how to create a pipeline component inside your Python application, as part of the application. The DSL code for creating a component therefore runs inside your Docker container.
Below is a more detailed explanation of the above diagram:
- Write your code in a Python function. For example, write code to transform data or train a model:

      def my_python_func(a: str, b: str) -> str:
          ...

- Use the kfp.dsl.python_component decorator to convert your Python function into a pipeline component. To use the decorator, add the @kfp.dsl.python_component annotation to your function:

      import kfp

      @kfp.dsl.python_component(
          name='My awesome component',
          description='Come and play',
      )
      def my_python_func(a: str, b: str) -> str:
          ...

- Use kfp.compiler.build_python_component to create a container image for the component:

      # OUTPUT_DIR and TARGET_IMAGE are placeholders for your staging path and image name.
      my_op = kfp.compiler.build_python_component(
          component_func=my_python_func,
          staging_gcs_path=OUTPUT_DIR,
          target_image=TARGET_IMAGE)

- Write a pipeline function using the Kubeflow Pipelines DSL to define the pipeline and include all the pipeline components. Use the kfp.dsl.pipeline decorator to build a pipeline from your pipeline function, by adding the @kfp.dsl.pipeline annotation to your pipeline function:

      from kfp.dsl import PipelineParam

      @kfp.dsl.pipeline(
          name='My pipeline',
          description='My machine learning pipeline'
      )
      def my_pipeline(param_1: PipelineParam, param_2: PipelineParam):
          my_step = my_op(a='a', b='b')

- Compile the pipeline to generate a compressed YAML definition of the pipeline. The Kubeflow Pipelines service converts the static configuration into a set of Kubernetes resources for execution. To compile the pipeline, you can choose one of the following options:
  - Use the kfp.compiler.Compiler.compile method:

        kfp.compiler.Compiler().compile(my_pipeline, 'my-pipeline.zip')

  - Alternatively, use the dsl-compile command on the command line:

        dsl-compile --py [path/to/python/file] --output my-pipeline.zip

- Use the Kubeflow Pipelines SDK to run the pipeline:

      client = kfp.Client()
      my_experiment = client.create_experiment(name='demo')
      my_run = client.run_pipeline(my_experiment.id, 'my-pipeline', 'my-pipeline.zip')
You can also choose to share your pipeline as follows:
- Upload the pipeline zip file to the Kubeflow Pipelines UI. For more information about the UI, see the Kubeflow Pipelines quickstart guide.
- Upload the pipeline zip file to a shared repository. See the reusable components and other shared resources.
More about the above workflow
For an example of the above workflow, see the Jupyter notebook titled KubeFlow Pipelines container building on GitHub.
Creating lightweight components
This section describes how to create lightweight Python components that do not require you to build a container image. Lightweight components simplify prototyping and rapid development, especially in a Jupyter notebook environment.
Below is a more detailed explanation of the above diagram:
- Write your code in a Python function. For example, write code to transform data or train a model:

      def my_python_func(a: str, b: str) -> str:
          ...

- Use kfp.components.func_to_container_op to convert your Python function into a pipeline component:

      import kfp

      my_op = kfp.components.func_to_container_op(my_python_func)

  Optionally, you can write the component to a file that you can share or use in another pipeline:

      my_op = kfp.components.func_to_container_op(my_python_func, output_component_file='my-op.component')

- If you stored your lightweight component in a file as described in the previous step, use kfp.components.load_component_from_file to load the component:

      my_op = kfp.components.load_component_from_file('my-op.component')

- Write a pipeline function using the Kubeflow Pipelines DSL to define the pipeline and include all the pipeline components. Use the kfp.dsl.pipeline decorator to build a pipeline from your pipeline function, by adding the @kfp.dsl.pipeline annotation to your pipeline function:

      from kfp.dsl import PipelineParam

      @kfp.dsl.pipeline(
          name='My pipeline',
          description='My machine learning pipeline'
      )
      def my_pipeline(param_1: PipelineParam, param_2: PipelineParam):
          my_step = my_op(a='a', b='b')

- Compile the pipeline to generate a compressed YAML definition of the pipeline. The Kubeflow Pipelines service converts the static configuration into a set of Kubernetes resources for execution. To compile the pipeline, you can choose one of the following options:
  - Use the kfp.compiler.Compiler.compile method:

        kfp.compiler.Compiler().compile(my_pipeline, 'my-pipeline.zip')

  - Alternatively, use the dsl-compile command on the command line:

        dsl-compile --py [path/to/python/file] --output my-pipeline.zip

- Use the Kubeflow Pipelines SDK to run the pipeline:

      client = kfp.Client()
      my_experiment = client.create_experiment(name='demo')
      my_run = client.run_pipeline(my_experiment.id, 'my-pipeline', 'my-pipeline.zip')
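Because a lightweight component starts out as an ordinary Python function, you can exercise it locally before converting it. The sketch below fills in the my_python_func placeholder from the steps above with a trivial body. It assumes the function needs to be self-contained, so its import sits inside the function body. The make_op helper is hypothetical, needs an SDK version that provides kfp.components.func_to_container_op, and is not called here.

```python
def my_python_func(a: str, b: str) -> str:
    """Join two strings; stands in for real data-transformation code."""
    # The import sits inside the function body so that the function stays
    # self-contained when it is run outside this module.
    import json

    return json.dumps({"joined": a + b})


def make_op(func):
    """Convert func into a component factory (requires the kfp package)."""
    from kfp.components import func_to_container_op

    return func_to_container_op(func)


# The plain function can be checked locally before any conversion.
result = my_python_func("a", "b")
print(result)
```

Once the function behaves as expected, make_op(my_python_func) corresponds to the conversion step described above.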
More about the above workflow
For more detailed instructions, see the guide to building lightweight components.
For an example, see the Lightweight Python components - basics notebook on GitHub.
Using prebuilt, reusable components in your pipeline
A reusable component is one that someone has built and made available for others to use. To use the component in your pipeline, you need the YAML file that defines the component.
Below is a more detailed explanation of the above diagram:
- Find the YAML file that defines the reusable component. For example, take a look at the reusable components and other shared resources.
- Use kfp.components.load_component_from_url to load the component:

      import kfp

      my_op = kfp.components.load_component_from_url('https://path/to/component.yaml')

- Write a pipeline function using the Kubeflow Pipelines DSL to define the pipeline and include all the pipeline components. Use the kfp.dsl.pipeline decorator to build a pipeline from your pipeline function, by adding the @kfp.dsl.pipeline annotation to your pipeline function:

      from kfp.dsl import PipelineParam

      @kfp.dsl.pipeline(
          name='My pipeline',
          description='My machine learning pipeline'
      )
      def my_pipeline(param_1: PipelineParam, param_2: PipelineParam):
          my_step = my_op(a='a', b='b')

- Compile the pipeline to generate a compressed YAML definition of the pipeline. The Kubeflow Pipelines service converts the static configuration into a set of Kubernetes resources for execution. To compile the pipeline, you can choose one of the following options:
  - Use the kfp.compiler.Compiler.compile method:

        kfp.compiler.Compiler().compile(my_pipeline, 'my-pipeline.zip')

  - Alternatively, use the dsl-compile command on the command line:

        dsl-compile --py [path/to/python/file] --output my-pipeline.zip

- Use the Kubeflow Pipelines SDK to run the pipeline:

      client = kfp.Client()
      my_experiment = client.create_experiment(name='demo')
      my_run = client.run_pipeline(my_experiment.id, 'my-pipeline', 'my-pipeline.zip')
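To give a feel for what a component definition file might contain, the sketch below writes a small, hypothetical component definition to disk. The component name, container image, and command are placeholders chosen for illustration, and the field layout reflects the component specification format as I understand it rather than any particular published component. The load_op helper is also hypothetical; it requires the kfp package and is not called here.

```python
import os
import tempfile

# Hypothetical component definition; the name, image, and command are
# placeholders chosen for illustration.
COMPONENT_YAML = """\
name: Echo message
description: Prints the message it receives.
inputs:
- {name: message, type: String}
implementation:
  container:
    image: alpine
    command: [echo, {inputValue: message}]
"""


def write_component_file(directory, text=COMPONENT_YAML):
    """Write the component definition to component.yaml and return its path."""
    path = os.path.join(directory, "component.yaml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path


def load_op(path):
    """Load a component factory from a local file (requires the kfp package)."""
    from kfp.components import load_component_from_file

    return load_component_from_file(path)


component_path = write_component_file(tempfile.mkdtemp())
print(component_path)
```

A file like this can be loaded locally with kfp.components.load_component_from_file, or published at a URL and loaded with kfp.components.load_component_from_url as in the steps above.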
More about the above workflow
For an example, see the xgboost-training-cm.py pipeline sample on GitHub. The pipeline creates an XGBoost model using structured data in CSV format.
Next steps
- Use pipeline parameters to pass data between components.
- Learn how to write recursive functions in the DSL.
- Build a reusable component for sharing in multiple pipelines.
- Find out how to use the DSL to manipulate Kubernetes resources dynamically as steps of your pipeline.