MPIJob

Reference documentation for MPIJob

This guide contains outdated information pertaining to Kubeflow 1.0 and needs to be updated for Kubeflow 1.1.

Packages:

kubeflow.org

Package v1alpha2 is the v1alpha2 version of the API.

Resource Types:

MPIJob

Represents an MPIJob resource.

Field	Description
`apiVersion` string	kubeflow.org/v1alpha2
`kind` string	MPIJob
`metadata` Kubernetes meta/v1.ObjectMeta	Standard Kubernetes object’s metadata. Refer to the Kubernetes API documentation for the fields of `metadata`.
`spec` MPIJobSpec	Specification of the desired state of the MPIJob. The MPIJobSpec fields are:

`activeDeadlineSeconds` int64	(Optional) Specifies the duration (in seconds) since startTime during which the job can remain active before it is terminated. Must be a positive integer. This setting applies only to pods where restartPolicy is OnFailure or Always.
`backoffLimit` int32	(Optional) Number of retries before marking this job as failed.
`cleanPodPolicy` common/v1.CleanPodPolicy	Defines the policy for cleaning up pods after the MPIJob completes. Defaults to None.
`slotsPerWorker` int32	(Optional) Specifies the number of slots per worker used in hostfile. Defaults to 1.
`mainContainer` string	(Optional) Specifies the name of the main container which executes the MPI code.
`runPolicy` common/v1.RunPolicy	(Optional) Encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active.
`mpiReplicaSpecs` map[github.com/kubeflow/mpi-operator/pkg/apis/kubeflow/v1alpha2.MPIReplicaType]*github.com/kubeflow/tf-operator/pkg/apis/common/v1.ReplicaSpec	A map from MPIReplicaType (key) to ReplicaSpec (value). Specifies the MPI cluster configuration. For example, { “Launcher”: ReplicaSpec, “Worker”: ReplicaSpec }

`status` common/v1.JobStatus	Most recently observed status of the MPIJob. Read-only (modified by the system).
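
To make the top-level layout concrete, the following is a minimal sketch that assembles an MPIJob manifest as a Python dict and prints it as JSON. The `apiVersion`, `kind`, `slotsPerWorker`, `cleanPodPolicy`, and `mpiReplicaSpecs` names come from the tables on this page. The helper `build_mpijob`, the job name, the container names, and the image are hypothetical, and the `replicas`/`template` layout inside each replica entry is an assumption about common/v1.ReplicaSpec, which this page does not describe.

```python
import json


def build_mpijob(name, image, workers, slots_per_worker=1):
    """Return an MPIJob manifest as a plain dict (illustrative sketch only)."""

    def replica(replicas, container_name):
        # Assumed ReplicaSpec layout: a replica count plus a pod template.
        return {
            "replicas": replicas,
            "template": {
                "spec": {
                    "containers": [{"name": container_name, "image": image}]
                }
            },
        }

    return {
        "apiVersion": "kubeflow.org/v1alpha2",
        "kind": "MPIJob",
        "metadata": {"name": name},
        "spec": {
            "slotsPerWorker": slots_per_worker,
            "cleanPodPolicy": "None",
            "mpiReplicaSpecs": {
                "Launcher": replica(1, "launcher"),
                "Worker": replica(workers, "worker"),
            },
        },
    }


# Hypothetical job name and image, used only for illustration.
manifest = build_mpijob("example-mpijob", "example.com/mpi-image:latest", workers=2)
print(json.dumps(manifest, indent=2))
```

`status` is left out on purpose because it is read-only and set by the system.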

MPIJobSpec

(Appears on: MPIJob)

MPIJobSpec describes the desired state of the MPIJob.

Field	Description
`activeDeadlineSeconds` int64	(Optional) Specifies the duration (in seconds) since startTime during which the job can remain active before it is terminated. Must be a positive integer. This setting applies only to pods where restartPolicy is OnFailure or Always.
`backoffLimit` int32	(Optional) Number of retries before marking this job as failed.
`cleanPodPolicy` common/v1.CleanPodPolicy	Defines the policy for cleaning up pods after the MPIJob completes. Defaults to None.
`slotsPerWorker` int32	(Optional) Specifies the number of slots per worker used in hostfile. Defaults to 1.
`mainContainer` string	(Optional) Specifies the name of the main container which executes the MPI code.
`runPolicy` common/v1.RunPolicy	(Optional) Encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active.
`mpiReplicaSpecs` map[github.com/kubeflow/mpi-operator/pkg/apis/kubeflow/v1alpha2.MPIReplicaType]*github.com/kubeflow/tf-operator/pkg/apis/common/v1.ReplicaSpec	A map from MPIReplicaType (key) to ReplicaSpec (value). Specifies the MPI cluster configuration. For example, { “Launcher”: ReplicaSpec, “Worker”: ReplicaSpec }
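
The documented defaults and the constraint on `activeDeadlineSeconds` can be checked on the client side before a job is submitted. The sketch below is one way to do that. The functions `apply_defaults` and `validate` are hypothetical helpers written for this example and do not reproduce the operator's own defaulting or validation code.

```python
def apply_defaults(spec):
    """Return a copy of spec with the defaults documented in the table filled in."""
    out = dict(spec)
    out.setdefault("slotsPerWorker", 1)       # documented default: 1
    out.setdefault("cleanPodPolicy", "None")  # documented default: None
    return out


def validate(spec):
    """Return a list of problems found in spec; an empty list means none were found."""
    problems = []
    deadline = spec.get("activeDeadlineSeconds")
    if deadline is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int = isinstance(deadline, int) and not isinstance(deadline, bool)
        if not is_int or deadline <= 0:
            problems.append("activeDeadlineSeconds must be a positive integer")
    return problems


spec = apply_defaults({"activeDeadlineSeconds": 0})
print(spec["slotsPerWorker"], spec["cleanPodPolicy"])
print(validate(spec))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.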

MPIReplicaType

MPIReplicaType is the type of an MPI replica. It is one of “Launcher” or “Worker”.
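
Because the two replica type values are case-sensitive strings, a misspelled key in `mpiReplicaSpecs` is easy to miss. As a rough sketch, the allowed values can be modeled as an enumeration and used to flag unexpected keys. The Python class and the `check_replica_types` helper are hypothetical and exist only for this example.

```python
from enum import Enum


class MPIReplicaType(str, Enum):
    """The two replica types named in this reference."""
    LAUNCHER = "Launcher"
    WORKER = "Worker"


def check_replica_types(mpi_replica_specs):
    """Return the sorted keys that are not valid MPIReplicaType values."""
    allowed = {t.value for t in MPIReplicaType}
    return sorted(key for key in mpi_replica_specs if key not in allowed)


print(check_replica_types({"Launcher": {}, "Worker": {}}))  # []
print(check_replica_types({"Launcher": {}, "worker": {}}))  # ['worker']
```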
