Merged
1 change: 1 addition & 0 deletions tools/pytorchjob-generator/chart/README.md
@@ -41,6 +41,7 @@ customize the Jobs generated by the tool.
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| environmentVariables | array | `nil` | List of variables/values to be defined for all the ranks. Values can be literals or references to Kubernetes secrets or configmaps. See [values.yaml](values.yaml) for examples of supported syntaxes. NOTE: The following standard [PyTorch Distributed environment variables](https://pytorch.org/docs/stable/distributed.html#environment-variable-initialization) are set automatically and can be referenced in the commands without being set manually: WORLD_SIZE, RANK, MASTER_ADDR, MASTER_PORT. |
| envFrom | array | `nil` | List of ConfigMaps or Secrets specifying environment variables. See [values.yaml](values.yaml) for examples of supported syntaxes. NOTE: the environmentVariables field takes precedence over envFrom. mlbatch also performs some automatic checks on the environmentVariables passed by the user, such as checking that the user does not specify NCCL_TOPO_FILE when topologyFileConfigMap is also provided. These checks are *not* performed on any environment variables inherited from envFrom. |
| sshGitCloneConfig | object | `nil` | Private GitHub clone support. See [values.yaml](values.yaml) for additional instructions. |
| setupCommands | array | no custom commands are executed | List of custom commands to be run at the beginning of the execution. Use `setupCommands` to clone code, download data, and change directories. |
| mainProgram | string | `nil` | Name of the PyTorch program to be executed by `torchrun`. Please provide your program name here and NOT in "setupCommands" as this helm template provides the necessary "torchrun" arguments for the parallel execution. WARNING: this program is relative to the current path set by change-of-directory commands in "setupCommands". If no value is provided, then only `setupCommands` are executed and torchrun is elided. |
8 changes: 8 additions & 0 deletions tools/pytorchjob-generator/chart/templates/appwrapper.yaml
@@ -116,6 +116,10 @@ spec:
{{- include "mlbatch.volumes" . | indent 38 }}
containers:
- name: pytorch
{{- if .Values.envFrom }}
envFrom:
{{- toYaml .Values.envFrom | nindent 46 }}
{{- end }}
image: {{ required "Please specify a 'containerImage' in the user file" .Values.containerImage }}
imagePullPolicy: {{ .Values.imagePullPolicy | default "IfNotPresent" }}
{{- include "mlbatch.securityContext" . | indent 44 }}
@@ -139,6 +143,10 @@ spec:
{{- include "mlbatch.volumes" . | indent 38 }}
containers:
- name: pytorch
{{- if .Values.envFrom }}
envFrom:
{{- toYaml .Values.envFrom | nindent 46 }}
{{- end }}
image: {{ required "Please specify a 'containerImage' in the user file" .Values.containerImage }}
imagePullPolicy: {{ .Values.imagePullPolicy | default "IfNotPresent" }}
{{- include "mlbatch.securityContext" . | indent 44 }}
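For reference, a rough sketch of what the `pytorch` container fragment might render to when `envFrom` is set. It assumes the user supplied one `secretRef` and one `configMapRef` (names taken from the commented example in `values.yaml`). The image value is a hypothetical placeholder, and the deep indentation produced by `nindent 46` is flattened here for readability.

```yaml
containers:
  - name: pytorch
    # Emitted only when .Values.envFrom is set; content comes from toYaml
    envFrom:
      - secretRef:
          name: my-secrets
      - configMapRef:
          name: my-config-map
    image: registry.example.com/my-image:latest  # hypothetical placeholder
    imagePullPolicy: IfNotPresent  # chart default when imagePullPolicy is unset
```

When `envFrom` is null or an empty list, the `if` guard evaluates to false and the block is omitted entirely.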
4 changes: 4 additions & 0 deletions tools/pytorchjob-generator/chart/values.schema.json
@@ -42,6 +42,10 @@
{ "type": "null" },
{ "type": "array" }
]},
"envFrom": { "oneOf": [
{ "type": "null" },
{ "type": "array" }
]},
"sshGitCloneConfig": { "oneOf": [
{ "type": "null" },
{
17 changes: 17 additions & 0 deletions tools/pytorchjob-generator/chart/values.yaml
@@ -101,6 +101,23 @@ environmentVariables:
# name: configmap-name
# key: configmap-key


# -- (array) List of ConfigMaps or Secrets specifying environment variables. See
# [values.yaml](values.yaml) for examples of supported syntaxes.
#
# NOTE: the environmentVariables field takes precedence over envFrom. mlbatch also performs some
# automatic checks on the environmentVariables passed by the user, such as checking that the user
# does not specify NCCL_TOPO_FILE when topologyFileConfigMap is also provided. These checks are
# *not* performed on any environment variables inherited from envFrom.
# @section -- Workload Specification
envFrom:
# - secretRef:
#     name: my-secrets
# - secretRef:
#     name: my-other-secrets
# - configMapRef:
#     name: my-config-map

# Private GitHub clone support.
#
# 0) Create a secret and configMap to enable Private GitHub cloning as documented for your organization.
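A minimal sketch of a user values fragment that combines the two fields. The Secret, ConfigMap, and variable names are hypothetical, and the literal `name`/`value` form for `environmentVariables` is assumed from the examples in `values.yaml`. Other values the chart requires, such as `containerImage`, are left out.

```yaml
# Bulk-import every key from these sources as environment variables.
envFrom:
  - secretRef:
      name: my-secrets        # hypothetical Secret
  - configMapRef:
      name: my-config-map     # hypothetical ConfigMap

# Explicit entries; these are also the only ones mlbatch validates.
environmentVariables:
  - name: LOG_LEVEL           # hypothetical variable
    value: debug
```

If `my-config-map` also defines `LOG_LEVEL`, the container should see `debug`. This is consistent with Kubernetes giving explicit `env` entries precedence over keys from `envFrom`. When two `envFrom` sources share a key, the last one listed wins. Because the mlbatch checks do not look inside `envFrom` sources, a key such as `NCCL_TOPO_FILE` defined in `my-config-map` would not be flagged.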