PyTorch Distributed Training Task with minGPT
Introduction
This guide walks through setting up and submitting a PyTorch distributed training task that uses the minGPT model. It covers environment preparation, task submission, and training configurations for both single-node and multi-node setups.
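As a rough illustration of how single-node and multi-node runs differ from the training script's point of view, the sketch below reads the environment variables that `torchrun` sets for each worker process (`RANK`, `LOCAL_RANK`, `WORLD_SIZE`, `LOCAL_WORLD_SIZE`). It assumes the task is launched with `torchrun`, which this guide has not yet confirmed. The `DistributedEnv` class and `read_distributed_env` helper are hypothetical names used only for illustration, and the node count assumes every node runs the same number of processes.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DistributedEnv:
    """Hypothetical container for the per-process values set by torchrun."""
    rank: int              # global index of this process across all nodes
    local_rank: int        # index of this process on its own node
    world_size: int        # total number of processes in the job
    local_world_size: int  # number of processes on this node

    @property
    def num_nodes(self) -> int:
        # Assumption: every node runs the same number of processes.
        return self.world_size // self.local_world_size

    @property
    def is_multi_node(self) -> bool:
        return self.num_nodes > 1


def read_distributed_env(environ=None) -> DistributedEnv:
    """Read the launcher variables, falling back to a single-process run."""
    env = os.environ if environ is None else environ
    return DistributedEnv(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
        local_world_size=int(env.get("LOCAL_WORLD_SIZE", 1)),
    )


if __name__ == "__main__":
    info = read_distributed_env()
    mode = "multi-node" if info.is_multi_node else "single-node"
    print(f"rank {info.rank} of {info.world_size} ({mode})")
```

When the script is run without a launcher, none of these variables are set, so the helper falls back to a single process on a single node.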
Environment Preparation
- Get the sample code.
- Create a container instance from the PyTorch image.
  Notice:
  - The storage and dataset of the container instance must be in the user directory where the sample code was uploaded.
  - Select the PyTorch image for the container instance.
- Log in to the container instance via Jupyter and run the following command to install the environment dependencies: