LLaMA 2 Model Fine-tuning Based on PyTorch
Learn how to fine-tune the LLaMA 2 model based on the pre-trained Atom-7B-Chat model using PyTorch on a distributed training system.
Introduction
LLaMA 2 is a widely used open-source model. This best practice describes how to fine-tune Atom-7B-Chat, a pre-trained language model based on LLaMA 2 7B, by submitting distributed training tasks.
Preparation
- Get the LLaMA 2 model fine-tuning code
- Get the Atom-7B-Chat pre-trained model
- A user directory has been created in the file storage service.
Procedure
- Log in to the management console.
- After decompressing the obtained training model and code files locally, upload them to the specified file storage directory via SFTP. In this example, a user directory named xxxx0002 has been created, and the training model and corresponding code are uploaded to the /xxxx0002/Atom-7B-Chat and /xxxx0002/Llama-Language folders, respectively (a sample SFTP session is sketched after this procedure).
- Create a distributed training task and configure the following parameters:
| Configuration Item | Value | Description |
|---|---|---|
| Task Name | Customizable | Set according to actual requirements. |
| Image | sjz-dockerhub.singlefabric.com/public/llama2-train:pytorch-2.1.2-cuda12.1-cudnn8 | Specify the image address. |
| Storage and Data | Select user directory | Select the user directory to which the model and code files were uploaded. |
| Code | None required | No need to upload code files in this example. |
| Environment Variables | None required | No setup is needed in this example. |
| Startup Command | bash /root/epfs/Llama-Language/train/sft/torchrun_finetune_lora.sh | Modify to match the actual script path. |
| Automatic Retry | Disabled | Keep this option disabled in this example. |
| Timeout Configuration | Disabled | Keep this option disabled in this example. |
| Computing Resources | PyTorch | Select the appropriate resource type. |
| Resource Group | Public resource pool | The NVIDIA 4090 GPU model is recommended; set resources to 4 and nodes to 2. |
- After configuring the parameters, click OK and wait for the task to complete.
- Once training has completed successfully, the trained model can be found in the path /Llama-Language/train/sft/save_folder.
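For reference, the upload in step 2 can be performed with any standard SFTP client. The sketch below assumes the OpenSSH command-line client; the host name and the local directory are placeholders, and the actual access address and credentials come from your file storage service.

```bash
# Minimal sketch of the upload in step 2 (host name and local path are placeholders).
cd /path/to/decompressed/files          # local folder containing Atom-7B-Chat/ and Llama-Language/
sftp xxxx0002@sftp.example.com <<'EOF'
put -r Atom-7B-Chat /xxxx0002/
put -r Llama-Language /xxxx0002/
ls /xxxx0002
EOF
```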
Appendix
Example Startup Script Description
The startup script used in the training task startup command is torchrun_finetune_lora.sh. Its content can be viewed on the Storage and Data Services page.
Some parameters in the startup script:
- output_model: Output path of the fine-tuned model. In this example, it is /Llama-Language/train/sft/save_folder.
- model_name_or_path: Path of the pre-trained model. In this example, it is /root/epfs/Atom-7B-Chat.
- train_files: Training dataset.
- validation_files: Validation dataset.
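The script itself ships with the fine-tuning code, so its exact content may differ from what is shown here. The following is only a minimal sketch of how the parameters above are typically wired into a torchrun launch for the 2-node setup used in this example; the training entry script name (finetune_clm_lora.py), the dataset file paths, and the rendezvous variables are assumptions for illustration.

```bash
#!/bin/bash
# Minimal sketch of a torchrun-based LoRA fine-tuning launcher.
# Entry script name, dataset paths, and rendezvous variables are assumptions.

output_model=/Llama-Language/train/sft/save_folder       # fine-tuned model output path (as in this example)
model_name_or_path=/root/epfs/Atom-7B-Chat               # pre-trained model path (as in this example)
train_files=/root/epfs/Llama-Language/data/train.csv     # training dataset (placeholder path)
validation_files=/root/epfs/Llama-Language/data/dev.csv  # validation dataset (placeholder path)

mkdir -p "${output_model}"

# 2 nodes x 4 processes per node, matching the resource group settings above.
# NODE_RANK and MASTER_ADDR are normally provided by the platform scheduler.
torchrun \
    --nnodes 2 \
    --nproc_per_node 4 \
    --node_rank "${NODE_RANK:-0}" \
    --master_addr "${MASTER_ADDR:-localhost}" \
    --master_port "${MASTER_PORT:-29500}" \
    finetune_clm_lora.py \
        --model_name_or_path "${model_name_or_path}" \
        --train_files "${train_files}" \
        --validation_files "${validation_files}" \
        --output_dir "${output_model}" \
        --use_lora true
```

After the task completes, the contents of the output path can be checked from the Storage and Data Services page, as described in the procedure above.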