LLaMA 2 Model Fine-tuning Based on PyTorch
Learn how to fine-tune the LLaMA 2 model based on the pre-trained Atom-7B-Chat model using PyTorch on a distributed training system.
Introduction
LLaMA is a commonly used open-source large language model. This best practice describes how to fine-tune the model by submitting a distributed training task, based on the Atom-7B-Chat pre-trained model, which is itself built on LLaMA 2 7B.
Preparation
- Obtain the LLaMA 2 model fine-tuning code.
- Obtain the Atom-7B-Chat pre-trained model.
- A file storage user directory has been created.
Procedure
- Log in to the management console.
- Decompress the obtained training model and code files locally, then upload them to the specified file storage directory using SFTP. In this example, a user directory named xxxx0002 has been created, and the training model and corresponding code are uploaded to the /xxxx0002/Atom-7B-Chat and /xxxx0002/Llama-Language folders, respectively (see the SFTP upload sketch after this procedure).
- Create a distributed training task and configure the following parameters:
Configuration Item | Parameter | Description |
---|---|---|
Task Name | Customizable | Set according to actual requirements. |
Image | sjz-dockerhub.singlefabric.com/public/llama2-train:pytorch-2.1.2-cuda12.1-cudnn8 | Specify the image address. |
Storage and Data | Select the user directory | Choose the directory where the code files were uploaded. |
Code | None required | No code files need to be uploaded in this example. |
Environment Variables | None required | No environment variables are needed in this example. |
Startup Command | bash /root/epfs/Llama-Language/train/sft/torchrun_finetune_lora.sh | Modify according to the actual situation. |
Automatic Retry | Disabled | Select Disabled. |
Timeout Configuration | Disabled | Select Disabled. |
Computing Resources | PyTorch | Select appropriate resources. |
Resource Group | Public resource pool | NVIDIA 4090 GPUs are recommended; set resources to 4 and nodes to 2. |
- After configuring the parameters, click OK and wait for the task to complete.
- Once training has completed successfully, the fine-tuned model can be found in the /Llama-Language/train/sft/save_folder path.
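For reference, the following is a minimal sketch of the SFTP upload step above, assuming the model and code archives have already been decompressed locally. The host name and account shown are placeholders for the actual file storage endpoint and credentials, not values taken from this guide.

```bash
# Hypothetical SFTP session: replace the host and account with your file
# storage endpoint and credentials; remote paths match the example user
# directory xxxx0002 from this guide.
sftp xxxx0002@sftp.example.com <<'EOF'
mkdir /xxxx0002/Atom-7B-Chat
mkdir /xxxx0002/Llama-Language
put -r Atom-7B-Chat/* /xxxx0002/Atom-7B-Chat/
put -r Llama-Language/* /xxxx0002/Llama-Language/
EOF
```

Inside the training container, the user directory appears to be mounted under /root/epfs/ (as used in the startup command and in model_name_or_path in the appendix), so the uploaded folders become /root/epfs/Atom-7B-Chat and /root/epfs/Llama-Language at run time.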
Appendix
Example Startup Script Description
The startup script used in the training task's startup command is torchrun_finetune_lora.sh. Its content can be viewed on the Storage and Data Services page.
Some parameters in the startup script:
- output_model: Output path for the fine-tuned model. In this example, it is /Llama-Language/train/sft/save_folder.
- model_name_or_path: Path of the pre-trained model. In this example, it is /root/epfs/Atom-7B-Chat.
- train_files: Training dataset.
- validation_files: Validation dataset.