# **Model Fine-Tuning**

The platform currently supports the SFT training mode. SFT (Supervised Fine-Tuning) lets model developers take an open model or a model they have already uploaded and fine-tune it on their private datasets, producing a custom vertical model for their business needs.

### **Create Task**

Open the product "Model Service Platform UModelVerse", go to the feature menu "Model Fine-Tuning", and click "Create Task".

1. ##### **Model Selection**

   The platform provides popular open-source models as presets and also supports importing private models from US3 object storage.

   <!-- image-todo -->

2. ##### **Data Configuration**

   Select the dataset to use for this training run.

   Prerequisite: upload the training data to a US3 storage space and associate it with Dataset Management. For more details, see Dataset Management.

   <!-- image-todo -->

3. ##### **Parameter Configuration**

   The platform currently supports the LoRA fine-tuning method. LoRA keeps the pre-trained weights frozen and trains only small low-rank matrices added to them, which requires fewer computing resources and speeds up training.

   Supported parameter configurations are as follows:

   | Parameter Name          | Parameter Description                                        |
   | ----------------------- | ------------------------------------------------------------ |
   | Initial Learning Rate   | The starting step size for weight updates during gradient descent. |
   | Training Rounds         | The number of epochs, that is, full passes over the training data. Adjust it based on the size of the dataset. |
   | Batch Size              | The number of samples the model processes before each parameter update. |
   | Maximum Sample Size     | The maximum number of samples for each dataset.              |
   | Computation Type        | The numeric precision used for training, such as FP16, BF16, or FP32. FP16 and BF16 enable mixed-precision training. |
   | CheckPoint Save Interval| The number of training steps between checkpoint saves.       |
   | Maximum CheckPoint Saves| The maximum number of checkpoints kept by the end of training. Saving checkpoints can increase the training time. |
   | Log Save Interval       | The number of training steps between log saves.              |
   | Truncation Length       | The maximum length of a single training data sample. Longer samples are truncated automatically. |
   | Learning Rate Scheduler | The schedule that determines how the learning rate changes during training. |
   | Validation Set Ratio    | The share of the total samples held out as the validation set. |
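   | Use Flash Attention     | Whether to enable FlashAttention, a memory-efficient attention implementation that can speed up training. |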
   | LoRA Rank Value         | The rank of the LoRA low-rank matrices, which determines how many parameters are trained. A larger rank gives the fine-tuning data more influence on the model. Choose an appropriate rank based on the data volume. |
   | LoRA Scaling Factor     | The scaling factor applied to the LoRA update before it is added to the pre-trained weights. It controls how strongly the learned update shifts the model away from the pre-trained weights. |
   | LoRA Random Dropout     | The dropout rate in the LoRA layers: the fraction of activations randomly dropped during training to prevent overfitting and improve the model's generalization ability. |
   | LoRA+ Learning Rate Ratio| The multiplier applied to the learning rate of matrix B, relative to matrix A, in LoRA+. |
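
To make the interaction between these parameters more concrete, here is a minimal Python sketch. It is not the platform's implementation: the function names and all numeric values are hypothetical, the scaling `alpha / rank` is the standard LoRA convention rather than something this page confirms, and the schedule estimate assumes one parameter update per batch and that "Maximum CheckPoint Saves" simply caps how many checkpoints are kept.

```python
import math


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the LoRA pair A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def lora_scale(alpha: float, rank: int) -> float:
    """Standard LoRA convention: the update B @ A is multiplied by alpha / rank."""
    return alpha / rank


def loraplus_learning_rates(base_lr: float, ratio: float) -> tuple[float, float]:
    """LoRA+ trains matrix B with a learning rate `ratio` times that of matrix A."""
    return base_lr, base_lr * ratio


def training_schedule(num_samples, val_ratio, batch_size, epochs,
                      save_interval, max_saves):
    """Estimate step and checkpoint counts (assumes one update per batch)."""
    # val_ratio is a fraction here (0.1 means 10% of samples are held out).
    train_samples = num_samples - round(num_samples * val_ratio)
    steps_per_epoch = math.ceil(train_samples / batch_size)
    total_steps = steps_per_epoch * epochs
    checkpoints_written = total_steps // save_interval
    return {
        "train_samples": train_samples,
        "total_steps": total_steps,
        "checkpoints_written": checkpoints_written,
        "checkpoints_kept": min(checkpoints_written, max_saves),
    }


if __name__ == "__main__":
    # Hypothetical values, not platform defaults.
    full = 4096 * 4096
    lora = lora_trainable_params(4096, 4096, rank=8)
    print(f"trainable share of one 4096x4096 layer: {lora / full:.4%}")
    print("update scale:", lora_scale(alpha=16, rank=8))
    print("LoRA+ learning rates (A, B):", loraplus_learning_rates(1e-4, 16))
    print(training_schedule(10_000, 0.1, 32, 3, save_interval=100, max_saves=5))
```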

4. ##### **Output Configuration**

   Configure the storage path for the model produced by training. Currently, only US3 storage space in North China 2 is supported.

   <!-- image-todo -->

5. ##### **Confirm Configuration**

   Review the configured information and billing details. Once you confirm, you can submit the training task, and the platform automatically allocates the training resources.

6. ##### **View & Manage Tasks**

   In the task list, you can view the real-time status of fine-tuning tasks, including running status, token count, training duration, and the Loss graph.

   The following operations are available for tasks: Details, Terminate, Copy, Delete, and Publish.

   Details: View how the task is running, including training status, estimated time, token count, and the Loss graph.

   Terminate: If the task's Loss graph shows anomalies, use "Terminate" to stop the current task, release computing resources, and stop billing.

   Copy: Copy the current task instance. After copying, you are redirected to a new creation page with the parameters carried over. Adjust the parameters to create a new fine-tuning task.

   Delete: Delete tasks that have failed, completed, or been terminated.

   Publish: Publish a completed training task. During publishing, select a specific checkpoint step for model deployment. A successfully published model is automatically stored in "Model Management" for subsequent service deployment.
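
The "Terminate" operation depends on spotting an anomalous Loss graph. One rough way to frame that judgment is sketched below in Python, assuming you have the loss values as a plain list of numbers. How to export them from the platform is not covered here, and the function name, window size, and threshold are hypothetical choices, not platform behavior.

```python
import math
from statistics import mean


def loss_looks_anomalous(losses, window=5, rise_factor=1.5):
    """Return True if the loss contains NaN/inf or has clearly diverged."""
    # A non-finite loss value is always treated as an anomaly.
    if any(not math.isfinite(x) for x in losses):
        return True
    # With fewer than two full windows there is nothing to compare against.
    if len(losses) < 2 * window:
        return False
    recent = mean(losses[-window:])
    # Best (lowest) average over any earlier window that does not overlap `recent`.
    best = min(
        mean(losses[i:i + window])
        for i in range(len(losses) - 2 * window + 1)
    )
    return recent > rise_factor * best


if __name__ == "__main__":
    steadily_falling = [2.0, 1.8, 1.6, 1.5, 1.4, 1.3, 1.25, 1.2, 1.15, 1.1]
    diverging = [2.0, 1.5, 1.2, 1.0, 0.9, 0.9, 1.5, 2.5, 4.0, 6.0, 8.0]
    print(loss_looks_anomalous(steadily_falling))
    print(loss_looks_anomalous(diverging))
```

A check like this only flags obvious divergence. A noisy but healthy curve can still need a human look before you terminate a task.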