@@ -851,6 +851,65 @@ if __name__ == "__main__":
+ #### Dataset information
+
+ Built from the Unit Eval and OSS Instruct datasets:
+
+ - 3,000 completion samples (Inline, InBlock, AfterBlock).
+ - 1,500 unit-test samples.
+ - 4,000 OSS Instruct samples.
+
+ #### Example parameters
+
+ ```bash
+ # In a Jupyter notebook cell, prefix this command with `!`.
+ cd DeepSeek-Coder/finetune && deepspeed finetune_deepseekcoder.py \
+     --model_name_or_path "$MODEL_PATH" \
+     --data_path "$DATA_PATH" \
+     --output_dir "$OUTPUT_PATH" \
+     --num_train_epochs 1 \
+     --model_max_length 1024 \
+     --per_device_train_batch_size 2 \
+     --per_device_eval_batch_size 1 \
+     --gradient_accumulation_steps 1 \
+     --evaluation_strategy "no" \
+     --save_strategy "steps" \
+     --save_steps 2000 \
+     --save_total_limit 10 \
+     --learning_rate 1e-4 \
+     --warmup_steps 10 \
+     --logging_steps 1 \
+     --lr_scheduler_type "cosine" \
+     --gradient_checkpointing True \
+     --report_to "tensorboard" \
+     --deepspeed configs/ds_config_zero3.json \
+     --bf16 True
+ ```
+
+ Run log:
+
+ ```bash
+ `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 0%| | 0/2125 [00:00<?, ?it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ {'loss': 3.9356, 'learning_rate': 0.0, 'epoch': 0.0}
+ {'loss': 0.8462, 'learning_rate': 3.0102999566398115e-05, 'epoch': 0.0}
+ {'loss': 0.909, 'learning_rate': 4.771212547196624e-05, 'epoch': 0.0}
+ {'loss': 0.3674, 'learning_rate': 6.020599913279623e-05, 'epoch': 0.0}
+ {'loss': 0.3959, 'learning_rate': 6.989700043360187e-05, 'epoch': 0.0}
+ {'loss': 0.7964, 'learning_rate': 7.781512503836436e-05, 'epoch': 0.0}
+ {'loss': 0.3542, 'learning_rate': 8.450980400142567e-05, 'epoch': 0.0}
+ {'loss': 1.7094, 'learning_rate': 9.030899869919434e-05, 'epoch': 0.0}
+ {'loss': 0.5968, 'learning_rate': 9.542425094393248e-05, 'epoch': 0.0}
+ {'loss': 0.6208, 'learning_rate': 9.999999999999999e-05, 'epoch': 0.0}
+ {'loss': 0.4074, 'learning_rate': 0.0001, 'epoch': 0.01}
+ {'loss': 0.3637, 'learning_rate': 0.0001, 'epoch': 0.01}
+ {'loss': 0.3459, 'learning_rate': 0.0001, 'epoch': 0.01}
+ {'loss': 0.6971, 'learning_rate': 0.0001, 'epoch': 0.01}
+ {'loss': 0.3917, 'learning_rate': 0.0001, 'epoch': 0.01}
+ {'loss': 0.5859, 'learning_rate': 0.0001, 'epoch': 0.01}
+ {'loss': 0.5923, 'learning_rate': 0.0001, 'epoch': 0.01}
+ 1%|▎ | 17/2125 [05:14<10:03:38, 17.18s/it]
+ ```
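As a rough sanity check, the 2125 total steps shown in the progress bar can be related to the dataset sizes and batch flags in this section. The sketch below is only an illustration: the helper `optimizer_steps` is hypothetical, and the GPU count of 2 is an assumption (the source does not state it), chosen because it reproduces the logged total.

```python
import math

# Dataset sizes as listed in this section.
DATASET_SIZES = {"completion": 3000, "unit_test": 1500, "oss_instruct": 4000}


def optimizer_steps(num_samples, per_device_batch, grad_accum, num_gpus, epochs=1):
    """Approximate optimizer steps, rounding up the last partial batch."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    return math.ceil(num_samples / effective_batch) * epochs


total = sum(DATASET_SIZES.values())

# per_device_batch and grad_accum come from the command-line flags;
# num_gpus=2 is an ASSUMPTION, not something the source states.
steps = optimizer_steps(total, per_device_batch=2, grad_accum=1, num_gpus=2)

print(f"samples={total}, steps={steps}")
```

With a different GPU count or gradient accumulation setting, the total step count scales inversely with the effective batch size.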
+
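The learning rates in the first ten log entries are numerically consistent with a logarithmic warmup toward the 1e-4 peak, i.e. `lr = peak * ln(step) / ln(warmup_steps)`, after which the logged rate stays at 1e-4. This is an observation from the logged numbers only. The schedule is presumably set in `configs/ds_config_zero3.json`, whose contents are not shown here, so treat the formula and the hypothetical helper `log_warmup_lr` below as an assumption rather than the actual configuration.

```python
import math

PEAK_LR = 1e-4     # from --learning_rate
WARMUP_STEPS = 10  # from --warmup_steps


def log_warmup_lr(step, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Learning rate at 1-based log entry `step`.

    ASSUMED schedule: logarithmic warmup, then constant at the peak.
    """
    if step >= warmup_steps:
        return peak_lr
    return peak_lr * math.log(step) / math.log(warmup_steps)


# Compare against the logged values by eye; small floating-point
# differences in the last digit are expected.
for step in range(1, 13):
    print(step, log_warmup_lr(step))
```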
Other:

- For the full notebook, see [code/finetune/finetune.ipynb](code/finetune/finetune.ipynb)