
Commit 93f517b

Merge pull request #345 from lyudmil-pelov/main
Llama2 fine-tuning docs update and move to README
2 parents: 2f09225 + 164b2d4

11 files changed: +548 −220 lines

distributed_training/llama2/README.md

Lines changed: 353 additions & 0 deletions
Large diffs are not rendered by default.

distributed_training/llama2/demo.md

Lines changed: 0 additions & 191 deletions
This file was deleted.
7 binary image files added (141 KB, 627 KB, 262 KB, 239 KB, 539 KB, 582 KB, 1.09 MB); previews not rendered.
Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "58901f38",
   "metadata": {},
   "source": [
    "# Loading back FSDP checkpoints\n",
    "\n",
    "For more information: https://github.com/facebookresearch/llama-recipes/blob/main/docs/inference.md#loading-back-fsdp-checkpoints"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "98b57d30",
   "metadata": {},
   "source": [
    "## All of the code in this notebook should be run in the OCI Data Science Notebook Terminal!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "05a59132",
   "metadata": {},
   "source": [
    "Before you start, make sure that you've installed the `pytorch20_p39_gpu_v2` Conda environment and activated it in the `Terminal`:\n",
    "\n",
    "```bash\n",
    "odsc conda install -s pytorch20_p39_gpu_v2\n",
    "```\n",
    "\n",
    "...then activate it:\n",
    "\n",
    "```bash\n",
    "conda activate /home/datascience/conda/pytorch20_p39_gpu_v2\n",
    "```\n",
    "\n",
    "Then install all of the required dependencies:\n",
    "\n",
    "```bash\n",
    "pip install tokenizers==0.13.3 -U && pip install transformers -U && pip install llama-recipes==0.0.1\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e2aecea",
   "metadata": {},
   "source": [
    "The following commands also work best when you execute them in the `terminal`!\n",
    "\n",
    "First, you have to log in to get access to the Llama2 model:\n",
    "\n",
    "```bash\n",
    "huggingface-cli login\n",
    "```\n",
    "\n",
    "Then run the checkpoint converter; the invocation looks like the following:\n",
    "\n",
    "```bash\n",
    "python -m llama_recipes.inference.checkpoint_converter_fsdp_hf --fsdp_checkpoint_path /mnt/llama2/outputs/lvp-7b/ocid1.datasciencejob.oc1.eu-frankfurt-1.amaaaaaan/fine-tuned-meta-llama/Llama-2-7b-hf --consolidated_model_path /mnt/llama2/fsdp_consolidated_checkpoints --HF_model_path_or_name \"meta-llama/Llama-2-13b-hf\"\n",
    "```\n",
    "\n",
    "Replace the `--fsdp_checkpoint_path` value with the folder you specified via `--dist_checkpoint_root_folder`, which will be a location in your object storage bucket, as per the example above. Notice that we ran this in OCI Data Science Notebooks and mounted the object storage bucket used to store the FSDP checkpoints under `/mnt/llama2` (one way to do this is sketched below). The `--consolidated_model_path` is the path where the consolidated weights will be stored. The `--HF_model_path_or_name` is the name of the model used for the fine-tuning or, if you downloaded the model locally, the location of the downloaded model.\n",
    "\n",
    "If the merging process was successful, you should see something like this in your `--consolidated_model_path` folder:\n",
    "\n",
    "```bash\n",
    " 0 drwxr-xr-x. 1 datascience users 0 Oct 18 15:48 .\n",
    " 0 drwxr-xr-x. 1 datascience users 0 Oct 18 14:38 ..\n",
    " 512 -rw-r--r--. 1 datascience users 42 Oct 18 16:35 added_tokens.json\n",
    "1.0K -rw-r--r--. 1 datascience users 656 Oct 18 16:35 config.json\n",
    " 512 -rw-r--r--. 1 datascience users 111 Oct 18 16:35 generation_config.json\n",
    "9.2G -rw-r--r--. 1 datascience users 9.2G Oct 18 16:35 pytorch_model-00001-of-00003.bin\n",
    "9.3G -rw-r--r--. 1 datascience users 9.3G Oct 18 16:36 pytorch_model-00002-of-00003.bin\n",
    "6.7G -rw-r--r--. 1 datascience users 6.7G Oct 18 16:36 pytorch_model-00003-of-00003.bin\n",
    " 24K -rw-r--r--. 1 datascience users 24K Oct 18 16:36 pytorch_model.bin.index.json\n",
    " 512 -rw-r--r--. 1 datascience users 72 Oct 18 16:35 special_tokens_map.json\n",
    "1.5K -rw-r--r--. 1 datascience users 1.2K Oct 18 16:35 tokenizer_config.json\n",
    "489K -rw-r--r--. 1 datascience users 489K Oct 18 16:35 tokenizer.model\n",
    "```"
   ]
  },
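  {
   "cell_type": "markdown",
   "id": "b3f1c2d0",
   "metadata": {},
   "source": [
    "Once the consolidation has finished, you can load the merged weights back for inference with the standard Hugging Face `transformers` API. The cell below is a minimal sketch: it assumes the `--consolidated_model_path` used above (`/mnt/llama2/fsdp_consolidated_checkpoints`), enough GPU memory to hold the model in fp16, and that the `accelerate` package is available for `device_map=\"auto\"`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c4e5f6a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Minimal sketch: load the consolidated checkpoint back for inference.\n",
    "# The path below assumes the --consolidated_model_path from the converter step.\n",
    "import torch\n",
    "from transformers import AutoModelForCausalLM, AutoTokenizer\n",
    "\n",
    "consolidated_model_path = \"/mnt/llama2/fsdp_consolidated_checkpoints\"\n",
    "\n",
    "tokenizer = AutoTokenizer.from_pretrained(consolidated_model_path)\n",
    "model = AutoModelForCausalLM.from_pretrained(\n",
    "    consolidated_model_path,\n",
    "    torch_dtype=torch.float16,  # fp16 halves the memory footprint\n",
    "    device_map=\"auto\",  # requires `accelerate`; auto-places layers on available GPUs\n",
    ")\n",
    "\n",
    "prompt = \"Tell me about Oracle Cloud Infrastructure.\"\n",
    "inputs = tokenizer(prompt, return_tensors=\"pt\").to(model.device)\n",
    "output = model.generate(**inputs, max_new_tokens=64)\n",
    "print(tokenizer.decode(output[0], skip_special_tokens=True))"
   ]
  },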
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2407ae40",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:pytorch20_p39_gpu_v2]",
   "language": "python",
   "name": "conda-env-pytorch20_p39_gpu_v2-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}

0 commit comments
