Commit b8a2bc7

Authored by ZailiWang, jingxu10, chunyuan-w

backport rel2.2 doc to main (#2605)

* backport rel2.2 doc to main
* update llm/README.md
* update feature category terms
* update linear kernel optim. description
* minor correction
* add README.md for training examples

Co-authored-by: Jing Xu <jing.xu@intel.com>
Co-authored-by: Chunyuan WU <chunyuan.wu@intel.com>

1 parent df31007 commit b8a2bc7

27 files changed: +779 -293 lines

README.md

+29-17
@@ -16,23 +16,35 @@ In the current technological landscape, Generative AI (GenAI) workloads and mode
### Optimized Model List

-| MODEL FAMILY | Verified <MODEL ID> (Huggingface hub)| FP32/BF16 | Weight only quantzation INT8 | Weight only quantization INT4| Static quantization INT8 |
-|---|:---:|:---:|:---:|:---:|:---:|
-|LLAMA| "meta-llama/Llama-2-7b-hf", "meta-llama/Llama-2-13b-hf", "meta-llama/Llama-2-70b-hf" |||||
-|GPT-J| "EleutherAI/gpt-j-6b" |||||
-|GPT-NEOX| "EleutherAI/gpt-neox-20b", "databricks/dolly-v2-12b" |||||
-|FALCON|"tiiuae/falcon-40b" |||||
-|OPT|"facebook/opt-30b", "facebook/opt-1.3b"|||||
-|Bloom|"bigscience/bloom", "bigscience/bloom-1b7"|||||
-|CodeGen|"Salesforce/codegen-2B-multi"|||||
-|Baichuan|"baichuan-inc/Baichuan2-13B-Chat", "baichuan-inc/Baichuan2-7B-Chat", "baichuan-inc/Baichuan-13B-Chat"|||||
-|ChatGLM|"THUDM/chatglm3-6b", "THUDM/chatglm2-6b"|||||
-|GPTBigCode|"bigcode/starcoder"|||||
-|T5|"google/flan-t5-xl"|||||
-|MPT|"mosaicml/mpt-7b"|||||
-
-*Note*: The above verified models (including other models in the same model family, like "codellama/CodeLlama-7b-hf" from LLAMA family) are well supported with all optimizations like indirect access KV cache, fused ROPE, and prepacked TPP Linear (fp32/bf16). For other LLM model families, we are working in progress to cover those optimizations, which will expand the model list above.
+| MODEL FAMILY | MODEL NAME (Huggingface hub) | FP32 | BF16 | Static quantization INT8 | Weight only quantization INT8 | Weight only quantization INT4 |
+|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+|LLAMA| meta-llama/Llama-2-7b-hf | 🟩 | 🟩 | 🟩 | 🟩 | 🟨 |
+|LLAMA| meta-llama/Llama-2-13b-hf | 🟩 | 🟩 | 🟩 | 🟩 | 🟨 |
+|LLAMA| meta-llama/Llama-2-70b-hf | 🟩 | 🟩 | 🟩 | 🟩 | 🟨 |
+|GPT-J| EleutherAI/gpt-j-6b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
+|GPT-NEOX| EleutherAI/gpt-neox-20b | 🟩 | 🟨 | 🟨 | 🟩 | 🟨 |
+|DOLLY| databricks/dolly-v2-12b | 🟩 | 🟨 | 🟨 | 🟩 | 🟨 |
+|FALCON| tiiuae/falcon-40b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
+|OPT| facebook/opt-30b | 🟩 | 🟩 | 🟩 | 🟩 | 🟨 |
+|OPT| facebook/opt-1.3b | 🟩 | 🟩 | 🟩 | 🟩 | 🟨 |
+|Bloom| bigscience/bloom-1b7 | 🟩 | 🟨 | 🟩 | 🟩 | 🟨 |
+|CodeGen| Salesforce/codegen-2B-multi | 🟩 | 🟩 | 🟨 | 🟩 | 🟩 |
+|Baichuan| baichuan-inc/Baichuan2-7B-Chat | 🟩 | 🟩 | 🟩 | 🟩 | |
+|Baichuan| baichuan-inc/Baichuan2-13B-Chat | 🟩 | 🟩 | 🟩 | 🟩 | |
+|Baichuan| baichuan-inc/Baichuan-13B-Chat | 🟩 | 🟨 | 🟩 | 🟩 | |
+|ChatGLM| THUDM/chatglm3-6b | 🟩 | 🟩 | 🟨 | 🟩 | |
+|ChatGLM| THUDM/chatglm2-6b | 🟩 | 🟩 | 🟨 | 🟩 | |
+|GPTBigCode| bigcode/starcoder | 🟩 | 🟩 | 🟨 | 🟩 | 🟨 |
+|T5| google/flan-t5-xl | 🟩 | 🟩 | 🟨 | 🟩 | |
+|Mistral| mistralai/Mistral-7B-v0.1 | 🟩 | 🟩 | 🟨 | 🟩 | 🟨 |
+|MPT| mosaicml/mpt-7b | 🟩 | 🟩 | 🟨 | 🟩 | 🟩 |
+
+- 🟩 signifies that the model performs well with good accuracy (<1% difference compared with FP32).
+
+- 🟨 signifies that the model performs well, but accuracy may not be perfect (>1% difference compared with FP32).
+
+*Note*: The verified models above (including other models in the same model families, like "codellama/CodeLlama-7b-hf" from the LLAMA family) are well supported with all optimizations, such as indirect access KV cache, fused ROPE, and prepacked TPP Linear (fp32/bf16).
+We are working to better support the models in the table with various data types. In addition, more models will be optimized in the future.
## Support

docker/Dockerfile.prebuilt

+4-4
@@ -27,10 +27,10 @@ RUN ${PYTHON} -m pip --no-cache-dir install --upgrade \
# Some TF tools expect a "python" binary
RUN ln -s $(which ${PYTHON}) /usr/local/bin/python

-ARG IPEX_VERSION=2.1.100
-ARG PYTORCH_VERSION=2.1.1
-ARG TORCHAUDIO_VERSION=2.1.1
-ARG TORCHVISION_VERSION=0.16.1
+ARG IPEX_VERSION=2.2.0
+ARG PYTORCH_VERSION=2.2.0
+ARG TORCHAUDIO_VERSION=2.2.0
+ARG TORCHVISION_VERSION=0.17.0
ARG TORCH_CPU_URL=https://download.pytorch.org/whl/cpu/torch_stable.html

RUN \
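Outside of Docker, an equivalent environment can be set up with pip. This is a sketch inferred from the build args above, not commands from this commit; the `-f` index URL mirrors `TORCH_CPU_URL`, and `intel-extension-for-pytorch` is the package's PyPI name:

```shell
# Install the CPU wheels pinned by the Dockerfile build args above.
python -m pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 \
    -f https://download.pytorch.org/whl/cpu/torch_stable.html
python -m pip install intel-extension-for-pytorch==2.2.0
```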

docs/_static/htmls/tbl_deepspeed.html

+124
@@ -0,0 +1,124 @@
<table class="docutils align-default">
<thead>
<tr class="row-odd">
<th class="head" style="text-align: center; vertical-align: middle;">MODEL<br />FAMILY</th>
<th class="head" style="text-align: center; vertical-align: middle;">MODEL NAME<br />(Huggingface hub)</th>
<th class="head" style="text-align: center; vertical-align: middle;">BF16</th>
<th class="head" style="text-align: center; vertical-align: middle;">Weight-Only<br />Quantization<br />INT8</th>
</tr>
</thead>
<tbody>
<tr class="row-even">
<td><p>LLAMA</p></td>
<td><p>meta-llama/Llama-2-7b-hf</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
</tr>
<tr class="row-odd">
<td><p>LLAMA</p></td>
<td><p>meta-llama/Llama-2-13b-hf</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
</tr>
<tr class="row-even">
<td><p>LLAMA</p></td>
<td><p>meta-llama/Llama-2-70b-hf</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
</tr>
<tr class="row-odd">
<td><p>GPT-J</p></td>
<td><p>EleutherAI/gpt-j-6b</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟨</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
</tr>
<tr class="row-even">
<td><p>GPT-NEOX</p></td>
<td><p>EleutherAI/gpt-neox-20b</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟨</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
</tr>
<tr class="row-odd">
<td><p>DOLLY</p></td>
<td><p>databricks/dolly-v2-12b</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟨</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
</tr>
<tr class="row-even">
<td><p>FALCON</p></td>
<td><p>tiiuae/falcon-40b</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟨</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟨</p></td>
</tr>
<tr class="row-odd">
<td><p>OPT</p></td>
<td><p>facebook/opt-30b</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟨</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
</tr>
<tr class="row-even">
<td><p>OPT</p></td>
<td><p>facebook/opt-1.3b</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
</tr>
<tr class="row-odd">
<td><p>Bloom</p></td>
<td><p>bigscience/bloom-1b7</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟨</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
</tr>
<tr class="row-even">
<td><p>CodeGen</p></td>
<td><p>Salesforce/codegen-2B-multi</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
</tr>
<tr class="row-odd">
<td><p>Baichuan</p></td>
<td><p>baichuan-inc/Baichuan2-7B-Chat</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
</tr>
<tr class="row-even">
<td><p>Baichuan</p></td>
<td><p>baichuan-inc/Baichuan2-13B-Chat</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟨</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
</tr>
<tr class="row-odd">
<td><p>Baichuan</p></td>
<td><p>baichuan-inc/Baichuan-13B-Chat</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟨</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
</tr>
<tr class="row-even">
<td><p>GPTBigCode</p></td>
<td><p>bigcode/starcoder</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
</tr>
<tr class="row-odd">
<td><p>T5</p></td>
<td><p>google/flan-t5-xl</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
</tr>
<tr class="row-even">
<td><p>Mistral</p></td>
<td><p>mistralai/Mistral-7B-v0.1</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
</tr>
<tr class="row-odd">
<td><p>MPT</p></td>
<td><p>mosaicml/mpt-7b</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
</tr>
</tbody>
</table>
<ul class="simple">
<li><p>🟩 signifies that the model performs well with good accuracy (&lt;1% difference compared with FP32).</p></li>
<li><p>🟨 signifies that the model performs well, but accuracy may not be perfect (&gt;1% difference compared with FP32).</p></li>
</ul>
