Commit 0ec44e0: update path
1 parent 635a218

6 files changed (+32 −32 lines)
docs/contribution_guide/modify_the_code.md (+1 −1)

@@ -28,5 +28,5 @@ pytest .
 If necessary, please consider supplementing with [Python tests](https://github.com/torchpipe/torchpipe//test).

 :::note Code Formatting (optional)
-Please configure a formatting plugin to enable [.clang-format](https://github.com/torchpipe/torchpipe/-/blob/develop/.clang-format).
+Please configure a formatting plugin to enable [.clang-format](https://github.com/torchpipe/torchpipe/blob/develop/.clang-format).
 :::
docs/quick_start_new_user.md (+4 −4)

@@ -7,7 +7,7 @@ type: explainer

 # Trial in 30 mins (new users)

-TorchPipe is a multi-instance pipeline parallel library that provides seamless integration between lower-level acceleration libraries (such as TensorRT and OpenCV) and RPC frameworks. It guarantees high service throughput while meeting latency requirements. This document is mainly for new users, that is, users at the introductory stage of acceleration-related theory who know some Python grammar and can read simple code. It mainly covers the use of torchpipe for accelerating service deployment, complemented by performance and effect comparisons. The complete code for this document can be found at [resnet50_thrift](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/resnet50_thrift/)
+TorchPipe is a multi-instance pipeline parallel library that provides seamless integration between lower-level acceleration libraries (such as TensorRT and OpenCV) and RPC frameworks. It guarantees high service throughput while meeting latency requirements. This document is mainly for new users, that is, users at the introductory stage of acceleration-related theory who know some Python grammar and can read simple code. It mainly covers the use of torchpipe for accelerating service deployment, complemented by performance and effect comparisons. The complete code for this document can be found at [resnet50_thrift](https://github.com/torchpipe/torchpipe/blob/develop/examples/resnet50_thrift/)

 ## Catalogue
 * [1. Basic knowledge](#1)

@@ -84,7 +84,7 @@ self.classification_engine = torch2trt(resnet50, [input_shape],

 ```

-The overall online service deployment can be found at [main_trt.py](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/resnet50_thrift/main_trt.py)
+The overall online service deployment can be found at [main_trt.py](https://github.com/torchpipe/torchpipe/blob/develop/examples/resnet50_thrift/main_trt.py)

 :::tip
 Since TensorRT is not thread-safe, when using this method for model acceleration, it is necessary to handle locking (`with self.lock:`) during the service deployment process.
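The locking pattern described in this tip can be sketched as follows. This is a minimal illustration with a hypothetical `ClassificationHandler` class and a stand-in engine callable, not the actual code from main_trt.py:

```python
import threading

class ClassificationHandler:
    """Sketch of a Thrift-style handler guarding a non-thread-safe engine."""

    def __init__(self, engine):
        self.engine = engine          # e.g. a torch2trt/TensorRT engine
        self.lock = threading.Lock()  # shared across server worker threads

    def infer(self, data):
        # Only one thread may use the TensorRT execution context at a time.
        with self.lock:
            return self.engine(data)
```

Every RPC worker thread calls `infer`, so the lock serializes access to the single engine instance; this preserves correctness at the cost of throughput, which is part of the motivation for the TorchPipe approach below.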
@@ -104,7 +104,7 @@ From the above process, it's clear that when accelerating a single model, the fo

 ![](images/quick_start_new_user/torchpipe_en.png)

-We've made adjustments to the deployment of our service using TorchPipe. The overall online service deployment can be found at [main_torchpipe.py](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/resnet50_thrift/main_torchpipe.py).
+We've made adjustments to the deployment of our service using TorchPipe. The overall online service deployment can be found at [main_torchpipe.py](https://github.com/torchpipe/torchpipe/blob/develop/examples/resnet50_thrift/main_torchpipe.py).
 The core function modifications are as follows:

 ```py
@@ -219,7 +219,7 @@ std="58.395, 57.120, 57.375" # 255*"0.229, 0.224, 0.225"
 `python client_qps.py --img_dir /your/testimg/path/ --port 8888 --request_client 20 --request_batch 1`

-The specific test code can be found at [client_qps.py](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/resnet50_thrift/client_qps.py)
+The specific test code can be found at [client_qps.py](https://github.com/torchpipe/torchpipe/blob/develop/examples/resnet50_thrift/client_qps.py)

 With the same Thrift service interface, testing on a machine with an NVIDIA 3080 GPU, a 36-core CPU, and a concurrency of 10, we have the following results:

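The benchmark command above drives the service with many concurrent clients and reports throughput. A stripped-down sketch of such a QPS measurement (hypothetical `measure_qps` and `send_request` names, stdlib only; the real client_qps.py talks to the service over Thrift) could look like:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def measure_qps(send_request, num_clients=20, requests_per_client=50):
    """Drive send_request from num_clients concurrent workers, return QPS."""
    def worker(_):
        for _ in range(requests_per_client):
            send_request()

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        # list() forces all workers to finish before we stop the clock
        list(pool.map(worker, range(num_clients)))
    elapsed = time.perf_counter() - start
    return num_clients * requests_per_client / elapsed
```

Measuring under concurrency matters here: a lock-guarded single engine saturates quickly as clients are added, while a pipelined multi-instance deployment keeps scaling, which is what the comparison table illustrates.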
docs/tools/quantization.mdx (+11 −11)

@@ -29,7 +29,7 @@ For detection models, you can consider using the [official complete tutorial](ht

 In addition to the pre-trained parameters provided by normal training, training-based quantization also requires quantization pre-training parameters provided by post-training quantization (ptq).

-We have integrated [calib_tools](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/int8/qat/calib_tools.py) for reference.
+We have integrated [calib_tools](https://github.com/torchpipe/torchpipe/blob/develop/examples/int8/qat/calib_tools.py) for reference.

 - Define calibrator:
 ```python

@@ -100,17 +100,17 @@ The official training format is very simple and is only used as an example.
 #### Direct Quantization without Modifying the Backbone
 Following the official example, we conducted step-by-step experiments on resnet:

-- Download training data: [code](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/int8/qat/download_data.py)
-- Train for 10 epochs to obtain the resnet50 model: [code](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/int8/qat/fp32_train.py), accuracy 98.44%
-- (optional) PyTorch ptq: [code](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/int8/qat/ptq.py), accuracy 96.64% (max)
-- (optional) PyTorch qat: [code](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/int8/qat/qat.py), accuracy 98.26%.
+- Download training data: [code](https://github.com/torchpipe/torchpipe/blob/develop/examples/int8/qat/download_data.py)
+- Train for 10 epochs to obtain the resnet50 model: [code](https://github.com/torchpipe/torchpipe/blob/develop/examples/int8/qat/fp32_train.py), accuracy 98.44%
+- (optional) PyTorch ptq: [code](https://github.com/torchpipe/torchpipe/blob/develop/examples/int8/qat/ptq.py), accuracy 96.64% (max)
+- (optional) PyTorch qat: [code](https://github.com/torchpipe/torchpipe/blob/develop/examples/int8/qat/qat.py), accuracy 98.26%.

 #### MSE + Residual Fusion {#mseadd}

 The above resnet training uses the max quantization method and does not fuse the Add layer, resulting in TensorRT running speed below expectations. The following are the results after fusing Add under int8 and switching to the mse mode:

-- ptq: [code](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/int8/qat/ptq_merge_residual.py), accuracy 94.34% (mse)
-- qat: [code](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/int8/qat/qat_merge_residual.py), accuracy 95.82%.
+- ptq: [code](https://github.com/torchpipe/torchpipe/blob/develop/examples/int8/qat/ptq_merge_residual.py), accuracy 94.34% (mse)
+- qat: [code](https://github.com/torchpipe/torchpipe/blob/develop/examples/int8/qat/qat_merge_residual.py), accuracy 95.82%.

 #### Summary of Results in PyTorch
@@ -124,9 +124,9 @@ The above resnet training uses the max quantization method and does not fuse the
 ### Summary of Test Results in TorchPipe
 The following tests were performed using the onnx generated by TorchPipe:

-- Export onnx: [code](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/int8/qat/export_onnx_merge_residual.py)
-- Load fp32-onnx with TorchPipe and perform ptq: [code](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/int8/qat/torchpipe_ptq_test.py)
-- Test with qat-onnx loaded with TorchPipe: [code](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/int8/qat/torchpipe_qat_test.py)
+- Export onnx: [code](https://github.com/torchpipe/torchpipe/blob/develop/examples/int8/qat/export_onnx_merge_residual.py)
+- Load fp32-onnx with TorchPipe and perform ptq: [code](https://github.com/torchpipe/torchpipe/blob/develop/examples/int8/qat/torchpipe_ptq_test.py)
+- Test with qat-onnx loaded with TorchPipe: [code](https://github.com/torchpipe/torchpipe/blob/develop/examples/int8/qat/torchpipe_qat_test.py)

 | Model | Accuracy | Performance | Note |
@@ -135,4 +135,4 @@ The following tests were performed using the onnx generated by TorchPipe:
 | tensorrt's native int8 | 98.26% | - | |
 | qat | 98.67% | - | [Acc. under onnxruntime] is 98.69%. |

-[Acc. under onnxruntime]: https://github.com/torchpipe/torchpipe/-/blob/develop/examples/int8/qat/onnxruntime_qat_test.py
+[Acc. under onnxruntime]: https://github.com/torchpipe/torchpipe/blob/develop/examples/int8/qat/onnxruntime_qat_test.py
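The max and mse calibration modes compared in this file can be illustrated with a self-contained toy. This is not the pytorch-quantization implementation; it uses a 4-bit symmetric range instead of int8 so the effect shows up on a tiny sample. max clips at the largest observed activation, while mse searches for the clipping range minimizing mean squared quantization error, which may sacrifice outliers in exchange for finer resolution on the bulk of the distribution:

```python
QMAX = 7  # 4-bit symmetric range [-7, 7]; real int8 calibration uses 127

def fake_quantize(xs, amax):
    """Quantize-dequantize with clipping range [-amax, amax]."""
    scale = amax / QMAX
    return [max(-QMAX, min(QMAX, round(x / scale))) * scale for x in xs]

def mse(xs, ys):
    return sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs)

def calibrate_max(xs):
    # "max" mode: clip at the largest observed magnitude
    return max(abs(x) for x in xs)

def calibrate_mse(xs, steps=100):
    # "mse" mode: search candidate clipping ranges, keep the one with
    # the smallest mean squared quantization error over the sample
    top = calibrate_max(xs)
    candidates = [top * i / steps for i in range(1, steps + 1)]
    return min(candidates, key=lambda amax: mse(xs, fake_quantize(xs, amax)))
```

On an activation sample dominated by small values plus a rare outlier, `calibrate_mse` picks a clipping range well below the outlier, which is the kind of difference behind the max vs. mse accuracy gap reported above.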

i18n/zh/docusaurus-plugin-content-docs/current/contribution_guide/modify_the_code.md (+1 −1)

@@ -28,5 +28,5 @@ pytest .
 When necessary, please consider supplementing with [Python tests](https://github.com/torchpipe/torchpipe//test)

 :::note Code formatting (optional)
-Please configure a formatting plugin so that [.clang-format](https://github.com/torchpipe/torchpipe/-/blob/develop/.clang-format) takes effect.
+Please configure a formatting plugin so that [.clang-format](https://github.com/torchpipe/torchpipe/blob/develop/.clang-format) takes effect.
 :::

i18n/zh/docusaurus-plugin-content-docs/current/quick_start_new_user.md (+4 −4)

@@ -7,7 +7,7 @@ type: explainer

 # torchpipe Quick Start (30-minute trial)

-torchpipe is a multi-instance pipeline-parallel library built for industry, operating independently between low-level acceleration libraries (such as tensorrt, opencv, torchscript) and RPC frameworks (such as thrift, gRPC), helping users save more hardware resources at the deployment stage and bring products to production. This tutorial is mainly aimed at beginner users, that is, users at the introductory stage of acceleration-related theory who have some Python basics and can read simple code. It mainly covers how to use torchpipe to accelerate service deployment, with comparisons of performance and effect. The complete code for this document can be found at [resnet50_thrift](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/resnet50_thrift/)
+torchpipe is a multi-instance pipeline-parallel library built for industry, operating independently between low-level acceleration libraries (such as tensorrt, opencv, torchscript) and RPC frameworks (such as thrift, gRPC), helping users save more hardware resources at the deployment stage and bring products to production. This tutorial is mainly aimed at beginner users, that is, users at the introductory stage of acceleration-related theory who have some Python basics and can read simple code. It mainly covers how to use torchpipe to accelerate service deployment, with comparisons of performance and effect. The complete code for this document can be found at [resnet50_thrift](https://github.com/torchpipe/torchpipe/blob/develop/examples/resnet50_thrift/)

@@ -89,7 +89,7 @@ self.classification_engine = torch2trt(resnet50, [input_shape],

-The overall online service deployment code can be found at [main_trt.py](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/resnet50_thrift/main_trt.py)
+The overall online service deployment code can be found at [main_trt.py](https://github.com/torchpipe/torchpipe/blob/develop/examples/resnet50_thrift/main_trt.py)

 :::tip
 Because TensorRT is not thread-safe, when using this method for model acceleration, locking (`with self.lock:`) is required during service deployment.

@@ -108,7 +108,7 @@ self.classification_engine = torch2trt(resnet50, [input_shape],

 ![](images/quick_start_new_user/torchpipe.png)

-Using torchpipe, we adjusted this service deployment. The overall online service deployment code can be found at [main_torchpipe.py](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/resnet50_thrift/main_torchpipe.py); the core function changes are as follows:
+Using torchpipe, we adjusted this service deployment. The overall online service deployment code can be found at [main_torchpipe.py](https://github.com/torchpipe/torchpipe/blob/develop/examples/resnet50_thrift/main_torchpipe.py); the core function changes are as follows:

 ```py
 # ------- main -------
@@ -210,7 +210,7 @@ std="58.395, 57.120, 57.375" # 255*"0.229, 0.224, 0.225"
 ## 4 Performance and Effect Comparison
 `python test_tools.py --img_dir /your/testimg/path/ --port 8095 --request_client 10 --request_batch 1`
-The specific test code can be found at [client_qps.py](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/resnet50_thrift/client_qps.py)
+The specific test code can be found at [client_qps.py](https://github.com/torchpipe/torchpipe/blob/develop/examples/resnet50_thrift/client_qps.py)

 Using the same thrift service interface, tested on a 3080 machine with a 36-core CPU at a concurrency of 10
i18n/zh/docusaurus-plugin-content-docs/current/tools/quantization.mdx (+11 −11)

@@ -31,7 +31,7 @@ The post-training quantization process of tensorrt mainly consists of two steps:

 Besides the pre-trained parameters from a normally trained model, quantization-aware training also requires quantization pre-training parameters provided by post-training quantization (ptq).

-We have integrated [calib_tools](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/int8/qat/calib_tools.py) for reference.
+We have integrated [calib_tools](https://github.com/torchpipe/torchpipe/blob/develop/examples/int8/qat/calib_tools.py) for reference.

 - Define calibrator:
 ```python

@@ -99,17 +99,17 @@ calib_tools.save_onnx(q_model, f"model_name_qat.onnx")
 #### Direct quantization without changing the backbone
 Following the official example, we ran step-by-step experiments on resnet:

-- Download training data: [code](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/int8/qat/download_data.py)
-- Train for 10 epochs to obtain the resnet50 model: [code](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/int8/qat/fp32_train.py), accuracy 98.44%
-- (optional) pytorch ptq: [code](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/int8/qat/ptq.py), accuracy 96.64% (max)
-- (optional) pytorch qat: [code](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/int8/qat/qat.py), accuracy 98.26%.
+- Download training data: [code](https://github.com/torchpipe/torchpipe/blob/develop/examples/int8/qat/download_data.py)
+- Train for 10 epochs to obtain the resnet50 model: [code](https://github.com/torchpipe/torchpipe/blob/develop/examples/int8/qat/fp32_train.py), accuracy 98.44%
+- (optional) pytorch ptq: [code](https://github.com/torchpipe/torchpipe/blob/develop/examples/int8/qat/ptq.py), accuracy 96.64% (max)
+- (optional) pytorch qat: [code](https://github.com/torchpipe/torchpipe/blob/develop/examples/int8/qat/qat.py), accuracy 98.26%.

 #### mse + residual fusion {#mseadd}

 The resnet training above uses max quantization and does not fuse Add, so tensorrt runs slower than expected. Below are the results after fusing Add under int8 and switching to mse mode:

-- ptq: [code](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/int8/qat/ptq_merge_residual.py), accuracy 94.34% (mse)
-- qat: [code](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/int8/qat/qat_merge_residual.py), accuracy 95.82%
+- ptq: [code](https://github.com/torchpipe/torchpipe/blob/develop/examples/int8/qat/ptq_merge_residual.py), accuracy 94.34% (mse)
+- qat: [code](https://github.com/torchpipe/torchpipe/blob/develop/examples/int8/qat/qat_merge_residual.py), accuracy 95.82%

 #### Summary of results under pytorch
 | Model | Accuracy | Performance | Note |
@@ -122,9 +122,9 @@ calib_tools.save_onnx(q_model, f"model_name_qat.onnx")
 ### Summary of test results under torchpipe
 The following tests load the generated onnx with torchpipe:

-- Export onnx: [code](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/int8/qat/export_onnx_merge_residual.py)
-- Load fp32-onnx with torchpipe and perform ptq: [code](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/int8/qat/torchpipe_ptq_test.py)
-- Load qat-onnx with torchpipe and test: [code](https://github.com/torchpipe/torchpipe/-/blob/develop/examples/int8/qat/torchpipe_qat_test.py)
+- Export onnx: [code](https://github.com/torchpipe/torchpipe/blob/develop/examples/int8/qat/export_onnx_merge_residual.py)
+- Load fp32-onnx with torchpipe and perform ptq: [code](https://github.com/torchpipe/torchpipe/blob/develop/examples/int8/qat/torchpipe_ptq_test.py)
+- Load qat-onnx with torchpipe and test: [code](https://github.com/torchpipe/torchpipe/blob/develop/examples/int8/qat/torchpipe_qat_test.py)

 | Model | Accuracy | Performance | Note |
@@ -133,4 +133,4 @@ calib_tools.save_onnx(q_model, f"model_name_qat.onnx")
 | tensorrt's native int8 | 98.26% | - | |
 | qat | 98.67% | - | [Accuracy under onnxruntime] is 98.69%. |

-[Accuracy under onnxruntime]: https://github.com/torchpipe/torchpipe/-/blob/develop/examples/int8/qat/onnxruntime_qat_test.py
+[Accuracy under onnxruntime]: https://github.com/torchpipe/torchpipe/blob/develop/examples/int8/qat/onnxruntime_qat_test.py
