docs/introduction.md (+2 -2)

@@ -13,7 +13,7 @@ There are some industry practices, such as [triton inference server](https://git
One common complaint from users of the Triton Inference Server is that in a system with multiple intertwined nodes, a lot of business logic needs to be completed on the client side and then called through RPC to the server, which can be cumbersome. For performance reasons, unconventional methods such as shared memory, ensemble, and [BLS](https://github.com/triton-inference-server/python_backend#business-logic-scripting) must be considered.

- To address this issue, TorchPipe provides a thread-safe function interface for the PyTorch frontend and a fine-grained backend extension for users, by delving into PyTorch's C++ calculation backend and CUDA stream management, as well as modeling domain-specific languages for multiple nodes.
+ To address these issues, TorchPipe provides a thread-safe function interface for the PyTorch frontend and a fine-grained backend extension for users, by delving into PyTorch's C++ calculation backend and CUDA stream management, as well as modeling domain-specific languages for multiple nodes.

@@ -22,7 +22,7 @@ To address this issue, TorchPipe provides a thread-safe function interface for t
**Features of the torchpipe framework:**
- Achieves near-optimal performance (peak throughput/TP99) from a business perspective, reducing widespread negative optimization and performance loss between nodes.
- With a fine-grained generic backend, it is easy to add support for new hardware and to lower the barrier of migrating between hardware vendor ecosystems.
- - Simple and high-performance modeling, including complex business systems such as multi-model fusion. Typical industrial scenarios include AI systems A and B with up to 10 model nodes in smart cities, and OCR systems that involve subgraph independent scheduling, bucket scheduling, and intelligent batch grouping for extreme optimization.
+ - Simple and high-performance modeling, including complex business systems such as multi-model fusion. Typical industrial scenarios include AI systems with up to 10 model nodes in smart cities, and OCR systems that involve subgraph independent scheduling, bucket scheduling, and intelligent batch grouping for extreme optimization.
- Maximizes the elimination of performance loss caused by Python runtime, GIL, heterogeneous hardware, virtualization, and multi-process.
Unlike many other service-oriented frameworks, we decouple the system from RPC and focus on concurrent safety and pipeline scheduling of C++ and Python interfaces.
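
To make the concurrent-safety point above concrete, here is a minimal sketch (not part of this commit) of many Python threads calling one shared pipeline object. It assumes a torchpipe-style callable built from a configuration dict with a dict-in/dict-out calling convention; the backend name, configuration keys, and model path are illustrative assumptions.

```python
# Hypothetical sketch: many Python threads sharing one thread-safe pipeline.
# Backend name, config keys, and model path below are illustrative assumptions.
from concurrent.futures import ThreadPoolExecutor

import torch
import torchpipe

# A single pipeline instance; batching and multi-instance scheduling are
# assumed to happen inside the C++ backend rather than on the client side.
model = torchpipe.pipe({
    "backend": "SyncTensor[TensorrtTensor]",  # assumed backend chain
    "model": "resnet18.onnx",                 # assumed model file
    "instance_num": 2,                        # assumed: number of parallel instances
    "max": 4,                                 # assumed: max batch size per forward
})

def infer(img: torch.Tensor):
    request = {"data": img}   # dict-in / dict-out convention (assumed)
    model(request)            # intended to be safe to call from multiple threads
    return request["result"]

# Client threads issue requests concurrently against the same pipeline object.
with ThreadPoolExecutor(max_workers=8) as pool:
    frames = [torch.rand(3, 224, 224) for _ in range(32)]
    outputs = list(pool.map(infer, frames))
```

The only point of the sketch is the calling pattern: a single shared pipeline, no per-thread copies, and no client-side RPC glue.
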
docs/preliminaries/rpc.md (-1)

@@ -4,7 +4,6 @@ title: Performance indicators for services
type: explainer
---

- Performance indicators for services

When evaluating the performance of a service, there are several key indicators to consider. These indicators can help us understand the performance of the service in terms of latency, throughput, error rate, and other aspects. Here are some commonly used key performance indicators:
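
The indicator list itself continues beyond this hunk. As a concrete illustration (not part of the original document), throughput and a tail-latency percentile such as TP99 could be derived from recorded per-request measurements roughly as follows; the function and key names are hypothetical.

```python
# Illustrative sketch: deriving throughput and tail latency from measurements.
def summarize(latencies_s, wall_clock_s):
    """latencies_s: per-request latencies in seconds; wall_clock_s: total test duration."""
    n = len(latencies_s)
    ordered = sorted(latencies_s)
    # TP99: latency below which 99% of requests complete (nearest-rank approximation).
    tp99 = ordered[min(n - 1, int(n * 0.99))]
    return {
        "throughput_qps": n / wall_clock_s,     # completed requests per second
        "avg_latency_s": sum(latencies_s) / n,  # mean latency
        "tp99_latency_s": tp99,                 # 99th-percentile latency
    }

# Example: 1000 requests observed over 10 seconds of wall-clock time.
import random
samples = [random.uniform(0.010, 0.050) for _ in range(1000)]
print(summarize(samples, wall_clock_s=10.0))
```
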