---
title: Azure API Management policy reference - llm-semantic-cache-store
description: Reference for the llm-semantic-cache-store policy available for use in Azure API Management. Provides policy usage, settings, and examples.
services: api-management
author: dlepow
ms.service: azure-api-management
ms.collection: ce-skilling-ai-copilot
ms.custom:
ms.topic: reference
ms.date: 12/13/2024
ms.author: danlep
---
[!INCLUDE api-management-availability-all-tiers]
The `llm-semantic-cache-store` policy caches responses to chat completion API requests to a configured external cache. Response caching reduces bandwidth and processing requirements imposed on the backend Azure OpenAI API and lowers latency perceived by API consumers.
> [!NOTE]
> - This policy must have a corresponding Get cached responses to large language model API requests (`llm-semantic-cache-lookup`) policy.
> - For prerequisites and steps to enable semantic caching, see Enable semantic caching for Azure OpenAI APIs in Azure API Management.
> - Currently, this policy is in preview.
[!INCLUDE api-management-policy-generic-alert]
[!INCLUDE api-management-llm-models]
```xml
<llm-semantic-cache-store duration="seconds" />
```
| Attribute | Description | Required | Default |
|-----------|-------------|----------|---------|
| duration | Time-to-live of the cached entries, specified in seconds. Policy expressions are allowed. | Yes | N/A |
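Because policy expressions are allowed for `duration`, the time-to-live can be computed at runtime instead of hard-coded. The following snippet is a minimal sketch; the 10-minute value is an illustrative assumption, not a recommendation from this article:

```xml
<!-- Sketch: set the cache time-to-live to 10 minutes (600 seconds) with a policy expression -->
<llm-semantic-cache-store duration="@(60 * 10)" />
```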
- Policy sections: outbound
- Policy scopes: global, product, API, operation
- Gateways: classic, v2, consumption
- This policy can be used only once in a policy section.
- If the cache lookup fails, the API call that uses the cache-related operation doesn't raise an error; the operation completes successfully without using the cache.
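As noted above, this policy works together with a corresponding `llm-semantic-cache-lookup` policy: the lookup runs in the inbound section and the store runs in the outbound section. The following is a minimal sketch of that pairing; the lookup policy's attribute values, the backend ID `embeddings-backend`, and the 120-second duration are assumptions for illustration only:

```xml
<policies>
    <inbound>
        <base />
        <!-- Sketch: return a cached response when a semantically similar prompt was served before.
             Attribute values below are illustrative assumptions. -->
        <llm-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="embeddings-backend"
            embeddings-backend-auth="system-assigned" />
    </inbound>
    <outbound>
        <!-- Store the backend response so later, semantically similar prompts can be answered from the cache -->
        <llm-semantic-cache-store duration="120" />
        <base />
    </outbound>
</policies>
```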
[!INCLUDE api-management-llm-semantic-cache-example]
[!INCLUDE api-management-policy-ref-next-steps]