---
title: Azure API Management policy reference - azure-openai-semantic-cache-lookup
description: Reference for the azure-openai-semantic-cache-lookup policy available for use in Azure API Management. Provides policy usage, settings, and examples.
services: api-management
author: dlepow
ms.service: azure-api-management
ms.collection: ce-skilling-ai-copilot
ms.custom:
ms.topic: reference
ms.date: 12/13/2024
ms.author: danlep
---
[!INCLUDE api-management-availability-all-tiers]
Use the `azure-openai-semantic-cache-lookup` policy to perform a cache lookup of responses to Azure OpenAI Chat Completion API requests from a configured external cache, based on the vector proximity of the prompt to previous requests and a specified similarity score threshold. Response caching reduces bandwidth and processing requirements imposed on the backend Azure OpenAI API and lowers latency perceived by API consumers.
> [!NOTE]
> - This policy must have a corresponding Cache responses to Azure OpenAI API requests (`azure-openai-semantic-cache-store`) policy.
> - For prerequisites and steps to enable semantic caching, see Enable semantic caching for Azure OpenAI APIs in Azure API Management.
> - Currently, this policy is in preview.
[!INCLUDE api-management-policy-generic-alert]
[!INCLUDE api-management-azure-openai-models]
```xml
<azure-openai-semantic-cache-lookup
    score-threshold="similarity score threshold"
    embeddings-backend-id="backend entity ID for embeddings API"
    embeddings-backend-auth="system-assigned"
    ignore-system-messages="true | false"
    max-message-count="count">
    <vary-by>"expression to partition caching"</vary-by>
</azure-openai-semantic-cache-lookup>
```
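As a sketch of how the statement might be applied, the following configuration pairs the lookup policy in the inbound section with a store policy in the outbound section. The backend ID `embeddings-backend`, the `0.05` threshold, and the 60-second cache duration are illustrative assumptions, not values prescribed by this article:

```xml
<policies>
    <inbound>
        <base />
        <!-- Return a cached response when a prior prompt is semantically close enough;
             the backend ID is a hypothetical example -->
        <azure-openai-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="embeddings-backend"
            embeddings-backend-auth="system-assigned" />
    </inbound>
    <outbound>
        <!-- Store responses so future semantically similar prompts can be served from cache -->
        <azure-openai-semantic-cache-store duration="60" />
        <base />
    </outbound>
</policies>
```

A lower `score-threshold` requires prompts to be more similar before a cached response is returned; tune it to balance cache hit rate against answer relevance.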
| Attribute | Description | Required | Default |
| --------- | ----------- | -------- | ------- |
| score-threshold | Similarity score threshold used to determine whether to return a cached response to a prompt. Value is a decimal between 0.0 and 1.0. | Yes | N/A |
| embeddings-backend-id | Backend ID for OpenAI embeddings API call. | Yes | N/A |
| embeddings-backend-auth | Authentication used for Azure OpenAI embeddings API backend. | Yes. Must be set to `system-assigned`. | N/A |
| ignore-system-messages | Boolean. If set to `true`, removes system messages from a GPT chat completion prompt before assessing cache similarity. | No | `false` |
| max-message-count | If specified, number of remaining dialog messages after which caching is skipped. | No | N/A |
| Name | Description | Required |
| ---- | ----------- | -------- |
| vary-by | A custom expression determined at runtime whose value partitions caching. If multiple `vary-by` elements are added, values are concatenated to create a unique combination. | No |
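For instance, a `vary-by` expression can keep cached responses separate per API Management subscription, so one consumer's cached answers are never served to another. This is an illustrative sketch; the threshold and backend ID are assumptions:

```xml
<azure-openai-semantic-cache-lookup
    score-threshold="0.05"
    embeddings-backend-id="embeddings-backend"
    embeddings-backend-auth="system-assigned">
    <!-- Partition the cache by subscription ID (illustrative expression) -->
    <vary-by>@(context.Subscription.Id)</vary-by>
</azure-openai-semantic-cache-lookup>
```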
- Policy sections: inbound
- Policy scopes: global, product, API, operation
- Gateways: classic, v2, consumption
- This policy can only be used once in a policy section.
[!INCLUDE api-management-semantic-cache-example]
[!INCLUDE api-management-policy-ref-next-steps]