---
title: Azure API Management policy reference - azure-openai-semantic-cache-lookup | Microsoft Docs
description: Reference for the azure-openai-semantic-cache-lookup policy available for use in Azure API Management. Provides policy usage, settings, and examples.
services: api-management
author: dlepow
ms.service: azure-api-management
ms.collection: ce-skilling-ai-copilot
ms.custom: build-2024
ms.topic: reference
ms.date: 12/13/2024
ms.author: danlep
---

# Get cached responses of Azure OpenAI API requests

[!INCLUDE api-management-availability-all-tiers]

Use the `azure-openai-semantic-cache-lookup` policy to perform cache lookup of responses to Azure OpenAI Chat Completion API requests from a configured external cache, based on vector proximity of the prompt to previous requests and a specified similarity score threshold. Response caching reduces bandwidth and processing requirements imposed on the backend Azure OpenAI API and lowers latency perceived by API consumers.

> [!NOTE]
> [!INCLUDE api-management-policy-generic-alert]

[!INCLUDE api-management-azure-openai-models]

## Policy statement

```xml
<azure-openai-semantic-cache-lookup
    score-threshold="similarity score threshold"
    embeddings-backend-id="backend entity ID for embeddings API"
    embeddings-backend-auth="system-assigned"
    ignore-system-messages="true | false"
    max-message-count="count">
    <vary-by>"expression to partition caching"</vary-by>
</azure-openai-semantic-cache-lookup>
```
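As an illustrative sketch (not part of the reference above), the lookup policy is typically placed in the `inbound` section and paired with a corresponding `azure-openai-semantic-cache-store` policy in `outbound`; the backend ID `embeddings-backend`, the threshold value, and the cache duration below are assumed values:

```xml
<policies>
    <inbound>
        <base />
        <!-- Return a cached response when a semantically similar prompt was seen before.
             score-threshold and embeddings-backend-id values are illustrative. -->
        <azure-openai-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="embeddings-backend"
            embeddings-backend-auth="system-assigned">
            <vary-by>@(context.Subscription.Id)</vary-by>
        </azure-openai-semantic-cache-lookup>
    </inbound>
    <outbound>
        <!-- Store the backend response for future lookups (duration in seconds) -->
        <azure-openai-semantic-cache-store duration="60" />
        <base />
    </outbound>
</policies>
```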

## Attributes

| Attribute | Description | Required | Default |
| --------- | ----------- | -------- | ------- |
| score-threshold | Similarity score threshold used to determine whether to return a cached response to a prompt. Value is a decimal between 0.0 and 1.0. Learn more. | Yes | N/A |
| embeddings-backend-id | Backend ID for OpenAI embeddings API call. | Yes | N/A |
| embeddings-backend-auth | Authentication used for Azure OpenAI embeddings API backend. | Yes. Must be set to `system-assigned`. | N/A |
| ignore-system-messages | Boolean. If set to `true`, removes system messages from a GPT chat completion prompt before assessing cache similarity. | No | `false` |
| max-message-count | If specified, number of remaining dialog messages after which caching is skipped. | No | N/A |

## Elements

| Name | Description | Required |
| ---- | ----------- | -------- |
| vary-by | A custom expression determined at runtime whose value partitions caching. If multiple `vary-by` elements are added, values are concatenated to create a unique combination. | No |
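To illustrate the concatenation behavior (a hypothetical sketch, with assumed backend ID and threshold values), two `vary-by` expressions could partition the cache by both subscription and deployment, so cached responses are only shared within the same subscription-and-deployment combination:

```xml
<!-- Hypothetical example: cache entries are keyed by the concatenation
     of the subscription ID and the matched deployment-id URL parameter. -->
<azure-openai-semantic-cache-lookup
    score-threshold="0.05"
    embeddings-backend-id="embeddings-backend"
    embeddings-backend-auth="system-assigned">
    <vary-by>@(context.Subscription.Id)</vary-by>
    <vary-by>@(context.Request.MatchedParameters["deployment-id"])</vary-by>
</azure-openai-semantic-cache-lookup>
```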

## Usage

### Usage notes

- This policy can only be used once in a policy section.

## Examples

### Example with corresponding `azure-openai-semantic-cache-store` policy

[!INCLUDE api-management-semantic-cache-example]

## Related policies

[!INCLUDE api-management-policy-ref-next-steps]