---
title: Azure API Management policy reference - azure-openai-semantic-cache-lookup | Microsoft Docs
description: Reference for the azure-openai-semantic-cache-lookup policy available for use in Azure API Management. Provides policy usage, settings, and examples.
services: api-management
author: dlepow
ms.service: azure-api-management
ms.collection: ce-skilling-ai-copilot
ms.custom: build-2024
ms.topic: reference
ms.date: 12/13/2024
ms.author: danlep
---

# Get cached responses of Azure OpenAI API requests

[!INCLUDE api-management-availability-all-tiers]

Use the `azure-openai-semantic-cache-lookup` policy to perform cache lookup of responses to Azure OpenAI Chat Completion API requests from a configured external cache, based on vector proximity of the prompt to previous requests and a specified similarity score threshold. Response caching reduces bandwidth and processing requirements imposed on the backend Azure OpenAI API and lowers latency perceived by API consumers.

> [!NOTE]
> [!INCLUDE api-management-policy-generic-alert]

[!INCLUDE api-management-azure-openai-models]

## Policy statement

```xml
<azure-openai-semantic-cache-lookup
    score-threshold="similarity score threshold"
    embeddings-backend-id="backend entity ID for embeddings API"
    embeddings-backend-auth="system-assigned"
    ignore-system-messages="true | false"
    max-message-count="count">
    <vary-by>"expression to partition caching"</vary-by>
</azure-openai-semantic-cache-lookup>
```
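As an illustrative sketch (not part of the reference above), the lookup policy is typically placed in the `inbound` section and paired with a corresponding `azure-openai-semantic-cache-store` policy in `outbound`; the backend ID `embeddings-backend`, the threshold value, and the cache duration below are assumed values:

```xml
<policies>
    <inbound>
        <base />
        <!-- Return a cached response when a semantically similar prompt was seen before.
             score-threshold and embeddings-backend-id values are illustrative. -->
        <azure-openai-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="embeddings-backend"
            embeddings-backend-auth="system-assigned">
            <vary-by>@(context.Subscription.Id)</vary-by>
        </azure-openai-semantic-cache-lookup>
    </inbound>
    <outbound>
        <!-- Store the backend response for future lookups (duration in seconds) -->
        <azure-openai-semantic-cache-store duration="60" />
        <base />
    </outbound>
</policies>
```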

## Attributes

| Attribute | Description | Required | Default |
| --------- | ----------- | -------- | ------- |
| score-threshold | Similarity score threshold used to determine whether to return a cached response to a prompt. Value is a decimal between 0.0 and 1.0. Learn more. | Yes | N/A |
| embeddings-backend-id | Backend ID for OpenAI embeddings API call. | Yes | N/A |
| embeddings-backend-auth | Authentication used for Azure OpenAI embeddings API backend. | Yes. Must be set to `system-assigned`. | N/A |
| ignore-system-messages | Boolean. If set to `true`, removes system messages from a GPT chat completion prompt before assessing cache similarity. | No | `false` |
| max-message-count | If specified, number of remaining dialog messages after which caching is skipped. | No | N/A |

## Elements

| Name | Description | Required |
| ---- | ----------- | -------- |
| vary-by | A custom expression determined at runtime whose value partitions caching. If multiple `vary-by` elements are added, values are concatenated to create a unique combination. | No |
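To illustrate the concatenation behavior (a hypothetical sketch, with assumed backend ID and threshold values), two `vary-by` expressions could partition the cache by both subscription and deployment, so cached responses are only shared within the same subscription-and-deployment combination:

```xml
<!-- Hypothetical example: cache entries are keyed by the concatenation
     of the subscription ID and the matched deployment-id URL parameter. -->
<azure-openai-semantic-cache-lookup
    score-threshold="0.05"
    embeddings-backend-id="embeddings-backend"
    embeddings-backend-auth="system-assigned">
    <vary-by>@(context.Subscription.Id)</vary-by>
    <vary-by>@(context.Request.MatchedParameters["deployment-id"])</vary-by>
</azure-openai-semantic-cache-lookup>
```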

## Usage

### Usage notes

- This policy can only be used once in a policy section.

## Examples

### Example with corresponding `azure-openai-semantic-cache-store` policy

[!INCLUDE api-management-semantic-cache-example]

## Related policies

[!INCLUDE api-management-policy-ref-next-steps]