Skip to content

Prefix Aware Scorer #48

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 106 commits into
base: dev
Choose a base branch
from
Open

Prefix Aware Scorer #48

wants to merge 106 commits into from

Conversation

oglok
Copy link

@oglok oglok commented Apr 23, 2025

This PR implements:

  • prefix store based in a radix tree storing a map of prefixes (key) and prefixEntry (value) that contains the target pod, last time used and model name.
  • prefix aware scorer
  • addition of new scorer to the scorer manager
  • unit tests for new store and scorer

NOTE: there is a runMaintenance function in the store that is not being used. Need to figure out where to call it from without blocking all requests.

mayabar and others added 30 commits April 10, 2025 14:50
…ill be the target for a request. Session affinity scorer added
- Rename SessionId to SessionID
- Remove datastore from scoreTargets, add datastore to SessionAffinityScorer
- Rename ScoredPod to PodScore
…f ScoreMng

- If some specific scorer failed to score pods - just log the problem, skip it and continue to the next scorer
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.37.0 to 0.38.0.
- [Commits](golang/net@v0.37.0...v0.38.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-version: 0.38.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
…es/golang.org/x/net-0.38.0

Bump golang.org/x/net from 0.37.0 to 0.38.0
Add scorers support in scheduler
…eployments

First iteration of development deployments & environments
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
…ilds

fix: basic container image builds for linux
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
empty top level kustomization.yaml - make CICD happy
clubanderson and others added 20 commits April 21, 2025 21:40
This is the namespace we'll use for tekton deploys to
test deploy the kubernetes dev environment.

Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
@oglok oglok changed the title WIP: Prefix Aware Scorer Prefix Aware Scorer Apr 23, 2025
@rootfs rootfs requested a review from Copilot April 23, 2025 12:55
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements a new prefix store based on a radix tree and a prefix-aware scorer to route requests more efficiently. Key changes include:

  • Adding a PrefixStore and its configuration to scheduler configuration with environmental overrides.
  • Implementing a new PrefixAwareScorer and integrating it into the scheduler.
  • Providing unit tests for the new store and scorer components.

Reviewed Changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
pkg/epp/scheduling/types/types.go Added a Prompt field to the LLMRequest structure.
pkg/epp/scheduling/scheduler.go Introduced prefix store configuration and integrated the new scorer.
pkg/epp/scheduling/prefix_store.go Implements the prefix store with entry addition, lookup, eviction, and maintenance.
pkg/epp/scheduling/prefix_store_test.go Unit tests for basic prefix operations, constraints, TTL expiration, max entries, and maintenance routine.
pkg/epp/scheduling/prefix_aware_scorer.go Defines the scorer which uses the prefix store to score pods.
pkg/epp/scheduling/prefix_aware_scorer_test.go Tests behavior of the new scorer with various input scenarios.
Files not reviewed (1)
  • go.mod: Language not supported

Comment on lines +118 to +120
// Update LastUsed time for the matched entry
entry.LastUsed = time.Now()
ps.tree.Insert(matchedPrefix, entry)
Copy link
Preview

Copilot AI Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modifying 'entry.LastUsed' under a read lock (RLock) in FindPodForPrefix may lead to a race condition. Consider acquiring the write lock (Lock) when updating mutable state or restructuring the update to ensure thread safety.

Suggested change
// Update LastUsed time for the matched entry
entry.LastUsed = time.Now()
ps.tree.Insert(matchedPrefix, entry)
// Update LastUsed time for the matched entry
ps.mu.RUnlock() // Release the read lock before acquiring the write lock
ps.mu.Lock()
entry.LastUsed = time.Now()
ps.tree.Insert(matchedPrefix, entry)
ps.mu.Unlock()

Copilot uses AI. Check for mistakes.

@@ -43,6 +48,10 @@ const (
defaultQueueThresholdCritical = 5
defaultQueueingThresholdLoRA = 128
defaultLoraAffinityThreshold = 0.999
defaultPrefixStoreMaxEntries = 1000
defaultPrefixStoreMinLen = 3
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is probably too small :D

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops! testing defaults xD

defaultPrefixStoreMaxEntries = 1000
defaultPrefixStoreMinLen = 3
defaultPrefixStoreMaxLen = 100
defaultPrefixStoreTTLHours = 24
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minute instead or hour?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure what should be a reasonable TTL for LLM inferencing.

@vMaroon
Copy link

vMaroon commented Apr 23, 2025

The dev branch was rebased on upstream main, which includes the new and different Scorer interface. The Scorer pulled from there defines a Score function that works per-pod and not group of pods. We need to discuss whether to propose changing that but it makes sense to first adapt to it.

I'll open a KVCacheAwareScorer PR soon and we can sync.

Signed-off-by: Ricardo Noriega <rnoriega@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants