-
Notifications
You must be signed in to change notification settings - Fork 7
Prefix Aware Scorer #48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
…ill be the target for a request. Session affinity scorer added
- Rename SessionId to SessionID - Remove datastore from scoreTargets, add datastore to SessionAffinityScorer - Rename ScoredPod to PodScore
…f ScoreMng - If some specific scorer failed to score pods - just log the problem, skip it and continue to the next scorer
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.37.0 to 0.38.0. - [Commits](golang/net@v0.37.0...v0.38.0) --- updated-dependencies: - dependency-name: golang.org/x/net dependency-version: 0.38.0 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>
…es/golang.org/x/net-0.38.0 Bump golang.org/x/net from 0.37.0 to 0.38.0
Add scorers support in scheduler
…eployments First iteration of development deployments & environments
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
…ilds fix: basic container image builds for linux
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
empty top level kustomization.yaml - make CICD happy
Fix kustomize envs
This is the namespace we'll use for tekton deploys to test deploy the kubernetes dev environment. Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
provide more defaults in Makefile
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Shane Utt <shaneutt@linux.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements a new prefix store based on a radix tree and a prefix-aware scorer to route requests more efficiently. Key changes include:
- Adding a PrefixStore and its configuration to scheduler configuration with environmental overrides.
- Implementing a new PrefixAwareScorer and integrating it into the scheduler.
- Providing unit tests for the new store and scorer components.
Reviewed Changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
pkg/epp/scheduling/types/types.go | Added a Prompt field to the LLMRequest structure. |
pkg/epp/scheduling/scheduler.go | Introduced prefix store configuration and integrated the new scorer. |
pkg/epp/scheduling/prefix_store.go | Implements the prefix store with entry addition, lookup, eviction, and maintenance. |
pkg/epp/scheduling/prefix_store_test.go | Unit tests for basic prefix operations, constraints, TTL expiration, max entries, and maintenance routine. |
pkg/epp/scheduling/prefix_aware_scorer.go | Defines the scorer which uses the prefix store to score pods. |
pkg/epp/scheduling/prefix_aware_scorer_test.go | Tests behavior of the new scorer with various input scenarios. |
Files not reviewed (1)
- go.mod: Language not supported
// Update LastUsed time for the matched entry | ||
entry.LastUsed = time.Now() | ||
ps.tree.Insert(matchedPrefix, entry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modifying 'entry.LastUsed' under a read lock (RLock) in FindPodForPrefix may lead to a race condition. Consider acquiring the write lock (Lock) when updating mutable state or restructuring the update to ensure thread safety.
// Update LastUsed time for the matched entry | |
entry.LastUsed = time.Now() | |
ps.tree.Insert(matchedPrefix, entry) | |
// Update LastUsed time for the matched entry | |
ps.mu.RUnlock() // Release the read lock before acquiring the write lock | |
ps.mu.Lock() | |
entry.LastUsed = time.Now() | |
ps.tree.Insert(matchedPrefix, entry) | |
ps.mu.Unlock() |
Copilot uses AI. Check for mistakes.
@@ -43,6 +48,10 @@ const ( | |||
defaultQueueThresholdCritical = 5 | |||
defaultQueueingThresholdLoRA = 128 | |||
defaultLoraAffinityThreshold = 0.999 | |||
defaultPrefixStoreMaxEntries = 1000 | |||
defaultPrefixStoreMinLen = 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is probably too small :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops! testing defaults xD
defaultPrefixStoreMaxEntries = 1000 | ||
defaultPrefixStoreMinLen = 3 | ||
defaultPrefixStoreMaxLen = 100 | ||
defaultPrefixStoreTTLHours = 24 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minute instead or hour?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure what should be a reasonable TTL for LLM inferencing.
The I'll open a KVCacheAwareScorer PR soon and we can sync. |
Signed-off-by: Ricardo Noriega <rnoriega@redhat.com>
This PR implements:
NOTE: there is a runMaintenance function in the store that is not being used. Need to figure out where to call it from without blocking all requests.