docs: update using "starter" distro than "ollama" #96
Conversation
This pull request has merge conflicts that must be resolved before it can be merged. @zdtsw please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
README.md (Outdated)
  env:
  - name: INFERENCE_MODEL
-   value: "llama3.2:1b"
+   value: "llama3.2:3b"
AFAIK this doesn't need to be changed from 1b to 3b - 1b should continue to work just fine. However, INFERENCE_MODEL will need to change to OLLAMA_INFERENCE_MODEL, see https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/starter.html#model-configuration
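A minimal sketch of the env block after that rename, keeping the 1b model as suggested (hypothetical, based on the docs link above):

env:
- name: OLLAMA_INFERENCE_MODEL
  value: "llama3.2:1b"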
  env:
  - name: INFERENCE_MODEL
-   value: 'llama3.2:1b'
+   value: 'llama3.2:3b'
Same comment here regarding 3b versus 1b.
reverted
  url: "http://ollama-server-service.ollama-dist.svc.cluster.local:11434"
  models:
- - model_id: "llama3.2:1b"
+ - model_id: "ollama/llama3.2:3b"
Same comment here about 1b versus 3b.
Why do we need the ollama prefix here?
reverted
- name: ENABLE_OLLAMA
  value: ollama
- name: OLLAMA_EMBEDDING_MODEL
  value: all-minilm:l6-v2
Does OLLAMA_INFERENCE_MODEL need to be added here as well?
ENABLE_OLLAMA is now removed. As for OLLAMA_INFERENCE_MODEL, I think it will be passed from the ConfigMap llama-stack-config, right? (lines 19-22)
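For illustration, a sketch of how that ConfigMap might carry the model entry in its run.yaml; the exact contents here are assumptions, pieced together from other snippets in this PR:

apiVersion: v1
kind: ConfigMap
metadata:
  name: llama-stack-config
data:
  run.yaml: |
    models:
    - model_id: "llama3.2:1b"
      provider_id: ollama
      model_type: llm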
port: 8321
env:
- name: OLLAMA_INFERENCE_MODEL
  value: "llama3.2:3b"
Same comment around 3b versus 1b here.
updated
distributions.json (Outdated)
"tgi": "docker.io/llamastack/distribution-tgi:latest", | ||
"together": "docker.io/llamastack/distribution-together:latest", | ||
"vllm-gpu": "docker.io/llamastack/distribution-vllm-gpu:latest" | ||
"starter": "docker.io/llamastack/distribution-starter:0.2.15", |
can we keep it on latest?
same below
updated, all set to use the latest tag now
i did not add
"starter-gpu": "docker.io/llamastack/distribution-starter-gpu:latest",
"dell": "docker.io/llamastack/distribution-dell:latest"
in this file (ref https://github.com/llamastack/llama-stack-ops/blob/main/actions/publish-dockers/main.sh#L71); we can have a follow-up PR for these 2.
  providers:
    inference:
-   - provider_id: ollama
+   - provider_id: ${env.ENABLE_OLLAMA:=__disabled__}
please keep ollama
updated
config/manager/manager.yaml (Outdated)
  capabilities:
    drop:
-   - "ALL"
+     - "ALL"
why? Please try to keep your changes focused on the PR. It would be great to check, before submitting the PR, that your AI assistant doesn't touch things outside its scope (even when the changes are worthwhile, please open a separate PR for the unrelated stuff).
I do not think the change came from the IDE/AI; the main reason for this change was to align with the indentation of other lines in this YAML file. I can revert it here.
README.md (Outdated)
  3. Verify the server pod is running in the user defined namespace.

- ### Using a ConfigMap for run.yaml configuration
+ ### Using a ConfigMap to override default run.yaml configuration from distribution
This is outside the scope of this PR.
reverted
model_id: ${env.ENABLE_OLLAMA:=__disabled__}/${env.OLLAMA_EMBEDDING_MODEL:=__disabled__}
provider_id: ${env.ENABLE_OLLAMA:=__disabled__}
provider_model_id: ${env.OLLAMA_EMBEDDING_MODEL:=__disabled__}
model_type: embedding
Why do you need this extra entry?
Also, I think the LLS moved away from the ENABLE_OLLAMA "trick" and just uses OLLAMA_URL. See the run.yaml of the starter distribution.
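For reference, the starter distribution's run.yaml wires the ollama provider via OLLAMA_URL roughly like this (a sketch from memory, not the authoritative file; the localhost default is an assumption):

providers:
  inference:
  - provider_id: ollama
    provider_type: remote::ollama
    config:
      url: ${env.OLLAMA_URL:=http://localhost:11434}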
i think when this PR was opened it was based on 0.2.15; changes after that release might not be well captured here. I've removed the extra changes, to only set OLLAMA_INFERENCE_MODEL and OLLAMA_URL.
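So the resulting env block presumably boils down to just these two variables (values taken from the snippets elsewhere in this PR):

env:
- name: OLLAMA_INFERENCE_MODEL
  value: "llama3.2:1b"
- name: OLLAMA_URL
  value: "http://ollama-server-service.ollama-dist.svc.cluster.local:11434"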
README.md (Outdated)
- name: OLLAMA_URL
  value: "http://ollama-server-service.ollama-dist.svc.cluster.local:11434"
- name: ENABLE_OLLAMA
  value: ollama
I find this value for the name a bit odd... 😄 but I guess it is outside of scope here.
this was removed after the comments in https://github.com/llamastack/llama-stack-k8s-operator/pull/96/files#r2306693329
  value: 'llama3.2:3b'
- name: OLLAMA_URL
  value: 'http://ollama-server-service.ollama-dist.svc.cluster.local:11434'
- name: ENABLE_OLLAMA
this is not needed.
removed
  value: all-minilm:l6-v2
  userConfig:
-   configMapName: llama-stack-config
+   configMapName: llama-stack-config # use ConfigMap's data.run.yaml
please remove the comment
removed
…ama-stack-k8s-operator#96. Seems like the ollama distribution is no longer maintained, so switching to 'starter', with adjustments for the OLLAMA env settings. Signed-off-by: Matthias Wessendorf <[email protected]>
@zdtsw thanks a lot for the PR! It looks good in general.
Since you have created it with the help of an AI assistant (which is fine!), please help us reviewers by following some guidelines as laid out in rvc-guide.github.io. E.g. this PR touches quite some things outside of its scope (which is a doc update).
Especially this advice might be helpful here:
Stay on target – avoid irrelevant code changes: AI models sometimes get "distracted", modifying parts of the codebase that you didn't ask to change (like drive-by "improvements" that are not needed). Such changes can confuse reviewers and bloat your PR. Avoid touching code that is not relevant to the task at hand. Each PR should have a laser-focused diff - if you notice unrelated edits (no matter how well-intentioned by the AI), strip them out.
Do a thorough self-review and test your changes: Never treat an AI-generated patch as ready to go without reviewing it yourself. Before requesting a peer review, inspect and verify the code yourself. This means reading through the diff carefully, understanding every change, and checking that it solves the problem without breaking anything else. Try to run the code end-to-end, and run all relevant unit tests or sample scenarios. If the project has automated test suites or linters, run those tools locally to catch obvious issues. You are responsible for the code you submit, whether you or an AI wrote it, and AI tools are no substitute for your own review of code quality, correctness, style, security, and licensing.
did updates to address review comments
Thanks for the review comments and suggestions!
distribution:
  name: starter
containerSpec:
  port: 8321
You removed the port in the README; perhaps it is also not needed here?
yep, this is a default one, not needed; let me remove it as well.
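With the default port dropped, the sample spec would presumably reduce to something like this (env values assumed from the README snippets in this PR):

distribution:
  name: starter
containerSpec:
  env:
  - name: OLLAMA_INFERENCE_MODEL
    value: "llama3.2:1b"
  - name: OLLAMA_URL
    value: "http://ollama-server-service.ollama-dist.svc.cluster.local:11434"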
  - model_id: "llama3.2:1b"
    provider_id: ollama
    model_type: llm
+   provider_model_id: llama3.2:1b
Is this needed, since there is already a model_id and a provider_id above?
not necessary, I can remove it
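Without the redundant provider_model_id, the entry would reduce to:

models:
- model_id: "llama3.2:1b"
  provider_id: ollama
  model_type: llm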
README.md (Outdated)
containerSpec:
  port: 8321
  env:
  - name: INFERENCE_MODEL
Shouldn't this also be OLLAMA_INFERENCE_MODEL, like in the sample change below?
you are right! let me fix this
Force-pushed: 1950dad to cbc3dbc
@nathan-weinberg @rhuss Is it okay to merge this PR?
@mergify rebase
- update example and create one without using userconfigmap
- set new env to enable ollama
- use the same llama model as in llama-stack
- remove deprecated distro images from distribution.json
Signed-off-by: Wen Zhou <[email protected]>
- revert back to use llama3.2:1b
- remove unnecessary/unrelated comments/changes
- set INFERENCE_MODEL to OLLAMA_INFERENCE_MODEL
- remove ENABLE_OLLAMA
- set images to use "latest" tag instead of 0.2.15
Signed-off-by: Wen Zhou <[email protected]>
- remove default port 8321 in sample
Signed-off-by: Wen Zhou <[email protected]>
Signed-off-by: Wen Zhou <[email protected]>
✅ Branch has been successfully rebased
Force-pushed: cbc3dbc to 23d0be3
After #156 is merged, this file will have the same properties as example-with-custom-config.yaml, and example-with-configmap will be removed. If this one goes through first, I'll update the name:
config/samples/example-withoutconfigmpa.yaml -> config/samples/example-withoutconfigmap.yaml
i saw #156 is closed; is any action still needed for this PR?
If #156 merges before this PR, this file can be deleted.
Where do we stand with this PR? In general I think it's a great addition and we should move forward with it.
We need to wait for a new Docker image, llamastack/distribution-starter:0.2.15, to be released that includes llamastack/llama-stack#2516; otherwise all providers are still enabled in image 0.2.14, which requires more env variables to disable them when using the starter template.