
Conversation

Contributor

@zdtsw commented Jul 7, 2025


We need to wait for a new Docker image to be released, llamastack/distribution-starter:0.2.15, which includes llamastack/llama-stack#2516; otherwise all providers are still enabled in image 0.2.14, which requires more environment variables to disable them when using the starter template.

@zdtsw marked this pull request as draft July 8, 2025 10:05

mergify bot commented Jul 15, 2025

This pull request has merge conflicts that must be resolved before it can be merged. @zdtsw please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label Jul 15, 2025
mergify bot removed the needs-rebase label Jul 20, 2025
@zdtsw marked this pull request as ready for review July 20, 2025 14:07
README.md Outdated
 env:
 - name: INFERENCE_MODEL
-  value: "llama3.2:1b"
+  value: "llama3.2:3b"
Collaborator

AFAIK this doesn't need to be changed from 1b to 3b - 1b should continue to work just fine

However, INFERENCE_MODEL will need to change to OLLAMA_INFERENCE_MODEL, see https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/starter.html#model-configuration
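
In concrete terms, the suggested change would look something like this (a minimal sketch, keeping the 1b model as suggested above; only the env var name changes):

```yaml
env:
- name: OLLAMA_INFERENCE_MODEL  # renamed from INFERENCE_MODEL for the starter distribution
  value: "llama3.2:1b"
```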

 env:
 - name: INFERENCE_MODEL
-  value: 'llama3.2:1b'
+  value: 'llama3.2:3b'
Collaborator

Same comment here regarding 3b versus 1b

Contributor Author

reverted

 url: "http://ollama-server-service.ollama-dist.svc.cluster.local:11434"
 models:
-- model_id: "llama3.2:1b"
+- model_id: "ollama/llama3.2:3b"
Collaborator

Same comment here about 1b versus 3b

Collaborator

Why do we need the ollama prefix here ?

Contributor Author

reverted

Comment on lines 43 to 38
- name: ENABLE_OLLAMA
value: ollama
- name: OLLAMA_EMBEDDING_MODEL
value: all-minilm:l6-v2
Collaborator

Does OLLAMA_INFERENCE_MODEL need to be added here as well?

Contributor Author

ENABLE_OLLAMA is now removed.
As for OLLAMA_INFERENCE_MODEL, I think it will be passed from the ConfigMap llama-stack-config, right? (lines 19-22)
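
Roughly, the model definition would then live in the ConfigMap's run.yaml rather than in the container env. A sketch only, assuming a ConfigMap named llama-stack-config with a data.run.yaml key as referenced by userConfig elsewhere in this PR:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: llama-stack-config
data:
  run.yaml: |
    # model registration picked up from the ConfigMap instead of OLLAMA_INFERENCE_MODEL in env
    models:
    - model_id: "llama3.2:1b"
      provider_id: ollama
      model_type: llm
```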

port: 8321
env:
- name: OLLAMA_INFERENCE_MODEL
value: "llama3.2:3b"
Collaborator

Same comment around 3b versus 1b here

Contributor Author

updated

"tgi": "docker.io/llamastack/distribution-tgi:latest",
"together": "docker.io/llamastack/distribution-together:latest",
"vllm-gpu": "docker.io/llamastack/distribution-vllm-gpu:latest"
"starter": "docker.io/llamastack/distribution-starter:0.2.15",
Collaborator

can we keep it on latest ?

Collaborator

same below

Contributor Author

updated, all set to use latest tag now

Contributor Author

@zdtsw Aug 28, 2025

I did not add

  "starter-gpu": "docker.io/llamastack/distribution-starter-gpu:latest",
  "dell": "docker.io/llamastack/distribution-dell:latest" 

in this file (ref https://github.com/llamastack/llama-stack-ops/blob/main/actions/publish-dockers/main.sh#L71); we can have a follow-up PR for these 2

 providers:
   inference:
-  - provider_id: ollama
+  - provider_id: ${env.ENABLE_OLLAMA:=__disabled__}
Collaborator

please keep ollama

Contributor Author

updated

capabilities:
drop:
- "ALL"
- "ALL"
Collaborator

Why? Please try to keep your changes focused on the PR. It would be great to check, before submitting the PR, that your AI assistant doesn't touch things outside the scope of the PR (even when it makes such changes, please open another PR for the unrelated stuff).

Contributor Author

@zdtsw Aug 28, 2025

I do not think the change came from the IDE/AI; the main reason for this change was to align with the indentation of other lines in this YAML file.
I can revert it here.

README.md Outdated
3. Verify the server pod is running in the user defined namespace.

-### Using a ConfigMap for run.yaml configuration
+### Using a ConfigMap to override default run.yaml configuration from distribution
Collaborator

This is outside the scope of this PR.

Contributor Author

reverted

model_id: ${env.ENABLE_OLLAMA:=__disabled__}/${env.OLLAMA_EMBEDDING_MODEL:=__disabled__}
provider_id: ${env.ENABLE_OLLAMA:=__disabled__}
provider_model_id: ${env.OLLAMA_EMBEDDING_MODEL:=__disabled__}
model_type: embedding
Collaborator

Why do you need this extra entry ?

Collaborator

Also, I think the LLS moved away from the ENABLE_OLLAMA "trick", and just uses OLLAMA_URL. See the run.yaml of the starter distribution.
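
For context, the starter distribution wires Ollama up via OLLAMA_URL alone; a sketch of the relevant run.yaml fragment (reconstructed from memory of the upstream template, so check the starter run.yaml for the authoritative version):

```yaml
providers:
  inference:
  - provider_id: ollama
    provider_type: remote::ollama
    config:
      # provider is driven purely by OLLAMA_URL; no ENABLE_OLLAMA toggle
      url: ${env.OLLAMA_URL:=http://localhost:11434}
```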

Contributor Author

I think when this PR was opened it was based on 0.2.15, so changes after that release might not be well captured here.
I've removed the extra changes, so only OLLAMA_INFERENCE_MODEL and OLLAMA_URL are set.
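
So the sample's env block ends up roughly as below (a sketch of the intent described above; the service URL is the one already used in this PR's snippets):

```yaml
env:
- name: OLLAMA_INFERENCE_MODEL
  value: "llama3.2:1b"
- name: OLLAMA_URL
  value: "http://ollama-server-service.ollama-dist.svc.cluster.local:11434"
```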

README.md Outdated
- name: OLLAMA_URL
value: "http://ollama-server-service.ollama-dist.svc.cluster.local:11434"
- name: ENABLE_OLLAMA
value: ollama
Contributor

I find this value for the name a bit odd... 😄 but I guess it is outside of scope here


value: 'llama3.2:3b'
- name: OLLAMA_URL
value: 'http://ollama-server-service.ollama-dist.svc.cluster.local:11434'
- name: ENABLE_OLLAMA
Collaborator

this is not needed.

Contributor Author

removed

 value: all-minilm:l6-v2
 userConfig:
-  configMapName: llama-stack-config
+  configMapName: llama-stack-config # use ConfigMap's data.run.yaml
Collaborator

please remove the comment

Contributor Author

removed

matzew added a commit to matzew/llama-stack-stack that referenced this pull request Aug 28, 2025
…ama-stack-k8s-operator#96. Seems like the ollama distribution is no longer maintained and switching to 'starter', and adjustments for OLLAMA env settings

Signed-off-by: Matthias Wessendorf <[email protected]>
Collaborator

@rhuss left a comment

@zdtsw thanks a lot for the PR! It looks good in general.

Since you have created it with the help of an AI assistant (which is fine!), please help us reviewers by using some guidelines as laid out in rvc-guide.github.io. E.g. this PR touches quite a few things outside of its scope (which is a doc update).

Especially these pieces of advice might be helpful here:

Stay on target – avoid irrelevant code changes: AI models sometimes get “distracted”, modifying parts of the codebase that you didn’t ask to change (like drive-by “improvements” that are not needed). Such changes can confuse reviewers and bloat your PR. Avoid touching code that is not relevant to the task at hand. Each PR should have a laser-focused diff - if you notice unrelated edits (no matter how well-intentioned by the AI), strip them out.

Do a thorough self-review and test your changes: Never treat an AI-generated patch as ready to go without reviewing it yourself. Before requesting a peer review, inspect and verify the code yourself. This means reading through the diff carefully, understanding every change, and checking that it solves the problem without breaking anything else. Try to run the code end-to-end, and run all relevant unit tests or sample scenarios. If the project has automated test suites or linters, run those tools locally to catch obvious issues. You are responsible for the code you submit, whether you or an AI wrote it, and AI tools are no substitute for your own review of code quality, correctness, style, security, and licensing.

Contributor Author

@zdtsw left a comment

did updates to address review comments

Contributor Author

zdtsw commented Aug 28, 2025

> @zdtsw thanks a lot for the PR! It looks good in general. […]

Thanks for the review comments and suggestions!
The original commit was made quite a while ago, so some parts are out of date compared with the latest code in llama-stack.
My initial plan was to only update README.md, but as usual it became a snowball PR.
I will keep this in mind in the future.
I have kept the changes in the samples, but if they should be moved to a dedicated PR, do let me know and I can split them out.

distribution:
name: starter
containerSpec:
port: 8321
Contributor

@matzew Aug 28, 2025

you removed the port in the README; perhaps it is not needed here either?

Contributor Author

yep, this is a default one, not needed.
let me remove it as well
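
With the default port dropped, the sample would shrink to roughly the following (a sketch only; the nesting follows the flattened snippet quoted above and the env values are the ones discussed in this thread):

```yaml
distribution:
  name: starter
containerSpec:
  # port: 8321 removed - it is the default
  env:
  - name: OLLAMA_INFERENCE_MODEL
    value: "llama3.2:1b"
  - name: OLLAMA_URL
    value: "http://ollama-server-service.ollama-dist.svc.cluster.local:11434"
```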

- model_id: "llama3.2:1b"
provider_id: ollama
model_type: llm
provider_model_id: llama3.2:1b
Contributor

Is this needed, since a model_id and a provider_id are already given above?

Contributor Author

not necessary, I can remove it
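
The trimmed entry would then look like this (a sketch based on the snippet quoted above, with the redundant provider_model_id dropped):

```yaml
models:
- model_id: "llama3.2:1b"
  provider_id: ollama
  model_type: llm
```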

README.md Outdated
containerSpec:
port: 8321
env:
- name: INFERENCE_MODEL
Collaborator

Shouldn't this also be OLLAMA_INFERENCE_MODEL, like in the sample change below?

Contributor Author

you are right!
let me fix this

@zdtsw force-pushed the chore_2 branch 3 times, most recently from 1950dad to cbc3dbc on September 2, 2025 08:33
@VaishnaviHire
Collaborator

@nathan-weinberg @rhuss Is it okay to merge this PR ?

@VaishnaviHire
Collaborator

@mergify rebase

- update example and create one without using userconfigmap
- set new env to enable ollama
- use the same llama model as in llama-stack
- remove deprecated distro images from distribution.json

Signed-off-by: Wen Zhou <[email protected]>
- revert back to use llama3.2:1b
- remove unnecessary/unrelated comments/changes
- set INFERENCE_MODEL to OLLAMA_INFERENCE_MODEL
- remove ENABLE_OLLAMA
- set images to use "latest" tag instead of 0.2.15

Signed-off-by: Wen Zhou <[email protected]>
- remove default port 8321 in sample

Signed-off-by: Wen Zhou <[email protected]>

mergify bot commented Sep 4, 2025

rebase

✅ Branch has been successfully rebased

Collaborator

After #156 is merged, this file will have the same properties as example-with-custom-config.yaml, and example-with-configmap will be removed. If this one goes through first I'll update the name:
config/samples/example-withoutconfigmpa.yaml ->
config/samples/example-withoutconfigmap.yaml

Contributor Author

I saw #156 is closed; is any action still needed for this PR?

Collaborator

If #156 merges before this PR, this file can be deleted.

@rhuss
Collaborator

rhuss commented Oct 13, 2025

Where do we stand with this PR? In general I think it's a great addition and we should move forward with it.
