
Conversation

@nathan-weinberg
Collaborator

@nathan-weinberg nathan-weinberg commented Oct 16, 2025

What does this PR do?

this commit changes the server start command to be dynamically generated based on Llama Stack version

it allows users to pass a custom run.yaml if they so choose, while keeping the official run.yaml we ship with the distro image as the default

Closes #83

Summary by CodeRabbit

  • Refactor
    • Container startup is now configurable at build time; build logic selects an appropriate startup command and emits clearer validation/info when generating container configs.
    • The default runtime argument (path to the run YAML) is now provided separately so the startup command is more flexible.
  • Documentation
    • Added "Running with a custom run YAML" guidance in two places with examples and a dependency notice.

@coderabbitai

coderabbitai bot commented Oct 16, 2025

Walkthrough

Decouples the container ENTRYPOINT from the run.yaml path: templates now accept a configurable entrypoint and a CMD for /opt/app-root/run.yaml. distribution/build.py adds get_entrypoint() (version-aware), updates containerfile generation to inject the entrypoint and validate/clean output. README adds "Running with a custom run YAML" guidance.
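As a rough sketch of the generation side described above (illustrative only; the actual distribution/build.py may differ, and the template placeholder names and header text here are assumptions):

import os
import sys


def generate_containerfile(dependencies: str, install_command: str, entrypoint: str) -> str:
    """Render Containerfile.in: inject the entrypoint, prepend a header, drop blank lines."""
    template_path = "distribution/Containerfile.in"
    if not os.path.exists(template_path):
        sys.exit(f"Error: template {template_path} not found")
    with open(template_path) as f:
        template = f.read()
    rendered = template.format(
        dependencies=dependencies,
        install_command=install_command,
        entrypoint=entrypoint,  # e.g. 'ENTRYPOINT ["llama", "stack", "run"]'
    )
    header = "# Generated by distribution/build.py -- do not edit by hand"
    lines = [header] + [line for line in rendered.splitlines() if line.strip()]
    return "\n".join(lines) + "\n"

In this shape, main() computes the entrypoint once via get_entrypoint(LLAMA_STACK_VERSION) and passes it in, which is what the sequence diagram below depicts.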

Changes

Cohort / File(s) Summary
Containerfile
distribution/Containerfile
ENTRYPOINT changed from ["llama","stack","run","/opt/app-root/run.yaml"] to ["llama","stack","run"]; new CMD ["/opt/app-root/run.yaml"] added.
Containerfile template
distribution/Containerfile.in
Replaced fixed ENTRYPOINT with templated {entrypoint} and added CMD ["/opt/app-root/run.yaml"].
Build & entrypoint logic
distribution/build.py
Added get_entrypoint(llama_stack_version) using packaging.version; updated generate_containerfile(..., entrypoint) signature and logic to validate template existence, inject entrypoint, prepend a generated-header, and strip blank lines; main now computes and passes entrypoint.
Docs
README.md
Added "Running with a custom run YAML" section (inserted twice) with a podman example and a dependency note about matching image dependencies to custom YAML.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant CLI as build.py:main
  participant GE as get_entrypoint()
  participant PV as packaging.version
  participant GC as generate_containerfile()
  participant TF as Containerfile.in
  participant Out as distribution/Containerfile

  CLI->>GE: get_entrypoint(llama_stack_version)
  GE->>PV: parse & compare version (or detect source install)
  PV-->>GE: comparison result
  GE-->>CLI: return templated `entrypoint` string
  CLI->>GC: generate_containerfile(deps, install_flag, entrypoint)
  GC->>TF: load template & substitute `{entrypoint}`
  GC->>GC: prepend header, strip blank lines
  GC->>Out: write formatted Containerfile
  Note right of Out: Output now uses configurable ENTRYPOINT + CMD "/opt/app-root/run.yaml"

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review get_entrypoint() logic and version comparison edge-cases in distribution/build.py.
  • Verify template substitution and blank-line stripping behavior in generate_containerfile.
  • Confirm resulting distribution/Containerfile preserves intended quoting/JSON array ENTRYPOINT format.

Poem

🐰
I nudged the entry, set the path free,
versions checked kindly, then wiggled with glee.
Template trimmed neat, CMD tucked in line,
the container hops off, all tidy and fine—
carrots packed, deployment feels divine 🥕

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "feat: allow for dynamic server start command and custom run.yaml" directly and accurately reflects the primary changes in the changeset. The modifications to Containerfile, Containerfile.in, build.py, and README.md all center on implementing these two core features: replacing the hardcoded entrypoint with a version-aware dynamic mechanism and enabling users to supply custom run.yaml files. The title is concise, specific, and clearly communicates the main changes without unnecessary detail or vagueness.
Linked Issues Check ✅ Passed The code changes comprehensively address all four primary objectives from issue #83. The PR implements dynamic entrypoint selection via the new get_entrypoint() function that adapts the startup command based on Llama Stack version [#83], separates ENTRYPOINT and CMD to allow users to override run.yaml by mounting custom files and providing them as CMD arguments [#83], and includes README documentation demonstrating how to use custom run.yaml [#83]. The distribution image now handles custom run.yaml internally rather than requiring operator-level handling [#83]. Version-aware entrypoint logic ensures maintainers need not manually update the command for CLI changes [#83].
Out of Scope Changes Check ✅ Passed All changes in the pull request are directly scoped to implementing issue #83 objectives. The modifications to Containerfile and Containerfile.in implement the dynamic entrypoint and custom run.yaml support; build.py changes provide the version-aware entrypoint selection logic needed for that feature; and README.md additions document how users can leverage the new custom run.yaml capability. No unrelated functionality, refactoring, or features outside the scope of supporting dynamic server startup and custom run.yaml handling are present in the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
distribution/build.py (1)

55-73: Consider making the version threshold configurable.

The hardcoded threshold version "0.2.23" (line 64) matches the current CURRENT_LLAMA_STACK_VERSION (line 17). When the version is bumped in the future, maintainers must remember to update this threshold if the entrypoint behavior needs to change, creating a maintenance burden.

Consider extracting the threshold as a module-level constant or deriving it from CURRENT_LLAMA_STACK_VERSION:

# Near the top of the file, after CURRENT_LLAMA_STACK_VERSION
ENTRYPOINT_CHANGE_VERSION = "0.2.23"  # Version where entrypoint changed from python -m to llama stack run

def get_entrypoint(llama_stack_version):
    """Determine the appropriate ENTRYPOINT based on llama-stack version."""
    if is_install_from_source(llama_stack_version):
        return 'ENTRYPOINT ["llama", "stack", "run"]'

    try:
        current_version = version.parse(llama_stack_version)
        threshold_version = version.parse(ENTRYPOINT_CHANGE_VERSION)
        # ... rest of function

Additionally, consider narrowing the exception handling on line 70 from except Exception to except version.InvalidVersion to catch only version parsing errors, allowing other unexpected errors to surface properly.
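Taken together, the named constant and the narrower handler would leave the function looking roughly like this (a sketch only; the source-install branch of the original function is omitted for brevity, and ENTRYPOINT_CHANGE_VERSION is the assumed name from the suggestion above):

import sys
from packaging import version

ENTRYPOINT_CHANGE_VERSION = "0.2.23"  # version where the entrypoint command changed


def get_entrypoint(llama_stack_version):
    """Sketch of get_entrypoint() with a named threshold and narrower exception handling."""
    try:
        if version.parse(llama_stack_version) < version.parse(ENTRYPOINT_CHANGE_VERSION):
            return 'ENTRYPOINT ["python", "-m", "llama_stack.core.server.server"]'
        return 'ENTRYPOINT ["llama", "stack", "run"]'
    except version.InvalidVersion as e:
        # Only malformed version strings are treated as a build error;
        # any other exception propagates instead of being swallowed.
        print(f"Error: Could not parse version {llama_stack_version}: {e}")
        sys.exit(1)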

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c752da and 07f27cb.

📒 Files selected for processing (3)
  • distribution/Containerfile (1 hunks)
  • distribution/Containerfile.in (1 hunks)
  • distribution/build.py (5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-15T14:25:54.837Z
Learnt from: nathan-weinberg
PR: opendatahub-io/llama-stack-distribution#33
File: distribution/Containerfile:17-21
Timestamp: 2025-09-15T14:25:54.837Z
Learning: In the opendatahub-io/llama-stack-distribution repository, the distribution/Containerfile is auto-generated by distribution/build.py based on configuration in build.yaml. When providers are added to build.yaml, the build script automatically regenerates the Containerfile with the required dependencies. Changes to the Containerfile should not be flagged as manual edits if they correspond to legitimate changes in the build configuration.

Applied to files:

  • distribution/Containerfile
  • distribution/Containerfile.in
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (3)
distribution/build.py (2)

15-15: LGTM!

The packaging.version import is appropriate for semantic version comparison and ensures correct handling of version strings.


195-216: LGTM!

The integration of the entrypoint parameter into generate_containerfile() and the main flow is correct and complete. The function signature update, template substitution, and call site changes properly implement the dynamic entrypoint feature.

Also applies to: 246-250

distribution/Containerfile (1)

62-63: Critical: Environment variable won't expand in CMD exec form.

Docker does not expand environment variables in JSON array (exec) form. The literal string "${APP_ROOT}/run.yaml" will be passed to the entrypoint instead of the expanded path /opt/app-root/run.yaml, causing the default startup to fail.

Solution: Use the literal path in the JSON array:

 ENTRYPOINT ["llama", "stack", "run"]
-CMD ["${APP_ROOT}/run.yaml"]
+CMD ["/opt/app-root/run.yaml"]

Note: Users can still override the config path at runtime with docker run <image> /custom/path/to/run.yaml, which satisfies the PR objective to allow custom run.yaml files.

⛔ Skipped due to learnings
Learnt from: nathan-weinberg
PR: opendatahub-io/llama-stack-distribution#33
File: distribution/Containerfile:17-21
Timestamp: 2025-09-15T14:25:54.837Z
Learning: In the opendatahub-io/llama-stack-distribution repository, the distribution/Containerfile is auto-generated by distribution/build.py based on configuration in build.yaml. When providers are added to build.yaml, the build script automatically regenerates the Containerfile with the required dependencies. Changes to the Containerfile should not be flagged as manual edits if they correspond to legitimate changes in the build configuration.

@nathan-weinberg nathan-weinberg added the do-not-merge Apply to PRs that should not be merged (yet) label Oct 16, 2025
@nathan-weinberg nathan-weinberg force-pushed the entrypoint branch 2 times, most recently from 2d74b58 to 0382ced on October 16, 2025 at 18:30
@nathan-weinberg nathan-weinberg removed the do-not-merge Apply to PRs that should not be merged (yet) label Oct 16, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0382ced and a90eb0c.

📒 Files selected for processing (1)
  • README.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
README.md

[style] ~69-~69: Using many exclamation marks might seem excessive (in this case: 3 exclamation marks for a text that’s 2062 characters long)
Context: ...k: \ <path_in_container> ``` > [!IMPORTANT] > The distribution image ship...

(EN_EXCESSIVE_EXCLAMATION)


[grammar] ~70-~70: Ensure spelling is correct
Context: ...es already pre-installed. There is no guarentee that your custom run YAML will nessesar...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~70-~70: Use a hyphen to join words.
Context: ...There is no guarentee that your custom run YAML will nessesarily work with the ...

(QB_NEW_EN_HYPHEN)


[grammar] ~70-~70: Ensure spelling is correct
Context: ...uarentee that your custom run YAML will nessesarily work with the included dependencies.

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~70-~70: There might be a mistake here.
Context: ...ily work with the included dependencies.

(QB_NEW_EN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
README.md (1)

59-59: Consider more concise wording for clarity.

The phrase "To do so, run the image in the following way. The "path" mentioned should be the path to your custom run YAML file." is slightly verbose. Consider condensing to:

"To use a custom run YAML, run the image with the path to your file as the final argument, mounted as a volume:"

This removes redundancy and gets straight to the instruction.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a90eb0c and 0d59530.

📒 Files selected for processing (1)
  • README.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
README.md

[style] ~69-~69: Using many exclamation marks might seem excessive (in this case: 3 exclamation marks for a text that’s 2062 characters long)
Context: ...k: \ <path_in_container> ``` > [!IMPORTANT] > The distribution image ship...

(EN_EXCESSIVE_EXCLAMATION)


[grammar] ~70-~70: Use a hyphen to join words.
Context: ...There is no guarantee that your custom run YAML will necessarily work with the ...

(QB_NEW_EN_HYPHEN)


[grammar] ~70-~70: There might be a mistake here.
Context: ...ily work with the included dependencies.

(QB_NEW_EN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (1)
README.md (1)

57-70: Documentation addition is clear and well-aligned with PR objectives.

The new section provides users with clear instructions on how to supply a custom run.yaml file, complete with a practical example and an important caveat about dependency compatibility. The spelling corrections from the previous review ("guarantee" and "necessarily") are in place.

However, verify that the example command syntax (<path_in_container> as a positional argument after the image name) reflects the actual container behavior with the new entrypoint mechanism. This assumes the container ENTRYPOINT is a script or binary that accepts the path as a CMD argument—which aligns with the PR objective to decouple the entrypoint from the run.yaml path.

Confirm that:

  1. The new entrypoint script or binary correctly interprets the <path_in_container> argument
  2. The default behavior (when no custom path is provided) still uses /opt/app-root/run.yaml
  3. The container can run without arguments and still work with the default run.yaml

You may find this easier to verify once the container images are built or if you can review the entrypoint implementation (likely in distribution/build.py or a shell script entrypoint).

Collaborator

@cdoern cdoern left a comment


If the dependencies we package remain locked: that pretty much only allows for the same set of providers but with different config options or maybe different server level config options, right?

Not sure if this is possible or worth it, but if the deps are locked, it might make sense to validate that people are not changing the providers or introducing a provider with different dependencies. Otherwise this feature will commonly break

@nathan-weinberg
Collaborator Author

If the dependencies we package remain locked: that pretty much only allows for the same set of providers but with different config options or maybe different server level config options, right?

Correct

Not sure if this is possible or worth it, but if the deps are locked, it might make sense to validate that people are not changing the providers or introducing a provider with different dependencies. Otherwise this feature will commonly break

See the README changes

@cdoern
Collaborator

cdoern commented Oct 16, 2025

@nathan-weinberg

See the README changes

are you pointing me to the readme where it says There is *no* guarantee that your custom run YAML will necessarily work with the included dependencies.? I think this is kind of a non-answer to my question since I am asking if there is anything we can do to avoid these breakages (which will be frequent).

I am suggesting a simple diff against the provided run.yaml to check if the providers changed and if the new providers require net-new dependencies. This to me, seems like introducing a feature which will more often than not cause failures so some sort of guardrails is advisable IMO.

Let me know if this makes sense and is possible! Thanks
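For illustration, the guardrail being suggested could look roughly like the sketch below. It is hypothetical and not part of this PR; it assumes the run.yaml schema keeps a top-level providers: mapping whose entries carry a provider_type field, and the helper names are made up:

import sys

import yaml  # PyYAML


def provider_types(run_yaml_path):
    """Collect the provider_type values declared in a run.yaml."""
    with open(run_yaml_path) as f:
        config = yaml.safe_load(f) or {}
    types = set()
    for providers in (config.get("providers") or {}).values():
        for entry in providers or []:
            if isinstance(entry, dict) and "provider_type" in entry:
                types.add(entry["provider_type"])
    return types


def check_custom_run_yaml(shipped_path, custom_path):
    """Flag providers in the custom run.yaml that the image was not built with."""
    extra = provider_types(custom_path) - provider_types(shipped_path)
    if extra:
        print(f"Custom run.yaml adds providers not in the shipped config: {sorted(extra)}")
        print("Their dependencies may be missing from the distribution image.")
        sys.exit(1)

Whether such a check should fail hard or just warn is a separate design choice; the point is only to surface net-new providers before the server starts.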

@nathan-weinberg
Collaborator Author

@nathan-weinberg

See the README changes

are you pointing me to the readme where it says There is *no* guarantee that your custom run YAML will necessarily work with the included dependencies.? I think this is kind of a non-answer to my question since I am asking if there is anything we can do to avoid these breakages (which will be frequent).

Why do we want to avoid them? This is just to let people try their own run YAMLs; we aren't invested in making sure they work. That's why I pointed you here.

I am suggesting a simple diff against the provided run.yaml to check if the providers changed and if the new providers require net-new dependencies. This to me, seems like introducing a feature which will more often than not cause failures so some sort of guardrails is advisable IMO.

Let me know if this makes sense and is possible! Thanks

It's really more of a utility for development - any user trying to actually use this as a production server (either standalone or via the operator) should be using the official run YAML

@cdoern
Collaborator

cdoern commented Oct 16, 2025

It's really more of a utility for development

Fair, all I am saying is that more often than not this'll break without manual manipulation of the dependencies in the container

@nathan-weinberg
Collaborator Author

It's really more of a utility for development

Fair, all I am saying is that more often than not this'll break without manual manipulation of the dependencies in the container

True, but note we are already doing these custom run YAMLs from the operator - see this comment llamastack/llama-stack-k8s-operator#171 (comment) that led to the issue this PR is resolving. This simply allows us to move the functionality from the operator to the distro image, which is preferred.

We can iterate with some additional checking like you've mentioned, but this is more focused on the movement of that functionality and removing some hardcoded things that really shouldn't be hardcoded.

@kami619
Collaborator

kami619 commented Oct 16, 2025

@nathan-weinberg do we know what common customizations users are expecting to perform by following this path? For example, I could see the example you linked below; I was hoping to see if we have any insight into any broader customizations we can test proactively.

This handles the case where we use the ConfigMap to store the run.yaml; when this happens we need to override the entrypoint to give the path of the run.yaml file.

    # Parse current LLS version and compare with threshold LLS version
    try:
        current_version = version.parse(llama_stack_version)
        threshold_version = version.parse("0.2.23")
Collaborator


Wouldn't it be better to avoid that hardcoded conditional and instead use a Git branch for 0.2.23 releases (so that different build.py scripts live on different branches, each specific to the version it creates)? The idea is that each distribution image knows how to start itself, but a distribution image is connected to a particular LLS version.

We should consider this when we are doing real branching and backporting. This is to avoid piling up legacy stuff that might not be needed anymore in the future.

Collaborator


We can go with this, it's already better to have that here instead of in the operator.

Collaborator


I'm with your first point: this is a script for a specific LLS version, so it should only know about the ENTRYPOINT for that version.

Even if we do backporting, we're unlikely to backport a change that updates the entrypoint.

Collaborator Author


Are y'all suggesting a different git branch within this repo for each LLS version we ship at a given time?

Collaborator


I suggest branching off for each release and doing z-stream releases on that branch. Changes for the next version then just go directly into the main branch, overwriting the previous behaviour (which lives on in the branch).

Collaborator Author


We are doing that currently for RHOAI releases - trying to do it for Llama Stack releases as well would be a significant maintenance effort

I could just change the hardcoding to what is currently in Llama Stack - however, I feel it makes the container less versatile, in case anyone ever wanted to build the current distro with an older Llama Stack version.

@rhuss
Collaborator

rhuss commented Oct 17, 2025

@nathan-weinberg do we know what common customizations users are expecting to perform by following this path? For example, I could see the example you linked below; I was hoping to see if we have any insight into any broader customizations we can test proactively.

This handles the case where we use the ConfigMap to store the run.yaml; when this happens we need to override the entrypoint to give the path of the run.yaml file.

The customization is just that you can provide an alternative run.yaml to the default included in the distribution image. For this to work you need (a) to mount a new run.yaml into the container and then (b) to provide a startup CMD that points to the run.yaml in the mounted filesystem. The operator takes care of that, but you can do the same with vanilla podman, too:

podman run --rm -it \
  --mount type=bind,src="$(pwd)/config",target=/mnt,ro \
  lls-distribution:latest /mnt/run.yaml

assuming you have a ./config/run.yaml prepared.

@derekhiggins
Collaborator

Shouldn't we be doing this upstream first and establishing a consensus there? If the end goal is to remove the code from the operator for how it runs custom images, then it will need to work with upstream images also

@nathan-weinberg
Collaborator Author

Shouldn't we be doing this upstream first and establishing a consensus there? If the end goal is to remove the code from the operator for how it runs custom images, then it will need to work with upstream images also

I don't quite follow - is upstream creating distribution containers similar to this? We aren't consuming any base container image layers apart from the Red Hat Python3.12 base image, as far as I know


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
distribution/build.py (1)

55-73: Consider using a constant for the version threshold.

The hardcoded "0.2.23" at line 64 creates coupling with CURRENT_LLAMA_STACK_VERSION and will require manual updates each time the entrypoint command changes. As discussed in past reviews, this approach accumulates legacy conditionals over time.

Apply this diff to introduce a named constant:

 CURRENT_LLAMA_STACK_VERSION = "0.2.23"
 LLAMA_STACK_VERSION = os.getenv("LLAMA_STACK_VERSION", CURRENT_LLAMA_STACK_VERSION)
+# Version at which entrypoint changed from python module to llama stack run
+ENTRYPOINT_CHANGE_VERSION = "0.2.23"
 BASE_REQUIREMENTS = [
     f"llama-stack=={LLAMA_STACK_VERSION}",
 ]

Then update line 64:

-        threshold_version = version.parse("0.2.23")
+        threshold_version = version.parse(ENTRYPOINT_CHANGE_VERSION)

This makes the relationship explicit and easier to maintain when future entrypoint changes occur. Based on learnings from past review comments.

README.md (1)

57-70: LGTM: Documentation clearly explains custom run YAML feature with appropriate warning.

The new section provides a clear example for mounting and using a custom run.yaml file. The IMPORTANT note appropriately warns users about potential dependency incompatibilities, which aligns with the concerns raised in PR discussions about locked dependencies.

Optional: Static analysis suggests hyphenating compound modifiers ("custom run-YAML"), though "custom run YAML" is also acceptable in technical documentation. If you prefer strict style conformance:

-### Running with a custom run YAML
+### Running with a custom run-YAML
-> The distribution image ships with various dependencies already pre-installed. There is *no* guarantee that your custom run YAML will necessarily work with the included dependencies.
+> The distribution image ships with various dependencies already pre-installed. There is *no* guarantee that your custom run-YAML will necessarily work with the included dependencies.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0d59530 and d973d39.

📒 Files selected for processing (4)
  • README.md (1 hunks)
  • distribution/Containerfile (1 hunks)
  • distribution/Containerfile.in (1 hunks)
  • distribution/build.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • distribution/Containerfile
🧰 Additional context used
🪛 LanguageTool
README.md

[style] ~69-~69: Using many exclamation marks might seem excessive (in this case: 3 exclamation marks for a text that’s 2062 characters long)
Context: ...k: \ <path_in_container> ``` > [!IMPORTANT] > The distribution image ship...

(EN_EXCESSIVE_EXCLAMATION)


[grammar] ~70-~70: Use a hyphen to join words.
Context: ...There is no guarantee that your custom run YAML will necessarily work with the ...

(QB_NEW_EN_HYPHEN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (4)
distribution/build.py (3)

15-15: LGTM: Appropriate import for version comparison.

The packaging.version import is the standard approach for semantic version parsing and comparison in Python.


195-216: LGTM: Entrypoint parameter correctly threaded through template.

The updated signature and template formatting properly propagate the dynamic entrypoint to distribution/Containerfile.in. The template validation and blank-line cleanup logic remain sound.


246-250: LGTM: Main flow correctly integrates entrypoint retrieval.

The entrypoint is obtained at the appropriate point in the build flow and passed to generate_containerfile() as expected.

distribution/Containerfile.in (1)

11-12: LGTM: Template variable and CMD correctly enable custom run.yaml.

The {entrypoint} template variable will be substituted by build.py with the version-appropriate ENTRYPOINT, and the separate CMD with a literal path allows users to override the run.yaml at runtime (e.g., podman run <image> /custom/path/to/run.yaml). The use of a literal path in exec form avoids environment variable expansion issues.

@nathan-weinberg nathan-weinberg force-pushed the entrypoint branch 2 times, most recently from acfa515 to b350292 on October 20, 2025 at 17:11

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
README.md (1)

57-70: Documentation clearly explains the custom run YAML feature.

The section appropriately documents the new capability and includes a necessary warning about dependency compatibility, addressing the concerns raised in PR discussions.

Optional: Consider hyphenating the compound modifier for clarity:

-### Running with a custom run YAML
+### Running with a custom run-YAML

or

-The distribution image allows you to run a custom run YAML file within it.
+The distribution image allows you to run a custom run-YAML file within it.

This is a minor style improvement and is entirely optional.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between acfa515 and b350292.

📒 Files selected for processing (4)
  • README.md (1 hunks)
  • distribution/Containerfile (1 hunks)
  • distribution/Containerfile.in (1 hunks)
  • distribution/build.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • distribution/Containerfile
🧰 Additional context used
🪛 LanguageTool
README.md

[style] ~69-~69: Using many exclamation marks might seem excessive (in this case: 3 exclamation marks for a text that’s 2062 characters long)
Context: ...k: \ <path_in_container> ``` > [!IMPORTANT] > The distribution image ship...

(EN_EXCESSIVE_EXCLAMATION)


[grammar] ~70-~70: Use a hyphen to join words.
Context: ...There is no guarantee that your custom run YAML will necessarily work with the ...

(QB_NEW_EN_HYPHEN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (5)
distribution/Containerfile.in (1)

11-12: Template correctly separates ENTRYPOINT from CMD.

The design properly allows the entrypoint to be dynamically determined at build time while enabling users to override the run.yaml path at runtime. The literal path in CMD avoids shell expansion issues.

distribution/build.py (4)

15-15: LGTM!

Appropriate import for semantic version comparison.


195-195: LGTM!

Function signature appropriately extended to accept the dynamic entrypoint parameter.


215-215: LGTM!

Template formatting correctly includes the entrypoint parameter for substitution.


246-250: LGTM!

The main flow correctly integrates the entrypoint logic, retrieving the version-appropriate ENTRYPOINT and passing it to the containerfile generation.

Comment on lines +55 to +73
def get_entrypoint(llama_stack_version):
    """Determine the appropriate ENTRYPOINT based on llama-stack version."""
    # If installing from source (commit SHA), use the new entrypoint
    if is_install_from_source(llama_stack_version):
        return 'ENTRYPOINT ["llama", "stack", "run"]'

    # Parse current LLS version and compare with threshold LLS version
    try:
        current_version = version.parse(llama_stack_version)
        threshold_version = version.parse("0.2.23")

        if current_version < threshold_version:
            return 'ENTRYPOINT ["python", "-m", "llama_stack.core.server.server"]'
        else:
            return 'ENTRYPOINT ["llama", "stack", "run"]'
    except Exception as e:
        print(f"Error: Could not parse version {llama_stack_version}: {e}")
        sys.exit(1)


🛠️ Refactor suggestion | 🟠 Major

Replace hardcoded threshold version with named constant.

The function logic correctly determines the appropriate ENTRYPOINT based on version. However, line 64 hardcodes "0.2.23", duplicating the CURRENT_LLAMA_STACK_VERSION constant defined on line 17. This creates a maintenance burden if the version changes.

Apply this diff to use the existing constant:

 def get_entrypoint(llama_stack_version):
     """Determine the appropriate ENTRYPOINT based on llama-stack version."""
     # If installing from source (commit SHA), use the new entrypoint
     if is_install_from_source(llama_stack_version):
         return 'ENTRYPOINT ["llama", "stack", "run"]'
 
     # Parse current LLS version and compare with threshold LLS version
     try:
         current_version = version.parse(llama_stack_version)
-        threshold_version = version.parse("0.2.23")
+        threshold_version = version.parse(CURRENT_LLAMA_STACK_VERSION)
 
         if current_version < threshold_version:
             return 'ENTRYPOINT ["python", "-m", "llama_stack.core.server.server"]'
         else:
             return 'ENTRYPOINT ["llama", "stack", "run"]'
     except Exception as e:
         print(f"Error: Could not parse version {llama_stack_version}: {e}")
         sys.exit(1)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 def get_entrypoint(llama_stack_version):
     """Determine the appropriate ENTRYPOINT based on llama-stack version."""
     # If installing from source (commit SHA), use the new entrypoint
     if is_install_from_source(llama_stack_version):
         return 'ENTRYPOINT ["llama", "stack", "run"]'
     # Parse current LLS version and compare with threshold LLS version
     try:
         current_version = version.parse(llama_stack_version)
-        threshold_version = version.parse("0.2.23")
+        threshold_version = version.parse(CURRENT_LLAMA_STACK_VERSION)
         if current_version < threshold_version:
             return 'ENTRYPOINT ["python", "-m", "llama_stack.core.server.server"]'
         else:
             return 'ENTRYPOINT ["llama", "stack", "run"]'
     except Exception as e:
         print(f"Error: Could not parse version {llama_stack_version}: {e}")
         sys.exit(1)
🤖 Prompt for AI Agents
In distribution/build.py around lines 55 to 73, the threshold version "0.2.23"
is hardcoded; replace that literal with the existing CURRENT_LLAMA_STACK_VERSION
constant defined on line 17 so the comparison uses the shared constant instead
of duplicating the version string; ensure the constant is in scope (import or
reference it directly) and update the version.parse call to use
CURRENT_LLAMA_STACK_VERSION.

@derekhiggins
Collaborator

Shouldn't we be doing this upstream first and establishing a consensus there? If the end goal is to remove the code from the operator for how it runs custom images, then it will need to work with upstream images also

I don't quite follow - is upstream creating distribution containers similar to this? We aren't consuming any base container image layers apart from the Red Hat Python3.12 base image, as far as I know

Upstream produces these images https://hub.docker.com/u/llamastack
For those the entrypoint is /usr/local/bin/llama-stack-entrypoint.sh (see https://github.com/llamastack/llama-stack/blob/main/containers/Containerfile )

For those a custom run.yaml is set in $RUN_CONFIG_PATH

In theory the lls operator also supports these images (although I haven't tried it recently). It would make sense to have both images use a common API for setting a custom run.yaml.

this commit changes the server start command to be
dynamically generated based on Llama Stack version

it allows users to pass a custom run.yaml if they so
choose while keeping the official run.yaml we ship with
the distro image as a default

Signed-off-by: Nathan Weinberg <[email protected]>
@nathan-weinberg
Collaborator Author

Shouldn't we be doing this upstream first and establishing a consensus there? If the end goal is to remove the code from the operator for how it runs custom images, then it will need to work with upstream images also

I don't quite follow - is upstream creating distribution containers similar to this? We aren't consuming any base container image layers apart from the Red Hat Python3.12 base image, as far as I know

Upstream produces these images https://hub.docker.com/u/llamastack For those the entrypoint is /usr/local/bin/llama-stack-entrypoint.sh (see https://github.com/llamastack/llama-stack/blob/main/containers/Containerfile )

For those a custom run.yaml is set in $RUN_CONFIG_PATH

In theory the lls operator also supports these images (although I haven't tried it recently). It would make sense to have both images use a common API for setting a custom run.yaml.

I agree - we're in a bit of a strange spot in that we build these containers differently than upstream does - I'm wondering if we might be able to merge in the functionality first and try to find alignment with upstream as a followup?


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
distribution/build.py (1)

55-73: Add explanatory comment for the threshold version.

The function logic correctly determines the appropriate ENTRYPOINT based on version. The hardcoded "0.2.23" on line 64 represents the historical llama-stack version when the entrypoint command changed upstream (from python -m to llama stack run). This is semantically different from CURRENT_LLAMA_STACK_VERSION, which represents the version being built. However, this distinction is not obvious without a comment.

Add a comment explaining the threshold:

     # Parse current LLS version and compare with threshold LLS version
     try:
         current_version = version.parse(llama_stack_version)
+        # 0.2.23 is when llama-stack upstream changed the entrypoint command
         threshold_version = version.parse("0.2.23")
 
         if current_version < threshold_version:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b350292 and 6cb349a.

📒 Files selected for processing (4)
  • README.md (1 hunks)
  • distribution/Containerfile (1 hunks)
  • distribution/Containerfile.in (1 hunks)
  • distribution/build.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • distribution/Containerfile.in
🧰 Additional context used
🪛 LanguageTool
README.md

[style] ~69-~69: Using many exclamation marks might seem excessive (in this case: 3 exclamation marks for a text that’s 2062 characters long)
Context: ...k: \ <path_in_container> ``` > [!IMPORTANT] > The distribution image ship...

(EN_EXCESSIVE_EXCLAMATION)


[grammar] ~70-~70: Use a hyphen to join words.
Context: ...There is no guarantee that your custom run YAML will necessarily work with the ...

(QB_NEW_EN_HYPHEN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (4)
README.md (1)

57-70: Clear documentation for the new custom run YAML feature.

The documentation effectively explains how to mount and use a custom run YAML file, and the IMPORTANT note appropriately warns users about potential dependency incompatibilities with the pre-installed packages.

distribution/Containerfile (1)

62-63: LGTM! Correct separation of ENTRYPOINT and CMD.

Splitting the ENTRYPOINT from CMD follows the standard container pattern and allows users to override just the run YAML path (via CMD) without having to replace the entire ENTRYPOINT. This is the right approach for the custom run YAML feature.

distribution/build.py (2)

195-216: LGTM! Clean integration of entrypoint parameter.

The generate_containerfile signature correctly accepts the new entrypoint parameter and properly substitutes it into the template on line 215. The implementation is straightforward and maintains the existing validation and formatting logic.


246-250: LGTM! Correct wiring of entrypoint generation.

The main function properly computes the entrypoint using get_entrypoint(LLAMA_STACK_VERSION) and passes it to generate_containerfile, maintaining consistency with the existing pattern for dependencies and install commands.

@derekhiggins
Collaborator

Upstream produces these images https://hub.docker.com/u/llamastack For those the entrypoint is /usr/local/bin/llama-stack-entrypoint.sh (see https://github.com/llamastack/llama-stack/blob/main/containers/Containerfile )
For those a custom run.yaml is set in $RUN_CONFIG_PATH
In theory the lls operator also supports these images (although I haven't tried it recently). It would make sense to have both images use a common API for setting a custom run.yaml.

I agree - we're in a bit of a strange spot in that we build these containers differently than upstream does - I'm wondering if we might be able to merge in the functionality first and try to find alignment with upstream as a followup?

We should make this decision before changing anything; this PR will change the API into the container, and we don't want to have to change it again.

For what it's worth, I would prefer to see us do it the same way as upstream (call /usr/local/bin/llama-stack-entrypoint.sh). This will then give us a place to run code in the future before the server is started, e.g. data migrations, sanity checks, CA bundle setup, etc.

@nathan-weinberg
Collaborator Author

Upstream produces these images https://hub.docker.com/u/llamastack For those the entrypoint is /usr/local/bin/llama-stack-entrypoint.sh (see https://github.com/llamastack/llama-stack/blob/main/containers/Containerfile )
For those a custom run.yaml is set in $RUN_CONFIG_PATH
In theory the lls operator also supports these images (although I haven't tried it recently). It would make sense to have both images use a common API for setting a custom run.yaml.

I agree - we're in a bit of a strange spot in that we build these containers differently than upstream does - I'm wondering if we might be able to merge in the functionality first and try to find alignment with upstream as a followup?

We should make this decision before changing anything; this PR will change the API into the container, and we don't want to have to change it again.

For what it's worth, I would prefer to see us do it the same way as upstream (call /usr/local/bin/llama-stack-entrypoint.sh). This will then give us a place to run code in the future before the server is started, e.g. data migrations, sanity checks, CA bundle setup, etc.

Ack, that makes sense - I'll update the implementation to align more closely with what's upstream

@nathan-weinberg nathan-weinberg added the do-not-merge Apply to PRs that should not be merged (yet) label Oct 29, 2025
