feat: Add responses and safety impl extra_body #3781

Open

slekkala1 wants to merge 14 commits into main from new-responses-and-safety

Contributor

slekkala1 commented Oct 10, 2025 (edited)

What does this PR do?

I closed the previous PR because of merge conflicts with several other PRs.
This PR addresses all comments from #3768 (sorry for carrying them over to this one).

Test Plan

Added unit tests and integration tests.

meta-cla bot added the CLA Signed label

slekkala1 marked this pull request as ready for review

October 10, 2025 22:25

slekkala1 requested review from ashwinb, bbrowning, ehhuang, franciscojavierarceo, hardikjshah, leseb, mattf, raghotham, reluctantfuturist, terrytangyuan and yanxi0830 as code owners

October 10, 2025 22:25

ashwinb reviewed

llama_stack/providers/inline/agents/meta_reference/responses/openai_responses.py Outdated

    
    # Shields parameter received via extra_body - not yet implemented
    if shields is not None:
        raise NotImplementedError("Shields parameter is not yet implemented in the meta-reference provider")
    shield_ids = extract_shield_ids(shields) if shields else []

Contributor

ashwinb Oct 13, 2025

Given that we would like to reuse the moderations API, could we reconsider the name of this parameter? I think people use "guardrails" much more (even in OpenAI's agents-sdk). I wonder if we should use that.

slekkala1 force-pushed the new-responses-and-safety branch 2 times, most recently from 76f0478 to 76b991c

October 13, 2025 19:13

ehhuang reviewed

llama_stack/apis/agents/agents.py

    
    :param type: The type/identifier of the guardrail.
    """
    type: str

Contributor

ehhuang Oct 14, 2025

just for my learning: what types are available?

Contributor Author

slekkala1 Oct 14, 2025

Not sure about this part, @ashwinb. I usually only know the identifier for a shield, so I am only supporting that for now. Maybe this is to allow more fields in the future.

ehhuang reviewed

llama_stack/providers/inline/agents/meta_reference/agents.py Outdated

    
    include: list[str] | None = None,
    max_infer_iters: int | None = 10,
    shields: list | None = None,
    guardrails: list | None = None,

Contributor

ehhuang Oct 14, 2025

could you type the list more exactly?

ehhuang reviewed

llama_stack/providers/inline/agents/meta_reference/responses/utils.py Resolved

ehhuang reviewed

llama_stack/providers/inline/agents/meta_reference/responses/utils.py

    
    if isinstance(guardrail, str):
        guardrail_ids.append(guardrail)
    elif isinstance(guardrail, ResponseGuardrailSpec):
        guardrail_ids.append(guardrail.type)

Contributor

ehhuang Oct 14, 2025

this seems confusing: type being used as id. Is there a better way to name this?

ehhuang reviewed

llama_stack/providers/inline/agents/meta_reference/responses/utils.py Outdated, resolved

ehhuang reviewed

llama_stack/providers/inline/agents/meta_reference/responses/streaming.py Outdated

    
    violation_message = await run_multiple_guardrails(self.safety_api, text, self.guardrail_ids)
    if violation_message:
        logger.info(f"{context.capitalize()} guardrail violation: {violation_message}")

Contributor

ehhuang Oct 14, 2025

nit: just add this log to run_multiple_guardrails and we don't need this extra wrapper _apply_guardrails function

Contributor Author

slekkala1 Oct 14, 2025

indeed

Contributor Author

slekkala1 Oct 15, 2025

Just when I think I have fixed all the redundancies introduced by Claude, something still slips through...

ehhuang reviewed

llama_stack/providers/inline/agents/meta_reference/responses/streaming.py

    
    # Input safety validation - check messages before processing
    if self.guardrail_ids:
        combined_text = interleaved_content_as_str([msg.content for msg in self.ctx.messages])

Contributor

ehhuang Oct 14, 2025

should we document somewhere that guardrails only apply to text input?

Contributor Author

slekkala1 Oct 14, 2025 (edited)

Yes, the shield and moderation APIs don't support images. This is known tech debt; I filed an issue for it earlier.

ehhuang reviewed

llama_stack/providers/inline/agents/meta_reference/responses/streaming.py Outdated, resolved

ehhuang reviewed

llama_stack/providers/inline/agents/meta_reference/responses/streaming.py Outdated

    
                        ) + tool_call.function.arguments
    # Output Safety Validation for a chunk
    if chat_response_content:

Contributor

ehhuang Oct 14, 2025

we should just check for self.guardrail_ids first as it makes it clear what this block is for and prevents unnecessary work

ehhuang reviewed

llama_stack/providers/inline/agents/meta_reference/responses/streaming.py

    
    accumulated_text = "".join(chat_response_content)
    violation_message = await self._apply_guardrails(accumulated_text, "output")
    if violation_message:
        yield await self._create_refusal_response(violation_message)

Contributor

ehhuang Oct 14, 2025

Are the output chunks already yielded by this point?

Contributor Author

slekkala1 Oct 14, 2025

ResponseTextDeltaEvent events are already streamed, but the output chunk is not yet streamed at this point. I initially ran this check on every delta in the loop above, but that is apparently too expensive, so I moved it here to run once per chunk.

Contributor Author

slekkala1 Oct 15, 2025

Updated as discussed to not emit delta events when guardrails are configured

slekkala1 force-pushed the new-responses-and-safety branch 2 times, most recently from e72c6e4 to d462e96

October 14, 2025 21:53

slekkala1 added 2 commits

October 15, 2025 06:28


feat: Add responses and safety impl extra_body (181046f)
clean and fix tests (907db22)

slekkala1 added 12 commits

October 15, 2025 06:28


use guardrails and run_moderation api (f1e64d6)
fix tests and remove unwanted changes (9cb65b4)
add recordings (db673c1)
add recording again (e1f1ac6)
fix tests (fb4abb4)
clean (edc273c)
improve user message (bf532db)
fix test
address comments (592b449)
fix tests (f65b770)
add explicit types (a522bfc)
skip emitting deltas (a9ebdfe)

slekkala1 force-pushed the new-responses-and-safety branch from d462e96 to a9ebdfe

October 15, 2025 13:28

Reviewers

ashwinb left review comments
ehhuang left review comments
yanxi0830: awaiting requested review (code owner)
hardikjshah: awaiting requested review (code owner)
raghotham: awaiting requested review (code owner)
terrytangyuan: awaiting requested review (code owner)
leseb: awaiting requested review (code owner)
bbrowning: awaiting requested review (code owner)
reluctantfuturist: awaiting requested review (code owner)
mattf: awaiting requested review (code owner)
franciscojavierarceo: awaiting requested review (code owner)

At least 1 approving review is required to merge this pull request.

Labels