-
Notifications
You must be signed in to change notification settings - Fork 1.2k
feat: Add responses and safety impl extra_body #3781
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
# Shields parameter received via extra_body - not yet implemented | ||
if shields is not None: | ||
raise NotImplementedError("Shields parameter is not yet implemented in the meta-reference provider") | ||
shield_ids = extract_shield_ids(shields) if shields else [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given that we would like to reuse the moderations API -- could we reconsider naming this parameter perhaps? I think people use guardrails
much more (even in OpenAI's agents-sdk). I wonder if we should use that?
76f0478
to
76b991c
Compare
:param type: The type/identifier of the guardrail. | ||
""" | ||
|
||
type: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just for my learning: what types are available?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure about this part @ashwinb, I usually only know identifier
for a shield, as such only supporting that for now. May be this is to allow more fields in future.
include: list[str] | None = None, | ||
max_infer_iters: int | None = 10, | ||
shields: list | None = None, | ||
guardrails: list | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you type the list
more exactly?
if isinstance(guardrail, str): | ||
guardrail_ids.append(guardrail) | ||
elif isinstance(guardrail, ResponseGuardrailSpec): | ||
guardrail_ids.append(guardrail.type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems confusing: type
being used as id
. Is there a better way to name this?
llama_stack/providers/inline/agents/meta_reference/responses/utils.py
Outdated
Show resolved
Hide resolved
|
||
violation_message = await run_multiple_guardrails(self.safety_api, text, self.guardrail_ids) | ||
if violation_message: | ||
logger.info(f"{context.capitalize()} guardrail violation: {violation_message}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: just add this log to run_multiple_guardrails
and we don't need this extra wrapper _apply_guardrails
function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just when I think I fixed all the redundancies by claude, something still slips...
|
||
# Input safety validation - check messages before processing | ||
if self.guardrail_ids: | ||
combined_text = interleaved_content_as_str([msg.content for msg in self.ctx.messages]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we document somewhere that guardrails only apply to text input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes the shield + moderation apis dont support the image, this is known tech debt, I filed an issue for that before.
llama_stack/providers/inline/agents/meta_reference/responses/streaming.py
Outdated
Show resolved
Hide resolved
) + tool_call.function.arguments | ||
|
||
# Output Safety Validation for a chunk | ||
if chat_response_content: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should just check for self.guardrail_ids
first as it makes it clear what this block is for and prevents unnecessary work
accumulated_text = "".join(chat_response_content) | ||
violation_message = await self._apply_guardrails(accumulated_text, "output") | ||
if violation_message: | ||
yield await self._create_refusal_response(violation_message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the output chunks already yielded by this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ResponseTextDeltaEvent
are streamed, the output chunk is not yet streamed by this point. I initially had this check within the delta in the above loop, but apparently that is too expensive, so moved to here for a chunk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated as discussed to not emit delta events when guardrails are configured
e72c6e4
to
d462e96
Compare
d462e96
to
a9ebdfe
Compare
What does this PR do?
Have closed the previous PR due to merge conflicts with multiple PRs
Addressed all comments from #3768 (sorry for carrying over to this one)
Test Plan
Added UTs and integration tests