Commit d422bcf

update docstrings
1 parent c632625 commit d422bcf


1 file changed, +43 -7 lines changed


src/cleanlab_codex/validator.py

Lines changed: 43 additions & 7 deletions
@@ -23,6 +23,11 @@
 class BadResponseThresholds(BaseModel):
 """Config for determining if a response is bad.
 Each key is an evaluation metric and the value is a threshold such that if the score is below the threshold, the response is bad.
+
+Default Thresholds:
+- trustworthiness: 0.5
+- response_helpfulness: 0.5
+- Any custom eval: 0.5 (if not explicitly specified in bad_response_thresholds)
 """
 
 trustworthiness: float = Field(
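The new `BadResponseThresholds` docstring documents a simple rule: each eval score is compared with its threshold (0.5 by default, including for custom evals not listed in `bad_response_thresholds`), and a response is bad if any score falls below its threshold. Below is a minimal stdlib sketch of that rule, assuming strict less-than comparison as the docstring states. `resolve_threshold`, `failed_evals`, `is_bad_response` and the eval name `my_custom_eval` are hypothetical and not part of `cleanlab_codex`.

```python
from typing import Mapping, Optional

# Defaults listed in the BadResponseThresholds docstring; any other (custom) eval also defaults to 0.5.
DEFAULT_THRESHOLD = 0.5
DEFAULT_THRESHOLDS = {"trustworthiness": 0.5, "response_helpfulness": 0.5}


def resolve_threshold(eval_name: str, overrides: Optional[Mapping[str, float]] = None) -> float:
    """Hypothetical helper: explicit override first, then the documented default."""
    if overrides and eval_name in overrides:
        return overrides[eval_name]
    return DEFAULT_THRESHOLDS.get(eval_name, DEFAULT_THRESHOLD)


def failed_evals(scores: Mapping[str, float], overrides: Optional[Mapping[str, float]] = None) -> list[str]:
    """Hypothetical helper: names of evals whose score is strictly below its threshold."""
    return [name for name, score in scores.items() if score < resolve_threshold(name, overrides)]


def is_bad_response(scores: Mapping[str, float], overrides: Optional[Mapping[str, float]] = None) -> bool:
    """A response is bad if at least one eval fails its threshold."""
    return len(failed_evals(scores, overrides)) > 0


scores = {"trustworthiness": 0.82, "response_helpfulness": 0.41, "my_custom_eval": 0.6}
print(failed_evals(scores))                           # ['response_helpfulness']
print(failed_evals(scores, {"my_custom_eval": 0.7}))  # ['response_helpfulness', 'my_custom_eval']
```

Because the comparison is strict, a score exactly equal to its threshold (for example 0.5 against the 0.5 default) is not flagged.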
@@ -82,15 +87,41 @@ def __init__(
 trustworthy_rag_config: Optional[dict[str, Any]] = None,
 bad_response_thresholds: Optional[dict[str, float]] = None,
 ):
-"""Evaluates the quality of responses generated in RAG applications and remediates them if needed.
+"""Real-time detection and remediation of bad responses in RAG applications, powered by Cleanlab's TrustworthyRAG and Codex.
 
-This object combines Cleanlab's various Evals with thresholding to detect bad responses and remediates them with Codex.
+This object combines Cleanlab's TrustworthyRAG evaluation scores with configurable thresholds to detect potentially bad responses
+in your RAG application. When a bad response is detected, it automatically attempts to remediate by retrieving an expert-provided
+answer from your Codex project.
+
+For most use cases, we recommend using the `validate()` method which provides a complete validation workflow including
+both detection and Codex remediation. The `detect()` method is available separately for testing and threshold tuning purposes
+without triggering a Codex lookup.
+
+By default, this uses the same default configurations as [`TrustworthyRAG`](/tlm/api/python/utils.rag/#class-trustworthyrag), except:
+- Explanations are returned in logs for better debugging
+- Only the `response_helpfulness` eval is run
 
 Args:
-codex_access_key (str): The [access key](/codex/web_tutorials/create_project/#access-keys) for a Codex project.
-tlm_api_key (Optional[str]): The API key for [TrustworthyRAG](/tlm/api/python/utils.rag/#class-trustworthyrag).
-trustworthy_rag_config (Optional[dict[str, Any]]): Optional initialization arguments for [TrustworthyRAG](/tlm/api/python/utils.rag/#class-trustworthyrag), which is used to detect response issues.
-bad_response_thresholds (Optional[dict[str, float]]): Detection score thresholds used to flag whether or not a response is considered bad. Each key in this dict corresponds to an Eval from TrustworthyRAG, and the value indicates a threshold below which scores from this Eval are considered detected issues. A response is flagged as bad if any issues are detected for it.
+codex_access_key (str): The [access key](/codex/web_tutorials/create_project/#access-keys) for a Codex project. Used to retrieve expert-provided answers
+when bad responses are detected.
+
+tlm_api_key (str, optional): API key for accessing [TrustworthyRAG](/tlm/api/python/utils.rag/#class-trustworthyrag). If not provided, this must be specified
+in trustworthy_rag_config.
+
+trustworthy_rag_config (dict[str, Any], optional): Optional initialization arguments for [TrustworthyRAG](/tlm/api/python/utils.rag/#class-trustworthyrag),
+which is used to detect response issues. If not provided, default configuration will be used.
+
+bad_response_thresholds (dict[str, float], optional): Detection score thresholds used to flag whether
+a response is considered bad. Each key corresponds to an Eval from TrustworthyRAG, and the value
+indicates a threshold (between 0 and 1) below which scores are considered detected issues. A response
+is flagged as bad if any issues are detected. If not provided, default thresholds will be used. See
+[`BadResponseThresholds`](/codex/api/python/validator/#class-badresponsethresholds) for more details.
+
+Raises:
+ValueError: If both tlm_api_key and api_key in trustworthy_rag_config are provided.
+ValueError: If bad_response_thresholds contains thresholds for non-existent evaluation metrics.
+TypeError: If any threshold value is not a number.
+ValueError: If any threshold value is not between 0 and 1.
 """
 trustworthy_rag_config = trustworthy_rag_config or get_default_trustworthyrag_config()
 if tlm_api_key is not None and "api_key" in trustworthy_rag_config:
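The `Raises:` section added to `__init__` lists four input checks. A hypothetical stdlib sketch of those checks follows; it is not the `cleanlab_codex` implementation. `check_init_args`, its `known_evals` parameter and the key `"tlm-key"` are assumptions (the real code presumably derives valid metric names from its TrustworthyRAG configuration), an empty dict stands in for `get_default_trustworthyrag_config()`, and "between 0 and 1" is assumed to include both endpoints.

```python
from typing import Any, Iterable, Mapping, Optional


def check_init_args(
    tlm_api_key: Optional[str],
    trustworthy_rag_config: Optional[Mapping[str, Any]],
    bad_response_thresholds: Optional[Mapping[str, Any]],
    known_evals: Iterable[str],
) -> dict[str, Any]:
    """Hypothetical sketch of the checks listed under `Raises:`; returns the effective config."""
    # Stand-in for get_default_trustworthyrag_config(): this sketch just uses an empty dict.
    config = dict(trustworthy_rag_config or {})

    # The API key may come from tlm_api_key or from config["api_key"], but not both.
    if tlm_api_key is not None and "api_key" in config:
        raise ValueError("Provide the API key via tlm_api_key or trustworthy_rag_config, not both")

    known = set(known_evals)
    for name, value in (bad_response_thresholds or {}).items():
        if name not in known:
            raise ValueError(f"Threshold given for non-existent evaluation metric: {name!r}")
        # bool is a subclass of int; rejecting it here is a choice made by this sketch.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"Threshold for {name!r} must be a number, got {type(value).__name__}")
        # Assumes "between 0 and 1" is inclusive of both endpoints.
        if not 0 <= value <= 1:
            raise ValueError(f"Threshold for {name!r} must be between 0 and 1, got {value}")
    return config


KNOWN = {"trustworthiness", "response_helpfulness"}
config = check_init_args("tlm-key", None, {"response_helpfulness": 0.7}, KNOWN)
```

Checking the type before the range matters: evaluating `0 <= "0.5"` would itself raise a `TypeError`, but with a less helpful message than the explicit check.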
@@ -157,7 +188,12 @@ def detect(
 prompt: Optional[str] = None,
 form_prompt: Optional[Callable[[str, str], str]] = None,
 ) -> tuple[ThresholdedTrustworthyRAGScore, bool]:
-"""Evaluate the response quality using TrustworthyRAG and determine if it is a bad response via thresholding.
+"""Score response quality using TrustworthyRAG and flag bad responses based on configured thresholds.
+
+Note:
+This method is primarily intended for testing and threshold tuning purposes. For production use cases,
+we recommend using the `validate()` method which provides a complete validation workflow including
+Codex remediation.
 
 Args:
 query (str): The user query that was used to generate the response.
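The updated docstrings split responsibilities: `detect()` scores a response and flags it, while `validate()` adds the Codex remediation step. The sketch below illustrates that split with stub callables in place of TrustworthyRAG and Codex. `detect_sketch`, `validate_sketch`, `fake_score`, `fake_lookup` and the returned dict shape are all hypothetical and do not mirror the real method signatures or return types (the real `detect()` returns `tuple[ThresholdedTrustworthyRAGScore, bool]`).

```python
from typing import Callable, Mapping, Optional

# Hypothetical stand-ins: a scorer returning eval scores, and an expert-answer lookup.
ScoreFn = Callable[[str, str, str], Mapping[str, float]]
LookupFn = Callable[[str], Optional[str]]


def detect_sketch(
    query: str, context: str, response: str, score_fn: ScoreFn, thresholds: Mapping[str, float]
) -> tuple[dict[str, float], bool]:
    """Score the response; flag it as bad if any score is below its threshold (0.5 default)."""
    scores = dict(score_fn(query, context, response))
    is_bad = any(score < thresholds.get(name, 0.5) for name, score in scores.items())
    return scores, is_bad


def validate_sketch(
    query: str,
    context: str,
    response: str,
    score_fn: ScoreFn,
    lookup_fn: LookupFn,
    thresholds: Mapping[str, float],
) -> dict:
    """Run detection and, only if the response is bad, look up an expert answer."""
    scores, is_bad = detect_sketch(query, context, response, score_fn, thresholds)
    expert_answer = lookup_fn(query) if is_bad else None
    return {"scores": scores, "is_bad_response": is_bad, "expert_answer": expert_answer}


# Usage with stubs: the lookup is only invoked by validate_sketch, and only for bad responses.
lookups: list[str] = []


def fake_score(query: str, context: str, response: str) -> dict[str, float]:
    return {"trustworthiness": 0.9, "response_helpfulness": 0.3}


def fake_lookup(query: str) -> Optional[str]:
    lookups.append(query)
    return "Expert answer for: " + query


scores, is_bad = detect_sketch("q", "ctx", "resp", fake_score, {})
result = validate_sketch("q", "ctx", "resp", fake_score, fake_lookup, {})
```

Keeping the lookup out of `detect_sketch` is what makes it suitable for threshold tuning: re-scoring stored responses under different thresholds never touches the expert-answer source.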
