@radu-mocanu radu-mocanu commented Oct 13, 2025

Done:

  • added new evaluators
  • support new schema
  • wired up platform evaluators
  • support for custom evaluators

Tip: check calculator sample

ToDo:

  • rename from coded to legacy everywhere

How to test:
Example evaluator spec (a contains evaluator):

{
  "version": "1.0",
  "id": "DenialCodeContains",
  "description": "Checks if the response text includes the expected denial code.",
  "evaluatorTypeId": "uipath-contains",
  "evaluatorConfig": {
    "name": "Denial Code Contains",
    "targetOutputKey": "report",
    "negated": true,
    "ignoreCase": false,
    "defaultEvaluationCriteria": {
      "searchText": "mock report"
    }
  }
}
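
To make the config fields concrete, here is a minimal sketch of how a contains-style check could interpret them. It is not the SDK's implementation: the function name `contains_check` is hypothetical, and the semantics assumed for `negated` (invert the result) and `ignoreCase` (compare lowercased text) are inferred from the field names only.

```python
from typing import Optional


def contains_check(output: dict, evaluator_config: dict,
                   criteria: Optional[dict] = None) -> bool:
    """Hypothetical contains check driven by an evaluatorConfig dict."""
    # Assumption: with no per-evaluation criteria, the default applies.
    if criteria is None:
        criteria = evaluator_config["defaultEvaluationCriteria"]

    # Read the output field named by targetOutputKey ("report" in the spec).
    text = str(output.get(evaluator_config["targetOutputKey"], ""))
    needle = criteria["searchText"]

    # Assumption: ignoreCase means a case-insensitive comparison.
    if evaluator_config.get("ignoreCase", False):
        text, needle = text.lower(), needle.lower()

    found = needle in text
    # Assumption: negated inverts the result, so the check passes
    # only when the search text is absent.
    return not found if evaluator_config.get("negated", False) else found
```

Under these assumptions, the spec above (with `"negated": true`) passes only when the `report` output does not contain "mock report".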

Example eval set:

{
  "version": "1.0",
  "id": "ClaimDenialReview",
  "name": "Claim Denial Review",
  "evaluatorRefs": [
    "DenialCodeContains"
  ],
  "evaluations": [
    {
      "id": "denial-default",
      "name": "Respond with default denial code",
      "inputs": {
        "human_input": "Customer asks for the denial code on claim XFC-01."
      },
      "evaluationCriterias": {
        "DenialCodeContains": null
      }
    },
    {
      "id": "denial-override",
      "name": "Respond with override denial code",
      "inputs": {
        "human_input": "Customer asks if claim XFC-02 was denied and why."
      },
      "evaluationCriterias": {
        "DenialCodeContains": {
          "searchText": "mock sreport 1234 lala"
        }
      }
    },
    {
      "id": "denial-skip",
      "name": "Skip denial code check",
      "inputs": {
        "human_input": "Customer checks status of claim XFC-03 with no denial expected."
      },
      "evaluationCriterias": {}
    }
  ]
}
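
The three evaluations above appear to exercise three cases: `null` criteria (use the evaluator's default), an object (override the default), and no entry at all (skip the evaluator). The sketch below shows one way that resolution could work. The behavior is inferred from the evaluation names in the example, and `resolve_criteria` is a hypothetical helper, not a function from this PR.

```python
from typing import Optional


def resolve_criteria(evaluation: dict, evaluator_id: str,
                     default_criteria: dict) -> Optional[dict]:
    """Return the criteria to use, or None when the evaluator is skipped."""
    criterias = evaluation.get("evaluationCriterias", {})

    # Assumption: an evaluator with no entry is skipped for this evaluation.
    if evaluator_id not in criterias:
        return None

    override = criterias[evaluator_id]
    # Assumption: an explicit null falls back to defaultEvaluationCriteria;
    # any object replaces the default.
    return default_criteria if override is None else override
```

With the example eval set, `denial-default` would resolve to the default `searchText`, `denial-override` to its own `searchText`, and `denial-skip` to `None`.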

@radu-mocanu radu-mocanu force-pushed the release/revamped-evals branch from a3e9908 to 333821b October 13, 2025 14:46
mjnovice and others added 9 commits October 14, 2025 16:21
feat: wiring ExactMatch evaluator to new schema
feat: wiring JsonSimilarity evaluator to new schema
- Add version property detection to distinguish coded-evals from legacy files
- Update pull command to map coded-evals folder to local evals structure
- Update push command to upload files with version property to coded-evals folder
- Maintain backward compatibility with legacy evals folder structure
- Ensure eval command works out of the box with existing structure
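
A rough sketch of the version-based detection described in this commit message: a parsed JSON file that carries a top-level `version` property is treated as a coded-evals file, and anything else as legacy. The helper names are hypothetical, and the `evals` folder name for legacy files is an assumption; the PR's actual push/pull code is not shown here.

```python
from typing import Any


def has_version_property(data: Any) -> bool:
    """True when parsed JSON is an object with a top-level "version" key."""
    return isinstance(data, dict) and "version" in data


def remote_folder_for(data: Any) -> str:
    """Pick the remote folder for a parsed eval file (hypothetical helper)."""
    # Assumption: versioned files go to "coded-evals", others to "evals".
    return "coded-evals" if has_version_property(data) else "evals"
```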

fix: resolve eval set path for correct evaluator discovery

- Update load_eval_set to return both evaluation set and resolved path
- Fix evaluator discovery by using resolved path instead of original path
- Ensure eval command works with files in evals/eval-sets/ and evals/evaluators/
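
The idea behind this fix can be sketched as follows: resolve the eval set path first, then derive the evaluators directory from the resolved path rather than from the path the user typed. Both functions are hypothetical illustrations. The real `load_eval_set` signature is not shown in this conversation, and the candidate lookup order is an assumption.

```python
from pathlib import Path
from typing import Callable


def resolve_eval_set_path(name: str, project_dir: str,
                          exists: Callable[[Path], bool] = Path.exists) -> Path:
    """Return the first existing candidate location for an eval set file."""
    # Assumption: try the path as given, then evals/eval-sets/ in the project.
    candidates = [Path(name), Path(project_dir) / "evals" / "eval-sets" / name]
    for candidate in candidates:
        if exists(candidate):
            return candidate
    raise FileNotFoundError(name)


def evaluators_dir_for(resolved_eval_set: Path) -> Path:
    """Locate evals/evaluators/ next to the resolved evals/eval-sets/ folder."""
    return resolved_eval_set.parent.parent / "evaluators"
```

The `exists` parameter only makes the lookup easy to exercise without touching the filesystem; by default it checks the real disk.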

fix: cleaning up files

fix: address PR review comments

1. Move eval_set path resolution from runtime to CLI layer
   - Resolve path in cli_eval.py before creating runtime
   - Remove context update in runtime since path is already resolved
   - Better separation of concerns

2. Clarify directory structure comments
   - Make it explicit that os.path.join produces {self.directory}/evals/evaluators/
   - Prevent confusion about directory paths

3. Add file deletion consistency for evaluation files
   - Delete remote evaluation files when deleted locally
   - Matches behavior of source file handling
   - Ensures consistency across all file types

Addresses: #681 (review)
Addresses: #681 (review)
Addresses: #681 (comment)
feat: adding pull and push for coded-evals folder files
feat: wiring LLM judge evaluators to new schema
@Chibionos Chibionos changed the title from "Release/revamped evals" to "feat: new eval schema changes" Oct 16, 2025
@Chibionos Chibionos requested a review from akshaylive October 16, 2025 12:16
…_evals

feat: Cherry-pick progress on parallelization of eval runs
