feat: new eval schema changes #685

radu-mocanu · 2025-10-13T14:18:35Z

Done:

added new evaluators
support new schema
wired up platform evaluators
support for custom evaluators

Tip: check calculator sample

ToDo:

rename from coded to legacy everywhere

How to test:
example contains evaluator spec:

{
  "version": "1.0",
  "id": "DenialCodeContains",
  "description": "Checks if the response text includes the expected denial code.",
  "evaluatorTypeId": "uipath-contains",
  "evaluatorConfig": {
    "name": "Denial Code Contains",
    "targetOutputKey": "report",
    "negated": true,
    "ignoreCase": false,
    "defaultEvaluationCriteria": {
      "searchText": "mock report"
    }
  }
}
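To make the spec fields concrete, here is a minimal sketch of how a "contains" check could interpret `targetOutputKey`, `negated`, `ignoreCase`, and `defaultEvaluationCriteria`. The function name `contains_check` and its signature are hypothetical and are not taken from the SDK. The semantics (substring match, optionally case-insensitive, result inverted when `negated` is true) are an assumption based on the field names.

```python
from typing import Optional

# Evaluator config copied from the spec above.
CONFIG = {
    "name": "Denial Code Contains",
    "targetOutputKey": "report",
    "negated": True,
    "ignoreCase": False,
    "defaultEvaluationCriteria": {"searchText": "mock report"},
}


def contains_check(output: dict, config: dict, criteria: Optional[dict] = None) -> bool:
    """Hypothetical helper: assumed semantics of a 'contains' evaluator."""
    # Fall back to the evaluator's default criteria when none are supplied.
    criteria = criteria or config["defaultEvaluationCriteria"]
    text = str(output.get(config["targetOutputKey"], ""))
    needle = criteria["searchText"]
    if config.get("ignoreCase", False):
        text, needle = text.lower(), needle.lower()
    found = needle in text
    # Assumption: 'negated' inverts the result, so the check passes
    # when the search text is absent.
    return not found if config.get("negated", False) else found


if __name__ == "__main__":
    print(contains_check({"report": "this is a mock report"}, CONFIG))
    print(contains_check({"report": "all clear"}, CONFIG))
```

Under these assumptions, `"negated": true` means the example spec passes only when the `report` output does not include the search text.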

example eval set:

{
  "version": "1.0",
  "id": "ClaimDenialReview",
  "name": "Claim Denial Review",
  "evaluatorRefs": [
    "DenialCodeContains"
  ],
  "evaluations": [
    {
      "id": "denial-default",
      "name": "Respond with default denial code",
      "inputs": {
        "human_input": "Customer asks for the denial code on claim XFC-01."
      },
      "evaluationCriterias": {
        "DenialCodeContains": null
      }
    },
    {
      "id": "denial-override",
      "name": "Respond with override denial code",
      "inputs": {
        "human_input": "Customer asks if claim XFC-02 was denied and why."
      },
      "evaluationCriterias": {
        "DenialCodeContains": {
          "searchText": "mock sreport 1234 lala"
        }
      }
    },
    {
      "id": "denial-skip",
      "name": "Skip denial code check",
      "inputs": {
        "human_input": "Customer checks status of claim XFC-03 with no denial expected."
      },
      "evaluationCriterias": {}
    }
  ]
}
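The three evaluations above suggest three ways an entry in `evaluationCriterias` can behave: `null` falls back to the evaluator's default criteria, an object overrides them, and a missing key skips the evaluator. The sketch below illustrates that reading. It is inferred from the evaluation names (`denial-default`, `denial-override`, `denial-skip`) and is not confirmed behavior, and `resolve_criteria` is a hypothetical helper.

```python
from typing import Optional, Tuple

# Default criteria as declared in the evaluator spec.
DEFAULT = {"searchText": "mock report"}

# Trimmed copies of the three evaluations from the eval set above.
EVALUATIONS = [
    {"id": "denial-default",
     "evaluationCriterias": {"DenialCodeContains": None}},
    {"id": "denial-override",
     "evaluationCriterias": {"DenialCodeContains": {"searchText": "mock sreport 1234 lala"}}},
    {"id": "denial-skip",
     "evaluationCriterias": {}},
]


def resolve_criteria(evaluation: dict, evaluator_id: str,
                     default: dict) -> Tuple[bool, Optional[dict]]:
    """Hypothetical helper: return (should_run, criteria) for one evaluator."""
    criterias = evaluation.get("evaluationCriterias") or {}
    if evaluator_id not in criterias:
        # Assumption: an evaluator that is not listed is skipped.
        return False, None
    override = criterias[evaluator_id]
    # Assumption: null means "use the evaluator's default criteria".
    return True, default if override is None else override


if __name__ == "__main__":
    for ev in EVALUATIONS:
        print(ev["id"], resolve_criteria(ev, "DenialCodeContains", DEFAULT))
```

Returning a `(should_run, criteria)` pair keeps "skip" distinct from "run with defaults", which a bare `None` return value could not express.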


- Add version property detection to distinguish coded-evals from legacy files
- Update pull command to map coded-evals folder to local evals structure
- Update push command to upload files with version property to coded-evals folder
- Maintain backward compatibility with legacy evals folder structure
- Ensure eval command works out of the box with existing structure

fix: resolve eval set path for correct evaluator discovery

- Update load_eval_set to return both evaluation set and resolved path
- Fix evaluator discovery by using resolved path instead of original path
- Ensure eval command works with files in evals/eval-sets/ and evals/evaluators/

fix: cleaning up files

fix: address PR review comments

1. Move eval_set path resolution from runtime to CLI layer
   - Resolve path in cli_eval.py before creating runtime
   - Remove context update in runtime since path is already resolved
   - Better separation of concerns
2. Clarify directory structure comments
   - Make it explicit that os.path.join produces {self.directory}/evals/evaluators/
   - Prevent confusion about directory paths
3. Add file deletion consistency for evaluation files
   - Delete remote evaluation files when deleted locally
   - Matches behavior of source file handling
   - Ensures consistency across all file types

Addresses: #681 (review)
Addresses: #681 (review)
Addresses: #681 (comment)
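The version-property detection described above could look roughly like the following sketch. The helper names `has_version_property` and `remote_folder_for` are hypothetical, and the rule itself (a top-level `"version"` key marks a file as coded-evals, anything else is treated as legacy) is an assumption drawn from the commit message, not the actual push/pull implementation.

```python
import json


def has_version_property(raw_json: str) -> bool:
    """Hypothetical check: does the JSON document carry a top-level 'version' key?"""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        # Unparseable files are treated as legacy in this sketch.
        return False
    return isinstance(data, dict) and "version" in data


def remote_folder_for(raw_json: str) -> str:
    """Assumed mapping: versioned files go to 'coded-evals', the rest to 'evals'."""
    return "coded-evals" if has_version_property(raw_json) else "evals"


if __name__ == "__main__":
    print(remote_folder_for('{"version": "1.0", "id": "ClaimDenialReview"}'))
    print(remote_folder_for('{"id": "legacy-set"}'))
```

Treating malformed JSON as legacy keeps the sketch backward compatible by default; a real implementation might prefer to surface the parse error instead.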

feat: adding pull and push for coded-evals folder files


andrei-rusu and others added 4 commits October 10, 2025 16:54

add initial version of revamped coded evaluators

2e24fe0

fix copilot and linting issues

0b44a7c

Merge pull request #677 from UiPath/fix/andreiru/coded_evals_copilot_…

1b016c1

…issues Address Copilot comments for coded evals

feat: new eval schema support + contain evaluator wiring

333821b

radu-mocanu force-pushed the release/revamped-evals branch from a3e9908 to 333821b October 13, 2025 14:46

mjnovice and others added 9 commits October 14, 2025 16:21

feat: wiring ExactMatch evaluator to new schema

860c88f

Merge pull request #690 from UiPath/mj/wire-exact-match

fd324b0

feat: wiring ExactMatch evaluator to new schema

feat: wiring JsonSimilarity evaluator to new schema

74bc147

Merge pull request #692 from UiPath/mj/wire-json-similarity

ef92ef5

feat: wiring JsonSimilarity evaluator to new schema

feat: wiring LLM judge evaluators to new schema

e05bd98

Merge pull request #681 from UiPath/feat/updating-push-pull

2205d60

feat: adding pull and push for coded-evals folder files

feat: progress on parallelization of eval runs

7a2937d

Merge pull request #697 from UiPath/mj/wire-llm-as-a-judge

931a4c6

feat: wiring LLM judge evaluators to new schema

Chibionos changed the title ~~Release/revamped evals~~ feat: new eval schema changes Oct 16, 2025

Chibionos requested a review from akshaylive October 16, 2025 12:16

Merge pull request #704 from UiPath/dev/andreiru/cherry_pick_parallel…

962f07d

…_evals feat: Cherry-pick progress on parallelization of eval runs
