feat(eval): add evaluation runtime tracking and detailed state metadata #308
Enhanced Metadata Export in vf-eval
Description
This PR enhances the metadata export functionality in the vf-eval CLI tool to provide more comprehensive information when running evaluations with the -s flag. The additional metadata will be useful for analysis and display on the hub. Solves #307.
Key Changes
eval_runtime_seconds field in metadata
parsed_answer field containing results from Parser.parse_answer()
state_metadata field containing:
total_rollouts field to track total number of rollouts performed
Type of Change
Testing
Test Coverage
Checklist
Additional Notes
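For reviewers, the shape of the exported metadata can be sketched roughly as below. This is a minimal, self-contained illustration and not the code in this PR: StubParser, run_with_metadata, and the "completion" and "rollouts" keys are hypothetical names chosen for the example, and only eval_runtime_seconds, parsed_answer, total_rollouts, and the parse_answer() method name come from the description above. The state_metadata field is left out because its contents are not listed here.

```python
import json
import time
from typing import Any, Callable, Dict, List, Optional


class StubParser:
    """Hypothetical stand-in for a parser that exposes parse_answer()."""

    def parse_answer(self, completion: str) -> Optional[str]:
        # Illustrative rule only: the answer is the text after the last "Answer:".
        marker = "Answer:"
        if marker not in completion:
            return None
        return completion.rsplit(marker, 1)[1].strip()


def run_with_metadata(
    run_eval: Callable[[], List[Dict[str, Any]]],
    parser: StubParser,
) -> Dict[str, Any]:
    """Time an evaluation callable and attach the extra metadata fields."""
    start = time.perf_counter()  # monotonic clock, suited to measuring durations
    rollouts = run_eval()
    elapsed = time.perf_counter() - start

    # Store the parsed answer next to each raw completion.
    for rollout in rollouts:
        rollout["parsed_answer"] = parser.parse_answer(rollout["completion"])

    return {
        "eval_runtime_seconds": round(elapsed, 3),
        "total_rollouts": len(rollouts),
        "rollouts": rollouts,
    }


if __name__ == "__main__":
    def fake_eval() -> List[Dict[str, Any]]:
        # Placeholder for a real evaluation run.
        return [{"completion": "2 + 2 is four. Answer: 4"}]

    print(json.dumps(run_with_metadata(fake_eval, StubParser()), indent=2))
```

The runtime is measured around the whole evaluation call rather than per rollout, which matches a single eval_runtime_seconds field; per-rollout timing would need a separate field.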
Potential Additional Enhancements
More Detailed Performance Metrics:
Environment-Specific Metadata:
Model Response Analysis:
Enhanced Error Tracking:
System Information: