@dominicchapman commented Nov 19, 2025

New "evaluation" content:

  • Updated workflow language from "Measure" to "Evaluate" to better reflect our approach
  • Reorganized evaluation content into a dedicated section with six focused pages (overview, setup, write evaluations, flags & experiments, run evaluations, analyze results); see the sketch below for the basic shape of an evaluation
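
To make the "write evaluations" page concrete, here is a minimal sketch of the shape an evaluation can take: a small dataset of cases, a scorer, and an averaged result. The names (`EvalCase`, `containsExpected`, `runEval`) and the exact-match scorer are illustrative assumptions for this PR description, not Axiom's actual API.

```ts
// Hypothetical evaluation case: an input prompt and the property we
// expect to find in the model's output.
type EvalCase = { input: string; expected: string };

// Simple scorer: 1 when the output contains the expected answer, else 0.
function containsExpected(output: string, expected: string): number {
  return output.toLowerCase().includes(expected.toLowerCase()) ? 1 : 0;
}

// Run every case through the model and average the scores.
async function runEval(
  cases: EvalCase[],
  generate: (input: string) => Promise<string>,
): Promise<number> {
  let total = 0;
  for (const c of cases) {
    const output = await generate(c.input);
    total += containsExpected(output, c.expected);
  }
  return total / cases.length;
}
```

In the docs, the same structure generalizes to richer scorers (LLM-as-judge, rubric checks) without changing the run/analyze loop.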

Other changes:

  • Concepts: Added definitions for flags and experiments; integrated AI capability architecture spectrum (single-turn → workflows → single-agent → multi-agent)
  • Create: De-emphasized experimental prompt management features while clarifying Axiom's current focus on evaluation and observability; added references to Vercel AI SDK examples and Mastra as framework alternatives (see the single-turn example after this list)
  • Iterate: Complete rewrite introducing the systematic improvement loop; added sections on user feedback capture and domain expert annotation workflows (marked as coming soon); reorganized failure categorization by severity for better prioritization
  • Quickstart: Updated to reference evaluation framework and CLI authentication; improved "What's next" guidance
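
For the single-turn end of the capability spectrum mentioned under Concepts, a minimal example using the Vercel AI SDK; the model id and prompt are placeholders and are not taken from the docs themselves:

```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Single-turn capability: one prompt in, one completion out.
const { text } = await generateText({
  model: openai('gpt-4o-mini'),
  prompt: 'Summarize this support ticket in one sentence: ...',
});

console.log(text);
```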
