Commit d698d66
【Hackathon 9th Sprint No.9】feat: implement ES(t) macro/micro cross-validation and refactor analysis utilities (#363)
* feat: implement ES(t) macro/micro cross-validation and refactor analysis utilities
This commit implements the Error-aware Speedup Score (ES_t) metric from
Section 3.2.2 of the technical report (arXiv:2510.24035), along with the
mathematical proofs from Appendix B and C that establish the sample-level
validity of both S_t and ES_t metrics.
Key Features:
=============
1. Appendix B Implementation - Sample-level proof for S_t:
- Micro-level calculation: geometric mean of rectified speedups for all samples
- Macro-level calculation: S_t = α^λ · β^(ληp) · b^(1-λ)
- Cross-validation: both methods agree within numerical tolerance, confirming
that S_t is equivalent to the geometric mean of sample-level rectified speedups
2. Appendix C Implementation - Sample-level proof for ES_t:
- Micro-level calculation: geometric mean of error-aware rectified speedups
- Macro-level calculation: ES_t = α^λ · β^(ληp) · γ_t^(1-λ)
- Dynamic penalty factor: γ_t = b^(sum(π_c * indicator(t < c)))
- Cross-validation: validates that ES_t is the geometric mean of
error-aware rectified speedups, where failure samples use type-specific
dynamic penalties instead of fixed penalty b
3. Error-aware design (Section 3.2.2):
- Error type classification: c=1 (accuracy), c=2 (runtime crash), c=3 (compile failure)
- Tiered tolerance rules: t≥1 tolerates accuracy errors, t≥2 tolerates
runtime crashes, t≥3 tolerates all errors
- Dynamic penalty γ_t adapts based on error type distribution and tolerance level
4. Independent verification script:
- verify_macro_params.py: calculates and prints all macro parameters
(alpha, beta, gamma, lambda, eta, pi) independently
- Enables validation of plot_ESt results by computing each parameter separately
5. Mandatory validation mechanism:
- plot_ESt.py: enforces macro/micro result matching before adoption
- Rejects results if validation fails, ensuring calculation correctness
6. Code refactoring for maintainability:
- macro_statistics.py: dedicated module for macro parameter calculations
- Each parameter has independent function (alpha, beta, gamma, lambda, eta, pi)
- Reduced nesting levels in analysis_util.py by extracting helper functions
- Simplified scan_all_folders and added .txt file support
- Improved code organization following software engineering best practices
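The Appendix B cross-check described in item 1 can be sketched as follows. This is a minimal illustration, not the repository's code: the function names are hypothetical, and the rectified-speedup rule (a correct sample scores s if s >= 1 and s^(p+1) otherwise, a failed sample scores b) is an assumption chosen so that the geometric mean reduces to α^λ · β^(ληp) · b^(1-λ).

```python
import math


def geometric_mean(values: list[float]) -> float:
    """Geometric mean; an empty list is treated as the neutral factor 1.0."""
    if len(values) == 0:
        return 1.0
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))


def rectified_speedup(correct: bool, speedup: float, p: float, b: float) -> float:
    """Assumed rule: failures score b, slowdowns are penalised as s**(p + 1)."""
    if not correct:
        return b
    return speedup if speedup >= 1.0 else speedup ** (p + 1)


def microscopic_s(samples: list[tuple[bool, float]], p: float, b: float) -> float:
    """Sample-level view: geometric mean of per-sample rectified speedups."""
    return geometric_mean([rectified_speedup(c, s, p, b) for c, s in samples])


def aggregated_s(samples: list[tuple[bool, float]], p: float, b: float) -> float:
    """Aggregated view: alpha^lam * beta^(lam*eta*p) * b^(1-lam)."""
    correct = [s for c, s in samples if c]
    slow = [s for s in correct if s < 1.0]
    lam = len(correct) / len(samples)            # fraction of correct samples
    eta = len(slow) / len(correct) if len(correct) > 0 else 0.0
    alpha = geometric_mean(correct)              # geomean speedup of correct samples
    beta = geometric_mean(slow)                  # geomean speedup of slowdown samples
    return alpha ** lam * beta ** (lam * eta * p) * b ** (1 - lam)


# Illustrative data only: (is_correct, speedup) pairs.
samples = [(True, 2.0), (True, 0.5), (True, 1.25), (False, 0.0)]
micro = microscopic_s(samples, p=2.0, b=0.1)
macro = aggregated_s(samples, p=2.0, b=0.1)
print(micro, macro)
```

Under these assumptions the two values agree up to floating-point rounding, which is the property the commit enforces before adopting a result.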
Technical Details:
==================
- Micro calculation: processes each sample individually, applies rectified
speedup rules, then computes geometric mean
- Macro calculation: uses aggregated statistics (correct count, speedup
distributions, error type proportions) to compute expected values
- Validation: compares micro and macro results with tolerance threshold (1e-6)
- All calculations verified against real benchmark data (118 samples)
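The micro/macro validation for ES_t could look roughly like the sketch below. It assumes errno 0 marks a correct sample (a hypothetical convention), that a failed sample with error type c scores b when t < c and 1 otherwise, and that correct samples follow the same rectified-speedup rule as in S_t. All function names are illustrative and are not taken from the repository.

```python
import math
from collections import Counter


def geometric_mean(values):
    """Geometric mean; an empty list is treated as the neutral factor 1.0."""
    if len(values) == 0:
        return 1.0
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))


def error_aware_rectified_speedup(errno, speedup, t, p, b):
    """errno 0 = correct (assumed); errno c > 0 is penalised by b only if t < c."""
    if errno == 0:
        return speedup if speedup >= 1.0 else speedup ** (p + 1)
    return b if t < errno else 1.0


def microscopic_es(samples, t, p, b):
    """Sample-level view: geometric mean of error-aware rectified speedups."""
    return geometric_mean(
        [error_aware_rectified_speedup(e, s, t, p, b) for e, s in samples]
    )


def aggregated_es(samples, t, p, b):
    """Aggregated view: alpha^lam * beta^(lam*eta*p) * gamma_t^(1-lam)."""
    correct = [s for e, s in samples if e == 0]
    slow = [s for s in correct if s < 1.0]
    errno2count = Counter(e for e, _ in samples if e != 0)
    n_fail = sum(errno2count.values())
    lam = len(correct) / len(samples)
    eta = len(slow) / len(correct) if len(correct) > 0 else 0.0
    # pi_c: share of error type c among failed samples.
    pi = {c: n / n_fail for c, n in errno2count.items()}
    gamma_t = b ** math.fsum(share for c, share in pi.items() if t < c)
    alpha, beta = geometric_mean(correct), geometric_mean(slow)
    return alpha ** lam * beta ** (lam * eta * p) * gamma_t ** (1 - lam)


# Illustrative data only: (errno, speedup) pairs.
samples = [(0, 2.0), (0, 0.5), (1, 0.0), (2, 0.0), (3, 0.0)]
results = {
    t: (microscopic_es(samples, t, 2.0, 0.1), aggregated_es(samples, t, 2.0, 0.1))
    for t in range(4)
}
for t, (micro, macro) in results.items():
    # Same threshold as quoted in the commit message (1e-6).
    print(t, micro, macro, math.isclose(micro, macro, abs_tol=1e-6))
```

As t increases, more error types are tolerated, so the exponent of b shrinks and ES_t does not decrease.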
Files Changed:
==============
- graph_net/analysis_util.py: refactored with helper functions, integrated
macro_statistics module, reduced nesting, simplified scan_all_folders
- graph_net/macro_statistics.py: new module for macro parameter calculations
- graph_net/plot_ESt.py: added mandatory macro/micro validation
- graph_net/verify_macro_params.py: new independent verification script
All code passes pre-commit checks, compiles successfully, and has been
validated with real benchmark data.
* refactor: rename macro to aggregated and improve code quality
This commit refactors the evaluation metrics calculation code with the following improvements:
1. Terminology refactoring: macro -> aggregated
- Rename macro_statistics.py to samples_statistics.py
- Rename verify_macro_params.py to verify_aggregated_params.py
- Update all variable and function names accordingly
2. Code structure improvements
- Extract verification logic in plot_ESt.py into separate functions
* compare_single_tolerance_level (12 lines)
* print_verification_result (1 line)
* verify_aggregated_micro_consistency (28 lines, meets ≤30 line requirement)
- Refactor verify_aggregated_params.py to use functional programming style
* Replace structured loops with list comprehensions
* Use Counter for error type counting
* Reduce multiple traversals to single pass where possible
3. Reduce function parameter coupling
- calculate_beta: derive slowdown_speedups internally from correct_speedups
- calculate_lambda: derive correct_count internally from correct_speedups
- calculate_eta: derive statistics internally from correct_speedups
4. Decouple error type handling
- calculate_pi: accept error_type_counts (dict) instead of hardcoded types
- calculate_gamma: accept generic parameters (tolerance, get_pi, errno_tolerances)
- Support user-defined error codes instead of hardcoded error types
5. Code quality improvements
- Use explicit len() checks instead of implicit boolean conversion
- Use modern Python type hints (list/tuple instead of typing.List/Tuple)
- Improve code readability and maintainability
All changes have been verified and pass pre-commit checks.
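To illustrate the decoupling described in item 4, here is a hedged sketch of a pi/gamma pair that takes counts and thresholds as data rather than hardcoding error types. The names sketch_pi and sketch_gamma, the labels, and the threshold mapping are all hypothetical; the actual signatures of calculate_pi and calculate_gamma may differ.

```python
import math
from collections import Counter


def sketch_pi(error_type_counts):
    """Share of each error label among failed samples (empty dict if no failures)."""
    total = sum(error_type_counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in error_type_counts.items()}


def sketch_gamma(tolerance, get_pi, errno_tolerances, b):
    """gamma_t = b ** (sum of pi over labels whose threshold exceeds tolerance)."""
    exponent = math.fsum(
        get_pi(label)
        for label, threshold in errno_tolerances.items()
        if tolerance < threshold
    )
    return b ** exponent


# Illustrative labels and thresholds; users could supply their own codes.
failures = ["accuracy", "accuracy", "runtime", "compile"]
pi = sketch_pi(Counter(failures))
thresholds = {"accuracy": 1, "runtime": 2, "compile": 3}
gamma_1 = sketch_gamma(1, lambda label: pi.get(label, 0.0), thresholds, b=0.1)
print(pi, gamma_1)
```

Passing get_pi as a callable keeps sketch_gamma independent of how the shares are stored, which is one way to read the "generic parameters" wording above.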
* style: apply black formatting to samples_statistics.py and verify_aggregated_params.py
* refactor: unify error type to errno mapping for better sorting
- Replace error_type_counts (dict[str, int]) with errno2count (dict[int, int])
- Add get_errno_from_error_type() to map error type strings to errno (1, 2, 3)
- Add get_error_type_from_errno() for reverse mapping when error type strings are needed
- Update calculate_pi() to use errno2count and return dict[int, float]
- Update calculate_all_aggregated_parameters() to use errno2count and errno_tolerance_thresholds
- Update analysis_util.py and verify_aggregated_params.py to use errno2count
- Improve code maintainability by using integer errno for sorting and comparison
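A minimal sketch of such a mapping is shown below. The function names follow the commit message, but the bodies and the error type strings ("accuracy", "runtime", "compile") are assumptions made for illustration; the repository may use different strings.

```python
from collections import Counter

# Hypothetical error type strings; only the errno values 1, 2, 3 come from the message.
ERROR_TYPE_TO_ERRNO = {"accuracy": 1, "runtime": 2, "compile": 3}
ERRNO_TO_ERROR_TYPE = {errno: name for name, errno in ERROR_TYPE_TO_ERRNO.items()}


def get_errno_from_error_type(error_type: str) -> int:
    """Map an error type string to its integer errno."""
    return ERROR_TYPE_TO_ERRNO[error_type]


def get_error_type_from_errno(errno: int) -> str:
    """Reverse mapping, for when a human-readable label is needed."""
    return ERRNO_TO_ERROR_TYPE[errno]


def build_errno2count(error_types: list[str]) -> dict[int, int]:
    """Count failures per errno and return them sorted by errno."""
    counts = Counter(get_errno_from_error_type(e) for e in error_types)
    return dict(sorted(counts.items()))


errno2count = build_errno2count(["compile", "accuracy", "compile"])
print(errno2count)
```

Integer keys sort and compare against a tolerance t directly, which string labels cannot do without an extra lookup.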
* refactor: split tolerance report generation
* refactor: improve naming and semantics for ES calculation
- Rename verify_es_match_at_tolerance to compare_aggregated_es_and_microscopic_es
- Replace tolerance_level with tolerance parameter
- Replace tolerance_threshold with atol/rtol to avoid confusion
- Rename verify_aggregated_microscopic_consistency to get_verified_aggregated_es_values
- Change return type to dict only (remove all_matched)
- Rename verified_scores to verified_es_values
- Replace micro with microscopic throughout
- Rename check_sample_correctness to get_sample_correctness
- Rename t1 variables to first_errno_tolerance
- Rename es_components to es_constructor_params
- Rename calculate_parameters_for_tolerance to calculate_es_constructor_params_for_tolerance
- Rename custom_map to errno_tolerance_overrides
- Rename errno_as_tolerances to errno2tolerance
- Add enable_aggregation_mode command line option
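The atol/rtol comparison and the dict-only return value might be sketched as below. The helper names are hypothetical, the comparison uses Python's math.isclose (whose semantics differ slightly from NumPy's allclose), and dropping mismatched tolerances from the result is an assumption about how failures are handled.

```python
import math


def es_values_match(aggregated_es, microscopic_es, atol=1e-6, rtol=0.0):
    """True if the two ES values agree within absolute or relative tolerance."""
    return math.isclose(aggregated_es, microscopic_es, rel_tol=rtol, abs_tol=atol)


def collect_verified_es_values(aggregated, microscopic, atol=1e-6, rtol=0.0):
    """Return {tolerance: ES} only for tolerances where both views agree."""
    return {
        t: es
        for t, es in aggregated.items()
        if t in microscopic and es_values_match(es, microscopic[t], atol, rtol)
    }


# Illustrative values only: tolerance 1 disagrees and is left out.
aggregated = {0: 0.5, 1: 0.75}
microscopic = {0: 0.5 + 1e-17, 1: 0.80}
verified_es_values = collect_verified_es_values(aggregated, microscopic)
print(verified_es_values)
```

Naming the thresholds atol and rtol keeps them clearly separate from the error tolerance t used in ES(t).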
* feat: add aggregated ES(t) plotting and verification
- Modified plot_ES_results to return fig, ax, all_x_coords for external plotting
- Added manual plotting of aggregated ES(t) curves in main function
- Both microscopic and aggregated curves are plotted on the same graph
- Aggregated curves use dashed lines with square markers for distinction
- All verification checks pass; microscopic and aggregated values differ only at floating-point precision (1.39e-17)
* fix: move ax.legend outside aggregation condition block
- Move ax.legend() outside the aggregation mode condition block
- Ensure legend is always displayed regardless of aggregation mode
- Fix issue where legend was missing when aggregation mode is disabled
1 parent 5133cb7 commit d698d66
File tree
4 files changed in graph_net
+1047 -6 lines changed