
Conversation


@LouisTsai-Csie LouisTsai-Csie commented Jul 24, 2025

🗒️ Description

As EIP-7825 is introduced in the Fusaka upgrade, most of the legacy test cases would fail. This PR adds two test wrappers, benchmark_test and benchmark_state_test, to replace the plain blockchain_test and state_test test types.

🔗 Related Issues or PRs

Issue #1896

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Considered adding an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry to the list.
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

@LouisTsai-Csie LouisTsai-Csie self-assigned this Jul 24, 2025
@LouisTsai-Csie LouisTsai-Csie force-pushed the benchmark-test-type branch 2 times, most recently from 641036c to af00ec2 Compare August 8, 2025 10:07
@LouisTsai-Csie LouisTsai-Csie marked this pull request as ready for review August 11, 2025 09:52
@LouisTsai-Csie

There are some issues in generating the fixture. Comparing the newly created fixture against the original, its size is much larger. This should not happen: the content should be the same, and therefore the size should be the same. But this is not a big problem for now.

The major issue now is resolving the failing test in CI, which I currently cannot reproduce locally.

@LouisTsai-Csie LouisTsai-Csie marked this pull request as draft August 14, 2025 16:30

CPerezz commented Aug 29, 2025

This can come in handy for benchmark tests, since they basically force the consumption of all the available gas, and that condition forces us to implement padding techniques to consume EXACTLY all the gas available in a block.

In reality, for a benchmark, we don't care about this at all.
PRs affected:

@LouisTsai-Csie

@CPerezz I think this is still necessary for the Nethermind team (increasing the gas limit) and the zkEVM teams (proving the entire block)? For gas-limit testing, I am not sure they can run just 1 tx and then derive the entire block execution time from it.


CPerezz commented Aug 30, 2025

> @CPerezz I think this is still necessary for Nethermind team (Increasing gas limit) and zkEVM team (proving the entire block)? For gas limit testing, I am not sure if they can only run 1 tx and then derive the entire block execution time from it

But you can emit a warning if needed. Why does it need to be a failure not to spend ALL the gas exactly? I agree it has to be within a bound, sure. But precision to the unit is really different, especially when you have to account for memory expansion and other costs. It's almost impossible not to need padding.

I'm not advocating removing this completely, but maybe relaxing it. Or at least, it would be useful to know why it needs to fail specifically. When and why was this introduced?

@LouisTsai-Csie

@CPerezz Thank you for the explanation, it is very clear! I will review the included features again and discuss with the team.

As you can see, this is still a draft and we welcome any feedback. We also want to know what the stateless client teams need for benchmarking: what are your considerations when benchmarking?


CPerezz commented Sep 1, 2025

@LouisTsai-Csie I'm just speaking with regard to the "State bottlenecks" project, which is within the stateless-consensus team. Our goal is to measure how different client implementations behave under heavy load and with different state sizes, among other things.

For that, we need these kinds of benchmarks. But it turns out to be quite tricky to match the gas spent perfectly, and it's not required at all. 1% of wiggle room is enough to consider the benchmark useful even if it doesn't spend all the gas of the block.
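The relaxed check suggested here could look something like the following sketch; the function name, signature, and default tolerance are illustrative assumptions, not the repo's actual API:

```python
# Hypothetical sketch of a relaxed gas check: instead of requiring
# gas_used == gas_limit exactly, accept any block that leaves at most
# `tolerance` (e.g. 1%) of its gas unused.
def check_benchmark_gas(gas_used: int, gas_limit: int, tolerance: float = 0.01) -> None:
    """Fail only if the block left more than `tolerance` of its gas unused."""
    unused = gas_limit - gas_used
    if unused < 0:
        raise ValueError("gas_used exceeds gas_limit")
    if unused > gas_limit * tolerance:
        raise AssertionError(
            f"benchmark used {gas_used}/{gas_limit} gas; "
            f"{unused} unused exceeds the {tolerance:.0%} tolerance"
        )
```

This keeps the "within a bound" guarantee while dropping the unit-precision padding requirement.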

Comment on lines +35 to +38
    pre: Alloc
    post: Alloc
    tx: Optional[Transaction] = None
    blocks: Optional[List[Block]] = None
Member

Re #2112, I think we could perhaps have setup_tx and setup_blocks, which contain transactions that are specifically part of the benchmark setup.

The main problem I see is that we currently do pre.fund_eoa both for (1) accounts that send these setup transactions and (2) accounts that send the actual benchmarking workload transactions, and the two are indistinguishable at the moment.

One option could be to add a field to pre.fund_eoa that indicates whether the account is meant to send setup transactions or workload transactions, so we can fund this transaction only in the setup phase of execute:

setup_account = pre.fund_eoa(account_type="setup")

Downside being that the test writer needs to be cognizant of this and properly label all accounts.
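As a toy illustration of the labeling idea, something like the following could work; the Alloc stub, the address scheme, and the helper method are all hypothetical, not the framework's real types:

```python
from dataclasses import dataclass, field

# Toy sketch: `fund_eoa` records whether an account is meant to send
# setup or workload transactions, so the setup phase of `execute` can
# pick out just the setup senders.
@dataclass
class Alloc:
    account_types: dict[str, str] = field(default_factory=dict)
    _counter: int = 0

    def fund_eoa(self, account_type: str = "workload") -> str:
        """Create a funded EOA address labeled with its phase."""
        address = f"0x{self._counter:040x}"
        self._counter += 1
        self.account_types[address] = account_type
        return address

    def accounts_of_type(self, account_type: str) -> list[str]:
        """Return all accounts labeled with the given phase."""
        return [a for a, t in self.account_types.items() if t == account_type]
```

Defaulting account_type to "workload" would at least limit the labeling burden to setup accounts only.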

@fselmo Sep 9, 2025

Just spitballing here, but what if we had context managers manage each phase for benchmark tests?

@pytest.mark.benchmark
def test_some_benchmark(benchmark, pre, blockchain_test):
    with benchmark.setup():  #  Auto-tagged as setup
        setup_contract = pre.deploy_contract(...)
        contract_under_test = pre.deploy_contract(code=..., storage=..., stub="...")
        setup_acct = pre.fund_eoa()

        setup_block = Block(txs=[
            Transaction(...),
            Transaction(...),
        ])

    with benchmark.execution():  #  Auto-tagged as execution
        acct1 = pre.fund_eoa()

        # for execute remote this is the seed / private key sender?
        execution_block = Block(txs=[
            Transaction(...),
        ])
        
    blockchain_test(...)

One possible way I've used this in the past is tracking certain contexts with ContextVar. This can be reset with every test and could be used in a try / finally sort of block. A downside (but maybe a plus?) is that you also have to be explicit about each phase, and this may not always work out to be so deterministic 🤔. These are things that would have to be determined anyway, though, with any sort of phase management.
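A minimal sketch of that ContextVar-based phase tracking might look like this; the Benchmark class, the phase names, and the tag method are illustrative assumptions, not anything that exists in the repo:

```python
from contextlib import contextmanager
from contextvars import ContextVar

# The current phase lives in a ContextVar; outside any `with` block it
# falls back to "execution" (the suggested default context).
_phase: ContextVar[str] = ContextVar("benchmark_phase", default="execution")

class Benchmark:
    def __init__(self) -> None:
        self.tagged: dict[str, list] = {"setup": [], "execution": []}

    @contextmanager
    def _phase_ctx(self, name: str):
        token = _phase.set(name)
        try:
            yield self
        finally:
            _phase.reset(token)  # reset so each test starts clean

    def setup(self):
        return self._phase_ctx("setup")

    def execution(self):
        return self._phase_ctx("execution")

    def tag(self, item) -> None:
        # Anything recorded inside a `with` block is auto-tagged with
        # whatever phase is currently active.
        self.tagged[_phase.get()].append(item)
```

The try / finally inside the context manager gives the per-test reset behavior mentioned above.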

Member

This would be a very nice solution. If we could make it so that the default context is execution (or workload perhaps?) I think that would be great.

@marioevz left a comment

After going through the current implementation and thinking about it, I think this PR is mostly on the right track.

My suggestions would be:

  • We have a single new spec, benchmark_tests, that receives setup_txs and workload_txs, or a generator.
  • We have multiple generator subclasses, all of which subclass BenchmarkCodeGenerator and implement generate_setup_txs and generate_workload_txs (and perhaps deploy_contracts).
  • Internally, benchmark_tests takes setup_txs (or calls generator.generate_setup_txs()) and, if any, generates a first setup block; it then takes workload_txs (or calls generator.generate_workload_txs()) and puts them in a different block.
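The three bullets above could be sketched roughly as follows; transactions are stubbed as plain strings and benchmark_tests is reduced to returning per-block transaction lists, so everything beyond the names in the comment is an assumption:

```python
from abc import ABC, abstractmethod

class BenchmarkCodeGenerator(ABC):
    """Base class for generators that produce setup and workload txs."""

    @abstractmethod
    def generate_setup_txs(self) -> list[str]:
        """Transactions that go into a dedicated first setup block."""

    @abstractmethod
    def generate_workload_txs(self) -> list[str]:
        """Transactions that go into the measured workload block."""

def benchmark_tests(generator: BenchmarkCodeGenerator) -> list[list[str]]:
    """Emit a setup block only if there are setup txs, then the workload block."""
    blocks: list[list[str]] = []
    setup_txs = generator.generate_setup_txs()
    if setup_txs:
        blocks.append(setup_txs)
    blocks.append(generator.generate_workload_txs())
    return blocks
```

Keeping setup and workload in separate blocks is what makes the benchmark phase cleanly measurable on its own.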

Member

I'm leaning more towards removing benchmark_state and leaving only benchmark, because the state format feels heavily constrained by the transaction gas-limit cap; introducing two different formats is simply more work, and it's also confusing for test writers, who would have to know which one to use each time.

Comment on lines +11 to +23
class BenchmarkCodeGenerator(ABC):
    """Abstract base class for generating benchmark bytecode."""

    def __init__(
        self,
        fork: Fork,
        attack_block: Bytecode,
        setup: Optional[Bytecode] = None,
    ):
        """Initialize with fork, attack block, and optional setup bytecode."""
        self.fork = fork
        self.setup = setup or Bytecode()
        self.attack_block = attack_block
Member
If we decide to stick with this kind of abstract class, we can refactor it to be a dataclass.
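The suggested dataclass refactor might look like this sketch; Fork and Bytecode are stubbed here so the snippet stands alone, whereas in the repo they would come from the framework's own types:

```python
from dataclasses import dataclass, field

# Stubs standing in for the framework's real Fork and Bytecode types.
class Fork:
    pass

class Bytecode:
    pass

@dataclass
class BenchmarkCodeGenerator:
    """Generates benchmark bytecode from an attack block and optional setup."""
    fork: Fork
    attack_block: Bytecode
    setup: Bytecode = field(default_factory=Bytecode)
```

default_factory replaces the `setup or Bytecode()` fallback in __init__, which is the main boilerplate the dataclass version removes.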
