
Commit 6b6923e

Add draft EIP: Contract Bytecode Deduplication Discount
This proposal introduces a gas discount for contract deployments when the bytecode being deployed already exists in the state. The mechanism extends EIP-2930 access lists with an optional checkCodeHash flag to enable deterministic deduplication checks without breaking consensus.

Key features:
- Access-list based deduplication via checkCodeHash flag
- Avoids GAS_CODE_DEPOSIT * L costs for duplicate deployments
- Solves database divergence issues across different sync modes
- Becomes particularly relevant with EIP-8037's increased gas costs

This EIP is extracted from the original EIP-8037 proposal to allow independent review and adoption.
1 parent 46c6c4a commit 6b6923e

1 file changed: +226 -0 lines changed

EIPS/eip-draft.md

Lines changed: 226 additions & 0 deletions
@@ -0,0 +1,226 @@
1+
---
2+
eip: draft
3+
title: Contract Bytecode Deduplication Discount
4+
description: Reduces gas costs for deploying duplicate contract bytecode via an access-list-based mechanism
5+
author: Carlos Perez (@CPerezz)
6+
discussions-to: https://ethereum-magicians.org/t/eip-8037-state-creation-gas-cost-increase/25694
7+
status: Draft
8+
type: Standards Track
9+
category: Core
10+
created: 2025-10-22
11+
requires: 2930
12+
---
13+
14+
## Abstract
15+
16+
This proposal introduces a gas discount for contract deployments when the bytecode being deployed already exists in the state. By extending EIP-2930 access lists with an optional `checkCodeHash` flag, transactions can signal which existing contract addresses should be checked for bytecode duplication. When a match is found, the deployment avoids paying `GAS_CODE_DEPOSIT * L` costs since clients already store the bytecode and only need to link the new account to the existing code hash.
17+
18+
This EIP becomes particularly relevant with the adoption of EIP-8037, which increases `GAS_CODE_DEPOSIT` from 200 to 1,900 gas per byte. Under EIP-8037, deploying a 24kB contract would cost approximately 46.6M gas for code deposit alone, making the deduplication discount economically significant for applications that deploy identical bytecode multiple times.
19+
20+
## Motivation
21+
22+
Currently, deploying duplicate bytecode costs the same as deploying new bytecode, even though Ethereum clients don't store duplicated code in their databases. When the same bytecode is deployed multiple times, clients store only one copy and have multiple accounts point to the same code hash. Under EIP-8037's proposed gas costs, deploying a 24kB contract costs approximately 46.6M gas for code deposit alone (`1,900 × 24,576`). This charge is unfair for duplicate deployments where no additional storage is consumed.
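
As a rough check of the figures above, the code-deposit charge is the per-byte cost times the bytecode length. The sketch below assumes that a 24kB contract means 24,576 bytes and uses the two per-byte values quoted in this document. The helper name `code_deposit_gas` is illustrative and not part of any specification.

```python
# Illustrative arithmetic only; code_deposit_gas is a hypothetical helper name.
GAS_CODE_DEPOSIT_CURRENT = 200     # gas per byte today
GAS_CODE_DEPOSIT_EIP_8037 = 1_900  # gas per byte proposed by EIP-8037
CODE_LEN = 24 * 1024               # 24,576 bytes


def code_deposit_gas(gas_per_byte: int, code_len: int) -> int:
    """Code-deposit charge: GAS_CODE_DEPOSIT * L."""
    return gas_per_byte * code_len


current = code_deposit_gas(GAS_CODE_DEPOSIT_CURRENT, CODE_LEN)
proposed = code_deposit_gas(GAS_CODE_DEPOSIT_EIP_8037, CODE_LEN)
print(f"current:  {current:,} gas")   # 4,915,200
print(f"proposed: {proposed:,} gas")  # 46,694,400
```

A duplicate deployment that qualifies for the discount would avoid this entire term, which is why the saving grows with the per-byte cost.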
23+
24+
A naive "check if code exists in database" approach would break consensus because different nodes have different database contents, mostly because of sync-mode differences:
25+
- Full-sync nodes: Retain all historical code, including from reverted/reorged transactions
26+
- Snap-sync nodes: Only store code reachable from the current state trie
27+
28+
Empirical analysis reveals that approximately 27,869 bytecodes existed in full-synced node databases with no live account pointing to them (as of the Cancun fork). A database lookup `CodeExists(hash)` would yield different results on different nodes, causing different gas costs and breaking consensus.
29+
30+
This proposal solves the problem by making deduplication checks explicit and deterministic through access lists, ensuring all nodes compute identical gas costs regardless of their database state. Note that even if full-sync clients store more code, no account's `codeHash` references that extra code, so it can never enter the access-list check and users cannot obtain a discount from it. This keeps consensus safe.
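
The divergence can be illustrated with a toy model. In the sketch below, each node's code database is modelled as a plain set of hash strings, and both function names are hypothetical. It is a simplified illustration under those assumptions, not client code.

```python
# Toy model: each node's code database is a set of code hashes (hypothetical).
GAS_CODE_DEPOSIT = 200  # gas per byte

# The full-sync node still holds an orphaned code hash that no live account references.
full_sync_db = {"0xaaa", "0xorphan"}
snap_sync_db = {"0xaaa"}


def naive_deposit_gas(code_db: set, code_hash: str, code_len: int) -> int:
    """Naive rule: waive the deposit if the local database already has the code."""
    return 0 if code_hash in code_db else GAS_CODE_DEPOSIT * code_len


def access_list_deposit_gas(w: set, code_hash: str, code_len: int) -> int:
    """Proposed rule: waive the deposit only if the hash is in W (derived from state)."""
    return 0 if code_hash in w else GAS_CODE_DEPOSIT * code_len


# Naive rule: the two nodes disagree on the gas charged for the same deployment.
naive_full = naive_deposit_gas(full_sync_db, "0xorphan", 100)
naive_snap = naive_deposit_gas(snap_sync_db, "0xorphan", 100)

# Proposed rule: W comes from accounts in state, so it is identical on both nodes.
# No live account has the orphaned hash, so it can never enter W.
w = {"0xaaa"}
proposed_gas = access_list_deposit_gas(w, "0xorphan", 100)

print(naive_full, naive_snap, proposed_gas)  # 0 20000 20000
```

Under the naive rule the two nodes would charge different gas for the same transaction, whereas the access-list rule depends only on data every node shares.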
31+
32+
## Specification
33+
34+
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 and RFC 8174.
35+
36+
### Access List Extension
37+
38+
EIP-2930 access list tuples are extended with an optional `checkCodeHash` boolean field:
39+
40+
```json
{
  "address": "0x...",
  "storageKeys": ["0x..."],
  "checkCodeHash": true
}
```
47+
48+
### Consensus Semantics
49+
50+
- The `checkCodeHash` field is OPTIONAL. If omitted, it defaults to `false`.
51+
- Transactions with `checkCodeHash` fields are valid both pre-fork and post-fork.
52+
- Pre-fork: Nodes MUST ignore the `checkCodeHash` field and MUST NOT grant deduplication discounts.
53+
- Post-fork: Nodes MUST process the `checkCodeHash` field as specified below.
54+
55+
### CodeHash Access-Set Construction
56+
57+
Before transaction execution begins, build a set `W` (the "CodeHash Access-Set") as follows:
58+
59+
```
W = { codeHash(a) | a ∈ accessList, a.checkCodeHash = true, a exists in state, a has code }
```
62+
63+
Where:
64+
- `W` is built from state at the **start** of transaction execution (before any state changes)
65+
- Only addresses that **already exist** in the state contribute to `W`
66+
- Only addresses that **have deployed code** (non-empty code) contribute to `W`
67+
- Empty accounts or accounts with no code do NOT contribute their code hash to `W`
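
The construction rules above can be sketched as a small runnable function. The `Account` and `AccessListEntry` types, the `build_w` name, and the use of a dictionary for state are all assumptions made for illustration. Real clients would read accounts from the state trie.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Account:
    code: bytes     # empty for accounts without code
    code_hash: str  # stored codeHash field (hex string in this toy model)


@dataclass
class AccessListEntry:
    address: str
    storage_keys: List[str]
    check_code_hash: bool = False  # OPTIONAL flag, defaults to false


def build_w(access_list: List[AccessListEntry], state: Dict[str, Account]) -> Set[str]:
    """Build W from the state snapshot taken before execution starts."""
    w: Set[str] = set()
    for entry in access_list:
        if not entry.check_code_hash:
            continue
        account = state.get(entry.address)  # None: address not in state
        if account is not None and len(account.code) > 0:
            w.add(account.code_hash)
    return w


state = {
    "0xA": Account(code=b"\x60\x00", code_hash="0xhashA"),
    "0xB": Account(code=b"", code_hash="0xempty"),  # exists but has no code
}
access_list = [
    AccessListEntry("0xA", [], check_code_hash=True),  # contributes
    AccessListEntry("0xB", [], check_code_hash=True),  # no code: skipped
    AccessListEntry("0xC", [], check_code_hash=True),  # not in state: skipped
    AccessListEntry("0xA", ["0x01"]),                  # flag omitted: skipped
]
w = build_w(access_list, state)
print(w)  # {'0xhashA'}
```

Only the first entry contributes, because it is the only one that is flagged, present in state, and has non-empty code.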
68+
69+
### Contract Creation Gas Accounting
70+
71+
When a contract creation transaction or opcode (`CREATE`/`CREATE2`) successfully completes and returns bytecode `B` of length `L`, compute `H = keccak256(B)` and apply the following gas charges:
72+
73+
**Deduplication check:**
74+
- If `H ∈ W`: Bytecode is a duplicate
  - Do NOT charge `GAS_CODE_DEPOSIT * L`
  - Link the new account's `codeHash` to the existing code hash `H`
  - The bytecode `B` is NOT persisted again (it is already stored, which matches current client behaviour)
- If `H ∉ W`: Bytecode is new
  - Charge `GAS_CODE_DEPOSIT * L`
  - Persist bytecode `B` under hash `H`
  - Link the new account's `codeHash` to `H`
82+
83+
**Gas costs:**
84+
- The cost of reading `codeHash` for access-listed addresses is already covered by EIP-2929/2930 access costs (intrinsic access-list cost and cold→warm state access charges).
85+
- No additional gas cost is introduced for the deduplication check itself.
86+
87+
### Implementation Pseudocode
88+
89+
```python
# Before transaction execution: build the CodeHash Access-Set W.
W = set()
for entry in tx.access_list:
    warm(entry.address)  # per EIP-2930/EIP-2929 rules
    if entry.checkCodeHash:
        acc = load_account(entry.address)  # None if the account does not exist
        if acc is not None and len(acc.code) > 0:
            W.add(acc.codeHash)

# On successful CREATE/CREATE2 returning bytecode B:
H = keccak256(B)
if H in W:
    # Duplicate: no deposit gas, B is already stored under H
    link_codehash(new_account, H)
else:
    # New bytecode: charge and persist
    charge(GAS_CODE_DEPOSIT * len(B))
    persist_code(H, B)
    link_codehash(new_account, H)
```
110+
111+
## Rationale
112+
113+
### Why Access-List Based Deduplication?
114+
115+
The access-list approach provides several critical properties:
116+
117+
1. Deterministic behavior:
118+
The result depends only on the transaction's access list and current state, not on local database contents. All nodes compute the same gas cost.
119+
120+
2. No reverse index requirement:
121+
Unlike other approaches, this doesn't require maintaining a `codeHash → [accounts]` reverse index, which would add significant complexity and storage overhead.
122+
123+
3. Leverages existing infrastructure:
124+
Builds on EIP-2930 access lists and EIP-2929 access costs, requiring minimal protocol changes.
125+
126+
4. Explicit opt-in:
127+
Transactions must explicitly indicate which addresses to check. This prevents unexpected behavior and gives users/wallets control over gas optimization.
128+
129+
5. Forward compatibility:
130+
Pre-fork nodes ignore `checkCodeHash` and never grant discounts. Post-fork, all nodes enforce identical behavior. Wallets can optionally add the field to optimize gas, but its absence doesn't invalidate transactions.
131+
132+
6. Avoids a code root in state:
133+
At this point, clients handle code storage in their own ways. They have no consensus on which deployed codes exist (other than that every code referenced by an account's `codeHash` field exists).
134+
Changing this seems a lot more complex and unnecessary.
135+
136+
### Same-Block Deployments
137+
138+
Sequential transaction execution ensures that a deployment storing new code makes it visible to later transactions in the same block:
139+
140+
1. Transaction `T_A` deploys bytecode `B` at address `X`:
   - Pays full `GAS_CODE_DEPOSIT * L` (no prior contract has this bytecode)
   - Code is stored under hash `H = keccak256(B)`

2. Later transaction `T_B` in the same block deploys the same bytecode `B`:
   - `T_B` includes address `X` in its access list with `checkCodeHash: true`
   - When `T_B` executes, `W` is built from the current state (including `T_A`'s changes)
   - Since `X` now exists, `W` contains `H`
   - `T_B`'s deployment gets the discount
149+
150+
> This section only formalizes the behaviour. Exploiting it is hard in practice because it requires control over transaction ordering, and builders cannot modify the access list since it is signed as part of the transaction. Nevertheless, it can happen, so it is specified here.
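
The same-block sequence can be simulated with a toy state. In the sketch below, `deploy` and `code_hash` are hypothetical helpers, state is a plain dictionary from address to code, and `hashlib.sha3_256` is only a stand-in hash: Ethereum uses Keccak-256, which produces different digests.

```python
import hashlib
from typing import Dict, List

GAS_CODE_DEPOSIT = 200  # gas per byte


def code_hash(code: bytes) -> str:
    # ASSUMPTION: stand-in hash. Ethereum uses Keccak-256, which differs from
    # hashlib.sha3_256; any collision-resistant hash works for this toy model.
    return hashlib.sha3_256(code).hexdigest()


def deploy(state: Dict[str, bytes], new_address: str, code: bytes,
           dedup_targets: List[str]) -> int:
    """Run one deployment transaction and return the code-deposit gas charged.

    state maps address -> deployed code; dedup_targets are the access-list
    addresses flagged with checkCodeHash: true.
    """
    # W is built from state at the start of this transaction.
    w = {code_hash(state[a]) for a in dedup_targets if state.get(a)}
    h = code_hash(code)
    charged = 0 if h in w else GAS_CODE_DEPOSIT * len(code)
    state[new_address] = code  # link the new account to its code
    return charged


state: Dict[str, bytes] = {}
bytecode = b"\x60\x80\x60\x40\x52"

# T_A: first deployment at X, nothing to deduplicate against.
gas_a = deploy(state, "X", bytecode, dedup_targets=[])
# T_B: later in the same block, lists X with checkCodeHash: true.
gas_b = deploy(state, "Y", bytecode, dedup_targets=["X"])
# T_C: same bytecode but no access-list entry, so it pays in full.
gas_c = deploy(state, "Z", bytecode, dedup_targets=[])

print(gas_a, gas_b, gas_c)  # 1000 0 1000
```

The third deployment shows that the discount is strictly opt-in: without a flagged access-list entry, identical bytecode is still charged the full deposit.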
151+
152+
### Edge Case: Simultaneous New Deployments
153+
154+
If two transactions in the same block both deploy identical new bytecode and neither references an existing contract with that bytecode in their access lists, both will pay full `GAS_CODE_DEPOSIT * L`. This is acceptable because:
155+
156+
- The first deployment cannot be known at transaction construction time
157+
- This scenario is extremely rare in practice
158+
- The complexity of special handling is not worth the minimal benefit
159+
160+
## Backwards Compatibility
161+
162+
This proposal requires a scheduled network upgrade but is designed to be forward-compatible with existing transactions.
163+
164+
**Transaction compatibility:**
165+
- Transactions with `checkCodeHash` fields are syntactically valid both pre-fork and post-fork
166+
- Pre-fork: The field is ignored; all deployments pay full costs
167+
- Post-fork: The field determines deduplication behavior
168+
169+
**Wallet and tooling updates:**
170+
- RPC methods like `eth_estimateGas` MUST account for potential deduplication discounts
171+
- Wallets SHOULD provide UI for users to specify deduplication targets
172+
- Transaction builders MAY automatically detect duplicate deployments and add appropriate access list entries
173+
174+
**Node implementation:**
175+
- Clients MUST ignore `checkCodeHash` pre-fork
176+
- Clients MUST enforce deduplication semantics post-fork
177+
- No changes to state trie structure or database schema are required
178+
179+
### Example Transaction
180+
181+
Deploying a contract with the same bytecode as the contract at `0x1234...5678`:
182+
183+
```json
{
  "from": "0xabcd...ef00",
  "to": null,
  "data": "0x608060405234801561001...",
  "accessList": [
    {
      "address": "0x1234567890123456789012345678901234567890",
      "storageKeys": [],
      "checkCodeHash": true
    }
  ]
}
```
197+
198+
If the deployed bytecode hash matches `codeHash(0x1234...5678)`, the deployment receives the deduplication discount.
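
A transaction builder might attach the flag along the following lines. This is a minimal sketch: `with_dedup_target` is a hypothetical helper, the transaction is modelled as a plain dictionary matching the JSON shape above, and no real wallet or RPC library API is implied.

```python
import copy
import json
from typing import Any, Dict


def with_dedup_target(tx: Dict[str, Any], address: str) -> Dict[str, Any]:
    """Return a copy of tx whose access list flags address with checkCodeHash."""
    new_tx = copy.deepcopy(tx)
    access_list = new_tx.setdefault("accessList", [])
    for entry in access_list:
        if entry["address"].lower() == address.lower():
            entry["checkCodeHash"] = True  # reuse the existing entry
            return new_tx
    access_list.append({"address": address, "storageKeys": [], "checkCodeHash": True})
    return new_tx


# Placeholder values copied from the example transaction above.
tx = {"from": "0xabcd...ef00", "to": None, "data": "0x608060405234801561001..."}
target = "0x1234567890123456789012345678901234567890"
print(json.dumps(with_dedup_target(tx, target), indent=2))
```

Reusing an existing entry for the same address avoids listing the address twice when the transaction already warms it for other reasons.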
199+
200+
## Security Considerations
201+
202+
### Gas Cost Accuracy
203+
204+
The deduplication mechanism ensures that gas costs accurately reflect actual resource consumption. Duplicate deployments don't consume additional storage, so they shouldn't pay storage costs.
205+
206+
### Denial of Service
207+
208+
The access-list mechanism prevents DoS attacks because:
209+
- The cost of reading `codeHash` is already covered by EIP-2929/2930
210+
- No additional state lookups or database queries are required
211+
- The deduplication check is O(1) (set membership test)
212+
213+
### Access List Size
214+
215+
Large access lists with many `checkCodeHash: true` entries could increase transaction size, but:
216+
- Access lists are already priced per entry under EIP-2930 (2,400 gas per address and 1,900 gas per storage key)
217+
- The `checkCodeHash` field adds minimal bytes
218+
- Users have economic incentive to only include necessary entries
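
The economic incentive can be made concrete with a break-even estimate. The sketch below assumes the only extra cost of a deduplication entry is the EIP-2930 per-address charge of 2,400 gas, and that the address would not otherwise be in the access list. It ignores any encoding overhead of the flag itself, and the function names are illustrative.

```python
ACCESS_LIST_ADDRESS_COST = 2_400  # EIP-2930 intrinsic gas per access-list address


def net_saving(code_len: int, gas_code_deposit: int) -> int:
    """Deposit gas avoided minus the cost of one extra access-list address."""
    return gas_code_deposit * code_len - ACCESS_LIST_ADDRESS_COST


def break_even_len(gas_code_deposit: int) -> int:
    """Smallest bytecode length (in bytes) with a strictly positive net saving."""
    return ACCESS_LIST_ADDRESS_COST // gas_code_deposit + 1


for per_byte in (200, 1_900):
    print(per_byte, break_even_len(per_byte), net_saving(24_576, per_byte))
# 200 13 4912800
# 1900 2 46692000
```

Under these assumptions an entry pays for itself for almost any real contract, while entries that do not match the deployed bytecode are pure cost.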
219+
220+
### State Divergence
221+
222+
The explicit access-list approach prevents state divergence issues that would arise from implicit database lookups. All nodes compute identical gas costs regardless of their sync mode or database contents.
223+
224+
## Copyright
225+
226+
Copyright and related rights waived via [CC0](../LICENSE.md).
