Conversation

georgeee
Member

@georgeee georgeee commented Oct 7, 2025

Rewrite HF test core logic in Go.

Explain how you tested your changes:

  • Successfully executed new HF test

Checklist:

  • Dependency versions are unchanged
    • Notify Velocity team if dependencies must change in CI
  • Modified the current draft of release notes with details on what is completed or incomplete within this project
  • Document code purpose, how to use it
    • Mention expected invariants, implicit constraints
  • Tests were added for the new behavior
    • Document test purpose, significance of failures
    • Test names should reflect their purpose
  • All tests pass (CI will check this if you didn't)
  • Serialized types are in stable-versioned modules
  • Does this close issues? None

@georgeee georgeee force-pushed the georgeee/hf-test-go branch 2 times, most recently from 69c09d7 to ec89fba Compare October 9, 2025 20:26
@georgeee georgeee force-pushed the georgeee/hf-test-go branch from ec89fba to 9bd5c76 Compare October 9, 2025 20:32
echo "Creates a quick-epoch-turnaround configuration in localnet/ and launches two Mina nodes" >&2
echo "Usage: $0 [-m|--mina $MINA_EXE] [-i|--tx-interval $TX_INTERVAL] [-d|--delay-min $DELAY_MIN] [-s|--slot $SLOT] [--develop] [-c|--config ./config.json] [--slot-tx-end 100] [--slot-chain-end 130] [--genesis-ledger-dir ./genesis]" >&2
echo "Consider reading script's code for information on optional arguments" >&2
usage() {
Member Author

These changes were introduced to make the script's output nicer; they are not directly related to the rewrite to Go.

"${NODE_ARGS_1[@]}" \
--block-producer-key "$PWD/$CONF_DIR/bp" \
--config-directory "$PWD/localnet/runtime_1" \
--run-snark-worker "$(cat $CONF_DIR/bp.pub)" --work-selection seq \
Member Author

This fixes a pre-existing glitch; it is not directly related to the rewrite to Go.

@georgeee georgeee changed the title from "Hardfork test rewrite prototype" to "Rewrite hardfork test to Go" Oct 9, 2025
@georgeee georgeee marked this pull request as ready for review October 9, 2025 20:34
@georgeee georgeee requested review from a team as code owners October 9, 2025 20:34
@georgeee
Member Author

georgeee commented Oct 9, 2025

!ci-build-me

@georgeee
Member Author

georgeee commented Oct 9, 2025

!ci-nightly-me

@georgeee
Member Author

georgeee commented Oct 9, 2025

!ci-bypass-changelog

@glyh
Member

glyh commented Oct 13, 2025

That's a lot of code

@glyh
Member

glyh commented Oct 13, 2025

@glyh
Member

glyh commented Oct 13, 2025

It seems there's an issue:

    [ "Sys_error", "/etc/localtime: No such file or directory" ],

we should probably fix this inside our hosted nix image gcr.io/o1labs-192920/nix-unstable:1.0.0?

@glyh
Member

glyh commented Oct 15, 2025

Some ideas:

  • We should use git worktree so that we don't waste time pulling the same repo again.

EDIT: implemented
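
For illustration only, the worktree idea could look roughly like this in Go. This is a sketch, not the code in this PR: `worktreeAddCmd`, the directory names, and the branch name are hypothetical, and it only builds the standard `git worktree add --detach <path> <commit-ish>` command without running it.

```go
package main

import (
	"fmt"
	"os/exec"
)

// worktreeAddCmd builds (but does not run) a git command that checks out ref
// into destDir as an additional worktree of the repository at repoDir, so the
// objects already present in repoDir are reused instead of being cloned again.
// The function name and its parameters are hypothetical.
func worktreeAddCmd(repoDir, destDir, ref string) *exec.Cmd {
	return exec.Command("git", "-C", repoDir, "worktree", "add", "--detach", destDir, ref)
}

func main() {
	// Illustrative paths and branch name only.
	cmd := worktreeAddCmd(".", "../mina-compatible", "origin/compatible")
	fmt.Println(cmd.Args)
}
```

Calling `cmd.Run()` would create the worktree; `git worktree remove <path>` cleans it up afterwards.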

@cjjdespres
Member

It seems there's an issue:

    [ "Sys_error", "/etc/localtime: No such file or directory" ],

we should probably fix this inside our hosted nix image gcr.io/o1labs-192920/nix-unstable:1.0.0?

I think so? I noticed this exchange on Slack:

I see the following unhandled exception at the end:
"sexp":["monitor.ml.Error",["unknown zone",["zone",":/etc/localtime"]]
I think libp2p helper exiting is not the root cause.
I saw this error previously on NixOS-based images when tzdata wasn't installed.

and the response:

I also noticed that localtime error, but I was unsure if that was actually the fatal error. Nevertheless, I already had checked with them if all was ok with /etc/localtime and it seemed so. But in the meantime, they tried setting the environment variable TZ, which seemed to solve the issue. Thank you!

So setting TZ=UTC in the test environment or installing tzdata in the image might work.
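
A minimal sketch of the `TZ=UTC` option, assuming the test harness launches its child processes through `os/exec`. The helper name `withDefaultTZ` and the command shown are hypothetical; whether this actually avoids the `/etc/localtime` error on the CI image is untested here.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withDefaultTZ returns a copy of env with TZ set to tz, unless env already
// contains a TZ entry, in which case the existing value is kept.
func withDefaultTZ(env []string, tz string) []string {
	out := append([]string(nil), env...)
	for _, kv := range out {
		if strings.HasPrefix(kv, "TZ=") {
			return out
		}
	}
	return append(out, "TZ="+tz)
}

func main() {
	// Hypothetical child process; it is only configured here, not started.
	cmd := exec.Command("./run-localnet.sh")
	cmd.Env = withDefaultTZ(os.Environ(), "UTC")

	fmt.Println(withDefaultTZ([]string{"PATH=/usr/bin"}, "UTC"))
}
```

Keeping an existing `TZ` value means a developer's local setting is not silently overridden.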

Member

@cjjdespres cjjdespres left a comment

That's one test failure that I thought I'd point out now. I haven't gone through much of the code yet.

I'm unsure why there were tzdata exceptions in the nightly run, because the logs have this in them:

+ [[ 1 -gt 0 ]]
+ ln -sf /usr/share/zoneinfo/UTC /etc/localtime
+ chown -R root /workdir
+ git config --global --add safe.directory /workdir
+ git fetch

so I think the build-and-test.sh script is already supposed to correct for the lack of /etc/localtime. Maybe the agents don't always have /usr/share/zoneinfo/UTC?

The regular CI failed in the dev unit tests like this:

This run has ID `99BF9CDD-C904-49A0-981F-F1DD407607A5`.
[OK]                Root          0   closing stable root, reload as converting.
[OK]                Root          1   moving a root.
[FAIL]              Root          2   make checkpointing a root.
-- Root.002 [make checkpointing a root.] Failed --
in `/workdir/_build/default/src/lib/mina_ledger/test/_build/_tests/99BF9CDD-C904-49A0-981F-F1DD407607A5/root.002.output`:
ASSERT make checkpointing a root timed out after 2s
[exception] (monitor.ml.Error
  ("Alcotest__Core.Check_error(\"Error make checkpointing a root timed out after 2s.\")")

which is really bizarre, but is very likely unrelated to this PR. It's possible that there was something slow in IO?

echo "Running HF test with SLOT_TX_END=$SLOT_TX_END"
"$SCRIPT_DIR"/test.sh compatible-devnet{/bin/mina,-genesis/bin/runtime_genesis_ledger} fork-devnet{/bin/mina,-genesis/bin/runtime_genesis_ledger} && echo "HF test completed successfully"
Member

The "Bash: shellcheck" CI job is failing like this:

In ./scripts/hardfork/build-and-test.sh line 53:
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
^--------^ SC2034 (warning): SCRIPT_DIR appears unused. Verify use (or export if used externally).

because this change removes the one use of SCRIPT_DIR from this file.

Member

For what it's worth, I ran ./build-and-test.sh from the scripts/hardfork/ directory itself and got this at the end, after everything had been built:

Error: failed to start main network: fork/exec /home/despresc/src/mina-reviews/scripts/hardfork/scripts/hardfork/run-localnet.sh: no such file or directory
failed to start main network: fork/exec /home/despresc/src/mina-reviews/scripts/hardfork/scripts/hardfork/run-localnet.sh: no such file or directory

So it might be good to keep using SCRIPT_DIR.
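
On the Go side, the same failure mode can be avoided by resolving helper scripts against an explicit script directory instead of the current working directory. The sketch below assumes that directory is handed to the Go binary somehow (for example from `SCRIPT_DIR`); `scriptPath` and the example path are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// scriptPath resolves a helper script name against scriptDir and returns an
// absolute path, so the result does not depend on where the test is launched.
func scriptPath(scriptDir, name string) (string, error) {
	abs, err := filepath.Abs(scriptDir)
	if err != nil {
		return "", fmt.Errorf("resolving script directory %q: %w", scriptDir, err)
	}
	return filepath.Join(abs, name), nil
}

func main() {
	// Illustrative directory; in practice it would come from the caller.
	p, err := scriptPath("/home/user/mina/scripts/hardfork", "run-localnet.sh")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p)
}
```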

Member

Retriggering dev unit test.

@cjjdespres
Member

I tried running it locally - I think my laptop might not be able to handle this test:

Sent tx #15
2025-10-17 17:11:53 UTC [Error] Internally generated block $state_hash cannot be rebroadcast because it's not a valid time to do so ("1 slots too late")
  state_hash: "3NK9CzEuiLFR2uCeFrd29uSJ4H6911JdFMqfrp6w7i8d1UaxqJfg"
2025-10-17 17:11:54 UTC [Error] VRF was evaluated at (epoch, slot) (0 20) but the corresponding block was produced at a time corresponding to (0 21). This means that generating the block took more time than expected.
  
2025-10-17 17:11:54 UTC [Error] Validation error: external transition with state hash $state_hash was rejected for reason "invalid time"
  state_hash: "3NK9CzEuiLFR2uCeFrd29uSJ4H6911JdFMqfrp6w7i8d1UaxqJfg"

with everything that gets spawned.

@glyh
Member

glyh commented Oct 18, 2025

I tried running the original bash test on my laptop and it succeeded once. Given that we're using the same laptop model, it's interesting that the rewrite would run into resource limits.
@cjjdespres

It does take some time to run, though.

Member

@glyh glyh left a comment

I reviewed the top level of this test. I feel there's much more to review, if we want to ensure 100% correctness.

errorLogger: log.New(os.Stderr, "ERROR: ", log.LstdFlags),
debugLogger: log.New(os.Stdout, "DEBUG: ", log.LstdFlags),
isDebug: isDebug,
}
Member

For forward compatibility, could we generate JSON logs so they're parsable by downstream consumers?
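
One way to get JSON logs without a new dependency is the standard library's `log/slog` package (Go 1.21 or later). This is a sketch under that assumption; `newJSONLogger` and `demoOutput` are hypothetical names and the log messages are illustrative.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log/slog"
)

// newJSONLogger returns a logger that writes one JSON object per record to w.
// Debug records are emitted only when isDebug is true.
func newJSONLogger(w io.Writer, isDebug bool) *slog.Logger {
	level := slog.LevelInfo
	if isDebug {
		level = slog.LevelDebug
	}
	return slog.New(slog.NewJSONHandler(w, &slog.HandlerOptions{Level: level}))
}

// demoOutput logs one debug and one error record into memory and returns the
// resulting text, which makes the effect of isDebug easy to see.
func demoOutput(isDebug bool) string {
	var buf bytes.Buffer
	logger := newJSONLogger(&buf, isDebug)
	logger.Debug("querying best chain", "port", 10303)
	logger.Error("failed to start main network", "attempt", 1)
	return buf.String()
}

func main() {
	fmt.Print(demoOutput(true))
}
```

Each record comes out as a single line with `time`, `level`, and `msg` keys plus the attributes passed by the caller, so a downstream consumer can parse the stream line by line.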


var blocks []BlockData

result.Get("data.bestChain").ForEach(func(_, value gjson.Result) bool {
Member

This seems very verbose. I think it would be better to use a third-party library that handles the parsing for us.

Member

Could we use gqlgen? It has a client-side API that could serve our purpose. Ideally, we would reuse the generated GraphQL schema in this repo.

https://github.com/99designs/gqlgen/tree/master/_examples/todo

return 0, 0, err
}

blockHeight := result.Get("data.bestChain.0.protocolState.consensusState.blockHeight").Int()
Member

As an example, we're manually dealing with int conversion here, and I think the error message won't be pretty here.
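
As a rough alternative that needs only the standard library, the response can be decoded into a typed struct so that each failure produces an explicit error. The field names below follow the gjson path quoted above; the struct, the function name `latestBlockHeight`, and the assumption that `blockHeight` may arrive either as a JSON number or as a quoted number are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bestChainResponse mirrors only the fields needed here. json.Number accepts
// both a JSON number and a quoted number, so either encoding decodes cleanly.
type bestChainResponse struct {
	Data struct {
		BestChain []struct {
			ProtocolState struct {
				ConsensusState struct {
					BlockHeight json.Number `json:"blockHeight"`
				} `json:"consensusState"`
			} `json:"protocolState"`
		} `json:"bestChain"`
	} `json:"data"`
}

// latestBlockHeight returns the block height of the first bestChain entry and
// reports a descriptive error for malformed, empty, or non-integer input.
func latestBlockHeight(body []byte) (int64, error) {
	var resp bestChainResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, fmt.Errorf("decoding bestChain response: %w", err)
	}
	if len(resp.Data.BestChain) == 0 {
		return 0, fmt.Errorf("bestChain is empty")
	}
	raw := resp.Data.BestChain[0].ProtocolState.ConsensusState.BlockHeight
	height, err := raw.Int64()
	if err != nil {
		return 0, fmt.Errorf("blockHeight %q is not an integer: %w", raw.String(), err)
	}
	return height, nil
}

func main() {
	body := []byte(`{"data":{"bestChain":[{"protocolState":{"consensusState":{"blockHeight":"42"}}}]}}`)
	height, err := latestBlockHeight(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(height)
}
```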

@@ -0,0 +1 @@
{"go.mod":"b4592235afa6583ad9fa5ace6072a5b9733165d1580bfb47954fd154da9fe6ee","go.sum":"a66a7ea9363eecf0990f36729b257548c44b114ce588890dccdf633b123974d3","vendorSha256":"sha256-2Qy9V5NOKli0bjnhOykBYShk6+TYqIJshOXdWk+9O4c="} No newline at end of file
Member

I'd rather have this Nix-specific hash pinning go into another PR. It doesn't seem closely related to the test itself.

Is this a Nix restriction?

mainGenesisTimestamp := config.FormatTimestamp(mainGenesisTs)

// Prepare run-localnet.sh command
cmd := exec.Command(
Member

I'm not a fan of this being a mix of Go and Bash. I assume we're migrating gradually?

if err := t.ValidateForkConfigData(analysis.LatestNonEmptyBlock, forkConfigBytes); err != nil {
return err
}
{
Member

Not sure why there's a scope here.


// ExtractForkConfig extracts the fork configuration from the network
func (t *HardforkTest) ExtractForkConfig(port int, forkConfigPath string) ([]byte, error) {
for attempt := 1; attempt <= t.Config.ForkConfigMaxRetries; attempt++ {
Member

I think this retry mechanism is a bit problematic. For some errors, no retry happens at all.
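
A generic helper along these lines would make the retry policy uniform. This is a sketch, not the PR's implementation: `retry`, `errNotReady`, and the choice to treat every error as retryable are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotReady is a hypothetical sentinel used by the demo below.
var errNotReady = errors.New("fork config not ready")

// retry calls fn up to maxAttempts times, sleeping delay between attempts.
// Every error is treated as retryable; the last error is returned wrapped.
func retry(maxAttempts int, delay time.Duration, fn func(attempt int) error) error {
	if maxAttempts < 1 {
		return errors.New("maxAttempts must be at least 1")
	}
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if lastErr = fn(attempt); lastErr == nil {
			return nil
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	err := retry(3, 10*time.Millisecond, func(attempt int) error {
		if attempt < 3 {
			return errNotReady // simulated transient failure
		}
		return nil
	})
	fmt.Println("result:", err)
}
```

If some errors really should abort immediately, the callback could wrap them in a dedicated type that `retry` checks with `errors.As`; that variant is left out here.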

@@ -0,0 +1,193 @@
package hardfork
Member

This JSON validation is very verbose. We might want to consider simplifying it.

}()

// Sleep until fork genesis
t.Logger.Info("Sleeping for %d minutes until fork genesis...", t.Config.ForkDelay)
Member

This line is confusing: it doesn't seem to correspond to anything, and it's not in the original script. I suggest deleting it.

}

// Wait until best chain query time
t.WaitUntilBestChainQuery(t.Config.MainSlot, t.Config.MainDelay)
Member

In the original script, this sleep comes before the query. There might be a bug here.

sleep $((FORK_DELAY*60))s

earliest_str=""
while [[ "$earliest_str" == "" ]] || [[ "$earliest_str" == "," ]]; do
  earliest_str=$(get_height_and_slot_of_earliest 10303 2>/dev/null)
  sleep "$FORK_SLOT"s
done

Member

BTW, in that script we're assuming fork genesis happens right after the local net is started.

Member

Okay, this is pretty confusing because there's a sleep $((FORK_SLOT*10))s after the above script. I would suggest that everything operate on timestamps rather than durations.
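
To make the timestamp suggestion concrete: instead of chaining relative sleeps, each wait can be expressed as "sleep until slot N starts", computed from the genesis timestamp. The names `slotStart`, `sleepUntil`, and `exampleGenesis`, and the 30-second slot length, are hypothetical values chosen for the sketch.

```go
package main

import (
	"fmt"
	"time"
)

// exampleGenesis is an illustrative genesis timestamp, not one from the test.
var exampleGenesis = time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)

// slotStart returns the wall-clock time at which the given slot begins.
func slotStart(genesis time.Time, slot int, slotDuration time.Duration) time.Time {
	return genesis.Add(time.Duration(slot) * slotDuration)
}

// sleepUntil blocks until deadline and returns immediately if it has passed.
func sleepUntil(deadline time.Time) {
	if d := time.Until(deadline); d > 0 {
		time.Sleep(d)
	}
}

func main() {
	target := slotStart(exampleGenesis, 30, 30*time.Second)
	fmt.Println(target.Format(time.RFC3339))
	sleepUntil(target) // already in the past here, so this returns at once
}
```

Because every wait is anchored to the same genesis timestamp, time spent on earlier steps no longer shifts later ones, which is the drift that stacked `sleep` calls are prone to.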

@glyh
Member

glyh commented Oct 20, 2025

Running this test locally.

3 participants