@nfrisby nfrisby commented Jul 21, 2025

This PR adds Linear Leios to the Haskell simulator.

@nfrisby nfrisby force-pushed the nfrisby/issue-458-linear-leios branch from 7920039 to 8b1b477 Compare July 21, 2025 19:49

nfrisby commented Jul 31, 2025

Brian ran a comparison with the Rust simulator: https://github.com/input-output-hk/ouroboros-leios/blob/bwbush/2025w31b/analysis/sims/2025w31b/analysis.ipynb

I do not have an explanation for the ~9 second delay before a generated vote begins diffusing. That needs to be figured out.

Brian found it explicitly in the log:

$ zcat sim.log.gz \
  | grep -E 'VTBundle(Generated|Sent|Received)' \
  | jq -r 'select(.message.id == "77--1") | (.time_s | tostring) + "\t" + .message.type + "\t" + .message.sender + "\t" + .message.recipient' \
  | head

56.000163       VTBundleGenerated
63.130592       VTBundleSent    node-79 node-20
63.164923       VTBundleReceived                node-20
63.193455       VTBundleSent    node-79 node-21
63.213447       VTBundleSent    node-79 node-71
63.234073       VTBundleReceived                node-21
63.256065       VTBundleReceived                node-71
63.354646       VTBundleSent    node-20 node-9
63.449198       VTBundleReceived                node-9
64.165217       VTBundleSent    node-20 node-62

bwbush commented Jul 31, 2025

The EB validation takes five seconds--or am I reading this log wrong? That delays the time from vote generation to the start of diffusion. Wouldn't the EB be validated before it is voted upon?

{"message":{"id":"77--1","pipeline":0,"producer":"node-79","size_bytes":105,"slot":53,"type":"VTBundleGenerated","votes":{"77-0":3}},"time_s":56.000163}
{"message":{"id":"77-0","recipient":"node-20","type":"EBReceived"},"time_s":63.096319}
{"message":{"cpu_time_s":5.049999,"id":"77-0","node":"node-20","task_type":"ValEB","type":"Cpu"},"time_s":63.096319}
{"message":{"id":"77--1","msg_size_bytes":138,"recipient":"node-20","sender":"node-79","sending_s":1.35e-4,"type":"VTBundleSent"},"time_s":63.130592}

nfrisby commented Jul 31, 2025

In the example I'm looking at (EB 97-0, VB 97--1), I suspect that the EB takes 9-10 seconds to arrive at a direct neighbor, node 72, and head-of-line blocking on the channel prevents the VB from arriving before that.

The EB is generated by node 97 at 493.130 and it arrives at node 72 (a direct neighbor, I think) at 503.000. The VB is generated by node 97 at 496.000 and arrives at 503.038. And that blockage would then replicate at every hop. So I think #453 plausibly explains the delay we're seeing.


The "sent" events' timestamps are also delayed for this same reason, which might be surprising. I'm not sure, but I strongly suspect the "sent" events are only emitted when the Relay mini protocol actually sends the body. The mini protocol, though, exchanges a couple messages before sending the body, and those messages are also blocked behind the EB in the channel.

nfrisby commented Jul 31, 2025

@bwbush In the example you gave, the vote arises before the EB is validated because of the exceptional case of the voting node being the one that generated the EB.

$ zcat sim.log.gz | grep -e node-79 | grep -e '77-0'
{"message":{"endorser_blocks":[],"id":"77-0","input_blocks":[],"pipeline":0,"producer":"node-79","size_bytes":10000304,"slot":53,"type":"EBGenerated"},"time_s":53.13}
{"message":{"id":"77--1","pipeline":0,"producer":"node-79","size_bytes":105,"slot":53,"type":"VTBundleGenerated","votes":{"77-0":3}},"time_s":56.000163}

I don't know why "node-79" has a numeric id of 77 🤷

whereas "node-97" has a numeric id of 97 😵‍💫

$ zcat sim.log.gz | grep -e node-97 | grep -e '97-0'
{"message":{"endorser_blocks":[],"id":"97-0","input_blocks":[],"pipeline":0,"producer":"node-97","size_bytes":10000304,"slot":493,"type":"EBGenerated"},"time_s":493.129999}
{"message":{"id":"97--1","pipeline":0,"producer":"node-97","size_bytes":105,"slot":493,"type":"VTBundleGenerated","votes":{"97-0":5}},"time_s":496.000163}

nfrisby commented Aug 8, 2025

I pushed up the "Messages are now interleaved when multiplex-mini-protocols: true" commit, and it seems to have addressed the unexpected 9 second delay. We're now seeing a 3 second delay, which is due to implementing the spec's $3\Delta_{hdr}$ moratorium on voting (if I'm interpreting the coordinates of the plot correctly).

https://github.com/input-output-hk/ouroboros-leios/blob/main/analysis/sims/2025w32b/analysis.ipynb
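
The 3 second figure also looks consistent with the earlier log excerpts in this thread, where each vote bundle is generated almost exactly 3 seconds after its EB's slot number. The Python check below is a rough sketch: it assumes 1-second slots with slot `n` starting at `time_s == n`, and that $\Delta_{hdr}$ is 1 second in this configuration. The thread states neither value.

```python
# (EB slot, VTBundleGenerated time_s) pairs copied from the log excerpts above.
observations = [(53, 56.000163), (493, 496.000163)]

SLOT_LENGTH_S = 1.0    # assumption: one-second slots, slot n starts at t = n
ASSUMED_DELTA_HDR_S = 1.0  # assumption: value of Delta_hdr is not given in the thread

moratorium_s = 3 * ASSUMED_DELTA_HDR_S
gaps = [t - slot * SLOT_LENGTH_S for slot, t in observations]

for (slot, t), gap in zip(observations, gaps):
    print(f"slot {slot}: vote generated {gap:.6f} s after slot onset "
          f"(moratorium {moratorium_s:.1f} s)")
```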

Thanks Brian for running the confirmation sim so quickly.

Edit: I ran 20 seeds with and without the new mux. Looks like the run times increased from a baseline of 100% with the old mux to about 110-120% with the new mux. I anticipate I could win a chunk of that back; the lowest-hanging fruit is probably to special-case the behavior when only a single mini protocol has a non-empty send buffer.
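
For illustration, the interleaving idea and the proposed special case can be sketched as follows. This Python model is not the simulator's Haskell implementation. The round-robin policy, the fixed chunk size, and the names `mux_schedule`, `buffers`, and `chunk_bytes` are all assumptions made for the example.

```python
from collections import deque

def mux_schedule(buffers, chunk_bytes):
    """Order in which bytes leave a multiplexed channel.

    buffers: dict mapping mini-protocol name -> bytes waiting in its send buffer.
    Returns a list of (protocol, bytes_sent) in transmission order.
    """
    pending = {p: n for p, n in buffers.items() if n > 0}
    order = deque(pending)  # round-robin queue of protocols with data
    schedule = []
    while order:
        if len(order) == 1:
            # Fast path: only one non-empty buffer, so there is nothing to
            # interleave with; drain it in one go instead of chunk by chunk.
            p = order.popleft()
            schedule.append((p, pending[p]))
            break
        p = order.popleft()
        n = min(chunk_bytes, pending[p])
        schedule.append((p, n))
        pending[p] -= n
        if pending[p] > 0:
            order.append(p)  # still has data: back of the queue
    return schedule

# A large EB body and a small vote bundle share the channel.
print(mux_schedule({"eb": 10, "vote": 3}, chunk_bytes=4))
```

With interleaving, the small vote bundle goes out after the EB's first chunk rather than after the whole EB. Once the vote buffer is empty, the fast path sends the rest of the EB without further per-chunk scheduling, which is where the run-time savings would be expected to come from.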

@nfrisby nfrisby force-pushed the nfrisby/issue-458-linear-leios branch 4 times, most recently from 4b4f823 to 5f4d68d Compare August 11, 2025 23:33
@nfrisby nfrisby marked this pull request as ready for review August 12, 2025 00:02
- Add and respect eb-body-avg-size-bytes config file parameter.
- Prune VTs 30 seconds after their EB's slot onset instead of as Short Leios would.
- Fixup bug in submitLinearEB that wasn't adopting it if the certificate arrived first.
- Include rb_ref in GenEB output for Linear EBs.
- Partially fill in `error` stub for size calculation of certificates.
- Avoid pipeline calculation when logging VBs in the shared format.
@nfrisby nfrisby force-pushed the nfrisby/issue-458-linear-leios branch from 5f4d68d to ce6f1a7 Compare August 12, 2025 00:07

nfrisby commented Aug 12, 2025

I'm merging this, since it's a good milestone, but it's not complete. Known issues:

  • It considers EBs to be large blocks, rather than the draft CIPs' sequence of (separately-fetched) txids.
  • When the Rust simulator is also run in that mode, there are still some discrepancies, at least with CPU validation times.
  • This implementation is out of date with the (freshly updated) spec:
    • It enforces L_vote+L_diff on including an EB cert in an issued RB, which the latest spec does not require.
    • It does not enforce L_recover in any way (which is probably fine for the typical state).

@nfrisby nfrisby merged commit 45cbd12 into main Aug 12, 2025
12 of 13 checks passed
@nfrisby nfrisby deleted the nfrisby/issue-458-linear-leios branch August 12, 2025 16:05