Draft specification of new mini protocols for Linear Leios #484

Open: wants to merge 3 commits into base: main

Conversation

Collaborator

@nfrisby nfrisby commented Aug 5, 2025

Closes #470

This document specifies a new set of mini protocols that can implement Linear Leios.

@nfrisby
Collaborator Author

nfrisby commented Aug 5, 2025

I think the biggest remaining gap in this document is what exactly honest nodes should serve to downstream peers via each of these mini protocols. That's not discussed at all, basically---it's somewhat obvious given an understanding of Linear Leios, but should be explicit.

The document already says what must not be sent, but doesn't say what should be.

@nfrisby
Collaborator Author

nfrisby commented Aug 5, 2025

Once I get a first set of reviews from the Leios Team, I'll ping the Networking Team for their thoughts.

Member

@ch1bo ch1bo left a comment

First round of comments and no time left to dig deeper today.

I like the level of detail and thought you have put into this! However, I don't think I agree with the composition of the mini protocols. While there is an appeal in picking them apart this much and seeing which primitives we actually need, it would require more coordination across servers/clients to maintain a certain quality of service (of which there seems to be plenty of need in Leios). By this I mean any kind of prioritization or interactions like that ominous "freshest first".

Personally, I would love to see the responsibilities of a Leios node be used more in finding requirements on the network protocols, rather than starting with what code we could re-use or what to generalize (as it was done in past write-ups here and here).

On the other hand, there is plenty of design space left after that and we could take inspiration from existing node-to-node protocols to make leios-specific protocols consistent in complexity and number of clients/servers that cardano node implementors need to add. (if praos requires mini protocols numbers 2, 3 and 4 .. leios should not require 7-15 on top?)

Thanks for writing this up!

# Introduction

This document proposes new mini protocols necessary for Linear Leios.
It takes an understanding of Linear Leios for granted; it does not define the structure and semantics of RB, EB, vote, etc.
Member

I would love to see this contributed to the CIP draft

# Light schema

The Light mini protocol pulls whichever payload the upstream peer wants to send next.
These payloads should be small and/or rare, so that having every upstream peer send all of them is tolerable.
Member

This reminds me of gossiping and I wonder what does the pull-based nature actually bring here if we fetch it from all peers anyways? As the client would identify protocol violations if too much / the wrong data is sent anyways, this very much sounds like a push-based pub/sub protocol.

Collaborator Author

Yeah, Light is essentially a way for peers to push identifiers of big things they're offering (which could then be pulled via Heavy). It's still technically pull, in that the client is able to stop requesting new identifiers.

But, in general, the expectation is that a Light client would send tens of MsgRequestNextLight before any response comes (ie pipeline the mini protocol).
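The pipelining described here can be sketched as a toy model in Python. Only `MsgRequestNextLight` comes from the thread; the `LightClient` class, its method names, and the window of 20 outstanding requests are hypothetical choices for illustration, not the ouroboros-network API.

```python
from collections import deque


class LightClient:
    """Toy model of a pipelined Light client (hypothetical, for illustration)."""

    def __init__(self, window=20):
        self.window = window      # max outstanding MsgRequestNextLight (assumed value)
        self.outstanding = 0      # requests sent but not yet answered
        self.stopped = False      # once True, no further requests are sent
        self.received = deque()   # payloads delivered so far

    def requests_to_send(self):
        """Top the pipeline back up; return how many new requests to send now."""
        if self.stopped:
            return 0
        n = self.window - self.outstanding
        self.outstanding += n
        return n

    def on_reply(self, payload):
        """The upstream peer answered one outstanding request."""
        self.outstanding -= 1
        self.received.append(payload)

    def stop(self):
        """The 'pull' part: the client simply stops asking for more."""
        self.stopped = True


client = LightClient(window=20)
first_batch = client.requests_to_send()  # fills the whole window up front
client.on_reply("header-A")
refill = client.requests_to_send()       # one slot was freed, so one new request
client.stop()
client.on_reply("header-B")
after_stop = client.requests_to_send()   # nothing more is requested
```

The upstream peer effectively pushes whenever it has something new, because a request is almost always already waiting; the client keeps control only through its ability to stop refilling the window.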


Remarks.

- EbRelayBody and EbRelayTx are separate mini protocols because an upstream peer should be able to serve new EBs even while it's already serving some txs.
Member

Why would we want to serve transactions of an EB and a body of a different EB at the same time to the same downstream peer?

In fact, this would even make more sense if combined IMO. For example, a client may only ever request transactions for an EB it got relayed already or request a new EB body.

Collaborator Author

@nfrisby nfrisby Aug 6, 2025

> In fact, this would even make more sense if combined IMO. For example, a client may only ever request transactions for an EB it got relayed already or request a new EB body.

I see the temptation to enforce that, but I don't see any benefit to doing so. An honest server is going to offer its EBs to all of its downstream peers as soon as possible anyway, so why bother constraining their behavior based on what we've actually offered to them?

Collaborator Author

@nfrisby nfrisby Aug 6, 2025

> Why would we want to serve transactions of an EB and a body of a different EB at the same time to the same downstream peer?

Suppose two EBs arise within a couple slots. If I have a downstream peer who needs to get the first EB's txs from me (no one else has offered them) but could get the second EB's txs from other peers (eg they're already in mempools), then it seems plausible that they need both some txs from me as well as a second EB from me. And both should arrive as soon as possible for the sake of $\Delta_{EB}$.

Member

> An honest server is going to offer its EBs to all of its downstream peers as soon as possible anyway, so why bother constraining their behavior based on what we've actually offered to them?

I think you misunderstood me. I would not want to limit what we offer. But I would push the burden of deciding what to request from us at a given time (either txs or bodies) onto the downstream client within a single protocol instance. This would allow us to specify per-peer limits more specifically for the responsibility of "EB diffusion", and we'd need to coordinate and balance things less across multiple servers/clients.

Collaborator Author

> I would not want to limit what we offer.

Right, so the downstream peer can soon request everything we have. So why punish them for requesting something slightly earlier than we told them they could?


- EbRelayBody and EbRelayTx are separate mini protocols because an upstream peer should be able to serve new EBs even while it's already serving some txs.
- EbRelayHeader is separate from EbRelayBody and EbRelayTx because of the _freshest first delivery_ rule (FFD).
An upstream peer must be able to offer newer EBs even while sending some older EBs/txs.
Member

I like the name EbRelayHeader and you say above it announces possession of an EB to downstream peers.

  1. If we want to enforce availability of EB bodies and transactions (with the aim of a minimal $\Delta_{EB}$), it would feel more natural to me that the same protocol that announces possession is also the one serving the contents. I'm not 100% on what would be simpler to implement, but it feels like the more local the decision can be made, the better?

  2. I don't understand how FFD comes into play here (in fact, I might not understand FFD). There is a separate protocol already for gossiping EB headers (EbPublicize) and a node would know about EBs that way (the quickest). Now freshest first delivery for EBs would mean that a node should prioritize downloading the freshest EB. I could see that happening with a node finding upstream peers that have that EB (request possession?) and fetching it from them (similar to BlockFetch?). Do we even need EbRelay* if we have the gossip and EbFetch* for EB bodies and txs?

Collaborator Author

@nfrisby nfrisby Aug 6, 2025

> it would feel more natural to me that the same protocol that announces possession is also the one serving the contents

Does it help if you instead think of it as the "super mini protocol" EbRelay, which is cooperatively implemented by EbRelay_Header, EbRelay_Body, and EbRelay_Tx?

Edit: if that's compelling, then there's only four new protocols EbPublicize, EbRelay, VoteRelay, and EbFetch. And EbFetch is just an intentionally-explicit copy of EbRelay with slightly different triggers.

Edit 2: And moreover, all four are self-contained in the way you said feels more natural.

Collaborator Author

@nfrisby nfrisby Aug 6, 2025

> I don't understand how FFD comes into play here

My understanding is merely that FreshestFirstDelivery means that, whenever a node is aware of more than one EB it could allocate resources to fetch, it should allocate those resources to the ones with the greater slots. I don't necessarily know how to consider FFD across multiple objects---but maybe we don't need to, since all other objects are small and diffused by other proposed mini protocols and so can operate concurrently with the EB-diffusing mini protocols.

(Edit: In particular, if I know some fresher EBs exist but none of my peers have offered those EBs to me, I think I should still start fetching the freshest of the EBs that I can already start fetching---just idling until the fresher EBs are actually available seems unwise.)

> There is a separate protocol already for gossiping EB headers (EbPublicize) and a node would know about EBs that way (the quickest). Now freshest first delivery for EBs would mean that a node should prioritize downloading the freshest EB.

  • EbPublicize is not directly related to "FreshestFirst for EBs". EbPublicize is orthogonal to the diffusion of EBs; it's only used to influence voting (you don't vote for an EB if its election has been equivocated).
  • EbRelayHeader is what tells you which EBs (and their txs, in this write-up) are available for downloading. So it's possible that EbRelayHeader has announced multiple EBs are now available. You should download as many as you can. But if you can't immediately download them all, then FreshestFirst specifies that you should prioritize the younger ones.
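As a minimal sketch of the FreshestFirst rule as described in this comment (in Python, for illustration only): among the EBs that peers have already offered, spend the available download budget on those with the greatest slots. The `OfferedEb` type, the byte budget, and the assumption that an EB's size is known before fetching are all hypothetical and not part of the draft specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OfferedEb:
    slot: int        # slot of the EB (greater slot means fresher)
    eb_hash: str     # placeholder identifier
    size_bytes: int  # assumed to be known before fetching


def freshest_first(offered, budget_bytes):
    """Pick the EBs to fetch now, freshest first, within a byte budget."""
    chosen = []
    for eb in sorted(offered, key=lambda e: e.slot, reverse=True):
        if eb.size_bytes <= budget_bytes:
            chosen.append(eb)
            budget_bytes -= eb.size_bytes
    return chosen


offered = [
    OfferedEb(slot=100, eb_hash="eb-old", size_bytes=400_000),
    OfferedEb(slot=104, eb_hash="eb-new", size_bytes=500_000),
    OfferedEb(slot=102, eb_hash="eb-mid", size_bytes=300_000),
]
picked = freshest_first(offered, budget_bytes=900_000)
picked_hashes = [eb.eb_hash for eb in picked]
```

Only EBs that have actually been offered are candidates, which matches the earlier point that a node should not idle while waiting for fresher EBs that no peer has announced yet.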

One of the reasons EbFetch* is specified separately from EbRelay* is because---our working theory is that---it should override FreshestFirst (and as I spelled out in a different thread, it being a different mini protocol means that its messages can't be blocked by unfinished EbRelay work).

As soon as a node realizes it does not already have the EB certified by some RB that it needs to validate, it should be able to instantly request that EB from upstream peers who have already offered that RB via ChainSync.
If EbRelay* and EbFetch* were not separate protocols, then the urgent EbFetch* requests might be blocked behind the coincident EbRelay* requests.
There might be better ways to avoid this risk, but the current method seems suitable for a specification: simple and explicit.
- TODO a client might wisely decompose and rate-limit its EbRelayTx requests so that it can pause them if EbFetchTx suddenly has a significant number of txs to request from the same peer.
Member

I agree that we want to be able to fetch an EB as quickly as possible. But wouldn't this be even more the case if we had one protocol for this purpose?

For example:

  1. A node is currently downloading some EB E1 it thinks it needs
    • this means it fetches the body first (let's say 640kb for 20k tx refs)
    • and then it fetches the missing txs (each up to 16kB) across multiple peers; worst case a couple of megabytes in total
    • in summary, it is busy fetching the EB in many chunks
  2. Now it receives an RB, which certifies an EB E2 it does not have, and selecting that chain makes it very urgent to get E2 (not even sure this could happen, but whatever)
    • the application logic (chain selection?) would put E2 into some priority queue on what to fetch next
    • network clients would start to fetch E2 body and transactions from peers

In this situation, if EbRelay* would be used for 1. and EbFetch* for 2. the traffic from 1. would even compete with each other!

If we use the same protocol for 1. and 2., network clients that get work from that queue would stop downloading of E1 transactions (the body is already downloaded or in flight from 1+ peers as we can't cancel requests).
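The single-queue idea in this scenario can be sketched as follows (Python, for illustration only). The `FetchQueue` class and the work-item names are hypothetical; the rule that certified EBs sort ahead of everything else is taken from the scenario above, while ordering by greater slot within each class is an added assumption borrowed from the freshest-first discussion.

```python
import heapq
import itertools


class FetchQueue:
    """Hypothetical shared work queue for one combined EB-fetching client."""

    def __init__(self):
        self._heap = []
        self._tie = itertools.count()  # keeps insertion order among equal priorities

    def push(self, item, slot, certified):
        # Certified EBs (needed to validate an RB) sort before everything else;
        # within each class, greater slots come first (assumption).
        priority = (0 if certified else 1, -slot)
        heapq.heappush(self._heap, (priority, next(self._tie), item))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = FetchQueue()
queue.push("E1-txs-chunk-1", slot=100, certified=False)
queue.push("E1-txs-chunk-2", slot=100, certified=False)
queue.push("E2-body", slot=99, certified=True)  # urgent: certified by a new RB

order = [queue.pop() for _ in range(len(queue))]
```

Requests already in flight cannot be recalled, so a queue like this only decides what is requested next; it does not by itself address how replies share bandwidth on the wire.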

Collaborator Author

> In this situation, if EbRelay* would be used for 1. and EbFetch* for 2. the traffic from 1. would even compete with each other!

Yes, them competing with each other is exactly what we want! The alternative is that fetching EB2 might not be able to start until (some aspect of) fetching EB1 has finished---ie E2 would be delayed.

We can still truncate/pause the EB1 work as soon as possible. But the parts we cannot cancel should not prevent the EB2 part from starting. They should be able to happen in parallel. With today's ouroboros-network framework's mux, separate mini protocols make that automatic.

Member

@ch1bo ch1bo Aug 6, 2025

With competing I meant: If we don't stop 1., then 2. would be further delayed than it could be. Maybe I'm just worried about interactions across multiple client components?

> the parts we cannot cancel should not prevent the EB2 part from starting

You mentioned pipelining within a protocol above. This would be possible here too, right? i.e. the client that fetches EBs as fast as possible would be able to send the request for EB2 while some tx or body requests are still in flight. Allowing two EB body requests at any given time sounds like a plausible protocol design here and would allow us to always start fetching E2 even when we are currently downloading the body of E1.

I think it's appealing to me if we could pack the whole EB fetching and scheduling complexity into a single protocol client (instead of spreading that responsibility across more "components").

One situation when this would be a really bad idea is when we would have a lot of big requests in 1. that are already pending and we couldn't act on doing 2. because we operate at our limits. However, I don't think this is the case here.

Collaborator Author

(Bookkeeping: Will raised a similar concern in a different thread: #484 (comment))

| VoteRelayId | Light | n/a | triplet of EB-slot, EB-issuer, and vote-issuer along with an optional RB header | the identified valid vote is now available from this upstream peer |
| VoteRelayBody | Heavy | triplet of EB-slot, EB-issuer, and vote-issuer | a vote | one of the votes that VoteRelayId indicated |
| EbFetchBody | Heavy | pair of EB-slot and EB-hash | an EB | one of the EBs that ChainSync indicated |
| EbFetchTx | Heavy | triplet of EB-slot, EB-hash, and sequence of indices | set of txs | some of the txs that EbFetchBody indicated |
Member

I'm worried by the number of mini protocols. I see your point of finding repeating patterns of this Light/Heavy schema, but there seems to be a lot of duplication still? At the same time, this radical separation requires logic to interact across servers/clients which could be kept more local otherwise (see my comments below for example).

Collaborator Author

> but there seems to be a lot of duplication still

Could you elaborate? The only duplication from my perspective is the intentional "redundancy" I called out in one of the remarks.

Collaborator Author

@nfrisby nfrisby Aug 6, 2025

Ah, here's an example you already gave.

> Do we even need EbRelay* if we have the gossip and EbFetch* for EB bodies and txs?

EbRelayBody and EbRelayTx definitely have the same exact shape as EbFetchBody and EbFetchTx. But as I mentioned in the other thread (#484 (comment)), that duplication is intentional redundancy, so that comparatively low-urgency EBs/txs being in-flight don't prevent us from fetching high-urgency EBs/txs from the same peer.

EbPublicize would then ignore equivocation that happens on the actual chain, since the equivocating headers would use hot keys with lower precedence.

Therefore, Praos's operational certificate issue number mechanism should be further constrained, in order to enable a reasonable bound on the EbPublicize traffic.
The proposed new constraint is that the operational certificate issue number cannot be incremented more than once per stability window (ie 36 hr on Cardano today), and EbPublicize would be relaxed as follows.
Member

This is an interesting proposal. It sounds very much related to the key registration / protocol setup of Leios though and only tangentially relevant to the network mini protocol(s). It would be great to define what makes a valid EB opportunity and state there why it's nice that validating it needs only an immutable ledger state (and what consequence this has for op cert increments). Then, in the network protocol definition, it would suffice to say that at most three EB opportunities may be publicized and that validating the response requires an immutable ledger state.

.. only applies once we want to integrate this work into the CIP

Collaborator Author

@nfrisby nfrisby Aug 6, 2025

> It sounds very much related to the key registration / protocol setup of Leios though and only tangentially relevant to the network mini protocol(s)

Yes, that's true. It arose here because I'm arguing here that the mini protocols' resource utilization is bounded, and I have to choose a concrete mechanism to achieve that, and there isn't a suitable one without this proposal.

But if we all agree on it, then it does seem better to (eventually) "upstream" it to the "Linear Leios spec" side of the documentation.

Collaborator Author

@pagio should we chat about this?

@nfrisby
Collaborator Author

nfrisby commented Aug 6, 2025

I've replied to all of Sebastian's comments. Hopefully, there's merely text/ideas in my responses that I should insert into/expand upon appropriate places within the document (and maybe the Linear Leios spec too----and eventually the CIP of course, once we're in agreement).

@nfrisby nfrisby requested review from pagio and will-break-it August 6, 2025 13:26
| New mini protocol | Schema | Identifier | Payload | Payload semantics |
| - | - | - | - | - |
| EbPublicize | Light | n/a | an RB header | the first or second valid RB header this upstream peer has seen with some pair of slot and RB issuer is now available from this upstream peer |
| EbRelayHeader | Light | n/a | an RB header | the EB body and the txs it identifies are now available from this upstream peer |
Collaborator Author

@pagio

My plan here is: the node fetches only the first EB whose header arrives via EbRelayHeader even if a different header for the same EB opportunity already arrived via EbPublicize.

Is that still compliant with your specification? I think it is, because the node is fetching the first EB it could fetch per EB opportunity, but never more than one per EB opportunity.

Contributor

The goal is to fetch the EB for the first header you heard. If the two headers you mention above arrive in less than $3\Delta_{hdr}$ time, then it does not matter, as no certificate will be created.
If this is not the case, then you should fetch the EB corresponding to the header that arrived earlier (as this is the one that may get certified). Does this make sense?

Collaborator Author

@nfrisby nfrisby Aug 8, 2025

We chatted more on Slack. My takeaways:

  • Giorgos (and the spec) says that the node should only request the EB offered via EbRelayHeader if its hash matches the first header to arrive via EbPublicize.
  • The motivation for that rule is the assumption that the first implication is more likely to hold than the second implication. (Which helps achieve the goal that most honest nodes vote for the same EB, in the presence of late equivocation eg.)
    • "I received the header for P first" implies that "most honest stake received the header for P first".
    • "I received the body for Q first" implies that "most honest stake received the body for Q first".

My intuition is stubbornly unresponsive about that assumption 🤷:)
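The rule in the first takeaway can be sketched as follows (Python, for illustration only). The `EbFetchPolicy` class and its method names are hypothetical; an EB opportunity is keyed by the pair of slot and RB issuer, following the EbPublicize description in the table. Declining to fetch when no header has yet arrived via EbPublicize is an assumption of this sketch, since the thread does not settle that case.

```python
class EbFetchPolicy:
    """Hypothetical bookkeeping for the 'first publicized header wins' rule."""

    def __init__(self):
        # (slot, issuer) -> EB hash announced by the first header seen via EbPublicize
        self.first_seen = {}

    def on_publicized_header(self, slot, issuer, eb_hash):
        # Only the first header per EB opportunity is recorded; later
        # (equivocating) headers do not change the choice.
        self.first_seen.setdefault((slot, issuer), eb_hash)

    def should_fetch(self, slot, issuer, eb_hash):
        # Request an EB offered via EbRelayHeader only if its hash matches the
        # first header that arrived via EbPublicize for the same opportunity.
        return self.first_seen.get((slot, issuer)) == eb_hash


policy = EbFetchPolicy()
policy.on_publicized_header(slot=10, issuer="pool1", eb_hash="P")
policy.on_publicized_header(slot=10, issuer="pool1", eb_hash="Q")  # late equivocation

fetch_p = policy.should_fetch(10, "pool1", "P")
fetch_q = policy.should_fetch(10, "pool1", "Q")
fetch_unknown = policy.should_fetch(11, "pool1", "P")
```

Under this rule a node fetches at most one EB per opportunity, and the choice is fixed by header arrival order rather than by which body happens to be offered first.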

Contributor

@will-break-it will-break-it left a comment

Great level of detail, but I share @ch1bo's concern about protocol granularity in two key respects:

(1) Complexity & availability
My concern isn't just about complexity for its own sake, but about prioritization mechanisms across protocol boundaries as @ch1bo has commented previously. With 8 separate mini-protocols handling different aspects of the same workflow (EB announcement → EB fetching → vote collection → certificate creation), there's a risk of suboptimal scheduling.

Example: If EbFetch needs urgent access to fetch a certified EB for chain validation, but the same peer is already serving large payloads via EbRelay, how does the node prioritize? The current design separates these to avoid blocking, but this creates a coordination problem across protocol boundaries, which relates to my second point following.

(2) Future complexity
The granular design may not provide a solid foundation for future protocol versions that will significantly increase message volume. For example, the current 4x header diffusion (EbPublicize, ChainSync, EbRelayHeader, VoteRelayId) is manageable given Linear Leios's limited block opportunities tied to RB rate (~0.05). However, as the CIP proposes when we move to decoupled EB or IB production with potentially 8-10x higher block rates, this redundancy could become a bottleneck. Though, this might be a future concern. But we have a chance to architect the set of protocols with this in mind.

An upstream peer must be able to offer newer EBs even while sending some older EBs/txs.
- EbRelay* and EbFetch* are separate mini protocols because EbFetch* is usually dormant but has utmost urgency when it's active.
As soon as a node realizes it does not already have the EB certified by some RB that it needs to validate, it should be able to instantly request that EB from upstream peers who have already offered that RB via ChainSync.
If EbRelay* and EbFetch* were not separate protocols, then the urgent EbFetch* requests might be blocked behind the coincident EbRelay* requests.
Collaborator Author

@will-break-it wrote (in a top-level comment)

> My concern isn't just about complexity for its own sake, but about prioritization mechanisms across protocol boundaries as [Sebastian] has commented previously. With 8 separate mini-protocols handling different aspects of the same workflow (EB announcement → EB fetching → vote collection → certificate creation), there's a risk of suboptimal scheduling.
>
> Example: If EbFetch needs urgent access to fetch a certified EB for chain validation, but the same peer is already serving large payloads via EbRelay, how does the node prioritize? The current design separates these to avoid blocking, but this creates a coordination problem across protocol boundaries,

  • I feel like this discussion point is going in circles. I wrote the bullet point that this comment thread is attached to in an attempt to highlight that separate mini protocols actually make it possible (via the existing ouroboros-network mux) to avoid suboptimal scheduling. I'll expand it into a more concrete discussion, to try to convey that point, since both you and Sebastian have instead been worried that the separate mini protocols will make it more difficult to avoid suboptimal scheduling.
  • The three Light mini protocols (EbPublicize, EbRelayHeader, and VoteRelayId) are entirely independent and also high priority, so their natural ouroboros-network mux scheduling is already ideal. That leaves five instead of eight which might require coordination.
  • Today's node has five node-to-node mini protocols for Praos. An equal number for Leios doesn't raise alarm bells for me.

Collaborator Author

@nfrisby nfrisby Aug 7, 2025

FYI, as I'm working through the details (we chatted a bit during the Consensus Office Hours), I'm now doubting whether separating EbFetch* and EbRelay* causes the existing mux to provide the benefits I was hoping for.

Imagine that EBs just contain txs (or equivalently that we're fetching EBs for which we don't have any of the txs and their sets don't overlap).

  • Suppose an EbRelayBody server still has X bytes to send of some requested EB.
  • But then an EbFetchBody arrives that means the server needs to send another EB that is Y bytes.

Assuming that Y >= X, regardless of whether the mux is able to interleave the X bytes and the Y bytes, the EbFetchBody reply will arrive after X+Y bytes have been sent, modulo one "slice" (~12 kibibytes). The interleaving would only help if it were biased, which today's mux doesn't do.

However, if the mux were to do biased interleaving, then now timeouts for the lower-priority mini protocol would be naive unless they depend on whether a high-priority mini protocol were simultaneously active.

This is getting more complicated than I had been envisioning. So at this point, maybe consolidating some of the complexity within a combined EbRelayFetch mini protocol might help. I'll think it through now. (Today's mux already gives us all the benefits we'd want for EbRelayHeader merely by separating it---since the analogous X and Y are related by Y << X).
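The X and Y argument above can be checked with a small simulation (Python, for illustration only). It models a mux that sends whole slices on a single connection and reports how many bytes have gone over the wire when the urgent Y-byte reply finishes. The function name, the three scheduling modes, and the example sizes are hypothetical; the 12 KiB slice is the approximate figure quoted in the comment, and nothing here reflects the real ouroboros-network mux implementation.

```python
def bytes_until_urgent_reply_done(x, y, slice_bytes, mode):
    """Total bytes sent on the connection when the Y-byte reply completes.

    x: bytes still owed by the lower-urgency stream (EbRelayBody in the thread)
    y: bytes of the newly requested urgent reply (EbFetchBody in the thread)
    mode: "sequential" (finish X first), "fair" (alternate slices),
          or "biased" (urgent stream always goes first)
    """
    sent = 0
    while y > 0:
        if mode == "sequential":
            turn = ["relay"] if x > 0 else ["fetch"]
        elif mode == "fair":
            turn = ["relay", "fetch"]
        elif mode == "biased":
            turn = ["fetch"]
        else:
            raise ValueError(mode)
        for stream in turn:
            if stream == "relay":
                n = min(slice_bytes, x)
                x -= n
            else:
                n = min(slice_bytes, y)
                y -= n
            sent += n
    return sent


SLICE = 12 * 1024  # roughly one mux slice, per the comment above

sequential = bytes_until_urgent_reply_done(120_000, 240_000, SLICE, "sequential")
fair = bytes_until_urgent_reply_done(120_000, 240_000, SLICE, "fair")
biased = bytes_until_urgent_reply_done(120_000, 240_000, SLICE, "biased")

# The Y << X case (a small header behind a large body), with fair interleaving.
small_reply = bytes_until_urgent_reply_done(1_000_000, 1_000, SLICE, "fair")
```

With Y >= X, the fair and sequential schedules both finish the urgent reply only after X + Y bytes, and only the biased schedule finishes after Y bytes. In the Y << X case, fair interleaving already delivers the small reply within about one slice, which is the situation the comment attributes to EbRelayHeader.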


# Possible Extensions

## Avoid Redundant RB Headers
Collaborator Author

@nfrisby nfrisby Aug 7, 2025

@will-break-it wrote (in a top-level comment)

> The granular design may not provide a solid foundation for future protocol versions that will significantly increase message volume.

Good note, thanks. For now, I'm laser focused on Linear Leios. But at some point we absolutely should look forward.

> For example, the current 4x header diffusion (EbPublicize, ChainSync, EbRelayHeader, VoteRelayId) is manageable given Linear Leios's limited block opportunities tied to RB rate (~0.05). However, as the CIP proposes when we move to decoupled EB or IB production with potentially 8-10x higher block rates, this redundancy could become a bottleneck. Though, this might be a future concern. But we have a chance to architect the set of protocols with this in mind.

For this specific concern, I added the section I'm attaching the GitHub comment to in order to show that we can avoid that redundancy under normal circumstances at least for honest EBs (ie those that don't equivocate).

Development

Successfully merging this pull request may close these issues.

How to diffuse RB headers for equivocation detection, independent of ChainSync etc
4 participants