@martinflorian-da martinflorian-da commented Aug 27, 2025

Fixes #576

image

Fixes #576

[static]

Signed-off-by: Martin Florian <[email protected]>
@martinflorian-da martinflorian-da marked this pull request as ready for review August 27, 2025 16:04
Contributor

@moritzkiefer-da moritzkiefer-da left a comment

Thanks! I think overall it makes sense to document this, but the current documentation sounds too scary imho. We need to make it clear that some waste is expected due to network blips, ordering layer unreliability, restarts, … and that only increases should be something to worry about. Otherwise we will get a bunch of confused users who start worrying about every non-zero value, which isn't helpful for anyone.

--------------

`Wasted traffic` is defined as synchronizer events that have been sequenced but will not be delivered to their recipients.
Wasted traffic is problematic for validators because of traffic fees:
Contributor

Suggested change
- Wasted traffic is problematic for validators because of traffic fees:
+ Wasted traffic can be problematic for validators because of traffic fees:

Contributor

I don't want this to sound too scary.

Contributor Author

reworded more than that


Validator operators are encouraged to investigate failed submissions eagerly to avoid systemic causes for wasted traffic that are due
to their individual configuration and/or the specific applications using their validators.
The Splice distribution contains a :ref:`Grafana dashboard <metrics_grafana_dashboards>` about `Synchronizer Fees (validator view)` that can be helpful in addition to inspecting logs;
Contributor

Are you sure that rejected traffic is the same as wasted traffic? From a quick look at the code I believe it is, but I'm not entirely sure whether it also includes the stuff that already gets rejected before sequencing. If you are not sure, I'd double-check with Thibault.

Contributor Author

tbf I wasn't entirely sure, hence the vague wording, but looking at our own comments I am now pretty sure that this is what this panel is for:

"description": "Shows the traffic cost per second of requests that were sequenced but not delivered successfully by the sequencer. The reason for rejection should be visible in the next graph.",

Validator perspective
+++++++++++++++++++++

Validator operators are encouraged to investigate failed submissions eagerly to avoid systemic causes for wasted traffic that are due
Contributor

I think this sounds too scary. Investigating every single non-zero wasted traffic value is a waste of time. You have BFT sequencer connections + request amplification. You also have an ordering layer that can drop messages (this does not change for dabft). You are gonna waste some amount of traffic. So I think we need to make that explicit and tell people that some amount of it is expected, but if they see a sudden increase, something is probably off.

Contributor Author

> You also have an ordering layer that can drop messages (this does not change for dabft).

Trying to understand how this can lead to wasted traffic.... Once it's sequenced (precondition for counting it as wasted traffic), the ordering layer dropping messages shouldn't matter anymore?

Contributor

Hm, right, a full drop doesn't matter. A delay or replay does, though, and we do observe that occasionally.
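
The upshot of this thread (some wasted traffic is always expected, and only a sudden increase is worth investigating) can be illustrated with a small baseline comparison. This is a minimal sketch, not part of the PR or of Splice: the function name, its parameters, and the idea of feeding it a plain list of rate samples are all assumptions made for illustration, and the thresholds would need tuning against real data.

```python
from statistics import median


def wasted_traffic_spike(samples, recent=5, factor=3.0, floor=1.0):
    """Return True if the recent wasted-traffic rate is well above the baseline.

    Hypothetical helper for illustration only.

    samples: wasted-traffic rates (for example bytes/s), oldest first.
    recent:  number of trailing samples treated as "now".
    factor:  multiple of the baseline that counts as a sudden increase.
    floor:   minimum rate below which nothing is flagged, so that a
             near-zero baseline does not turn every small blip into an alert.
    """
    if len(samples) <= recent:
        return False  # not enough history to establish a baseline
    # Median of the older samples: robust against occasional blips.
    baseline = median(samples[:-recent])
    # Mean of the most recent samples: the rate we are judging.
    current = sum(samples[-recent:]) / recent
    return current > max(baseline * factor, floor)
```

With a steady history such as `[2, 3, 2, 4, 3, 2, 3, 2, 3, 2, 3, 2]` the function returns `False`, since a small non-zero rate is treated as normal. Appending a run of much higher values to the same history makes it return `True`.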

[ci]

Signed-off-by: Martin Florian <[email protected]>
[static]

Signed-off-by: Martin Florian <[email protected]>
Signed-off-by: Martin Florian <[email protected]>
Signed-off-by: Martin Florian <[email protected]>
…2048-improve-validator-dr-docs

[ci]

Signed-off-by: Martin Florian <[email protected]>
[ci]

Signed-off-by: Martin Florian <[email protected]>
[ci]

Signed-off-by: Martin Florian <[email protected]>
…2048-improve-validator-dr-docs

[ci]

Signed-off-by: Martin Florian <[email protected]>
[ci]

Signed-off-by: Martin Florian <[email protected]>
Signed-off-by: Martin Florian <[email protected]>
…2048-improve-validator-dr-docs

Signed-off-by: Martin Florian <[email protected]>
[ci]

Signed-off-by: Martin Florian <[email protected]>
…to martinflorian-da/hls-576-doc-wasted-traffic

Signed-off-by: Martin Florian <[email protected]>
[static]

Signed-off-by: Martin Florian <[email protected]>
Signed-off-by: Martin Florian <[email protected]>
@martinflorian-da
Contributor Author

Excellent points @moritzkiefer-da , thank you! I overhauled it quite a bit now. Can I trouble you for another read? (Also updated the screenshot in the description FYI.)

Contributor

@moritzkiefer-da moritzkiefer-da left a comment

very nice thank you! Thanks for taking the time to address my suggestions

Validator perspective
+++++++++++++++++++++

Validator operators are encouraged to monitor the rate of failed submissions on their validators and investigate the causes of repeatedly failing submissions eagerly.
Contributor

Do we document how they can do so? It feels a bit weird to talk about monitoring failed submissions when we then only tell people how to monitor wasted traffic, right after explaining that not all failed submissions are wasted traffic.

Contributor Author

Hmpf so I was thinking of them monitoring it via what their apps do... Let me just remove the explicit mention of monitoring... not sure that I want to expand this a lot further right now.

Contributor Author

I'm keeping that part

> Validator operators are encouraged to investigate the causes of repeatedly failing submissions.

...because it seems like sometimes stating obvious things is helpful...

The Splice distribution contains a :ref:`Grafana dashboard <metrics_grafana_dashboards>` about `Synchronizer Fees (validator view)`,
to assist in monitoring traffic-related metrics.
The `Rejected Event Traffic` panel on this dashboard is especially relevant for determining the rate of wasted traffic.
(Hover on the ⓘ symbols in panel headers for precise descriptions of the shown data.)
Contributor

impressive unicode skills

Contributor Author

I'm mainly impressed that it rendered as expected. It did!

[static]

Signed-off-by: Martin Florian <[email protected]>
@martinflorian-da martinflorian-da enabled auto-merge (squash) August 29, 2025 09:58
@martinflorian-da martinflorian-da merged commit 86ebed5 into main Aug 29, 2025
40 checks passed
@martinflorian-da martinflorian-da deleted the martinflorian-da/hls-576-doc-wasted-traffic branch August 29, 2025 10:08
hrischuk-da pushed a commit to hrischuk-da/splice that referenced this pull request Aug 29, 2025
Successfully merging this pull request may close these issues.

Document what Wasted traffic cost means; maybe how to monitor and what to do about it
2 participants