Add docs about wasted traffic #2047
Fixes #576 [static] Signed-off-by: Martin Florian <[email protected]>
Thanks! I think overall it makes sense to document this, but the current documentation sounds too scary imho. We need to make it clear that waste is expected due to network blips, ordering layer unreliability, restarts, … and that only increases should be something to worry about. Otherwise we will get a bunch of confused users who start worrying about every non-zero value, which isn't helpful for anyone.
docs/src/deployment/traffic.rst
Outdated
--------------

`Wasted traffic` is defined as synchronizer events that have been sequenced but will not be delivered to their recipients.
Wasted traffic is problematic for validators because of traffic fees:
- Wasted traffic is problematic for validators because of traffic fees:
+ Wasted traffic can be problematic for validators because of traffic fees:
I don't want this to sound too scary.
reworded more than that
docs/src/deployment/traffic.rst
Outdated
Validator operators are encouraged to investigate failed submissions eagerly to avoid systemic causes for wasted traffic that are due
to their individual configuration and/or the specific applications using their validators.
The Splice distribution contains a :ref:`Grafana dashboard <metrics_grafana_dashboards>` about `Synchronizer Fees (validator view)` that can be helpful in addition to inspecting logs;
Are you sure that rejected traffic is the same as wasted traffic? From a quick look at the code I believe it is, but I'm not entirely sure that it doesn't also include requests that get rejected before sequencing. If you are not sure, I'd double-check with Thibault.
tbf I wasn't entirely sure, hence the vague wording, but looking at our own comments I am now pretty sure that this is what this panel is for:
splice/cluster/pulumi/infra/grafana-dashboards/canton-network/synchronizer-fees-validator.json
Line 475 in fee65f5
"description": "Shows the traffic cost per second of requests that were sequenced but not delivered successfully by the sequencer. The reason for rejection should be visible in the next graph.",
docs/src/deployment/traffic.rst
Outdated
Validator perspective
+++++++++++++++++++++

Validator operators are encouraged to investigate failed submissions eagerly to avoid systemic causes for wasted traffic that are due
I think this sounds too scary. Investigating any single non-zero wasted traffic is a waste of time. You have BFT sequencer connections + request amplification. You also have an ordering layer that can drop messages (this does not change for dabft). You are gonna waste some amount of traffic. So I think we need to make that explicit and tell people that some amount of it is expected but if they see a sudden increase something is probably off.
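To make the "some waste is expected, a sudden increase is not" distinction concrete, here is a minimal Python sketch. It assumes you already have a series of wasted-traffic rate samples (for example, read off the dashboard panel). The function name, window size, threshold factor, and floor are all hypothetical illustrations and not part of Splice or its dashboards.

```python
from statistics import median


def sudden_increase(samples, recent_window=5, factor=3.0, floor=1.0):
    """Return True if the recent wasted-traffic rate is well above the baseline.

    samples: rates ordered oldest to newest (hypothetical unit: traffic cost per second).
    All parameters are illustrative assumptions, not recommended values.
    """
    if len(samples) <= recent_window:
        return False  # not enough history to form a baseline
    # Baseline: median of everything before the recent window (robust to single blips).
    baseline = median(samples[:-recent_window])
    # Recent level: plain average over the last few samples.
    recent = sum(samples[-recent_window:]) / recent_window
    # The floor keeps a near-zero baseline from flagging tiny absolute values.
    return recent > max(baseline * factor, floor)


# Occasional small non-zero values: expected, should not be flagged.
steady = [0.0, 2.0, 0.0, 1.0, 0.0, 3.0, 0.0, 1.0, 2.0, 0.0, 1.0, 0.0]
# Same history, but the last five samples jump: worth investigating.
spike = steady[:-5] + [40.0, 55.0, 60.0, 52.0, 48.0]

print(sudden_increase(steady))
print(sudden_increase(spike))
```

The median baseline and the absolute floor are one possible design choice for ignoring isolated blips; any real alerting rule would need thresholds tuned to the validator's own traffic.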
> You also have an ordering layer that can drop messages (this does not change for dabft).
Trying to understand how this can lead to wasted traffic... Once it's sequenced (precondition for counting it as wasted traffic), the ordering layer dropping messages shouldn't matter anymore?
Hm, right, a full drop doesn't matter. A delay or replay does, though, and we do observe that occasionally.
[ci] Signed-off-by: Martin Florian <[email protected]>
[static] Signed-off-by: Martin Florian <[email protected]>
Signed-off-by: Martin Florian <[email protected]>
Signed-off-by: Martin Florian <[email protected]>
…2048-improve-validator-dr-docs [ci] Signed-off-by: Martin Florian <[email protected]>
[ci] Signed-off-by: Martin Florian <[email protected]>
[ci] Signed-off-by: Martin Florian <[email protected]>
…2048-improve-validator-dr-docs [ci] Signed-off-by: Martin Florian <[email protected]>
[ci] Signed-off-by: Martin Florian <[email protected]>
Signed-off-by: Martin Florian <[email protected]>
…2048-improve-validator-dr-docs Signed-off-by: Martin Florian <[email protected]>
[ci] Signed-off-by: Martin Florian <[email protected]>
…to martinflorian-da/hls-576-doc-wasted-traffic Signed-off-by: Martin Florian <[email protected]>
[static] Signed-off-by: Martin Florian <[email protected]>
…576-doc-wasted-traffic Signed-off-by: Martin Florian <[email protected]>
Signed-off-by: Martin Florian <[email protected]>
Excellent points @moritzkiefer-da, thank you! I overhauled it quite a bit now. Can I trouble you for another read? (Also updated the screenshot in the description FYI.)
very nice thank you! Thanks for taking the time to address my suggestions
docs/src/deployment/traffic.rst
Outdated
Validator perspective
+++++++++++++++++++++

Validator operators are encouraged to monitor the rate of failed submissions on their validators and investigate the causes of repeatedly failing submissions eagerly.
Do we document how they can do so? It feels a bit weird to talk about monitoring failed submissions when we only tell people how to monitor wasted traffic, and only after explaining that not all failed submissions are wasted traffic.
Hmpf so I was thinking of them monitoring it via what their apps do... Let me just remove the explicit mention of monitoring... not sure that I want to expand this a lot further right now.
I'm keeping that part
Validator operators are encouraged to investigate the causes of repeatedly failing submissions.
...because it seems like sometimes stating obvious things is helpful...
The Splice distribution contains a :ref:`Grafana dashboard <metrics_grafana_dashboards>` about `Synchronizer Fees (validator view)`,
to assist in monitoring traffic-related metrics.
The `Rejected Event Traffic` panel on this dashboard is especially relevant for determining the rate of wasted traffic.
(Hover on the ⓘ symbols in panel headers for precise descriptions of the shown data.)
impressive unicode skills
I'm mainly impressed that it rendered as expected. It did!
[static] Signed-off-by: Martin Florian <[email protected]>
Fixes hyperledger-labs#576 Signed-off-by: Martin Florian <[email protected]> Signed-off-by: hrischuk-da <[email protected]>
Fixes #576