go/worker/storage: Refactor state sync worker pt1 #6306
✅ Deploy Preview for oasisprotocol-oasis-core canceled.
a25e3c2 to c22279e
For the non-synchronized documentation, should I prepare a PR and merge it immediately after?
Let me fix the linting first, and then I'll open it again. :)
c22279e to ce420f1
619ca25 to b65ffef
1f62446 to b6a785c
    )

-   ctx, cancel := context.WithCancel(w.ctx)
+   ctx, cancel := context.WithTimeout(w.ctx, diffResponseTimeout)
Added this one as we discussed in the other threads, simply copying the client's internal timeout (defensive).
But this should probably be reset per call or doubled, given that it is now used for calling both the legacy and the new protocol internally...
Update: tempted to drop this commit altogether :).
I moved this down to getDiff now and reset it.
b6a785c to 2d5d1e0
peternose left a comment
LGTM 👍
    ctx, cancel := context.WithCancel(ctx)
    defer cancel()

    go func() {
Why not wg.Go?
I also removed the redundant select in this goroutine (caught by a static check that is not part of our CI lint).
08b1526 to 406b4ff
Also rename node to worker, to avoid confusion.
Previously, the genesis checkpoint was created right after the point where the state may be initialized from the runtime descriptor. If the state is not initialized that way, it is first fetched from the peers. Thus, the genesis checkpoint should be forced only after checkpoint sync finishes. Finally, a redundant nil check was removed.
This is desirable so that workers that initialize a new checkpointer don't need to accept a context; instead, the lifetime and initialization of the checkpointer are handled by the worker's Serve method.
Previously, if the context was not canceled, the fetcher might still be sending diffs on a channel that can no longer be drained, since we are already out of the main for loop, causing wg.Wait to never return.
This also serves as a step towards passing the context explicitly.
In addition, the committee storage worker now implements the Service interface, and the corresponding BackgroundService methods (already unused) have been removed. Similarly, the storage worker was internally refactored to the Service interface to ease the eventual removal of BackgroundService. Note that the parent (storage worker) is registered as a background service, so when an error occurs inside a committee worker there is no need to manually request a node shutdown. Finally, panicking has been replaced with returning an error. The semantics changed slightly: previously, the storage worker would wait for all committee workers to finish; now it terminates when the first one finishes. This was already the case if a committee worker panicked.
406b4ff to 91ed0da
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6306      +/-   ##
==========================================
+ Coverage   64.51%   64.56%   +0.05%
==========================================
  Files         697      698       +1
  Lines       67847    67826      -21
==========================================
+ Hits        43770    43792      +22
+ Misses      19089    19046      -43
  Partials     4988     4988
What was done:
Follow-up:
I propose factoring out the checkpointer and the availability nudger (#6308) into separate workers, possibly with one PR for each.
This will make the diff sync part easier to reason about and thus easier to optimize.