-
Notifications
You must be signed in to change notification settings - Fork 714
WIP: feat(refresh): implement refresh progress tracking for refresh table (for state switch) #23601
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
||
| impl RefreshProgressTracker { | ||
| /// Start tracking a new refresh operation | ||
| pub fn start_refresh( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we used this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plan to merge this function with RefreshManager
| /// Map from table_id to refresh progress | ||
| progress_map: HashMap<TableId, RefreshProgress>, | ||
| /// Map from actor_id to table_id for quick lookup | ||
| actor_to_table: HashMap<ActorId, TableId>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realize that RefreshProgressTracker needs to handle scaling as well (no matter online scaling or offline scaling), which means we need to know when to update the actor maps to keep it consistent with the latest table's parallelism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be not? As we will cancel the refresh op across recovery, I think we can assume during the run, the actors and parallelism do not change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. To assume the parallelism do not change during the running time, I think we need to ban online scaling to the batch refreshable table cc @shanicky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can skip or ban specific tables during online scaling and manual scaling.
During offline scaling, is it confirmed that there are no jobs related to those tables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During offline scaling, is it confirmed that there are no jobs related to those tables?
Even though there are jobs related to these tables, I think offline scaling is always safe to perform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The underlying job is always safe to scale. The refresh process will not persist any states, so it is also safe to scale, at the cost of re-running refresh.
…own risk (#23630) Co-authored-by: Li0k <[email protected]>
…ize` (#23622) Signed-off-by: Bugen Zhao <[email protected]>
…23625) Signed-off-by: Peng Chen <[email protected]>
…osition_delete.slt (#23591) Co-authored-by: xxhZs <[email protected]>
Signed-off-by: Shanicky Chen <[email protected]> Signed-off-by: Peng Chen <[email protected]>
Co-authored-by: Claude <[email protected]>
chenzl25
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM
- Updated `refresh_table` method to accept shared actor information for better tracking. - Introduced `SingleTableRefreshProgressTracker` to manage expected and finished actors during refresh operations. - Modified barrier reporting to use `TableId` instead of `u32` for associated source IDs, improving type safety and clarity. - Adjusted various executor implementations to align with the new source ID handling.
f38862e to
4563087
Compare
|
Looks like this PR extends new SQL syntax or updates existing ones. Make sure that:
|
|
Do we merge the changes altogether into #23527 instead so this PR is no longer needed? |
I messed up the git history in the pr, made some trouble merging into the original one. Will open a new one instead. |
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Added a new module for tracking the progress of refresh operations across parallel actors. This includes the introduction of a
RefreshProgressstructure to monitor the state of each actor during the refresh process. The changes also integrate this tracking into the existing barrier control flow, allowing for better coordination and reporting of refresh completion states. Additionally, updated related files to accommodate the new refresh progress functionality.Checklist
Documentation
Release note