WIP: feat(refresh): implement refresh progress tracking for refresh table (for state switch) #23601

tabVersion · 2025-10-29T06:59:42Z

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Adds a new module, `src/meta/src/barrier/refresh_progress.rs`, that tracks the progress of refresh operations across parallel actors. It introduces a `RefreshProgress` structure that records the state of each actor during a refresh. It also integrates this tracking into the existing barrier control flow, so that refresh completion can be coordinated and reported more reliably. Related files are updated to support the new refresh progress functionality.
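Below is a minimal sketch of the tracking idea. It assumes a refresh completes once every participating actor has reported. The field names `progress_map` and `actor_to_table` and the method name `start_refresh` come from the diff reviewed in this PR. The `u32` aliases, the `pending_actors` field, and `report_actor_finished` are hypothetical illustrations, not the module's actual code.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical aliases; the real code base uses strongly typed IDs.
type ActorId = u32;
type TableId = u32;

/// Hypothetical per-table progress: actors that have not yet reported completion.
#[derive(Debug, Default)]
struct RefreshProgress {
    pending_actors: HashSet<ActorId>,
}

#[derive(Debug, Default)]
struct RefreshProgressTracker {
    /// Map from table_id to refresh progress
    progress_map: HashMap<TableId, RefreshProgress>,
    /// Map from actor_id to table_id for quick lookup
    actor_to_table: HashMap<ActorId, TableId>,
}

impl RefreshProgressTracker {
    /// Start tracking a new refresh operation over the given actors.
    fn start_refresh(&mut self, table_id: TableId, actors: impl IntoIterator<Item = ActorId>) {
        let mut progress = RefreshProgress::default();
        for actor in actors {
            progress.pending_actors.insert(actor);
            self.actor_to_table.insert(actor, table_id);
        }
        self.progress_map.insert(table_id, progress);
    }

    /// Record that `actor` finished its part of the refresh. Returns the table ID
    /// if this was the last pending actor, i.e. the whole refresh is complete.
    fn report_actor_finished(&mut self, actor: ActorId) -> Option<TableId> {
        // Unknown actors (or tables no longer tracked) are ignored.
        let table_id = *self.actor_to_table.get(&actor)?;
        let progress = self.progress_map.get_mut(&table_id)?;
        progress.pending_actors.remove(&actor);
        if progress.pending_actors.is_empty() {
            // Refresh done: drop all bookkeeping for this table.
            self.progress_map.remove(&table_id);
            self.actor_to_table.retain(|_, t| *t != table_id);
            Some(table_id)
        } else {
            None
        }
    }
}

fn main() {
    let mut tracker = RefreshProgressTracker::default();
    tracker.start_refresh(7, [1, 2, 3]);
    assert_eq!(tracker.report_actor_finished(1), None);
    assert_eq!(tracker.report_actor_finished(2), None);
    assert_eq!(tracker.report_actor_finished(3), Some(7));
    assert_eq!(tracker.report_actor_finished(42), None);
}
```

The reverse `actor_to_table` map lets a barrier report that carries only an actor ID be routed to its table with a single lookup. The cost is that two maps must be kept in sync. That is also why the actor set must stay stable while a refresh runs.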

Checklist

I have written necessary rustdoc comments.
I have added necessary unit tests and integration tests.
I have added test labels as necessary.
I have added fuzzing tests or opened an issue to track them.
My PR contains breaking changes.
My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
I have checked the Release Timeline and Currently Supported Versions to determine which release branches I need to cherry-pick this PR into.

Documentation

My PR needs documentation updates.

Release note

chenzl25 · 2025-11-03T04:08:15Z

src/meta/src/barrier/refresh_progress.rs

+
+impl RefreshProgressTracker {
+    /// Start tracking a new refresh operation
+    pub fn start_refresh(


Have we used this method?

I plan to merge this function with `RefreshManager`.

chenzl25 · 2025-11-03T04:15:45Z

src/meta/src/barrier/refresh_progress.rs

+    /// Map from table_id to refresh progress
+    progress_map: HashMap<TableId, RefreshProgress>,
+    /// Map from actor_id to table_id for quick lookup
+    actor_to_table: HashMap<ActorId, TableId>,


I just realized that `RefreshProgressTracker` needs to handle scaling as well, whether online or offline. That means we need to know when to update the actor maps so they stay consistent with the table's latest parallelism.

Maybe not? Since we cancel the refresh operation on recovery, I think we can assume that the actors and parallelism do not change while a refresh is running.

OK. For the assumption that parallelism does not change at runtime to hold, I think we need to ban online scaling for batch refreshable tables. cc @shanicky

We can skip or ban specific tables during online scaling and manual scaling.

During offline scaling, is it confirmed that there are no jobs related to those tables?

Even though there are jobs related to these tables, I think offline scaling is always safe to perform.

The underlying job is always safe to scale. The refresh process will not persist any states, so it is also safe to scale, at the cost of re-running refresh.
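The policy this thread converges on can be sketched as follows. The sketch assumes a refresh holds no persisted state and is cancelled on recovery. Online and manual scaling skip tables that have a refresh in flight, so the tracked actor set stays valid. Offline scaling always proceeds and drops in-flight refreshes, which are then re-run. `RefreshGuard`, `ScalingKind`, and `plan_scaling` are hypothetical names, not RisingWave APIs.

```rust
use std::collections::HashSet;

// Hypothetical alias for illustration.
type TableId = u32;

/// Hypothetical classification of a scaling request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScalingKind {
    /// Online or manual rescheduling while the cluster is running.
    Online,
    /// Offline scaling performed as part of recovery.
    Offline,
}

#[derive(Debug, Default)]
struct RefreshGuard {
    /// Tables that currently have a refresh in flight.
    refreshing: HashSet<TableId>,
}

impl RefreshGuard {
    /// Return the subset of `requested` tables that may be rescaled now.
    fn plan_scaling(&mut self, kind: ScalingKind, requested: &[TableId]) -> Vec<TableId> {
        match kind {
            // Skip tables mid-refresh so their actor sets stay fixed until it completes.
            ScalingKind::Online => requested
                .iter()
                .copied()
                .filter(|t| !self.refreshing.contains(t))
                .collect(),
            // Offline scaling is always allowed: in-flight refreshes are cancelled
            // (nothing about them is persisted) and will be re-run afterwards.
            ScalingKind::Offline => {
                self.refreshing.clear();
                requested.to_vec()
            }
        }
    }
}

fn main() {
    let mut guard = RefreshGuard::default();
    guard.refreshing.insert(1);
    assert_eq!(guard.plan_scaling(ScalingKind::Online, &[1, 2]), vec![2]);
    assert_eq!(guard.plan_scaling(ScalingKind::Offline, &[1, 2]), vec![1, 2]);
    assert!(guard.refreshing.is_empty());
}
```

This sketch silently skips busy tables. Returning an error for them instead would implement the "ban" variant mentioned above.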

chenzl25

Generally LGTM

- Updated the `refresh_table` method to accept shared actor information for better tracking.
- Introduced `SingleTableRefreshProgressTracker` to manage expected and finished actors during refresh operations.
- Modified barrier reporting to use `TableId` instead of `u32` for associated source IDs, improving type safety and clarity.
- Adjusted various executor implementations to align with the new source ID handling.
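Below is a hedged sketch of the single-table tracker described above. It assumes completion means every expected actor has reported at least once. Only the name `SingleTableRefreshProgressTracker` and the split into expected and finished actors come from the commit message. The `TableId` newtype, `report`, and `is_finished` are hypothetical. The newtype also shows why replacing a raw `u32` helps: a bare integer can no longer be passed where a table ID is expected without an explicit `TableId(..)` wrap.

```rust
use std::collections::HashSet;

/// Hypothetical newtype standing in for the strongly typed `TableId`
/// that replaces a raw `u32` for associated source IDs.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct TableId(u32);

type ActorId = u32;

/// Hypothetical single-table tracker: done once every expected actor has reported.
#[derive(Debug)]
struct SingleTableRefreshProgressTracker {
    table_id: TableId,
    expected_actors: HashSet<ActorId>,
    finished_actors: HashSet<ActorId>,
}

impl SingleTableRefreshProgressTracker {
    fn new(table_id: TableId, expected_actors: impl IntoIterator<Item = ActorId>) -> Self {
        Self {
            table_id,
            expected_actors: expected_actors.into_iter().collect(),
            finished_actors: HashSet::new(),
        }
    }

    /// Record a barrier report from `actor` for `source_id`.
    /// Reports for another table or from an unexpected actor are rejected.
    fn report(&mut self, source_id: TableId, actor: ActorId) -> Result<(), String> {
        if source_id != self.table_id {
            return Err(format!("report for {:?}, tracking {:?}", source_id, self.table_id));
        }
        if !self.expected_actors.contains(&actor) {
            return Err(format!("unexpected actor {actor}"));
        }
        // Duplicate reports are harmless: the set ignores repeated inserts.
        self.finished_actors.insert(actor);
        Ok(())
    }

    /// Only expected actors are ever inserted, so equality means "all reported".
    fn is_finished(&self) -> bool {
        self.finished_actors == self.expected_actors
    }
}

fn main() {
    let mut tracker = SingleTableRefreshProgressTracker::new(TableId(5), [10, 11]);
    tracker.report(TableId(5), 10).unwrap();
    assert!(!tracker.is_finished());
    assert!(tracker.report(TableId(6), 11).is_err());
    assert!(tracker.report(TableId(5), 99).is_err());
    tracker.report(TableId(5), 11).unwrap();
    assert!(tracker.is_finished());
}
```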

graphite-app · 2025-11-04T11:18:15Z

Looks like this PR extends new SQL syntax or updates existing ones. Make sure that:

Test cases about the new/updated syntax are added in src/sqlparser/tests/testdata. Especially, double check the formatted_sql is still a valid SQL #20713
The meaning of each enum variant is documented in PR description. Additionally, document what it means when each optional clause is omitted.

hzxa21 · 2025-11-04T13:12:12Z

Do we merge the changes altogether into #23527 instead so this PR is no longer needed?

tabVersion · 2025-11-05T06:05:33Z

I messed up the git history in this PR, which made it hard to merge into the original one. I will open a new PR instead.

tabVersion changed the title ~~feat(refresh): implement refresh progress tracking for refresh table (for state switch)~~ WIP: feat(refresh): implement refresh progress tracking for refresh table (for state switch) Oct 29, 2025

github-actions bot added type/feature Type: New feature. ci/run-e2e-single-node-tests ci/run-e2e-test-other-backends ci/run-e2e-iceberg-tests Invalid PR Title labels Oct 29, 2025

tabVersion mentioned this pull request Oct 29, 2025

feat(iceberg): support refreshable batch iceberg table #23527

Merged

chenzl25 reviewed Nov 3, 2025

chenzl25 requested review from shanicky and wenym1 November 3, 2025 04:15

wenym1 and others added 11 commits November 3, 2025 06:25

refactor: remove SerDe implementation on TableId (#23629)

6665de8

feat(stream): add license check for locality backfill (#23632)

e295d96

feat: Introduce SmallFiles Compaction for all sink types. (#23626)

627ab9f

feat: allow enabling storage ttl for non-append-only table on user's own risk (#23630)

49ea155

Co-authored-by: Li0k <[email protected]>

refactor: unify table id and separate strong typed table and job id (#23631)

c4f85a0

feat(streaming): better handling of Update on conflict in `Materialize` (#23622)

23234a6

Signed-off-by: Bugen Zhao <[email protected]>

refactor: remove actorstatus enum and simplify actor state handling (#23625)

6910e7a

Signed-off-by: Peng Chen <[email protected]>

test(iceberg): add validation for PositionDeletes in iceberg_source_position_delete.slt (#23591)

e0df7e7

Co-authored-by: xxhZs <[email protected]>

feat: add SQL support to alter fragment parallelism (#23523)

c0d596c

Signed-off-by: Shanicky Chen <[email protected]> Signed-off-by: Peng Chen <[email protected]>

feat(pgwire): implement OAuth audience validation (#23155)

c2fd947

Co-authored-by: Claude <[email protected]>

Merge remote-tracking branch 'origin' into tab/batch-iceberg-3

7e6037b

chenzl25 approved these changes Nov 4, 2025

tabVersion force-pushed the tab/refine-track branch from f38862e to 4563087 November 4, 2025 11:18

tabVersion requested a review from a team as a code owner November 4, 2025 11:18

github-actions bot added ci/run-s3-source-tests ci/main-cron/run-selected labels Nov 4, 2025

tabVersion closed this Nov 4, 2025

tabVersion mentioned this pull request Nov 5, 2025

feat: implement refresh progress tracking for refresh table #23671

Merged

9 participants