Conversation

@Himanshu-g81 Himanshu-g81 commented Aug 26, 2025

High-level description of the major replication replay components added in this PR

  1. ReplicationLogReplayService - A singleton class with a single thread that gets all the HA groups and starts replication replay for each using ReplicationReplay.get(conf, replicationGroup).startReplay(); every 60 seconds (configurable). Note that startReplay() of ReplicationReplay is idempotent. This is hooked into the RS start/stop path in PhoenixRegionServerEndpoint.java.

  2. ReplicationLogReplay - Responsible for handling the replication replay lifecycle for a single HA group. It initializes the file system from which replay needs to be done for this HA group. The init method also initializes ReplicationReplayLogDiscovery (and the respective ReplicationLogFileTrackerReplay for the group); the responsibilities of these two components are described below.

  3. ReplicationLogFileTracker - Abstract class that handles all file system interactions (getNewFiles, markInProgress, markCompleted). It currently has one implementation for the standby cluster (ReplicationLogReplayFileTracker), which overrides the directory (the IN directory) and the metric source. A similar implementation can be added for store-and-forward mode (with OUT as the directory) and a custom metric source.

  4. ReplicationLogDiscovery - Abstract class responsible for the logic of processing files round by round. It contains a ReplicationLogFileTracker. It creates a thread pool (with all properties configurable, i.e. thread count, scheduling interval, etc.) to process the log files round by round (details are in a Salesforce-internal doc; will update once the design is published). The process method is abstract and is implemented for standby-cluster replay (as ReplicationReplayLogDiscovery) to apply mutations on the target. A similar implementation can be added for store-and-forward mode, to just copy files to the standby cluster.

  5. ReplicationShardDirectoryManager - Encapsulates the logic of shard directory management (for both the active and the DR cluster). Only the root directory needs to be given during initialization. Changes to leverage it on the source side are also part of this PR.
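The scheduling loop described in (1) can be sketched roughly as follows. This is a hypothetical illustration, not the actual Phoenix API: HaGroupSource, ReplayStarter, and the class name are stand-ins for whatever the PR actually wires into the region server start/stop path.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the ReplicationLogReplayService loop: a single
// scheduled thread that, every interval, asks for all HA groups and starts
// replay for each. Names and interfaces are illustrative only.
class ReplayScheduleSketch {

    interface HaGroupSource {
        List<String> getHaGroups();
    }

    interface ReplayStarter {
        // Must be idempotent, like ReplicationReplay.startReplay():
        // calling it again for a group already replaying is harmless.
        void startReplay(String haGroup);
    }

    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    void start(HaGroupSource source, ReplayStarter starter, long intervalSeconds) {
        scheduler.scheduleAtFixedRate(() -> {
            for (String group : source.getHaGroups()) {
                starter.startReplay(group);
            }
        }, 0, intervalSeconds, TimeUnit.SECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

Because startReplay() is idempotent, the service does not need to track which groups already have replay running; re-invoking it each round is safe.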

Simplified Sequence Diagram
(sequence diagram image)

@Himanshu-g81 force-pushed the PHOENIX-7568-replication-log-replay-impl branch from 9e066e8 to 276fc91 on October 6, 2025 08:15
@Himanshu-g81 (Contributor Author) commented:

Force-pushed due to rebase with upstream feature branch.

@apurtell apurtell requested a review from Copilot October 7, 2025 21:06

apurtell commented Oct 7, 2025

Looks like there are lots of blank, checkstyle, and spotbugs issues. Can we fix those first? Or are they false positives?

Copilot AI left a comment:

Pull Request Overview

This PR implements the major replication log replay infrastructure components for Phoenix, providing HA-aware standby cluster functionality for processing replication logs in a state-aware manner.

Key changes:

  • Added state-aware replication log replay service and discovery components
  • Implemented shard-based directory management for distributed log file processing
  • Added comprehensive metrics tracking for replication operations

Reviewed Changes

Copilot reviewed 25 out of 26 changed files in this pull request and generated 5 comments.

Summary per file:

  • ReplicationLogReplayService.java: Singleton service managing replication replay for all HA groups with configurable scheduling
  • ReplicationLogReplay.java: HA group-specific replay coordinator with singleton pattern and lifecycle management
  • ReplicationLogDiscoveryReplay.java: State-aware discovery implementation handling SYNC/DEGRADED/SYNCED_RECOVERY states with listener integration
  • ReplicationShardDirectoryManager.java: Time-based shard directory mapping for distributed file processing
  • ReplicationLogTracker.java: File lifecycle management with retry logic and UUID-based tracking
  • ReplicationRound.java: Time window representation for batch processing
  • Various metrics classes: Comprehensive monitoring and JMX integration
  • Test files: Extensive unit test coverage for all major components
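The "time-based shard directory mapping" attributed to ReplicationShardDirectoryManager can be illustrated with a minimal sketch: bucket a timestamp into a fixed time window, then map the window onto one of N shard subdirectories under the root. The window size, shard count, and directory naming below are assumptions for illustration, not the PR's actual values.

```java
import java.nio.file.Path;

// Hypothetical sketch of time-based shard directory mapping: round a
// timestamp down to a fixed-size window, then map the window onto one of
// a fixed number of shard subdirectories. All parameters are illustrative.
class ShardDirSketch {
    private final Path root;
    private final long windowMillis;
    private final int shardCount;

    ShardDirSketch(Path root, long windowMillis, int shardCount) {
        this.root = root;
        this.windowMillis = windowMillis;
        this.shardCount = shardCount;
    }

    Path shardFor(long timestampMillis) {
        // All timestamps within the same window land in the same shard,
        // so a round of files can be discovered from one directory.
        long window = timestampMillis / windowMillis;
        int shard = (int) Math.floorMod(window, (long) shardCount);
        return root.resolve("shard-" + shard);
    }
}
```

Only the root directory is needed at construction, matching the description above that initialization takes just the root path.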



@Override
protected void processRound(ReplicationRound replicationRound) throws IOException {
    System.out.println("Processing Round: " + replicationRound);

Copilot AI Oct 7, 2025

Using System.out.println() for logging in test code. Consider using a proper logging framework (SLF4J) for consistency with the rest of the codebase.

Contributor:

Definitely don't do this.
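For reference, the shape of the suggested fix is to route output through a logger rather than stdout. The sketch below uses java.util.logging so it is self-contained; the codebase itself would use SLF4J as the review comment suggests, and the class and method names here are hypothetical, not the PR's code.

```java
import java.util.logging.Logger;

// Minimal sketch: replace System.out.println(...) with a logger call.
// Class and method names are illustrative only.
class LoggingSketch {
    private static final Logger LOG = Logger.getLogger(LoggingSketch.class.getName());

    static String format(Object replicationRound) {
        String msg = "Processing Round: " + replicationRound;
        LOG.fine(msg);  // routed to the configured handler, not raw stdout
        return msg;
    }
}
```

With SLF4J the call would instead be parameterized, e.g. LOG.info("Processing Round: {}", replicationRound), which skips string concatenation when the level is disabled.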

// Simulate state change by listener after certain number of rounds
roundsProcessed++;
if (stateChangeAfterRounds > 0 && roundsProcessed == stateChangeAfterRounds && newStateAfterRounds != null) {
    System.out.println("Rounds Processed: " + roundsProcessed + " - " + newStateAfterRounds);

Copilot AI Oct 7, 2025

Using System.out.println() for logging in test code. Consider using a proper logging framework (SLF4J) for consistency with the rest of the codebase.

Contributor:

Same.

Comment on lines 268 to 272
System.out.println("Processed files");
for (Path file : processedFiles) {
    System.out.println(file);
}


Copilot AI Oct 7, 2025

Using System.out.println() for debug output in test code. This debug output should be removed or replaced with proper logging to avoid cluttering test output.

Suggested change
System.out.println("Processed files");
for (Path file : processedFiles) {
System.out.println(file);
}

Contributor:

Same

@apurtell left a comment:

Address checkstyle, spotbugs, and copilot findings, please.

@Himanshu-g81 (Contributor Author) commented:

> Address checkstyle, spotbugs, and copilot findings, please.

Sure, addressed all of those (except false positives of spotbugs).

Comment on lines 500 to 517

 * Enum representing the type of replication log directory.
 * IN: Directory created on standby cluster for Incoming replication log files
 * OUT: Directory created on primary cluster for Outgoing replication log files
 */
public enum DirectoryType {
    IN("in"),
    OUT("out");

    private final String name;

    DirectoryType(final String name) {
        this.name = name;
    }

    public String getName() {
        return this.name;
    }
}
Contributor:

I think we should get rid of this enum. It is not really an enum. What if we use different terminology? These are just string constants, and they don't need to be defined here but at a higher level. The ReplicationLogTracker doesn't need to know about it; it should take a path as input and just work with it.
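The suggestion amounts to something like the following sketch, where the directory-name constants live at a higher level and the tracker just receives a resolved path. All names here are hypothetical, not the PR's actual classes.

```java
import java.nio.file.Path;

// Illustrative sketch of the reviewer's suggestion: plain string constants
// defined at a higher level, with the tracker taking an already-resolved
// path and no knowledge of IN vs OUT. All names are hypothetical.
final class ReplicationDirNames {
    static final String IN = "in";
    static final String OUT = "out";
    private ReplicationDirNames() {}
}

class LogTrackerSketch {
    private final Path logDir;

    // The tracker works with whatever directory the caller resolves.
    LogTrackerSketch(Path logDir) {
        this.logDir = logDir;
    }

    Path getLogDir() {
        return logDir;
    }
}
```

A caller on the standby side would then construct it as new LogTrackerSketch(root.resolve(ReplicationDirNames.IN)), keeping the IN/OUT distinction out of the tracker entirely.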

MetricsReplicationLogDiscoveryImpl.METRICS_CONTEXT,
MetricsReplicationLogDiscoveryReplayImpl.METRICS_JMX_CONTEXT
+ ",haGroup=" + haGroupName);
super.groupMetricsContext =
Contributor:

Can this also be moved to the constructor instead of a standalone call?

Optional<Long> minTimestampFromInProgressFiles =
getMinTimestampFromInProgressFiles();
if (minTimestampFromInProgressFiles.isPresent()) {
LOG.info("Initializing lastRoundProcessed from IN PROGRESS files with minimum "
Contributor:

IN should not be hardcoded here.

@Himanshu-g81 force-pushed the PHOENIX-7568-replication-log-replay-impl branch from fdc75f4 to c0cb77b on October 28, 2025 15:25
@Himanshu-g81 (Contributor Author) commented:

Force-pushed due to rebase with upstream changes.


protected String getInProgressLogSubDirectoryName() {
-    return getNewLogSubDirectoryName() + "_progress";
+    return getInSubDirectoryName() + "_progress";
Contributor:

Shouldn't this be a more generic name instead of getInSubDirectoryName?

@tkhurana commented:

@Himanshu-g81 There are test failures which need to be fixed

https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-2278/20/artifact/yetus-general-check/output/patch-unit-root.txt

[ERROR] ReplicationLogDiscoveryTest.testProcessInProgressDirectoryWithIntermittentFailure » OutOfMemory Java heap space
[ERROR] ReplicationLogReplayTest.testInit:71 » IO HAGroupStoreClient is not initialized for HA group: testGroup
[ERROR] ReplicationLogReplayTest.testReplicationReplayCacheRemovalOnClose:119 » Runtime Failed to initialize ReplicationLogReplay
[ERROR] ReplicationLogReplayTest.testReplicationReplayInstanceCaching:94 » Runtime Failed to initialize ReplicationLogReplay

@tkhurana tkhurana merged commit fb204f7 into apache:PHOENIX-7562-feature Oct 29, 2025