Conversation

@Himanshu-g81 Himanshu-g81 commented Aug 26, 2025

High-level description of the major replication replay components added in this PR

  1. ReplicationLogReplayService - A singleton class with a single thread that gets all the HA groups and starts replication replay for each using ReplicationReplay.get(conf, replicationGroup).startReplay(); every 60 seconds (configurable). Note that startReplay() of ReplicationReplay is idempotent. This is hooked into the RS start/stop path in PhoenixRegionServerEndpoint.java.

  2. ReplicationLogReplay - Responsible for handling the replication replay lifecycle for a single HA group. It initializes the file system from which replay needs to be done for this HA group. The init method also initializes ReplicationReplayLogDiscovery (and the respective ReplicationLogFileTrackerReplay for the group); the responsibilities of these two components are described below.

  3. ReplicationLogFileTracker - Abstract class that handles all file system interactions (getNewFiles, markInProgress, markCompleted). It currently has one implementation for the standby cluster (ReplicationLogReplayFileTracker), which overrides the directory (the IN directory) and the metric source. A similar implementation can be added for store-and-forward mode (with OUT as the directory) and a custom metric source.

  4. ReplicationLogDiscovery - Abstract class responsible for the logic of processing files round by round. It contains a ReplicationLogFileTracker. It creates a thread pool (with all properties configurable, i.e. thread count, scheduling interval, etc.) to process the log files round by round (details are in a Salesforce-internal doc; will update once the design is published). The process method is abstract and is implemented for standby-cluster replay (as ReplicationReplayLogDiscovery) to apply mutations on the target. A similar implementation can be added for store-and-forward mode, to just copy files to the standby cluster.

  5. ReplicationShardDirectoryManager - Encapsulates the logic of shard directory management (for both the active and the DR cluster). Only the root directory needs to be given during initialization. Changes to leverage it on the source side are also part of this PR.
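The scheduling loop described in (1) can be sketched roughly as follows. This is a hypothetical illustration, not the actual Phoenix API: HaGroupSource, ReplayStarter, and the class name are stand-ins for whatever the PR actually wires into the region server start/stop path.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the ReplicationLogReplayService loop: a single
// scheduled thread that, every interval, asks for all HA groups and starts
// replay for each. Names and interfaces are illustrative only.
class ReplayScheduleSketch {

    interface HaGroupSource {
        List<String> getHaGroups();
    }

    interface ReplayStarter {
        // Must be idempotent, like ReplicationReplay.startReplay():
        // calling it again for a group already replaying is harmless.
        void startReplay(String haGroup);
    }

    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    void start(HaGroupSource source, ReplayStarter starter, long intervalSeconds) {
        scheduler.scheduleAtFixedRate(() -> {
            for (String group : source.getHaGroups()) {
                starter.startReplay(group);
            }
        }, 0, intervalSeconds, TimeUnit.SECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

Because startReplay() is idempotent, the service does not need to track which groups already have replay running; re-invoking it each round is safe.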

Simplified Sequence Diagram
(sequence diagram image)

@Himanshu-g81 force-pushed the PHOENIX-7568-replication-log-replay-impl branch from 9e066e8 to 276fc91 on October 6, 2025 08:15
@Himanshu-g81 (Contributor Author) commented:

Force-pushed due to rebase with upstream feature branch.

@apurtell apurtell requested a review from Copilot October 7, 2025 21:06

apurtell commented Oct 7, 2025

Looks like there are lots of blank, checkstyle, and spotbugs issues. Can we fix those first? Or are they false positives?

Copilot AI left a comment:

Pull Request Overview

This PR implements the major replication log replay infrastructure components for Phoenix, providing HA-aware standby cluster functionality for processing replication logs in a state-aware manner.

Key changes:

  • Added state-aware replication log replay service and discovery components
  • Implemented shard-based directory management for distributed log file processing
  • Added comprehensive metrics tracking for replication operations

Reviewed Changes

Copilot reviewed 25 out of 26 changed files in this pull request and generated 5 comments.

Summary per file:

  • ReplicationLogReplayService.java: Singleton service managing replication replay for all HA groups with configurable scheduling
  • ReplicationLogReplay.java: HA group-specific replay coordinator with singleton pattern and lifecycle management
  • ReplicationLogDiscoveryReplay.java: State-aware discovery implementation handling SYNC/DEGRADED/SYNCED_RECOVERY states with listener integration
  • ReplicationShardDirectoryManager.java: Time-based shard directory mapping for distributed file processing
  • ReplicationLogTracker.java: File lifecycle management with retry logic and UUID-based tracking
  • ReplicationRound.java: Time window representation for batch processing
  • Various metrics classes: Comprehensive monitoring and JMX integration
  • Test files: Extensive unit test coverage for all major components
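The "time-based shard directory mapping" attributed to ReplicationShardDirectoryManager can be illustrated with a minimal sketch: bucket a timestamp into a fixed time window, then map the window onto one of N shard subdirectories under the root. The window size, shard count, and directory naming below are assumptions for illustration, not the PR's actual values.

```java
import java.nio.file.Path;

// Hypothetical sketch of time-based shard directory mapping: round a
// timestamp down to a fixed-size window, then map the window onto one of
// a fixed number of shard subdirectories. All parameters are illustrative.
class ShardDirSketch {
    private final Path root;
    private final long windowMillis;
    private final int shardCount;

    ShardDirSketch(Path root, long windowMillis, int shardCount) {
        this.root = root;
        this.windowMillis = windowMillis;
        this.shardCount = shardCount;
    }

    Path shardFor(long timestampMillis) {
        // All timestamps within the same window land in the same shard,
        // so a round of files can be discovered from one directory.
        long window = timestampMillis / windowMillis;
        int shard = (int) Math.floorMod(window, (long) shardCount);
        return root.resolve("shard-" + shard);
    }
}
```

Only the root directory is needed at construction, matching the description above that initialization takes just the root path.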



@Override
protected void processRound(ReplicationRound replicationRound) throws IOException {
    System.out.println("Processing Round: " + replicationRound);

Copilot AI Oct 7, 2025

Using System.out.println() for logging in test code. Consider using a proper logging framework (SLF4J) for consistency with the rest of the codebase.

Contributor:

Definitely don't do this.
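For reference, the shape of the suggested fix is to route output through a logger rather than stdout. The sketch below uses java.util.logging so it is self-contained; the codebase itself would use SLF4J as the review comment suggests, and the class and method names here are hypothetical, not the PR's code.

```java
import java.util.logging.Logger;

// Minimal sketch: replace System.out.println(...) with a logger call.
// Class and method names are illustrative only.
class LoggingSketch {
    private static final Logger LOG = Logger.getLogger(LoggingSketch.class.getName());

    static String format(Object replicationRound) {
        String msg = "Processing Round: " + replicationRound;
        LOG.fine(msg);  // routed to the configured handler, not raw stdout
        return msg;
    }
}
```

With SLF4J the call would instead be parameterized, e.g. LOG.info("Processing Round: {}", replicationRound), which skips string concatenation when the level is disabled.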

// Simulate state change by listener after certain number of rounds
roundsProcessed++;
if (stateChangeAfterRounds > 0 && roundsProcessed == stateChangeAfterRounds && newStateAfterRounds != null) {
    System.out.println("Rounds Processed: " + roundsProcessed + " - " + newStateAfterRounds);

Copilot AI Oct 7, 2025

Using System.out.println() for logging in test code. Consider using a proper logging framework (SLF4J) for consistency with the rest of the codebase.

Contributor:

Same.

Comment on lines 268 to 272
System.out.println("Processed files");
for (Path file : processedFiles) {
    System.out.println(file);
}


Copilot AI Oct 7, 2025

Using System.out.println() for debug output in test code. This debug output should be removed or replaced with proper logging to avoid cluttering test output.

Suggested change
System.out.println("Processed files");
for (Path file : processedFiles) {
System.out.println(file);
}

Contributor:

Same

@apurtell left a comment:

Address checkstyle, spotbugs, and copilot findings, please.

@Himanshu-g81 (Contributor Author) commented:

> Address checkstyle, spotbugs, and copilot findings, please.

Sure, addressed all of those (except false positives of spotbugs).

Comment on lines 500 to 517

 * Enum representing the type of replication log directory.
 * IN: Directory created on standby cluster for Incoming replication log files
 * OUT: Directory created on primary cluster for Outgoing replication log files
 */
public enum DirectoryType {
    IN("in"),
    OUT("out");

    private final String name;

    DirectoryType(final String name) {
        this.name = name;
    }

    public String getName() {
        return this.name;
    }
}
Contributor:

I think we should get rid of this enum. It is not really an enum. What if we use different terminology? These are just string constants, and they don't need to be defined here but at a higher level. The ReplicationLogTracker doesn't need to know about it; it should take a path as input and just work with it.
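The suggestion amounts to something like the following sketch, where the directory-name constants live at a higher level and the tracker just receives a resolved path. All names here are hypothetical, not the PR's actual classes.

```java
import java.nio.file.Path;

// Illustrative sketch of the reviewer's suggestion: plain string constants
// defined at a higher level, with the tracker taking an already-resolved
// path and no knowledge of IN vs OUT. All names are hypothetical.
final class ReplicationDirNames {
    static final String IN = "in";
    static final String OUT = "out";
    private ReplicationDirNames() {}
}

class LogTrackerSketch {
    private final Path logDir;

    // The tracker works with whatever directory the caller resolves.
    LogTrackerSketch(Path logDir) {
        this.logDir = logDir;
    }

    Path getLogDir() {
        return logDir;
    }
}
```

A caller on the standby side would then construct it as new LogTrackerSketch(root.resolve(ReplicationDirNames.IN)), keeping the IN/OUT distinction out of the tracker entirely.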

MetricsReplicationLogDiscoveryImpl.METRICS_CONTEXT,
MetricsReplicationLogDiscoveryReplayImpl.METRICS_JMX_CONTEXT
+ ",haGroup=" + haGroupName);
super.groupMetricsContext =
Contributor:

Can this also be moved to the constructor instead of a standalone call?

Optional<Long> minTimestampFromInProgressFiles =
getMinTimestampFromInProgressFiles();
if (minTimestampFromInProgressFiles.isPresent()) {
LOG.info("Initializing lastRoundProcessed from IN PROGRESS files with minimum "
Contributor:

IN should not be hardcoded here.

@Himanshu-g81 force-pushed the PHOENIX-7568-replication-log-replay-impl branch from fdc75f4 to c0cb77b on October 28, 2025 15:25
@Himanshu-g81 (Contributor Author) commented:

Force-pushed due to rebase with upstream changes.


protected String getInProgressLogSubDirectoryName() {
-    return getNewLogSubDirectoryName() + "_progress";
+    return getInSubDirectoryName() + "_progress";
Contributor:

Shouldn't this be a more generic name instead of getInSubDirectoryName?

@tkhurana commented:

@Himanshu-g81 There are test failures which need to be fixed

https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-2278/20/artifact/yetus-general-check/output/patch-unit-root.txt

[ERROR] ReplicationLogDiscoveryTest.testProcessInProgressDirectoryWithIntermittentFailure » OutOfMemory Java heap space
[ERROR] ReplicationLogReplayTest.testInit:71 » IO HAGroupStoreClient is not initialized for HA group: testGroup
[ERROR] ReplicationLogReplayTest.testReplicationReplayCacheRemovalOnClose:119 » Runtime Failed to initialize ReplicationLogReplay
[ERROR] ReplicationLogReplayTest.testReplicationReplayInstanceCaching:94 » Runtime Failed to initialize ReplicationLogReplay

@tkhurana tkhurana merged commit fb204f7 into apache:PHOENIX-7562-feature Oct 29, 2025