Skip to content

[merged segment warmer] Optimize redundant pending merged segment cleanup to avoid the listAll operation on the primary shard #18845

@guojialiang92

Description

@guojialiang92

Is your feature request related to a problem? Please describe

@ashking94 and I had a discussion in 18720's comment, and we can optimize the logic for cleaning up redundant pending merged segments in subsequent PRs. Thereby avoiding listAll operations.

Describe the solution you'd like

I paste the key discussion here for easy reading.

As I understand, we only want to prevent the deletion of such segments that are produced as part of the background merges and not yet part of the reader yet.

Yes.

I think this can be easily solved by having a variable that tracks segment files that have been published for pre-warm but not yet refreshed.

For the convenience of subsequent discussions, we can name it unSearchableMergedSegmentCheckpointList, which is maintained by the primary shard.

unSearchableMergedSegmentCheckpointList tracks segment files that have been published for pre-warm but not yet refreshed. Therefore, the merged segments it contains cannot be deleted.

I think we can add list of segments to be sent from primary to replica just before the publish merged segment action is invoked, and removed from the list when the publish refresh action is invoked.

Perhaps we need a dedicated refresh listener and remove MergedSegmentCheckpoint from unSearchableMergedSegmentCheckpointList when executing RefreshListener#afterRefresh.

If we perform the remove operation when PublishCheckpointAction is invoked, we may miss some checkpoints. For example, in RemoteStoreRefreshListener, PublishCheckpointAction is only invoked when the upload is successful.

The computeReferencedSegmentsCheckpoint can then use this list to add the pending merged segments files that are currently not part of segmentInfosGatedCloseable.

After introducing this proposal, MergedSegmentCheckpoint in unsearchableMergedSegmentCheckpointList need to be cleaned up to avoid memory leak.

Can we remove MergedSegmentCheckpoint from unSearchableMergedSegmentCheckpointList only when it is searchable (which means that the segmentInfosGatedCloseable holds the file)?

Yes.

I was thinking if publish_referenced_segments action can act as that hook where we do the cleanup of segments that are present in segmentInfosGatedCloseable from unSearchableMergedSegmentCheckpointList.

I think we still need a regularly scheduled task, as we still need to clean up expired checkpoints.

Also, around the memory leak point, we may have to take time information on when the checkpoint was added and clean it up after it is older than X mins/hours at the same time when the above cleanup is happening?

Yes, but the default value may be quite large. Meanwhile, when users need to reduce the configuration, they will understand the risk that merged segments are no longer tracked, which may result in useful pending merged segments on replica nodes being cleaned up.

Related component

Indexing:Replication

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    Indexing:ReplicationIssues and PRs related to core replication framework eg segrepenhancementEnhancement or improvement to existing feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions