@msfroh
Contributor

@msfroh msfroh commented Jul 16, 2025

Description

Currently, the remote store implementation is all or nothing. If you want anything stored in the remote store, you pretty much need to store everything in the remote store.

This change adds an explicit setting so expert users can say, "No thanks, I don't want any of this remote cluster state or remote translog stuff. I just want segments replicated to a remote store." I needed to hack away at some of the existing logic that has embraced this "all or nothing" assumption.

Initially I could get a primary to push segments to the remote store but could not bring up a search replica, which I attributed to being unable to recover from the remote store without translog recovery. Update: I can bring up a search replica! The problem had nothing to do with remote store configuration. It came from logic in SearchReplicaAllocationDecider that says search replicas must live on search nodes. I changed that rule so it no longer applies when the cluster has no dedicated search nodes.

Related Issues

Resolves #18669

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@msfroh
Contributor Author

msfroh commented Jul 16, 2025

Current state

Start up a local cluster with two nodes using:

./gradlew run -PnumNodes=2 -Dtests.opensearch.node.attr.remote_store.segment.repository=my-repo-1 \
                       -Dtests.opensearch.node.attr.remote_store.repository.my-repo-1.type=fs \
                       -Dtests.opensearch.node.attr.remote_store.repository.my-repo-1.settings.location=/tmp/remote_repository \
                       -Dtests.opensearch.path.repo=/tmp/remote_repository \
                       -Dtests.opensearch.node.attr.remote_store.segments_only=true

Create an index with a single shard and a single search replica:

curl -XPUT 'http://localhost:9200/myindex' -H 'Content-Type: application/json' \
    -d '{"settings":{"index":{"number_of_shards":1, "number_of_replicas":0, "number_of_search_replicas":1}}}'

Write a document to the index:

curl -X POST -H 'Content-Type: application/json' http://localhost:9200/myindex/_doc/1 -d '{"title": "Document 1"}'

The primary shard is created, but the search replica is unassigned and the recovery source is EMPTY_STORE (not REMOTE_STORE as I was hoping). Here's the routing table from cluster state:

"routing_table" : {
    "indices" : {
      "myindex" : {
        "shards" : {
          "0" : [
            {
              "state" : "STARTED",
              "primary" : true,
              "searchOnly" : false,
              "node" : "ryr1VOOxQRKta-2keyO-pw",
              "relocating_node" : null,
              "shard" : 0,
              "index" : "myindex",
              "allocation_id" : {
                "id" : "87i4fH1mSPSihnxFLNd-JA"
              }
            },
            {
              "state" : "UNASSIGNED",
              "primary" : false,
              "searchOnly" : true,
              "node" : null,
              "relocating_node" : null,
              "shard" : 0,
              "index" : "myindex",
              "recovery_source" : {
                "type" : "EMPTY_STORE"
              },
              "unassigned_info" : {
                "reason" : "INDEX_CREATED",
                "at" : "2025-07-16T21:45:44.816Z",
                "delayed" : false,
                "allocation_status" : "no_attempt"
              }
            }
          ]
        }
      }
    }
  }
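To spot this state without reading the whole routing table by eye, something along these lines could pull out the unassigned copies. This is only a sketch: it assumes the cluster state response has already been parsed into a dict with the `routing_table` shape shown above, and the helper name `unassigned_copies` is made up for illustration.

```python
def unassigned_copies(cluster_state):
    """Return (index, shard, search_only, recovery_source_type) for each UNASSIGNED copy.

    Assumes the dict mirrors the "routing_table" fragment quoted above.
    """
    found = []
    indices = cluster_state["routing_table"]["indices"]
    for index_name, index_entry in indices.items():
        for shard_id, copies in index_entry["shards"].items():
            for copy in copies:
                if copy["state"] == "UNASSIGNED":
                    # recovery_source only appears on copies that still need recovery
                    source = copy.get("recovery_source", {}).get("type")
                    found.append((index_name, int(shard_id), copy["searchOnly"], source))
    return found


# Trimmed-down version of the routing table above (only the fields the helper reads).
sample = {
    "routing_table": {
        "indices": {
            "myindex": {
                "shards": {
                    "0": [
                        {"state": "STARTED", "primary": True, "searchOnly": False},
                        {
                            "state": "UNASSIGNED",
                            "primary": False,
                            "searchOnly": True,
                            "recovery_source": {"type": "EMPTY_STORE"},
                        },
                    ]
                }
            }
        }
    }
}

print(unassigned_copies(sample))
```

For the routing table above, this reports the single search-only copy of shard 0 together with its EMPTY_STORE recovery source.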

The primary shard is writing segments to the remote store, which is nice:

% find /tmp/remote_repository
/tmp/remote_repository
/tmp/remote_repository/oI-sevm5SkSNBhyQhzlDow
/tmp/remote_repository/oI-sevm5SkSNBhyQhzlDow/0
/tmp/remote_repository/oI-sevm5SkSNBhyQhzlDow/0/segments
/tmp/remote_repository/oI-sevm5SkSNBhyQhzlDow/0/segments/lock_files
/tmp/remote_repository/oI-sevm5SkSNBhyQhzlDow/0/segments/data
/tmp/remote_repository/oI-sevm5SkSNBhyQhzlDow/0/segments/data/_0.cfs__VzozFZgBzN5Ws4qkerGQ
/tmp/remote_repository/oI-sevm5SkSNBhyQhzlDow/0/segments/data/_0.si__VTozFZgBzN5Ws4qkerGH
/tmp/remote_repository/oI-sevm5SkSNBhyQhzlDow/0/segments/data/segments_3__UTozFZgBzN5Ws4qkYrGF
/tmp/remote_repository/oI-sevm5SkSNBhyQhzlDow/0/segments/data/_0.cfe__UzozFZgBzN5Ws4qkerF9
/tmp/remote_repository/oI-sevm5SkSNBhyQhzlDow/0/segments/metadata
/tmp/remote_repository/oI-sevm5SkSNBhyQhzlDow/0/segments/metadata/metadata__9223372036854775806__9223372036854775804__9223372036854775805__9223372036854775805__1457225135__9223370284152423786__2
/tmp/remote_repository/oI-sevm5SkSNBhyQhzlDow/0/segments/metadata/metadata__9223372036854775806__9223372036854775804__9223372036854775805__9223372036854775806__1457225135__9223370284152429938__2

@github-actions
Contributor

❌ Gradle check result for 4bef91a: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@msfroh
Contributor Author

msfroh commented Jul 17, 2025

Aha! The search replica wasn't getting allocated because we added an allocation decider that only assigns a search replica to a node if that node is a search-only node. But what if a cluster doesn't have dedicated search nodes?

I've changed the logic to let me allocate a search replica to a node if a) the node is a search node, or b) there are no search nodes in the cluster. With that change, I'm able to get remote store based replication working with two nodes running on my laptop.
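As a rough illustration of that rule (in Python, not the actual Java in SearchReplicaAllocationDecider), the decision could be sketched as below. The `Node` class, the `is_search_node` flag, and the `can_allocate_search_replica` function are hypothetical names for this sketch, not OpenSearch APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical stand-in for a cluster node; not an OpenSearch class.
    name: str
    is_search_node: bool = False


def can_allocate_search_replica(target, cluster_nodes):
    """Allow a search replica on `target` if (a) it is a search node,
    or (b) the cluster has no search nodes at all."""
    if target.is_search_node:
        return True
    return not any(node.is_search_node for node in cluster_nodes)


# Two plain nodes, as in the laptop setup: either node may host the replica.
laptop = [Node("node-0"), Node("node-1")]
print(can_allocate_search_replica(laptop[1], laptop))

# Once a dedicated search node exists, plain nodes are excluded again.
mixed = [Node("data-0"), Node("search-0", is_search_node=True)]
print(can_allocate_search_replica(mixed[0], mixed))
print(can_allocate_search_replica(mixed[1], mixed))
```

The point of case (b) is that clusters with dedicated search nodes keep the original behavior, and only clusters without any search nodes get the relaxed rule.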

@msfroh
Contributor Author

msfroh commented Jul 17, 2025

@linuxpi -- I'd appreciate your eyes on this, as it partially undoes your work from #8719, but only if someone explicitly says that they want to use remote store for segments only.

@github-actions
Contributor

❌ Gradle check result for 80444d2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@msfroh msfroh force-pushed the segment_only_remote_store branch from 9748f24 to 9cda00f Compare July 29, 2025 22:39
@msfroh
Contributor Author

msfroh commented Jul 29, 2025

Thanks a lot, @shwetathareja! I've tried to incorporate your feedback.

To make things a little better, I've renamed the new node attribute to remote_store.mode, which currently only has two options: segments_only and default. If we're in segments_only mode, then we just populate the segments repository (and ignore any other repository attributes).
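To make the intended behavior concrete, here is a small Python sketch of how the mode could gate which repository attributes are honored. It is not the code in this PR. The attribute names `remote_store.mode` and `remote_store.segment.repository` come from this thread; the translog and state repository keys in the example, the `repositories_to_populate` helper, and the choice to treat a missing mode as `default` are assumptions made for illustration.

```python
MODE_KEY = "remote_store.mode"
SEGMENT_REPO_KEY = "remote_store.segment.repository"
VALID_MODES = ("default", "segments_only")


def repositories_to_populate(node_attrs):
    """Return the repository attributes that would be honored for a node.

    Assumption: an absent mode attribute behaves like "default".
    """
    mode = node_attrs.get(MODE_KEY, "default")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown {MODE_KEY}: {mode!r}")

    # Attributes of the form remote_store.<kind>.repository name a repository.
    repo_keys = [
        key for key in node_attrs
        if key.startswith("remote_store.") and key.endswith(".repository")
    ]
    if mode == "segments_only":
        # Ignore every repository attribute except the segments one.
        repo_keys = [key for key in repo_keys if key == SEGMENT_REPO_KEY]
    return {key: node_attrs[key] for key in sorted(repo_keys)}


# Example attributes; the translog and state keys are assumed for illustration.
attrs = {
    "remote_store.mode": "segments_only",
    "remote_store.segment.repository": "my-repo-1",
    "remote_store.translog.repository": "my-repo-2",
    "remote_store.state.repository": "my-repo-3",
}
print(repositories_to_populate(attrs))
```

With `segments_only`, only the segments repository survives even though other repository attributes are present, which matches the "ignore any other repository attributes" behavior described above.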

I was able to confirm that I could run a couple of nodes on my laptop using a shared fs repository to propagate writes from a primary on one node to a search replica on the other using segment replication.

@github-actions
Contributor

❌ Gradle check result for 9cda00f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@Bukhtawar
Contributor

Sorry, just catching up. The current snapshots (pinned timestamps v2) are built on top of segments and translogs, since we don't perform a per-shard flush prior to snapshots (which is what allows us to scale better). So with this change, we would also have to move snapshots back to support the shallow snapshot capability.

@msfroh
Contributor Author

msfroh commented Jul 30, 2025

@Bukhtawar, this change is to allow us to use segrep with pull-based ingestion and a clusterless architecture (without cluster managers). Since snapshots run through cluster managers, we can't do them anyway. Also, we don't have a translog, since the event stream that we're pulling from takes on that role.

Member

@ashking94 ashking94 left a comment

How does the replication mode change with segments-only remote-store-based indexes? Would it fall back to the full request replication model?

@msfroh
Contributor Author

msfroh commented Jul 31, 2025

Thanks @ashking94! I made those changes that you called out.

@github-actions
Contributor

❌ Gradle check result for b6fb4ba: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Michael Froh <[email protected]>
@github-actions
Contributor

github-actions bot commented Aug 2, 2025

✅ Gradle check result for 1883267: SUCCESS

@msfroh msfroh merged commit a9b6d7a into opensearch-project:main Aug 2, 2025
30 of 31 checks passed
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Storage Project Board Aug 2, 2025
sunqijun1 pushed a commit to sunqijun1/OpenSearch that referenced this pull request Aug 4, 2025
…ct#18773)

Currently, the remote store implementation is all or nothing. If you
want anything stored in the remote store, you pretty much need to
store everything in the remote store.

This change adds an explicit setting so expert users can say, "No,
thanks, I don't want any of this remote cluster state, remote translog
stuff. I just want segments replicated to a remote store."

---------

Signed-off-by: Michael Froh <[email protected]>
Signed-off-by: sunqijun.jun <[email protected]>
tandonks pushed a commit to tandonks/OpenSearch that referenced this pull request Aug 5, 2025
vinaykpud pushed a commit to vinaykpud/OpenSearch that referenced this pull request Sep 26, 2025
Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

[BUG] Cannot use remote store-based segment replication without enabling remote cluster state

6 participants