OpenSearch: Omit explicit document IDs when manageDocumentIds=false for AWS Serverless (issue - #3818) #4220

Open
wants to merge 1 commit into base: main

@lsh1215 lsh1215 commented Aug 22, 2025

OpenSearch: Document ID management for AWS OpenSearch Serverless (manageDocumentIds)

Summary

  • AWS OpenSearch Serverless vector collections do not allow indexing with custom document IDs (issue: Document ID is not supported when adding embeddings to AWS OpenSearch #3818).
  • OpenSearchVectorStore#doAdd(List<Document>) was updated to make document ID handling configurable. When manageDocumentIds=false, the index request omits the ID so that OpenSearch auto-generates it.
  • The change is verified with unit and integration tests.

Background

  • Error observed: "Document ID is not supported in create/index operation request".
  • Root cause: AWS OpenSearch Serverless (time series/vector collections) disallows custom document IDs and upserts.
  • Goal: Allow clients to opt out of explicit IDs so OpenSearch can auto-generate them during indexing.

Changes

  • File: org.springframework.ai.vectorstore.opensearch.OpenSearchVectorStore
    • Method: doAdd(List<Document> documents)
    • Behavior:
      • manageDocumentIds=true (default): index with explicit IDs (backward-compatible)
      • manageDocumentIds=false: omit ID so that OpenSearch auto-generates it
// doAdd excerpt
if (this.manageDocumentIds) {
    bulkRequestBuilder.operations(op -> op
        .index(idx -> idx.index(this.index).id(openSearchDocument.id()).document(openSearchDocument)));
}
else {
    bulkRequestBuilder.operations(op -> op
        .index(idx -> idx.index(this.index).document(openSearchDocument)));
}
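
To make the difference between the two branches concrete, here is a minimal, self-contained sketch of the action line each mode would produce in an OpenSearch `_bulk` request body. It does not use the OpenSearch Java client: `BulkActionSketch`, `Doc`, `actionLine`, and the index name `docs` are hypothetical names introduced only for illustration, and the sketch assumes IDs and index names contain no characters that need JSON escaping.

```java
import java.util.ArrayList;
import java.util.List;

public class BulkActionSketch {

    /** Hypothetical stand-in for the document type handled by the store. */
    record Doc(String id, String content) {}

    /** Builds the action line of one bulk "index" operation (no JSON escaping; illustration only). */
    static String actionLine(String index, Doc doc, boolean manageDocumentIds) {
        if (manageDocumentIds) {
            // Explicit ID: the caller decides the index-level _id.
            return "{\"index\":{\"_index\":\"" + index + "\",\"_id\":\"" + doc.id() + "\"}}";
        }
        // No _id field: OpenSearch assigns one when it indexes the document.
        return "{\"index\":{\"_index\":\"" + index + "\"}}";
    }

    /** Builds one action line per document, in input order. */
    static List<String> actionLines(String index, List<Doc> docs, boolean manageDocumentIds) {
        List<String> lines = new ArrayList<>();
        for (Doc doc : docs) {
            lines.add(actionLine(index, doc, manageDocumentIds));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(new Doc("a1", "hello"), new Doc("b2", "world"));
        actionLines("docs", docs, true).forEach(System.out::println);
        actionLines("docs", docs, false).forEach(System.out::println);
    }
}
```

With `manageDocumentIds=true` each action line carries an `_id`; with `false` the field is absent, which is the request shape the summary describes as compatible with AWS OpenSearch Serverless vector collections.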

Usage

OpenSearchVectorStore store = OpenSearchVectorStore
    .builder(openSearchClient, embeddingModel)
    .initializeSchema(true)
    .manageDocumentIds(false) // AWS OpenSearch Serverless compatible
    .build();

Testing

Unit tests

  • File: OpenSearchVectorStoreTest
  • Verifies:
    • manageDocumentIds=true: BulkRequest contains explicit IDs
    • manageDocumentIds=false: BulkRequest omits IDs (auto-generated)
    • Single and multiple document cases
    • Embedding model error propagation

Run:

./mvnw -pl vector-stores/spring-ai-opensearch-store -Dtest=OpenSearchVectorStoreTest test

Integration tests

  • File: OpenSearchVectorStoreIT
  • Environment: Testcontainers OpenSearch + OpenAiEmbeddingModel
  • Verifies:
    • manageDocumentIds=false: indexing/search without explicit IDs (AWS Serverless compatible)
    • manageDocumentIds=true: explicit IDs and delete-by-ID
    • Indexing, similarity search, and content/metadata preservation

Run:

./mvnw -pl vector-stores/spring-ai-opensearch-store -am -Dtest=OpenSearchVectorStoreIT test

Caveats and compatibility

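  • With manageDocumentIds=false, OpenSearch auto-generates the index-level ID, which generally differs from the Document's own ID and is not known to the caller in advance. ID-based deletion may therefore be limited; prefer filter-based deletion in this mode.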
  • Existing behavior (explicit IDs) is preserved when manageDocumentIds=true.
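
The caveat about deletion can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the store's implementation: `DeleteModeSketch`, `FakeIndex`, and `deleteByMetadata` are hypothetical names, the model assumes delete-by-ID looks up the index-level ID, and a random UUID stands in for whatever ID OpenSearch would generate.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

public class DeleteModeSketch {

    /** Hypothetical document: its own ID plus flat string metadata. */
    record Doc(String id, Map<String, String> metadata) {}

    /** Hypothetical in-memory index keyed by the index-level ID. */
    static final class FakeIndex {
        private final Map<String, Doc> byIndexId = new LinkedHashMap<>();
        private final boolean manageDocumentIds;

        FakeIndex(boolean manageDocumentIds) {
            this.manageDocumentIds = manageDocumentIds;
        }

        void add(Doc doc) {
            // Assumption: with managed IDs the key is the document's ID;
            // otherwise the index assigns an unrelated key (modelled by a UUID).
            String key = manageDocumentIds ? doc.id() : UUID.randomUUID().toString();
            byIndexId.put(key, doc);
        }

        /** Returns true only if the index-level ID matches. */
        boolean deleteById(String id) {
            return byIndexId.remove(id) != null;
        }

        /** Removes every document whose metadata has the given key/value; returns the count removed. */
        int deleteByMetadata(String key, String value) {
            int before = byIndexId.size();
            byIndexId.values().removeIf(d -> value.equals(d.metadata().get(key)));
            return before - byIndexId.size();
        }

        int size() {
            return byIndexId.size();
        }
    }

    public static void main(String[] args) {
        FakeIndex auto = new FakeIndex(false);
        auto.add(new Doc("doc-1", Map.of("source", "faq")));
        System.out.println("delete by ID hit: " + auto.deleteById("doc-1"));
        System.out.println("deleted by filter: " + auto.deleteByMetadata("source", "faq"));
    }
}
```

In this model, a delete by the document's own ID misses when the index assigned the ID, while a delete that matches on stored metadata still finds the document. That is the reasoning behind preferring filter-based deletion in this mode.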

Related PR and Next Steps

I believe the failing integration tests in this PR are related to the schema and search path improvements being introduced in PR #1121 (Enhanced OpenSearchVectorStore - Squashed).

To move forward, I see two potential paths:

  1. If PR #1121 is merged first, I will rebase my changes on top of it and resolve any conflicts or test failures.
  2. Alternatively, I can pull the necessary changes from PR #1121 into this PR to fix the integration tests directly.

Please let me know which approach you prefer. I'm happy to proceed with either option to get this resolved.


Related issue

…ITs; AWS Serverless compat.

- Update OpenSearchVectorStore#doAdd to omit explicit document IDs when manageDocumentIds=false, enabling AWS OpenSearch Serverless compatibility
- Add unit tests for document ID management logic in doAdd
- Add integration tests covering explicit/non-explicit ID modes and delete-by-ID behavior

Closes gh-3818

Signed-off-by: sanghun <[email protected]>