OpenSearch: Omit explicit document IDs when manageDocumentIds=false for AWS Serverless (issue - #3818) #4220

lsh1215 · 2025-08-22T09:35:55Z

OpenSearch: Document ID management for AWS OpenSearch Serverless (manageDocumentIds)

Summary

AWS OpenSearch Serverless vector collections do not allow indexing with custom document IDs (see issue #3818, "Document ID is not supported when adding embeddings to AWS OpenSearch").
OpenSearchVectorStore#doAdd(List<Document>) was updated to make document ID handling configurable. When manageDocumentIds=false, the index request omits the ID so that OpenSearch auto-generates it.
The change is verified with unit and integration tests.

Background

Error observed: "Document ID is not supported in create/index operation request".
Root cause: AWS OpenSearch Serverless (time series/vector collections) disallows custom document IDs and upserts.
Goal: Allow clients to opt out of explicit IDs so OpenSearch can auto-generate them during indexing.

Changes

File: org.springframework.ai.vectorstore.opensearch.OpenSearchVectorStore
- Method: doAdd(List<Document> documents)
- Behavior:
  - manageDocumentIds=true (default): index with explicit IDs (backward-compatible)
  - manageDocumentIds=false: omit ID so that OpenSearch auto-generates it

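// doAdd excerpt
if (this.manageDocumentIds) {
    // Default: index under the document's own ID (existing behavior).
    bulkRequestBuilder.operations(op -> op
        .index(idx -> idx.index(this.index).id(openSearchDocument.id()).document(openSearchDocument)));
}
else {
    // No ID in the request, so OpenSearch generates one (needed for AWS OpenSearch Serverless).
    bulkRequestBuilder.operations(op -> op
        .index(idx -> idx.index(this.index).document(openSearchDocument)));
}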
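
For readers without the OpenSearch client on hand, the branch above can be illustrated with plain Java. This is only a minimal sketch: `Doc`, `IndexOp`, and `buildOperations` are hypothetical stand-ins invented for this example, not Spring AI or OpenSearch client types. An empty `Optional` models "no ID in the request".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class ManageDocumentIdsSketch {

    // Hypothetical stand-in for a document to be indexed (not a Spring AI type).
    record Doc(String id, String content) {}

    // Hypothetical stand-in for one bulk index operation.
    // An empty id means "let the server generate the _id".
    record IndexOp(String index, Optional<String> id, Doc document) {}

    // Mirrors the branch in doAdd: attach the id only when the client manages ids.
    static List<IndexOp> buildOperations(String index, List<Doc> docs, boolean manageDocumentIds) {
        List<IndexOp> ops = new ArrayList<>();
        for (Doc doc : docs) {
            Optional<String> id = manageDocumentIds ? Optional.of(doc.id()) : Optional.empty();
            ops.add(new IndexOp(index, id, doc));
        }
        return ops;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(new Doc("a1", "hello"), new Doc("b2", "world"));
        for (IndexOp op : buildOperations("my-index", docs, true)) {
            System.out.println("manageDocumentIds=true  -> id " + op.id().orElse("<auto>"));
        }
        for (IndexOp op : buildOperations("my-index", docs, false)) {
            System.out.println("manageDocumentIds=false -> id " + op.id().orElse("<auto>"));
        }
    }
}
```

The point of the sketch is that the two modes differ only in whether the ID is attached to the operation; index name and document payload are identical in both.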

Usage

OpenSearchVectorStore store = OpenSearchVectorStore
    .builder(openSearchClient, embeddingModel)
    .initializeSchema(true)
    .manageDocumentIds(false) // AWS OpenSearch Serverless compatible
    .build();

Testing

Unit tests

File: OpenSearchVectorStoreTest
Verifies:
- manageDocumentIds=true: BulkRequest contains explicit IDs
- manageDocumentIds=false: BulkRequest omits IDs (auto-generated)
- Single and multiple document cases
- Embedding model error propagation

Run:

./mvnw -pl vector-stores/spring-ai-opensearch-store -Dtest=OpenSearchVectorStoreTest test

Integration tests

File: OpenSearchVectorStoreIT
Environment: Testcontainers OpenSearch + OpenAiEmbeddingModel
Verifies:
- manageDocumentIds=false: indexing/search without explicit IDs (AWS Serverless compatible)
- manageDocumentIds=true: explicit IDs and delete-by-ID
- Indexing, similarity search, and content/metadata preservation

Run:

./mvnw -pl vector-stores/spring-ai-opensearch-store -am -Dtest=OpenSearchVectorStoreIT test

Caveats and compatibility

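With manageDocumentIds=false, OpenSearch auto-generates IDs, so the stored _id no longer matches the ID of the Document that was added. Deleting by those document IDs is therefore not expected to match anything; prefer filter-based deletion in this mode.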
Existing behavior (explicit IDs) is preserved when manageDocumentIds=true.
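
The deletion caveat can be sketched with a small in-memory simulation. It assumes nothing about the real OpenSearch or Spring AI APIs: `AutoIdDeletionSketch` and its methods are hypothetical, and a `HashMap` stands in for the index. It only shows why a caller-side ID stops being useful once the server picks the ID, while a metadata match still works.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class AutoIdDeletionSketch {

    // Hypothetical in-memory stand-in for an index: stored _id -> document metadata.
    private final Map<String, Map<String, String>> index = new HashMap<>();

    // Store under the caller's id (manageDocumentIds=true) or a generated one (false).
    String add(String clientId, Map<String, String> metadata, boolean manageDocumentIds) {
        String storedId = manageDocumentIds ? clientId : UUID.randomUUID().toString();
        index.put(storedId, metadata);
        return storedId;
    }

    // ID-based deletion: only removes entries whose stored _id is known to the caller.
    int deleteByIds(List<String> ids) {
        int removed = 0;
        for (String id : ids) {
            if (index.remove(id) != null) {
                removed++;
            }
        }
        return removed;
    }

    // Filter-based deletion: matches on metadata, independent of the stored _id.
    int deleteByMetadata(String key, String value) {
        int before = index.size();
        index.values().removeIf(meta -> value.equals(meta.get(key)));
        return before - index.size();
    }

    int size() {
        return index.size();
    }

    public static void main(String[] args) {
        AutoIdDeletionSketch store = new AutoIdDeletionSketch();
        store.add("doc-1", Map.of("source", "faq"), false);
        // The caller only knows "doc-1", which is not the stored id in this mode.
        System.out.println("deleted by client id: " + store.deleteByIds(List.of("doc-1")));
        System.out.println("deleted by metadata:  " + store.deleteByMetadata("source", "faq"));
    }
}
```

In practice this suggests tagging documents with metadata that identifies them (a source name, a batch key) before indexing with manageDocumentIds=false, so that they remain addressable later.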

Related PR and Next Steps

I believe the failing integration tests in this PR are related to the schema and search path improvements being introduced in PR #1121 (Enhanced OpenSearchVectorStore - Squashed).

To move forward, I see two potential paths:

- If PR #1121 is merged first, I will rebase my changes on top of it and resolve any conflicts or test failures.
- Alternatively, I can pull the necessary changes from PR #1121 into this PR to fix the integration tests directly.

Please let me know which approach you prefer. I'm happy to proceed with either option to get this resolved.

Related issue

See issue #3818: Document ID is not supported when adding embeddings to AWS OpenSearch.

…ITs; AWS Serverless compat.

- Update OpenSearchVectorStore#doAdd to omit explicit document IDs when manageDocumentIds=false, enabling AWS OpenSearch Serverless compatibility
- Add unit tests for document ID management logic in doAdd
- Add integration tests covering explicit/non-explicit ID modes and delete-by-ID behavior

Closes spring-projectsgh-3818
Signed-off-by: sanghun <[email protected]>
