Configurable Inference timeout during Query time #131551

Samiul-TheSoccerFan
Contributor

This PR introduces a user-configurable inference timeout setting and uses it as the timeout for query-time inference calls. Currently, the timeout is hardcoded to 10s; the goal is to make it configurable.
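As a rough illustration of moving from a hardcoded value to a configurable one, here is a minimal Java sketch. It assumes a single, dynamically updatable timeout that starts at the previous 10s default. The class name `InferenceTimeoutSetting` and the positive-value check are hypothetical and are not the setting implementation added in this PR.

```java
import java.time.Duration;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical holder for a dynamically updatable inference timeout.
 * Illustration only; this is not the setting class added in this PR.
 */
final class InferenceTimeoutSetting {
    // The value that was previously hardcoded for inference calls.
    static final Duration DEFAULT_TIMEOUT = Duration.ofSeconds(10);

    // AtomicReference so that updates are visible to reader threads.
    private final AtomicReference<Duration> current = new AtomicReference<>(DEFAULT_TIMEOUT);

    /** Applies a user-supplied value; the positive-only check is an assumption. */
    void update(Duration newTimeout) {
        Objects.requireNonNull(newTimeout, "newTimeout");
        if (newTimeout.isZero() || newTimeout.isNegative()) {
            throw new IllegalArgumentException("timeout must be positive: " + newTimeout);
        }
        current.set(newTimeout);
    }

    /** Timeout to use for the next inference call. */
    Duration get() {
        return current.get();
    }
}
```

Until a user changes the setting, callers keep getting the old 10s behavior. After an update, the next call picks up the new value without a restart.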

Setup

PUT _inference/sparse_embedding/my-elser-model
{
  "service": "elser",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  },
  "task_settings": {}
}

PUT my-semantic-index-5
{
  "mappings": {
    "properties": {
      "writer": {
        "type": "semantic_text",
        "inference_id": "my-elser-model"
      },
      "reader": {
        "type": "semantic_text"
      }
    }
  }
}

PUT my-semantic-index-6
{
  "mappings": {
    "properties": {
      "writer": {
        "type": "semantic_text",
        "inference_id": "my-elser-model"
      },
      "reader": {
        "type": "semantic_text"
      }
    }
  }
}

POST my-semantic-index-5/_doc/1
{
  "writer": "Little Red Riding Hood",
  "reader": ["inference test", "another inference test"]
}

POST my-semantic-index-5/_doc/2
{
  "writer": "Another Little Red Riding Hood",
  "reader": ["inference test", "another inference test"]
}

POST my-semantic-index-5/_doc/3
{
  "writer": "Another Little Red Riding Hood",
  "reader": ["inference test", "another inference test"]
}

POST my-semantic-index-6/_doc/1
{
  "writer": "Little Red Riding Hood",
  "reader": ["inference test", "another inference test"]
}

POST my-semantic-index-6/_doc/2
{
  "writer": "Another Little Red Riding Hood",
  "reader": ["inference test", "another inference test"]
}

POST my-semantic-index-6/_doc/3
{
  "writer": "Another Little Red Riding Hood",
  "reader": ["inference test", "another inference test"]
}

Get the default settings:

GET /my-semantic-index-5/_settings

GET /my-semantic-index-5/_settings?include_defaults=true

GET /my-semantic-index-6/_settings

GET /my-semantic-index-6/_settings?include_defaults=true

Update the inference timeout value:

PUT /my-semantic-index-6/_settings
{
  "index": {
    "semantic_text": {
      "inference_timeout": "1s"
    }
  }
}
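The `"1s"` value above is a time string. The sketch below is a simplified, hypothetical parser (`SimpleTimeValue`) showing how such a string might map to a `java.time.Duration`. It handles only `ms`, `s`, `m`, `h`, and `d`, and it is not Elasticsearch's own time-value parser.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Simplified, hypothetical parser for time strings such as "1s" or "500ms".
 * Only a subset of units is supported; this is not Elasticsearch's parser.
 */
final class SimpleTimeValue {
    // "ms" is listed before "m" and "s" so that "500ms" matches as milliseconds.
    private static final Pattern FORMAT = Pattern.compile("(\\d+)(ms|s|m|h|d)");

    static Duration parse(String value) {
        Matcher m = FORMAT.matcher(value.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("unsupported time value: " + value);
        }
        long amount = Long.parseLong(m.group(1));
        switch (m.group(2)) {
            case "ms": return Duration.ofMillis(amount);
            case "s":  return Duration.ofSeconds(amount);
            case "m":  return Duration.ofMinutes(amount);
            case "h":  return Duration.ofHours(amount);
            case "d":  return Duration.ofDays(amount);
            default:   throw new IllegalStateException("unreachable: regex limits units");
        }
    }
}
```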

Get the updated settings:

GET /my-semantic-index-5/_settings

GET /my-semantic-index-5/_settings?include_defaults=true

GET /my-semantic-index-6/_settings

GET /my-semantic-index-6/_settings?include_defaults=true

Samiul-TheSoccerFan added the >enhancement, :SearchOrg/Relevance, Team:Search - Relevance, and v9.2.0 labels Jul 18, 2025
elasticsearchmachine added the Team:SearchOrg label Jul 18, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/search-eng (Team:SearchOrg)

@elasticsearchmachine
Collaborator

Pinging @elastic/search-relevance (Team:Search - Relevance)

@elasticsearchmachine
Collaborator

Hi @Samiul-TheSoccerFan, I've created a changelog YAML for you.

@Samiul-TheSoccerFan
Contributor Author

@Mikep86 Do we need to ping the ML team in the PR too?

@Mikep86
Contributor

Mikep86 commented Jul 18, 2025

@Samiul-TheSoccerFan Yes, we should ping the ML team since this touches code they own.

Samiul-TheSoccerFan added the Team:ML and :ml labels Jul 18, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

Samiul-TheSoccerFan requested review from a team July 18, 2025 17:53
Contributor

jonathan-buttner left a comment

I left a comment, if you could take a look that'd be great!

Member

kderusso left a comment

The change looks good to me, but some more tests need to be updated.

Contributor

jonathan-buttner left a comment

I left a few suggestions.

Contributor

Mikep86 left a comment

Partial review: I think we have unhandled edge cases and potentially divergent default values to manage.

Comment on lines 77 to 79
if (timeout == null) {
    timeout = clusterService.getClusterSettings().get(InferencePlugin.INFERENCE_QUERY_TIMEOUT);
}
Contributor

We only want to apply this timeout if the input type is SEARCH or INTERNAL_SEARCH. This brings up another edge case: if we now allow timeout to be null, we need to set default timeouts for the other input types as well.
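Here is a minimal sketch of the rule described in this comment. It assumes three things: an explicit timeout always wins, the configured query timeout applies only to `SEARCH` and `INTERNAL_SEARCH`, and every other input type keeps the old 10s default. The `InputType` enum is a local stand-in with a subset of values, and `InferenceTimeoutResolver` is hypothetical. The real default for non-search input types may differ.

```java
import java.time.Duration;
import java.util.Objects;

/** Local stand-in for an inference input-type enum; values chosen for illustration. */
enum InputType { INGEST, INTERNAL_INGEST, SEARCH, INTERNAL_SEARCH, UNSPECIFIED }

/**
 * Hypothetical resolver: an explicit timeout wins; otherwise the configured
 * query timeout applies only to search input types, and all other input
 * types fall back to their own default.
 */
final class InferenceTimeoutResolver {
    // Assumption: non-search calls keep the previously hardcoded 10s default.
    static final Duration NON_SEARCH_DEFAULT = Duration.ofSeconds(10);

    private final Duration configuredQueryTimeout;

    InferenceTimeoutResolver(Duration configuredQueryTimeout) {
        this.configuredQueryTimeout = Objects.requireNonNull(configuredQueryTimeout);
    }

    Duration resolve(Duration explicitTimeout, InputType inputType) {
        if (explicitTimeout != null) {
            return explicitTimeout; // caller-supplied timeout always wins
        }
        switch (inputType) {
            case SEARCH:
            case INTERNAL_SEARCH:
                return configuredQueryTimeout; // the user-configurable query timeout
            default:
                return NON_SEARCH_DEFAULT; // ingest and other input types
        }
    }
}
```

Keeping the non-search default separate means changing the query timeout setting cannot silently shorten ingest-time inference calls.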

@Samiul-TheSoccerFan
Contributor Author

@elasticmachine update branch

Contributor

Mikep86 left a comment

Got to everything except SageMakerServiceTests. I will take a look at those in a follow-up review.

Contributor

Mikep86 left a comment

And a review of SageMakerServiceTests :)

@Samiul-TheSoccerFan
Contributor Author

@elasticmachine update branch

Member

davidkyle left a comment

I have a refactor PR that will probably cause a merge conflict with this PR.

In #131759, the clusterService member is removed from all the Inference Service implementations because IntelliJ was reporting it as unused. @Samiul-TheSoccerFan, do you need the clusterService here? I'm happy to close my PR without merging. Otherwise, if you only need it for the ElasticsearchInternalService, I can work around that.

@Samiul-TheSoccerFan
Contributor Author

@davidkyle Yes, we do need to pass the clusterService to all services so it becomes available to their superclasses (SageMaker and SenderService). The unused warning will hopefully go away once we merge this PR.
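The sketch below shows why every service needs the dependency even when it never touches it directly. It is a hypothetical Java example in which a shared base class, standing in for something like SenderService, reads the timeout through an injected `Supplier<Duration>`. A `Supplier` replaces `ClusterService` to keep the example self-contained. `BaseInferenceService` and `ExampleInferenceService` are illustrative names only.

```java
import java.time.Duration;
import java.util.function.Supplier;

/**
 * Hypothetical shared base class. It reads the current query timeout through
 * an injected supplier, so every subclass must pass the dependency up via
 * its constructor.
 */
abstract class BaseInferenceService {
    private final Supplier<Duration> queryTimeout;

    protected BaseInferenceService(Supplier<Duration> queryTimeout) {
        this.queryTimeout = queryTimeout;
    }

    /** Read on every call, so runtime setting updates are picked up. */
    protected final Duration currentQueryTimeout() {
        return queryTimeout.get();
    }
}

/** Hypothetical concrete service: it only forwards the supplier to its superclass. */
final class ExampleInferenceService extends BaseInferenceService {
    ExampleInferenceService(Supplier<Duration> queryTimeout) {
        super(queryTimeout);
    }

    Duration timeoutForNextCall() {
        return currentQueryTimeout();
    }
}
```

The subclass holds no copy of the dependency; it only forwards it. A subclass that kept its own unused field alongside the forwarded one is the pattern an IDE would flag as unused.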

Contributor

Mikep86 left a comment

One little thing to fix up, then we're good to go 👍

@Samiul-TheSoccerFan
Contributor Author

@elasticmachine update branch

@Samiul-TheSoccerFan
Contributor Author

@elasticmachine update branch

Contributor

Mikep86 left a comment

LGTM!

Samiul-TheSoccerFan merged commit e28de98 into elastic:main Jul 28, 2025
33 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Jul 29, 2025
…-tracking

* upstream/main: (26 commits)
  Add release notes for v9.1.0 release (elastic#131953)
  Unmute multi_node generative tests (elastic#132021)
  Avoid re-enqueueing merge tasks (elastic#132020)
  Fix file entitlements for shared data dir (elastic#131748)
  ES|QL brute force l2_norm vector function (elastic#132025)
  Make ES|QL SAMPLE not a pipeline breaker (elastic#132014)
  Speed up tail computation in MemorySegmentES91OSQVectorsScorer (elastic#132001)
  Remove deprecated usages in `TransportPutFollowAction` (elastic#132038)
  Simulate impact of shard movement using shard-level write load (elastic#131406)
  Remove RemoteClusterService.getConnections() method (elastic#131948)
  Fix off by one in ValuesBytesRefAggregator (elastic#132032)
  Use unicode strings in data generation by default (elastic#132028)
  Adding index.refresh_interval as a data stream setting (elastic#131482)
  [ES|QL] Add more Min/MaxOverTime CSV tests (elastic#131070)
  Restrict remote ENRICH after FORK (elastic#131945)
  Fix decoding of non-ascii field names in ignored source (elastic#132018)
  [docs] Use centrally maintained version variables (elastic#131939)
  Configurable Inference timeout during Query time (elastic#131551)
  ESQL: Allow pruning columns added by InlineJoin (elastic#131204)
  ESQL: Fail `profile` on text response formats (elastic#128627)
  ...
Labels
>enhancement, :ml, :SearchOrg/Relevance, Team:ML, Team:Search - Relevance, Team:SearchOrg, v9.2.0