@Supplementing Supplementing commented Oct 14, 2025

Summary

Closes https://github.com/elastic/ingest-dev/issues/5992

  • Added error handling for all Fleet endpoints that use ES. ES errors now bubble up and are returned with the status code from the ES response. This stops errors from being reported as 500 when they are actually something else; we just weren't handling them properly at the top level.
  • Also added test coverage to verify that the handler throws or succeeds when it's supposed to, and added some simple integration tests to verify that the entire flow works as intended.
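
As a rough illustration of the behaviour described in the summary (not the code in this PR), a top-level handler could read the status code off an ES client error and fall back to 500 only when none is present. The `EsLikeError` shape, the `toErrorResponse` helper, and the error message are hypothetical, chosen for this sketch; it assumes the thrown error exposes a numeric `statusCode`.

```typescript
// Hypothetical shape of an error thrown by an ES client call.
// Assumption: the error carries the HTTP status that ES returned.
interface EsLikeError extends Error {
  statusCode?: number;
}

interface ErrorResponse {
  statusCode: number;
  body: { message: string };
}

// Map any thrown value to an HTTP response, preserving the ES status
// code when one is available and falling back to 500 otherwise.
function toErrorResponse(err: unknown): ErrorResponse {
  if (err instanceof Error) {
    const status = (err as EsLikeError).statusCode;
    if (typeof status === 'number' && status >= 400 && status <= 599) {
      return { statusCode: status, body: { message: err.message } };
    }
    return { statusCode: 500, body: { message: err.message } };
  }
  return { statusCode: 500, body: { message: 'Unknown error' } };
}

// Usage: an ES-style 400 stays a 400 instead of being reported as a 500.
const esError: EsLikeError = Object.assign(new Error('invalid sort field'), {
  statusCode: 400,
});
const response = toErrorResponse(esError);
console.log(response.statusCode);
```

The point of the sketch is the fallback order: a status code supplied by ES wins, and 500 is used only when nothing more specific is known.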

Checklist

Check that the PR satisfies the following conditions.

Reviewers should verify this PR satisfies this list as well.

  • Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
  • Documentation was added for features that require explanation or tutorials
  • Unit or functional tests were updated or added to match the most common scenarios
  • If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the docker list
  • This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The release_note:breaking label should be applied in these situations.
  • Flaky Test Runner was used on any tests changed
  • The PR description includes the appropriate Release Notes section, and the correct release_note:* label is applied per the guidelines
  • Review the backport guidelines and apply applicable backport:* labels.

Identify risks

Does this PR introduce any risks? For example, consider risks like hard-to-test bugs, performance regressions, or potential data loss.

Describe the risk, its severity, and mitigation for each identified risk. Invite stakeholders and evaluate how to proceed before merging.

@Supplementing Supplementing requested a review from a team as a code owner October 14, 2025 21:39
@Supplementing Supplementing added release_note:skip, backport:skip, Team:Fleet labels Oct 14, 2025
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@Supplementing
Contributor Author

@elasticmachine merge upstream

Contributor

@jen-huang jen-huang left a comment

I'm not loving this approach. It's extremely specific to this Fleet API & this type of ES error.

I can see from testing a direct ES query that this kind of error already returns 400 from ES. Example curl:

curl -X GET -H "Content-Type: application/json" -d '{"sort":[{"invalid_field":{"order":"desc"}}]}' "http://elastic:changeme@localhost:9200/.kibana_ingest/_search" -v

Can we explore a solution where the status code & error are just passed directly from ES? Even better if we can handle it generically for all agent APIs (since they access ES directly instead of with soClient).

@Supplementing
Contributor Author

> I'm not loving this approach. It's extremely specific to this Fleet API & this type of ES error.
>
> Can we explore a solution where the status code & error are just passed directly from ES? Even better if we can handle it generically for all agent APIs (since they access ES directly instead of with soClient).

Yeah, I like the idea of making it less specific and more inclusive. I think the issue comes from that direct usage of ES and from errors not being handled or bubbled up correctly. Let me do some digging to see what needs to be done to fix it higher up and across all the APIs.

@Supplementing
Contributor Author

Supplementing commented Oct 16, 2025

I decided to remove the specific check on the handler level and instead extended the FleetError class with a new FleetElasticsearchValidationError that will get caught at the top level for all the endpoints if an error is thrown directly by ES.

I also kept some of the transformation logic as the ES errors can be very generic at the top level with things like 'all shards failed', which is not super helpful to determine the root cause. This approach returns nested errors a bit better.

Never mind, I removed the opinionated casting, and the ES errors now simply bubble up as intended.

This can be tested with the sortField query param on the GET api/fleet/agents endpoint, and also on other endpoints such as GET api/fleet/package_policies. They should all handle ES errors gracefully now. cc: @jen-huang
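
For readers following along, the idea of a `FleetError` subclass that is caught at the top level for every endpoint could look roughly like the sketch below. Only the two class names come from this thread; the constructor signatures, the `statusCode` field, and the `withErrorHandling` wrapper are assumptions made for illustration and are not taken from the Kibana codebase.

```typescript
// Base error type. The name comes from the thread; the shape is assumed.
class FleetError extends Error {
  constructor(message: string) {
    super(message);
    this.name = new.target.name;
    // Keep instanceof working when compiled to older JS targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Assumed shape: carries the status code that ES returned.
class FleetElasticsearchValidationError extends FleetError {
  constructor(message: string, public readonly statusCode: number) {
    super(message);
  }
}

type Handler<T> = () => Promise<T>;
type Result<T> = { status: number; body: T | { message: string } };

// Hypothetical top-level wrapper shared by all route handlers: errors
// that carry an ES status keep it, everything else becomes a 500.
async function withErrorHandling<T>(handler: Handler<T>): Promise<Result<T>> {
  try {
    return { status: 200, body: await handler() };
  } catch (err) {
    if (err instanceof FleetElasticsearchValidationError) {
      return { status: err.statusCode, body: { message: err.message } };
    }
    const message = err instanceof Error ? err.message : 'Unknown error';
    return { status: 500, body: { message } };
  }
}

// Usage: a handler that fails with an ES validation error yields a 400.
async function demo(): Promise<number> {
  const result = await withErrorHandling<string[]>(async () => {
    throw new FleetElasticsearchValidationError('invalid sort field', 400);
  });
  return result.status;
}
```

Handling the error once in a shared wrapper, rather than in each handler, is what makes the behaviour generic across endpoints.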

Contributor

@jen-huang jen-huang left a comment

A few cleanup areas, will come back to do some testing

@Supplementing
Contributor Author

@elasticmachine merge upstream

@Supplementing Supplementing enabled auto-merge (squash) October 17, 2025 14:58
@elasticmachine
Contributor

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #48 / Package policies Package Policy - upgrade when upgrading to a version where a variable has been removed "after each" hook for "successfully upgrades package policy"
  • [job] [logs] FTR Configs #28 / task_manager scheduling and running tasks should update schedule for existing task when calling ensureScheduled with a different schedule

Metrics [docs]

✅ unchanged

Contributor

@jen-huang jen-huang left a comment

Small nit about the name, but otherwise LGTM. Ty for the changes!

@Supplementing Supplementing merged commit 00cb862 into elastic:main Oct 17, 2025
12 checks passed
@jen-huang
Contributor

jen-huang commented Oct 17, 2025

I didn't realize this had auto-merge on. @Supplementing see above about the naming ^

@Supplementing
Contributor Author

> I didn't realize this had auto-merge on. @Supplementing see above about the naming ^

See quick PR here to rectify: #239640

nickpeihl pushed a commit to nickpeihl/kibana that referenced this pull request Oct 23, 2025
…pi/fleet/agents` (elastic#239017)

## Summary

Closes elastic/ingest-dev#5992

- Added error handling for all Fleet endpoints that use ES. ES errors
now bubble up and are returned with the status code from the ES
response. This stops errors from being reported as `500` when they are
actually something else; we just weren't handling them properly at the
top level.
- Also added test coverage to verify that the handler throws or
succeeds when it's supposed to, and added some simple integration
tests to verify that the entire flow works as intended.

<img width="811" height="236" alt="image"
src="https://github.com/user-attachments/assets/dbc3b831-6984-427a-9762-399ad3b8e1b9"
/>



### Checklist

Check that the PR satisfies the following conditions.

Reviewers should verify this PR satisfies this list as well.

- [ ] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This was checked for breaking HTTP API changes, and any breaking
changes have been approved by the breaking-change committee. The
`release_note:breaking` label should be applied in these situations.
- [ ] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [ ] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [ ] Review the [backport
guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)
and apply applicable `backport:*` labels.

### Identify risks

Does this PR introduce any risks? For example, consider risks like
hard-to-test bugs, performance regressions, or potential data loss.

Describe the risk, its severity, and mitigation for each identified
risk. Invite stakeholders and evaluate how to proceed before merging.

- [ ] [See some risk
examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)
- [ ] ...

---------

Co-authored-by: Elastic Machine <[email protected]>
Co-authored-by: kibanamachine <[email protected]>
NicholasPeretti pushed a commit to NicholasPeretti/kibana that referenced this pull request Oct 27, 2025
…pi/fleet/agents` (elastic#239017)


Labels

  • backport:skip (This PR does not require backporting)
  • release_note:skip (Skip the PR/issue when compiling release notes)
  • Team:Fleet (Team label for Observability Data Collection Fleet team)
  • v9.3.0


4 participants