Faster terms aggregation and field names retrieval #43

alycejenni · 2025-12-30T15:10:13Z

Adds options to iter_terms for random sampling, for use when exact counts aren't important. Sampling is off by default, so everything works as before.
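To illustrate the trade-off, here is a minimal, self-contained sketch of estimating term counts from a random sample. It does not show how iter_terms implements sampling. The function name `estimate_term_counts` and its parameters `sample_probability` and `seed` are hypothetical, and the records are plain Python lists rather than indexed documents.

```python
import random
from collections import Counter
from typing import Dict, Iterable, List, Optional


def estimate_term_counts(
    records: Iterable[List[str]],
    sample_probability: Optional[float] = None,
    seed: int = 0,
) -> Dict[str, float]:
    """Count how many records use each term, optionally from a random sample.

    Hypothetical helper for illustration only; not part of the library.
    """
    if sample_probability is None:
        # Sampling off (the default): every record is counted, so counts are exact.
        return dict(Counter(term for terms in records for term in set(terms)))

    if not 0 < sample_probability <= 1:
        raise ValueError("sample_probability must be in the range (0, 1]")

    rng = random.Random(seed)
    sampled: Counter = Counter()
    for terms in records:
        # Keep each record independently with the given probability.
        if rng.random() < sample_probability:
            sampled.update(set(terms))

    # Scale the sampled counts back up to estimate the full counts.
    return {term: count / sample_probability for term, count in sampled.items()}


records = [["a", "b"]] * 1000 + [["a"]] * 1000
exact = estimate_term_counts(records)
approx = estimate_term_counts(records, sample_probability=0.2)
```

In this sketch the sampled run only inspects about a fifth of the records, and the estimates are close enough to preserve the ranking of the terms, which is the case where sampling is a reasonable choice.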

Also adds a manager method to get a list of field names and types from the mapping, which should be significantly faster than using get_parsed_fields or get_data_fields if counts and rankings don't matter at all.
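As a rough illustration of why reading the mapping is fast, the sketch below walks an Elasticsearch-style mapping dict and yields dotted field names with their types, without running any aggregation. The function `iter_mapping_fields` and the sample mapping are hypothetical and do not reflect the manager method's actual name, signature, or the library's own field naming.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_mapping_fields(
    properties: Dict[str, Any], prefix: str = ""
) -> Iterator[Tuple[str, str]]:
    """Yield (dotted field path, type) pairs from a mapping's "properties" dict.

    Hypothetical helper for illustration only; not part of the library.
    """
    for name, definition in properties.items():
        path = f"{prefix}.{name}" if prefix else name
        children = definition.get("properties")
        if children:
            # Object fields nest their sub-fields under "properties".
            yield from iter_mapping_fields(children, path)
        else:
            # Leaf fields carry a "type"; fall back to "object" if it is absent.
            yield path, definition.get("type", "object")


# A made-up mapping in the shape Elasticsearch returns for an index.
mapping = {
    "properties": {
        "name": {"type": "keyword"},
        "location": {
            "properties": {
                "lat": {"type": "float"},
                "lon": {"type": "float"},
            }
        },
    }
}

fields = list(iter_mapping_fields(mapping["properties"]))
```

Because the mapping records only that a field exists with a given type, this approach cannot say how many records use each field, which matches the note that counts and rankings are unavailable.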

alycejenni added 8 commits December 30, 2025 12:08

feat: allow optional sampling for iter_terms

138b98a

If exact counts aren't important, a random sample gives a decent estimate of fields and their usage significantly faster. Sampling is off by default, so the behaviour is unchanged unless it is enabled.

feat: allow passing kwargs to iter_terms via get fields methods

eb1e5de

test: add test for iter_terms sampling

ffc5160

Tests field ranking over a larger number of random records.

chore: remove docker version

b8fad4f

ci: mount source folder to test volume

a75c550

Avoids having to constantly rebuild containers when developing and running tests repeatedly.

feat: add method to get field names on latest index

eb237fd

Useful for very quickly retrieving a basic list of all fields from the latest index. Returns field type information, but type counts are all set to 1.

fix: remove search and field from iter_terms kwargs

35fdac1

fix: iterate on dict items

46ed3fb

alycejenni merged commit ef41e09 into dev Dec 30, 2025
3 checks passed

alycejenni deleted the ginger/agg-speed branch December 30, 2025 15:28

alycejenni mentioned this pull request Dec 30, 2025

Release 2025-12-30 #44

Merged
