@alycejenni
Member

Adds options to iter_terms for random sampling, which can be used when the exact counts aren't important. Sampling is off by default, so everything works as before.
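
To illustrate the trade-off, here is a minimal in-memory sketch of estimating field usage from a random sample. It is not the iter_terms implementation and does not query an index; the function name `estimate_field_usage` and its `sample_size` and `seed` parameters are hypothetical and exist only for this example.

```python
import random
from collections import Counter


def estimate_field_usage(records, sample_size=None, seed=None):
    """Count how many records contain each field.

    Hypothetical helper for illustration only. With sample_size=None
    (the default) every record is counted, so the counts are exact.
    Otherwise only a random sample is counted and the returned scale
    factor can be used to extrapolate to the full record set.
    """
    records = list(records)
    if sample_size is None or sample_size >= len(records):
        sample = records
    else:
        sample = random.Random(seed).sample(records, sample_size)

    counts = Counter()
    for record in sample:
        # Each record contributes at most 1 to each field it contains.
        counts.update(record.keys())

    scale = len(records) / len(sample) if sample else 0.0
    return counts, scale


if __name__ == "__main__":
    data = [{"name": "a"} if i % 2 else {"name": "a", "height": i} for i in range(100)]
    exact, _ = estimate_field_usage(data)
    sampled, scale = estimate_field_usage(data, sample_size=20, seed=1)
    print("exact:", dict(exact))
    print("estimated:", {field: count * scale for field, count in sampled.items()})
```

The sampled counts are only estimates, so a ranking built from them may swap fields whose usage is close; the exact path remains the default for that reason.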

Also adds a manager method to get a list of field names and types from the mapping, which should be significantly faster than using get_parsed_fields or get_data_fields if counts and rankings don't matter at all.

If the exact counts aren't important, a random sample gives a decent estimate of fields and their usage, and is significantly faster than a full count. The default leaves the behaviour as before (no sampling).
Tests field ranking over a larger number of random records.
Avoids having to constantly rebuild containers when developing and running tests repeatedly.
Useful for very quickly retrieving a basic list of all fields from the latest index. Returns field type information, but type counts are all set to 1.
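
As a rough sketch of the mapping-based approach, the function below walks an Elasticsearch-style mapping `properties` dictionary and returns dotted field names with their types, each with a count of 1. The function name `fields_from_mapping` and the return shape are assumptions made for this example, not the manager method added in this PR.

```python
def fields_from_mapping(properties, prefix=""):
    """Flatten a mapping's "properties" dict into {field_path: {type: 1}}.

    Hypothetical helper for illustration only. No documents are read,
    so real usage counts are unknown; every type count is set to 1.
    """
    fields = {}
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if "properties" in spec:
            # Object field: recurse into its children with a dotted prefix.
            fields.update(fields_from_mapping(spec["properties"], prefix=f"{path}."))
        else:
            # Leaf field: record its declared type with a placeholder count.
            fields[path] = {spec.get("type", "object"): 1}
    return fields


if __name__ == "__main__":
    mapping = {
        "name": {"type": "keyword"},
        "location": {"properties": {"lat": {"type": "float"}, "lon": {"type": "float"}}},
    }
    for field, types in fields_from_mapping(mapping).items():
        print(field, types)
```

Because this reads only the mapping, its cost depends on the number of fields rather than the number of records, which is why it suits cases where counts and rankings don't matter.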
@alycejenni merged commit ef41e09 into dev Dec 30, 2025
3 checks passed
@alycejenni deleted the ginger/agg-speed branch December 30, 2025 15:28
@alycejenni mentioned this pull request Dec 30, 2025