MAGE-1109: Add Batching Optimizer feature #1797

Merged
damcou merged 18 commits into release/3.17.0-dev from feat/MAGE-1109-batching-optimizer on Aug 18, 2025

Conversation

Contributor

@damcou damcou commented Aug 7, 2025

This PR adds the Batching Optimizer CLI command, which runs the following process:

  • For each store that has indexing enabled, scan a sample of products to measure the size of the resulting records. Stores can be restricted with the store_id argument.
  • Build the sample to mirror the catalog's proportions of "simple products" (simple, virtual, downloadable, giftcard) and "complex products" (configurable, bundle, grouped). For example, a sample of 20 products from a catalog made up of 40% simple products and 60% complex products contains 8 simple products and 12 complex products.
  • From the sample, calculate statistics on product record size (max, min, average, standard deviation).
  • From these values, determine the optimal batch size for indexing requests sent to Algolia.
  • Prompt the user to optionally update the "Maximum number of records processed per indexing job" configuration with this value.
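
For illustration only, the process above can be sketched in a few lines of Python. This is not the code from the PR: the function names, the `k` safety factor, and reading the 10 MB limit as 10 * 1024 * 1024 bytes are all assumptions. Using `average + k * standard deviation` as a conservative per-record size is one possible approach, motivated by Chebyshev's inequality.

```python
import math
from statistics import mean, pstdev

# Assumption: the 10 MB batch limit is interpreted as 10 * 1024 * 1024 bytes.
MAX_BATCH_BYTES = 10 * 1024 * 1024


def split_sample(sample_size: int, simple_share: float) -> tuple[int, int]:
    """Split a sample into (simple, complex) counts by catalog share."""
    if not 0.0 <= simple_share <= 1.0:
        raise ValueError("simple_share must be between 0 and 1")
    simple = round(sample_size * simple_share)
    return simple, sample_size - simple


def record_size_stats(sizes_in_bytes: list[int]) -> dict[str, float]:
    """Compute min, max, average and standard deviation of record sizes."""
    if not sizes_in_bytes:
        raise ValueError("sample is empty")
    return {
        "min": min(sizes_in_bytes),
        "max": max(sizes_in_bytes),
        "avg": mean(sizes_in_bytes),
        "stddev": pstdev(sizes_in_bytes),
    }


def estimate_batch_size(
    sizes_in_bytes: list[int],
    k: float = 3.0,
    max_batch_bytes: int = MAX_BATCH_BYTES,
) -> int:
    """Estimate how many records fit in one batch (hypothetical heuristic).

    By Chebyshev's inequality, at most 1/k^2 of records lie more than
    k standard deviations from the mean (about 11% for k = 3),
    whatever the shape of the size distribution.
    """
    stats = record_size_stats(sizes_in_bytes)
    conservative_size = stats["avg"] + k * stats["stddev"]
    if conservative_size <= 0:
        # Guard against division by zero when all sampled sizes are 0.
        raise ValueError("record sizes must be positive")
    return max(1, math.floor(max_batch_bytes / conservative_size))
```

With the example from the description, `split_sample(20, 0.4)` returns `(8, 12)`. The resulting batch size is an estimate only, so indexing activity should still be monitored after the configuration is changed.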

Contributor

@cammonro cammonro left a comment

Love where this is going. I know this is WIP - just a couple of small observations.

@damcou damcou requested a review from cammonro August 13, 2025 12:30
Contributor

@cammonro cammonro left a comment

This is just fantastic work @damcou !! 🙌

I do see some issues, if you wouldn't mind taking a look, and I had a few suggestions.

I also think we should add some language saying that these numbers are estimates only, and that indexing activity should be monitored after making changes to ensure batches do not exceed the recommended size of 10 MB.

@damcou damcou marked this pull request as ready for review August 13, 2025 15:09
@damcou damcou requested a review from cammonro August 13, 2025 15:09
Contributor

@cammonro cammonro left a comment

I added a POC for the issue you mentioned. If you think this can work, then we could do something similar for the margin as well.

Also noted one issue on the division by zero check.

Additional comments in Jira.

@damcou damcou requested a review from cammonro August 14, 2025 10:57
Contributor

@cammonro cammonro left a comment

Chebyshev for the win! 😆

Looking great. Let's go! 🚀

@damcou damcou merged commit b796bb6 into release/3.17.0-dev Aug 18, 2025
4 checks passed
@damcou damcou deleted the feat/MAGE-1109-batching-optimizer branch August 18, 2025 07:50
2 participants