[docs-Infra] Update indexName to 'material-ui-v5' for v5 Search #47049
base: v5.x
… search parameters
Netlify deploy preview: https://deploy-preview-47049--material-ui.netlify.app/
So the alternative would be to keep a single index, but versioned? That is, just create a crawler per version, with every crawler writing to the same index.
@Janpot Why put old data into the main index? The index has page content in it. We can have up to 20 indexes in Algolia, so why create a large monolithic index when it can be partitioned according to usage?
I haven't thought too deeply about it, it was just intuition. But thinking a bit about it:
The index name can just as easily be an ENV variable.
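As a rough illustration of that point, here is a minimal sketch of resolving the index name from the environment. The variable name `DOCS_SEARCH_INDEX_NAME` and the helper `resolveIndexName` are hypothetical and do not come from the repository; the fallback value is the index name from this PR's title.

```typescript
// Hypothetical variable name; the thread does not say which one would be used.
const INDEX_NAME_ENV_VAR = 'DOCS_SEARCH_INDEX_NAME';
// Index name taken from the title of this PR.
const DEFAULT_INDEX_NAME = 'material-ui-v5';

type Env = Record<string, string | undefined>;

// Returns the configured index name, falling back to the default
// when the variable is unset or blank.
function resolveIndexName(env: Env): string {
  const value = env[INDEX_NAME_ENV_VAR]?.trim();
  return value ? value : DEFAULT_INDEX_NAME;
}
```

In a Node.js build this could be called as `resolveIndexName(process.env)`, so each branch or deploy can point at its own index without a code change.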
If we are extending the idea of scaling multiple major versions to new packages, such as Base UI, a 40MB index (the size of today's index) could be considered quite significant and near the limits of fitting into a serverless function.
I think of previous major versions as an archive. They exist on a dedicated branch and receive mostly backports for serious fixes. There are also some logistical hurdles to maintaining a separate branch when packages have decoupled releases (how "v7" of MUI X is frozen to
Major versions are the only time when deprecated features are deliberately removed, so with a new major version, we also get to prune the index. With a monolithic index, each major version significantly increases the index's size. With each major version, we are choosing to remove unhelpful context.
You would expect the latest index to receive a lot of traffic, the previous major version to receive less, and the major versions before that to receive significantly less. This is why lumping them into one index seems monolithic to me.
There is also precedent for splitting the index by version: v4.mui.com uses a separate index and still works correctly today, even though that branch and index probably haven't been touched in a long time.
If indexes combine versions, they become dependent on the crawler or a database to create the index. We can no longer assume that the index is produced from the site content we have checked out in git. An index created for a PR would depend on outside information (or it would work differently from production).
It would make sense for minor versions to share an index. They ideally have more in common with one another and will evolve. Ideally, the content would reference older minors explicitly, e.g. "The checkbox component was added in v0.5.0". Then you could search for all features released in
I think with LLMs it is important to filter out information that might be misleading. My feeling would be that even glancing at content from a previous major could confuse an LLM. A major version is meant to be cohesive, and previous majors won't be considering future capabilities or improvements. For example, maybe a page on the last major suggests using a deprecated function that has since been removed. This recommendation would be deliberately removed in the latest version; however, if you're running the older version, the recommendation remains valid, and removing it would also be incorrect.
If we needed this, I think it would be on a separate page or context from the global docs search. We could create a heavier aggregate index for this case specifically. We could also optimize that index for the particular case: maybe we would exclude the content itself, or maybe we would add more metadata.
Uses a separate index based on https://v5.mui.com/. We remove the `master` filter because the version is set to `v5` everywhere except Toolpad, which has `master` as its version. There are only two versions, so there's little reason to filter.
Adds a new crawler that crawls once a month.
Fix: #45771
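The change described above can be sketched roughly as follows. This is an illustrative model only, not the repository's code: the `SearchConfig` shape, the `buildSearchConfig` helper, the `language` facet, and the previous index name `material-ui` are assumptions. `facetFilters` is a standard Algolia search parameter, and `material-ui-v5` is the index name from the PR title.

```typescript
// Assumed shape of the search configuration, for illustration only.
interface SearchConfig {
  indexName: string;
  searchParameters: { facetFilters: string[] };
}

// Builds a config; the version facet is only added when a version is given.
function buildSearchConfig(
  indexName: string,
  language: string,
  version?: string,
): SearchConfig {
  const facetFilters = [`language:${language}`];
  if (version !== undefined) {
    facetFilters.unshift(`version:${version}`);
  }
  return { indexName, searchParameters: { facetFilters } };
}

// Before (assumed): shared index, filtered to the `master` version.
const before = buildSearchConfig('material-ui', 'en', 'master');
// After: dedicated v5 index, no version filter.
const after = buildSearchConfig('material-ui-v5', 'en');
```

With a dedicated index, dropping the version facet means Toolpad pages (tagged `master`) and the rest of the v5 pages (tagged `v5`) are both searchable from the same index.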