Add robots.txt to optimize AI crawler indexing for MLflow documentation #386
Problem
AI assistants (ChatGPT, Claude, Gemini, etc.) sometimes reference outdated MLflow documentation when answering questions about MLflow features. This happens because AI crawlers index all documentation versions equally, including legacy 1.x and 2.x versions, which leads to users receiving confusing or incorrect information.
Solution
This PR adds a `robots.txt` file to the MLflow website that optimizes AI crawler behavior by:

- Allowing the latest documentation (`/docs/latest/`) to be indexed
- Preventing legacy documentation versions (`/docs/1.*/`, `/docs/2.*/`, `/docs/0.*/`) from being indexed

The robots.txt also includes specific configurations for major AI crawlers; a sketch of the overall rule structure is shown below.
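For illustration, a minimal sketch of the default policy described above, using the paths from this description (the shipped file may differ in detail):

```text
# Default policy: allow only the latest documentation version
User-agent: *
Allow: /docs/latest/
Disallow: /docs/0.*/
Disallow: /docs/1.*/
Disallow: /docs/2.*/
```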
Implementation
The `robots.txt` file is placed in the `website/static/` directory, which Docusaurus automatically copies to the root of the built site. The file follows the standard robots.txt format, with additional specific rules for each major AI crawler to ensure maximum compatibility; an example of those per-crawler sections is sketched below.
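As an illustration, each AI crawler presumably gets its own `User-agent` section repeating the same policy. The crawler names below (GPTBot, ClaudeBot) are real AI-crawler user agents, but which crawlers the file actually lists is an assumption here:

```text
# Illustrative per-crawler sections (actual crawler list may differ;
# remaining Disallow lines omitted for brevity)
User-agent: GPTBot
Allow: /docs/latest/
Disallow: /docs/1.*/

User-agent: ClaudeBot
Allow: /docs/latest/
Disallow: /docs/1.*/
```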
Testing
- Verified the file is served correctly at the `/robots.txt` endpoint
- Added `robots.spec.ts` to validate the robots.txt content (a sketch of such a test appears below)
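A minimal sketch of what such a test could look like, assuming a Playwright-style setup with a `baseURL` pointing at the built site; the actual `robots.spec.ts` in this PR may use a different framework or assertions:

```typescript
// robots.spec.ts — minimal sketch, assuming @playwright/test with a baseURL
// configured for the built Docusaurus site; the real test may differ.
import { test, expect } from '@playwright/test';

test('robots.txt is served and prioritizes the latest docs', async ({ request }) => {
  const response = await request.get('/robots.txt');
  expect(response.ok()).toBeTruthy();

  const body = await response.text();
  // Latest docs allowed, legacy version paths disallowed.
  expect(body).toContain('Allow: /docs/latest/');
  expect(body).toContain('Disallow: /docs/1.');
});
```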
Screenshot
The screenshot shows the robots.txt file being served correctly with all AI crawler configurations.
Impact
After this change is deployed, AI crawlers that respect robots.txt will prioritize the latest MLflow documentation when indexing the site, so users are more likely to receive accurate, up-to-date answers about MLflow features. This addresses the issue where AI assistants sometimes point to older version documentation, causing confusion.
Original prompt
Fixes #385