@tw4l tw4l commented Aug 5, 2025

Fixes #2765

(This is a necessary part of ensuring mongo documents don't exceed 16 MB. Hopefully it's also sufficient, but we'll need to see in practice whether other fields also need to be separated out.)

This PR moves crawl and QA run logs into a separate crawl_logs mongo collection.

It adds a new backend module for the new mongo collection. The module has no distinct API router, but the crawls module is getting quite large, so a separate module seemed to make sense. It also adds a migration to move crawl logs from Crawl objects into the new collection. The existing nightly test for crawl error logs is fleshed out, and a new nightly test is added for behavior logs.
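For readers who want a feel for what such a migration does per document, here is a minimal sketch, not the code from this PR. It assumes logs are stored on the crawl document as lists of lines under the hypothetical keys `errors` and `behaviorLogs`, and that each line becomes its own document keyed by a hypothetical `crawlId` field. The actual field names and schema in the PR may differ.

```python
from typing import Any

# Hypothetical field names: the PR description does not list which keys hold
# the logs. Maps the embedded field name to a log type label.
LOG_FIELDS = {"errors": "error", "behaviorLogs": "behavior"}


def split_crawl_logs(
    crawl: dict[str, Any],
) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Return a copy of the crawl document without embedded logs, plus one
    document per log line destined for the separate logs collection."""
    # Copy everything except the log fields; the input dict is not modified.
    slim = {k: v for k, v in crawl.items() if k not in LOG_FIELDS}

    log_docs: list[dict[str, Any]] = []
    for field, log_type in LOG_FIELDS.items():
        # Treat a missing or null field as an empty list of lines.
        for line in crawl.get(field) or []:
            log_docs.append(
                {"crawlId": crawl["_id"], "type": log_type, "line": line}
            )
    return slim, log_docs
```

In a real migration the returned log documents would be bulk-inserted into the new collection and the log fields unset on the original document. Storing one document per line keeps each document small regardless of how many lines a crawl produces.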

The migration has been tested locally. I've also verified that the new collection's indices are used by the existing crawl error and behavior log endpoints.
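One way to check index usage, sketched here as an illustration rather than as the method used for this PR, is to inspect the output of MongoDB's `explain` for the query an endpoint runs and confirm the winning plan contains an `IXSCAN` stage and no `COLLSCAN` stage. The helper names below are hypothetical. The explain dictionary would come from the driver, for example from a pymongo cursor's `explain()` method.

```python
from typing import Any


def plan_stages(plan: Any) -> set[str]:
    """Collect every "stage" name found anywhere in an explain plan."""
    stages: set[str] = set()
    if isinstance(plan, dict):
        stage = plan.get("stage")
        if isinstance(stage, str):
            stages.add(stage)
        # Recurse into nested plans (inputStage, inputStages, queryPlan, ...).
        for value in plan.values():
            stages |= plan_stages(value)
    elif isinstance(plan, list):
        for item in plan:
            stages |= plan_stages(item)
    return stages


def uses_index(explain_output: dict[str, Any]) -> bool:
    """True if the winning plan has an index scan and no collection scan."""
    winning = explain_output.get("queryPlanner", {}).get("winningPlan", {})
    stages = plan_stages(winning)
    return "IXSCAN" in stages and "COLLSCAN" not in stages
```

Walking the plan recursively, instead of reading a fixed path, tolerates differences in how the plan is nested across server versions.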

@tw4l tw4l requested a review from ikreymer August 5, 2025 19:07
@tw4l tw4l force-pushed the issue-2765-split-off-logs branch from 902a347 to 2ebc399 Compare August 5, 2025 19:07
@ikreymer ikreymer left a comment

Nice work! (I just made some minor changes to remove the unused fields, and left only the out fields, marked as deprecated.)

@ikreymer ikreymer merged commit 0e1634f into main Sep 5, 2025
25 checks passed
@ikreymer ikreymer deleted the issue-2765-split-off-logs branch September 5, 2025 06:10
Development

Successfully merging this pull request may close these issues.

[Task]: Move logs out of crawl documents to ensure documents don't exceed 16 MB Mongo limit
2 participants