Skip to content

Conversation

@ahal
Copy link
Member

@ahal ahal commented Feb 25, 2025

Checklist for reviewer:

  • Commits should reference a bug or github issue, if relevant (if a bug is
    referenced, the pull request should include the bug number in the title)
  • Scan the PR and verify that no changes (particularly to
    .circleci/config.yml) will cause environment variables (particularly
    credentials) to be exposed in test logs
  • Ensure the container image will be using permissions granted to
    telemetry-airflow
    responsibly.

ahal added 2 commits February 25, 2025 16:52
The docstring to `ensure_table` states that if there is a schema
mismatch an exception will be raised. This isn't actually true, I was
under the impression that `create_table(..., exists_ok=True)` would
raise if the schemas were different, but instead it proceeds without
error.
By instantiating the BigQueryLoader in the on_processing_complete
callback, we don't validate the tables until *after* we drain the pulse
queue. This is bad because if there's an error (like a schema mismatch),
we lose all those events forever.

This way the pulse messages don't get consumed, so the data will still
be there for consumption once the issue is fixed.
@ahal ahal requested a review from a team February 25, 2025 22:03
@ahal ahal self-assigned this Feb 25, 2025
@ahal ahal requested a review from gabrielbusta February 25, 2025 22:03
@ahal
Copy link
Member Author

ahal commented Feb 25, 2025

This PR adds some checks that would have prevented the regression I made to the ETL yesterday.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants