Description
We occasionally must increase lock_timeout and statement_timeout for migrations that may take longer on highly active tables.
Here's an example in our migration template:
warehouse/warehouse/migrations/script.py.mako
Lines 28 to 48 in 66b6730
# Note: It is VERY important to ensure that a migration does not lock for a
# long period of time and to ensure that each individual migration does
# not break compatibility with the *previous* version of the code base.
# This is because the migrations will be ran automatically as part of the
# deployment process, but while the previous version of the code is still
# up and running. Thus backwards incompatible changes must be broken up
# over multiple migrations inside of multiple pull requests in order to
# phase them in over multiple deploys.
#
# By default, migrations cannot wait more than 4s on acquiring a lock
# and each individual statement cannot take more than 5s. This helps
# prevent situations where a slow migration takes the entire site down.
#
# If you need to increase this timeout for a migration, you can do so
# by adding:
#
# op.execute("SET statement_timeout = 5000")
# op.execute("SET lock_timeout = 4000")
#
# To whatever values are reasonable for this migration as part of your
# migration.
The behavior here is often that once we see the failure, we:
- either add or increase the values in the migration
- commit, pull request, get approval, merge
- cabotage tries with the new values
This makes the timeline longer than it probably needs to be, especially since picking the values is not an exact science, more of a trial-and-find-out.
Another idea is to always run these statements and use envvars to control their values.
Thus the process can become, once a failure is observed:
- set/increase the environment variable values
- create a new Release in cabotage
This could cut down the time needed to deploy with a new value. The flip side is that I don't think there's a good way today to tell cabotage "if this succeeds, remove the envvar", so the raised values could carry over unintentionally into future releases.
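A minimal sketch of the envvar idea, to make the proposal concrete. The variable names MIGRATION_LOCK_TIMEOUT_MS and MIGRATION_STATEMENT_TIMEOUT_MS and the helper functions are hypothetical, not something that exists in warehouse today. The defaults are the 4s and 5s values quoted in the template above. The helper takes the execute callable as an argument, so a migration would pass in op.execute.

```python
import os

# Defaults mirror the values quoted in the migration template
# (4s to acquire a lock, 5s per statement), in milliseconds.
DEFAULT_LOCK_TIMEOUT_MS = 4000
DEFAULT_STATEMENT_TIMEOUT_MS = 5000


def _timeout_ms(environ, name, default):
    """Read a millisecond timeout from environ, falling back to default."""
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = int(raw)  # raises ValueError on non-numeric input
    if value < 0:
        raise ValueError(f"{name} must be non-negative, got {value}")
    # Note: PostgreSQL treats 0 as "no timeout", so 0 disables the guard.
    return value


def timeout_statements(environ=os.environ):
    """Build the SET statements, using hypothetical envvar names."""
    lock = _timeout_ms(
        environ, "MIGRATION_LOCK_TIMEOUT_MS", DEFAULT_LOCK_TIMEOUT_MS
    )
    statement = _timeout_ms(
        environ, "MIGRATION_STATEMENT_TIMEOUT_MS", DEFAULT_STATEMENT_TIMEOUT_MS
    )
    return [
        f"SET statement_timeout = {statement}",
        f"SET lock_timeout = {lock}",
    ]


def apply_timeouts(execute, environ=os.environ):
    """Run each SET statement through the given execute callable."""
    for sql in timeout_statements(environ):
        execute(sql)
```

Inside a migration's upgrade(), this would be called as apply_timeouts(op.execute) before the schema changes. With neither variable set, the behavior matches the current defaults, so only a Release that sets the variables changes anything.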