MSC4140: finalised delayed events, and more #19038
- Store sent/cancelled/failed delayed events, i.e. finalised delayed events, and support looking them up to inspect whether a delayed event was sent or not. Set limits on how many finalised events to store.
- Support looking up delayed events by ID
- Return 200 when retrying the same action (send/cancel) on an already-finalised delayed event, or 409 for a conflicting action
- Limit how many delayed events a user may have scheduled at a time
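A minimal sketch of the retry rule described above (200 for repeating the same action on a finalised delayed event, 409 for a conflicting one). The names `Action` and `status_for_action` are hypothetical and are not taken from this PR. The description does not say how an event finalised with an error is treated, so that case is left out.

```python
from enum import Enum
from http import HTTPStatus
from typing import Optional


class Action(Enum):
    """Hypothetical names for the two actions a client can request."""

    SEND = "send"
    CANCEL = "cancel"


def status_for_action(requested: Action, finalised_as: Optional[Action]) -> HTTPStatus:
    """Pick a response code for a send/cancel request.

    `finalised_as` is None while the delayed event is still scheduled,
    otherwise it is the action that finalised the event.
    """
    if finalised_as is None:
        # Still scheduled: the action can simply be carried out.
        return HTTPStatus.OK
    if finalised_as is requested:
        # Retrying the action that already finalised the event is a no-op.
        return HTTPStatus.OK
    # The event was finalised by the other action.
    return HTTPStatus.CONFLICT
```

Treating a repeated request as a success keeps the endpoint idempotent, so a client that lost the first response can retry safely.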
| @@ -0,0 +1 @@ | |||
| Add more support for MSC4140, namely the ability to inspect sent, cancelled, or failed delayed events, aka "finalised" delayed events. | |||
For my own reference, what are delayed events being used for?
I thought this was related to VoIP stuff (calls) and the new meta is with sticky events.
Are delayed events going to be deprecated in favor of sticky events?
| @@ -0,0 +1 @@ | |||
| Add more support for MSC4140, namely the ability to inspect sent, cancelled, or failed delayed events, aka "finalised" delayed events. | |||
Are there Complement changes to go along with this?
I see some TestDelayedEvents Complement tests that are failing: https://github.com/element-hq/synapse/actions/runs/18411628889/job/52465384686?pr=19038
> Are there Complement changes to go along with this?
No, not yet. I'll add some soon.
> I see some TestDelayedEvents Complement tests that are failing
That's an unrelated failure, which has been flaky for a frustratingly long time. Maybe now is a good time to try to tackle it again.
It doesn't seem flaky. It has failed all 3 times for both SQLite and Postgres, and the only failing test is TestDelayedEvents.
| ) | ||
|  | ||
| # MSC4140: How many finalised delayed events to keep per user before deleting them. | ||
| self.msc4140_finalised_retention_limit = experimental.get( | 
| self.msc4140_finalised_retention_limit = experimental.get( | |
| self.msc4140_finalised_per_user_retention_limit = experimental.get( | 
| -- See the GNU Affero General Public License for more details: | ||
| -- <https://www.gnu.org/licenses/agpl-3.0.html>. | ||
|  | ||
| CREATE TABLE finalised_delayed_events ( | 
What's the difference between delayed_events.is_processed and the state in finalised_delayed_events?
It looks like finalised_delayed_events holds more info but I'm not immediately seeing the difference.
is_processed is for delayed events that have just timed out and are in the process of being sent / persisted to their target room's DAG.  Only once a delayed event is successfully sent does it get finalised.  (Prior to this PR, it would instead get deleted from the DB.)
The purpose of tracking is_processed is to handle the edge case of the server going down after a delayed event times out, but before it gets sent.  Upon server restart, the sending of any is_processed delayed events will be retried.
I'm admittedly not a big fan of this, but I couldn't find a way to make an atomic action out of a delayed event timing out & being sent.  I also didn't want to fiddle with that in this PR, even if finalised events now have some redundancy with is_processed events.
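To illustrate the restart behaviour described above, here is a small in-memory sketch. It is not the PR's code: `DelayedEvent`, `Store`, and `resume_after_restart` are hypothetical names, and the real implementation works against database tables rather than Python dictionaries.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DelayedEvent:
    delay_id: str
    # Set once the delay has timed out and sending has started.
    is_processed: bool = False


@dataclass
class Store:
    """Hypothetical stand-in for the two tables discussed in this thread."""

    pending: Dict[str, DelayedEvent] = field(default_factory=dict)
    finalised: List[str] = field(default_factory=list)

    def finalise(self, delay_id: str) -> None:
        # Move the event out of the pending set once it has been sent.
        del self.pending[delay_id]
        self.finalised.append(delay_id)


def resume_after_restart(store: Store, send: Callable[[str], None]) -> List[str]:
    """Retry events that timed out before the restart but were never sent."""
    retried = []
    for delay_id, event in list(store.pending.items()):
        if event.is_processed:
            send(delay_id)
            store.finalise(delay_id)
            retried.append(delay_id)
    return retried
```

Events that have not yet timed out keep `is_processed` unset and are left alone, so only the interrupted sends are retried.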
| error bytea, | ||
| event_id TEXT, | ||
| finalised_ts BIGINT NOT NULL, | 
Why not just put this in delayed_events?
The intent was to reduce the number of columns in the delayed_events table, as finalised_ts is relevant only to finalised events.  In general, I wanted to keep any finalised-only columns out of the non-finalised delayed_events table, especially if more finalised-only properties get added later (which would allow a schema update to leave the non-finalised delayed_events table alone).
| -- See the GNU Affero General Public License for more details: | ||
| -- <https://www.gnu.org/licenses/agpl-3.0.html>. | ||
|  | ||
| CREATE TABLE finalised_delayed_events ( | 
Needs a description on what finalised means: "delayed events that have either been sent, cancelled, or were not sent due to an error" (MSC4140)
| for user_localpart in self.db_pool.simple_select_onecol_txn( | ||
| txn, | ||
| "finalised_delayed_events", | ||
| keyvalues={}, | ||
| retcol="DISTINCT(user_localpart)", | ||
| ): | ||
| self._prune_excess_finalised_delayed_events_for_user( | ||
| txn, user_localpart, retention_limit | ||
| ) | 
This is an N+1 query problem.
Can we do this in batches in one big query?
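One possible shape for a single-query version, sketched with Python's `sqlite3` module rather than Synapse's `db_pool` helpers. The table here is a simplified assumption (only `user_localpart` and `finalised_ts`), `prune_excess_finalised` is a hypothetical name, and the use of `rowid` is SQLite-specific; a Postgres version would key on the table's real primary key instead. It relies on the `ROW_NUMBER()` window function, which needs SQLite 3.25 or later.

```python
import sqlite3

# Simplified, assumed schema: the real table has more columns.
SCHEMA = """
CREATE TABLE finalised_delayed_events (
    user_localpart TEXT NOT NULL,
    finalised_ts BIGINT NOT NULL
)
"""


def prune_excess_finalised(conn: sqlite3.Connection, retention_limit: int) -> int:
    """Delete all but the newest `retention_limit` rows per user in one statement.

    Returns the number of rows deleted.
    """
    cur = conn.execute(
        """
        DELETE FROM finalised_delayed_events
        WHERE rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (
                           PARTITION BY user_localpart
                           ORDER BY finalised_ts DESC
                       ) AS rn
                FROM finalised_delayed_events
            )
            WHERE rn > ?
        )
        """,
        (retention_limit,),
    )
    return cur.rowcount
```

Ranking rows per user inside the database avoids one round trip per user, at the cost of scanning the whole table in a single statement.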
| -- See the GNU Affero General Public License for more details: | ||
| -- <https://www.gnu.org/licenses/agpl-3.0.html>. | ||
|  | ||
| CREATE TABLE finalised_delayed_events ( | 
It looks like this separate table is creating the need for a lot of sub-queries (SELECT in SELECT statements), which seems like a code smell.
Dev notes
Delayed events were initially introduced in #17326