@AndrewFerr AndrewFerr commented Oct 10, 2025

MSC4140: finalised delayed events, and more

  • Store sent/cancelled/failed delayed events, i.e. finalised delayed events, and support looking them up to inspect whether a delayed event was sent or not. Set limits on how many finalised events to store.
  • Support looking up delayed events by ID
  • Return 200 when retrying the same action (send/cancel) on an already-finalised delayed event, or 409 for a conflicting action
  • Limit how many delayed events a user may have scheduled at a time
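
A minimal sketch of the retry rule in the third bullet, assuming a finalised delayed event records which action finalised it. The `Outcome` enum and `status_for_action` helper are hypothetical names used only for illustration, not Synapse's API. Returning 200 for a still-scheduled event is also an assumption, and the failed/error case is left out because the description does not say how it responds.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    """How a delayed event was finalised (hypothetical, for illustration)."""

    SENT = "send"
    CANCELLED = "cancel"


def status_for_action(action: str, finalised_as: Optional[Outcome]) -> int:
    """Return the HTTP status for a send/cancel request on a delayed event.

    `finalised_as` is None while the event is still scheduled.
    """
    if action not in ("send", "cancel"):
        raise ValueError(f"unsupported action: {action}")
    if finalised_as is None:
        # Assumption: a still-scheduled event accepts the action normally.
        return 200
    if finalised_as.value == action:
        # Retrying the action that already finalised the event is a no-op.
        return 200
    # The event was already finalised by the other action.
    return 409
```

Under these assumptions, a repeated `send` on an already-sent event maps to 200, while a `cancel` on an already-sent event maps to 409.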

Dev notes

Delayed events were initially introduced in #17326.

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

@AndrewFerr AndrewFerr requested a review from a team as a code owner October 10, 2025 07:46
@@ -0,0 +1 @@
Add more support for MSC4140, namely the ability to inspect sent, cancelled, or failed delayed events, aka "finalised" delayed events.
Contributor

@MadLittleMods MadLittleMods Oct 22, 2025

For my own reference, what are delayed events being used for?

I thought this was related to VoIP stuff (calls) and the new meta is with sticky events.

Are delayed events going to be deprecated in favor of sticky events?

Member Author

Sticky events are not replacing delayed events; they are replacing "owned state", i.e. MSC3757 / MSC3779.

Delayed events are still going to be used by MatrixRTC for scheduling cancellable "leave" events for disconnected clients.

https://github.com/matrix-org/matrix-spec-proposals/blob/toger5/matrixRTC/proposals/4143-matrix-rtc.md#dependencies

@MadLittleMods MadLittleMods requested a review from a team October 22, 2025 17:23
@@ -0,0 +1 @@
Add more support for MSC4140, namely the ability to inspect sent, cancelled, or failed delayed events, aka "finalised" delayed events.
Contributor

Are there Complement changes to go along with this?

I see some TestDelayedEvents Complement tests that are failing: https://github.com/element-hq/synapse/actions/runs/18411628889/job/52465384686?pr=19038

Member Author

> Are there Complement changes to go along with this?

No, not yet. I'll add some soon.

> I see some TestDelayedEvents Complement tests that are failing

That's an unrelated failure, which has been flaky for a frustratingly long time. Maybe now is a good time to try to tackle it again.

Contributor

It doesn't seem flaky. It's failed all 3 times for both SQLite and Postgres. And it's only TestDelayedEvents.

)

# MSC4140: How many finalised delayed events to keep per user before deleting them.
self.msc4140_finalised_retention_limit = experimental.get(
Contributor

Suggested change
- self.msc4140_finalised_retention_limit = experimental.get(
+ self.msc4140_finalised_per_user_retention_limit = experimental.get(

-- See the GNU Affero General Public License for more details:
-- <https://www.gnu.org/licenses/agpl-3.0.html>.

CREATE TABLE finalised_delayed_events (
Contributor

What's the difference between delayed_events.is_processed and the state in finalised_delayed_events?

It looks like finalised_delayed_events holds more info but I'm not immediately seeing the difference.

Member Author

is_processed is for delayed events that have just timed out and are in the process of being sent / persisted to their target room's DAG. Only once a delayed event is successfully sent does it get finalised. (Prior to this PR, it would instead get deleted from the DB.)

The purpose of tracking is_processed is to handle the edge case of the server going down after a delayed event times out, but before it gets sent. Upon server restart, the sending of any is_processed delayed events will be retried.

I'm admittedly not a big fan of this, but I couldn't find a way to make an atomic action out of a delayed event timing out & being sent. I also didn't want to fiddle with that in this PR, even if finalised events now have some redundancy with is_processed events.
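
As a rough illustration of the two-step flow described above (mark as processed on timeout, then finalise only after a successful send), here is an in-memory sketch. `DelayedEventStore`, `on_timeout`, and `on_restart` are hypothetical names and do not mirror Synapse's actual storage classes; only the send path is modelled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DelayedEventStore:
    # delay_id -> is_processed flag, for events that are not yet finalised
    pending: Dict[str, bool] = field(default_factory=dict)
    # delay_ids that were sent successfully and then finalised
    finalised: List[str] = field(default_factory=list)

    def mark_timed_out(self, delay_id: str) -> None:
        # Step 1: record that the timeout fired before attempting the send.
        self.pending[delay_id] = True

    def finalise(self, delay_id: str) -> None:
        # Step 2: only a successful send moves the event to the finalised set.
        del self.pending[delay_id]
        self.finalised.append(delay_id)

    def processed_but_unsent(self) -> List[str]:
        return [d for d, processed in self.pending.items() if processed]


def on_timeout(
    store: DelayedEventStore, delay_id: str, send: Callable[[str], None]
) -> None:
    store.mark_timed_out(delay_id)
    send(delay_id)  # the server may go down here, leaving is_processed set
    store.finalise(delay_id)


def on_restart(store: DelayedEventStore, send: Callable[[str], None]) -> None:
    # Retry every event whose timeout fired but whose send never completed.
    for delay_id in store.processed_but_unsent():
        send(delay_id)
        store.finalise(delay_id)


# Simulate a crash between the timeout and the send, then a restart.
store = DelayedEventStore(pending={"d1": False})


def crashing_send(delay_id: str) -> None:
    raise RuntimeError("server went down")


try:
    on_timeout(store, "d1", crashing_send)
except RuntimeError:
    pass

sent: List[str] = []
on_restart(store, sent.append)
```

The point of the sketch is that the `is_processed` flag survives the failed send, so the restart path can find the event and retry it.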

Comment on lines +17 to +19
error bytea,
event_id TEXT,
finalised_ts BIGINT NOT NULL,
Contributor

Why not just put this in delayed_events?

Member Author

The intent was to reduce the number of columns in the delayed_events table, as finalised_ts is relevant only to finalised events. In general, I wanted to keep any finalised-only columns out of the non-finalised delayed_events table, especially if more finalised-only properties get added later (which would allow a schema update to leave the non-finalised delayed_events table alone).

-- See the GNU Affero General Public License for more details:
-- <https://www.gnu.org/licenses/agpl-3.0.html>.

CREATE TABLE finalised_delayed_events (
Contributor

Needs a description of what finalised means: "delayed events that have either been sent, cancelled, or were not sent due to an error" (MSC4140)

Comment on lines +261 to +269
for user_localpart in self.db_pool.simple_select_onecol_txn(
    txn,
    "finalised_delayed_events",
    keyvalues={},
    retcol="DISTINCT(user_localpart)",
):
    self._prune_excess_finalised_delayed_events_for_user(
        txn, user_localpart, retention_limit
    )
Contributor

This is an N+1 query problem.

Can we do this in batches in one big query?
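
One possible shape for this, sketched with the standard-library `sqlite3` module rather than Synapse's `db_pool`: a single `DELETE` that ranks each user's rows with a window function and removes everything past the retention limit. The `delay_id` column and the simplified table definition are assumptions for the demo; `user_localpart` and `finalised_ts` come from the diff. It needs window-function support (SQLite 3.25+; Postgres also supports it).

```python
import sqlite3

# Rank each user's finalised events from newest to oldest, then delete
# every row ranked beyond the per-user retention limit, in one statement.
PRUNE_SQL = """
DELETE FROM finalised_delayed_events
WHERE (user_localpart, delay_id) IN (
    SELECT user_localpart, delay_id
    FROM (
        SELECT
            user_localpart,
            delay_id,
            ROW_NUMBER() OVER (
                PARTITION BY user_localpart
                ORDER BY finalised_ts DESC, delay_id DESC
            ) AS rn
        FROM finalised_delayed_events
    ) AS ranked
    WHERE rn > ?
)
"""


def prune_all_users(conn: sqlite3.Connection, retention_limit: int) -> int:
    """Prune excess finalised events for all users; return rows deleted."""
    cur = conn.execute(PRUNE_SQL, (retention_limit,))
    return cur.rowcount


# Demo with a simplified, assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE finalised_delayed_events (
        user_localpart TEXT NOT NULL,
        delay_id TEXT NOT NULL,
        finalised_ts BIGINT NOT NULL
    )
    """
)
rows = [("alice", f"a{i}", i) for i in range(5)] + [("bob", "b0", 0)]
conn.executemany(
    "INSERT INTO finalised_delayed_events VALUES (?, ?, ?)", rows
)

deleted = prune_all_users(conn, retention_limit=2)
remaining = conn.execute(
    "SELECT user_localpart, delay_id FROM finalised_delayed_events "
    "ORDER BY user_localpart, delay_id"
).fetchall()
```

If one statement over the whole table turns out to be too heavy, the same query could be restricted to a batch of users at a time by adding a `user_localpart` filter to the inner `SELECT`.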

-- See the GNU Affero General Public License for more details:
-- <https://www.gnu.org/licenses/agpl-3.0.html>.

CREATE TABLE finalised_delayed_events (
Contributor

It looks like this separate table is creating the need for a lot of sub-queries (SELECT in SELECT statements) which seems like a smell.
