-
Notifications
You must be signed in to change notification settings - Fork 417
Fix sliding sync performance slow down for long lived connections. #19206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
We then filter them out before sending to the client, but it is unnecessary to do so and interferes with later changes.
This is so that clients know if they can use a cached `/members` response or not.
f67e114 to
0d6ccbe
Compare
This ensures that the set of required state doesn't keep growing as we add and remove member state. We then only load them from the DB when needed, rather than all state for all rooms when we get a request.
It was thinking the table name was `IN`, as it matched `connection_positi(on IS) NULL`.
0d6ccbe to
4984858
Compare
MadLittleMods
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't fully onboarded onto the concept and details to be confident in the approach.
synapse/storage/schema/main/delta/93/02_sliding_sync_members.sql
Outdated
Show resolved
Hide resolved
| SCHEMA_FILE_REGEX = re.compile(r"^synapse/storage/schema/(.*)/delta/(.*)/(.*)$") | ||
| INDEX_CREATION_REGEX = re.compile( | ||
| r"CREATE .*INDEX .*ON ([a-z_0-9]+)", flags=re.IGNORECASE | ||
| r"CREATE .*INDEX .*ON ([a-z_0-9]+)\s+\(", flags=re.IGNORECASE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What edge case does this change solve? It doesn't seem relevant to the new indexes being added.
It looks like this just matches some extra faff at the end for some reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commit message briefly mentions it, but the regex was matching against connection_position IS NULL and thinking the table name was IN (as position ends in "on") due to the greedy nature of wildcards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Superseded by #19224
| Attributes: | ||
| required_state_map_change: The updated required state map to store in | ||
| the room config, or None if there is no change. | ||
| added_state_filter: The state filter to use to fetch any additional | ||
| current state that needs to be returned to the client. | ||
| lazy_members_previously_returned: The set of user IDs we should add to | ||
| the lazy members cache that we had previously returned. | ||
| lazy_members_invalidated: The set of user IDs whose membership has | ||
| changed but we didn't send down, so we need to invalidate them from | ||
| the cache. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is the standard way for the docstring but some of these attributes are a bit tricky and I'd rather see the docstring when I hover the properties.
Potential to convert to the """ variant below the property itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's really annoying that VSCode doesn't pick these up, it does for functions.
I'm not sure about moving style, especially since its a bit confusing that the docstring for attributes go after the attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it does for functions because the spec says it should do that.
For attributes, it feels like unfortunate Python decisions but we get this from https://peps.python.org/pep-0257/ (2001)
String literals occurring immediately after a simple assignment at the top level of a module, class, or
__init__method are called “attribute docstrings”.
And expanded upon in https://peps.python.org/pep-0258/#attribute-docstrings (2001)
A string literal immediately following an assignment statement is interpreted by the docstring extraction machinery as the docstring of the target of the assignment statement, under the following conditions:
| # Limit the number of state_keys we should remember sending down the connection | ||
| # for each (room_id, user_id). We don't want to store and pull out too much data | ||
| # in the database. This is a happy-medium between remembering nothing and | ||
| # everything. We can avoid sending redundant state down the connection most of | ||
| # the time given that most rooms don't have 100 members anyway and it takes a | ||
| # while to cycle through 100 members. | ||
| # | ||
| # Only remember up to (MAX_NUMBER_PREVIOUS_STATE_KEYS_TO_REMEMBER) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this whole sliding_sync_connection_lazy_members refactor really help things?
We were already supposedly limiting the amount of data that we remembered so there was a cap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few intertwined issues:
- The code assumes that the total
required_stateis small, as it reads everything on receiving a request. This makes it much easier to check for changes and reduces DB roundtrips - To do the above we do deduping of
required_statebetween rooms, which requires reading everything when persisting changes torequired_state. - Lazy loaded members stop the deduping from happening, and so the total
required_stateis much larger than expected.
By storing the lazy loaded members separately and only fetching them when we have data to send fixes the above issues. Alternatively, we could have tried to refactor things to avoid pulling out all the required_state on startup, but given the need to check for changes on required_state plus the fact that lazy loading is a special case anyway, I think this approach of storing the lazy members in a separate table is easier and clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I need to sync with you on this to understand properly.
I've added a meeting to the calendar on Monday (feel free to change).
| else: | ||
| # For non-limited timelines we always return all | ||
| # membership changes. This is so that clients | ||
| # who have fetched the full membership list | ||
| # already can continue to maintain it for | ||
| # non-limited syncs. | ||
| # | ||
| # This assumes that for non-limited syncs there | ||
| # won't be many membership changes that wouldn't | ||
| # have been included already (this can only | ||
| # happen if membership state was rolled back due | ||
| # to state resolution anyway). | ||
| required_state_types.append((EventTypes.Member, None)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a bigger behavioral change.
I think this fixes #18782 🤔 - If so, we should add a test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, did mean to factor that out but it sneaked in as it needs to be accounted for in the lazy loading stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added with test_lazy_load_state_reset ✅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this only fixes it for non-limited syncs. I think we should also return state reset membership in limited timeline scenarios as well.
We should at-least leave a FIXME with a link to the issue in the if-block above.
…iously_returned in tests
Co-authored-by: Eric Eastwood <[email protected]>
When fetching previously sent lazy members we didn't filter by room, which meant that we didn't send down member events in a room if we'd previously sent that user's member event in another room.
fe94608 to
ec45e00
Compare
6c2cf0d to
2e844aa
Compare
| SCHEMA_FILE_REGEX = re.compile(r"^synapse/storage/schema/(.*)/delta/(.*)/(.*)$") | ||
| INDEX_CREATION_REGEX = re.compile( | ||
| r"CREATE .*INDEX .*ON ([a-z_0-9]+)", flags=re.IGNORECASE | ||
| r"CREATE .*INDEX .*ON ([a-z_0-9]+)\s+\(", flags=re.IGNORECASE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Superseded by #19224
|
|
||
| # How often to update the last seen timestamp for lazy members. We don't want to | ||
| # update it too often as that causes DB writes. | ||
| LAZY_MEMBERS_UPDATE_INTERVAL_MS = ONE_HOUR_SECONDS * MILLISECONDS_PER_SECOND |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs updating now that #19223 was merged
synapse/storage/schema/main/delta/93/02_sliding_sync_members.sql
Outdated
Show resolved
Hide resolved
| Attributes: | ||
| required_state_map_change: The updated required state map to store in | ||
| the room config, or None if there is no change. | ||
| added_state_filter: The state filter to use to fetch any additional | ||
| current state that needs to be returned to the client. | ||
| lazy_members_previously_returned: The set of user IDs we should add to | ||
| the lazy members cache that we had previously returned. | ||
| lazy_members_invalidated: The set of user IDs whose membership has | ||
| changed but we didn't send down, so we need to invalidate them from | ||
| the cache. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it does for functions because the spec says it should do that.
For attributes, it feels like unfortunate Python decisions but we get this from https://peps.python.org/pep-0257/ (2001)
String literals occurring immediately after a simple assignment at the top level of a module, class, or
__init__method are called “attribute docstrings”.
And expanded upon in https://peps.python.org/pep-0258/#attribute-docstrings (2001)
A string literal immediately following an assignment statement is interpreted by the docstring extraction machinery as the docstring of the target of the assignment statement, under the following conditions:
| When persisting this *only* contains updates to the state. | ||
| The `room_lazy_membership` field is only used when persisting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| The `room_lazy_membership` field is only used when persisting. | |
| The `room_lazy_membership` field is only used when persisting (not reading from the database). |
| StateFilter.from_types([(EventTypes.Member, "@user4:test")]), | ||
| # Previous request did not include any explicit members, | ||
| # so nothing to store. | ||
| lazy_members_previously_returned=frozenset(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we see @user4:test here? We're going to send @user4:test in this sync request. And the relevant code:
synapse/synapse/handlers/sliding_sync/__init__.py
Lines 1747 to 1755 in 65aebf4
| # We remember the user if either a) they haven't been | |
| # invalidated... | |
| if (EventTypes.Member, state_key) not in state_deltas: | |
| lazy_members_previously_returned.add(state_key) | |
| # ...or b) if we are going to send the delta down in this | |
| # sync. | |
| if state_key in lazy_load_user_ids: | |
| lazy_members_previously_returned.add(state_key) |
Also applies to expected_without_state_deltas
| lazy_members_previously_returned: The set of user IDs we should add to | ||
| the lazy members cache that we had previously returned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the usage below, this also includes members that will be returned in the current sync:
# ...or b) if we are going to send the delta down in this
# sync.
Perhaps needs a better name as well as updating the description here.
| lazy_members_previously_returned: The set of user IDs we should add to | ||
| the lazy members cache that we had previously returned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lazy_members_previously_returned doesn't appear to be used anywhere but tests.
Is it useful to assert there? We should remove or rename to test_only_xxx
tests/handlers/test_sliding_sync.py
Outdated
| }, | ||
| request_required_state_map={EventTypes.Member: {StateValues.LAZY}}, | ||
| previously_returned_user_state=frozenset(), | ||
| required_user_state={"@user3:test"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reference, we're talking about state_key_expand_lazy_keep_previous_memberships.
In the previous version of this test, we never requested @user3:test but now it appears as lazy_load_user_ids={"@user3:test"},. Or put another way, lazy_load_user_ids={"@user3:test"}, came out of thin air
This is a new test variant that should be added instead of edited into an existing test.
(still relevant)
| previous_required_state_map={EventTypes.Member: {StateValues.LAZY}}, | ||
| request_required_state_map={EventTypes.Member: {"@user4:test"}}, | ||
| previously_returned_lazy_user_ids={"@user2:test", "@user3:test"}, | ||
| lazy_load_user_ids={"@user4:test"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The request is no longer lazy, why is this here?
Fixes #19175
This PR moves tracking of what lazy loaded membership we've sent to each room out of the required state table. This avoids that table from continuously growing, which massively helps performance as we pull out all matching rows for the connection when we receive a request.
The new table is only read when we have data in a room to send, so we end up reading a lot fewer rows from the DB. Though we now read from that table for every room we have events to return in, rather than once at the start of the request.
For an explanation of how the new table works, see the comment on the table schema.
The table is designed so that we can later prune old entries if we wish, but that is not implemented in this PR.
Reviewable commit-by-commit.