S7evinK
Contributor

@S7evinK S7evinK commented Apr 16, 2024

@anoadragon453:

This PR adds Pydantic model-based validation to the POST /_matrix/client/v3/keys/upload endpoint. This prevents invalid request bodies from reaching the handler function and producing internal server errors.
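As a rough illustration of the "validate, but keep the dict" approach described here, the following is a minimal sketch. The model and function names (`DeviceKeys`, `KeyUploadBody`, `check_body`) are hypothetical and simplified, not the classes added by this PR. The field names follow the Matrix spec's `device_keys` object. The code is intended to run on both Pydantic v1 and v2.

```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, StrictStr, ValidationError


class DeviceKeys(BaseModel):
    # Field names follow the Matrix spec's `device_keys` object.
    user_id: StrictStr
    device_id: StrictStr
    algorithms: List[StrictStr]
    keys: Dict[StrictStr, StrictStr]
    signatures: Dict[StrictStr, Dict[StrictStr, StrictStr]]


class KeyUploadBody(BaseModel):
    # Hypothetical, simplified stand-in for the request body model.
    device_keys: Optional[DeviceKeys] = None
    one_time_keys: Optional[Dict[StrictStr, Any]] = None
    fallback_keys: Optional[Dict[StrictStr, Any]] = None


def check_body(body: Dict[str, Any]) -> Dict[str, Any]:
    """Validate `body` against the model, then return the dict unchanged."""
    try:
        KeyUploadBody(**body)
    except ValidationError as exc:
        # A real servlet would turn this into a 400 response; a plain
        # ValueError keeps the sketch self-contained.
        raise ValueError(f"invalid /keys/upload body: {exc}") from exc
    return body
```

The parsed model is deliberately thrown away: only the original dict travels on to the handler, which is the trade-off discussed in the review below.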

Requires #18996 before unit tests will pass.


I initially wanted to transform the request body model into attrs-based domain objects, rather than the bare dicts we pass around internally today. Alas, this made the diff explode in size, so I've reverted to just using the body's raw dict. It is validated, though! The main issue was that deeper in the stack we encode a portion of the dict to canonical JSON, and trying to do this with attrs objects was a nightmare.

I had also hoped to include a new UserIDType in the Pydantic model, instead of using StrictStr. But this caused a lot of faffing about with converting UserID to str and back, and we don't even end up pulling any data out of the Pydantic model anyhow. So I decided to ditch that.
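To make the UserID-to-str friction concrete, here is a small stdlib-only sketch. The `UserID` dataclass and the `rebuild_for_json` helper are hypothetical stand-ins, not Synapse's real class or any code from this PR. The point is only that an object which does not subclass `str` cannot be JSON-encoded directly, so the containing dict has to be rebuilt first.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class UserID:
    # Hypothetical stand-in; deliberately NOT a subclass of str.
    localpart: str
    domain: str

    def to_string(self) -> str:
        return f"@{self.localpart}:{self.domain}"


def rebuild_for_json(device_keys: dict) -> dict:
    """Copy the dict, swapping the UserID for its string form."""
    return {**device_keys, "user_id": device_keys["user_id"].to_string()}


parsed = {"user_id": UserID("alice", "example.org"), "device_id": "DEV"}

try:
    json.dumps(parsed)
    encodable = True
except TypeError:
    # json only serialises built-in types, so the UserID value is rejected.
    encodable = False

# Encoding works once the containing object has been rebuilt.
encoded = json.dumps(rebuild_for_json(parsed), sort_keys=True)
```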

@S7evinK S7evinK requested a review from a team as a code owner April 16, 2024 15:00
@S7evinK S7evinK requested a review from a team November 22, 2024 08:51
@CLAassistant

CLAassistant commented Mar 23, 2025

CLA assistant check
All committers have signed the CLA.

anoadragon453 added a commit that referenced this pull request Sep 29, 2025
As we are now well past Synapse 1.135. This was originally added in #17097
I really wanted to use a `UserID` instead of a `StrictStr` for the fields that contain a user ID. But this became too
cumbersome due to the handler code wanting to directly json-encode the request body. As `UserID` does not subclass
`str`, one would have to rebuild the entire containing object in order to json-encode it. Perhaps a future PR will do this,
as it would allow us to validate `UserID`s more easily at the edge.
Some extra validation of the request body.
@anoadragon453 anoadragon453 force-pushed the s7evink/validate-upload-keys-dict branch from c43ca16 to 0eaf28f September 30, 2025 17:33
@anoadragon453 anoadragon453 requested a review from a team September 30, 2025 17:34
@anoadragon453
Member

anoadragon453 commented Sep 30, 2025

This PR is now ready for re-review and has been rewritten using Pydantic to validate user input.

Trial test failures are expected until #18996 is merged.

@anoadragon453 anoadragon453 changed the title from "Ensure that uploaded keys are dicts" to "Validate the body of requests to /keys/upload" Oct 1, 2025
# storing the result in the DB. There's little point in converting to a
# parsed object and then back to a dict.
body = parse_json_object_from_request(request)
validate_json_object(body, self.KeyUploadRequestBody)
Contributor

We should use parse_and_validate_json_object_from_request(...) and pass in a concrete type to self.e2e_keys_handler.upload_keys_for_user(...)

Parse, don't validate (we shouldn't lose the type data after sussing it out)

Contributor

@MadLittleMods MadLittleMods Oct 2, 2025

Looks like this was already explained in the comment above:

# Parse the request body. Validate separately, as the handler expects a
# plain dict, rather than any parsed object.
#
# Note: It would be nice to work with a parsed object, but the handler
# needs to encode portions of the request body as canonical JSON before
# storing the result in the DB. There's little point in converting to a
# parsed object and then back to a dict.

Perhaps pragmatic and better than before so we can move forward with it ⏩


As a break-down of what upload_keys_for_user(...) does with the data:

  • upload_keys_for_user -> upload_device_keys_for_user -> encode_canonical_json
  • upload_keys_for_user -> _upload_one_time_keys_for_user, we manually iterate and re-encode anyway so a parsed object would be good here
  • upload_keys_for_user -> set_e2e_fallback_keys -> encode_canonical_json

Pydantic does support serialization so it wouldn't be that awkward if we just used a parsed object until we needed to serialize for encode_canonical_json.

If we want to avoid the deserialization/serialization, we could use TypedDict for these keys in the parsed object 🤔
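A minimal sketch of the TypedDict idea, assuming a hypothetical `DeviceKeysDict` type and a simplified `canonical_json_sketch` helper built on the standard library. The helper is not Synapse's `encode_canonical_json`; it only approximates it with sorted keys and no insignificant whitespace. The value stays a plain dict at runtime, so it can be encoded without a serialization step.

```python
import json
from typing import Dict, List, TypedDict


class DeviceKeysDict(TypedDict):
    # Hypothetical name. TypedDict only informs the type checker;
    # at runtime the value is an ordinary dict.
    user_id: str
    device_id: str
    algorithms: List[str]
    keys: Dict[str, str]
    signatures: Dict[str, Dict[str, str]]


def canonical_json_sketch(value: object) -> bytes:
    """Approximate canonical JSON: sorted keys, compact separators, UTF-8."""
    return json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


device_keys: DeviceKeysDict = {
    "user_id": "@alice:example.org",
    "device_id": "DEV",
    "algorithms": [],
    "keys": {},
    "signatures": {},
}

# No conversion needed: the typed value is already a dict.
encoded = canonical_json_sketch(device_keys)
```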

Member

I started on feeding a parsed object through, but the diff exploded in complexity; mainly from updating other methods that use upload_keys_for_user.

I believe the usual pattern for these things is to:

  • Have a model that represents the request model.
  • Manipulate/extract the data into domain/internal class instances.
  • Pick those apart and store individual attributes in the database as needed.

This endpoint was a bad example, but generally I think that's what we should follow.
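The three-step pattern above could look roughly like this. Everything here is a hypothetical sketch (`OneTimeKey`, `to_domain` and `to_rows` are not Synapse names), and it assumes one-time key identifiers of the form `<algorithm>:<key_id>`.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class OneTimeKey:
    # Hypothetical domain object, not a Synapse class.
    algorithm: str
    key_id: str
    key_json: object


def to_domain(one_time_keys: Dict[str, object]) -> List[OneTimeKey]:
    """Step 2: turn the validated request data into domain objects."""
    result = []
    for full_id, key_json in one_time_keys.items():
        # Assumes IDs look like "<algorithm>:<key_id>".
        algorithm, key_id = full_id.split(":", 1)
        result.append(OneTimeKey(algorithm, key_id, key_json))
    return result


def to_rows(
    user_id: str, device_id: str, keys: List[OneTimeKey]
) -> List[Tuple[str, str, str, str]]:
    """Step 3: pick out the individual attributes a DB insert would need."""
    return [(user_id, device_id, k.algorithm, k.key_id) for k in keys]
```

Step 1, the request model itself, is left out here; the input to `to_domain` stands for data that has already passed validation.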

Comment on lines +273 to +281
if "device_keys" in body:
    # Validate the provided `user_id` and `device_id` fields in
    # `device_keys` match that of the requesting user. We can't do
    # this directly in the pydantic model as we don't have access
    # to the requester yet.
    #
    # TODO: We could use ValidationInfo when we switch to Pydantic v2.
    # https://docs.pydantic.dev/latest/concepts/validators/#validation-info
    if body["device_keys"]["user_id"] != user_id:
Contributor

We could add check/validate methods to KeyUploadRequestBody for this logic.

This also plays into wanting to use parse_and_validate_json_object_from_request(...).

Member

I'm not sure how we can access user_id and device_id from the request object inside of a validation function?

With Pydantic v2, I'd update parse_and_validate_json_object_from_request to extract certain information from the request and place it in the validation context that validation functions could use.

Contributor

@anoadragon453 I was thinking of an extra utility method on KeyUploadRequestBody, e.g. key_upload_request.validate_keys_for_user_id(user_id), etc.

Member

And then we call that from the servlet function? That could work, I suppose... I'd love it if such a method was called automatically for us, but that's probably the best we can do sans upgrading Pydantic.
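A rough sketch of what that could look like, using plain dataclasses as stand-ins for the Pydantic models so the example stays stdlib-only. The class shapes are assumptions and only the `user_id` check is shown; a `device_id` check would follow the same form.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceKeys:
    # Stand-in with only the fields this check needs.
    user_id: str
    device_id: str


@dataclass
class KeyUploadRequestBody:
    # Dataclass stand-in for the Pydantic model (an assumption of this sketch).
    device_keys: Optional[DeviceKeys] = None

    def validate_keys_for_user_id(self, user_id: str) -> None:
        """Raise if `device_keys` claims a different user than the requester."""
        if self.device_keys is None:
            return
        if self.device_keys.user_id != user_id:
            raise ValueError("device_keys.user_id does not match the requester")


# In the servlet, once the requester is known, the call is explicit:
key_upload_request = KeyUploadRequestBody(
    device_keys=DeviceKeys(user_id="@alice:example.org", device_id="DEV")
)
key_upload_request.validate_keys_for_user_id("@alice:example.org")
```

The check still has to be invoked by hand, which is the limitation noted above.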

anoadragon453 and others added 2 commits October 7, 2025 10:56
Co-authored-by: Eric Eastwood <[email protected]>
Co-authored-by: Eric Eastwood <[email protected]>
@anoadragon453 anoadragon453 enabled auto-merge (squash) October 7, 2025 10:15
@anoadragon453 anoadragon453 disabled auto-merge October 7, 2025 10:27
@anoadragon453 anoadragon453 merged commit 42bbff8 into develop Oct 7, 2025
42 of 44 checks passed
@anoadragon453 anoadragon453 deleted the s7evink/validate-upload-keys-dict branch October 7, 2025 10:27
anoadragon453 added a commit that referenced this pull request Oct 7, 2025
Co-authored-by: Andrew Morgan <[email protected]>
Co-authored-by: Andrew Morgan <[email protected]>
Co-authored-by: Eric Eastwood <[email protected]>