-
Notifications
You must be signed in to change notification settings - Fork 394
Validate the body of requests to /keys/upload
#17097
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…validate-upload-keys-dict
…validate-upload-keys-dict
…validate-upload-keys-dict
As we are now well past Synapse 1.135. This was originally added in #17097
I really wanted to use a `UserID` instead of a `StrictStr` for the fields that contain a user ID. But this became too cumbersome due to the handler code wanting to directly json-encode the request body. As `UserID` does not subclass `str`, one would have to rebuild the entire containing object in order to json-encode it. Perhaps a future PR will do this, as it would allow us to validate UserID's more easily at the edge.
Some extra validation of the request body.
c43ca16
to
0eaf28f
Compare
This PR is now ready for re-review and has been rewritten using Pydantic to validate user input. Trial tests failing are expected until #18996 is merged. |
/keys/upload
There's little point in coverting the pydantic model back to a dict when we already have it via the parsed body.
# storing the result in the DB. There's little point in converted to a | ||
# parsed object and then back to a dict. | ||
body = parse_json_object_from_request(request) | ||
validate_json_object(body, self.KeyUploadRequestBody) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use parse_and_validate_json_object_from_request(...)
and pass in a concrete type to self.e2e_keys_handler.upload_keys_for_user(...)
Parse, don't validate (we shouldn't lose the type data after sussing it out)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this was already explained in the comment above:
synapse/synapse/rest/client/keys.py
Lines 232 to 238 in ee9768c
# Parse the request body. Validate separately, as the handler expects a | |
# plain dict, rather than any parsed object. | |
# | |
# Note: It would be nice to work with a parsed object, but the handler | |
# needs to encode portions of the request body as canonical JSON before | |
# storing the result in the DB. There's little point in converted to a | |
# parsed object and then back to a dict. |
Perhaps pragmatic and better than before so we can move forward with it ⏩
As a break-down of what upload_keys_for_user(...)
does with the data:
upload_keys_for_user
->upload_device_keys_for_user
->encode_canonical_json
upload_keys_for_user
->_upload_one_time_keys_for_user
, we manually iterate and re-encode anyway so a parsed object would be good hereupload_keys_for_user
->set_e2e_fallback_keys
->encode_canonical_json
Pydantic does support serialization so it wouldn't be that awkward if we just used a parsed object until we needed to serialize for encode_canonical_json
.
If we want to avoid the deserialization/serialization, we could use TypedDict
for these keys in the parsed object 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started on feeding a parsed object through, but the diff exploded in complexity; mainly from updating other methods that use upload_keys_for_user
.
I believe the usual pattern for these things is to:
- Have a model that represents the request model.
- Manipulate/extract the data into domain/internal class instances.
- Pick those apart and store individual attributes in the database as needed.
This endpoint was a bad example, but generally I think that's what we should follow.
if "device_keys" in body: | ||
# Validate the provided `user_id` and `device_id` fields in | ||
# `device_keys` match that of the requesting user. We can't do | ||
# this directly in the pydantic model as we don't have access | ||
# to the requester yet. | ||
# | ||
# TODO: We could use ValidationInfo when we switch to Pydantic v2. | ||
# https://docs.pydantic.dev/latest/concepts/validators/#validation-info | ||
if body["device_keys"]["user_id"] != user_id: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add a check/validate methods to KeyUploadRequestBody
for this logic
Plays into wanting to use parse_and_validate_json_object_from_request(...)
as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how we can access user_id
and device_id
from the request object inside of a validation function?
With Pydantic v2, I'd update parse_and_validate_json_object_from_request
to extract certain information from the request and place it in the validation context that validation functions could use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anoadragon453 I was thinking of an extra utility method on KeyUploadRequestBody
key_upload_request.validate_keys_for_user_id(user_id)
, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And then we call that from the servlet function? That could work, I suppose... I'd love it if such a method was called automatically for us, but that's probably the best we can do sans upgrading Pydantic.
Co-authored-by: Eric Eastwood <[email protected]>
Co-authored-by: Eric Eastwood <[email protected]>
Co-authored-by: Andrew Morgan <[email protected]> Co-authored-by: Andrew Morgan <[email protected]> Co-authored-by: Eric Eastwood <[email protected]>
Co-authored-by: Andrew Morgan <[email protected]> Co-authored-by: Andrew Morgan <[email protected]> Co-authored-by: Eric Eastwood <[email protected]>
@anoadragon453:
This PR adds Pydantic model-based validation to the
POST /_matrix/client/v3/keys/upload
endpoint. This prevents invalid request bodies from reaching the handler function and producing internal server errors.Requires #18996 before unit tests will pass.
I initially wanted to transform the request body model into some
attrs
-based domain objects, rather than the bare dicts we're passing around internally today. Alas, this made the diff explode in size, so I've reverted to just making use of thebody
s raw dict. It is validated though! The main issue was that deeper in the stack we attempt to encode a portion of the dict to canonical JSON, and trying to do this withattrs
objects was a nightmare.I had also hoped to include a new
UserIDType
in the Pydantic model, instead of usingStrictStr
. But this caused a lot of faffing about with convertingUserID
tostr
and back, and we don't even end up pulling any data out of the pydantic model anyhow. So I decided to ditch that.