
Conversation

@aseigo
Contributor

@aseigo aseigo commented Dec 8, 2025

As the Fetch API does not expose HTTP trailers to the JavaScript runtime, grpcweb mandates that trailers are included in the message payload with the most-significant bit of the leading (flags) byte set to 1, followed by a length-prefixed block of text that uses the same formatting as normal headers.
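For illustration, here is a minimal sketch of that framing on the encoding side. The 0x80 flag value and 4-byte big-endian length prefix follow the grpc-web framing described above; the module and function names are hypothetical and not part of this library's API.

```elixir
defmodule GrpcWebTrailersSketch do
  @trailers_flag 0x80

  # Build a grpc-web trailers frame: a flags byte with the MSB set, a 4-byte
  # big-endian length, then "name: value" lines formatted like normal headers.
  def encode_trailers(trailers) do
    payload =
      trailers
      |> Enum.map(fn {name, value} -> [name, ": ", value, "\r\n"] end)
      |> IO.iodata_to_binary()

    len = byte_size(payload)
    <<@trailers_flag, len::unsigned-big-integer-size(32), payload::binary>>
  end
end

# e.g. GrpcWebTrailersSketch.encode_trailers([{"grpc-status", "0"}, {"grpc-message", "ok"}])
```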

Most extant grpcweb libraries written in JS/TS are lenient about this and will happily forego receiving trailers. However, some are more picky about this and REQUIRE trailers (the buf.build Connect libraries are an example of this).

With this changeset:

GRPC.Server follows the spec when sending protos over grpcweb, allowing errors and other custom trailers to be sent in a way that is visible to the client.

GRPC.Message recognizes trailers and parses them appropriately: it extracts partial-buffer messages using the length-prefix bytes (which it previously quietly ignored, which also allowed malformed buffers, e.g. due to network problems, to sneak through), it respects the trailers flag, and it returns appropriate data in each of these cases.
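A hedged sketch of the parsing side (same assumed frame layout as above, not the actual GRPC.Message code) shows how the flags byte and length prefix distinguish data frames from trailers frames and reject truncated buffers:

```elixir
defmodule GrpcWebFrameSketch do
  # Data frame: flags byte 0, length-prefixed message, possibly followed by more frames.
  def next_frame(<<0, len::unsigned-big-integer-size(32), message::binary-size(len), rest::binary>>),
    do: {:data, message, rest}

  # Trailers frame: most-significant bit of the flags byte set (0x80).
  def next_frame(<<0x80, len::unsigned-big-integer-size(32), trailers::binary-size(len), rest::binary>>),
    do: {:trailers, trailers, rest}

  # Anything else is truncated or malformed and should not sneak through.
  def next_frame(_other), do: :malformed
end
```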

The GRPC client now also works with embedded trailers.

Overhead for non-grpcweb should be nominal, as the new code paths are hidden behind grpcweb checks and the additional binary checks are placed in front of the error paths (so errors may be marginally slower to reach, but the happy paths should be untouched).

This has been tested with both Google's own grpc-web library and buf.build's connect/connect-web libraries, with a real-world API being served by elixir-grpc's grpc libraries.

This does need more testing (what doesn't!), and there are some decisions made in the details of the code that could be discussed.

Contributor

@polvalente polvalente left a comment

PR generally looks good, but I wanna defer to @sleipnir. I'm not sure if we should add an option for accepting those trailers only when we're parsing grpcweb.

Also, I feel like grpc_server could use a new test or test change too.

{<<1, 2, 3, 4, 5, 6, 7, 8>>, <<>>}
"""
@spec from_data(binary) :: binary
@spec from_data(binary) :: {message :: binary, rest :: binary}
Contributor

This is technically a breaking change. I don't think it impacts as much if we merge before releasing 1.0, but we need to keep this in mind.
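To illustrate the break (a sketch, assuming the spec shown above belongs to GRPC.Message.from_data/1 as discussed in the PR description), callers that matched on a bare binary would need updating:

```elixir
# Before this PR: from_data/1 returned just the decoded message binary.
message = GRPC.Message.from_data(data)

# After this PR: it returns the message plus whatever bytes remain in the buffer,
# so a bare-binary match no longer works.
{message, _rest} = GRPC.Message.from_data(data)
```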

data
|> String.split("\r\n")
|> Enum.reduce(%{}, fn line, acc ->
[k, v] = String.split(line, ":")
Contributor

Suggested change
[k, v] = String.split(line, ":")
[k, v] = String.split(line, ":", parts: 2)

Otherwise this can raise
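For instance (illustrative value), a trailer whose value itself contains a colon would split into three parts and make the two-element match raise a MatchError without `parts: 2`:

```elixir
iex> String.split("grpc-message: not found: /foo", ":")
["grpc-message", " not found", " /foo"]

iex> String.split("grpc-message: not found: /foo", ":", parts: 2)
["grpc-message", " not found: /foo"]
```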

Contributor Author

see 054f370

@sleipnir
Collaborator

sleipnir commented Dec 8, 2025

Hi @aseigo, thank you for this PR.

I need to be very careful here because I'm currently working on these APIs to incorporate a new adapter for the server, and therefore I need to evaluate the least effort merge method to proceed. Give me some time to analyze this.

I think it's worthwhile to run the benchmark against this PR as well and measure how much it adds to the hot path. Perhaps an optional feature, as Paulo suggested, would be interesting to ensure optimal performance in all cases, but I haven't evaluated this in depth, just a suggestion.

That said, I will take a closer look, as well as the other PRs in the queue this week, and I will get back to you soon with my opinions.

Thank you again, and we'll talk soon.

@aseigo
Contributor Author

aseigo commented Dec 8, 2025

I need to be very careful here because I'm currently working on these APIs to incorporate a new adapter for the server, and

Oh, nice! Out of curiosity: Will it replace the current cowboy-based adapter, or is it using another webstack altogether (e.g. bandit)?

I did find the current adapter modules a bit of a maze as they call each other back and forth, so knowing there's some work happening there is really nice to hear!

@sleipnir
Collaborator

sleipnir commented Dec 8, 2025

I need to be very careful here because I'm currently working on these APIs to incorporate a new adapter for the server, and

Oh, nice! Out of curiosity: Will it replace the current cowboy-based adapter, or is it using another webstack altogether (e.g. bandit)?

I did find the current adapter modules a bit of a maze as they call each other back and forth, so knowing there's some work happening there is really nice to hear!

I explain more here #482

@sleipnir
Collaborator

sleipnir commented Dec 8, 2025

I did find the current adapter modules a bit of a maze as they call each other back and forth, so knowing there's some work happening there is really nice to hear!

I don't know if I'm doing a much better job hahaha... let me know your opinion. 😄

@aseigo
Contributor Author

aseigo commented Dec 8, 2025

Benchmarks incoming!

With this PR:

Total requests: 100000
Total time: 24.54 seconds
Requests per second: 4075.05
Average latency: 0.245 ms

Total requests: 100000
Total time: 24.876 seconds
Requests per second: 4019.95
Average latency: 0.249 ms

Total requests: 100000
Total time: 24.461 seconds
Requests per second: 4088.14
Average latency: 0.245 ms

From upstream master branch:

Total requests: 100000
Total time: 24.748 seconds
Requests per second: 4040.76
Average latency: 0.247 ms

Total requests: 100000
Total time: 24.49 seconds
Requests per second: 4083.22
Average latency: 0.245 ms

Total requests: 100000
Total time: 24.708 seconds
Requests per second: 4047.28
Average latency: 0.247 ms

This is quite repeatable, with the times between different runs being within about 1% of each other.

@polvalente
Contributor

Benchmarks incoming! [...]

This is quite repeatable, with the times between different runs being within about 1% of each other.

Great! So the decision is whether we want to accept "incorrect" payloads regardless when grpcweb-formatted data is sent outside that scope. I'm ok with just keeping the single code path.

@aseigo
Contributor Author

aseigo commented Dec 8, 2025

I explain more here #482

Wow, that is a crazy amount of work, but it's clearly paying off! I haven't tested the PR (yet!) but have skimmed the code (and read the more interesting parts with a bit more care), and so far it looks really nice. A bit unfortunate to have to implement an HTTP/2 stack, but I can see how it makes sense in this case, given that this is an absolutely core part of this framework of libraries.

I'm a big fan of thousand_island, always impressed by the performance of it given it is straight Elixir code. <3

In any case, I can see how merging in the (frankly annoying) grpcweb trailers support and your work can be a bit of a chore. Happy to see this go in in whatever order makes sense to you. IMHO the new adapter has clear priority, given it stands to provide a significant performance improvement and would be the foundation for "features" needed by e.g. grpcweb.

@aseigo
Contributor Author

aseigo commented Dec 8, 2025

Also, I feel like grpc_server could use a new test or test change too.

A small related comment:

The existing tests for GRPCWeb exercise these code paths and do catch when they fail. I can add a few more tests for variations (preferably once we've decided on the final shape of things so as to test actual code that may be merged :) ), but I was actually able to use the existing tests to drive this towards a working state.

In fact, once the tests were passing, it all Just Worked(tm) on the first try with the buf.build Connect libraries.

Kudos to everyone who's worked on them as they made my effort here a lot easier!

@aseigo
Contributor Author

aseigo commented Dec 10, 2025

Great! So the decision is whether we want to accept "incorrect" payloads regardless when grpcweb-formatted data is sent outside that scope. I'm ok with just keeping the single code path.

That's a good question.

It would probably mean having to add another param to GRPC.Message.get_message/1 (e.g. the codec), and in practice this should only appear when the source is a server in grpc_web mode, which means it would be another check that doesn't do anything in practice ... so personally I don't think even more API changes are really worth it?

@aseigo aseigo requested a review from polvalente December 10, 2025 15:10
@polvalente
Contributor

Great! So the decision is whether we want to accept "incorrect" payloads regardless when grpcweb-formatted data is sent outside that scope. I'm ok with just keeping the single code path.

That's a good question.

It would probably mean having to add another param to GRPC.Message.get_message/1 (e.g. the codec), and in practice this should only appear when the source is a server in grpc_web mode, which means it would be another check that doesn't do anything in practice ... so personally I don't think even more API changes are really worth it?

Yeah, I'm leaning towards that as well. @sleipnir do you agree?

@sleipnir
Collaborator

Great! So the decision is whether we want to accept "incorrect" payloads regardless when grpcweb-formatted data is sent outside that scope. I'm ok with just keeping the single code path.

That's a good question.
It would probably mean having to add another param to GRPC.Message.get_message/1 (e.g. the codec), and in practice this should only appear when the source is a server in grpc_web mode, which means it would be another check that doesn't do anything in practice ... so personally I don't think even more API changes are really worth it?

Yeah, I'm leaning towards that as well. @sleipnir do you agree?

As long as there are tests that validate the behavior of a grpc server receiving grpc-web requests and not failing, that is, as long as the behaviors are validated, then it's fine to proceed with a single approach.

Collaborator

@sleipnir sleipnir left a comment

Sorry for the late reply.
LGTM

@sleipnir
Collaborator

@aseigo Thanks for the PR, excellent work.

I'm just still checking which PR will go into the main branch first.

@aseigo
Contributor Author

aseigo commented Dec 15, 2025

Working on another fix for GRPC errors (a general problem, not related to grpcweb), I refactored this a small amount, pulling the grpcweb trailer handling into its own function. Same code, no functional changes, just a bit cleaner and easier to work with. Hopefully it will also make it a bit clearer what would need to be done in the new adapter as well?

Cheers!

@aseigo aseigo mentioned this pull request Dec 16, 2025
aseigo and others added 6 commits December 16, 2025 18:26
As the Fetch API does not expose HTTP trailers to the JavaScript runtime,
grpcweb mandates that trailers are included in the message payload with
the most-significant bit of the leading byte (flags) set to 1.

What follows is a length-prefixed block of text that follows the same
formatting as normal headers.

Most extant grpcweb libraries written in JS/TS are lenient about this
and will happily forego receiving trailers. However, some are more
picky about this and REQUIRE trailers (the buf.build connect libraries
are an example of this).

GRPC.Server follows the spec when sending protos over grpcweb, allowing
errors and other custom trailers to be sent in a way that is visible to
the client.

GRPC.Message also now recognizes trailers and parses them appropriately:
it extracts partial-buffer messages using the length-prefix bytes
(which it previously quietly ignored, which also allowed malformed
buffers, e.g. due to network problems, to sneak through), it respects
the trailers flag, and it returns appropriate data in each of these
cases.

The GRPC client now also works with embedded trailers.

Overhead for non-grpcweb should be nominal as new code paths are hidden
behind grpcweb checks, while the additional binary checks are placed in
front of the error paths (so errors may be nominally slower to be reached,
but the happy paths should be untouched).
Reads cleaner, perhaps a bit more idiomatic as well.
This *may* require that the status was already sent, as well as the
state passed to `send_error_trailers`, to decide whether or not to
send grpcweb trailers.

And so:

* `send_error_trailers` now uses `check_sent_resp` instead of doing that
  check itself
* `check_sent_resp` now takes an optional `status`, with 200 as the
  default
* state is passed to `send_error_trailers`
* `send_error_trailers` calls `stream_grpcweb_trailers` before
  `cowboy_req.stream_trailers` (which closes the connection as trailers
  implies fin)
@aseigo aseigo force-pushed the feature/grpcweb-trailers-in-message branch from 590cb5b to a3539ed on December 16, 2025 17:26
@byhemechi

Do you have a very rough timeframe for when this is likely to be merged? I need to make some architectural decisions for a product using the buf grpc library and need to know whether I'll need to get Envoy onto developer environments.

Appreciate the hard work you guys put in, presumably for free 🙏; please don't take this as a "hurry up" message.

@aseigo
Contributor Author

aseigo commented Dec 16, 2025

Do you have a very rough timeframe for when this is likely to be merged? I need to make some architectural decisions for a product using the buf grpc library and need to know whether I'll need to get Envoy onto developer environments.

I don't know when it will be merged exactly (that's someone else's decision), but I have some work to do on this tomorrow (a pair of tests are not passing), and then it should hopefully be ready.

If the pressing need is for development envs, you can always include my fork with this branch as a dependency in the short term until this does get merged. That is what I am currently doing, in fact (also working with the buf.build connect libs)

@sleipnir
Collaborator

Do you have a very rough timeframe for when this is likely to be merged? I need to make some architectural decisions for a product using the buf grpc library and need to know whether I'll need to get Envoy onto developer environments.

Appreciate the hard work you guys put in, presumably for free 🙏; please don't take this as a "hurry up" message.

Once the tests pass, we will merge it.

@aseigo, let me know if I can help with anything else. I'll handle merging this later in the PR for the new adapter.

@polvalente
Contributor

Do you have a very rough timeframe for when this is likely to be merged? I need to make some architectural decisions for a product using the buf grpc library and need to know whether I'll need to get Envoy onto developer environments.

I don't know when it will be merged exactly (that's someone else's decision), but I have some work to do on this tomorrow (a pair of tests are not passing), and then it should hopefully be ready.

If the pressing need is for development envs, you can always include my fork with this branch as a dependency in the short term until this does get merged. That is what I am currently doing, in fact (also working with the buf.build connect libs)

If the PR is itself ready, you can always use a git dependency in Mix: {:grpc, github: "aseigo", branch: "feature/grpcweb-trailers-in-message"} or :ref instead of :branch if you want to point to a specific commit hash.
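For reference, that goes in the deps list of mix.exs; GitHub deps take an "owner/repo" path, and the exact fork repository name below is an assumption, not something stated in this thread:

```elixir
# In mix.exs (a sketch; "aseigo/grpc" is an assumed fork path, adjust to the actual repo)
defp deps do
  [
    {:grpc, github: "aseigo/grpc", branch: "feature/grpcweb-trailers-in-message"}
    # or pin the exact commit instead:
    # {:grpc, github: "aseigo/grpc", ref: "a3539ed"}
  ]
end
```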

This would work while we don't merge, but rest assured that this will be merged eventually. We just have a few other things external to this PR going on that we need to sort out before merging this.

@aseigo
Contributor Author

aseigo commented Dec 16, 2025

let me know if I can help with anything else

Thanks, I appreciate it! I think it's all good, though ... I just need to work out those last two tests, which I will do tomorrow. :)

@byhemechi

byhemechi commented Dec 16, 2025

If the PR is itself ready, you can always use a git dependency in Mix: {:grpc, github: "aseigo", branch: "feature/grpcweb-trailers-in-message"} or :ref instead of :branch if you want to point to a specific commit hash.

Yep, that's what I've been doing; I was just wondering about the package because the rest of my team is very new to Elixir and I don't trust them to understand how git deps work.

Strangely, this gives me a bunch of dependency errors when I compile deps; everything seems to work just fine if I manually add the package subdirs. Almost certainly a layer 8 issue on my end, and if things merge soon I won't put any thought into it.

Thanks for the quick replies :)

when the request isn't a grpcweb request, then it's business as usual.
otherwise, the grpcweb trailers must be sent first, as they may cause a
body to be sent.

only after checking for grpcweb can the regular trailers be sent once
the state of the req is confirmed, namely whether or not a reply has
been started already or if a full reply must be initiated.
@aseigo aseigo requested a review from sleipnir December 17, 2025 13:23
@aseigo
Contributor Author

aseigo commented Dec 17, 2025

Ok, all tests are passing, and the errors are appearing both with native and grpcweb clients for me, including the improvements @sleipnir pushed the other day to make the {:error, ...} tuple return style work.

This is ready for a final review! :)

@sleipnir
Collaborator

Ok, all tests are passing, and the errors are appearing both with native and grpcweb clients for me, including the improvements @sleipnir pushed the other day to make the {:error, ...} tuple return style work.

This is ready for a final review! :)

Thank you @aseigo
@polvalente and I will discuss the best way to proceed, thank you again for your contribution.
