@JosePineiro JosePineiro commented Aug 22, 2025

This PR updates AsyncStaticWebHandler::handleRequest to correctly manage caching headers when template processing introduces dynamic content. The previous implementation always set ETag and Last-Modified headers based on the underlying filesystem metadata, which was misleading when templates generated non-static responses (e.g., current time). Issue #237

The new implementation:

  • Uses a file-based ETag when the file is a .gz file or when no template callback is present.
  • Ensures gzip files generate consistent ETag values using _getEtag.
  • For non-gzip files, generates ETag values using the timestamp or the file size.
  • Returns 304 Not Modified only when If-None-Match matches the valid ETag.
  • Removes the use of variable-length arrays and String, so the same implementation works consistently on all platforms.

This prevents browsers from incorrectly reusing cached content when template output is dynamic, ensuring correctness while still allowing efficient caching for static resources.
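
As a rough illustration of the selection rules listed above, here is a minimal C++ sketch. It is not the code from this PR: `FileInfo`, `chooseEtag`, and the `gzipTag` field are hypothetical names, and the reading that gzip files always get an ETag while templated non-gzip files get none is an assumption based on the description.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical description of the file being served; not a library type.
struct FileInfo {
  bool isGzip;          // file name ends in ".gz"
  bool hasTemplate;     // a template callback is attached to the handler
  std::string gzipTag;  // tag derived from the gzip content (illustrative)
  uint32_t lastWrite;   // filesystem timestamp, 0 if unavailable
  uint32_t size;        // file size in bytes
};

// Returns the ETag to send, or an empty string when no ETag should be sent.
std::string chooseEtag(const FileInfo &f) {
  if (f.isGzip) {
    return f.gzipTag;  // derived from the content, stable across requests
  }
  if (f.hasTemplate) {
    return "";  // output may differ per request, so no validator is sent
  }
  // Plain static file: fall back to the timestamp, or the size if there is none.
  char buf[9];
  std::snprintf(buf, sizeof(buf), "%08x",
                static_cast<unsigned>(f.lastWrite ? f.lastWrite : f.size));
  return buf;
}
```

An empty return value stands for "send no validator", which is what keeps browsers from reusing cached copies of dynamic template output.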

@mathieucarbou
Member

@JosePineiro @me-no-dev @willmmiles : FYI, the previous behavior used the last modification time or the file size as the ETag value, to speed up file serving and avoid too many file reads, especially on concurrent requests. The drawback is that this is not what an ETag should be, and it is also incorrect when the callback option (templating) is used.

This PR fixes that for gz files, keeps the same behavior for non-gz files, and takes the callback into account.

That's a good fix + improvement, even if at the expense of more file reading for gz files.

@mathieucarbou mathieucarbou requested a review from Copilot August 25, 2025 11:44
@mathieucarbou
Member

Thanks a lot @JosePineiro 👍

@Copilot Copilot AI left a comment

Pull Request Overview

This PR fixes incorrect ETag handling in AsyncStaticWebHandler to properly manage caching for dynamic template responses. The previous implementation always generated ETags based on file metadata, causing browsers to incorrectly cache dynamic content when templates were used.

  • Implements conditional ETag generation based on file type and template presence
  • Updates gzip file handling to extract ETag from CRC trailer data
  • Removes platform-specific Variable Length Array usage for better portability

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/literals.h | Removes unused MJS file type constant and adds GZ_LEN constant for gzip file detection |
| src/WebHandlers.cpp | Refactors handleRequest method to conditionally generate ETags and properly handle caching for template responses |
| src/ESPAsyncWebServer.h | Adds AsyncStaticWebHandler as friend class to access private _getEtag method |

@willmmiles

+1 for using internal checksums for GZ files. A future PR could maybe generalize this for other kinds of content with some getETagFor(file) type function.
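
To make the "internal checksum" idea concrete: per RFC 1952, a gzip member ends with an 8-byte trailer holding the CRC-32 of the uncompressed data followed by its size, both little-endian. The sketch below derives a tag from that CRC. `etagFromGzipTrailer` is a hypothetical helper operating on an in-memory buffer; how `_getEtag` actually reads and formats the trailer is not shown in this thread, so treat the details as assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: builds an 8-hex-digit tag from the CRC-32 stored in the
// gzip trailer (last 8 bytes: CRC32 then ISIZE, both little-endian).
std::string etagFromGzipTrailer(const std::vector<uint8_t> &gz) {
  // A gzip member has at least a 10-byte header and an 8-byte trailer.
  if (gz.size() < 18) {
    return "";
  }
  const std::size_t crcPos = gz.size() - 8;
  const uint32_t crc = static_cast<uint32_t>(gz[crcPos]) |
                       (static_cast<uint32_t>(gz[crcPos + 1]) << 8) |
                       (static_cast<uint32_t>(gz[crcPos + 2]) << 16) |
                       (static_cast<uint32_t>(gz[crcPos + 3]) << 24);
  char buf[9];
  std::snprintf(buf, sizeof(buf), "%08x", static_cast<unsigned>(crc));
  return buf;
}
```

Because the CRC changes whenever the uncompressed content changes, the tag follows the content rather than the filesystem metadata, at the cost of one extra read near the end of the file.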

I'm glad to see the handling of templated content is getting some attention too. The previous behaviour of treating it as static was definitely not ideal!

I do have some concerns though:

  • This PR completely overrides any "Cache-Control" set by the user for non-templated static content. While I agree that typical best practices with ETags would be to require an "If-None-Match" transaction to validate that things haven't changed, I can easily imagine that there could be a client application that preferred using caching to reduce the number of HTTP calls against the microcontroller. For rarely updated content, some potential jankiness when it does change might be seen as the lesser evil than all the extra network operations all the time. If a client had that previously configured, it will break with this PR. Do we want to insist that ETags are the only supported cache management approach for non-templated content?

  • This PR will break the semantics of setLastModified() when used with filesystems that don't support getLastWrite(). Some clever user might have tried to use setLastModified() in conjunction with the templating engine -- ie. track when variables used in the template are updated, and call setLastModified() to apply that timestamp to the response.

I think what I'd ask for is:

  • If the user has supplied a cache control policy, always send it. If not set, for fully static content with an ETag, send Cache-Control: no-cache.
  • If the user has supplied a cache control policy, try to send Last-Modified. Use the value from setLastModified() if set. If not set, for static content only (gzip and otherwise!), use the filesystem timestamp. (If no value is available, don't send, though.)

@JosePineiro
Author

JosePineiro commented Aug 26, 2025

@willmmiles
1 - I think "ETag" and "If-None-Match" are the most efficient and secure way to control the browser cache.
2 - If I'm not mistaken, "Last-Modified" and "If-Modified-Since" make exactly the same number of HTTP requests as "ETag" and "If-None-Match".
3 - If you agree with points 1 and 2, ETags should be the only supported cache management approach for non-template content.
4 - I hadn't considered that a user might have used setLastModified() in conjunction with the template engine (i.e., tracking when variables used in the template are updated).
Is it okay to add _last_modified in this case?
5 - I'd like to disable the template processor when the files are binary (webm, jpg, avif, woff2, etc.). What do you think?

@mathieucarbou
Member

@JosePineiro : I think we all agree ETags are better, but Will raised a good point: a user could decide to control their caching themselves, based on their own rules.

And I can say that this is often what happened when serving files, because cache management in the lib is quite new.

So instead of forcing ETags, a better approach could be to read the user's request headers and answer according to the spec as best we can.

In function void AsyncStaticWebHandler::handleRequest(AsyncWebServerRequest *request), add the Last-Modified header if the file is not a GZ file or a template processor is present.
Suggestion of @willmmiles.

@JosePineiro
Author

I think the ETag system used in this PR is ideal for files without templates or GZIP. I can't think of any circumstances where a user would be harmed, even if they use a different system.

In template files, we can respect user input in a generic way. This way, the user would have the best of both worlds: an automatic ETag system when possible and a manual system controlled by the user in all other cases.

Handling ETag and Last-Modified simultaneously complicates and slows down the code. It also leads to potential conflicts whose resolution can be very obscure for the user.

If the current behavior doesn't seem right, please tell me which of these options you think is best:

  • If the user defines Cache-Control or Last-Modified, disable ETag?
  • If the user defines Cache-Control or Last-Modified, add them to ETag?
    And when there is a match:
  • If If-Modified-Since is OK and ETag is bad, do we send a 304 or send the file?
  • If If-Modified-Since is bad and ETag is OK, do we send a 304 or send the file?

@willmmiles

Re cache_control: The most critical use case for cache_control is if the user has set "max-age" with the intent to reduce the number of transactions on static content. I think this must continue to be respected for static content. For microcontrollers, re-validation is not always the best policy -- every transaction has a cost in CPU time and memory and can affect the performance of other tasks. I think this decision should be left in the hands of the application author.

Or to put it another way: sometimes it is preferable to sacrifice correctness in rare cases (ie. the cache may become stale when you perform a firmware update) for performance in the common case (everyday UI access).

So I think the logic should be:

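  // A user-supplied Cache-Control policy always wins.
  if (_cache_control.length()) {
    response->addHeader(T_Cache_Control, _cache_control.c_str(), false);
  } else if (*etag != '\0') {
    // Otherwise, ask the client to revalidate whenever an ETag is available.
    response->addHeader(T_Cache_Control, T_no_cache, true);
  }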

Re Last-Modified: I agree that trying to juggle both "If-None-Match" and "If-Modified-Since" would be problematic. I do think we should find a way to respect setLastModified() in some way, though. Probably the easiest thing to do is include the user value in the ETag if it was set by the user -- that way we need only one code path. Since there's no getLastModified(), we could replace the local String _lastModified with, say, a uint64_t, and change all of the setLastModified() implementations to populate it with a hash or somesuch. And, also, if it was explicitly set by the user, we should check/send it even with templated content.

Does that make sense?

@JosePineiro
Author

I think I understand what you're saying.
Suppose the user has defined Cache-Control: private, max-age=3600

We would send to the browser:
Cache-Control: private, max-age=3600
ETag: "abc123"

This way, the browser won't request the page for the next hour.
Once the hour has passed, the browser will send us:
If-None-Match: "abc123"

If the ETag is unchanged, we will send:
HTTP/1.1 304 Not Modified
ETag: "abc123"
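
The exchange above can be modelled in a few lines of C++. This is only a sketch with hypothetical names (`Response`, `respond`), not the handler's code. It compares the tag by exact string match, which ignores `*`, weak validators, and comma-separated lists that a real If-None-Match header may carry.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified view of a response: status code plus headers.
struct Response {
  int status;
  std::map<std::string, std::string> headers;
};

// ifNoneMatch is empty when the request carried no If-None-Match header.
// userCacheControl is empty when the application did not set a policy.
Response respond(const std::string &ifNoneMatch, const std::string &etag,
                 const std::string &userCacheControl) {
  Response r;
  // 304 only when we have an ETag and the client presented the same one.
  r.status = (!etag.empty() && ifNoneMatch == etag) ? 304 : 200;
  if (!userCacheControl.empty()) {
    r.headers["Cache-Control"] = userCacheControl;  // user policy always sent
  } else if (!etag.empty()) {
    r.headers["Cache-Control"] = "no-cache";  // default: always revalidate
  }
  if (!etag.empty()) {
    r.headers["ETag"] = etag;
  }
  return r;
}
```

With `private, max-age=3600` the browser stays quiet for an hour and then revalidates with the stored tag. Without a user policy the `no-cache` default makes it revalidate on every use.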

What to do if Cache-Control contains "no-store" or "immutable"? In both cases, an "ETag" is pointless, but sending it won't cause any problems.

I still think that if we can use "ETag," we shouldn't send "Last-Modified". If we can't use "ETag," we should send "Last-Modified" if it's user-defined.

Does this seem like an acceptable solution to you?

@willmmiles

willmmiles commented Aug 29, 2025

> I think I understand what you're saying. Suppose the user has defined Cache-Control: private, max-age=3600
>
> We would send to the browser: Cache-Control: private, max-age=3600 ETag: "abc123"
>
> This way, the browser won't request the page for the next hour. Once the hour has passed, the browser will send us; If-None-Match: "abc123"
>
> If the etag remains, we will send: HTTP/1.1 304 Not Modified ETag: "abc123"
>
> What to do if Cache-Control contains "no-store" or "immutable"? In both cases, an "ETag" is pointless, but sending it won't cause any problems.

You got it! :)

> I still think that if we can use "ETag," we shouldn't send "Last-Modified". If we can't use "ETag," we should send "Last-Modified" if it's user-defined.
>
> Does this seem like an acceptable solution to you?

I think we can do better. If we send the user "last modified time" as the ETag, then we don't have to parse "If-Modified-Since". Since ETags have no particular required format, I believe we can get away with this simplification.

So code like:

char etag_buf[9];
char* etag = etag_buf;
const char* tempFileName = request->_tempFile.name();
const size_t lenFilename = strlen(tempFileName);

if (_last_modified.length()) {
   etag = _last_modified.c_str();
} else if (lenFilename > T__GZ_LEN && memcmp(tempFileName + lenFilename - T__GZ_LEN, T__gz, T__GZ_LEN) == 0) {
   ...

(above code is not const-correct, but conveys the idea, I hope)

If the user has defined them, "Last-Modified" and "Cache-Control" are always sent.
@JosePineiro
Author

@willmmiles
The user-defined "Cache-Control" is always sent. If not, "no-cache" is sent. Thank you for pointing out this improvement to me.

Implemented as specified in RFC 7232, section 2.4:

  • SHOULD send an entity-tag validator unless it is not feasible to generate one. (In this function: whenever the file is GZ or there is no template processor.)
  • SHOULD send a Last-Modified value if it is feasible to send one. (In this function: whenever the file is GZ or there is no template processor, and the user has defined it.)

In RFC 7232, section 3.3:
"A recipient MUST ignore If-Modified-Since if the request contains an If-None-Match header field"
I have implemented exactly this behavior, since not following this instruction could be considered a bug.
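
A minimal sketch of that precedence rule follows. `Conditional` and `notModified` are hypothetical names, and both validators are compared by exact string match for brevity, whereas If-Modified-Since is properly a date comparison.

```cpp
#include <string>

// Hypothetical request view; an empty string means the header was absent.
struct Conditional {
  std::string ifNoneMatch;
  std::string ifModifiedSince;
};

// Returns true when answering 304 Not Modified is appropriate.
bool notModified(const Conditional &req, const std::string &etag,
                 const std::string &lastModified) {
  if (!req.ifNoneMatch.empty()) {
    // If-None-Match is present, so If-Modified-Since is ignored entirely.
    return !etag.empty() && req.ifNoneMatch == etag;
  }
  if (!req.ifModifiedSince.empty()) {
    // Simplification: exact match instead of comparing the two dates.
    return !lastModified.empty() && req.ifModifiedSince == lastModified;
  }
  return false;  // unconditional request: send the full response
}
```

Under this rule a stale ETag combined with a matching date results in the file being sent, and a matching ETag combined with a stale date results in a 304.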

Please note that "entity-tag" is ALWAYS sent when "Last-Modified" is sent.
Therefore, in a modern browser capable of handling "entity-tag," I can't think of any circumstances under which "Last-Modified" would have any effect.
In any case, I've implemented the code to validate "Last-Modified" as you requested.

Note for @willmmiles:
Please reply to #220.

@willmmiles willmmiles left a comment

A couple more little things. Thanks for your perseverance on this, I'm excited to see this merged.

@mathieucarbou
Member

@JosePineiro : could you please tell us whether you are still motivated to finish your PR, or whether one of us should take it over? Thanks.

willmmiles and others added 7 commits September 27, 2025 19:05

  • Now that templates are detected by serveStatic, update the example with the full set of cases.
  • The underlying objects are all Arduino Strings; leverage that to simplify the code.
  • This allows caching and validation to happen if users are manually managing this value, as demonstrated in the example.

Member

@mathieucarbou mathieucarbou left a comment

I like the new way and examples.
Also easy to follow and understand!

I just wonder if the function call should even be simplified a bit more ;-)

@willmmiles

I do want to point out that accepting the approach in this PR does have a breaking semantic change relevant to existing code, if it wasn't clear from the changes to the examples:

Excerpt from the current example code

  // Serve the static template with a template processor
  //
  // serveStatic is used to serve static output which never changes over time.
  // This special endpoint automatically adds caching headers.
  // If a template processor is used, it must ensure that the outputted content will always be the same over time and never changes.
  // Otherwise, do not use serveStatic.
  // Example below: IP never changes.
  //
  // curl -v http://192.168.4.1/index.html
  server.serveStatic("/index.html", LittleFS, "/template.html").setTemplateProcessor([](const String &var) -> String {
    if (var == "USER") {
      return "Bob";
    }
    return emptyString;
  });

With this PR, the code above will not generate caching headers anymore. To get those caching headers back, the new logic requires setLastModified() once at setup time. This is really what kicked off updating the examples to reflect the new logic.

Personally I prefer the approach from this PR -- allow serveStatic to work for the majority of templating cases, with a single extra call to get static caching -- but I feel obligated to point out that it is an API change, and anyone who had followed the previous instructions will find their calls are no longer cached.
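
The semantic change described above can be summarised with a small model. `StaticHandlerModel` is a hypothetical stand-in written for illustration only. It is not the ESPAsyncWebServer class, its `setTemplateProcessor()` takes no callback, and it ignores the gzip case.

```cpp
#include <string>

// Hypothetical stand-in for a static file handler; NOT the library class.
class StaticHandlerModel {
public:
  // Marks the handler as templated (the real API takes a callback).
  StaticHandlerModel &setTemplateProcessor() {
    hasTemplate_ = true;
    return *this;
  }
  // Records a user-managed Last-Modified value.
  StaticHandlerModel &setLastModified(const std::string &value) {
    lastModified_ = value;
    return *this;
  }
  // With a template attached, caching headers are sent only when the user
  // has explicitly supplied a Last-Modified value.
  bool sendsCachingHeaders() const {
    return !hasTemplate_ || !lastModified_.empty();
  }

private:
  bool hasTemplate_ = false;
  std::string lastModified_;
};
```

In this model, code that only attaches a template processor no longer gets caching headers, and one extra `setLastModified()` call at setup time restores them.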

@mathieucarbou
Member

It makes total sense to me, and I would even say that this is a bug fix for the previous behavior: no cache header should be set by default when we have a template. Doing so can lead to more issues than benefits.

@Copilot Copilot AI left a comment

Pull Request Overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.


@mathieucarbou mathieucarbou merged commit 30b878d into ESP32Async:main Oct 2, 2025
35 checks passed
Successfully merging this pull request may close these issues.

setTemplateProcessor() causes incorrect ETag and Last-Modified headers to be returned for static files