[focus-without-user-activation] Allow focus if a descendant has focus #11519
base: main
|
Thanks @dandclark! |
|
@annevk friendly ping on this PR, I'd appreciate if you could take a look when you have some time :) |
|
I don't understand this. Let's say you have top-level page A, hosting iframe B, which in turn hosts iframe C. Top-level page A has decided not to allow iframe B to focus. So code in iframe B which calls But then iframe B can work around this, whenever it or its children have focus? Why do we let iframe B override the wishes of top-level page A in this way? Can you give a realistic example of when this is desired? I read through both w3c/webappsec-permissions-policy#273 (comment) and #10672 and cannot find any motivation for, or agreement on, this change. The closest is the resolution to "allow parent frame programmatically set focus into child iframe", but that is not what this PR does. This PR lets the child frame override the parent frame's wishes; it doesn't allow the parent frame to focus the child. |
|
@domenic thanks for having a look! I've been reading all the old discussions, let me see if I'm misunderstanding the intended behavior: Let's say we have top-level frame A, hosting iframe B, which hosts iframe C, and B and C have the policy denied. And let's say A moves focus to B. Once B has focus I think it makes sense for B to be able to move focus inside itself as it wants because it's not "stealing" focus from its parent or other frames anymore, right? I feel like a realistic example of this could be any webpage that moves focus from one element to another with Just to further clarify some behaviors, I have a PR in review to update the explainer here w3c/webappsec-permissions-policy#574 in which I try to capture all corner cases and old discussions with some pseudocode: Let me know if we're more or less on the same page about this. By the way, I also realized my change here is not right, it doesn't work for the case where A hosts iframes B and C (they're siblings), focus is on B, and A tries to focus C instead. I think A should be able to do that but, according to the spec, C is |
I'm not sure. It depends on the original intent of the proposal. If it was to prevent malicious third-party frames from moving the user's focus around without user activation, then just the fact that it got focus once is not a good license for allowing further focus movements. But, if the intent is some sort of belief that once the user has given user activation a single time, that proves the subframe trustworthy, then maybe it is OK. I also think there's a significant difference between allowing a frame to move focus within itself, and allowing it to move focus within child iframes. Especially child iframes which the parent frame has explicitly disallowed. That gives another workaround. E.g. consider the permissions policy "allow focus-without-user-activation from all sites except Do you know of specific sites that need these changes to the current policy? Otherwise, I think being more conservative might make sense. |
|
@domenic thanks for your comments! I'll reply inline.
I think the main idea for the policy was to "prevent frames from stealing focus without the user noticing or without the user's consent". w3c/webappsec-permissions-policy#273 (comment)
Is this example actually possible? Permissions policies work with whitelists instead of forbidden lists, right? As in
I'm not aware of specific sites that I could cite, but seems to me that this could be breaking any site that moves focus from one element to another and is hosted in an iframe. Anyways, I filed an issue in the WebAppSecWG hoping to bring it to the attention of developers or people who might have more info on this question. Probably moving this discussion there is better for more visibility instead of continuing it in this PR. Also let me know if WebAppSecWG is the right place to file an issue about this policy and discuss it. I've also seen some issues filed in WHATWG/html, so wasn't super sure which one is more suitable. |
You're right my exact example is not possible. However, my larger point stands, even with an allowlist approach:
|
Hmm, I feel like we might be thinking of the policy in different ways. I think the main idea for the policy was to "prevent frames from stealing focus from other frames" (w3c/webappsec-permissions-policy#273 (comment)) instead of "prevent frames from using focus APIs". With the former in mind, the wrapper frame counterexample doesn’t really apply because the outer frame already passed focus to evil2.example, which then passes focus to evil.example, so no frame is stealing focus from other frames there. Even if evil.example is denied from the policy, the wrapper frame is not really enabling evil.example to use the policy since it's not stealing focus from other places. As a similar example, if you go to outlook.com and click on the To Do icon, it loads an iframe with the To Do app, the top frame passes focus to this iframe, but then this iframe focuses the input field "Add a task". Under the "prevent frames from using focus APIs" model, this wouldn’t work unless the iframe had the policy explicitly allowed. Also, now that we're discussing this, the policy name might be misleading. The last TPAC resolution “Focus delegation should also be allowed” means that a parent frame should be able to programmatically set focus into a child iframe even without user activation. And that behavior should be preserved even when the policy is disabled. So maybe something like focus-steal-without-user-activation would better capture the intent? I'm open to discuss more suitable names here.
|
|
Discussed in #11696, feel free to re-add agenda+ when ready to discuss again. |
a15eca0 to
7f94c8b
Compare
It's not really clear to me what changed here overall to address the comments given during WHATNOT, but this seems wrong. Why would the "entry global object" (a concept we don't really want to use in new places) be the parent?
Hi @annevk! Thanks for having a look, I made this commit to match the proposal because the previous change wasn't right. I updated the description to try and make it clearer what this PR's intention is.
It took me some time to comment here because I was contacting Microsoft Teams (our customer) to discuss the corner case that came up during WHATNOT with this proposal: A hosts B, which hosts C; B and C have the policy denied; C is focused; and B tries to move focus somewhere else. This PR would allow that to happen. Teams supports this behavior, arguing that there might be apps relying on it and that it wouldn't really constitute a security concern because:
- B could have other mechanisms to regain focus (deleting C for example)
- B could trick the user into typing inside an element that belongs to B (for example with a transparent div on top of C's input element)
- C could avoid this by using CSP frame-ancestors to avoid being embedded by B.
The fact that some webpages might be counting on behaviors like the one discussed here is my original motivation for this PR. I want to avoid breaking existing sites that are embedded with this policy denied, so that the policy can be adopted more easily while still fulfilling the market need to prevent frames from stealing focus.
> Why would the "entry global object" (a concept we don't really want to use in new places) be the parent?
Sorry, I wasn't aware that this concept is not meant to be used in new places. As far as I understand, the entry global object's associated document is the document that initiates the action, e.g. the one calling element.focus(). I named it parentDocument because that's its relationship with the descendantDocuments I define below in the For each. I'm open to renaming it if parentDocument is confusing; initiatorDocument or callerDocument are other options that come to mind.
Otherwise, if entry global object is discouraged, what would you suggest instead? I'm thinking of having the caller document passed explicitly into the allow focus steps algorithm, although that would imply modifying every place where the algorithm is called.
I don't think your assumptions hold. For all we know it's a document in a popup that calls element.focus().
Sorry, what assumptions are you referring to specifically?
> For all we know it's a document in a popup that calls element.focus()
Here my understanding is that the entry global object's associated document is the popup's document, then the algorithm looks at all its descendants and if any of them (or the popup's document itself) has focus, then it allows element.focus().
But that's wrong. We're not talking about whether the document gets to call focus(). We're talking about whether the document that focus() is called in will respect the call. So if you have a popup and it calls element.focus() in its opener, the entry document and its descendants are not that interesting.
|
@annevk, thanks for your comments so far. I'd like to make sure we're on the same page about behaviors before continuing the discussion on the technical details of the spec. So far there's high level agreement on the Permissions Policy: there's support from WebKit and a satisfied TAG review. There's also a merged spec PR on this repo. So there's only this piece of behavior that would need to be resolved before the feature is in a good state for finishing implementations and proposing shipping. As I mentioned in this comment, we got back to our customer Microsoft Teams and talked about the corner case that came up during 2025-09-25 WHATNOT #11696: A hosting B hosting C, B and C have the policy denied, C is focused, B tries to move focus somewhere else. This PR would allow that to happen. Teams supports this behavior too, arguing that there might be apps relying on this, and that this wouldn't really constitute a security concern because:
The fact that some webpages might be counting on behaviors like the one discussed here is the original motivation for this PR. It tries to avoid breaking existing sites that are embedded with this policy denied, so that the policy can be adopted more easily while still fulfilling the market need for a policy that prevents frames from stealing focus. Just to further clarify the proposal, I added this pseudo algorithm to the description, trying to capture all possible cases of the 'allow focus steps': (the cases currently being discussed would fall into the third 'if' statement above; the rest of the algorithm is as currently spec'd) |
|
Wouldn't the proposal be rather problematic with fullscreen? First user triggers fullscreen on C and browser tells about C being in fullscreen. Then B steals focus from C and all the keyboard events go to B? Or am I missing something (I very well could be)? |
Hey @smaug----, the intent of this feature is not to protect child frames from their parent frames. There are existing mechanisms that websites/webapps can use to prevent themselves from being iframed by untrusted origins (e.g. CSP frame-ancestors, X-Frame-Options). If C didn't trust B then it wouldn't allow itself to be iframed by it. The intent of this feature is to give website/webapp developers full control over focus when choosing to render subsets of the experience using embedded frames (e.g. Teams Platform Apps, ChatGPT Apps, etc.). |
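For readers unfamiliar with the framing controls mentioned above, here is a minimal sketch of how a page such as C could restrict who may embed it. The helper name `framingHeaders` and the example origin are hypothetical and only illustrate the header syntax; they are not taken from this PR.

```typescript
// Hypothetical helper: builds response headers that restrict which
// origins may embed a page in a frame. Only the header names and the
// `frame-ancestors` / `DENY` values are standard; everything else is
// illustrative.
function framingHeaders(allowedAncestors: string[]): Record<string, string> {
  // With no allowed ancestors, forbid all framing.
  if (allowedAncestors.length === 0) {
    return {
      "Content-Security-Policy": "frame-ancestors 'none'",
      // Legacy header with the same effect.
      "X-Frame-Options": "DENY",
    };
  }
  // Otherwise list the allowed embedders as space-separated CSP sources.
  return {
    "Content-Security-Policy": `frame-ancestors ${allowedAncestors.join(" ")}`,
  };
}

// Example: C allows itself to be embedded only by one trusted origin.
const headersForC = framingHeaders(["https://trusted-host.example"]);
```

A server would attach these headers to the response for the embedded document; a frame that does not trust a potential parent simply leaves that parent's origin out of the list.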
|
Where is it documented that the current spec'ed behavior is not the intent of the feature? (other than here in this pr the proposal to change the behavior). What is requested here isn't about focus delegation, but focus stealing from descendant. |
|
Hey @smaug----, this change has been discussed several times in WHATNOT meetings and it's intended to address an edge case not considered in the original spec/implementation. Once @ffiori is back from his break, he will bring it up again in the next one to ensure there is alignment with all stakeholders. As far as I can tell, the original intent of this feature was to protect apps running in the top-level window from child frames stealing their focus. It does not prevent parents from taking that focus back. As currently implemented, an app running in the top-level window can always take focus back from a child frame. Now if this same app running in the top-level window gets embedded in an iframe, its focus logic will break because it can no longer take focus back from its children. That is what we are trying to fix here and ensure consistency. Ultimately, a parent window/frame can always force focus back to itself by either destroying the child iframe or even using an overlay to capture user input (i.e. click-jacking) so I don't believe we should be trying to prevent that with this feature. |
|
Yes, I've attended probably all those WHATNOT meetings ;). I brought up a possible issue here and I expect that someone will either explain why it is not a problem, or tweak the PR. Fullscreen is a special case and we need to be careful with it. |
|
I'm back :) Thanks @ydogandjiev for summarizing the context, and thanks @smaug---- for your interest in the policy. Regarding the fullscreen case you mentioned: if the user triggers fullscreen on C, then B is considered to have user activation because it’s an ancestor of C in the same activation chain. As a result, B can take focus since the policy allows focus when there’s user activation (see item 2 in https://html.spec.whatwg.org/#allow-focus-steps). Also, I’d like to point you to #11839, where I explain the reasoning behind this approach and the problems it addresses. There’s support from different developers there as well. @smaug----, would you change anything in the proposal? |



This PR adds a step to the 'allow focus steps' that returns true if any of the inclusive descendant frames of the caller's frame is currently focused.
This part of the spec was missing after the resolution at the WHATWG meeting during TPAC 2024: w3c/webappsec-permissions-policy#273 (comment)
where it was resolved that "Focus delegation should also be allowed (allow parent frame programmatically set focus into child iframe)".
Informally speaking, with this change the 'allow focus steps' end up looking like this:
See the previous spec PR for this permissions policy for more details: #10672.
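As a rough illustration of the check this PR describes, the following sketch models a frame tree and the three conditions mentioned in this thread (policy allowed, user activation, an inclusive descendant of the caller's frame has focus). It is not the spec text: the `Frame` type, its fields, and the function names are hypothetical, and the model ignores anything the real 'allow focus steps' do beyond these three checks.

```typescript
// Hypothetical model of a frame; none of these names come from the HTML spec.
interface Frame {
  name: string;
  policyAllowsFocus: boolean; // "focus-without-user-activation" allowed here
  hasUserActivation: boolean; // stand-in for the spec's user activation check
  children: Frame[];
}

// True if `focused` is `frame` itself or any frame nested inside it.
function hasFocusedInclusiveDescendant(frame: Frame, focused: Frame | null): boolean {
  if (focused === null) return false;
  if (frame === focused) return true;
  return frame.children.some((child) => hasFocusedInclusiveDescendant(child, focused));
}

// Sketch of the decision, assuming `caller` is the frame asking to move focus.
function allowFocus(caller: Frame, focused: Frame | null): boolean {
  if (caller.policyAllowsFocus) return true; // policy enabled for the caller
  if (caller.hasUserActivation) return true; // user activation present
  // Step proposed in this PR: focus is already inside the caller's subtree.
  if (hasFocusedInclusiveDescendant(caller, focused)) return true;
  return false;
}

// A hosts B and D; B hosts C. B, C and D have the policy denied.
const frameC: Frame = { name: "C", policyAllowsFocus: false, hasUserActivation: false, children: [] };
const frameB: Frame = { name: "B", policyAllowsFocus: false, hasUserActivation: false, children: [frameC] };
const frameD: Frame = { name: "D", policyAllowsFocus: false, hasUserActivation: false, children: [] };
const frameA: Frame = { name: "A", policyAllowsFocus: true, hasUserActivation: false, children: [frameB, frameD] };
```

In this model, with C focused, `allowFocus(frameB, frameC)` succeeds because focus is already inside B's subtree, while `allowFocus(frameD, frameC)` fails because D would be taking focus from an unrelated frame.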
/interaction.html ( diff )