Conversation

@ffiori
Contributor

@ffiori ffiori commented Jul 31, 2025

This adds a step to the 'allow focus steps' that returns true if any inclusive descendant frame of the caller's frame is currently focused.

This part of the spec has been missing since the resolution reached at the WHATWG meeting during TPAC 2024: w3c/webappsec-permissions-policy#273 (comment)

where it was resolved that "Focus delegation should also be allowed (allow parent frame programmatically set focus into child iframe)".

Informally speaking, with this change the 'allow focus steps' end up looking like this:

algorithm allow_focus(focus_setter_frame, target, currently_focused_frame):
  if focus_setter_frame has the policy allowed:
    return true
  if the user initiated the action (target's frame has transient activation):
    return true
  if currently_focused_frame is an inclusive descendant frame of focus_setter_frame:
    return true
  return false
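
For readers who want to experiment with the pseudocode above, here is a minimal runnable sketch. The `Frame` class, its fields, and the helper `is_inclusive_descendant` are hypothetical stand-ins invented for illustration; they are not spec concepts or browser APIs, and the sketch only mirrors the three informal checks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Frame:
    """Hypothetical stand-in for a frame; not a real browser or spec type."""
    name: str
    parent: Optional["Frame"] = None
    policy_allowed: bool = False            # focus-without-user-activation allowed?
    has_transient_activation: bool = False  # recent user gesture in this frame?


def is_inclusive_descendant(frame: Optional[Frame], ancestor: Frame) -> bool:
    """True if `frame` is `ancestor` itself or is nested anywhere inside it."""
    while frame is not None:
        if frame is ancestor:
            return True
        frame = frame.parent
    return False


def allow_focus(focus_setter: Frame, target_frame: Frame,
                currently_focused: Optional[Frame]) -> bool:
    if focus_setter.policy_allowed:
        return True
    if target_frame.has_transient_activation:
        return True
    # The step this PR proposes to add.
    return is_inclusive_descendant(currently_focused, focus_setter)


# Assumed setup: A hosts B hosts C, and only A has the policy allowed.
a = Frame("A", policy_allowed=True)
b = Frame("B", parent=a)
c = Frame("C", parent=b)

print(allow_focus(b, b, a))  # False: focus is in A, outside B's subtree
print(allow_focus(b, b, c))  # True: focus is already inside B's subtree
print(allow_focus(a, b, a))  # True: A has the policy allowed
```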

See the previous spec PR for this permissions policy for more details: #10672.



@ffiori ffiori changed the title Check if descendants have focus in allow focus steps [focus-without-user-activation] Allow focus if a descendant has focus Aug 1, 2025
@ffiori ffiori marked this pull request as ready for review August 1, 2025 00:08
@ffiori
Contributor Author

ffiori commented Aug 5, 2025

Thanks @dandclark!
@annevk could you PTAL? Or let me know who else could review this? I don't have permissions to add reviewers.

@dandclark dandclark requested a review from annevk August 5, 2025 20:33
@ffiori
Contributor Author

ffiori commented Aug 13, 2025

@annevk friendly ping on this PR, I'd appreciate it if you could take a look when you have some time :)

@domenic
Member

domenic commented Aug 20, 2025

I don't understand this.

Let's say you have top-level page A, hosting iframe B, which in turn hosts iframe C.

Top-level page A has decided not to allow iframe B to focus. So code in iframe B which calls element.focus() should do nothing, and not steal focus.

But then iframe B can work around this, whenever it or its children have focus? Why do we let iframe B override the wishes of top-level page A in this way?

Can you give a realistic example of when this is desired? I read through both w3c/webappsec-permissions-policy#273 (comment) and #10672 and cannot find any motivation for, or agreement on, this change.

The closest is the resolution to "allow parent frame programmatically set focus into child iframe", but that is not what this PR does. This PR lets the child frame override the parent frame's wishes; it doesn't allow the parent frame to focus the child.

@ffiori
Contributor Author

ffiori commented Aug 22, 2025

@domenic thanks for having a look! I've been reading all the old discussions, let me see if I'm misunderstanding the intended behavior:

Let's say we have top-level frame A, hosting iframe B, which hosts iframe C, and B and C have the policy denied. And let's say A moves focus to B. Once B has focus I think it makes sense for B to be able to move focus inside itself as it wants because it's not "stealing" focus from its parent or other frames anymore, right? I feel like a realistic example of this could be any webpage that moves focus from one element to another with .focus() and it's hosted in an iframe, which wouldn't be harmful.

Just to further clarify some behaviors, I have a PR in review to update the explainer here w3c/webappsec-permissions-policy#574 in which I try to capture all corner cases and old discussions with some pseudocode:

algorithm is_allowed_to_set_focus(focus_setter_frame, currently_focused_frame):
  if focus_setter_frame has the policy allowed:
    return true
  if currently_focused_frame is an inclusive descendant frame of focus_setter_frame:
    return true
  return false

Let me know if we're more or less on the same page about this.

By the way, I also realized my change here is not right, it doesn't work for the case where A hosts iframes B and C (they're siblings), focus is on B, and A tries to focus C instead. I think A should be able to do that but, according to the spec, C is target and none of its inclusive descendants have focus. I think the spec should look at the inclusive descendants of the focus setter frame instead (in this case, A), just like the pseudocode above. I still need to figure out how to write this with spec words, but wanted to make sure we agree on the desired behaviors first.
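
The sibling case can be checked with a small sketch. It assumes a hypothetical frame tree stored as a child-to-parent map and looks only at the descendant check, ignoring the policy and user activation; the function names are made up for illustration.

```python
from typing import Dict, Optional

# Hypothetical frame tree as child -> parent: A hosts the siblings B and C.
PARENT: Dict[str, Optional[str]] = {"A": None, "B": "A", "C": "A"}


def is_inclusive_descendant(frame: Optional[str], ancestor: str) -> bool:
    """Walk up from `frame`; True if we reach `ancestor` (or start on it)."""
    while frame is not None:
        if frame == ancestor:
            return True
        frame = PARENT[frame]
    return False


def check_from_target(target_frame: str, focused: str) -> bool:
    # What the PR did at this point: look at the target's subtree.
    return is_inclusive_descendant(focused, target_frame)


def check_from_setter(setter_frame: str, focused: str) -> bool:
    # What the pseudocode suggests: look at the focus setter's subtree.
    return is_inclusive_descendant(focused, setter_frame)


# Focus is on B and A calls focus() on C's iframe.
print(check_from_target("C", "B"))  # False: B is not inside C's subtree
print(check_from_setter("A", "B"))  # True: B is inside A's subtree
```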

@domenic
Member

domenic commented Aug 26, 2025

Once B has focus I think it makes sense for B to be able to move focus inside itself as it wants because it's not "stealing" focus from its parent or other frames anymore, right? I feel like a realistic example of this could be any webpage that moves focus from one element to another with .focus() and it's hosted in an iframe, which wouldn't be harmful.

I'm not sure. It depends on the original intent of the proposal. If it was to prevent malicious third-party frames from moving the user's focus around without user activation, then just the fact that it got focus once is not a good license for allowing further focus movements. But, if the intent is some sort of belief that once the user has given user activation a single time, that proves the subframe trustworthy, then maybe it is OK.

I also think there's a significant difference between allowing a frame to move focus within itself, and allowing it to move focus within child iframes. Especially child iframes which the parent frame has explicitly disallowed. That gives another workaround. E.g. consider the permissions policy "allow focus-without-user-activation from all sites except https://evil.example/". All evil.example has to do in this case to bypass the policy is create a small wrapper frame at https://evil2.example/, which then hosts the https://evil.example/ frame, and the policy has become useless. That is not how any other permission policy works, e.g., if evil.example is denied from using WebUSB, then a wrapper frame is not enough to allow the inner frame to use WebUSB.

Do you know of specific sites that need these changes to the current policy? Otherwise, I think being more conservative might make sense.

@ffiori
Contributor Author

ffiori commented Sep 3, 2025

@domenic thanks for your comments! I'll reply inline.

I'm not sure. It depends on the original intent of the proposal. If it was to prevent malicious third-party frames from moving the user's focus around without user activation, then just the fact that it got focus once is not a good license for allowing further focus movements. But, if the intent is some sort of belief that once the user has given user activation a single time, that proves the subframe trustworthy, then maybe it is OK.

I think the main idea for the policy was to "prevent frames from stealing focus without the user noticing or without the user's consent". w3c/webappsec-permissions-policy#273 (comment)

I also think there's a significant difference between allowing a frame to move focus within itself, and allowing it to move focus within child iframes. Especially child iframes which the parent frame has explicitly disallowed. That gives another workaround. E.g. consider the permissions policy "allow focus-without-user-activation from all sites except https://evil.example/". All evil.example has to do in this case to bypass the policy is create a small wrapper frame at https://evil2.example/, which then hosts the https://evil.example/ frame, and the policy has become useless. That is not how any other permission policy works, e.g., if evil.example is denied from using WebUSB, then a wrapper frame is not enough to allow the inner frame to use WebUSB.

Is this example actually possible? Permissions policies work with whitelists instead of forbidden lists, right? As in allow="focus-without-user-activation a.com b.com". So if the author is whitelisting a site, an iframe hosting this site should be able to "steal" focus and move it around itself as it wants, even to subframes. I understand that if you whitelist a site, you trust this site is not gonna act as a wrapper for a malicious one. Let me know if I'm missing a way to set a policy to be allowed for all sites except evil.com.

Do you know of specific sites that need these changes to the current policy? Otherwise, I think being more conservative might make sense.

I'm not aware of specific sites that I could cite, but it seems to me that this could break any site that moves focus from one element to another and is hosted in an iframe. Anyways, I filed an issue in the WebAppSecWG hoping to bring it to the attention of developers or people who might have more info on this question. Moving this discussion there is probably better for visibility than continuing it in this PR.

Also let me know if WebAppSecWG is the right place to file an issue about this policy and discuss it. I've also seen some issues filed in WHATWG/html, so wasn't super sure which one is more suitable.

@domenic
Member

domenic commented Sep 4, 2025

Is this example actually possible? Permissions policies work with whitelists instead of forbidden lists, right?

You're right my exact example is not possible. However, my larger point stands, even with an allowlist approach:

That is not how any other permission policy works, e.g., if evil.example is denied from using WebUSB, then a wrapper frame is not enough to allow the inner frame to use WebUSB.

@dandclark dandclark added the agenda+ To be discussed at a triage meeting label Sep 4, 2025
@ffiori
Contributor Author

ffiori commented Sep 8, 2025

However, my larger point stands, even with an allowlist approach:

That is not how any other permission policy works, e.g., if evil.example is denied from using WebUSB, then a wrapper frame is not enough to allow the inner frame to use WebUSB.

Hmm, I feel like we might be thinking of the policy in different ways. I think the main idea for the policy was to "prevent frames from stealing focus from other frames" (w3c/webappsec-permissions-policy#273 (comment)) instead of "prevent frames from using focus APIs". With the former in mind, the wrapper frame counterexample doesn’t really apply because the outer frame already passed focus to evil2.example, which then passes focus to evil.example, so no frame is stealing focus from other frames there. Even if evil.example is denied from the policy, the wrapper frame is not really enabling evil.example to use the policy since it's not stealing focus from other places.
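
To make the wrapper-frame chain concrete, here is a rough trace under the "prevent stealing focus from other frames" reading. It assumes a hypothetical tree top > evil2 > evil in which only the top frame has the policy allowed, and it ignores user activation. `may_set_focus` is an illustrative name, not a spec algorithm.

```python
from typing import Dict, Optional, Set

# Hypothetical tree as child -> parent: top hosts evil2, which hosts evil.
PARENT: Dict[str, Optional[str]] = {"top": None, "evil2": "top", "evil": "evil2"}
POLICY_ALLOWED: Set[str] = {"top"}  # assumption: only the top frame is allowed


def in_subtree(frame: Optional[str], root: str) -> bool:
    """True if `frame` is `root` or nested inside it."""
    while frame is not None:
        if frame == root:
            return True
        frame = PARENT[frame]
    return False


def may_set_focus(setter: str, focused: str) -> bool:
    # Allowed by policy, or focus is already inside the setter's subtree.
    return setter in POLICY_ALLOWED or in_subtree(focused, setter)


# While focus is in the top frame, neither embedded frame can pull it away.
print(may_set_focus("evil", "top"))     # False
print(may_set_focus("evil2", "top"))    # False
# The top frame delegates focus to evil2, which then passes it down to evil.
print(may_set_focus("top", "top"))      # True
print(may_set_focus("evil2", "evil2"))  # True
# Once evil has focus it can move focus within itself.
print(may_set_focus("evil", "evil"))    # True
```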

As a similar example, if you go to outlook.com and click on the To Do icon, it loads an iframe with the To Do app, the top frame passes focus to this iframe, but then this iframe focuses the input field "Add a task". Under the "prevent frames from using focus APIs" model, this wouldn’t work unless the iframe had the policy explicitly allowed.
(Pasting a screenshot below to describe this better)

Also, now that we're discussing this, the policy name might be misleading. The last TPAC resolution “Focus delegation should also be allowed” means that a parent frame should be able to programmatically set focus into a child iframe even without user activation. And that behavior should be preserved even when the policy is disabled. So maybe something like focus-steal-without-user-activation would better capture the intent? I'm open to discuss more suitable names here.

image

@cwilso cwilso removed the agenda+ To be discussed at a triage meeting label Sep 18, 2025
@dandclark dandclark added the agenda+ To be discussed at a triage meeting label Sep 24, 2025
@ffiori
Contributor Author

ffiori commented Sep 24, 2025

To expand on this:

Do you know of specific sites that need these changes to the current policy?

Besides the example for the To Do app in Outlook I mentioned above this comment, I also confirmed with a customer of ours (Microsoft Teams) that this less conservative behavior (the one this PR proposes) is needed for them to be able to use the permissions policy.

As an example of this, I'm pasting a screenshot below where the user opens the Microsoft Copilot app (loaded in iframe B) inside Teams (iframe A). Teams wants to focus the app (B), which in turn wants to move focus to the input element at the bottom (the one that says "Message Copilot"). Teams wants to deny the permissions policy on B so that B cannot steal focus: for example, if the user starts typing in the search bar at the top while B is still loading, B should not be able to pull focus to its input bar once it finishes loading. If the policy prevented all use of programmatic focusing APIs inside B even when B is focused, this experience would break.

image

I'm pasting another screenshot below where there are 3 frames, A hosting B hosting C. Teams would like to disable the policy on B for similar reasons as the previous example. If the user just clicks on the Engage icon on the left and waits, the app loads B and A moves focus to B. Now if B wants to focus the video that's in iframe C as soon as it gets focus, it should be able to do that. If we choose the stricter behavior, this experience is broken as well.

image

There are more examples like these in other M365 products like Outlook (the one above this comment), OneNote, Word/PowerPoint/Excel and more. Choosing the stricter behavior would break lots of these sites, and I'm pretty sure the same goes for similar sites outside of this ecosystem.

@ffiori ffiori marked this pull request as draft September 25, 2025 17:21
@cwilso

cwilso commented Sep 26, 2025

Discussed in #11696, feel free to re-add agenda+ when ready to discuss again.

@ffiori ffiori force-pushed the focus-without-user-activation-descendants branch from a15eca0 to 7f94c8b Compare October 3, 2025 23:21
@ffiori ffiori marked this pull request as ready for review October 3, 2025 23:24
Member

It's not really clear to me what changed here overall to address the comments given during WHATNOT, but this seems wrong. Why would the "entry global object" (a concept we don't really want to use in new places) be the parent?

Contributor Author

Hi @annevk! Thanks for having a look, I made this commit to match the proposal because the previous change wasn't right. I updated the description to try and make it clearer what this PR's intention is.

I was taking some time to comment here because I was contacting Microsoft Teams (who's our customer) to discuss the corner case that came up during WHATNOT with this proposal: A hosting B hosting C, B and C have the policy denied, C is focused, B tries to move focus somewhere else. This PR would allow that to happen. Teams supports this behavior arguing that there might be apps relying on this, and that this wouldn't really constitute a security concern because:

  1. B could have other mechanisms to regain focus (deleting C for example)
  2. B could trick the user into typing inside an element that belongs to B (for example with a transparent div on top of C's input element)
  3. C could avoid this by using CSP frame-ancestors to avoid being embedded by B.

The fact that some webpages might be counting on behaviors like the case discussed here is my original motivation for this PR. I would like to avoid breaking existing sites that are embedded with this policy denied, so it can be more easily adopted, while still fulfilling the market need for a policy that prevents frames from stealing focus.

Why would the "entry global object" (a concept we don't really want to use in new places) be the parent?

Sorry, I wasn't aware that this concept is not meant to be used in new places. As far as I understand, the entry global object's associated document is the document that initiates the action, e.g. the one calling element.focus(). I named it parentDocument because that's its relationship with the descendantDocuments I define below in the For each. I'm open to renaming it if parentDocument is confusing; initiatorDocument or callerDocument are other options that come to mind.

Otherwise, if entry global object is discouraged, what would you suggest instead? I'm thinking of having the caller document passed explicitly into the allow focus steps algorithm, although that would imply modifying every place where the algorithm is called.

Member

I don't think your assumptions hold. For all we know it's a document in a popup that calls element.focus().

Contributor Author

Sorry, what assumptions are you referring to specifically?

For all we know it's a document in a popup that calls element.focus()

Here my understanding is that the entry global object's associated document is the popup's document, then the algorithm looks at all its descendants and if any of them (or the popup's document itself) has focus, then it allows element.focus().

Member

But that's wrong. We're not talking about whether the document gets to call focus(). We're talking about whether the document that focus() is called in will respect the call. So if you have a popup and it calls element.focus() in its opener, the entry document and its descendants are not that interesting.

@ffiori
Contributor Author

ffiori commented Oct 23, 2025

@annevk, thanks for your comments so far. I'd like to make sure we're on the same page about behaviors before continuing the discussion on the technical details of the spec.

So far there's high level agreement on the Permissions Policy: there's support from WebKit and a satisfied TAG review. There's also a merged spec PR on this repo. So there's only this piece of behavior that would need to be resolved before the feature is in a good state for finishing implementations and proposing shipping.

As I mentioned in this comment, we got back to our customer Microsoft Teams and talked about the corner case that came up during 2025-09-25 WHATNOT #11696: A hosting B hosting C, B and C have the policy denied, C is focused, B tries to move focus somewhere else. This PR would allow that to happen. Teams supports this behavior too, arguing that there might be apps relying on this, and that this wouldn't really constitute a security concern because:

  1. B could have other mechanisms to regain focus (deleting C for example).
  2. B could trick the user into typing inside an element that belongs to B (for example with a transparent div on top of C's input element).
  3. C could prevent this by using CSP frame-ancestors to avoid being embedded by B.

The fact that some webpages might be counting on behaviors like the case discussed here is the original motivation for this PR. It would try to avoid breaking existing sites that are embedded with this policy denied so it can be more easily adopted, while still fulfilling the market need for a policy that prevents frames from stealing focus.

Just to further clarify the proposal, I added this pseudo algorithm to the description, trying to capture all possible cases of the 'allow focus steps':

algorithm allow_focus(focus_setter_frame, target, currently_focused_frame):
  if focus_setter_frame has the policy allowed:
    return true
  if the user initiated the action (target's frame has transient activation):
    return true
  if currently_focused_frame is an inclusive descendant frame of focus_setter_frame:
    return true
  return false

(the cases currently being discussed fall into the third 'if' statement above; the rest of the algorithm matches what is currently spec'd)
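
As a rough illustration of which check decides the corner case (A hosting B hosting C, B and C denied, C focused, B moving focus), the sketch below returns the name of the deciding step. The frame map, the two sets, and `deciding_step` are assumptions made for this example only.

```python
from typing import Dict, Optional, Set

# Hypothetical tree as child -> parent: A hosts B, which hosts C.
PARENT: Dict[str, Optional[str]] = {"A": None, "B": "A", "C": "B"}
POLICY_ALLOWED: Set[str] = {"A"}        # assumption: B and C have the policy denied
TRANSIENT_ACTIVATION: Set[str] = set()  # assumption: no recent user gesture anywhere


def deciding_step(setter: str, target_frame: str, focused: str) -> str:
    """Name the step of the pseudo algorithm that settles the outcome."""
    if setter in POLICY_ALLOWED:
        return "policy allowed"
    if target_frame in TRANSIENT_ACTIVATION:
        return "user activation"
    frame: Optional[str] = focused
    while frame is not None:
        if frame == setter:
            return "focused frame is an inclusive descendant"
        frame = PARENT[frame]
    return "denied"


# C is focused and B moves focus to one of its own elements.
print(deciding_step("B", "B", "C"))  # focused frame is an inclusive descendant
# The reverse direction: B is focused and C tries to take focus.
print(deciding_step("C", "C", "B"))  # denied
```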

cc @ydogandjiev @taylore-msft

@smaug----

smaug---- commented Nov 21, 2025

Wouldn't the proposal be rather problematic with fullscreen? First user triggers fullscreen on C and browser tells about C being in fullscreen. Then B steals focus from C and all the keyboard events go to B? Or am I missing something (I very well could be)?

@ydogandjiev

ydogandjiev commented Nov 24, 2025

Wouldn't the proposal be rather problematic with fullscreen? First user triggers fullscreen on C and browser tells about C being in fullscreen. Then B steals focus from C and all the keyboard events go to B? Or am I missing something (I very well could be)?

Hey @smaug----, the intent of this feature is not to protect child frames from their parent frames. There are existing mechanisms that websites/webapps can use to prevent themselves from being iframed by untrusted origins (e.g. CSP frame-ancestors, X-Frame-Options). If C didn't trust B then it wouldn't allow itself to be iframed by it. The intent of this feature is to give website/webapp developers full control over focus when choosing to render subsets of the experience using embedded frames (e.g. Teams Platform Apps, ChatGPT Apps, etc.).

@smaug----

Where is it documented that the current spec'ed behavior is not the intent of the feature? (other than here in this pr the proposal to change the behavior). What is requested here isn't about focus delegation, but focus stealing from descendant.

@ydogandjiev

ydogandjiev commented Nov 26, 2025

Hey @smaug----, this change has been discussed several times in WHATNOT meetings and it's intended to address an edge case not considered in the original spec/implementation. Once @ffiori is back from his break, he will bring it up again in the next one to ensure there is alignment with all stakeholders.

As far as I can tell, the original intent of this feature was to protect apps running in the top-level window from child frames stealing their focus. It does not prevent parents from taking that focus back. As currently implemented, an app running in the top-level window can always take focus back from a child frame. Now if this same app running in the top-level window gets embedded in an iframe, its focus logic will break because it can no longer take focus back from its children. That is what we are trying to fix here and ensure consistency. Ultimately, a parent window/frame can always force focus back to itself by either destroying the child iframe or even using an overlay to capture user input (i.e. click-jacking) so I don't believe we should be trying to prevent that with this feature.
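
The consistency argument above can be sketched as a small comparison. It assumes that an app at the top level has the policy allowed, that the same app embedded in an iframe has it denied, and that focus currently sits in one of the app's child frames; user activation is left out. `strict_rule` and `proposed_rule` are illustrative names only.

```python
def strict_rule(setter_policy_allowed: bool, focus_in_own_subtree: bool) -> bool:
    # No descendant check: only the policy decides (user activation ignored here).
    return setter_policy_allowed


def proposed_rule(setter_policy_allowed: bool, focus_in_own_subtree: bool) -> bool:
    # Adds the descendant check proposed in this PR.
    return setter_policy_allowed or focus_in_own_subtree


# Can the app take focus back from its child frame?
cases = {
    "app at top level (policy allowed)": True,
    "same app embedded (policy denied)": False,
}
for label, allowed in cases.items():
    print(f"{label}: strict={strict_rule(allowed, True)}, "
          f"proposed={proposed_rule(allowed, True)}")
```

Under these assumptions the strict rule gives different answers for the same app depending on where it is hosted, while the proposed rule gives the same answer in both cases.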

@smaug----

Yes, I've attended probably all those WHATNOT meetings ;). I brought up a possible issue here and I expect that someone will either explain why it is not a problem, or tweak the PR. Fullscreen is a special case and we need to be careful with it.

@ffiori
Contributor Author

ffiori commented Nov 26, 2025

I'm back :)

Thanks @ydogandjiev for summarizing the context, and thanks @smaug---- for your interest in the policy.

Regarding the fullscreen case you mentioned: if the user triggers fullscreen on C, then B is considered to have user activation because it’s an ancestor of C in the same activation chain. As a result, B can take focus since the policy allows focus when there’s user activation (see item 2 in https://html.spec.whatwg.org/#allow-focus-steps).

Also, I’d like to point you to #11839, where I explain the reasoning behind this approach and the problems it addresses. There’s support from different developers there as well. @smaug----, would you change anything in the proposal?

@leotlee leotlee removed the agenda+ To be discussed at a triage meeting label Dec 3, 2025

8 participants