
Add single epoch optimization #239

Merged
merged 5 commits into from
Aug 15, 2025

Conversation

csharrison
Collaborator

@csharrison csharrison commented Aug 11, 2025

This fixes #78 and partially addresses #212. The algorithm uses the configured lookback window to determine automatically whether the attribution considers only impressions from the current epoch. If so, budget deduction changes: the report's sensitivity is taken to be |l1Norm| rather than 2 * |value|.

In both cases, we assume that the scale of the Laplace noise is `lambda = |maxValue| / |epsilon|`.
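As a rough illustration of the two deduction modes described above, here is a minimal Python sketch. The function names are hypothetical and are not taken from the spec. It assumes the deduction equals the report's sensitivity divided by the noise scale, which matches the `2 * value * epsilon / maxValue` figure quoted later in this thread for the multi-epoch case; applying the same rule to the single-epoch case is an assumption.

```python
def noise_scale(max_value: float, epsilon: float) -> float:
    """Laplace scale lambda = maxValue / epsilon, as stated in the PR description."""
    if max_value <= 0 or epsilon <= 0:
        raise ValueError("max_value and epsilon must be positive")
    return max_value / epsilon


def report_sensitivity(single_epoch: bool, l1_norm: float, value: float) -> float:
    """Sensitivity is the L1 norm in the single-epoch case, else 2 * value."""
    return l1_norm if single_epoch else 2 * value


def budget_deduction(single_epoch: bool, l1_norm: float, value: float,
                     max_value: float, epsilon: float) -> float:
    """Assumed rule: deduction = sensitivity / lambda (hypothetical helper)."""
    sensitivity = report_sensitivity(single_epoch, l1_norm, value)
    return sensitivity / noise_scale(max_value, epsilon)
```

Under these assumptions, a report with `value = maxValue` costs `2 * epsilon` in the multi-epoch case, while a single-epoch report costs only its realized L1 norm times `epsilon / maxValue`.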

Note: An alternative considered was to add a new option to the API to query epochs and apply the optimization if only a single epoch was chosen, but using that API seems difficult, and at a minimum requires exposing the epoch start map to all conversion sites to use it effectively. Additionally, it adds API surface bloat.


@csharrison csharrison marked this pull request as ready for review August 11, 2025 14:06
@csharrison
Collaborator Author

cc @alexanderknop for initial review.

Comment on lines +1625 to +1626
1. Let |impressions| be the result of invoking [=common matching logic=]
with |options|, |topLevelSite|, |intermediarySite|, |epoch|, and |now|.
Member

So we have this problem, whereby we let the multi-epoch query avoid deductions unless it selects an impression from an epoch. But the single epoch query does not. I see that this is how it is implemented in psdlib, so I'm not contesting the conclusion, but it's annoying.

If a multi-epoch query only selects impressions from a single epoch, we deduct 2 * value * epsilon / maxValue. That's mostly OK, but that factor of 2 is a real challenge. It is not going to be obvious to people using this API that the epsilon value in their browser is only half of what they get for most queries. That is, the browser might set an epsilon budget of 4, but the site can only reasonably make queries with epsilon = 2 within that budget, unless they are careful to stay within a single epoch. Given the length of our epoch - and its random start offset - that will rarely be useful to them, so I don't expect it to happen.
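The arithmetic behind the "budget of 4, epsilon of 2" example can be sketched as follows. This is only an illustration: the helper name is hypothetical, and it assumes the query requests `value` equal to `maxValue` and is charged the multi-epoch deduction `2 * value * epsilon / maxValue` quoted above.

```python
def max_affordable_epsilon(budget: float, value: float, max_value: float) -> float:
    """Largest epsilon whose multi-epoch deduction fits within the budget.

    Solves 2 * value * epsilon / max_value <= budget for epsilon.
    Hypothetical helper for illustration only.
    """
    if budget < 0 or value <= 0 or max_value <= 0:
        raise ValueError("budget must be non-negative; value and max_value positive")
    return budget * max_value / (2 * value)


# With a browser budget of 4 and value == maxValue, the site can afford epsilon = 2.
affordable = max_affordable_epsilon(budget=4.0, value=10.0, max_value=10.0)
```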

Collaborator Author

@csharrison csharrison Aug 12, 2025

> So we have this problem, whereby we let the multi-epoch query avoid deductions unless it selects an impression from an epoch. But the single epoch query does not. I see that this is how it is implemented in psdlib, so I'm not contesting the conclusion, but it's annoying.

That is not the case. In the single epoch case we:

  1. Return early if matchedImpressions is empty, deducting no budget (before invoking attribution logic)
  2. If there are matched impressions, we deduct proportional to the actual realized L1 norm of the histogram, so even if we removed step (1) we would still deduct nothing (the L1 norm of the empty histogram is 0).
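The two steps above can be sketched roughly as below. The names `single_epoch_deduction` and `build_histogram` are hypothetical stand-ins, not spec algorithms, and the proportionality constant `epsilon / maxValue` is assumed from the noise scale given in the PR description.

```python
from typing import Callable, Mapping, Sequence


def l1_norm(histogram: Mapping[str, float]) -> float:
    """Sum of absolute bucket values; the empty histogram has norm 0."""
    return sum(abs(v) for v in histogram.values())


def single_epoch_deduction(
    matched_impressions: Sequence[object],
    build_histogram: Callable[[Sequence[object]], Mapping[str, float]],
    max_value: float,
    epsilon: float,
) -> float:
    """Budget to deduct for a single-epoch query (illustrative sketch)."""
    # Step 1: no matched impressions, so return before running attribution logic.
    if not matched_impressions:
        return 0.0
    # Step 2: deduct in proportion to the realized L1 norm of the histogram.
    histogram = build_histogram(matched_impressions)
    return l1_norm(histogram) * epsilon / max_value
```

Because `l1_norm({})` is 0, dropping the early return in step 1 would not change the deducted amount, which is the point made in step 2.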

> If a multi-epoch query only selects impressions from a single epoch, we deduct 2 * value * epsilon / maxValue. That's mostly OK, but that factor of 2 is a real challenge. It is not going to be obvious to people using this API that the epsilon value in their browser is only half of what they get for most queries. That is, the browser might set an epsilon budget of 4, but the site can only reasonably make queries with epsilon = 2 within that budget, unless they are careful to stay within a single epoch. Given the length of our epoch - and its random start offset - that will rarely be useful to them, so I don't expect it to happen.

Two things:

a. Without the single-budget optimization, we still need the factor of 2 increase, per some of the discussion in #212. This PR only makes this more apparent, since in the existing spec, to achieve the privacy guarantees, the noise factor needs to be 2 * |maxValue| / |epsilon|.

b. I actually think there are plenty of use cases where you will hit the single-budget optimization. For instance, I think it is relatively common to have O(day) lookback windows for view-throughs. In those cases, I think it is perfectly fine to "silently" optimize and deduct less than the expected budget.
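To make point (b) concrete, here is a hedged sketch of how a lookback window might be checked against epoch boundaries. It assumes fixed-length epochs counted from a per-site start time and treats the window as the interval from `now - lookback` to `now`; the function names, the seven-day epoch length, and the use of seconds are all assumptions for illustration, not spec text.

```python
DAY = 24 * 60 * 60  # seconds per day


def epoch_index(t: int, epoch_start: int, epoch_length: int) -> int:
    """Index of the epoch containing time t (floor division handles t < start)."""
    return (t - epoch_start) // epoch_length


def is_single_epoch(now: int, lookback: int, epoch_start: int, epoch_length: int) -> bool:
    """True if the whole lookback window falls inside the current epoch."""
    earliest = now - lookback
    return (epoch_index(earliest, epoch_start, epoch_length)
            == epoch_index(now, epoch_start, epoch_length))


# A one-day lookback three days into an assumed seven-day epoch stays in one epoch.
mid_epoch = is_single_epoch(now=3 * DAY, lookback=DAY, epoch_start=0, epoch_length=7 * DAY)
# The same lookback one hour after an epoch boundary spans two epochs.
near_boundary = is_single_epoch(now=7 * DAY + 3600, lookback=DAY, epoch_start=0,
                                epoch_length=7 * DAY)
```

Under these assumptions, a one-day window crosses a boundary only when the conversion happens during the first day of an epoch.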

Collaborator

@apasel422 apasel422 left a comment

LGTM spec-wise; I defer the math part to you and Martin. I can implement this in the simulator in a followup PR.

@csharrison
Collaborator Author

Thanks folks, I'll wait for review by @alexanderknop before landing.

@alexanderknop

The algorithm looks great to me, so other than renaming variables for clarity, I think this is great!

@csharrison csharrison merged commit 01d9d61 into main Aug 15, 2025
1 of 2 checks passed
@csharrison csharrison deleted the single-opt branch August 15, 2025 18:52
Development

Successfully merging this pull request may close these issues.

Spec the single epoch optimization
4 participants