Add single epoch optimization #239
cc @alexanderknop for initial review.
1. Let |impressions| be the result of invoking [=common matching logic=]
with |options|, |topLevelSite|, |intermediarySite|, |epoch|, and |now|.
So we have this problem, whereby we let the multi-epoch query avoid deductions unless it selects an impression from an epoch. But the single epoch query does not. I see that this is how it is implemented in psdlib, so I'm not contesting the conclusion, but it's annoying.
If a multi-epoch query only selects impressions from a single epoch, we deduct 2 * value * epsilon / maxValue. That's mostly OK, but that factor of 2 is a real challenge. It is not going to be obvious to people using this API that the epsilon value in their browser is only half of what they get for most queries. That is, the browser might set an epsilon budget of 4, but the site can only reasonably make queries with epsilon = 2 within that budget, unless they are careful to stay within a single epoch. Given the length of our epoch - and its random start offset - that will rarely be useful to them, so I don't expect it to happen.
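For reference, the deduction quoted above matches the standard Laplace-mechanism accounting: Laplace noise of scale λ on a query with L1 sensitivity Δ uses Δ/λ of privacy budget. Plugging in the noise scale λ = maxValue / ε and the two sensitivities stated in the PR description (2 · value when the query may span epochs, l1Norm when it stays within the current epoch):

```latex
\begin{gathered}
\lambda = \frac{\mathit{maxValue}}{\varepsilon}, \qquad
\varepsilon_{\mathrm{deducted}} = \frac{\Delta}{\lambda} = \frac{\Delta\,\varepsilon}{\mathit{maxValue}} \\[4pt]
\text{multi-epoch: } \Delta = 2\,\mathit{value}
  \;\Rightarrow\; \varepsilon_{\mathrm{deducted}} = \frac{2\,\mathit{value}\,\varepsilon}{\mathit{maxValue}} \\[4pt]
\text{single epoch: } \Delta = \mathit{l1Norm}
  \;\Rightarrow\; \varepsilon_{\mathrm{deducted}} = \frac{\mathit{l1Norm}\,\varepsilon}{\mathit{maxValue}}
\end{gathered}
```

For example, with a browser budget of 4 and value = maxValue, a multi-epoch query at ε = 2 deducts 2 · 2 = 4 and exhausts the whole budget, which is the halving described in the comment.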
> So we have this problem, whereby we let the multi-epoch query avoid deductions unless it selects an impression from an epoch. But the single epoch query does not. I see that this is how it is implemented in psdlib, so I'm not contesting the conclusion, but it's annoying.
That is not the case. In the single epoch case we:
1. Return early if `matchedImpressions` is empty, deducting no budget (before invoking attribution logic).
2. If there are matched impressions, deduct budget proportional to the actual realized L1 norm of the histogram, so even if we removed step (1), an empty match would still deduct nothing (the L1 norm of an empty histogram is 0).
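The two steps above can be sketched in Python. This is a minimal sketch, not the spec algorithm: it assumes the histogram is a list of per-bucket contributions and that the deduction is `l1Norm * epsilon / maxValue` (sensitivity `l1Norm` divided by the Laplace scale `maxValue / epsilon` from the PR description). `l1_norm`, `single_epoch_deduction`, and their parameters are hypothetical names.

```python
from typing import Sequence


def l1_norm(histogram: Sequence[float]) -> float:
    """Sum of absolute bucket values; 0 for an empty or all-zero histogram."""
    return sum(abs(x) for x in histogram)


def single_epoch_deduction(matched_impressions: Sequence[object],
                           histogram: Sequence[float],
                           epsilon: float,
                           max_value: float) -> float:
    """Hypothetical single-epoch budget deduction (illustration only)."""
    # Step 1: no matched impressions, so return early and deduct nothing.
    if not matched_impressions:
        return 0.0
    # Step 2: deduct in proportion to the realized L1 norm of the histogram.
    # Without step 1, an all-zero histogram would still give 0 here.
    return l1_norm(histogram) * epsilon / max_value


if __name__ == "__main__":
    print(single_epoch_deduction([], [], 1.0, 10.0))                      # 0.0
    print(single_epoch_deduction(["imp"], [0.0, 3.0, 2.0], 1.0, 10.0))    # 0.5
```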
> If a multi-epoch query only selects impressions from a single epoch, we deduct 2 * value * epsilon / maxValue. That's mostly OK, but that factor of 2 is a real challenge. It is not going to be obvious to people using this API that the epsilon value in their browser is only half of what they get for most queries. That is, the browser might set an epsilon budget of 4, but the site can only reasonably make queries with epsilon = 2 within that budget, unless they are careful to stay within a single epoch. Given the length of our epoch - and its random start offset - that will rarely be useful to them, so I don't expect it to happen.
Two things:
a. Without the single-budget optimization, we still need the factor of 2 increase, per some of the discussion in #212. This PR only makes this more apparent (since in the existing spec, to achieve the privacy guarantees, the noise factor needs to be 2 * |maxValue| / |epsilon|).
b. I actually think there are plenty of use cases where you will hit the single-budget optimization. For instance, I think it is relatively common to have O(day) lookback windows for view-through conversions. In those cases, I think it is perfectly fine to "silently" optimize and deduct less than the expected budget.
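Point (b) and the earlier concern about the epoch's random start offset can be compared with a little arithmetic. Assuming the query time is uniformly distributed within an epoch of length E (a simplification of the random start offset), a lookback window of length L stays inside the current epoch for a fraction (E - L) / E of query times. The 7-day epoch and the lookback lengths below are example values only, not taken from this thread, and `single_epoch_fraction` is a hypothetical helper.

```python
def single_epoch_fraction(lookback_days: float, epoch_days: float) -> float:
    """Fraction of query times at which [now - lookback, now] fits in the current epoch.

    Assumes `now` is uniformly distributed over the epoch. The window fits iff
    the time elapsed since the epoch start is at least the lookback length.
    """
    if epoch_days <= 0:
        raise ValueError("epoch length must be positive")
    return max(0.0, (epoch_days - lookback_days) / epoch_days)


if __name__ == "__main__":
    epoch_days = 7.0  # example epoch length (assumption, not from the spec)
    for lookback_days in (1.0, 3.0, 7.0, 30.0):
        frac = single_epoch_fraction(lookback_days, epoch_days)
        print(f"lookback={lookback_days:g}d -> single-epoch fraction {frac:.3f}")
```

Under these assumptions, a 1-day lookback qualifies for the single-epoch path at about 86% of query times, while lookbacks as long as the epoch never do.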
Co-authored-by: Martin Thomson <[email protected]>
Co-authored-by: Andrew Paseltiner <[email protected]>
LGTM spec-wise; I defer the math part to you and Martin. I can implement this in the simulator in a follow-up PR.
Thanks folks, I'll wait for review by @alexanderknop before landing.
The algorithms look great to me, so other than renaming variables for clarity, I think this is great!
This fixes #78 and partially addresses #212. The algorithm uses the configured lookback window to automatically determine whether the attribution considers only impressions from the current epoch. If so, budget deduction is altered by taking the report's sensitivity to be `|l1Norm|` rather than `2 * |value|`. In both cases, we assume that the noise scale for Laplace noise is `lambda = |maxValue| / |epsilon|`.
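The automatic selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the spec text: times are plain numbers in seconds, the start of the current epoch is known, and the query considers impressions in `[now - lookback, now]`. `Query`, `only_current_epoch`, and `budget_deduction` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Query:
    value: float       # the report's |value|
    max_value: float   # the report's |maxValue|
    epsilon: float     # the requested |epsilon|
    lookback: float    # lookback window length in seconds (assumed unit)


def only_current_epoch(now: float, lookback: float, current_epoch_start: float) -> bool:
    # True when the whole window [now - lookback, now] lies in the current epoch.
    return now - lookback >= current_epoch_start


def budget_deduction(query: Query, histogram: Sequence[float],
                     now: float, current_epoch_start: float) -> float:
    # Laplace noise scale lambda = maxValue / epsilon in both cases.
    noise_scale = query.max_value / query.epsilon
    if only_current_epoch(now, query.lookback, current_epoch_start):
        sensitivity = sum(abs(x) for x in histogram)  # l1Norm
    else:
        sensitivity = 2 * query.value
    return sensitivity / noise_scale


if __name__ == "__main__":
    day = 86400.0
    q = Query(value=10.0, max_value=10.0, epsilon=1.0, lookback=day)
    hist = [4.0, 6.0]
    print(budget_deduction(q, hist, now=10 * day, current_epoch_start=0.0))        # 1.0
    print(budget_deduction(q, hist, now=10 * day, current_epoch_start=9.5 * day))  # 2.0
```

Only the sensitivity differs between the two branches; the noise scale is the same, matching the assumption stated in the description.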
Note: An alternative we considered was adding a new option to the API to query epochs and applying the optimization if only a single epoch was chosen. However, that API seems difficult to use and, at a minimum, would require exposing the epoch start map to all conversion sites for them to use it effectively. It would also bloat the API surface.