refactor interface for projections/proximal operators (#147)
* fix outdated type parameters in `LocationScale`
* add `operator` keyword argument to `optimize` so that projection/proximal operators can have their own interface.
* fix benchmark
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Hong Ge <[email protected]>
`ClipScale` is a projection operator, which ensures that the variational approximation stays within a stable region of the variational family.
For more information, see [this section](@ref clipscale).

`q_avg_trans` is the final output of the optimization procedure.
If a parameter averaging strategy is used through the keyword argument `averager`, `q_avg_trans` will be the output of the averaging strategy, while `q_trans` is the last iterate.
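As a hedged illustration (not taken verbatim from the docs), the sketch below shows how the `operator` and `averager` keywords might be combined in a call to `AdvancedVI.optimize`; the positional arguments `problem`, `objective`, `q_init`, and `max_iter` are placeholder names, and the exact signature and return order may differ across versions:

```julia
using AdvancedVI

# Sketch only: `problem`, `objective`, `q_init`, and `max_iter` are assumed
# to be defined elsewhere; the exact signature of `optimize` may differ.
q_avg_trans, q_trans, stats, _ = AdvancedVI.optimize(
    problem, objective, q_init, max_iter;
    operator=ClipScale(),            # projection keeping the scale in a stable region
    averager=PolynomialAveraging(),  # iterate averaging; `q_avg_trans` is its output
)
```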
docs/src/optimization.md
[^DCAMHV2020]: Dhaka, A. K., Catalina, A., Andersen, M. R., Magnusson, M., Huggins, J., & Vehtari, A. (2020). Robust, accurate stochastic optimization for variational inference. *Advances in Neural Information Processing Systems*, 33, 10961-10973.
[^KMJ2024]: Khaled, A., Mishchenko, K., & Jin, C. (2023). DoWG unleashed: An efficient universal parameter-free gradient descent method. *Advances in Neural Information Processing Systems*, 36, 6748-6769.
[^IHC2023]: Ivgi, M., Hinder, O., & Carmon, Y. (2023). DoG is SGD's best friend: A parameter-free dynamic step size schedule. In *International Conference on Machine Learning* (pp. 14465-14499). PMLR.
## Operators

Depending on the variational family, variational objective, and optimization strategy, it might be necessary to modify the variational parameters after performing a gradient-based update.
For this, an operator acting on the parameters can be supplied via the `operator` keyword argument of `AdvancedVI.optimize`.

### [`ClipScale`](@id clipscale)

For the location-scale family, it is often the case that optimization is stable only when the smallest eigenvalue of the scale matrix is strictly positive[^D2020].
To ensure this, we provide the following projection operator:

```@docs
ClipScale
```

[^D2020]: Domke, J. (2020). Provable smoothness guarantees for black-box variational inference. In *International Conference on Machine Learning*.
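Conceptually, the projection amounts to clipping the diagonal of the scale so that it stays at or above a small threshold. The sketch below illustrates this idea under the assumption of a lower-triangular scale (whose eigenvalues are exactly its diagonal entries); it is not the library's implementation:

```julia
using LinearAlgebra

# Illustrative sketch, not the library implementation: keep each diagonal
# entry of a triangular `scale` at or above `scale_eps`, which bounds its
# smallest eigenvalue away from zero.
function clip_scale_sketch(scale::LowerTriangular, scale_eps::Real=1e-4)
    clipped = copy(scale)
    for i in axes(clipped, 1)
        clipped[i, i] = max(clipped[i, i], scale_eps)
    end
    return clipped
end
```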
Update the variational distribution according to the update rule in the optimizer state `opt_st` and the variational family `family_type`.

This is a wrapper around `Optimisers.update!` to provide some indirection.
For example, depending on the optimizer and the variational family, this may do additional things such as applying projection or proximal mappings.
As with the default behavior of `Optimisers.update!`, `params` and `opt_st` may be updated by the routine and are no longer valid after calling this function.
Instead, the return values should be used.

# Arguments
- `family_type::Type`: Type of the variational family, `typeof(restructure(params))`.
- `opt_st`: Optimizer state returned by `Optimisers.setup`.
- `params`: Current set of parameters to be updated.
- `restructure`: Callable for restructuring the variational distribution from `params`.
- `grad`: Gradient to be used by the update rule of `opt_st`.
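For instance, a minimal default method could simply forward to `Optimisers.update!`. The sketch below assumes the function is named `update_variational_params!` (a hypothetical name for illustration; the actual method table may differ):

```julia
using Optimisers

# Sketch of a plausible default method: ignore the family type and
# delegate directly to `Optimisers.update!`, which returns the new
# optimizer state and the updated parameters.
function update_variational_params!(::Type, opt_st, params, restructure, grad)
    opt_st, params = Optimisers.update!(opt_st, params, grad)
    return opt_st, params
end
```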
The location-scale variational family broadly represents various variational families using `location` and `scale` variational parameters, where a sample can be represented as follows:

```julia
u = rand(dist, d)
z = scale*u + location
```
`scale_eps` sets a constraint on the smallest value of `scale` to be enforced during optimization.
This is necessary to guarantee stable convergence.

# Keyword Arguments
- `scale_eps`: Lower bound constraint for the diagonal of the scale (default: `1e-4`).
"""
```julia
function MvLocationScale(
    location::AbstractVector{T},
    scale::AbstractMatrix{T},
    dist::ContinuousUnivariateDistribution;
    scale_eps::T=T(1e-4),
) where {T<:Real}
    @assert minimum(diag(scale)) ≥ scale_eps "Initial scale is too small (smallest diagonal value is $(minimum(diag(scale)))). This might result in unstable optimization behavior."
```
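The diff excerpt ends at the assertion above. For context, here is a hedged example of how this (since-removed) constructor would have been invoked; the dimensions and the base distribution `Normal()` are illustrative assumptions:

```julia
using Distributions, LinearAlgebra

# Illustrative call to the old constructor: a 2-dimensional location-scale
# family with identity scale. With this interface, `scale_eps` was enforced
# at construction time rather than by a `ClipScale` operator.
location = zeros(2)
scale = LowerTriangular(Matrix{Float64}(I, 2, 2))
q = MvLocationScale(location, scale, Normal(); scale_eps=1e-4)
```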