Commit f1920b5

mhauru, Red-Portal, github-actions[bot], and penelopeysm authored
[breaking] v0.42 (#2702)
A PR to accumulate breaking changes, to be released as v0.42.

Co-authored-by: Kyurae Kim <kyrkim@seas.upenn.edu>
Co-authored-by: Kyurae Kim <msca8h@naver.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Penelope Yong <penelopeysm@gmail.com>
1 parent 361a2cc commit f1920b5

33 files changed (+841, -1487 lines changed)

HISTORY.md

Lines changed: 191 additions & 0 deletions
@@ -1,3 +1,194 @@
1+
# 0.42.0
2+
3+
## DynamicPPL 0.39
4+
5+
Turing.jl v0.42 brings with it all the underlying changes in DynamicPPL 0.39.
6+
Please see [the DynamicPPL changelog](https://github.com/TuringLang/DynamicPPL.jl/releases/tag/v0.39.0) for full details; here we summarise only the changes that are most pertinent to end-users of Turing.jl.
7+
8+
### Thread safety opt-in
9+
10+
Turing.jl has supported threaded tilde-statements for a while now, as long as said tilde-statements are **observations** (i.e., likelihood terms).
11+
For example:
12+
13+
```julia
14+
@model function f(y)
15+
x ~ Normal()
16+
Threads.@threads for i in eachindex(y)
17+
y[i] ~ Normal(x)
18+
end
19+
end
20+
```
21+
22+
**Models where tilde-statements or `@addlogprob!` are used in parallel require what we call 'threadsafe evaluation'.**
23+
In previous releases of Turing.jl, threadsafe evaluation was enabled whenever Julia was launched with more than one thread.
24+
However, this is an imprecise way of determining whether threadsafe evaluation is really needed.
25+
It caused performance degradation for models that do _not_ actually need threadsafe evaluation, and generally led to ill-defined behaviour in various parts of the Turing codebase.
26+
27+
In Turing.jl v0.42, **threadsafe evaluation is now opt-in.**
28+
To enable threadsafe evaluation, after defining a model, you now need to call `setthreadsafe(model, true)` (note that this is not a mutating function; it returns a new model):
29+
30+
```julia
31+
y = randn(100)
32+
model = f(y)
33+
model = setthreadsafe(model, true)
34+
```
35+
36+
You *only* need to do this if your model uses tilde-statements or `@addlogprob!` in parallel.
37+
You do *not* need to do this if:
38+
39+
- your model has other kinds of parallelism but does not include tilde-statements inside;
40+
- or you are using `MCMCThreads()` or `MCMCDistributed()` to sample multiple chains in parallel, but your model itself does not use parallelism.
41+
42+
If your model does include parallelised tilde-statements or `@addlogprob!` calls, and you evaluate it/sample from it without setting `setthreadsafe(model, true)`, then you may get statistically incorrect results without any warnings or errors.
43+
44+
### Faster performance
45+
46+
Many operations in DynamicPPL have been substantially sped up.
47+
You should find that anything that uses `LogDensityFunction` (i.e., HMC/NUTS samplers, optimisation) is faster in this release.
48+
Prior sampling should also be much faster than before.
49+
50+
### `predict` improvements
51+
52+
If you have a model that requires threadsafe evaluation (i.e., parallel observations), you can now use this with `predict`.
53+
Carrying on from the previous example, you can do:
54+
55+
```julia
56+
model = setthreadsafe(f(y), true)
57+
chain = sample(model, NUTS(), 1000)
58+
59+
pdn_model = f(fill(missing, length(y)))
60+
pdn_model = setthreadsafe(pdn_model, true) # set threadsafe
61+
predictions = predict(pdn_model, chain) # generate new predictions in parallel
62+
```
63+
64+
### Log-density names in chains
65+
66+
When sampling from a Turing model, the resulting `MCMCChains.Chains` object now contains the log-joint, log-prior, and log-likelihood under the names `:logjoint`, `:logprior`, and `:loglikelihood` respectively.
67+
Previously, `:logjoint` would be stored under the name `:lp`.
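
As a minimal sketch of how the renamed entries might be read back, the snippet below samples from a small model and indexes the chain by the three names listed above. The model `demo` and the variable names are hypothetical, chosen only for illustration; indexing a chain by a `Symbol` is the usual MCMCChains accessor.

```julia
using Turing

# Hypothetical toy model, used only to produce a chain to inspect.
@model function demo(y)
    x ~ Normal()
    y ~ Normal(x)
end

chain = sample(demo(1.0), NUTS(), 100; progress=false)

# Per-iteration log-densities, stored under the names used from v0.42 onwards.
# Code that previously read `chain[:lp]` would read `chain[:logjoint]` instead.
logjoint_vals = chain[:logjoint]
logprior_vals = chain[:logprior]
loglik_vals = chain[:loglikelihood]
```

The variables are deliberately not called `logjoint` or `logprior`, since Turing exports functions with those names.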
68+
69+
### Log-evidence in chains
70+
71+
When sampling using MCMCChains, the chain object will no longer have its `chain.logevidence` field set.
72+
Instead, you can calculate this yourself from the log-likelihoods stored in the chain.
73+
For SMC samplers, the log-evidence of the entire trajectory is stored in `chain[:logevidence]` (which is the same for every particle in the 'chain').
74+
75+
## AdvancedVI 0.6
76+
77+
Turing.jl v0.42 updates `AdvancedVI.jl` compatibility to 0.6 (we skipped the breaking 0.5 update as it does not introduce new features).
78+
`AdvancedVI.jl@0.6` introduces major structural changes including breaking changes to the interface and multiple new features.
79+
The changes summarised below are those that affect end-users of Turing.
80+
For a more comprehensive list of changes, please refer to the [changelogs](https://github.com/TuringLang/AdvancedVI.jl/blob/main/HISTORY.md) in `AdvancedVI`.
81+
82+
### Breaking changes
83+
84+
A new level of interface for defining different variational algorithms has been introduced in `AdvancedVI` v0.5. As a result, the function `Turing.vi` now receives a keyword argument `algorithm`. The object `algorithm <: AdvancedVI.AbstractVariationalAlgorithm` should now contain all the algorithm-specific configurations. Therefore, keyword arguments of `vi` that were algorithm-specific, such as `objective`, `operator`, `averager` and so on, are now fields of the relevant `<: AdvancedVI.AbstractVariationalAlgorithm` structs.
85+
86+
In addition, the outputs have also changed. Previously, `vi` returned both the last iterate `q` of the algorithm and the iterate average `q_avg`. Now, for the algorithms running parameter averaging, only `q_avg` is returned. As a result, the number of returned values has been reduced from 4 to 3.
87+
88+
For example,
89+
90+
```julia
91+
q, q_avg, info, state = vi(
92+
model, q, n_iters; objective=RepGradELBO(10), operator=AdvancedVI.ClipScale()
93+
)
94+
```
95+
96+
is now
97+
98+
```julia
99+
q_avg, info, state = vi(
100+
model,
101+
q,
102+
n_iters;
103+
algorithm=KLMinRepGradDescent(adtype; n_samples=10, operator=AdvancedVI.ClipScale()),
104+
)
105+
```
106+
107+
Similarly,
108+
109+
```julia
110+
vi(
111+
model,
112+
q,
113+
n_iters;
114+
objective=RepGradELBO(10; entropy=AdvancedVI.ClosedFormEntropyZeroGradient()),
115+
operator=AdvancedVI.ProximalLocationScaleEntropy(),
116+
)
117+
```
118+
119+
is now
120+
121+
```julia
122+
vi(model, q, n_iters; algorithm=KLMinRepGradProxDescent(adtype; n_samples=10))
123+
```
124+
125+
Lastly, to obtain the last iterate `q` of `KLMinRepGradDescent`, which is not returned in the new interface, set the averaging strategy to `AdvancedVI.NoAveraging()`. That is,
126+
127+
```julia
128+
q, info, state = vi(
129+
model,
130+
q,
131+
n_iters;
132+
algorithm=KLMinRepGradDescent(
133+
adtype;
134+
n_samples=10,
135+
operator=AdvancedVI.ClipScale(),
136+
averager=AdvancedVI.NoAveraging(),
137+
),
138+
)
139+
```
140+
141+
Additionally,
142+
143+
- The default hyperparameters of `DoG` and `DoWG` have been altered.
144+
- The deprecated `AdvancedVI@0.2`-era interface is now removed.
145+
- `estimate_objective` now always returns the value to be minimized by the optimization algorithm. For example, for ELBO maximization algorithms, `estimate_objective` will return the *negative ELBO*. This is a breaking change from the previous behavior, where the ELBO was returned.
146+
- The initial values for `q_meanfield_gaussian`, `q_fullrank_gaussian`, and `q_locationscale` have changed. Specifically, the default initial value for the scale matrix has been changed from `I` to `0.6*I`.
147+
- When using algorithms that expect to operate in unconstrained spaces, the user is now explicitly expected to provide a `Bijectors.TransformedDistribution` wrapping an unconstrained distribution. (Refer to the docstring of `vi`.)
148+
149+
### New Features
150+
151+
`AdvancedVI@0.6` adds numerous new features including the following new VI algorithms:
152+
153+
- `KLMinWassFwdBwd`: Also known as "Wasserstein variational inference," this algorithm minimizes the KL divergence under the Wasserstein-2 metric.
154+
- `KLMinNaturalGradDescent`: This algorithm, also known as "online variational Newton," is the canonical "black-box" natural gradient variational inference algorithm, which minimizes the KL divergence via mirror descent under the KL divergence as the Bregman divergence.
155+
- `KLMinSqrtNaturalGradDescent`: This is a recent variant of `KLMinNaturalGradDescent` that operates in the Cholesky-factor parameterization of Gaussians instead of precision matrices.
156+
- `FisherMinBatchMatch`: This algorithm, called "batch-and-match," minimizes a variation of the 2nd order Fisher divergence via a proximal point-type algorithm.
157+
158+
Any of the new algorithms above can readily be used by simply swapping the `algorithm` keyword argument of `vi`.
159+
For example, to use batch-and-match:
160+
161+
```julia
162+
vi(model, q, n_iters; algorithm=FisherMinBatchMatch())
163+
```
164+
165+
## External sampler interface
166+
167+
The interface for defining an external sampler has been reworked.
168+
In general, implementations of external samplers should no longer need to depend on Turing.
169+
This is because the interface functions required have been shifted upstream to AbstractMCMC.jl.
170+
171+
In particular, you now only need to define the following functions:
172+
173+
- `AbstractMCMC.step(rng::Random.AbstractRNG, model::AbstractMCMC.LogDensityModel, ::MySampler; kwargs...)` (and also a method with `state`, and the corresponding `step_warmup` methods if needed)
174+
- `AbstractMCMC.getparams(::MySamplerState)` -> Vector{<:Real}
175+
- `AbstractMCMC.getstats(::MySamplerState)` -> NamedTuple
176+
- `AbstractMCMC.requires_unconstrained_space(::MySampler)` -> Bool (default `true`)
177+
178+
This means that you only need to depend on AbstractMCMC.jl.
179+
As long as the above functions are defined correctly, Turing will be able to use your external sampler.
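
The interface above can be sketched as follows. This is an illustrative random-walk Metropolis sampler, not code from Turing or AbstractMCMC: the names `MySampler`, `MySamplerState`, the `stepsize` field, and the `lp` statistic are all hypothetical, and the sketch assumes the target wrapped in `AbstractMCMC.LogDensityModel` implements the LogDensityProblems.jl interface. A real sampler would likely also honour any initial-parameter keyword arguments passed through `kwargs`.

```julia
using AbstractMCMC, LogDensityProblems, Random

# Hypothetical sampler: random-walk Metropolis with a fixed proposal scale.
struct MySampler <: AbstractMCMC.AbstractSampler
    stepsize::Float64
end

# Hypothetical state: current parameters and their log density.
struct MySamplerState
    params::Vector{Float64}
    logp::Float64
end

# Initial step: draw a starting point and evaluate the log density there.
function AbstractMCMC.step(
    rng::Random.AbstractRNG,
    model::AbstractMCMC.LogDensityModel,
    ::MySampler;
    kwargs...,
)
    ld = model.logdensity
    x = randn(rng, LogDensityProblems.dimension(ld))
    state = MySamplerState(x, LogDensityProblems.logdensity(ld, x))
    # In this sketch the state doubles as the returned sample.
    return state, state
end

# Subsequent steps: propose a Gaussian perturbation, then accept or reject.
function AbstractMCMC.step(
    rng::Random.AbstractRNG,
    model::AbstractMCMC.LogDensityModel,
    spl::MySampler,
    state::MySamplerState;
    kwargs...,
)
    ld = model.logdensity
    proposal = state.params .+ spl.stepsize .* randn(rng, length(state.params))
    logp = LogDensityProblems.logdensity(ld, proposal)
    if log(rand(rng)) < logp - state.logp
        state = MySamplerState(proposal, logp)
    end
    return state, state
end

# Accessors listed in the changelog entry above.
AbstractMCMC.getparams(state::MySamplerState) = state.params
AbstractMCMC.getstats(state::MySamplerState) = (; lp=state.logp)
AbstractMCMC.requires_unconstrained_space(::MySampler) = true
```

Note that nothing in this sketch imports Turing, which is the point of the reworked interface.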
180+
181+
The `Turing.Inference.isgibbscomponent(::MySampler)` interface function still exists, but in this version the default has been changed to `true`, so you should not need to overload this.
182+
183+
## Optimisation interface
184+
185+
The Optim.jl interface has been removed (so you cannot call `Optim.optimize` directly on Turing models).
186+
You can use the `maximum_likelihood` or `maximum_a_posteriori` functions with an Optim.jl solver instead (via Optimization.jl: see https://docs.sciml.ai/Optimization/stable/optimization_packages/optim/ for documentation of the available solvers).
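
A minimal sketch of the replacement workflow, assuming OptimizationOptimJL.jl is installed alongside Turing. The model `gauss` and its data are hypothetical; the solver is qualified with the `OptimizationOptimJL` module name to avoid any clash with solvers of the same name exported elsewhere.

```julia
using Turing
using OptimizationOptimJL  # makes Optim.jl solvers available through Optimization.jl

# Hypothetical model: Gaussian observations with unknown mean and scale.
@model function gauss(y)
    μ ~ Normal(0, 10)
    σ ~ truncated(Normal(0, 5); lower=0)
    for i in eachindex(y)
        y[i] ~ Normal(μ, σ)
    end
end

model = gauss([1.2, 0.8, 1.9, 1.1, 0.4])

# Pass the solver as the second argument instead of calling `Optim.optimize` on the model.
solver = OptimizationOptimJL.NelderMead()
map_estimate = maximum_a_posteriori(model, solver)
mle_estimate = maximum_likelihood(model, solver)
```

A derivative-free solver is used here only to keep the sketch independent of the automatic differentiation setup; gradient-based Optim.jl solvers can be passed in the same position.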
187+
188+
## Internal changes
189+
190+
The constructors of `OptimLogDensity` have been replaced with a single constructor, `OptimLogDensity(::DynamicPPL.LogDensityFunction)`.
191+
1192
# 0.41.4
2193

3194
Fixed a bug where the `check_model=false` keyword argument would not be respected when sampling with multiple threads or cores.

Project.toml

Lines changed: 6 additions & 10 deletions
@@ -1,6 +1,6 @@
11
name = "Turing"
22
uuid = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
3-
version = "0.41.4"
3+
version = "0.42.0"
44

55
[deps]
66
ADTypes = "47edcb42-4c32-4615-8424-f2b9edc5f35b"
@@ -41,21 +41,19 @@ StatsFuns = "4c63d2b9-4356-54db-8cca-17b64c39e42c"
4141

4242
[weakdeps]
4343
DynamicHMC = "bbc10e6e-7c05-544b-b16e-64fede858acb"
44-
Optim = "429524aa-4258-5aef-a3af-852621145aeb"
4544

4645
[extensions]
4746
TuringDynamicHMCExt = "DynamicHMC"
48-
TuringOptimExt = ["Optim", "AbstractPPL"]
4947

5048
[compat]
5149
ADTypes = "1.9"
52-
AbstractMCMC = "5.5"
50+
AbstractMCMC = "5.9"
5351
AbstractPPL = "0.11, 0.12, 0.13"
5452
Accessors = "0.1"
55-
AdvancedHMC = "0.3.0, 0.4.0, 0.5.2, 0.6, 0.7, 0.8"
56-
AdvancedMH = "0.8"
53+
AdvancedHMC = "0.8.3"
54+
AdvancedMH = "0.8.9"
5755
AdvancedPS = "0.7"
58-
AdvancedVI = "0.4"
56+
AdvancedVI = "0.6"
5957
BangBang = "0.4.2"
6058
Bijectors = "0.14, 0.15"
6159
Compat = "4.15.0"
@@ -64,15 +62,14 @@ Distributions = "0.25.77"
6462
DistributionsAD = "0.6"
6563
DocStringExtensions = "0.8, 0.9"
6664
DynamicHMC = "3.4"
67-
DynamicPPL = "0.38"
65+
DynamicPPL = "0.39.1"
6866
EllipticalSliceSampling = "0.5, 1, 2"
6967
ForwardDiff = "0.10.3, 1"
7068
Libtask = "0.9.3"
7169
LinearAlgebra = "1"
7270
LogDensityProblems = "2"
7371
MCMCChains = "5, 6, 7"
7472
NamedArrays = "0.9, 0.10"
75-
Optim = "1"
7673
Optimization = "3, 4, 5"
7774
OptimizationOptimJL = "0.1, 0.2, 0.3, 0.4"
7875
OrderedCollections = "1"
@@ -89,4 +86,3 @@ julia = "1.10.8"
8986

9087
[extras]
9188
DynamicHMC = "bbc10e6e-7c05-544b-b16e-64fede858acb"
92-
Optim = "429524aa-4258-5aef-a3af-852621145aeb"

docs/src/api.md

Lines changed: 14 additions & 6 deletions
@@ -43,6 +43,7 @@ even though [`Prior()`](@ref) is actually defined in the `Turing.Inference` modu
4343
| `prefix` | [`DynamicPPL.prefix`](@extref) | Prefix all variable names in a model with a given VarName |
4444
| `LogDensityFunction` | [`DynamicPPL.LogDensityFunction`](@extref) | A struct containing all information about how to evaluate a model. Mostly for advanced users |
4545
| `@addlogprob!` | [`DynamicPPL.@addlogprob!`](@extref) | Add arbitrary log-probability terms during model evaluation |
46+
| `setthreadsafe` | [`DynamicPPL.setthreadsafe`](@extref) | Mark a model as requiring threadsafe evaluation |
4647

4748
### Inference
4849

@@ -109,12 +110,19 @@ Turing.jl provides several strategies to initialise parameters for models.
109110

110111
See the [docs of AdvancedVI.jl](https://turinglang.org/AdvancedVI.jl/stable/) for detailed usage and the [variational inference tutorial](https://turinglang.org/docs/tutorials/09-variational-inference/) for a basic walkthrough.
111112

112-
| Exported symbol | Documentation | Description |
113-
|:---------------------- |:------------------------------------------------- |:---------------------------------------------------------------------------------------- |
114-
| `vi` | [`Turing.vi`](@ref) | Perform variational inference |
115-
| `q_locationscale` | [`Turing.Variational.q_locationscale`](@ref) | Find a numerically non-degenerate initialization for a location-scale variational family |
116-
| `q_meanfield_gaussian` | [`Turing.Variational.q_meanfield_gaussian`](@ref) | Find a numerically non-degenerate initialization for a mean-field Gaussian family |
117-
| `q_fullrank_gaussian` | [`Turing.Variational.q_fullrank_gaussian`](@ref) | Find a numerically non-degenerate initialization for a full-rank Gaussian family |
113+
| Exported symbol | Documentation | Description |
114+
|:----------------------------- |:-------------------------------------------------------- |:------------------------------------------------------------------------------------------------------------------------------------------------- |
115+
| `vi` | [`Turing.vi`](@ref) | Perform variational inference |
116+
| `q_locationscale` | [`Turing.Variational.q_locationscale`](@ref) | Find a numerically non-degenerate initialization for a location-scale variational family |
117+
| `q_meanfield_gaussian` | [`Turing.Variational.q_meanfield_gaussian`](@ref) | Find a numerically non-degenerate initialization for a mean-field Gaussian family |
118+
| `q_fullrank_gaussian` | [`Turing.Variational.q_fullrank_gaussian`](@ref) | Find a numerically non-degenerate initialization for a full-rank Gaussian family |
119+
| `KLMinRepGradDescent` | [`Turing.Variational.KLMinRepGradDescent`](@ref) | KL divergence minimization via stochastic gradient descent with the reparameterization gradient |
120+
| `KLMinRepGradProxDescent` | [`Turing.Variational.KLMinRepGradProxDescent`](@ref) | KL divergence minimization via stochastic proximal gradient descent with the reparameterization gradient over location-scale variational families |
121+
| `KLMinScoreGradDescent` | [`Turing.Variational.KLMinScoreGradDescent`](@ref) | KL divergence minimization via stochastic gradient descent with the score gradient |
122+
| `KLMinWassFwdBwd` | [`Turing.Variational.KLMinWassFwdBwd`](@ref) | KL divergence minimization via Wasserstein proximal gradient descent |
123+
| `KLMinNaturalGradDescent` | [`Turing.Variational.KLMinNaturalGradDescent`](@ref) | KL divergence minimization via natural gradient descent |
124+
| `KLMinSqrtNaturalGradDescent` | [`Turing.Variational.KLMinSqrtNaturalGradDescent`](@ref) | KL divergence minimization via natural gradient descent in the square-root parameterization |
125+
| `FisherMinBatchMatch` | [`Turing.Variational.FisherMinBatchMatch`](@ref) | Covariance-weighted Fisher divergence minimization via the batch-and-match algorithm |
118126

119127
### Automatic differentiation types
120128

ext/TuringDynamicHMCExt.jl

Lines changed: 5 additions & 9 deletions
@@ -35,9 +35,8 @@ State of the [`DynamicNUTS`](@ref) sampler.
3535
# Fields
3636
$(TYPEDFIELDS)
3737
"""
38-
struct DynamicNUTSState{L,V<:DynamicPPL.AbstractVarInfo,C,M,S}
38+
struct DynamicNUTSState{L,C,M,S}
3939
logdensity::L
40-
vi::V
4140
"Cache of sample, log density, and gradient of log density evaluation."
4241
cache::C
4342
metric::M
@@ -70,9 +69,8 @@ function Turing.Inference.initialstep(
7069
Q, _ = DynamicHMC.mcmc_next_step(steps, results.final_warmup_state.Q)
7170

7271
# Create first sample and state.
73-
vi = DynamicPPL.unflatten(vi, Q.q)
74-
sample = Turing.Inference.Transition(model, vi, nothing)
75-
state = DynamicNUTSState(ℓ, vi, Q, steps.H.κ, steps.ϵ)
72+
sample = DynamicPPL.ParamsWithStats(Q.q, ℓ)
73+
state = DynamicNUTSState(ℓ, Q, steps.H.κ, steps.ϵ)
7674

7775
return sample, state
7876
end
@@ -85,15 +83,13 @@ function AbstractMCMC.step(
8583
kwargs...,
8684
)
8785
# Compute next sample.
88-
vi = state.vi
8986
ℓ = state.logdensity
9087
steps = DynamicHMC.mcmc_steps(rng, spl.sampler, state.metric, ℓ, state.stepsize)
9188
Q, _ = DynamicHMC.mcmc_next_step(steps, state.cache)
9289

9390
# Create next sample and state.
94-
vi = DynamicPPL.unflatten(vi, Q.q)
95-
sample = Turing.Inference.Transition(model, vi, nothing)
96-
newstate = DynamicNUTSState(ℓ, vi, Q, state.metric, state.stepsize)
91+
sample = DynamicPPL.ParamsWithStats(Q.q, ℓ)
92+
newstate = DynamicNUTSState(ℓ, Q, state.metric, state.stepsize)
9793

9894
return sample, newstate
9995
end
