Commit 98a733e

Implement PPC Plots
1 parent 6909f74 commit 98a733e

5 files changed: +595, -1 lines changed

docs/src/statsplots.md

Lines changed: 122 additions & 0 deletions
@@ -183,11 +183,133 @@ forestplot(chn, [:C, :B, :A], hpd_val = [0.05, 0.15, 0.25])
forestplot(chn, chn.name_map[:parameters], hpd_val = [0.05, 0.15, 0.25], ordered = true)
```

## Posterior Predictive Checks (PPC)

Posterior predictive checks (PPCs) are a standard tool for validating Bayesian models. They compare the observed data with datasets simulated from the posterior predictive distribution to assess whether the model can reproduce key features of the data. Prior predictive checks work the same way but simulate from the prior predictive distribution, which shows whether the priors imply plausible data before any data are observed.

```@example statsplots
using Random
Random.seed!(123)

# Generate posterior samples: 500 iterations × 2 parameters (μ, σ) × 2 chains
n_iter = 500
posterior_data = randn(n_iter, 2, 2)
posterior_chains = Chains(posterior_data, [:μ, :σ])

# Generate posterior predictive samples: one replicated dataset of
# n_obs points for every iteration and chain
n_obs = 20
pp_data = zeros(n_iter, n_obs, 2)
for i in 1:n_iter, j in 1:2
    μ = posterior_data[i, 1, j]
    σ = abs(posterior_data[i, 2, j]) + 0.5  # Ensure positive σ
    pp_data[i, :, j] = randn(n_obs) * σ .+ μ
end
pp_chains = Chains(pp_data)

# Generate observed data
Random.seed!(456)
observed = randn(n_obs) * 1.2 .+ 0.3

# Basic posterior predictive check (density overlay)
# Note: observed data is shown by default for posterior checks
ppcplot(posterior_chains, pp_chains, observed)
```
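
The plots make this comparison visually. The same question can also be reduced to a single number: compute a test statistic on the observed data and on each replicated dataset, then report how often the replicated value is at least as large. The sketch below is plain Julia and not part of `ppcplot`; the helper name `ppc_pvalue` is hypothetical, and it assumes the predictive draws are arranged as a matrix with one replicated dataset per row (for example, a single chain such as `pp_data[:, :, 1]` from the example above).

```julia
using Random, Statistics

# Hypothetical helper: fraction of replicated datasets whose test statistic
# is at least as large as the observed one (a posterior predictive p-value).
function ppc_pvalue(stat, y_rep::AbstractMatrix, y_obs::AbstractVector)
    t_obs = stat(y_obs)
    # Each row of y_rep is one replicated dataset with length(y_obs) points.
    t_rep = [stat(view(y_rep, i, :)) for i in axes(y_rep, 1)]
    return mean(t_rep .>= t_obs)
end

# Stand-in data: 20 observations and 500 replicated datasets of 20 points each
rng = MersenneTwister(1)
y_obs = randn(rng, 20) .* 1.2 .+ 0.3
y_rep = randn(rng, 500, 20)

p_mean = ppc_pvalue(mean, y_rep, y_obs)  # checks location
p_sd = ppc_pvalue(std, y_rep, y_obs)     # checks spread
```

A value near 0 or 1 means the replicated data rarely resemble the observed data in that respect; a value in between only says that this particular statistic does not expose a misfit.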

### Plot Types

`ppcplot` supports four plot types, selected with the `kind` keyword:

#### Density Plots (Default)
```@example statsplots
# Density overlay with customized transparency
ppcplot(posterior_chains, pp_chains, observed;
        kind=:density, alpha=0.3, num_pp_samples=50)
```
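
Each curve in the density overlay is a smoothed estimate of one dataset's distribution. As an illustration only, the sketch below computes a Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth on a fixed grid; the estimator and bandwidth that `ppcplot` actually uses are not specified in this section, and `gaussian_kde` is a hypothetical name.

```julia
using Statistics

# Hypothetical helper: Gaussian kernel density estimate of x, evaluated at
# every point of grid, using Silverman's rule-of-thumb bandwidth.
function gaussian_kde(x::AbstractVector{<:Real}, grid::AbstractVector{<:Real})
    n = length(x)
    h = 1.06 * std(x) * n^(-1 / 5)         # Silverman's bandwidth
    k(u) = exp(-u^2 / 2) / sqrt(2π)        # standard normal kernel
    return [sum(k((g - xi) / h) for xi in x) / (n * h) for g in grid]
end

grid = range(-4, 4; length = 201)
y_obs = [0.1, -0.4, 1.3, 0.8, -1.1, 0.5, 2.0, -0.2]
dens_obs = gaussian_kde(y_obs, grid)

# One curve per replicated dataset (stand-in draws); an overlay plot would
# draw each of these with transparency.
y_rep = [randn(8) for _ in 1:5]
dens_rep = [gaussian_kde(y, grid) for y in y_rep]
```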

#### Histogram Comparison
```@example statsplots
# Normalized histogram comparison
ppcplot(posterior_chains, pp_chains, observed; kind=:histogram)
```
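
A normalized histogram divides each bin count by the total count times the bin width, so the bars integrate to 1 and datasets of different sizes share a common scale. The sketch below assumes observed and predictive data are binned with the same edges so their bars line up; `normalized_hist` is a hypothetical helper, not part of `ppcplot`.

```julia
# Hypothetical helper: bin x using the given edges and normalize the counts
# so that the bars integrate to 1 (density scale).
function normalized_hist(x::AbstractVector{<:Real}, edges::AbstractVector{<:Real})
    counts = zeros(Int, length(edges) - 1)
    for xi in x
        # Index of the bin whose left edge is <= xi
        b = searchsortedlast(edges, xi)
        if b == length(edges) && xi == edges[end]
            b -= 1                                # include the right-most edge in the last bin
        end
        1 <= b < length(edges) && (counts[b] += 1)  # drop values outside the edges
    end
    return counts ./ (sum(counts) .* diff(edges))
end

# Shared edges make the observed and predictive histograms comparable
edges = range(-4, 4; length = 17)                 # 16 bins of width 0.5
h_obs = normalized_hist(randn(20), edges)
h_rep = [normalized_hist(randn(20), edges) for _ in 1:10]
```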

#### Cumulative Distribution Functions
```@example statsplots
# Empirical CDFs comparison
ppcplot(posterior_chains, pp_chains, observed; kind=:cumulative)
```
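
This commit's change to `src/MCMCChains.jl` imports `ecdf` from StatsBase, presumably for this plot type. The hand-written version below shows what an empirical CDF evaluates: the fraction of data points less than or equal to `t`. `empirical_cdf` is a hypothetical name; `StatsBase.ecdf(x)` returns a callable with the same meaning.

```julia
# Hypothetical helper: returns F with F(t) = (number of xᵢ <= t) / n,
# using a sorted copy of the data and binary search.
function empirical_cdf(x::AbstractVector{<:Real})
    xs = sort(x)
    n = length(xs)
    return t -> searchsortedlast(xs, t) / n
end

F_obs = empirical_cdf([0.3, -1.2, 0.8, 2.1])
F_obs(-2.0)  # 0.0: no points at or below -2.0
F_obs(0.3)   # 0.5: two of four points are <= 0.3
F_obs(5.0)   # 1.0: every point is <= 5.0
```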

#### Scatter Plots with Jitter
```@example statsplots
# Index-based scatter plot; jitter is applied automatically for small samples
# and is set explicitly here
ppcplot(posterior_chains, pp_chains, observed;
        kind=:scatter, num_pp_samples=8, jitter=0.3)
```
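
In an index-based scatter plot, every replicated dataset places its `i`-th value at the same x position, so points can hide each other. A common remedy, assumed here because this section does not specify how `ppcplot` applies `jitter`, is to add uniform noise of total width `jitter` to each x position. `jittered_positions` is a hypothetical helper.

```julia
using Random

# Hypothetical helper: x positions 1:n shifted by uniform noise in
# [-jitter/2, jitter/2) so that draws sharing an index do not overlap exactly.
function jittered_positions(rng::AbstractRNG, n::Integer, jitter::Real)
    return collect(1:n) .+ jitter .* (rand(rng, n) .- 0.5)
end

xs = jittered_positions(MersenneTwister(7), 20, 0.3)
```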

### Advanced Styling and Options

```@example statsplots
# Comprehensive customization example
ppcplot(posterior_chains, pp_chains, observed;
        kind=:density,
        colors=[:steelblue, :darkred, :orange],  # [predictive, observed, mean]
        alpha=0.25,
        observed_rug=true,     # Add rug plot for observed data
        num_pp_samples=75,     # Limit predictive samples shown
        mean_pp=true,          # Show predictive mean
        legend=true,
        random_seed=42)        # Reproducible subsampling
```

### Prior Predictive Checks

Prior predictive checks assess whether the priors generate reasonable data before any data are observed. The `ppc_group` keyword (`:posterior` or `:prior`) sets whether the observed data are drawn by default: they are shown for `:posterior` and hidden for `:prior`. The examples below reuse the chains from above only to illustrate the keyword; in a real prior check, the chains would normally hold prior and prior predictive draws.

```@example statsplots
# Prior predictive check - observed data hidden by default
ppcplot(posterior_chains, pp_chains, observed; ppc_group=:prior)
```

```@example statsplots
# Prior check with observed data explicitly shown for comparison
ppcplot(posterior_chains, pp_chains, observed;
        ppc_group=:prior, observed=true, alpha=0.4)
```
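
For a genuine prior predictive check, the predictive draws must come from the prior rather than the posterior. The sketch below generates them for the normal model of the first example, assuming the illustrative priors μ ~ Normal(0, 1) and σ ~ |Normal(0, 1)| + 0.5 (chosen to mirror how that example constructs σ, not as recommendations). `prior_predictive` is a hypothetical helper; its rows are replicated datasets, so a 3-D array in the layout used for `Chains` earlier could be built with `reshape(y_prior, 500, 20, 1)`.

```julia
using Random

# Hypothetical helper: draw (μ, σ) from the assumed priors, then simulate one
# dataset of n_obs points per draw. Row i uses μ[i] and σ[i].
function prior_predictive(rng::AbstractRNG, n_draws::Integer, n_obs::Integer)
    μ = randn(rng, n_draws)                  # μ ~ Normal(0, 1)
    σ = abs.(randn(rng, n_draws)) .+ 0.5     # σ ~ |Normal(0, 1)| + 0.5
    # Broadcasting the length-n_draws vectors over the n_draws × n_obs matrix
    # scales and shifts each row by its own (σ, μ).
    return μ .+ σ .* randn(rng, n_draws, n_obs)
end

y_prior = prior_predictive(MersenneTwister(123), 500, 20)
size(y_prior)  # (500, 20)
```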

### Controlling Observed Data Display

The `observed` keyword overrides the `ppc_group` default: `observed=true` always draws the observed data, `observed=false` never does, and the default `observed=nothing` leaves the choice to `ppc_group`.

```@example statsplots
# Posterior check without observed data
ppcplot(posterior_chains, pp_chains, observed;
        ppc_group=:posterior, observed=false)
```
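
The interaction between `ppc_group` and `observed` can be written as a small rule. The function below is an assumed reconstruction, inferred from the comments in the examples rather than taken from the `ppcplot` source, and `show_observed` is a hypothetical name.

```julia
# Hypothetical helper: decide whether the observed data are drawn.
# An explicit Bool wins; `nothing` falls back to the ppc_group default.
function show_observed(ppc_group::Symbol, observed::Union{Bool,Nothing})
    ppc_group in (:posterior, :prior) ||
        throw(ArgumentError("ppc_group must be :posterior or :prior"))
    return observed === nothing ? ppc_group === :posterior : observed
end

show_observed(:posterior, nothing)  # true: shown by default for posterior checks
show_observed(:prior, nothing)      # false: hidden by default for prior checks
show_observed(:prior, true)         # true: explicit override
```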

### Performance and Sampling Control

Plotting every predictive draw is slow for long chains and makes the plot hard to read. `num_pp_samples` limits how many randomly selected draws are displayed, and `random_seed` makes that selection reproducible:

```@example statsplots
# Limit the number of predictive samples displayed
ppcplot(posterior_chains, pp_chains, observed;
        num_pp_samples=25,
        random_seed=123)  # Reproducible results
```
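
Reproducible subsampling amounts to drawing `num_pp_samples` distinct indices from a generator seeded with `random_seed`. The sketch below assumes a permutation-based draw with `MersenneTwister`; the generator and selection method inside `ppcplot` may differ, so the same seed is not expected to pick the same draws there. `select_pp_draws` is a hypothetical helper.

```julia
using Random

# Hypothetical helper: indices of the predictive draws to plot.
# `nothing` (or a request for at least n_draws) keeps every draw.
function select_pp_draws(n_draws::Integer, num_pp_samples::Union{Integer,Nothing},
                         random_seed::Union{Integer,Nothing})
    (num_pp_samples === nothing || num_pp_samples >= n_draws) && return collect(1:n_draws)
    # A fixed seed gives the same subset on every call
    rng = random_seed === nothing ? Random.default_rng() : MersenneTwister(random_seed)
    return sort(randperm(rng, n_draws)[1:num_pp_samples])
end

idx = select_pp_draws(500, 25, 123)
```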

The full call signature, with the default value of each keyword argument:

```julia
ppcplot(posterior_chains::Chains, posterior_predictive_chains::Chains, observed_data::Vector;
        kind=:density, alpha=nothing, num_pp_samples=nothing, mean_pp=true, observed=nothing,
        observed_rug=false, colors=[:steelblue, :black, :orange], jitter=nothing,
        legend=true, random_seed=nothing, ppc_group=:posterior)
```

## API

```@docs
energyplot
energyplot!
ppcplot
ppcplot!
ridgelineplot
ridgelineplot!
forestplot

src/MCMCChains.jl

Lines changed: 2 additions & 1 deletion
@@ -19,7 +19,8 @@ import StatsBase:
 quantile,
 sample,
 summarystats,
-cov
+cov,
+ecdf
 
 import MCMCDiagnosticTools
 import MLJModelInterface
