2 changes: 1 addition & 1 deletion Project.toml
@@ -3,7 +3,7 @@ uuid = "c7f686f2-ff18-58e9-bc7b-31028e88f75d"
keywords = ["markov chain monte carlo", "probablistic programming"]
license = "MIT"
desc = "Chain types and utility functions for MCMC simulations."
-version = "7.3.0"
+version = "7.4.0"

[deps]
AbstractMCMC = "80f14c24-f653-4e6a-9b94-39d6b0f70001"
122 changes: 122 additions & 0 deletions docs/src/statsplots.md
@@ -183,11 +183,133 @@ forestplot(chn, [:C, :B, :A], hpd_val = [0.05, 0.15, 0.25])
forestplot(chn, chn.name_map[:parameters], hpd_val = [0.05, 0.15, 0.25], ordered = true)
```

## Posterior Predictive Checks (PPC)

Posterior predictive checks (PPCs) are a standard tool for Bayesian model validation. They compare observed data with samples from the posterior predictive distribution to assess whether the model can reproduce key features of the data. Prior predictive checks apply the same comparison to samples generated from the priors alone, to evaluate whether the priors are reasonable before seeing the data.

```@example statsplots
using Random
Random.seed!(123)

# Generate posterior samples (parameters)
n_iter = 500
posterior_data = randn(n_iter, 2, 2) # iterations × parameters (μ, σ) × chains
posterior_chains = Chains(posterior_data, [:μ, :σ])

# Generate posterior predictive samples
n_obs = 20
pp_data = zeros(n_iter, n_obs, 2)
for i in 1:n_iter, j in 1:2
    μ = posterior_data[i, 1, j]
    σ = abs(posterior_data[i, 2, j]) + 0.5 # Ensure positive σ
    pp_data[i, :, j] = randn(n_obs) * σ .+ μ
end
pp_chains = Chains(pp_data)

# Generate observed data
Random.seed!(456)
observed = randn(n_obs) * 1.2 .+ 0.3

# Basic posterior predictive check (density overlay)
# Note: observed data is shown by default for posterior checks
ppcplot(posterior_chains, pp_chains, observed)
```
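
A density overlay is a visual check; the same comparison can also be summarized numerically. The block below is a minimal sketch, not part of `ppcplot`: it uses stand-in arrays (`pp` and `obs` are hypothetical names) laid out as iterations × observations × chains, and computes a posterior predictive p-value for the sample mean. The choice of the mean as test statistic is an assumption made for illustration.

```julia
using Random, Statistics

Random.seed!(123)

# Stand-in data (hypothetical): predictive draws and one observed vector
n_iter, n_obs, n_chains = 500, 20, 2
pp = randn(n_iter, n_obs, n_chains) .* 1.2 .+ 0.3
obs = randn(n_obs) .* 1.2 .+ 0.3

# Test statistic per predictive draw: mean over the observation dimension
stat_rep = vec(mean(pp; dims=2))   # one value per (iteration, chain) pair
stat_obs = mean(obs)

# Share of replicated statistics at least as large as the observed one
p_value = mean(stat_rep .>= stat_obs)
```

Values of `p_value` close to 0 or 1 suggest that the model rarely reproduces the observed statistic; values near the middle of the range indicate no obvious mismatch for this particular statistic.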

### Plot Types

`ppcplot` supports four plot types, selected with the `kind` keyword:

#### Density Plots (Default)
```@example statsplots
# Density overlay with customized transparency
ppcplot(posterior_chains, pp_chains, observed;
        kind=:density, alpha=0.3, num_pp_samples=50)
```

#### Histogram Comparison
```@example statsplots
# Normalized histogram comparison
ppcplot(posterior_chains, pp_chains, observed; kind=:histogram)
```

#### Cumulative Distribution Functions
```@example statsplots
# Empirical CDFs comparison
ppcplot(posterior_chains, pp_chains, observed; kind=:cumulative)
```
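
For readers unfamiliar with empirical CDFs, the following is a small sketch of what such a curve represents: at each point `t`, the fraction of data values that are less than or equal to `t`. The helper `empirical_cdf` is hypothetical and written with Base only; StatsBase provides `ecdf` for the same purpose, and nothing here is a claim about how `ppcplot` computes its curves.

```julia
# Hypothetical helper: fraction of values in `x` that are <= t
empirical_cdf(x, t) = count(<=(t), x) / length(x)

obs = [0.2, -1.1, 0.7, 1.5, 0.2]

# Evaluate the step function at the distinct data points
grid = sort(unique(obs))
steps = [empirical_cdf(obs, t) for t in grid]
```

Overlaying such step functions for the observed data and for each predictive draw shows where the two distributions diverge, including in the tails, which a density overlay can hide.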

#### Scatter Plots with Jitter
```@example statsplots
# Index-based scatter plot with automatic jitter for small samples
ppcplot(posterior_chains, pp_chains, observed;
        kind=:scatter, num_pp_samples=8, jitter=0.3)
```

### Advanced Styling and Options

```@example statsplots
# Comprehensive customization example
ppcplot(posterior_chains, pp_chains, observed;
        kind=:density,
        colors=[:steelblue, :darkred, :orange], # [predictive, observed, mean]
        alpha=0.25,
        observed_rug=true,  # Add rug plot for observed data
        num_pp_samples=75,  # Limit predictive samples shown
        mean_pp=true,       # Show predictive mean
        legend=true,
        random_seed=42)     # Reproducible subsampling
```

### Prior Predictive Checks

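Prior predictive checks assess whether the priors generate reasonable data before any actual data is observed. The `ppc_group` keyword selects the check type and, with it, whether observed data is drawn by default: shown for `:posterior`, hidden for `:prior`.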

```@example statsplots
# Prior predictive check - observed data hidden by default
ppcplot(posterior_chains, pp_chains, observed; ppc_group=:prior)
```

```@example statsplots
# Prior check with observed data explicitly shown for comparison
ppcplot(posterior_chains, pp_chains, observed;
        ppc_group=:prior, observed=true, alpha=0.4)
```

### Controlling Observed Data Display

The `observed` keyword overrides that default, so you can show or hide the observed data regardless of the check type:

```@example statsplots
# Posterior check without observed data
ppcplot(posterior_chains, pp_chains, observed;
        ppc_group=:posterior, observed=false)
```

### Performance and Sampling Control

For large datasets, or to reduce visual clutter, limit the number of predictive samples that are drawn:

```@example statsplots
# Limit the number of predictive samples displayed
ppcplot(posterior_chains, pp_chains, observed;
        num_pp_samples=25,
        random_seed=123)  # Reproducible results
```
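
To illustrate what seeded subsampling means in general, here is a minimal sketch that picks a fixed-size subset of draw indices with a local random number generator. The helper `subsample_indices` is hypothetical and is not the MCMCChains implementation; how `ppcplot` selects its samples internally is not shown in this document.

```julia
using Random

# Hypothetical helper: choose `k` of `n` draw indices reproducibly
function subsample_indices(n::Integer, k::Integer; seed::Integer=123)
    rng = MersenneTwister(seed)   # local RNG, leaves the global RNG untouched
    return sort(randperm(rng, n)[1:min(k, n)])
end

idx = subsample_indices(1000, 25)
```

Using a local generator keeps the selection stable across calls with the same seed without changing the state of the global RNG used elsewhere in a script.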

The full signature, with default keyword values:

```julia
ppcplot(posterior_chains::Chains, posterior_predictive_chains::Chains, observed_data::Vector;
        kind=:density, alpha=nothing, num_pp_samples=nothing, mean_pp=true, observed=nothing,
        observed_rug=false, colors=[:steelblue, :black, :orange], jitter=nothing,
        legend=true, random_seed=nothing, ppc_group=:posterior)
```

## API

```@docs
energyplot
energyplot!
ppcplot
ppcplot!
ridgelineplot
ridgelineplot!
forestplot
3 changes: 2 additions & 1 deletion src/MCMCChains.jl
@@ -19,7 +19,8 @@ import StatsBase:
quantile,
sample,
summarystats,
-cov
+cov,
+ecdf

import MCMCDiagnosticTools
import MLJModelInterface