Type: | Package |
Title: | Plotting for Bayesian Models |
Version: | 1.11.1 |
Date: | 2024-02-14 |
Maintainer: | Jonah Gabry <jsg2201@columbia.edu> |
Description: | Plotting functions for posterior analysis, MCMC diagnostics, prior and posterior predictive checks, and other visualizations to support the applied Bayesian workflow advocated in Gabry, Simpson, Vehtari, Betancourt, and Gelman (2019) <doi:10.1111/rssa.12378>. The package is designed not only to provide convenient functionality for users, but also a common set of functions that can be easily used by developers working on a variety of R packages for Bayesian modeling, particularly (but not exclusively) packages interfacing with 'Stan'. |
License: | GPL (≥ 3) |
URL: | https://mc-stan.org/bayesplot/ |
BugReports: | https://github.com/stan-dev/bayesplot/issues/ |
SystemRequirements: | pandoc (>= 1.12.3), pandoc-citeproc |
Depends: | R (≥ 3.1.0) |
Imports: | dplyr (≥ 0.8.0), ggplot2 (≥ 3.4.0), ggridges (≥ 0.5.5), glue, posterior, reshape2, rlang (≥ 0.3.0), stats, tibble (≥ 2.0.0), tidyselect, utils |
Suggests: | ggfortify, gridExtra (≥ 2.2.1), hexbin, knitr (≥ 1.16), loo (≥ 2.0.0), RColorBrewer, rmarkdown (≥ 1.0.0), rstan (≥ 2.17.1), rstanarm (≥ 2.17.4), rstantools (≥ 1.5.0), scales, shinystan (≥ 2.3.0), survival, testthat (≥ 2.0.0), vdiffr (≥ 1.0.2) |
RoxygenNote: | 7.3.0 |
VignetteBuilder: | knitr |
Encoding: | UTF-8 |
NeedsCompilation: | no |
Packaged: | 2024-02-14 20:46:03 UTC; jgabry |
Author: | Jonah Gabry [aut, cre], Tristan Mahr [aut], Paul-Christian Bürkner [ctb], Martin Modrák [ctb], Malcolm Barrett [ctb], Frank Weber [ctb], Eduardo Coronado Sroka [ctb], Teemu Sailynoja [ctb], Aki Vehtari [ctb] |
Repository: | CRAN |
Date/Publication: | 2024-02-15 05:30:11 UTC |
bayesplot: Plotting for Bayesian Models
Description
Stan Development Team
The bayesplot package provides a variety of ggplot2-based plotting functions for use after fitting Bayesian models (typically, though not exclusively, via Markov chain Monte Carlo). The package is designed not only to provide convenient functionality for users, but also a common set of functions that can be easily used by developers working on a variety of packages for Bayesian modeling, particularly (but not necessarily) packages powered by RStan (the R interface to Stan). Examples of packages that will soon (or already are) using bayesplot are rstan itself, as well as the rstan-dependent rstanarm and brms packages for applied regression modeling.
Plotting functionality
The plotting functions in bayesplot are organized into several modules:
-
MCMC: Visualizations of Markov chain Monte Carlo (MCMC) simulations generated by any MCMC algorithm as well as diagnostics. There are also additional functions specifically for use with models fit using the No-U-Turn Sampler (NUTS).
-
PPC: Graphical (posterior or prior) predictive checks (PPCs).
-
PPD: Plots of (posterior or prior) predictive distributions without comparisons to observed data.
Resources
-
Online documentation and vignettes: Visit the bayesplot website at https://mc-stan.org/bayesplot/
-
Bug reports and feature requests: If you would like to request a new feature or if you have noticed a bug that needs to be fixed please let us know at the bayesplot issue tracker at https://github.com/stan-dev/bayesplot/issues/
-
General questions and help: To ask a question about bayesplot on the Stan Forums forum please visit https://discourse.mc-stan.org.
Author(s)
Maintainer: Jonah Gabry jsg2201@columbia.edu
Authors:
Tristan Mahr
Other contributors:
Paul-Christian Bürkner [contributor]
Martin Modrák [contributor]
Malcolm Barrett [contributor]
Frank Weber [contributor]
Eduardo Coronado Sroka [contributor]
Teemu Sailynoja [contributor]
Aki Vehtari [contributor]
References
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378. (journal version, arXiv preprint, code on GitHub)
See Also
theme_default()
for the default ggplot theme used by
bayesplot and bayesplot_theme_set()
to change it.
bayesplot-colors to set or view the color scheme used for plotting.
ggplot2::ggsave()
for saving plots.
Examples
# A few quick examples (all of the functions have many examples
# on their individual help pages)
# MCMC plots
x <- example_mcmc_draws(params = 5)
mcmc_intervals(x, prob = 0.5)
mcmc_intervals(x, regex_pars = "beta")
color_scheme_set("purple")
mcmc_areas(x, regex_pars = "beta", prob = 0.8)
color_scheme_set("mix-blue-red")
mcmc_trace(x, pars = c("alpha", "sigma"),
facet_args = list(nrow = 2))
color_scheme_set("brightblue")
mcmc_scatter(x, pars = c("beta[1]", "sigma"),
transformations = list(sigma = "log"))
# Graphical PPCs
y <- example_y_data()
yrep <- example_yrep_draws()
ppc_dens_overlay(y, yrep[1:50, ])
color_scheme_set("pink")
ppc_stat(y, yrep, stat = "median") + grid_lines()
ppc_hist(y, yrep[1:8, ])
# Same plots but without y (using ppd_ instead of ppc_)
bayesplot_theme_set(ggplot2::theme_gray())
ypred <- yrep
ppd_dens_overlay(ypred[1:50, ])
ppd_stat(ypred, stat = "median") + grid_lines()
ppd_hist(ypred[1:8, ])
Get or view the names of available plotting or data functions
Description
Get or view the names of available plotting or data functions
Usage
available_ppc(pattern = NULL, fixed = FALSE, invert = FALSE, plots_only = TRUE)
available_ppd(pattern = NULL, fixed = FALSE, invert = FALSE, plots_only = TRUE)
available_mcmc(
pattern = NULL,
fixed = FALSE,
invert = FALSE,
plots_only = TRUE
)
Arguments
pattern , fixed , invert |
Passed to |
plots_only |
If |
Value
A possibly empty character vector of function names with several
additional attributes (for use by a custom print method). If pattern
is missing then the returned object contains the names of all available
plotting functions in the MCMC, PPC, or PPD module, depending on
which function is called. If pattern
is specified then a subset of
function names is returned.
Examples
available_mcmc()
available_mcmc("nuts")
available_mcmc("rhat|neff")
available_ppc()
available_ppc("grouped")
available_ppc("grouped", invert = TRUE)
available_ppd()
available_ppd("grouped")
# can also see which functions that return data are available
available_ppc(plots_only = FALSE)
# only show the _data functions
available_ppc("_data", plots_only = FALSE)
available_ppd("_data", plots_only = FALSE)
available_mcmc("_data", plots_only = FALSE)
Arrange plots in a grid
Description
The bayesplot_grid
function makes it simple to juxtapose plots using
common x
and/or y
axes.
Usage
bayesplot_grid(
...,
plots = list(),
xlim = NULL,
ylim = NULL,
grid_args = list(),
titles = character(),
subtitles = character(),
legends = TRUE,
save_gg_objects = TRUE
)
Arguments
... |
One or more ggplot objects. |
plots |
A list of ggplot objects. Can be used as an alternative to
specifying plot objects via |
xlim , ylim |
Optionally, numeric vectors of length 2 specifying lower and upper limits for the axes that will be shared across all plots. |
grid_args |
An optional named list of arguments to pass to
|
titles , subtitles |
Optional character vectors of plot titles and
subtitles. If specified, |
legends |
If any of the plots have legends should they be displayed?
Defaults to |
save_gg_objects |
If |
Value
An object of class "bayesplot_grid"
(essentially a gtable object
from gridExtra::arrangeGrob()
), which has a plot
method.
Examples
y <- example_y_data()
yrep <- example_yrep_draws()
stats <- c("sd", "median", "max", "min")
color_scheme_set("pink")
bayesplot_grid(
plots = lapply(stats, function(s) ppc_stat(y, yrep, stat = s)),
titles = stats,
legends = FALSE,
grid_args = list(ncol = 1)
)
## Not run:
library(rstanarm)
mtcars$log_mpg <- log(mtcars$mpg)
fit1 <- stan_glm(mpg ~ wt, data = mtcars, refresh = 0)
fit2 <- stan_glm(log_mpg ~ wt, data = mtcars, refresh = 0)
y <- mtcars$mpg
yrep1 <- posterior_predict(fit1, draws = 50)
yrep2 <- posterior_predict(fit2, fun = exp, draws = 50)
color_scheme_set("blue")
ppc1 <- ppc_dens_overlay(y, yrep1)
ppc1
ppc1 + yaxis_text()
color_scheme_set("red")
ppc2 <- ppc_dens_overlay(y, yrep2)
bayesplot_grid(ppc1, ppc2)
# make sure the plots use the same limits for the axes
bayesplot_grid(ppc1, ppc2, xlim = c(-5, 60), ylim = c(0, 0.2))
# remove the legends and add text
bayesplot_grid(ppc1, ppc2, xlim = c(-5, 60), ylim = c(0, 0.2),
legends = FALSE, subtitles = rep("Predicted MPG", 2))
## End(Not run)
Get, set, and modify the active bayesplot theme
Description
These functions are the bayesplot equivalent to
ggplot2's ggplot2::theme_set()
and friends. They set, get, and update
the active theme but only apply them to bayesplots
. The current/active
theme is automatically applied to every bayesplot
you draw.
Use bayesplot_theme_get()
to get the current bayesplot theme and
bayesplot_theme_set()
to set a new theme. bayesplot_theme_update()
and
bayesplot_theme_replace()
are shorthands for changing individual elements.
Usage
bayesplot_theme_get()
bayesplot_theme_set(new = theme_default())
bayesplot_theme_update(...)
bayesplot_theme_replace(...)
Arguments
new |
The new theme (list of theme elements) to use. This is analogous
to the |
... |
A named list of theme settings. |
Details
bayesplot_theme_set()
and friends only apply to bayesplots
.
However, ggplot2::theme_set()
can also be used to change the
bayesplot theme. Currently, setting a theme with ggplot2::theme_set()
(other than the ggplot2 default ggplot2::theme_grey()
) will override
the bayesplot theme.
Value
bayesplot_theme_get()
returns the current theme. The other three
functions (set, update, replace) invisibly return the previous theme
so it can be saved and easily restored later. This is the same behavior as
the ggplot2 versions of these functions.
See Also
theme_default()
for the default bayesplot theme.
bayesplot-helpers for a variety of convenience functions, many of which provide shortcuts for tweaking theme elements after creating a plot.
bayesplot-colors to set or view the color scheme used for plotting.
Examples
library(ggplot2)
# plot using the current value of bayesplot_theme_get()
# (the default is bayesplot::theme_default())
x <- example_mcmc_draws()
mcmc_hist(x)
# change the bayesplot theme to theme_minimal and save the old theme
old <- bayesplot_theme_set(theme_minimal())
mcmc_hist(x)
# change back to the previous theme
bayesplot_theme_set(old)
mcmc_hist(x)
# change the default font size and family for bayesplots
bayesplot_theme_update(text = element_text(size = 16, family = "sans"))
mcmc_hist(x)
# change back to the default
bayesplot_theme_set() # same as bayesplot_theme_set(theme_default())
mcmc_hist(x)
# updating theme elements
color_scheme_set("brightblue")
bayesplot_theme_set(theme_dark())
mcmc_hist(x)
bayesplot_theme_update(panel.background = element_rect(fill = "black"))
mcmc_hist(x)
# to get the same plot without updating the theme we could also have
# used the bayeplot convenience function panel_bg()
bayesplot_theme_set(theme_dark())
mcmc_hist(x) + panel_bg(fill = "black")
# reset
bayesplot_theme_set()
Set, get, or view bayesplot color schemes
Description
Set, get, or view color schemes. Choose from a preset scheme or create a custom scheme. See the Available color schemes section below for a list of available scheme names. The Custom color schemes section describes how to specify a custom scheme.
Usage
color_scheme_set(scheme = "blue")
color_scheme_get(scheme = NULL, i = NULL)
color_scheme_view(scheme = NULL)
Arguments
scheme |
For For For See the Available color schemes section below for a list of available scheme names. The Custom color schemes section describes how to specify a custom scheme. |
i |
For |
Value
color_scheme_set()
has the side effect of setting the color scheme
used for plotting. It also returns (invisibly) a list of
the hexadecimal color values used in scheme
.
color_scheme_get()
returns a list of the hexadecimal color
values (without changing the current scheme). If the scheme
argument
is not specified the returned values correspond to the current color
scheme. If the optional argument i
is specified then the returned
list only contains length(i)
elements.
color_scheme_view()
returns a ggplot object if only a single scheme is
specified and a gtable object if multiple schemes names are specified.
Available color schemes
Currently, the available preset color schemes are:
-
"blue"
,"brightblue"
-
"gray"
,"darkgray"
-
"green"
-
"pink"
-
"purple"
-
"red"
-
"teal"
-
"yellow"
-
"viridis"
,"viridisA"
,"viridisB"
,"viridisC"
,"viridisD"
,"viridisE"
-
"mix-x-y"
, replacingx
andy
with any two of the scheme names listed above (e.g. "mix-teal-pink", "mix-blue-red", etc.). The order ofx
andy
matters, i.e., the color schemes"mix-blue-red"
and"mix-red-blue"
are not identical. There is no guarantee that every possible mixed scheme will look good with every possible plot. -
"brewer-x"
, replacingx
with the name of a palette available fromRColorBrewer::brewer.pal()
(e.g.,brewer-PuBuGn
).
If you have a suggestion for a new color scheme please let us know via the bayesplot issue tracker.
Custom color schemes
A bayesplot color scheme consists of six colors. To specify a custom color scheme simply pass a character vector containing either the names of six colors or six hexadecimal color values (or a mix of names and hex values). The colors should be in order from lightest to darkest. See the end of the Examples section for a demonstration.
See Also
theme_default()
for the default ggplot theme used by
bayesplot and bayesplot_theme_set()
to change it.
Examples
color_scheme_set("blue")
color_scheme_view()
color_scheme_get()
color_scheme_get(i = c(3, 5)) # 3rd and 5th colors only
color_scheme_get("brightblue")
color_scheme_view("brightblue")
# compare multiple schemes
color_scheme_view(c("pink", "gray", "teal"))
color_scheme_view(c("viridis", "viridisA", "viridisB", "viridisC"))
color_scheme_set("pink")
x <- example_mcmc_draws()
mcmc_intervals(x)
color_scheme_set("teal")
color_scheme_view()
mcmc_intervals(x)
color_scheme_set("red")
mcmc_areas(x, regex_pars = "beta")
color_scheme_set("purple")
color_scheme_view()
y <- example_y_data()
yrep <- example_yrep_draws()
ppc_stat(y, yrep, stat = "mean") + legend_none()
############################
### Mixing color schemes ###
############################
color_scheme_set("mix-teal-pink")
ppc_stat(y, yrep, stat = "sd") + legend_none()
mcmc_areas(x, regex_pars = "beta")
##########################
### ColorBrewer scheme ###
##########################
color_scheme_set("brewer-Spectral")
color_scheme_view()
mcmc_trace(x, pars = "sigma")
###########################
### Custom color scheme ###
###########################
orange_scheme <- c("#ffebcc", "#ffcc80",
"#ffad33", "#e68a00",
"#995c00", "#663d00")
color_scheme_set(orange_scheme)
color_scheme_view()
mcmc_areas(x, regex_pars = "alpha")
mcmc_dens_overlay(x)
ppc_stat(y, yrep, stat = "var") + legend_none()
Extract quantities needed for plotting from model objects
Description
Generics and methods for extracting quantities needed for plotting from various types of model objects. Currently methods are provided for stanfit (rstan), CmdStanMCMC (cmdstanr), and stanreg (rstanarm) objects, but adding new methods should be relatively straightforward.
Usage
log_posterior(object, ...)
nuts_params(object, ...)
rhat(object, ...)
neff_ratio(object, ...)
## S3 method for class 'stanfit'
log_posterior(object, inc_warmup = FALSE, ...)
## S3 method for class 'stanreg'
log_posterior(object, inc_warmup = FALSE, ...)
## S3 method for class 'CmdStanMCMC'
log_posterior(object, inc_warmup = FALSE, ...)
## S3 method for class 'stanfit'
nuts_params(object, pars = NULL, inc_warmup = FALSE, ...)
## S3 method for class 'stanreg'
nuts_params(object, pars = NULL, inc_warmup = FALSE, ...)
## S3 method for class 'list'
nuts_params(object, pars = NULL, ...)
## S3 method for class 'CmdStanMCMC'
nuts_params(object, pars = NULL, ...)
## S3 method for class 'stanfit'
rhat(object, pars = NULL, ...)
## S3 method for class 'stanreg'
rhat(object, pars = NULL, regex_pars = NULL, ...)
## S3 method for class 'CmdStanMCMC'
rhat(object, pars = NULL, ...)
## S3 method for class 'stanfit'
neff_ratio(object, pars = NULL, ...)
## S3 method for class 'stanreg'
neff_ratio(object, pars = NULL, regex_pars = NULL, ...)
## S3 method for class 'CmdStanMCMC'
neff_ratio(object, pars = NULL, ...)
Arguments
object |
The object to use. |
... |
Arguments passed to individual methods. |
inc_warmup |
A logical scalar (defaulting to |
pars |
An optional character vector of parameter names. For
|
regex_pars |
An optional regular expression to use for
parameter selection. Can be specified instead of |
Value
log_posterior()
-
log_posterior()
methods return a molten data frame (seereshape2::melt()
). The data frame should have columns"Iteration"
(integer),"Chain"
(integer), and"Value"
(numeric). See Examples, below. nuts_params()
-
nuts_params()
methods return a molten data frame (seereshape2::melt()
). The data frame should have columns"Parameter"
(factor),"Iteration"
(integer),"Chain"
(integer), and"Value"
(numeric). See Examples, below. rhat()
,neff_ratio()
-
Methods return (named) vectors.
See Also
Examples
## Not run:
library(rstanarm)
fit <- stan_glm(mpg ~ wt, data = mtcars, refresh = 0)
np <- nuts_params(fit)
head(np)
tail(np)
lp <- log_posterior(fit)
head(lp)
tail(lp)
## End(Not run)
Convenience functions for adding or changing plot details
Description
Convenience functions for adding to (and changing details of) ggplot objects (many of the objects returned by bayesplot functions). See the Examples section, below.
Usage
vline_at(v, fun, ..., na.rm = TRUE)
hline_at(v, fun, ..., na.rm = TRUE)
vline_0(..., na.rm = TRUE)
hline_0(..., na.rm = TRUE)
abline_01(..., na.rm = TRUE)
lbub(p, med = TRUE)
legend_move(position = "right")
legend_none()
legend_text(...)
xaxis_title(on = TRUE, ...)
xaxis_text(on = TRUE, ...)
xaxis_ticks(on = TRUE, ...)
yaxis_title(on = TRUE, ...)
yaxis_text(on = TRUE, ...)
yaxis_ticks(on = TRUE, ...)
facet_text(on = TRUE, ...)
facet_bg(on = TRUE, ...)
panel_bg(on = TRUE, ...)
plot_bg(on = TRUE, ...)
grid_lines(color = "gray50", size = 0.2)
overlay_function(...)
Arguments
v |
Either a numeric vector specifying the value(s) at which to
draw the vertical or horizontal line(s), or an object of any type to use as
the first argument to |
fun |
A function, or the name of a function, that returns a numeric vector. |
... |
For the various For functions ending in For functions ending in For For |
na.rm |
A logical scalar passed to the appropriate geom (e.g.
|
p |
The probability mass (in |
med |
Should the median also be included in addition to the lower and upper bounds of the interval? |
position |
The position of the legend. Either a numeric vector (of
length 2) giving the relative coordinates (between 0 and 1) for the legend,
or a string among |
on |
For functions modifying ggplot theme elements,
set |
color , size |
Passed to |
Details
Add vertical, horizontal, and diagonal lines to plots
-
vline_at()
andhline_at()
return an object created by eitherggplot2::geom_vline()
orggplot2::geom_hline()
that can be added to a ggplot object to draw a vertical or horizontal line (at one or several values). Iffun
is missing then the lines are drawn at the values inv
. Iffun
is specified then the lines are drawn at the values returned byfun(v)
. -
vline_0()
andhline_0()
are wrappers forvline_at()
andhline_at()
withv = 0
andfun
missing. -
abline_01()
is a wrapper forggplot2::geom_abline()
with the intercept set to0
and the slope set to1
. -
lbub()
returns a function that takes a single argumentx
and returns the lower and upper bounds (lb
,ub
) of the100*p
\ ofx
, as well as the median (ifmed=TRUE
).
Control appearance of facet strips
-
facet_text()
returns ggplot2 theme objects that can be added to an existing plot (ggplot object) to format the text in facet strips. -
facet_bg()
can be added to a plot to change the background of the facet strips.
Move legend, remove legend, or style the legend text
-
legend_move()
andlegend_none()
return a ggplot2 theme object that can be added to an existing plot (ggplot object) in order to change the position of the legend or remove it. -
legend_text()
works much likefacet_text()
but for the legend.
Control appearance of x
-axis and y
-axis features
-
xaxis_title()
andyaxis_title()
return a ggplot2 theme object that can be added to an existing plot (ggplot object) in order to toggle or format the titles displayed on thex
ory
axis. (To change the titles themselves useggplot2::labs()
.) -
xaxis_text()
andyaxis_text()
return a ggplot2 theme object that can be added to an existing plot (ggplot object) in order to toggle or format the text displayed on thex
ory
axis (e.g. tick labels). -
xaxis_ticks()
andyaxis_ticks()
return a ggplot2 theme object that can be added to an existing plot (ggplot object) to change the appearance of the axis tick marks.
Customize plot background
-
plot_bg()
returns a ggplot2 theme object that can be added to an existing plot (ggplot object) to format the background of the entire plot. -
panel_bg()
returns a ggplot2 theme object that can be added to an existing plot (ggplot object) to format the background of the just the plotting area. -
grid_lines()
returns a ggplot2 theme object that can be added to an existing plot (ggplot object) to add grid lines to the plot background.
Superimpose a function on an existing plot
-
overlay_function()
is a simple wrapper forggplot2::stat_function()
but with theinherit.aes
argument fixed toFALSE
. Fixinginherit.aes=FALSE
will avoid potential errors due to theggplot2::aes()
thetic mapping used by certain bayesplot plotting functions.
Value
A ggplot2 layer or ggplot2::theme()
object that can be
added to existing ggplot objects, like those created by many of the
bayesplot plotting functions. See the Details section.
See Also
theme_default()
for the default ggplot theme used by
bayesplot.
Examples
color_scheme_set("gray")
x <- example_mcmc_draws(chains = 1)
dim(x)
colnames(x)
###################################
### vertical & horizontal lines ###
###################################
(p <- mcmc_intervals(x, regex_pars = "beta"))
# vertical line at zero (with some optional styling)
p + vline_0()
p + vline_0(linewidth = 0.25, color = "darkgray", linetype = 2)
# vertical line(s) at specified values
v <- c(-0.5, 0, 0.5)
p + vline_at(v, linetype = 3, linewidth = 0.25)
my_lines <- vline_at(v, alpha = 0.25, linewidth = 0.75 * c(1, 2, 1),
color = c("maroon", "skyblue", "violet"))
p + my_lines
# add vertical line(s) at computed values
# (three ways of getting lines at column means)
color_scheme_set("brightblue")
p <- mcmc_intervals(x, regex_pars = "beta")
p + vline_at(x[, 3:4], colMeans)
p + vline_at(x[, 3:4], "colMeans", color = "darkgray",
lty = 2, linewidth = 0.25)
p + vline_at(x[, 3:4], function(a) apply(a, 2, mean),
color = "orange",
linewidth = 2, alpha = 0.1)
# using the lbub function to get interval lower and upper bounds (lb, ub)
color_scheme_set("pink")
parsed <- ggplot2::label_parsed
p2 <- mcmc_hist(x, pars = "beta[1]", binwidth = 1/20,
facet_args = list(labeller = parsed))
(p2 <- p2 + facet_text(size = 16))
b1 <- x[, "beta[1]"]
p2 + vline_at(b1, fun = lbub(0.8), color = "gray20",
linewidth = 2 * c(1,.5,1), alpha = 0.75)
p2 + vline_at(b1, lbub(0.8, med = FALSE), color = "gray20",
linewidth = 2, alpha = 0.75)
##########################
### format axis titles ###
##########################
color_scheme_set("green")
y <- example_y_data()
yrep <- example_yrep_draws()
(p3 <- ppc_stat(y, yrep, stat = "median", binwidth = 1/4))
# turn off the legend, turn on x-axis title
p3 +
legend_none() +
xaxis_title(size = 13, family = "sans") +
ggplot2::xlab(expression(italic(T(y)) == median(italic(y))))
################################
### format axis & facet text ###
################################
color_scheme_set("gray")
p4 <- mcmc_trace(example_mcmc_draws(), pars = c("alpha", "sigma"))
myfacets <-
facet_bg(fill = "gray30", color = NA) +
facet_text(face = "bold", color = "skyblue", size = 14)
p4 + myfacets
##########################
### control tick marks ###
##########################
p4 +
myfacets +
yaxis_text(FALSE) +
yaxis_ticks(FALSE) +
xaxis_ticks(linewidth = 1, color = "skyblue")
##############################
### change plot background ###
##############################
color_scheme_set("blue")
# add grid lines
ppc_stat(y, yrep) + grid_lines()
# panel_bg vs plot_bg
ppc_scatter_avg(y, yrep) + panel_bg(fill = "gray90")
ppc_scatter_avg(y, yrep) + plot_bg(fill = "gray90")
color_scheme_set("yellow")
p5 <- ppc_scatter_avg(y, yrep, alpha = 1)
p5 + panel_bg(fill = "gray20") + grid_lines(color = "white")
color_scheme_set("purple")
ppc_dens_overlay(y, yrep[1:30, ]) +
legend_text(size = 14) +
legend_move(c(0.75, 0.5)) +
plot_bg(fill = "gray90") +
panel_bg(color = "black", fill = "gray99", linewidth = 3)
###############################################
### superimpose a function on existing plot ###
###############################################
# compare posterior of beta[1] to Gaussian with same posterior mean
# and sd as beta[1]
x <- example_mcmc_draws(chains = 4)
dim(x)
purple_gaussian <-
overlay_function(
fun = dnorm,
args = list(mean(x[,, "beta[1]"]), sd(x[,, "beta[1]"])),
color = "purple",
linewidth = 2
)
color_scheme_set("gray")
mcmc_hist(x, pars = "beta[1]", freq = FALSE) + purple_gaussian
mcmc_dens(x, pars = "beta[1]") + purple_gaussian
Example draws to use in demonstrations and tests
Description
These functions return various objects containing data used in the examples throughout the bayesplot package documentation.
Usage
example_mcmc_draws(chains = 4, params = 4)
example_yrep_draws()
example_y_data()
example_x_data()
example_group_data()
Arguments
chains |
An integer between 1 and 4 indicating the desired number of chains. |
params |
An integer between 1 and 6 indicating the desired number of parameters. |
Details
Each of these functions returns an object containing data, parameter draws, or
predictions corresponding to a basic linear regression model with data
y
(outcome vector) and X
(predictor matrix), and parameters
alpha
(intercept), beta
(coefficient vector), and sigma
(error sd).
example_mcmc_draws()
-
If
chains > 1
, a250
(iterations) bychains
byparams
array or, ifchains = 1
, a250
byparams
matrix of MCMC draws from the posterior distribution of the parameters in the linear regression model described above. Ifparams = 1
then only the draws foralpha
are included in the returned object. Ifparams >= 2
then draws forsigma
are also included. And ifparams
is between3
and the maximum of6
then draws for regression coefficientsbeta[k]
(k
in1:(params-2)
) are also included. example_y_data()
-
A numeric vector with
434
observations of the outcome variable in the linear regression model. example_x_data()
-
A numeric vector with
434
observations of one of the predictor variables in the linear regression model. example_group_data()
-
A factor variable with
434
observations of a grouping variable with two levels. example_yrep_draws()
-
A
500
(draws) by434
(data points) matrix of draws from the posterior predictive distribution. Each row represents a full dataset drawn from the posterior predictive distribution of the outcomey
after fitting the linear regression model mentioned above.
Value
See Details.
Examples
draws <- example_mcmc_draws()
dim(draws)
dimnames(draws)
draws <- example_mcmc_draws(1, 2)
dim(draws)
colnames(draws)
draws <- example_mcmc_draws(params = 6)
dimnames(draws)[[3]]
y <- example_y_data()
x <- example_x_data()
group <- example_group_data()
length(y)
length(x)
length(group)
tail(data.frame(y, x, group), 5)
yrep <- example_yrep_draws()
dim(yrep) # ncol(yrep) = length(y) = length(x) = length(group)
Combination plots
Description
Combination plots
Usage
mcmc_combo(x, combo = c("dens", "trace"), ..., widths = NULL, gg_theme = NULL)
Arguments
x |
An object containing MCMC draws:
|
combo |
A character vector with at least two elements. Each element of
|
... |
Arguments passed to the plotting functions named in |
widths |
A numeric vector the same length as |
gg_theme |
Unlike most of the other bayesplot functions,
|
Value
A gtable object (the result of calling
gridExtra::arrangeGrob()
) with length(combo)
columns and
a row for each parameter.
See Also
Other MCMC:
MCMC-diagnostics
,
MCMC-distributions
,
MCMC-intervals
,
MCMC-nuts
,
MCMC-overview
,
MCMC-parcoord
,
MCMC-recover
,
MCMC-scatterplots
,
MCMC-traces
Examples
# some parameter draws to use for demonstration
x <- example_mcmc_draws()
dim(x)
dimnames(x)
mcmc_combo(x, pars = c("alpha", "sigma"))
mcmc_combo(x, pars = c("alpha", "sigma"), widths = c(1, 2))
# change second plot, show log(sigma) instead of sigma,
# and remove the legends
color_scheme_set("mix-blue-red")
mcmc_combo(
x,
combo = c("dens_overlay", "trace"),
pars = c("alpha", "sigma"),
transformations = list(sigma = "log"),
gg_theme = legend_none()
)
# same thing but this time also change the entire ggplot theme
mcmc_combo(
x,
combo = c("dens_overlay", "trace"),
pars = c("alpha", "sigma"),
transformations = list(sigma = "log"),
gg_theme = ggplot2::theme_gray() + legend_none()
)
General MCMC diagnostics
Description
Plots of Rhat statistics, ratios of effective sample size to total sample size, and autocorrelation of MCMC draws. See the Plot Descriptions section, below, for details. For models fit using the No-U-Turn-Sampler, see also MCMC-nuts for additional MCMC diagnostic plots.
Usage
mcmc_rhat(rhat, ..., size = NULL)
mcmc_rhat_hist(rhat, ..., binwidth = NULL, bins = NULL, breaks = NULL)
mcmc_rhat_data(rhat, ...)
mcmc_neff(ratio, ..., size = NULL)
mcmc_neff_hist(ratio, ..., binwidth = NULL, bins = NULL, breaks = NULL)
mcmc_neff_data(ratio, ...)
mcmc_acf(
x,
pars = character(),
regex_pars = character(),
...,
facet_args = list(),
lags = 20,
size = NULL
)
mcmc_acf_bar(
x,
pars = character(),
regex_pars = character(),
...,
facet_args = list(),
lags = 20
)
Arguments
rhat |
A vector of R-hat estimates. |
... |
Currently ignored. |
size |
Optional values to override |
binwidth |
Passed to |
bins |
Passed to |
breaks |
Passed to |
ratio |
A vector of ratios of effective sample size estimates to
total sample size. See |
x |
An object containing MCMC draws:
|
pars |
An optional character vector of parameter names. If neither
|
regex_pars |
An optional regular expression to use for
parameter selection. Can be specified instead of |
facet_args |
A named list of arguments (other than |
lags |
The number of lags to show in the autocorrelation plot. |
Value
The plotting functions return a ggplot object that can be further
customized using the ggplot2 package. The functions with suffix
_data()
return the data that would have been drawn by the plotting
function.
Plot Descriptions
mcmc_rhat()
,mcmc_rhat_hist()
-
Rhat values as either points or a histogram. Values are colored using different shades (lighter is better). The chosen thresholds are somewhat arbitrary, but can be useful guidelines in practice.
-
light: below 1.05 (good)
-
mid: between 1.05 and 1.1 (ok)
-
dark: above 1.1 (too high)
-
mcmc_neff()
,mcmc_neff_hist()
-
Ratios of effective sample size to total sample size as either points or a histogram. Values are colored using different shades (lighter is better). The chosen thresholds are somewhat arbitrary, but can be useful guidelines in practice.
-
light: between 0.5 and 1 (high)
-
mid: between 0.1 and 0.5 (good)
-
dark: below 0.1 (low)
-
mcmc_acf()
,mcmc_acf_bar()
-
Grid of autocorrelation plots by chain and parameter. The
lags
argument gives the maximum number of lags at which to calculate the autocorrelation function.mcmc_acf()
is a line plot whereasmcmc_acf_bar()
is a barplot.
References
Stan Development Team. Stan Modeling Language Users Guide and Reference Manual. https://mc-stan.org/users/documentation/
Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science. 7(4), 457–472.
See Also
The Visual MCMC Diagnostics vignette.
-
MCMC-nuts for additional MCMC diagnostic plots for models fit using the No-U-Turn-Sampler.
Other MCMC:
MCMC-combos
,
MCMC-distributions
,
MCMC-intervals
,
MCMC-nuts
,
MCMC-overview
,
MCMC-parcoord
,
MCMC-recover
,
MCMC-scatterplots
,
MCMC-traces
Examples
# autocorrelation
x <- example_mcmc_draws()
dim(x)
dimnames(x)
color_scheme_set("green")
mcmc_acf(x, pars = c("alpha", "beta[1]"))
color_scheme_set("pink")
(p <- mcmc_acf_bar(x, pars = c("alpha", "beta[1]")))
# add horiztonal dashed line at 0.5
p + hline_at(0.5, linetype = 2, size = 0.15, color = "gray")
# fake rhat values to use for demonstration
rhat <- c(runif(100, 1, 1.15))
mcmc_rhat_hist(rhat)
mcmc_rhat(rhat)
# lollipops
color_scheme_set("purple")
mcmc_rhat(rhat[1:10], size = 5)
color_scheme_set("blue")
mcmc_rhat(runif(1000, 1, 1.07))
mcmc_rhat(runif(1000, 1, 1.3)) + legend_move("top") # add legend above plot
# fake neff ratio values to use for demonstration
ratio <- c(runif(100, 0, 1))
mcmc_neff_hist(ratio)
mcmc_neff(ratio)
## Not run:
# Example using rstanarm model (requires rstanarm package)
library(rstanarm)
# intentionally use small 'iter' so there are some
# problems with rhat and neff for demonstration
fit <- stan_glm(mpg ~ ., data = mtcars, iter = 50, refresh = 0)
rhats <- rhat(fit)
ratios <- neff_ratio(fit)
mcmc_rhat(rhats)
mcmc_neff(ratios, size = 3)
# there's a small enough number of parameters in the
# model that we can display their names on the y-axis
mcmc_neff(ratios) + yaxis_text(hjust = 1)
# can also look at autocorrelation
draws <- as.array(fit)
mcmc_acf(draws, pars = c("wt", "cyl"), lags = 10)
# increase number of iterations and plots look much better
fit2 <- update(fit, iter = 500)
mcmc_rhat(rhat(fit2))
mcmc_neff(neff_ratio(fit2))
mcmc_acf(as.array(fit2), pars = c("wt", "cyl"), lags = 10)
## End(Not run)
Histograms and kernel density plots of MCMC draws
Description
Various types of histograms and kernel density plots of MCMC draws. See the Plot Descriptions section, below, for details.
Usage
mcmc_hist(
x,
pars = character(),
regex_pars = character(),
transformations = list(),
...,
facet_args = list(),
binwidth = NULL,
bins = NULL,
breaks = NULL,
freq = TRUE,
alpha = 1
)
mcmc_dens(
x,
pars = character(),
regex_pars = character(),
transformations = list(),
...,
facet_args = list(),
trim = FALSE,
bw = NULL,
adjust = NULL,
kernel = NULL,
n_dens = NULL,
alpha = 1
)
mcmc_hist_by_chain(
x,
pars = character(),
regex_pars = character(),
transformations = list(),
...,
facet_args = list(),
binwidth = NULL,
bins = NULL,
freq = TRUE,
alpha = 1
)
mcmc_dens_overlay(
x,
pars = character(),
regex_pars = character(),
transformations = list(),
...,
facet_args = list(),
color_chains = TRUE,
trim = FALSE,
bw = NULL,
adjust = NULL,
kernel = NULL,
n_dens = NULL
)
mcmc_dens_chains(
x,
pars = character(),
regex_pars = character(),
transformations = list(),
...,
color_chains = TRUE,
bw = NULL,
adjust = NULL,
kernel = NULL,
n_dens = NULL
)
mcmc_dens_chains_data(
x,
pars = character(),
regex_pars = character(),
transformations = list(),
...,
bw = NULL,
adjust = NULL,
kernel = NULL,
n_dens = NULL
)
mcmc_violin(
x,
pars = character(),
regex_pars = character(),
transformations = list(),
...,
facet_args = list(),
probs = c(0.1, 0.5, 0.9)
)
Arguments
x |
An object containing MCMC draws:
|
pars |
An optional character vector of parameter names. If neither
|
regex_pars |
An optional regular expression to use for
parameter selection. Can be specified instead of |
transformations |
Optionally, transformations to apply to parameters
before plotting. If Note: due to partial argument matching |
... |
Currently ignored. |
facet_args |
A named list of arguments (other than |
binwidth |
Passed to |
bins |
Passed to |
breaks |
Passed to |
freq |
For histograms, |
alpha |
Passed to the geom to control the transparency. |
trim |
A logical scalar passed to |
bw , adjust , kernel , n_dens |
Optional arguments passed to
|
color_chains |
Option for whether to separately color chains. |
probs |
A numeric vector passed to |
Value
A ggplot object that can be further customized using the ggplot2 package.
Plot Descriptions
mcmc_hist()
-
Histograms of posterior draws with all chains merged.
mcmc_dens()
-
Kernel density plots of posterior draws with all chains merged.
mcmc_hist_by_chain()
-
Histograms of posterior draws with chains separated via faceting.
mcmc_dens_overlay()
-
Kernel density plots of posterior draws with chains separated but overlaid on a single plot.
mcmc_violin()
-
The density estimate of each chain is plotted as a violin with horizontal lines at notable quantiles.
mcmc_dens_chains()
-
Ridgeline kernel density plots of posterior draws with chains separated but overlaid on a single plot. In
mcmc_dens_overlay()
parameters appear in separate facets; inmcmc_dens_chains()
they appear in the same panel and can overlap vertically.
See Also
Other MCMC:
MCMC-combos
,
MCMC-diagnostics
,
MCMC-intervals
,
MCMC-nuts
,
MCMC-overview
,
MCMC-parcoord
,
MCMC-recover
,
MCMC-scatterplots
,
MCMC-traces
Examples
set.seed(9262017)
# some parameter draws to use for demonstration
x <- example_mcmc_draws()
dim(x)
dimnames(x)
##################
### Histograms ###
##################
# histograms of all parameters
color_scheme_set("brightblue")
mcmc_hist(x)
# histograms of some parameters
color_scheme_set("pink")
mcmc_hist(x, pars = c("alpha", "beta[2]"))
mcmc_hist(x, pars = "sigma", regex_pars = "beta")
# example of using 'transformations' argument to plot log(sigma),
# and parsing facet labels (e.g. to get greek letters for parameters)
mcmc_hist(x, transformations = list(sigma = "log"),
facet_args = list(labeller = ggplot2::label_parsed)) +
facet_text(size = 15)
# instead of list(sigma = "log"), you could specify the transformation as
# list(sigma = log) or list(sigma = function(x) log(x)), but then the
# label for the transformed sigma is 't(sigma)' instead of 'log(sigma)'
mcmc_hist(x, transformations = list(sigma = log))
# separate histograms by chain
color_scheme_set("pink")
mcmc_hist_by_chain(x, regex_pars = "beta")
#################
### Densities ###
#################
mcmc_dens(x, pars = c("sigma", "beta[2]"),
facet_args = list(nrow = 2))
# separate and overlay chains
color_scheme_set("mix-teal-pink")
mcmc_dens_overlay(x, pars = c("sigma", "beta[2]"),
facet_args = list(nrow = 2)) +
facet_text(size = 14)
x2 <- example_mcmc_draws(params = 6)
mcmc_dens_chains(x2, pars = c("beta[1]", "beta[2]", "beta[3]"))
# separate chains as violin plots
color_scheme_set("green")
mcmc_violin(x) + panel_bg(color = "gray20", size = 2, fill = "gray30")
Plot interval estimates from MCMC draws
Description
Plot central (quantile-based) posterior interval estimates from MCMC draws. See the Plot Descriptions section, below, for details.
Usage
mcmc_intervals(
x,
pars = character(),
regex_pars = character(),
transformations = list(),
...,
prob = 0.5,
prob_outer = 0.9,
point_est = c("median", "mean", "none"),
outer_size = 0.5,
inner_size = 2,
point_size = 4,
rhat = numeric()
)
mcmc_areas(
x,
pars = character(),
regex_pars = character(),
transformations = list(),
...,
area_method = c("equal area", "equal height", "scaled height"),
prob = 0.5,
prob_outer = 1,
point_est = c("median", "mean", "none"),
rhat = numeric(),
border_size = NULL,
bw = NULL,
adjust = NULL,
kernel = NULL,
n_dens = NULL
)
mcmc_areas_ridges(
x,
pars = character(),
regex_pars = character(),
transformations = list(),
...,
prob_outer = 1,
prob = 1,
border_size = NULL,
bw = NULL,
adjust = NULL,
kernel = NULL,
n_dens = NULL
)
mcmc_intervals_data(
x,
pars = character(),
regex_pars = character(),
transformations = list(),
...,
prob = 0.5,
prob_outer = 0.9,
point_est = c("median", "mean", "none"),
rhat = numeric()
)
mcmc_areas_data(
x,
pars = character(),
regex_pars = character(),
transformations = list(),
...,
prob = 0.5,
prob_outer = 1,
point_est = c("median", "mean", "none"),
rhat = numeric(),
bw = NULL,
adjust = NULL,
kernel = NULL,
n_dens = NULL
)
mcmc_areas_ridges_data(
x,
pars = character(),
regex_pars = character(),
transformations = list(),
...,
prob_outer = 1,
prob = 1,
bw = NULL,
adjust = NULL,
kernel = NULL,
n_dens = NULL
)
Arguments
x |
An object containing MCMC draws:
|
pars |
An optional character vector of parameter names. If neither
|
regex_pars |
An optional regular expression to use for
parameter selection. Can be specified instead of |
transformations |
Optionally, transformations to apply to parameters
before plotting. If Note: due to partial argument matching |
... |
Currently unused. |
prob |
The probability mass to include in the inner interval (for
|
prob_outer |
The probability mass to include in the outer interval. The
default is |
point_est |
The point estimate to show. Either |
inner_size , outer_size |
For |
point_size |
For |
rhat |
An optional numeric vector of R-hat estimates, with one element
per parameter included in |
area_method |
How to constrain the areas in |
border_size |
For |
bw , adjust , kernel , n_dens |
Optional arguments passed to
|
Value
The plotting functions return a ggplot object that can be further
customized using the ggplot2 package. The functions with suffix
_data()
return the data that would have been drawn by the plotting
function.
Plot Descriptions
mcmc_intervals()
-
Plots of uncertainty intervals computed from posterior draws with all chains merged.
mcmc_areas()
-
Density plots computed from posterior draws with all chains merged, with uncertainty intervals shown as shaded areas under the curves.
mcmc_areas_ridges()
-
Density plot, as in
mcmc_areas()
, but drawn with overlapping ridgelines. This plot provides a compact display of (hierarchically) related distributions.
See Also
Other MCMC:
MCMC-combos
,
MCMC-diagnostics
,
MCMC-distributions
,
MCMC-nuts
,
MCMC-overview
,
MCMC-parcoord
,
MCMC-recover
,
MCMC-scatterplots
,
MCMC-traces
Examples
set.seed(9262017)
# load ggplot2 to use its functions to modify our plots
library(ggplot2)
# some parameter draws to use for demonstration
x <- example_mcmc_draws(params = 6)
dim(x)
dimnames(x)
color_scheme_set("brightblue")
mcmc_intervals(x)
mcmc_intervals(x, pars = c("beta[1]", "beta[2]"))
mcmc_areas(x, regex_pars = "beta\\[[1-3]\\]", prob = 0.8) +
labs(
title = "Posterior distributions",
subtitle = "with medians and 80% intervals"
)
color_scheme_set("red")
p <- mcmc_areas(
x,
pars = c("alpha", "beta[4]"),
prob = 2/3,
prob_outer = 0.9,
point_est = "mean",
border_size = 1.5 # make the ridgelines fatter
)
plot(p)
# control spacing at top and bottom of plot
# see ?ggplot2::expansion
p + scale_y_discrete(
limits = c("beta[4]", "alpha"),
expand = expansion(add = c(1, 2))
)
p + scale_y_discrete(
limits = c("beta[4]", "alpha"),
expand = expansion(add = c(.1, .3))
)
# relabel parameters
p + scale_y_discrete(
labels = c("alpha" = "param label 1",
"beta[4]" = "param label 2")
)
# relabel parameters and define the order
p + scale_y_discrete(
labels = c("alpha" = "param label 1",
"beta[4]" = "param label 2"),
limits = c("beta[4]", "alpha")
)
# color by rhat value
color_scheme_set("blue")
fake_rhat_values <- c(1, 1.07, 1.3, 1.01, 1.15, 1.005)
mcmc_intervals(x, rhat = fake_rhat_values)
# get the dataframe that is used in the plotting functions
mcmc_intervals_data(x)
mcmc_intervals_data(x, rhat = fake_rhat_values)
mcmc_areas_data(x, pars = "alpha")
color_scheme_set("gray")
p <- mcmc_areas(x, pars = c("alpha", "beta[4]"), rhat = c(1, 1.1))
p + legend_move("bottom")
p + legend_move("none") # or p + legend_none()
# Different area calculations
b3 <- c("beta[1]", "beta[2]", "beta[3]")
mcmc_areas(x, pars = b3, area_method = "equal area") +
labs(
title = "Curves have same area",
subtitle = "A wide, uncertain interval is spread thin when areas are equal"
)
mcmc_areas(x, pars = b3, area_method = "equal height") +
labs(
title = "Curves have same maximum height",
subtitle = "Local curvature is clearer but more uncertain curves use more area"
)
mcmc_areas(x, pars = b3, area_method = "scaled height") +
labs(
title = "Same maximum heights but heights scaled by square-root",
subtitle = "Compromise: Local curvature is accentuated and less area is used"
)
# apply transformations
mcmc_intervals(
x,
pars = c("beta[2]", "sigma"),
transformations = list("sigma" = "log", "beta[2]" = function(x) x + 3)
)
# apply same transformation to all selected parameters
mcmc_intervals(x, regex_pars = "beta", transformations = "exp")
## Not run:
# example using fitted model from rstanarm package
library(rstanarm)
fit <- stan_glm(
mpg ~ 0 + wt + factor(cyl),
data = mtcars,
iter = 500,
refresh = 0
)
x <- as.matrix(fit)
color_scheme_set("teal")
mcmc_intervals(x, point_est = "mean", prob = 0.8, prob_outer = 0.95)
mcmc_areas(x, regex_pars = "cyl", bw = "SJ",
rhat = rhat(fit, regex_pars = "cyl"))
## End(Not run)
## Not run:
# Example of hierarchically related parameters
# plotted with ridgelines
m <- shinystan::eight_schools@posterior_sample
mcmc_areas_ridges(m, pars = "mu", regex_pars = "theta", border_size = 0.75) +
ggtitle("Treatment effect on eight schools (Rubin, 1981)")
## End(Not run)
Diagnostic plots for the No-U-Turn-Sampler (NUTS)
Description
Diagnostic plots for the No-U-Turn-Sampler (NUTS), the default MCMC algorithm used by Stan. See the Plot Descriptions section, below.
Usage
mcmc_nuts_acceptance(
x,
lp,
chain = NULL,
...,
binwidth = NULL,
bins = NULL,
breaks = NULL
)
mcmc_nuts_divergence(x, lp, chain = NULL, ...)
mcmc_nuts_stepsize(x, lp, chain = NULL, ...)
mcmc_nuts_treedepth(x, lp, chain = NULL, ...)
mcmc_nuts_energy(
x,
...,
binwidth = NULL,
bins = NULL,
breaks = NULL,
alpha = 0.5,
merge_chains = FALSE
)
Arguments
x |
A molten data frame of NUTS sampler parameters, either created by
|
lp |
A molten data frame of draws of the log-posterior or, more
commonly, of a quantity equal to the log-posterior up to a constant.
|
chain |
A positive integer for selecting a particular chain. The default
( |
... |
Currently ignored. |
binwidth |
Passed to |
bins |
Passed to |
breaks |
Passed to |
alpha |
For |
merge_chains |
For |
Value
A gtable object (the result of calling
gridExtra::arrangeGrob()
) created from several ggplot objects,
except for mcmc_nuts_energy()
, which returns a ggplot object.
Quick Definitions
For more details see Stan Development Team (2016) and Betancourt (2017).
-
accept_stat__
: the average acceptance probabilities of all possible samples in the proposed tree. -
divergent__
: the number of leapfrog transitions with diverging error. Because NUTS terminates at the first divergence this will be either 0 or 1 for each iteration. -
stepsize__
: the step size used by NUTS in its Hamiltonian simulation. -
treedepth__
: the depth of tree used by NUTS, which is the log (base 2) of the number of leapfrog steps taken during the Hamiltonian simulation. -
energy__
: the value of the Hamiltonian (up to an additive constant) at each iteration.
Plot Descriptions
mcmc_nuts_acceptance()
-
Three plots:
Histogram of
accept_stat__
with vertical lines indicating the mean (solid line) and median (dashed line).Histogram of
lp__
with vertical lines indicating the mean (solid line) and median (dashed line).Scatterplot of
accept_stat__
vslp__
.
mcmc_nuts_divergence()
-
Two plots:
Violin plots of
lp__|divergent__=1
andlp__|divergent__=0
.Violin plots of
accept_stat__|divergent__=1
andaccept_stat__|divergent__=0
.
mcmc_nuts_stepsize()
-
Two plots:
Violin plots of
lp__
by chain ordered bystepsize__
value.Violin plots of
accept_stat__
by chain ordered bystepsize__
value.
mcmc_nuts_treedepth()
-
Three plots:
Violin plots of
lp__
by value oftreedepth__
.Violin plots of
accept_stat__
by value oftreedepth__
.Histogram of
treedepth__
.
mcmc_nuts_energy()
-
Overlaid histograms showing
energy__
vs the change inenergy__
. See Betancourt (2016) for details.
References
Betancourt, M. (2017). A conceptual introduction to Hamiltonian Monte Carlo. https://arxiv.org/abs/1701.02434
Betancourt, M. and Girolami, M. (2013). Hamiltonian Monte Carlo for hierarchical models. https://arxiv.org/abs/1312.0906
Hoffman, M. D. and Gelman, A. (2014). The No-U-Turn Sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research. 15:1593–1623.
Stan Development Team. Stan Modeling Language Users Guide and Reference Manual. https://mc-stan.org/users/documentation/
See Also
The Visual MCMC Diagnostics vignette.
Several other plotting functions are not NUTS-specific but take optional extra arguments if the model was fit using NUTS:
-
mcmc_trace()
: show divergences as tick marks below the trace plot. -
mcmc_parcoord()
: change the color/size/transparency of lines corresponding to divergences. -
mcmc_scatter()
: change the color/size/shape of points corresponding to divergences. -
mcmc_pairs()
: change the color/size/shape of points corresponding divergences and/or max treedepth saturation.
-
Other MCMC:
MCMC-combos
,
MCMC-diagnostics
,
MCMC-distributions
,
MCMC-intervals
,
MCMC-overview
,
MCMC-parcoord
,
MCMC-recover
,
MCMC-scatterplots
,
MCMC-traces
Examples
## Not run:
library(ggplot2)
library(rstanarm)
fit <- stan_glm(mpg ~ wt + am, data = mtcars, iter = 1000, refresh = 0)
np <- nuts_params(fit)
lp <- log_posterior(fit)
color_scheme_set("brightblue")
mcmc_nuts_acceptance(np, lp)
mcmc_nuts_acceptance(np, lp, chain = 2)
mcmc_nuts_divergence(np, lp)
mcmc_nuts_stepsize(np, lp)
mcmc_nuts_treedepth(np, lp)
color_scheme_set("red")
mcmc_nuts_energy(np)
mcmc_nuts_energy(np, merge_chains = TRUE, binwidth = .15)
mcmc_nuts_energy(np) +
facet_wrap(vars(Chain), nrow = 1) +
coord_fixed(ratio = 150) +
ggtitle("NUTS Energy Diagnostic")
## End(Not run)
Plots for Markov chain Monte Carlo simulations
Description
The bayesplot MCMC module provides various plotting functions for creating graphical displays of Markov chain Monte Carlo (MCMC) simulations. The MCMC plotting functions section, below, provides links to the documentation for various categories of MCMC plots. Currently the MCMC plotting functions accept posterior draws provided in one of the following formats:
-
3-D array: An array with dimensions
Iteration, Chain, Parameter
in that order. -
list: A list of matrices, where each matrix corresponds to a Markov chain. All of the matrices should have the same number of iterations (rows) and parameters (columns), and parameters should have the same names and be in the same order.
-
matrix (2-D array): A matrix with one column per parameter. If using matrix there should only be a single Markov chain or all chains should already be merged (stacked).
-
data frame: There are two types of data frames allowed. Either a data frame with one column per parameter (if only a single chain or all chains have already been merged), or a data frame with one column per parameter plus an additional column
"Chain"
that contains the chain number (an integer) corresponding to each row in the data frame. -
draws: Any of the
draws
formats supported by the posterior package.
Note: typically the user should not include warmup iterations in the object passed to bayesplot plotting functions, although for certain plots (e.g. trace plots) it can occasionally be useful to include the warmup iterations for diagnostic purposes.
MCMC plotting functions
-
Posterior distributions: Histograms and kernel density plots of parameter draws, optionally showing each Markov chain separately.
-
Uncertainty intervals: Uncertainty intervals computed from parameter draws.
-
Trace plots: Times series of parameter draws, optionally including HMC/NUTS diagnostic information.
-
Scatterplots: Scatterplots, heatmaps, and pairs plots of parameter draws, optionally including HMC/NUTS diagnostic information.
-
Parallel coordinates plots: Parallel coordinates plot of MCMC draws (one dimension per parameter), optionally including HMC/NUTS diagnostic information.
-
Combos: Combination plots (e.g. trace plot + histogram).
-
General MCMC diagnostics: MCMC diagnostic plots including R-hat, effective sample size, autocorrelation. NUTS diagnostics: Special diagnostic plots for the No-U-Turn Sampler.
-
Comparisons to "true" values: Plots comparing MCMC estimates to "true" parameter values (e.g., values used to simulate data).
References
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378. (journal version, arXiv preprint, code on GitHub)
See Also
Other MCMC:
MCMC-combos
,
MCMC-diagnostics
,
MCMC-distributions
,
MCMC-intervals
,
MCMC-nuts
,
MCMC-parcoord
,
MCMC-recover
,
MCMC-scatterplots
,
MCMC-traces
Parallel coordinates plot of MCMC draws
Description
Parallel coordinates plot of MCMC draws (one dimension per parameter). See the Plot Descriptions section below for details, and see Gabry et al. (2019) for more background and a real example.
Usage
mcmc_parcoord(
x,
pars = character(),
regex_pars = character(),
transformations = list(),
...,
size = 0.2,
alpha = 0.3,
np = NULL,
np_style = parcoord_style_np()
)
mcmc_parcoord_data(
x,
pars = character(),
regex_pars = character(),
transformations = list(),
np = NULL
)
parcoord_style_np(div_color = "red", div_size = 0.2, div_alpha = 0.2)
Arguments
x |
An object containing MCMC draws:
|
pars |
An optional character vector of parameter names. If neither
|
regex_pars |
An optional regular expression to use for
parameter selection. Can be specified instead of |
transformations |
Optionally, transformations to apply to parameters
before plotting. If Note: due to partial argument matching |
... |
Currently ignored. |
size , alpha |
Arguments passed on to |
np |
For models fit using NUTS (more generally,
any symplectic integrator),
an optional data frame providing NUTS diagnostic information. The data
frame should be the object returned by |
np_style |
A call to the |
div_color , div_size , div_alpha |
Optional arguments to the
|
Value
The plotting functions return a ggplot object that can be further
customized using the ggplot2 package. The functions with suffix
_data()
return the data that would have been drawn by the plotting
function.
Plot Descriptions
mcmc_parcoord()
-
Parallel coordinates plot of MCMC draws. There is one dimension per parameter along the horizontal axis and each set of connected line segments represents a single MCMC draw (i.e., a vector of length equal to the number of parameters).
The parallel coordinates plot is most useful if the optional HMC/NUTS diagnostic information is provided via the
np
argument. In that case divergences are highlighted in the plot. The appearance of the divergences can be customized using thenp_style
argument and theparcoord_style_np
helper function. This version of the plot is the same as the parallel coordinates plot described in Gabry et al. (2019).When the plotted model parameters are on very different scales the
transformations
argument can be useful. For example, to standardize all variables before plotting you could use function(x - mean(x))/sd(x)
when specifying thetransformations
argument tomcmc_parcoord
. See the Examples section for how to do this.
References
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378. (journal version, arXiv preprint, code on GitHub)
Hartikainen, A. (2017, Aug 23). Concentration of divergences (Msg 21). Message posted to The Stan Forums: https://discourse.mc-stan.org/t/concentration-of-divergences/1590/21.
See Also
Other MCMC:
MCMC-combos
,
MCMC-diagnostics
,
MCMC-distributions
,
MCMC-intervals
,
MCMC-nuts
,
MCMC-overview
,
MCMC-recover
,
MCMC-scatterplots
,
MCMC-traces
Examples
color_scheme_set("pink")
x <- example_mcmc_draws(params = 5)
mcmc_parcoord(x)
mcmc_parcoord(x, regex_pars = "beta")
## Not run:
# Example using a Stan demo model
library(rstan)
fit <- stan_demo("eight_schools")
draws <- as.array(fit, pars = c("mu", "tau", "theta", "lp__"))
np <- nuts_params(fit)
str(np)
levels(np$Parameter)
color_scheme_set("brightblue")
mcmc_parcoord(draws, alpha = 0.05)
mcmc_parcoord(draws, np = np)
# customize appearance of divergences
color_scheme_set("darkgray")
div_style <- parcoord_style_np(div_color = "green", div_size = 0.05, div_alpha = 0.4)
mcmc_parcoord(draws, size = 0.25, alpha = 0.1,
np = np, np_style = div_style)
# to use a transformation (e.g., standardizing all the variables can be helpful)
# specify the 'transformations' argument (though partial argument name
# matching means we can just use 'trans' or 'transform')
mcmc_parcoord(
draws,
transform = function(x) {(x - mean(x)) / sd(x)},
size = 0.25,
alpha = 0.1,
np = np,
np_style = div_style
)
# mcmc_parcoord_data returns just the data in a conventient form for plotting
d <- mcmc_parcoord_data(x, np = np)
head(d)
tail(d)
## End(Not run)
Compare MCMC estimates to "true" parameter values
Description
Plots comparing MCMC estimates to "true" parameter values. Before fitting a model to real data it is useful to simulate data according to the model using known (fixed) parameter values and to check that these "true" parameter values are (approximately) recovered by fitting the model to the simulated data. See the Plot Descriptions section, below, for details on the available plots.
Usage
mcmc_recover_intervals(
x,
true,
batch = rep(1, length(true)),
...,
facet_args = list(),
prob = 0.5,
prob_outer = 0.9,
point_est = c("median", "mean", "none"),
size = 4,
alpha = 1
)
mcmc_recover_scatter(
x,
true,
batch = rep(1, length(true)),
...,
facet_args = list(),
point_est = c("median", "mean"),
size = 3,
alpha = 1
)
mcmc_recover_hist(
x,
true,
...,
facet_args = list(),
binwidth = NULL,
bins = NULL,
breaks = NULL
)
Arguments
x |
An object containing MCMC draws:
|
true |
A numeric vector of "true" values of the parameters in |
batch |
Optionally, a vector-like object (numeric, character, integer,
factor) used to split the parameters into batches. If |
... |
Currently unused. |
facet_args |
A named list of arguments (other than |
prob |
The probability mass to include in the inner interval. The
default is |
prob_outer |
The probability mass to include in the outer interval. The
default is |
point_est |
The point estimate to show. Either |
size , alpha |
Passed to |
binwidth |
Passed to |
bins |
Passed to |
breaks |
Passed to |
Value
A ggplot object that can be further customized using the ggplot2 package.
Plot Descriptions
mcmc_recover_intervals()
-
Central intervals and point estimates computed from MCMC draws, with "true" values plotted using a different shape.
mcmc_recover_scatter()
-
Scatterplot of posterior means (or medians) against "true" values.
mcmc_recover_hist()
-
Histograms of the draws for each parameter with the "true" value overlaid as a vertical line.
See Also
Other MCMC:
MCMC-combos
,
MCMC-diagnostics
,
MCMC-distributions
,
MCMC-intervals
,
MCMC-nuts
,
MCMC-overview
,
MCMC-parcoord
,
MCMC-scatterplots
,
MCMC-traces
Examples
## Not run:
library(rstanarm)
alpha <- 1; beta <- rnorm(10, 0, 3); sigma <- 2
X <- matrix(rnorm(1000), 100, 10)
y <- rnorm(100, mean = c(alpha + X %*% beta), sd = sigma)
fit <- stan_glm(y ~ ., data = data.frame(y, X), refresh = 0)
draws <- as.matrix(fit)
print(colnames(draws))
true <- c(alpha, beta, sigma)
mcmc_recover_intervals(draws, true)
# put the coefficients on X into the same batch
mcmc_recover_intervals(draws, true, batch = c(1, rep(2, 10), 1))
# equivalent
mcmc_recover_intervals(draws, true, batch = grepl("X", colnames(draws)))
# same but facets stacked vertically
mcmc_recover_intervals(draws, true,
batch = grepl("X", colnames(draws)),
facet_args = list(ncol = 1),
size = 3)
# each parameter in its own facet
mcmc_recover_intervals(draws, true, batch = 1:ncol(draws))
# same but in a different order
mcmc_recover_intervals(draws, true, batch = c(1, 3, 4, 2, 5:12))
# present as bias by centering with true values
mcmc_recover_intervals(sweep(draws, 2, true), rep(0, ncol(draws))) + hline_0()
# scatterplot of posterior means vs true values
mcmc_recover_scatter(draws, true, point_est = "mean")
# histograms of parameter draws with true value added as vertical line
color_scheme_set("brightblue")
mcmc_recover_hist(draws[, 1:4], true[1:4])
## End(Not run)
Scatterplots of MCMC draws
Description
Scatterplots, hexagonal heatmaps, and pairs plots from MCMC draws. See the Plot Descriptions section, below, for details.
Usage
mcmc_scatter(
x,
pars = character(),
regex_pars = character(),
transformations = list(),
...,
size = 2.5,
alpha = 0.8,
np = NULL,
np_style = scatter_style_np()
)
mcmc_hex(
x,
pars = character(),
regex_pars = character(),
transformations = list(),
...,
bins = 30,
binwidth = NULL
)
mcmc_pairs(
x,
pars = character(),
regex_pars = character(),
transformations = list(),
...,
diag_fun = c("hist", "dens"),
off_diag_fun = c("scatter", "hex"),
diag_args = list(),
off_diag_args = list(),
condition = pairs_condition(),
lp = NULL,
np = NULL,
np_style = pairs_style_np(),
max_treedepth = NULL,
grid_args = list(),
save_gg_objects = TRUE
)
scatter_style_np(
div_color = "red",
div_shape = 16,
div_size = 2.5,
div_alpha = 1
)
pairs_style_np(
div_color = "red",
div_shape = 4,
div_size = 1,
div_alpha = 1,
td_color = "yellow2",
td_shape = 3,
td_size = 1,
td_alpha = 1
)
pairs_condition(chains = NULL, draws = NULL, nuts = NULL)
Arguments
x |
An object containing MCMC draws:
|
pars |
An optional character vector of parameter names. If neither
|
regex_pars |
An optional regular expression to use for
parameter selection. Can be specified instead of |
transformations |
Optionally, transformations to apply to parameters
before plotting. If Note: due to partial argument matching |
... |
Currently ignored. |
size , alpha |
For |
np |
Optionally, a data frame of NUTS sampler parameters, either created
by |
np_style |
If |
bins , binwidth |
For |
diag_fun , off_diag_fun |
For |
diag_args , off_diag_args |
For |
condition |
For |
lp |
For |
max_treedepth |
For |
grid_args , save_gg_objects |
For |
div_color , div_shape , div_size , div_alpha , td_color , td_shape , td_size , td_alpha |
Optional arguments to the |
chains , draws , nuts |
Optional arguments to the
|
Value
mcmc_scatter()
and mcmc_hex()
return a ggplot object that
can be further customized using the ggplot2 package.
mcmc_pairs()
returns many ggplot objects organized into a grid via
bayesplot_grid()
.
Plot Descriptions
mcmc_scatter()
-
Bivariate scatterplot of posterior draws. If using a very large number of posterior draws then
mcmc_hex()
may be preferable to avoid overplotting. For models fit using NUTS thenp
, andnp_style
arguments can be used to add additional information in the plot (in this case the approximate location of divergences). For more on why the scatter plot with divergences is a useful diagnostic tool see Gabry et al. (2019). mcmc_hex()
-
Hexagonal heatmap of 2-D bin counts. This plot is useful in cases where the posterior sample size is large enough that
mcmc_scatter()
suffers from overplotting. mcmc_pairs()
-
A square plot matrix with univariate marginal distributions along the diagonal (as histograms or kernel density plots) and bivariate distributions off the diagonal (as scatterplots or hex heatmaps).
For the off-diagonal plots, the default is to split the chains so that (roughly) half are displayed above the diagonal and half are below (all chains are always merged together for the plots along the diagonal). Other possibilities are available by setting the
condition
argument.Additionally, extra diagnostic information for models fit using NUTS can be added to the pairs plot using the
lp
,np
, andnp_style
arguments. Ifnp
is specified (andcondition
is not"divergent__"
), then points (red, by default) will be superimposed onto the off-diagonal plots indicating which (if any) iterations encountered a divergent transition. Also, if bothnp
andmax_treedepth
are specified then points (yellow, by default) will be superimposed to indicate a transition that hit the maximum treedepth rather than terminated its evolution normally. Thenp_style
argument can be used with thepairs_style_np()
convenience function to change the appearance of these overlaid points. See the Examples section.
References
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378. (journal version, arXiv preprint, code on GitHub)
See Also
Other MCMC:
MCMC-combos
,
MCMC-diagnostics
,
MCMC-distributions
,
MCMC-intervals
,
MCMC-nuts
,
MCMC-overview
,
MCMC-parcoord
,
MCMC-recover
,
MCMC-traces
Examples
library("ggplot2")
# some parameter draws to use for demonstration
x <- example_mcmc_draws(params = 6)
dimnames(x)
# scatterplot of alpha vs log(sigma)
color_scheme_set("teal")
(p <- mcmc_scatter(x, pars = c("alpha", "sigma"),
transform = list(sigma = "log")))
p +
labs(
title = "Insert your own headline-grabbing title",
subtitle = "with a provocative subtitle",
caption = "and a controversial caption",
x = expression(alpha),
y = expression(log(sigma))
)
# add ellipse
p + stat_ellipse(level = 0.9, color = "gray20", size = 1)
# add contour
color_scheme_set("red")
p2 <- mcmc_scatter(x, pars = c("alpha", "sigma"), size = 3.5, alpha = 0.25)
p2 + stat_density_2d(color = "black", size = .5)
# can also add lines/smooths
color_scheme_set("pink")
(p3 <- mcmc_scatter(x, pars = c("alpha", "beta[3]"), alpha = 0.25, size = 3))
p3 + geom_smooth(method = "lm", se = FALSE, color = "gray20",
size = .75, linetype = 2)
if (requireNamespace("hexbin", quietly = TRUE)) {
# hexagonal heatmap
color_scheme_set("brightblue")
(p <- mcmc_hex(x, pars = c("sigma", "alpha"), transform = list(sigma = "log")))
p + plot_bg(fill = "gray95")
p + plot_bg(fill = "gray95") + panel_bg(fill = "gray70")
}
color_scheme_set("purple")
# pairs plots
# default of condition=NULL implies splitting chains between upper and lower panels
mcmc_pairs(x, pars = "alpha", regex_pars = "beta\\[[1,4]\\]",
off_diag_args = list(size = 1, alpha = 0.5))
# change to density plots instead of histograms and hex plots instead of
# scatterplots
mcmc_pairs(x, pars = "alpha", regex_pars = "beta\\[[1,4]\\]",
diag_fun = "dens", off_diag_fun = "hex")
# plot chain 1 above diagonal and chains 2, 3, and 4 below
color_scheme_set("brightblue")
mcmc_pairs(x, pars = "alpha", regex_pars = "beta\\[[1,4]\\]",
diag_fun = "dens", off_diag_fun = "hex",
condition = pairs_condition(chains = list(1, 2:4)))
## Not run:
### Adding NUTS diagnostics to scatterplots and pairs plots
# examples using rstanarm package
library(rstanarm)
# for demonstration purposes, intentionally fit a model that
# will (almost certainly) have some divergences
fit <- stan_glm(
mpg ~ ., data = mtcars,
iter = 1000, refresh = 0,
# this combo of prior and adapt_delta should lead to some divergences
prior = hs(),
adapt_delta = 0.9
)
posterior <- as.array(fit)
np <- nuts_params(fit)
# mcmc_scatter with divergences highlighted
color_scheme_set("brightblue")
mcmc_scatter(posterior, pars = c("wt", "sigma"), np = np)
color_scheme_set("darkgray")
div_style <- scatter_style_np(div_color = "green", div_shape = 4, div_size = 4)
mcmc_scatter(posterior, pars = c("sigma", "(Intercept)"),
np = np, np_style = div_style)
# split the draws according to above/below median accept_stat__
# and show approximate location of divergences (red points)
color_scheme_set("brightblue")
mcmc_pairs(
posterior,
pars = c("wt", "cyl", "sigma"),
off_diag_args = list(size = 1, alpha = 1/3),
condition = pairs_condition(nuts = "accept_stat__"),
np = np
)
# more customizations:
# - transform sigma to log(sigma)
# - median log-posterior as 'condition'
# - hex instead of scatter for off-diagonal plots
# - show points where max treedepth hit in blue
color_scheme_set("darkgray")
mcmc_pairs(
posterior,
pars = c("wt", "cyl", "sigma"),
transform = list(sigma = "log"),
off_diag_fun = "hex",
condition = pairs_condition(nuts = "lp__"),
lp = log_posterior(fit),
np = np,
np_style = pairs_style_np(div_color = "firebrick",
td_color = "blue",
td_size = 2),
# for demonstration purposes, set max_treedepth to a value that will
# result in at least a few max treedepth warnings
max_treedepth = with(np, -1 + max(Value[Parameter == "treedepth__"]))
)
## End(Not run)
Trace and rank plots of MCMC draws
Description
Trace and rank plots of MCMC draws. See the Plot Descriptions section, below, for details.
Usage
mcmc_trace(
x,
pars = character(),
regex_pars = character(),
transformations = list(),
...,
facet_args = list(),
n_warmup = 0,
iter1 = 0,
window = NULL,
size = NULL,
np = NULL,
np_style = trace_style_np(),
divergences = NULL
)
mcmc_trace_highlight(
x,
pars = character(),
regex_pars = character(),
transformations = list(),
...,
facet_args = list(),
n_warmup = 0,
window = NULL,
size = NULL,
alpha = 0.2,
highlight = 1
)
trace_style_np(div_color = "red", div_size = 0.25, div_alpha = 1)
mcmc_rank_overlay(
x,
pars = character(),
regex_pars = character(),
transformations = list(),
facet_args = list(),
...,
n_bins = 20,
ref_line = FALSE
)
mcmc_rank_hist(
x,
pars = character(),
regex_pars = character(),
transformations = list(),
...,
facet_args = list(),
n_bins = 20,
ref_line = FALSE
)
mcmc_rank_ecdf(
x,
pars = character(),
regex_pars = character(),
transformations = list(),
...,
K = NULL,
facet_args = list(),
prob = 0.99,
plot_diff = FALSE,
interpolate_adj = NULL
)
mcmc_trace_data(
x,
pars = character(),
regex_pars = character(),
transformations = list(),
...,
highlight = NULL,
n_warmup = 0,
iter1 = 0
)
Arguments
x |
An object containing MCMC draws:
|
pars |
An optional character vector of parameter names. If neither
|
regex_pars |
An optional regular expression to use for
parameter selection. Can be specified instead of |
transformations |
Optionally, transformations to apply to parameters
before plotting. If Note: due to partial argument matching |
... |
Currently ignored. |
facet_args |
A named list of arguments (other than |
n_warmup |
An integer; the number of warmup iterations included in
|
iter1 |
An integer; the iteration number of the first included draw
(default is |
window |
An integer vector of length two specifying the limits of a range of iterations to display. |
size |
An optional value to override the default line size
for |
np |
For models fit using NUTS (more generally, any
symplectic integrator),
an optional data frame providing NUTS diagnostic information. The data
frame should be the object returned by |
np_style |
A call to the |
divergences |
Deprecated. Use the |
alpha |
For |
highlight |
For |
div_color , div_size , div_alpha |
Optional arguments to the
|
n_bins |
For the rank plots, the number of bins to use for the histogram
of rank-normalized MCMC samples. Defaults to |
ref_line |
For the rank plots, whether to draw a horizontal line at the
average number of ranks per bin. Defaults to |
K |
An optional integer defining the number of equally spaced evaluation
points for the PIT-ECDF. Reducing K when using |
prob |
For |
plot_diff |
For |
interpolate_adj |
A boolean defining if the simultaneous confidence
bands should be interpolated based on precomputed values rather than
computed exactly. Computing the bands may be computationally intensive and
the approximation gives a fast method for assessing the ECDF trajectory.
The default is to use interpolation if |
Value
The plotting functions return a ggplot object that can be further
customized using the ggplot2 package. The functions with suffix
_data()
return the data that would have been drawn by the plotting
function.
mcmc_trace_data()
returns the data for the trace and rank plots
in the same data frame.
Plot Descriptions
mcmc_trace()
-
Standard trace plots of MCMC draws. For models fit using NUTS, the
np
argument can be used to also show divergences on the trace plot. mcmc_trace_highlight()
-
Traces are plotted using points rather than lines and the opacity of all chains but one (specified by the
highlight
argument) is reduced. mcmc_rank_hist()
-
Whereas traditional trace plots visualize how the chains mix over the course of sampling, rank histograms visualize how the values from the chains mix together in terms of ranking. An ideal plot would show the rankings mixing or overlapping in a uniform distribution. See Vehtari et al. (2019) for details.
mcmc_rank_overlay()
-
Ranks from
mcmc_rank_hist()
are plotted using overlaid lines in a single panel. mcmc_rank_ecdf()
-
The ECDFs of the ranks from
mcmc_rank_hist()
are plotted with the simultaneous confidence bands with a coverage determined byprob
, that is, bands that completely cover all of the rank ECDFs with the probabilityprob
. Ifplot_diff = TRUE
, the difference between the observed rank ECDFs and the theoretical expectation for samples originating from the same distribution is drawn. See Säilynoja et al. (2021) for details.
References
Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., Bürkner, P. (2019). Rank-normalization, folding, and localization: An improved R-hat for assessing convergence of MCMC. arXiv preprint.
Säilynoja, T., Bürkner, P., Vehtari, A. (2021). Graphical Test for Discrete Uniformity and its Applications in Goodness of Fit Evaluation and Multiple Sample Comparison arXiv preprint.
See Also
Other MCMC:
MCMC-combos
,
MCMC-diagnostics
,
MCMC-distributions
,
MCMC-intervals
,
MCMC-nuts
,
MCMC-overview
,
MCMC-parcoord
,
MCMC-recover
,
MCMC-scatterplots
Examples
# some parameter draws to use for demonstration
x <- example_mcmc_draws(chains = 4, params = 6)
dim(x)
dimnames(x)
# trace plots of the betas
color_scheme_set("viridis")
mcmc_trace(x, regex_pars = "beta")
color_scheme_set("viridisA")
mcmc_trace(x, regex_pars = "beta")
color_scheme_set("viridisC")
mcmc_trace(x, regex_pars = "beta")
# mix color schemes
color_scheme_set("mix-blue-red")
mcmc_trace(x, regex_pars = "beta")
# use traditional ggplot discrete color scale
mcmc_trace(x, pars = c("alpha", "sigma")) +
ggplot2::scale_color_discrete()
# zoom in on a window of iterations, increase line size,
# add tick marks, move legend to the top, add gray background
color_scheme_set("viridisA")
mcmc_trace(x[,, 1:4], window = c(100, 130), size = 1) +
panel_bg(fill = "gray90", color = NA) +
legend_move("top")
# Rank-normalized histogram plots. Instead of showing how chains mix over
# time, look at how the ranking of MCMC samples mixed between chains.
color_scheme_set("viridisE")
mcmc_rank_hist(x, "alpha")
mcmc_rank_hist(x, pars = c("alpha", "sigma"), ref_line = TRUE)
mcmc_rank_overlay(x, "alpha")
# ECDF and ECDF difference plots of the ranking of MCMC samples between chains.
# Provide 99% simultaneous confidence intervals for the chains sampling from
# the same distribution.
mcmc_rank_ecdf(x, prob = 0.99)
mcmc_rank_ecdf(x, prob = 0.99, plot_diff = TRUE)
## Not run:
# parse facet label text
color_scheme_set("purple")
p <- mcmc_trace(
x,
regex_pars = "beta\\[[1,3]\\]",
facet_args = list(labeller = ggplot2::label_parsed)
)
p + facet_text(size = 15)
# mark first 100 draws as warmup
mcmc_trace(x, n_warmup = 100)
# plot as points, highlighting chain 2
color_scheme_set("brightblue")
mcmc_trace_highlight(x, pars = "sigma", highlight = 2, size = 2)
# for models fit using HMC/NUTS divergences can be displayed in the trace plot
library("rstanarm")
fit <- stan_glm(mpg ~ ., data = mtcars, refresh = 0,
# next line to keep example fast and also ensure we get some divergences
prior = hs(), iter = 400, adapt_delta = 0.8)
# extract draws using as.array (instead of as.matrix) to keep
# chains separate for trace plot
posterior <- as.array(fit)
# for stanfit and stanreg objects use nuts_params() to get the divergences
mcmc_trace(posterior, pars = "sigma", np = nuts_params(fit))
color_scheme_set("viridis")
mcmc_trace(
posterior,
pars = c("wt", "sigma"),
size = 0.5,
facet_args = list(nrow = 2),
np = nuts_params(fit),
np_style = trace_style_np(div_color = "black", div_size = 0.5)
)
## End(Not run)
Posterior (or prior) predictive checks (S3 generic and default method)
Description
S3 generic with simple default method. The intent is to provide a generic so
authors of other R packages who wish to provide interfaces to the functions
in bayesplot will be encouraged to include pp_check()
methods in their
package, preserving the same naming conventions for posterior (and prior)
predictive checking across many R packages for Bayesian inference. This is
for the convenience of both users and developers. See the Details and
Examples sections, below, and the package vignettes for examples of
defining pp_check()
methods.
Usage
pp_check(object, ...)
## Default S3 method:
pp_check(object, yrep, fun, ...)
Arguments
object |
Typically a fitted model object. The default method, however,
takes |
... |
For the generic, arguments passed to individual methods. For the
default method, these are additional arguments to pass to |
yrep |
For the default method, a |
fun |
For the default method, the plotting function to call. Can be any
of the PPC functions. The |
Details
A package that creates fitted model objects of class "foo"
can include a method pp_check.foo()
that prepares the appropriate
inputs (y
, yrep
, etc.) for the bayesplot functions. The
pp_check.foo()
method may, for example, let the user choose between
various plots, calling the functions from bayesplot internally as
needed. See Examples, below, and the package vignettes.
Value
The exact form of the value returned by pp_check()
may vary by
the class of object
, but for consistency we encourage authors of
methods to return the ggplot object created by one of bayesplot's
plotting functions. The default method returns the object returned by fun
.
Examples
# default method
y <- example_y_data()
yrep <- example_yrep_draws()
pp_check(y, yrep[1:50,], ppc_dens_overlay)
g <- example_group_data()
pp_check(y, yrep, fun = "stat_grouped", group = g, stat = "median")
# defining a method
x <- list(y = rnorm(50), yrep = matrix(rnorm(5000), nrow = 100, ncol = 50))
class(x) <- "foo"
pp_check.foo <- function(object, ..., type = c("multiple", "overlaid")) {
y <- object[["y"]]
yrep <- object[["yrep"]]
switch(match.arg(type),
multiple = ppc_hist(y, yrep[1:min(8, nrow(yrep)),, drop = FALSE]),
overlaid = ppc_dens_overlay(y, yrep))
}
pp_check(x)
pp_check(x, type = "overlaid")
PPC censoring
Description
Compare the empirical distribution of censored data y
to the
distributions of simulated/replicated data yrep
from the posterior
predictive distribution. See the Plot Descriptions section, below, for
details.
Although some of the other bayesplot plots can be used with censored
data, ppc_km_overlay()
is currently the only plotting function designed
specifically for censored data. We encourage you to suggest or contribute
additional plots at
github.com/stan-dev/bayesplot.
Usage
ppc_km_overlay(y, yrep, ..., status_y, size = 0.25, alpha = 0.7)
ppc_km_overlay_grouped(y, yrep, group, ..., status_y, size = 0.25, alpha = 0.7)
Arguments
y |
A vector of observations. See Details. |
yrep |
An |
... |
Currently only used internally. |
status_y |
The status indicator for the observations from |
size , alpha |
Passed to the appropriate geom to control the appearance of
the |
group |
A grouping variable of the same length as |
Value
A ggplot object that can be further customized using the ggplot2 package.
Plot Descriptions
ppc_km_overlay()
-
Empirical CCDF estimates of each dataset (row) in
yrep
are overlaid, with the Kaplan-Meier estimate (Kaplan and Meier, 1958) fory
itself on top (and in a darker shade). This is a PPC suitable for right-censoredy
. Note that the replicated data fromyrep
is assumed to be uncensored. ppc_km_overlay_grouped()
-
The same as
ppc_km_overlay()
, but with separate facets bygroup
.
References
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis. Chapman & Hall/CRC Press, London, third edition. (Ch. 6)
Kaplan, E. L. and Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association. 53(282), 457–481. doi:10.1080/01621459.1958.10501452.
See Also
Other PPCs:
PPC-discrete
,
PPC-distributions
,
PPC-errors
,
PPC-intervals
,
PPC-loo
,
PPC-overview
,
PPC-scatterplots
,
PPC-test-statistics
Examples
color_scheme_set("brightblue")
y <- example_y_data()
# For illustrative purposes, (right-)censor values y > 110:
status_y <- as.numeric(y <= 110)
y <- pmin(y, 110)
# In reality, the replicated data (yrep) would be obtained from a
# model which takes the censoring of y properly into account. Here,
# for illustrative purposes, we simply use example_yrep_draws():
yrep <- example_yrep_draws()
dim(yrep)
ppc_km_overlay(y, yrep[1:25, ], status_y = status_y)
# With separate facets by group:
group <- example_group_data()
ppc_km_overlay_grouped(y, yrep[1:25, ], group = group, status_y = status_y)
PPCs for discrete outcomes
Description
Many of the PPC functions in bayesplot can
be used with discrete data. The small subset of these functions that can
only be used if y
and yrep
are discrete are documented
on this page. Currently these include rootograms for count outcomes and bar
plots for ordinal, categorical, and multinomial outcomes. See the
Plot Descriptions section below.
Usage
ppc_bars(
y,
yrep,
...,
prob = 0.9,
width = 0.9,
size = 1,
fatten = 2.5,
linewidth = 1,
freq = TRUE
)
ppc_bars_grouped(
y,
yrep,
group,
...,
facet_args = list(),
prob = 0.9,
width = 0.9,
size = 1,
fatten = 2.5,
linewidth = 1,
freq = TRUE
)
ppc_rootogram(
y,
yrep,
style = c("standing", "hanging", "suspended"),
...,
prob = 0.9,
size = 1
)
ppc_bars_data(y, yrep, group = NULL, prob = 0.9, freq = TRUE)
Arguments
y |
A vector of observations. See Details. |
yrep |
An |
... |
Currently unused. |
prob |
A value between |
width |
For bar plots only, passed to |
size , fatten , linewidth |
For bar plots, |
freq |
For bar plots only, if |
group |
A grouping variable of the same length as |
facet_args |
An optional list of arguments (other than |
style |
For |
Details
For all of these plots y
and yrep
must be integers, although
they need not be integers in the strict sense of R's
integer type. For rootogram plots y
and yrep
must also
be non-negative.
Value
The plotting functions return a ggplot object that can be further
customized using the ggplot2 package. The functions with suffix
_data()
return the data that would have been drawn by the plotting
function.
Plot Descriptions
ppc_bars()
-
Bar plot of
y
withyrep
medians and uncertainty intervals superimposed on the bars. ppc_bars_grouped()
-
Same as
ppc_bars()
but a separate plot (facet) is generated for each level of a grouping variable. ppc_rootogram()
-
Rootograms allow for diagnosing problems in count data models such as overdispersion or excess zeros. They consist of a histogram of
y
with the expected counts based onyrep
overlaid as a line along with uncertainty intervals. The y-axis represents the square roots of the counts to approximately adjust for scale differences and thus ease comparison between observed and expected counts. Using thestyle
argument, the histogram style can be adjusted to focus on different aspects of the data:-
Standing: basic histogram of observed counts with curve showing expected counts.
-
Hanging: observed counts counts hanging from the curve representing expected counts.
-
Suspended: histogram of the differences between expected and observed counts.
All of the rootograms are plotted on the square root scale. See Kleiber and Zeileis (2016) for advice on interpreting rootograms and selecting among the different styles.
-
References
Kleiber, C. and Zeileis, A. (2016). Visualizing count data regressions using rootograms. The American Statistician. 70(3): 296–303. https://arxiv.org/abs/1605.01311.
See Also
Other PPCs:
PPC-censoring
,
PPC-distributions
,
PPC-errors
,
PPC-intervals
,
PPC-loo
,
PPC-overview
,
PPC-scatterplots
,
PPC-test-statistics
Examples
set.seed(9222017)
# bar plots
f <- function(N) {
sample(1:4, size = N, replace = TRUE, prob = c(0.25, 0.4, 0.1, 0.25))
}
y <- f(100)
yrep <- t(replicate(500, f(100)))
dim(yrep)
group <- gl(2, 50, length = 100, labels = c("GroupA", "GroupB"))
color_scheme_set("mix-pink-blue")
ppc_bars(y, yrep)
# split by group, change interval width, and display proportion
# instead of count on y-axis
color_scheme_set("mix-blue-pink")
ppc_bars_grouped(y, yrep, group, prob = 0.5, freq = FALSE)
## Not run:
# example for ordinal regression using rstanarm
library(rstanarm)
fit <- stan_polr(
tobgp ~ agegp,
data = esoph,
method = "probit",
prior = R2(0.2, "mean"),
init_r = 0.1,
seed = 12345,
# cores = 4,
refresh = 0
)
# coded as character, so convert to integer
yrep_char <- posterior_predict(fit)
print(yrep_char[1, 1:4])
yrep_int <- sapply(data.frame(yrep_char, stringsAsFactors = TRUE), as.integer)
y_int <- as.integer(esoph$tobgp)
ppc_bars(y_int, yrep_int)
ppc_bars_grouped(
y = y_int,
yrep = yrep_int,
group = esoph$agegp,
freq=FALSE,
prob = 0.5,
fatten = 1,
size = 1.5
)
## End(Not run)
# rootograms for counts
y <- rpois(100, 20)
yrep <- matrix(rpois(10000, 20), ncol = 100)
color_scheme_set("brightblue")
ppc_rootogram(y, yrep)
ppc_rootogram(y, yrep, prob = 0)
ppc_rootogram(y, yrep, style = "hanging", prob = 0.8)
ppc_rootogram(y, yrep, style = "suspended")
PPC distributions
Description
Compare the empirical distribution of the data y
to the distributions of
simulated/replicated data yrep
from the posterior predictive distribution.
See the Plot Descriptions section, below, for details.
Usage
ppc_data(y, yrep, group = NULL)
ppc_dens_overlay(
y,
yrep,
...,
size = 0.25,
alpha = 0.7,
trim = FALSE,
bw = "nrd0",
adjust = 1,
kernel = "gaussian",
n_dens = 1024
)
ppc_dens_overlay_grouped(
y,
yrep,
group,
...,
size = 0.25,
alpha = 0.7,
trim = FALSE,
bw = "nrd0",
adjust = 1,
kernel = "gaussian",
n_dens = 1024
)
ppc_ecdf_overlay(
y,
yrep,
...,
discrete = FALSE,
pad = TRUE,
size = 0.25,
alpha = 0.7
)
ppc_ecdf_overlay_grouped(
y,
yrep,
group,
...,
discrete = FALSE,
pad = TRUE,
size = 0.25,
alpha = 0.7
)
ppc_dens(y, yrep, ..., trim = FALSE, size = 0.5, alpha = 1)
ppc_hist(
y,
yrep,
...,
binwidth = NULL,
bins = NULL,
breaks = NULL,
freq = TRUE
)
ppc_freqpoly(
y,
yrep,
...,
binwidth = NULL,
bins = NULL,
freq = TRUE,
size = 0.5,
alpha = 1
)
ppc_freqpoly_grouped(
y,
yrep,
group,
...,
binwidth = NULL,
bins = NULL,
freq = TRUE,
size = 0.5,
alpha = 1
)
ppc_boxplot(y, yrep, ..., notch = TRUE, size = 0.5, alpha = 1)
ppc_violin_grouped(
y,
yrep,
group,
...,
probs = c(0.1, 0.5, 0.9),
size = 1,
alpha = 1,
y_draw = c("violin", "points", "both"),
y_size = 1,
y_alpha = 1,
y_jitter = 0.1
)
ppc_pit_ecdf(
y,
yrep,
...,
pit = NULL,
K = NULL,
prob = 0.99,
plot_diff = FALSE,
interpolate_adj = NULL
)
ppc_pit_ecdf_grouped(
y,
yrep,
group,
...,
K = NULL,
pit = NULL,
prob = 0.99,
plot_diff = FALSE,
interpolate_adj = NULL
)
Arguments
y |
A vector of observations. See Details. |
yrep |
An |
group |
A grouping variable of the same length as |
... |
Currently unused. |
size , alpha |
Passed to the appropriate geom to control the appearance of the predictive distributions. |
trim |
A logical scalar passed to |
bw , adjust , kernel , n_dens |
Optional arguments passed to
|
discrete |
For |
pad |
A logical scalar passed to |
binwidth |
Passed to |
bins |
Passed to |
breaks |
Passed to |
freq |
For histograms, |
notch |
For the box plot, a logical scalar passed to
|
probs |
A numeric vector passed to |
y_draw |
For |
y_jitter , y_size , y_alpha |
For |
pit |
An optional vector of probability integral transformed values for
which the ECDF is to be drawn. If NULL, PIT values are computed to |
K |
An optional integer defining the number of equally spaced evaluation
points for the PIT-ECDF. Reducing K when using |
prob |
The desired simultaneous coverage level of the bands around the ECDF. A value in (0,1). |
plot_diff |
A boolean defining whether to plot the difference between
the observed PIT- ECDF and the theoretical expectation for uniform PIT
values rather than plotting the regular ECDF. The default is |
interpolate_adj |
A boolean defining if the simultaneous confidence
bands should be interpolated based on precomputed values rather than
computed exactly. Computing the bands may be computationally intensive and
the approximation gives a fast method for assessing the ECDF trajectory.
The default is to use interpolation if |
Details
For Binomial data, the plots may be more useful if the input contains the "success" proportions (not discrete "success" or "failure" counts).
Value
The plotting functions return a ggplot object that can be further
customized using the ggplot2 package. The functions with suffix
_data()
return the data that would have been drawn by the plotting
function.
Plot Descriptions
ppc_hist(), ppc_freqpoly(), ppc_dens(), ppc_boxplot()
-
A separate histogram, shaded frequency polygon, smoothed kernel density estimate, or box and whiskers plot is displayed for
y
and each dataset (row) inyrep
. For these plotsyrep
should therefore contain only a small number of rows. See the Examples section. ppc_freqpoly_grouped()
-
A separate frequency polygon is plotted for each level of a grouping variable for
y
and each dataset (row) inyrep
. For this plotyrep
should therefore contain only a small number of rows. See the Examples section. ppc_ecdf_overlay(), ppc_dens_overlay(), ppc_ecdf_overlay_grouped(), ppc_dens_overlay_grouped()
-
Kernel density or empirical CDF estimates of each dataset (row) in
yrep
are overlaid, with the distribution ofy
itself on top (and in a darker shade). When usingppc_ecdf_overlay()
with discrete data, set thediscrete
argument toTRUE
for better results. For an example ofppc_dens_overlay()
also see Gabry et al. (2019). ppc_violin_grouped()
-
The density estimate of
yrep
within each level of a grouping variable is plotted as a violin with horizontal lines at notable quantiles.y
is overlaid on the plot either as a violin, points, or both, depending on they_draw
argument. ppc_pit_ecdf()
,ppc_pit_ecdf_grouped()
-
The PIT-ECDF of the empirical PIT values of
y
computed with respect to the correspondingyrep
values.100 * prob
% central simultaneous confidence intervals are provided to asses ify
andyrep
originate from the same distribution. The PIT values can also be provided directly aspit
. See Säilynoja et al. (2021) for more details.
References
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378. (journal version, arXiv preprint, code on GitHub)
Säilynoja, T., Bürkner, P., Vehtari, A. (2021). Graphical Test for Discrete Uniformity and its Applications in Goodness of Fit Evaluation and Multiple Sample Comparison arXiv preprint.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis. Chapman & Hall/CRC Press, London, third edition. (Ch. 6)
See Also
Other PPCs:
PPC-censoring
,
PPC-discrete
,
PPC-errors
,
PPC-intervals
,
PPC-loo
,
PPC-overview
,
PPC-scatterplots
,
PPC-test-statistics
Examples
color_scheme_set("brightblue")
y <- example_y_data()
yrep <- example_yrep_draws()
group <- example_group_data()
dim(yrep)
ppc_dens_overlay(y, yrep[1:25, ])
# ppc_ecdf_overlay with continuous data (set discrete=TRUE if discrete data)
ppc_ecdf_overlay(y, yrep[sample(nrow(yrep), 25), ])
# PIT-ECDF and PIT-ECDF difference plot of the PIT values of y compared to
# yrep with 99% simultaneous confidence bands.
ppc_pit_ecdf(y, yrep, prob = 0.99, plot_diff = FALSE)
ppc_pit_ecdf(y, yrep, prob = 0.99, plot_diff = TRUE)
# for ppc_hist,dens,freqpoly,boxplot definitely use a subset yrep rows so
# only a few (instead of nrow(yrep)) histograms are plotted
ppc_hist(y, yrep[1:8, ])
color_scheme_set("red")
ppc_boxplot(y, yrep[1:8, ])
# wizard hat plot
color_scheme_set("blue")
ppc_dens(y, yrep[200:202, ])
# frequency polygons
ppc_freqpoly(y, yrep[1:3, ], alpha = 0.1, size = 1, binwidth = 5)
ppc_freqpoly_grouped(y, yrep[1:3, ], group) + yaxis_text()
# if groups are different sizes then the 'freq' argument can be useful
ppc_freqpoly_grouped(y, yrep[1:3, ], group, freq = FALSE) + yaxis_text()
# density and distribution overlays by group
ppc_dens_overlay_grouped(y, yrep[1:25, ], group = group)
ppc_ecdf_overlay_grouped(y, yrep[1:25, ], group = group)
# PIT-ECDF plots of the PIT values by group
# with 99% simultaneous confidence bands.
ppc_pit_ecdf_grouped(y, yrep, group=group, prob=0.99)
# don't need to only use small number of rows for ppc_violin_grouped
# (as it pools yrep draws within groups)
color_scheme_set("gray")
ppc_violin_grouped(y, yrep, group, size = 1.5)
ppc_violin_grouped(y, yrep, group, alpha = 0)
# change how y is drawn
ppc_violin_grouped(y, yrep, group, alpha = 0, y_draw = "points", y_size = 1.5)
ppc_violin_grouped(y, yrep, group,
alpha = 0, y_draw = "both",
y_size = 1.5, y_alpha = 0.5, y_jitter = 0.33
)
PPC errors
Description
Various plots of predictive errors y - yrep
. See the
Details and Plot Descriptions sections, below.
Usage
ppc_error_hist(
y,
yrep,
...,
facet_args = list(),
binwidth = NULL,
bins = NULL,
breaks = NULL,
freq = TRUE
)
ppc_error_hist_grouped(
y,
yrep,
group,
...,
facet_args = list(),
binwidth = NULL,
bins = NULL,
breaks = NULL,
freq = TRUE
)
ppc_error_scatter(y, yrep, ..., facet_args = list(), size = 2.5, alpha = 0.8)
ppc_error_scatter_avg(y, yrep, ..., size = 2.5, alpha = 0.8)
ppc_error_scatter_avg_grouped(
y,
yrep,
group,
...,
facet_args = list(),
size = 2.5,
alpha = 0.8
)
ppc_error_scatter_avg_vs_x(y, yrep, x, ..., size = 2.5, alpha = 0.8)
ppc_error_binned(
y,
yrep,
...,
facet_args = list(),
bins = NULL,
size = 1,
alpha = 0.25
)
ppc_error_data(y, yrep, group = NULL)
Arguments
y |
A vector of observations. See Details. |
yrep |
An |
... |
Currently unused. |
facet_args |
A named list of arguments (other than |
binwidth |
Passed to |
bins |
For |
breaks |
Passed to |
freq |
For histograms, |
group |
A grouping variable of the same length as |
size , alpha |
For scatterplots, arguments passed to
|
x |
A numeric vector the same length as |
Details
All of these functions (aside from the *_scatter_avg
functions)
compute and plot predictive errors for each row of the matrix yrep
, so
it is usually a good idea for yrep
to contain only a small number of
draws (rows). See Examples, below.
For binomial and Bernoulli data the ppc_error_binned()
function can be used
to generate binned error plots. Bernoulli data can be input as a vector of 0s
and 1s, whereas for binomial data y
and yrep
should contain "success"
proportions (not counts). See the Examples section, below.
Value
A ggplot object that can be further customized using the ggplot2 package.
Plot descriptions
ppc_error_hist()
-
A separate histogram is plotted for the predictive errors computed from
y
and each dataset (row) inyrep
. For this plotyrep
should have only a small number of rows. ppc_error_hist_grouped()
-
Like
ppc_error_hist()
, except errors are computed within levels of a grouping variable. The number of histograms is therefore equal to the product of the number of rows inyrep
and the number of groups (unique values ofgroup
). ppc_error_scatter()
-
A separate scatterplot is displayed for
y
vs. the predictive errors computed fromy
and each dataset (row) inyrep
. For this plotyrep
should have only a small number of rows. ppc_error_scatter_avg()
-
A single scatterplot of
y
vs. the average of the errors computed fromy
and each dataset (row) inyrep
. For each individual data pointy[n]
the average error is the average of the errors fory[n]
computed over the the draws from the posterior predictive distribution. ppc_error_scatter_avg_vs_x()
-
Same as
ppc_error_scatter_avg()
, except the average is plotted on the y-axis and a predictor variablex
is plotted on the x-axis. ppc_error_binned()
-
Intended for use with binomial data. A separate binned error plot (similar to
arm::binnedplot()
) is generated for each dataset (row) inyrep
. For this ploty
andyrep
should contain proportions rather than counts, andyrep
should have only a small number of rows.
References
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis. Chapman & Hall/CRC Press, London, third edition. (Ch. 6)
See Also
Other PPCs:
PPC-censoring
,
PPC-discrete
,
PPC-distributions
,
PPC-intervals
,
PPC-loo
,
PPC-overview
,
PPC-scatterplots
,
PPC-test-statistics
Examples
y <- example_y_data()
yrep <- example_yrep_draws()
ppc_error_hist(y, yrep[1:3, ])
# errors within groups
group <- example_group_data()
(p1 <- ppc_error_hist_grouped(y, yrep[1:3, ], group))
p1 + yaxis_text() # defaults to showing counts on y-axis
table(group) # more obs in GroupB, can set freq=FALSE to show density on y-axis
(p2 <- ppc_error_hist_grouped(y, yrep[1:3, ], group, freq = FALSE))
p2 + yaxis_text()
# scatterplots
ppc_error_scatter(y, yrep[10:14, ])
ppc_error_scatter_avg(y, yrep)
x <- example_x_data()
ppc_error_scatter_avg_vs_x(y, yrep, x)
## Not run:
# binned error plot with binomial model from rstanarm
library(rstanarm)
example("example_model", package = "rstanarm")
formula(example_model)
# get observed proportion of "successes"
y <- example_model$y # matrix of "success" and "failure" counts
trials <- rowSums(y)
y_prop <- y[, 1] / trials # proportions
# get predicted success proportions
yrep <- posterior_predict(example_model)
yrep_prop <- sweep(yrep, 2, trials, "/")
ppc_error_binned(y_prop, yrep_prop[1:6, ])
## End(Not run)
PPC intervals
Description
Medians and central interval estimates of yrep
with y
overlaid.
See the Plot Descriptions section, below.
Usage
ppc_intervals(
y,
yrep,
x = NULL,
...,
prob = 0.5,
prob_outer = 0.9,
alpha = 0.33,
size = 1,
fatten = 2.5,
linewidth = 1
)
ppc_intervals_grouped(
y,
yrep,
x = NULL,
group,
...,
facet_args = list(),
prob = 0.5,
prob_outer = 0.9,
alpha = 0.33,
size = 1,
fatten = 2.5,
linewidth = 1
)
ppc_ribbon(
y,
yrep,
x = NULL,
...,
prob = 0.5,
prob_outer = 0.9,
alpha = 0.33,
size = 0.25,
y_draw = c("line", "points", "both")
)
ppc_ribbon_grouped(
y,
yrep,
x = NULL,
group,
...,
facet_args = list(),
prob = 0.5,
prob_outer = 0.9,
alpha = 0.33,
size = 0.25,
y_draw = c("line", "points", "both")
)
ppc_intervals_data(
y,
yrep,
x = NULL,
group = NULL,
...,
prob = 0.5,
prob_outer = 0.9
)
ppc_ribbon_data(
y,
yrep,
x = NULL,
group = NULL,
...,
prob = 0.5,
prob_outer = 0.9
)
Arguments
y |
A vector of observations. See Details. |
yrep |
An |
x |
A numeric vector to use as the x-axis
variable. For example, |
... |
Currently unused. |
prob , prob_outer |
Values between |
alpha , size , fatten , linewidth |
Arguments passed to geoms. For ribbon
plots |
group |
A grouping variable of the same length as |
facet_args |
A named list of arguments (other than |
y_draw |
For ribbon plots only, a string specifying how to draw |
Value
The plotting functions return a ggplot object that can be further
customized using the ggplot2 package. The functions with suffix
_data()
return the data that would have been drawn by the plotting
function.
Plot Descriptions
ppc_intervals(), ppc_ribbon()
-
100*prob
% central intervals foryrep
at eachx
value.ppc_intervals()
plots intervals as vertical bars with points indicatingyrep
medians and darker points indicating observedy
values.ppc_ribbon()
plots a ribbon of connected intervals with a line through the median ofyrep
and a darker line connecting observedy
values. In both cases an optionalx
variable can also be specified for the x-axis variable.Depending on the number of observations and the variability in the predictions at different values of
x
, one of these plots may be easier to read than the other. ppc_intervals_grouped(), ppc_ribbon_grouped()
-
Same as
ppc_intervals()
andppc_ribbon()
, respectively, but a separate plot (facet) is generated for each level of a grouping variable.
References
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378. (journal version, arXiv preprint, code on GitHub)
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis. Chapman & Hall/CRC Press, London, third edition. (Ch. 6)
See Also
Other PPCs:
PPC-censoring
,
PPC-discrete
,
PPC-distributions
,
PPC-errors
,
PPC-loo
,
PPC-overview
,
PPC-scatterplots
,
PPC-test-statistics
Examples
y <- rnorm(50)
yrep <- matrix(rnorm(5000, 0, 2), ncol = 50)
color_scheme_set("brightblue")
ppc_intervals(y, yrep)
ppc_ribbon(y, yrep)
ppc_ribbon(y, yrep, y_draw = "points")
## Not run:
ppc_ribbon(y, yrep, y_draw = "both")
## End(Not run)
ppc_intervals(y, yrep, size = 1.5, fatten = 0) # remove the yrep point estimates
color_scheme_set("teal")
year <- 1950:1999
ppc_intervals(y, yrep, x = year, fatten = 1) + ggplot2::xlab("Year")
ppc_ribbon(y, yrep, x = year) + ggplot2::xlab("Year")
color_scheme_set("pink")
year <- rep(2000:2009, each = 5)
group <- gl(5, 1, length = 50, labels = LETTERS[1:5])
ppc_ribbon_grouped(y, yrep, x = year, group, y_draw = "both") +
ggplot2::scale_x_continuous(breaks = pretty)
ppc_ribbon_grouped(y, yrep, x = year, group,
facet_args = list(scales = "fixed")) +
xaxis_text(FALSE) +
xaxis_ticks(FALSE) +
panel_bg(fill = "gray20")
# get the data frames used to make the ggplots
ppc_dat <- ppc_intervals_data(y, yrep, x = year, prob = 0.5)
ppc_group_dat <- ppc_intervals_data(y, yrep, x = year, group = group, prob = 0.5)
## Not run:
library("rstanarm")
fit <- stan_glmer(mpg ~ wt + (1|cyl), data = mtcars, refresh = 0)
yrep <- posterior_predict(fit)
color_scheme_set("purple")
ppc_intervals(y = mtcars$mpg, yrep = yrep, x = mtcars$wt, prob = 0.8) +
panel_bg(fill="gray90", color = NA) +
grid_lines(color = "white")
ppc_ribbon(y = mtcars$mpg, yrep = yrep, x = mtcars$wt,
prob = 0.6, prob_outer = 0.8)
ppc_ribbon_grouped(y = mtcars$mpg, yrep = yrep, x = mtcars$wt,
group = mtcars$cyl)
color_scheme_set("gray")
ppc_intervals(mtcars$mpg, yrep, prob = 0.5) +
ggplot2::scale_x_continuous(
labels = rownames(mtcars),
breaks = 1:nrow(mtcars)
) +
xaxis_text(angle = -70, vjust = 1, hjust = 0) +
xaxis_title(FALSE)
## End(Not run)
LOO predictive checks
Description
Leave-One-Out (LOO) predictive checks. See the Plot Descriptions section, below, and Gabry et al. (2019) for details.
Usage
ppc_loo_pit_overlay(
y,
yrep,
lw = NULL,
...,
psis_object = NULL,
pit = NULL,
samples = 100,
size = 0.25,
alpha = 0.7,
boundary_correction = TRUE,
grid_len = 512,
bw = "nrd0",
trim = FALSE,
adjust = 1,
kernel = "gaussian",
n_dens = 1024
)
ppc_loo_pit_data(
y,
yrep,
lw = NULL,
...,
psis_object = NULL,
pit = NULL,
samples = 100,
bw = "nrd0",
boundary_correction = TRUE,
grid_len = 512
)
ppc_loo_pit_qq(
y,
yrep,
lw = NULL,
...,
psis_object = NULL,
pit = NULL,
compare = c("uniform", "normal"),
size = 2,
alpha = 1
)
ppc_loo_pit(
y,
yrep,
lw,
pit = NULL,
compare = c("uniform", "normal"),
...,
size = 2,
alpha = 1
)
ppc_loo_intervals(
y,
yrep,
psis_object,
...,
subset = NULL,
intervals = NULL,
prob = 0.5,
prob_outer = 0.9,
alpha = 0.33,
size = 1,
fatten = 2.5,
linewidth = 1,
order = c("index", "median")
)
ppc_loo_ribbon(
y,
yrep,
psis_object,
...,
subset = NULL,
intervals = NULL,
prob = 0.5,
prob_outer = 0.9,
alpha = 0.33,
size = 0.25
)
Arguments
y |
A vector of observations. See Details. |
yrep |
An |
lw |
A matrix of (smoothed) log weights with the same dimensions as
|
... |
Currently unused. |
psis_object |
If using loo version |
pit |
For |
samples |
For |
alpha , size , fatten , linewidth |
Arguments passed to code geoms to control plot
aesthetics. For |
boundary_correction |
For |
grid_len |
For |
bw , adjust , kernel , n_dens |
Optional arguments passed to
|
trim |
Passed to |
compare |
For |
subset |
For |
intervals |
For |
prob , prob_outer |
Values between |
order |
For |
Value
A ggplot object that can be further customized using the ggplot2 package.
Plot Descriptions
ppc_loo_pit_overlay()
,ppc_loo_pit_qq()
-
The calibration of marginal predictions can be assessed using probability integral transformation (PIT) checks. LOO improves the check by avoiding the double use of data. See the section on marginal predictive checks in Gelman et al. (2013, p. 152–153) and section 5 of Gabry et al. (2019) for an example of using bayesplot for these checks.
The LOO PIT values are asymptotically uniform (for continuous data) if the model is calibrated. The
ppc_loo_pit_overlay()
function creates a plot comparing the density of the LOO PITs (thick line) to the density estimates of many simulated data sets from the standard uniform distribution (thin lines). See Gabry et al. (2019) for an example of interpreting the shape of the miscalibration that can be observed in these plots.The
ppc_loo_pit_qq()
function provides an alternative visualization of the miscalibration with a quantile-quantile (Q-Q) plot comparing the LOO PITs to the standard uniform distribution. Comparing to the uniform is not good for extreme probabilities close to 0 and 1, so it can sometimes be useful to set thecompare
argument to"normal"
, which will produce a Q-Q plot comparing standard normal quantiles calculated from the PIT values to the theoretical standard normal quantiles. This can help see the (mis)calibration better for the extreme values. However, in most cases we have found that the overlaid density plot (ppc_loo_pit_overlay()
) function will provide a clearer picture of calibration problems than the Q-Q plot. ppc_loo_intervals()
,ppc_loo_ribbon()
-
Similar to
ppc_intervals()
andppc_ribbon()
but the intervals are for the LOO predictive distribution.
References
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis. Chapman & Hall/CRC Press, London, third edition. (p. 152–153)
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378. (journal version, arXiv preprint, code on GitHub)
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4. arXiv preprint: https://arxiv.org/abs/1507.04544
Boneva, L. I., Kendall, D., & Stefanov, I. (1971). Spline transformations: Three new diagnostic aids for the statistical data-analyst. J. R. Stat. Soc. B (Methodological), 33(1), 1-71. https://www.jstor.org/stable/2986005.
See Also
Other PPCs:
PPC-censoring
,
PPC-discrete
,
PPC-distributions
,
PPC-errors
,
PPC-intervals
,
PPC-overview
,
PPC-scatterplots
,
PPC-test-statistics
Examples
## Not run:
library(rstanarm)
library(loo)
head(radon)
fit <- stan_lmer(
log_radon ~ floor + log_uranium + floor:log_uranium
+ (1 + floor | county),
data = radon,
iter = 100,
chains = 2,
cores = 2
)
y <- radon$log_radon
yrep <- posterior_predict(fit)
loo1 <- loo(fit, save_psis = TRUE, cores = 4)
psis1 <- loo1$psis_object
lw <- weights(psis1) # normalized log weights
# marginal predictive check using LOO probability integral transform
color_scheme_set("orange")
ppc_loo_pit_overlay(y, yrep, lw = lw)
ppc_loo_pit_qq(y, yrep, lw = lw)
ppc_loo_pit_qq(y, yrep, lw = lw, compare = "normal")
# can use the psis object instead of lw
ppc_loo_pit_qq(y, yrep, psis_object = psis1)
# loo predictive intervals vs observations
keep_obs <- 1:50
ppc_loo_intervals(y, yrep, psis_object = psis1, subset = keep_obs)
color_scheme_set("gray")
ppc_loo_intervals(y, yrep, psis_object = psis1, subset = keep_obs,
order = "median")
## End(Not run)
Graphical posterior predictive checking
Description
The bayesplot PPC module provides various plotting functions for creating graphical displays comparing observed data to simulated data from the posterior (or prior) predictive distribution. See the sections below for a brief discussion of the ideas behind posterior predictive checking, an overview of the available PPC plots, and tips on providing an interface to bayesplot from another package.
For plots of posterior (or prior) predictive distributions that do not include observed data see PPD-overview instead.
Details
The idea behind posterior predictive checking is simple: if a model is a good fit then we should be able to use it to generate data that looks a lot like the data we observed.
Posterior predictive distribution
To generate the data used for posterior predictive checks we simulate from
the posterior predictive distribution. The posterior predictive
distribution is the distribution of the outcome variable implied by a model
after using the observed data y
(a vector of outcome values), and
typically predictors X
, to update our beliefs about the unknown
parameters \theta
in the model. For each draw of the parameters
\theta
from the posterior distribution
p(\theta \,|\, y, X)
we generate an entire vector of outcomes. The result is
an S \times N
matrix of simulations, where S
is the the
size of the posterior sample (number of draws from the posterior
distribution) and N
is the number of data points in y
. That is,
each row of the matrix is an individual "replicated" dataset of N
observations.
Notation
When simulating from the posterior predictive distribution we can use either
the same values of the predictors X
that we used when fitting the model
or new observations of those predictors. When we use the same values of
X
we denote the resulting simulations by y^{rep}
as they
can be thought of as replications of the outcome y
rather than
predictions for future observations. This corresponds to the notation from
Gelman et. al. (2013) and is the notation used throughout the documentation
for this package.
Graphical posterior predictive checking
Using the datasets y^{rep}
drawn from the posterior predictive
distribution, the functions in the bayesplot package produce various
graphical displays comparing the observed data y
to the replications.
For a more thorough discussion of posterior predictive checking see
Chapter 6 of Gelman et. al. (2013).
Prior predictive checking
To use bayesplot for prior predictive checks you can simply use draws
from the prior predictive distribution instead of the posterior predictive
distribution. See Gabry et al. (2019) for more on prior predictive checking
and when it is reasonable to compare the prior predictive distribution to the
observed data. If you want to avoid using the observed data for prior
predictive checks then you can use the bayesplot PPD plots instead,
which do not take a y
argument, or you can use the PPC plots but provide
plausible or implausible y
values that you want to compare to the prior
predictive realizations.
PPC plotting functions
The plotting functions for prior and
posterior predictive checking all have the prefix ppc_
and all require
the arguments y
, a vector of observations, and yrep
, a matrix of
replications (in-sample predictions). The plots are organized into several
categories, each with its own documentation:
-
PPC-distributions: Histograms, kernel density estimates, boxplots, and other plots comparing the empirical distribution of data
y
to the distributions of individual simulated datasets (rows) inyrep
. -
PPC-test-statistics: The distribution of a statistic, or a pair of statistics, over the simulated datasets (rows) in
yrep
compared to value of the statistic(s) computed fromy
. -
PPC-intervals: Interval estimates of
yrep
withy
overlaid. The x-axis variable can be optionally specified by the user (e.g. to plot against a predictor variable or over time). -
PPC-errors: Plots of predictive errors (
y - yrep
) computed fromy
and each of the simulated datasets (rows) inyrep
. For binomial models binned error plots are also available. -
PPC-scatterplots: Scatterplots (and similar visualizations) of the data
y
vs. individual simulated datasets (rows) inyrep
, or vs. the average value of the distributions of each data point (columns) inyrep
. -
PPC-discrete: PPC functions that can only be used if
y
andyrep
are discrete. For example, rootograms for count outcomes and bar plots for ordinal, categorical, and multinomial outcomes. -
PPC-loo: PPC functions for predictive checks based on (approximate) leave-one-out (LOO) cross-validation. '
-
PPC-censoring: PPC functions comparing the empirical distribution of censored data
y
to the distributions of individual simulated datasets (rows) inyrep
.
Providing an interface for predictive checking from another package
In addition to the various plotting functions, the bayesplot package
provides the S3 generic pp_check()
. Authors of R packages for
Bayesian inference are encouraged to define pp_check()
methods for the
fitted model objects created by their packages. See the package vignettes for
more details and a simple example, and see the rstanarm and brms
packages for full examples of pp_check()
methods.
References
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378. (journal version, arXiv preprint, code on GitHub)
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis. Chapman & Hall/CRC Press, London, third edition. (Ch. 6)
See Also
Other PPCs:
PPC-censoring
,
PPC-discrete
,
PPC-distributions
,
PPC-errors
,
PPC-intervals
,
PPC-loo
,
PPC-scatterplots
,
PPC-test-statistics
PPC scatterplots
Description
Scatterplots of the observed data y
vs. simulated/replicated data
yrep
from the posterior predictive distribution. See the
Plot Descriptions and Details sections, below.
Usage
ppc_scatter(
y,
yrep,
...,
facet_args = list(),
size = 2.5,
alpha = 0.8,
ref_line = TRUE
)
ppc_scatter_avg(y, yrep, ..., size = 2.5, alpha = 0.8, ref_line = TRUE)
ppc_scatter_avg_grouped(
y,
yrep,
group,
...,
facet_args = list(),
size = 2.5,
alpha = 0.8,
ref_line = TRUE
)
ppc_scatter_data(y, yrep)
ppc_scatter_avg_data(y, yrep, group = NULL)
Arguments
y |
A vector of observations. See Details. |
yrep |
An |
... |
Currently unused. |
facet_args |
A named list of arguments (other than |
size , alpha |
Arguments passed to |
ref_line |
If |
group |
A grouping variable of the same length as |
Details
For Binomial data, the plots may be more useful if the input contains the "success" proportions (not discrete "success" or "failure" counts).
Value
The plotting functions return a ggplot object that can be further
customized using the ggplot2 package. The functions with suffix
_data()
return the data that would have been drawn by the plotting
function.
Plot Descriptions
ppc_scatter()
-
For each dataset (row) in
yrep
a scatterplot is generated showingy
against that row ofyrep
. For this plotyrep
should only contain a small number of rows. ppc_scatter_avg()
-
A single scatterplot of
y
against the average values ofyrep
, i.e., the points(x,y) = (mean(yrep[, n]), y[n])
, where eachyrep[, n]
is a vector of length equal to the number of posterior draws. Unlike forppc_scatter()
, forppc_scatter_avg()
yrep
should contain many draws (rows). ppc_scatter_avg_grouped()
-
The same as
ppc_scatter_avg()
, but a separate plot is generated for each level of a grouping variable.
References
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis. Chapman & Hall/CRC Press, London, third edition. (Ch. 6)
See Also
Other PPCs:
PPC-censoring
,
PPC-discrete
,
PPC-distributions
,
PPC-errors
,
PPC-intervals
,
PPC-loo
,
PPC-overview
,
PPC-test-statistics
Examples
y <- example_y_data()
yrep <- example_yrep_draws()
p1 <- ppc_scatter_avg(y, yrep)
p1
# don't draw line x=y
ppc_scatter_avg(y, yrep, ref_line = FALSE)
p2 <- ppc_scatter(y, yrep[20:23, ], alpha = 0.5, size = 1.5)
p2
# give x and y axes the same limits
lims <- ggplot2::lims(x = c(0, 160), y = c(0, 160))
p1 + lims
p2 + lims
# for ppc_scatter_avg_grouped the default is to allow the facets
# to have different x and y axes
group <- example_group_data()
ppc_scatter_avg_grouped(y, yrep, group)
# let x-axis vary but force y-axis to be the same
ppc_scatter_avg_grouped(y, yrep, group, facet_args = list(scales = "free_x"))
PPC test statistics
Description
The distribution of a (test) statistic T(yrep)
, or a pair of (test)
statistics, over the simulated datasets in yrep
, compared to the
observed value T(y)
computed from the data y
. See the
Plot Descriptions and Details sections, below, as
well as Gabry et al. (2019).
Usage
ppc_stat(
y,
yrep,
stat = "mean",
...,
binwidth = NULL,
bins = NULL,
breaks = NULL,
freq = TRUE
)
ppc_stat_grouped(
y,
yrep,
group,
stat = "mean",
...,
facet_args = list(),
binwidth = NULL,
bins = NULL,
breaks = NULL,
freq = TRUE
)
ppc_stat_freqpoly(
y,
yrep,
stat = "mean",
...,
facet_args = list(),
binwidth = NULL,
bins = NULL,
freq = TRUE
)
ppc_stat_freqpoly_grouped(
y,
yrep,
group,
stat = "mean",
...,
facet_args = list(),
binwidth = NULL,
bins = NULL,
freq = TRUE
)
ppc_stat_2d(y, yrep, stat = c("mean", "sd"), ..., size = 2.5, alpha = 0.7)
ppc_stat_data(y, yrep, group = NULL, stat)
Arguments
y |
A vector of observations. See Details. |
yrep |
An |
stat |
A single function or a string naming a function, except for the 2D plot which requires a vector of exactly two names or functions. In all cases the function(s) should take a vector input and return a scalar statistic. If specified as a string (or strings) then the legend will display the function name(s). If specified as a function (or functions) then generic naming is used in the legend. |
... |
Currently unused. |
binwidth |
Passed to |
bins |
Passed to |
breaks |
Passed to |
freq |
For histograms, |
group |
A grouping variable of the same length as |
facet_args |
A named list of arguments (other than |
size , alpha |
For the 2D plot only, arguments passed to
|
Details
For Binomial data, the plots may be more useful if the input contains the "success" proportions (not discrete "success" or "failure" counts).
Value
The plotting functions return a ggplot object that can be further
customized using the ggplot2 package. The functions with suffix
_data()
return the data that would have been drawn by the plotting
function.
Plot Descriptions
ppc_stat()
,ppc_stat_freqpoly()
-
A histogram or frequency polygon of the distribution of a statistic computed by applying
stat
to each dataset (row) inyrep
. The value of the statistic in the observed data,stat(y)
, is overlaid as a vertical line. More details and example usage ofppc_stat()
can be found in Gabry et al. (2019). ppc_stat_grouped()
,ppc_stat_freqpoly_grouped()
-
The same as
ppc_stat()
andppc_stat_freqpoly()
, but a separate plot is generated for each level of a grouping variable. More details and example usage ofppc_stat_grouped()
can be found in Gabry et al. (2019). ppc_stat_2d()
-
A scatterplot showing the joint distribution of two statistics computed over the datasets (rows) in
yrep
. The value of the statistics in the observed data is overlaid as large point.
References
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378. (journal version, arXiv preprint, code on GitHub)
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis. Chapman & Hall/CRC Press, London, third edition. (Ch. 6)
See Also
Other PPCs:
PPC-censoring
,
PPC-discrete
,
PPC-distributions
,
PPC-errors
,
PPC-intervals
,
PPC-loo
,
PPC-overview
,
PPC-scatterplots
Examples
y <- example_y_data()
yrep <- example_yrep_draws()
ppc_stat(y, yrep)
ppc_stat(y, yrep, stat = "sd") + legend_none()
# use your own function for the 'stat' argument
color_scheme_set("brightblue")
q25 <- function(y) quantile(y, 0.25)
ppc_stat(y, yrep, stat = "q25") # legend includes function name
# can define the function in the 'stat' argument instead of
# using its name but then the legend doesn't include the function name
ppc_stat(y, yrep, stat = function(y) quantile(y, 0.25))
# plots by group
color_scheme_set("teal")
group <- example_group_data()
ppc_stat_grouped(y, yrep, group)
ppc_stat_grouped(y, yrep, group) + yaxis_text()
# force y-axes to have same scales, allow x axis to vary
ppc_stat_grouped(y, yrep, group, facet_args = list(scales = "free_x")) + yaxis_text()
# the freqpoly plots use frequency polygons instead of histograms
ppc_stat_freqpoly(y, yrep, stat = "median")
ppc_stat_freqpoly_grouped(y, yrep, group, stat = "median", facet_args = list(nrow = 2))
# ppc_stat_2d allows 2 statistics and makes a scatterplot
bayesplot_theme_set(ggplot2::theme_linedraw())
color_scheme_set("viridisE")
ppc_stat_2d(y, yrep, stat = c("mean", "sd"))
bayesplot_theme_set(ggplot2::theme_grey())
color_scheme_set("brewer-Paired")
ppc_stat_2d(y, yrep, stat = c("median", "mad"))
# reset aesthetics
color_scheme_set()
bayesplot_theme_set()
PPD distributions
Description
Plot posterior or prior predictive distributions. Each of these functions
makes the same plot as the corresponding ppc_
function
but without plotting any observed data y
. The Plot Descriptions section
at PPC-distributions has details on the individual plots.
Usage
ppd_data(ypred, group = NULL)
ppd_dens_overlay(
ypred,
...,
size = 0.25,
alpha = 0.7,
trim = FALSE,
bw = "nrd0",
adjust = 1,
kernel = "gaussian",
n_dens = 1024
)
ppd_ecdf_overlay(
ypred,
...,
discrete = FALSE,
pad = TRUE,
size = 0.25,
alpha = 0.7
)
ppd_dens(ypred, ..., trim = FALSE, size = 0.5, alpha = 1)
ppd_hist(ypred, ..., binwidth = NULL, bins = NULL, breaks = NULL, freq = TRUE)
ppd_freqpoly(
ypred,
...,
binwidth = NULL,
bins = NULL,
freq = TRUE,
size = 0.5,
alpha = 1
)
ppd_freqpoly_grouped(
ypred,
group,
...,
binwidth = NULL,
bins = NULL,
freq = TRUE,
size = 0.5,
alpha = 1
)
ppd_boxplot(ypred, ..., notch = TRUE, size = 0.5, alpha = 1)
Arguments
ypred |
An |
group |
A grouping variable of the same length as |
... |
Currently unused. |
size , alpha |
Passed to the appropriate geom to control the appearance of the predictive distributions. |
trim |
A logical scalar passed to |
bw , adjust , kernel , n_dens |
Optional arguments passed to
|
discrete |
For |
pad |
A logical scalar passed to |
binwidth |
Passed to |
bins |
Passed to |
breaks |
Passed to |
freq |
For histograms, |
notch |
For the box plot, a logical scalar passed to
|
Details
For Binomial data, the plots may be more useful if the input contains the "success" proportions (not discrete "success" or "failure" counts).
Value
The plotting functions return a ggplot object that can be further
customized using the ggplot2 package. The functions with suffix
_data()
return the data that would have been drawn by the plotting
function.
See Also
Other PPDs:
PPD-intervals
,
PPD-overview
,
PPD-test-statistics
Examples
# difference between ppd_dens_overlay() and ppc_dens_overlay()
color_scheme_set("brightblue")
preds <- example_yrep_draws()
ppd_dens_overlay(ypred = preds[1:50, ])
ppc_dens_overlay(y = example_y_data(), yrep = preds[1:50, ])
PPD intervals
Description
Medians and central interval estimates of posterior or prior predictive
distributions. Each of these functions makes the same plot as the
corresponding ppc_
function but without plotting any
observed data y
. The Plot Descriptions section at PPC-intervals has
details on the individual plots.
Usage
ppd_intervals(
ypred,
x = NULL,
...,
prob = 0.5,
prob_outer = 0.9,
alpha = 0.33,
size = 1,
fatten = 2.5,
linewidth = 1
)
ppd_intervals_grouped(
ypred,
x = NULL,
group,
...,
facet_args = list(),
prob = 0.5,
prob_outer = 0.9,
alpha = 0.33,
size = 1,
fatten = 2.5,
linewidth = 1
)
ppd_ribbon(
ypred,
x = NULL,
...,
prob = 0.5,
prob_outer = 0.9,
alpha = 0.33,
size = 0.25
)
ppd_ribbon_grouped(
ypred,
x = NULL,
group,
...,
facet_args = list(),
prob = 0.5,
prob_outer = 0.9,
alpha = 0.33,
size = 0.25
)
ppd_intervals_data(
ypred,
x = NULL,
group = NULL,
...,
prob = 0.5,
prob_outer = 0.9
)
ppd_ribbon_data(
ypred,
x = NULL,
group = NULL,
...,
prob = 0.5,
prob_outer = 0.9
)
Arguments
ypred |
An |
x |
A numeric vector to use as the x-axis
variable. For example, |
... |
Currently unused. |
prob , prob_outer |
Values between |
alpha , size , fatten , linewidth |
Arguments passed to geoms. For ribbon
plots |
group |
A grouping variable of the same length as |
facet_args |
A named list of arguments (other than |
Value
The plotting functions return a ggplot object that can be further
customized using the ggplot2 package. The functions with suffix
_data()
return the data that would have been drawn by the plotting
function.
References
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378. (journal version, arXiv preprint, code on GitHub)
See Also
Other PPDs:
PPD-distributions
,
PPD-overview
,
PPD-test-statistics
Examples
color_scheme_set("brightblue")
ypred <- example_yrep_draws()
x <- example_x_data()
group <- example_group_data()
ppd_intervals(ypred[, 1:50])
ppd_intervals(ypred[, 1:50], fatten = 0)
ppd_intervals(ypred[, 1:50], fatten = 0, linewidth = 2)
ppd_intervals(ypred[, 1:50], prob_outer = 0.75, fatten = 0, linewidth = 2)
# put a predictor variable on the x-axis
ppd_intervals(ypred[, 1:100], x = x[1:100], fatten = 1) +
ggplot2::labs(y = "Prediction", x = "Some variable of interest")
# with a grouping variable too
ppd_intervals_grouped(
ypred = ypred[, 1:100],
x = x[1:100],
group = group[1:100],
size = 2,
fatten = 0,
facet_args = list(nrow = 2)
)
# even reducing size, ppd_intervals is too cluttered when there are many
# observations included (ppd_ribbon is better)
ppd_intervals(ypred, size = 0.5, fatten = 0.1, linewidth = 0.5)
ppd_ribbon(ypred)
ppd_ribbon(ypred, size = 0) # remove line showing median prediction
Plots of posterior or prior predictive distributions
Description
The bayesplot PPD module provides various plotting functions for creating graphical displays of simulated data from the posterior or prior predictive distribution. These plots are essentially the same as the corresponding PPC plots but without showing any observed data. Because these are not "checks" compared to data we use PPD (for prior/posterior predictive distribution) instead of PPC (for prior/posterior predictive check).
PPD plotting functions
The functions for plotting prior and
posterior predictive distributions without observed data each have the
prefix ppd_
and all have a required argument ypred
(a matrix of
predictions). The plots are organized into several categories, each with
its own documentation:
-
PPD-distributions: Histograms, kernel density estimates, boxplots, and other plots of multiple simulated datasets (rows) in
ypred
. These are the same as the plots in PPC-distributions but without including any comparison toy
. -
PPD-intervals: Interval estimates for each predicted observations (columns) in
ypred
. The x-axis variable can be optionally specified by the user (e.g. to plot against against a predictor variable or over time).These are the same as the plots in PPC-intervals but without including any comparison toy
. -
PPD-test-statistics: The distribution of a statistic, or a pair of statistics, over the simulated datasets (rows) in
ypred
. These are the same as the plots in PPC-test-statistics but without including any comparison toy
.
References
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378. (journal version, arXiv preprint, code on GitHub)
See Also
Other PPDs:
PPD-distributions
,
PPD-intervals
,
PPD-test-statistics
PPD test statistics
Description
The distribution of a (test) statistic T(ypred)
, or a pair of (test)
statistics, over the simulations from the posterior or prior predictive
distribution. Each of these functions makes the same plot as the
corresponding ppc_
function but without comparing to
any observed data y
. The Plot Descriptions section at
PPC-test-statistics has details on the individual plots.
Usage
ppd_stat(
ypred,
stat = "mean",
...,
binwidth = NULL,
bins = NULL,
breaks = NULL,
freq = TRUE
)
ppd_stat_grouped(
ypred,
group,
stat = "mean",
...,
facet_args = list(),
binwidth = NULL,
bins = NULL,
breaks = NULL,
freq = TRUE
)
ppd_stat_freqpoly(
ypred,
stat = "mean",
...,
facet_args = list(),
binwidth = NULL,
bins = NULL,
freq = TRUE
)
ppd_stat_freqpoly_grouped(
ypred,
group,
stat = "mean",
...,
facet_args = list(),
binwidth = NULL,
bins = NULL,
freq = TRUE
)
ppd_stat_2d(ypred, stat = c("mean", "sd"), ..., size = 2.5, alpha = 0.7)
ppd_stat_data(ypred, group = NULL, stat)
Arguments
ypred |
An |
stat |
A single function or a string naming a function, except for the 2D plot which requires a vector of exactly two names or functions. In all cases the function(s) should take a vector input and return a scalar statistic. If specified as a string (or strings) then the legend will display the function name(s). If specified as a function (or functions) then generic naming is used in the legend. |
... |
Currently unused. |
binwidth |
Passed to |
bins |
Passed to |
breaks |
Passed to |
freq |
For histograms, |
group |
A grouping variable of the same length as |
facet_args |
A named list of arguments (other than |
size , alpha |
For the 2D plot only, arguments passed to
|
Details
For Binomial data, the plots may be more useful if the input contains the "success" proportions (not discrete "success" or "failure" counts).
Value
The plotting functions return a ggplot object that can be further
customized using the ggplot2 package. The functions with suffix
_data()
return the data that would have been drawn by the plotting
function.
References
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378. (journal version, arXiv preprint, code on GitHub)
See Also
Other PPDs:
PPD-distributions
,
PPD-intervals
,
PPD-overview
Examples
yrep <- example_yrep_draws()
ppd_stat(yrep)
ppd_stat(yrep, stat = "sd") + legend_none()
# use your own function for the 'stat' argument
color_scheme_set("brightblue")
q25 <- function(y) quantile(y, 0.25)
ppd_stat(yrep, stat = "q25") # legend includes function name
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- dplyr
Default bayesplot plotting theme
Description
The theme_default()
function returns the default ggplot
theme used by the bayesplot plotting functions. See
bayesplot_theme_set()
for details on setting and updating the plotting
theme.
Usage
theme_default(
base_size = getOption("bayesplot.base_size", 12),
base_family = getOption("bayesplot.base_family", "serif")
)
Arguments
base_size , base_family |
Base font size and family (passed to
|
Value
A ggplot theme object.
See Also
bayesplot_theme_set()
to change the ggplot theme.
bayesplot-colors to set or view the color scheme used for plotting.
bayesplot-helpers for a variety of convenience functions, many of which provide shortcuts for tweaking theme elements after creating a plot.
Examples
class(theme_default())
bayesplot_theme_set() # defaults to setting theme_default()
x <- example_mcmc_draws()
mcmc_hist(x)
# change the default font size and family for bayesplots
bayesplot_theme_set(theme_default(base_size = 8, base_family = "sans"))
mcmc_hist(x)
mcmc_areas(x, regex_pars = "beta")
# change back
bayesplot_theme_set()
mcmc_areas(x, regex_pars = "beta")
Tidy parameter selection
Description
Parameter selection in the style of dplyr and other tidyverse packages.
Usage
param_range(prefix, range, vars = NULL)
param_glue(pattern, ..., vars = NULL)
Arguments
prefix , range |
For param_range("beta", c(1,2,8)) would select parameters named |
vars |
|
pattern , ... |
For param_glue("beta_{var}[{level}]", var = c("age", "income"), level = c(3,8)) would select parameters with names
|
Details
As of version 1.7.0
, bayesplot allows the pars
argument for MCMC plots to use "tidy" variable selection (in the
style of the dplyr package). The vars()
function is
re-exported from dplyr for this purpose.
Features of tidy selection includes direct selection (vars(alpha, sigma)
),
everything-but selection (vars(-alpha)
), ranged selection
(vars(`beta[1]`:`beta[3]`)
), support for selection functions
(vars(starts_with("beta"))
), and combinations of these features. See the
Examples section, below.
When using pars
for tidy parameter selection, the regex_pars
argument is
ignored because bayesplot supports using tidyselect helper functions (starts_with()
, contains()
,
num_range()
, etc.) for the same purpose. bayesplot also exports some
additional helper functions to help with parameter selection:
-
param_range()
: likenum_range()
but used when parameter indexes are in brackets (e.g.beta[2]
). -
param_glue()
: for more complicated parameter names with multiple indexes (including variable names) inside the brackets (e.g.,beta[(Intercept) age_group:3]
).
These functions can be used inside of vars()
, dplyr::select()
,
and similar functions, just like the
tidyselect helper functions.
Extra Advice
Parameter names in vars()
are not quoted. When the names contain special
characters like brackets, they should be wrapped in backticks, as in
vars(`beta[1]`)
.
To exclude a range of variables, wrap the sequence in parentheses and then
negate it. For example, (vars(-(`beta[1]`:`beta[3]`))
) would exclude
beta[1]
, beta[2]
, and beta[3]
.
vars()
is a helper function. It holds onto the names and expressions used
to select columns. When selecting variables inside a bayesplot
function, use vars(...)
: mcmc_hist(data, pars = vars(alpha))
. When
using select()
to prepare a dataframe for a bayesplot function, do
not use vars()
: data %>% select(alpha) %>% mcmc_hist()
.
Internally, tidy selection works by converting names and expressions
into position numbers. As a result, integers will select parameters;
vars(1, 3)
selects the first and third ones. We do not endorse this
approach because positions might change as variables are added and
removed from models. To select a parameter that happens to be called 1
,
use backticks to escape it vars(`1`)
.
See Also
Examples
x <- example_mcmc_draws(params = 6)
dimnames(x)
mcmc_hex(x, pars = vars(alpha, `beta[2]`))
mcmc_dens(x, pars = vars(sigma, contains("beta")))
mcmc_hist(x, pars = vars(-contains("beta")))
# using the param_range() helper
mcmc_hist(x, pars = vars(param_range("beta", c(1, 3, 4))))
#############################
## Examples using rstanarm ##
#############################
if (requireNamespace("rstanarm", quietly = TRUE)) {
# see ?rstanarm::example_model
fit <- example("example_model", package = "rstanarm", local=TRUE)$value
print(fit)
posterior <- as.data.frame(fit)
str(posterior)
color_scheme_set("brightblue")
mcmc_hist(posterior, pars = vars(size, contains("period")))
# same as previous but using dplyr::select() and piping
library("dplyr")
posterior %>%
select(size, contains("period")) %>%
mcmc_hist()
mcmc_intervals(posterior, pars = vars(contains("herd")))
mcmc_intervals(posterior, pars = vars(contains("herd"), -contains("Sigma")))
bayesplot_theme_set(ggplot2::theme_dark())
color_scheme_set("viridisC")
mcmc_areas_ridges(posterior, pars = vars(starts_with("b[")))
bayesplot_theme_set()
color_scheme_set("purple")
not_789 <- vars(starts_with("b["), -matches("[7-9]"))
mcmc_intervals(posterior, pars = not_789)
# using the param_glue() helper
just_149 <- vars(param_glue("b[(Intercept) herd:{level}]", level = c(1,4,9)))
mcmc_intervals(posterior, pars = just_149)
# same but using param_glue() with dplyr::select()
# before passing to bayesplot
posterior %>%
select(param_glue("b[(Intercept) herd:{level}]",
level = c(1, 4, 9))) %>%
mcmc_intervals()
}
## Not run:
###################################
## More examples of param_glue() ##
###################################
library(dplyr)
posterior <- tibble(
b_Intercept = rnorm(1000),
sd_condition__Intercept = rexp(1000),
sigma = rexp(1000),
`r_condition[A,Intercept]` = rnorm(1000),
`r_condition[B,Intercept]` = rnorm(1000),
`r_condition[C,Intercept]` = rnorm(1000),
`r_condition[A,Slope]` = rnorm(1000),
`r_condition[B,Slope]` = rnorm(1000)
)
posterior
# using one expression in braces
posterior %>%
select(
param_glue("r_condition[{level},Intercept]", level = c("A", "B"))
) %>%
mcmc_hist()
# using multiple expressions in braces
posterior %>%
select(
param_glue(
"r_condition[{level},{type}]",
level = c("A", "B"),
type = c("Intercept", "Slope"))
) %>%
mcmc_hist()
## End(Not run)