Title: | Censored Regression with Conditional Heteroscedasticity |
Version: | 1.2-2 |
Date: | 2025-03-14 |
Depends: | R (≥ 3.6.0) |
Imports: | stats, Formula, ordinal, sandwich, scoringRules |
Suggests: | distributions3 (≥ 0.2.1), glmx, knitr, lmtest, memisc, quarto |
Description: | Different approaches to censored or truncated regression with conditional heteroscedasticity are provided. First, continuous distributions can be used for the (right and/or left censored or truncated) response with separate linear predictors for the mean and variance. Second, cumulative link models for ordinal data (obtained by interval-censoring continuous data) can be employed for heteroscedastic extended logistic regression (HXLR). In the latter type of models, the intercepts depend on the thresholds that define the intervals. Infrastructure for working with censored or truncated normal, logistic, and Student-t distributions, i.e., d/p/q/r functions and distributions3 objects. |
License: | GPL-2 | GPL-3 |
URL: | https://topmodels.R-Forge.R-project.org/crch/ |
BugReports: | https://topmodels.R-Forge.R-project.org/crch/contact.html |
Encoding: | UTF-8 |
VignetteBuilder: | quarto |
NeedsCompilation: | yes |
Packaged: | 2025-03-14 14:16:09 UTC; zeileis |
Author: | Achim Zeileis |
Maintainer: | Achim Zeileis <Achim.Zeileis@R-project.org> |
Repository: | CRAN |
Date/Publication: | 2025-03-14 19:40:06 UTC |
Create a Censored Logistic Distribution
Description
Class and methods for left-, right-, and interval-censored logistic distributions using the workflow from the distributions3 package.
Usage
CensoredLogistic(location = 0, scale = 1, left = -Inf, right = Inf)
Arguments
location |
numeric. The location parameter of the underlying uncensored
logistic distribution, typically written |
scale |
numeric. The scale parameter (standard deviation) of
the underlying uncensored logistic distribution,
typically written |
left |
numeric. The left censoring point. Can be any real number,
defaults to |
right |
numeric. The right censoring point. Can be any real number,
defaults to |
Details
The constructor function CensoredLogistic
sets up a distribution
object, representing the censored logistic probability distribution by the
corresponding parameters: the latent mean location
= \mu
and
latent standard deviation scale
= \sigma
(i.e., the parameters
of the underlying uncensored logistic variable), the left
censoring
point (with -Inf
corresponding to uncensored), and the
right
censoring point (with Inf
corresponding to uncensored).
The censored logistic distribution has probability density function (PDF) f(x)
:
\Lambda((left - \mu)/\sigma) | if x \le left |
1 - \Lambda((right - \mu)/\sigma) | if x \ge right |
\lambda((x - \mu)/\sigma)/\sigma | otherwise |
where \Lambda
and \lambda
are the cumulative distribution function
and probability density function of the standard logistic distribution,
respectively.
All parameters can also be vectors, so that it is possible to define a vector of censored logistic distributions with potentially different parameters. All parameters need to have the same length or must be scalars (i.e., of length 1) which are then recycled to the length of the other parameters.
For the CensoredLogistic
distribution objects there is a wide range
of standard methods available to the generics provided in the distributions3
package: pdf
and log_pdf
for the (log-)density (PDF), cdf
for the probability
from the cumulative distribution function (CDF), quantile
for quantiles,
random
for simulating random variables,
crps
for the continuous ranked probability score
(CRPS), and support
for the support interval
(minimum and maximum). Internally, these methods rely on the usual d/p/q/r
functions provided for the censored logistic distributions in the crch
package, see dclogis
, and the crps_clogis
function from the scoringRules package.
The methods is_discrete
and is_continuous
can be used to query whether the distributions are discrete on the entire support
(always FALSE
) or continuous on the entire support (only TRUE
if
there is no censoring, i.e., if both left
and right
are infinite).
See the examples below for an illustration of the workflow for the class and methods.
Value
A CensoredLogistic
distribution object.
See Also
dclogis
, Logistic
, TruncatedLogistic
,
CensoredNormal
, CensoredStudentsT
Examples
## package and random seed
library("distributions3")
set.seed(6020)
## three censored logistic distributions:
## - uncensored standard logistic
## - left-censored at zero with latent location = 1 and scale = 1
## - interval-censored in [0, 5] with latent location = 2 and scale = 1
X <- CensoredLogistic(
location = c( 0, 1, 2),
scale = c( 1, 1, 1),
left = c(-Inf, 0, 0),
right = c( Inf, Inf, 5)
)
X
## compute mean of the censored distribution
mean(X)
## higher moments (variance, skewness, kurtosis) are not implemented yet
## support interval (minimum and maximum)
support(X)
## simulate random variables
random(X, 5)
## histograms of 1,000 simulated observations
x <- random(X, 1000)
hist(x[1, ], main = "uncensored")
hist(x[2, ], main = "left-censored at zero")
hist(x[3, ], main = "interval-censored in [0, 5]")
## probability density function (PDF) and log-density (or log-likelihood)
x <- c(0, 0, 1)
pdf(X, x)
pdf(X, x, log = TRUE)
log_pdf(X, x)
## cumulative distribution function (CDF)
cdf(X, x)
## quantiles
quantile(X, 0.5)
## cdf() and quantile() are inverses (except at censoring points)
cdf(X, quantile(X, 0.5))
quantile(X, cdf(X, 1))
## all methods above can either be applied elementwise or for
## all combinations of X and x, if length(X) = length(x),
## also the result can be assured to be a matrix via drop = FALSE
p <- c(0.05, 0.5, 0.95)
quantile(X, p, elementwise = FALSE)
quantile(X, p, elementwise = TRUE)
quantile(X, p, elementwise = TRUE, drop = FALSE)
## compare theoretical and empirical mean from 1,000 simulated observations
cbind(
"theoretical" = mean(X),
"empirical" = rowMeans(random(X, 1000))
)
## evaluate continuous ranked probability score (CRPS) using scoringRules
library("scoringRules")
crps(X, x)
Create a Censored Normal Distribution
Description
Class and methods for left-, right-, and interval-censored normal distributions using the workflow from the distributions3 package.
Usage
CensoredNormal(mu = 0, sigma = 1, left = -Inf, right = Inf)
Arguments
mu |
numeric. The location parameter of the underlying uncensored
normal distribution, typically written |
sigma |
numeric. The scale parameter (standard deviation) of
the underlying uncensored normal distribution,
typically written |
left |
numeric. The left censoring point. Can be any real number,
defaults to |
right |
numeric. The right censoring point. Can be any real number,
defaults to |
Details
The constructor function CensoredNormal
sets up a distribution
object, representing the censored normal probability distribution by the
corresponding parameters: the latent mean mu
= \mu
and
latent standard deviation sigma
= \sigma
(i.e., the parameters
of the underlying uncensored normal variable), the left
censoring
point (with -Inf
corresponding to uncensored), and the
right
censoring point (with Inf
corresponding to uncensored).
The censored normal distribution has probability density function (PDF) f(x)
:
\Phi((left - \mu)/\sigma) | if x \le left |
1 - \Phi((right - \mu)/\sigma) | if x \ge right |
\phi((x - \mu)/\sigma)/\sigma | otherwise |
where \Phi
and \phi
are the cumulative distribution function
and probability density function of the standard normal distribution
respectively.
All parameters can also be vectors, so that it is possible to define a vector of censored normal distributions with potentially different parameters. All parameters need to have the same length or must be scalars (i.e., of length 1) which are then recycled to the length of the other parameters.
For the CensoredNormal
distribution objects there is a wide range
of standard methods available to the generics provided in the distributions3
package: pdf
and log_pdf
for the (log-)density (PDF), cdf
for the probability
from the cumulative distribution function (CDF), quantile
for quantiles,
random
for simulating random variables,
crps
for the continuous ranked probability score
(CRPS), and support
for the support interval
(minimum and maximum). Internally, these methods rely on the usual d/p/q/r
functions provided for the censored normal distributions in the crch
package, see dcnorm
, and the crps_cnorm
function from the scoringRules package.
The methods is_discrete
and is_continuous
can be used to query whether the distributions are discrete on the entire support
(always FALSE
) or continuous on the entire support (only TRUE
if
there is no censoring, i.e., if both left
and right
are infinite).
See the examples below for an illustration of the workflow for the class and methods.
Value
A CensoredNormal
distribution object.
See Also
dcnorm
, Normal
, TruncatedNormal
,
CensoredLogistic
, CensoredStudentsT
Examples
## package and random seed
library("distributions3")
set.seed(6020)
## three censored normal distributions:
## - uncensored standard normal
## - left-censored at zero (Tobit) with latent mu = 1 and sigma = 1
## - interval-censored in [0, 5] with latent mu = 1 and sigma = 2
X <- CensoredNormal(
mu = c( 0, 1, 1),
sigma = c( 1, 1, 2),
left = c(-Inf, 0, 0),
right = c( Inf, Inf, 5)
)
X
## compute mean and variance of the censored distribution
mean(X)
variance(X)
## higher moments (skewness, kurtosis) are not implemented yet
## support interval (minimum and maximum)
support(X)
## simulate random variables
random(X, 5)
## histograms of 1,000 simulated observations
x <- random(X, 1000)
hist(x[1, ], main = "uncensored")
hist(x[2, ], main = "left-censored at zero")
hist(x[3, ], main = "interval-censored in [0, 5]")
## probability density function (PDF) and log-density (or log-likelihood)
x <- c(0, 0, 1)
pdf(X, x)
pdf(X, x, log = TRUE)
log_pdf(X, x)
## cumulative distribution function (CDF)
cdf(X, x)
## quantiles
quantile(X, 0.5)
## cdf() and quantile() are inverses (except at censoring points)
cdf(X, quantile(X, 0.5))
quantile(X, cdf(X, 1))
## all methods above can either be applied elementwise or for
## all combinations of X and x, if length(X) = length(x),
## also the result can be assured to be a matrix via drop = FALSE
p <- c(0.05, 0.5, 0.95)
quantile(X, p, elementwise = FALSE)
quantile(X, p, elementwise = TRUE)
quantile(X, p, elementwise = TRUE, drop = FALSE)
## compare theoretical and empirical mean from 1,000 simulated observations
cbind(
"theoretical" = mean(X),
"empirical" = rowMeans(random(X, 1000))
)
## evaluate continuous ranked probability score (CRPS) using scoringRules
library("scoringRules")
crps(X, x)
Create a Censored Student's T Distribution
Description
Class and methods for left-, right-, and interval-censored t distributions using the workflow from the distributions3 package.
Usage
CensoredStudentsT(df, location = 0, scale = 1, left = -Inf, right = Inf)
Arguments
df |
numeric. The degrees of freedom of the underlying uncensored
t distribution. Can be any positive number, with |
location |
numeric. The location parameter of the underlying uncensored
t distribution, typically written |
scale |
numeric. The scale parameter (standard deviation) of
the underlying uncensored t distribution,
typically written |
left |
numeric. The left censoring point. Can be any real number,
defaults to |
right |
numeric. The right censoring point. Can be any real number,
defaults to |
Details
The constructor function CensoredStudentsT
sets up a distribution
object, representing the censored t probability distribution by the
corresponding parameters: the degrees of freedom df
, the latent mean
location
= \mu
and latent scale parameter scale
= \sigma
(i.e., the parameters of the underlying uncensored t variable),
the left
censoring point (with -Inf
corresponding to uncensored),
and the right
censoring point (with Inf
corresponding to uncensored).
The censored t distribution has probability density function (PDF) f(x)
:
T((left - \mu)/\sigma) | if x \le left |
1 - T((right - \mu)/\sigma) | if x \ge right |
\tau((x - \mu)/\sigma)/\sigma | otherwise |
where T
and \tau
are the cumulative distribution function
and probability density function of the standard t distribution with
df
degrees of freedom, respectively.
All parameters can also be vectors, so that it is possible to define a vector of censored t distributions with potentially different parameters. All parameters need to have the same length or must be scalars (i.e., of length 1) which are then recycled to the length of the other parameters.
For the CensoredStudentsT
distribution objects there is a wide range
of standard methods available to the generics provided in the distributions3
package: pdf
and log_pdf
for the (log-)density (PDF), cdf
for the probability
from the cumulative distribution function (CDF), quantile
for quantiles,
random
for simulating random variables,
crps
for the continuous ranked probability score
(CRPS), and support
for the support interval
(minimum and maximum). Internally, these methods rely on the usual d/p/q/r
functions provided for the censored t distributions in the crch
package, see dct
, and the crps_ct
function from the scoringRules package.
The methods is_discrete
and is_continuous
can be used to query whether the distributions are discrete on the entire support
(always FALSE
) or continuous on the entire support (only TRUE
if
there is no censoring, i.e., if both left
and right
are infinite).
See the examples below for an illustration of the workflow for the class and methods.
Value
A CensoredStudentsT
distribution object.
See Also
dct
, StudentsT
, TruncatedStudentsT
,
CensoredNormal
, CensoredLogistic
Examples
## package and random seed
library("distributions3")
set.seed(6020)
## three censored t distributions:
## - uncensored standard t with 5 degrees of freedom
## - left-censored at zero with 5 df, latent location = 1 and scale = 1
## - interval-censored in [0, 5] with 5 df, latent location = 2 and scale = 2
X <- CensoredStudentsT(
df = c( 5, 5, 5),
location = c( 0, 1, 2),
scale = c( 1, 1, 2),
left = c(-Inf, 0, 0),
right = c( Inf, Inf, 5)
)
X
## compute mean of the censored distribution
mean(X)
## higher moments (variance, skewness, kurtosis) are not implemented yet
## support interval (minimum and maximum)
support(X)
## simulate random variables
random(X, 5)
## histograms of 1,000 simulated observations
x <- random(X, 1000)
hist(x[1, ], main = "uncensored")
hist(x[2, ], main = "left-censored at zero")
hist(x[3, ], main = "interval-censored in [0, 5]")
## probability density function (PDF) and log-density (or log-likelihood)
x <- c(0, 0, 1)
pdf(X, x)
pdf(X, x, log = TRUE)
log_pdf(X, x)
## cumulative distribution function (CDF)
cdf(X, x)
## quantiles
quantile(X, 0.5)
## cdf() and quantile() are inverses (except at censoring points)
cdf(X, quantile(X, 0.5))
quantile(X, cdf(X, 1))
## all methods above can either be applied elementwise or for
## all combinations of X and x, if length(X) = length(x),
## also the result can be assured to be a matrix via drop = FALSE
p <- c(0.05, 0.5, 0.95)
quantile(X, p, elementwise = FALSE)
quantile(X, p, elementwise = TRUE)
quantile(X, p, elementwise = TRUE, drop = FALSE)
## compare theoretical and empirical mean from 1,000 simulated observations
cbind(
"theoretical" = mean(X),
"empirical" = rowMeans(random(X, 1000))
)
## evaluate continuous ranked probability score (CRPS) using scoringRules
library("scoringRules")
crps(X, x)
The Censored Logistic Distribution
Description
Density, distribution function, quantile function, and random generation for the left and/or right censored logistic distribution.
Usage
dclogis(x, location = 0, scale = 1, left = -Inf, right = Inf, log = FALSE)
pclogis(q, location = 0, scale = 1, left = -Inf, right = Inf,
lower.tail = TRUE, log.p = FALSE)
qclogis(p, location = 0, scale = 1, left = -Inf, right = Inf,
lower.tail = TRUE, log.p = FALSE)
rclogis(n, location = 0, scale = 1, left = -Inf, right = Inf)
Arguments
x , q |
vector of quantiles. |
p |
vector of probabilities. |
n |
number of observations. If |
location |
location parameter. |
scale |
scale parameter. |
left |
left censoring point. |
right |
right censoring point. |
log , log.p |
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are P[X <= x] otherwise, P[X > x]. |
Details
If location
or scale
are not specified they assume the default values
of 0
and 1
, respectively. left
and right
have the defaults -Inf
and Inf
respectively.
The censored logistic distribution has density f(x)
:
\Lambda((left - \mu)/\sigma) | if x \le left |
1 - \Lambda((right - \mu)/\sigma) | if x \ge right |
\lambda((x - \mu)/\sigma)/\sigma | if left < x < right
|
where \Lambda
and \lambda
are the cumulative distribution function
and probability density function of the standard logistic distribution
respectively, \mu
is the location of the distribution, and \sigma
the scale.
Value
dclogis
gives the density, pclogis
gives the distribution
function, qclogis
gives the quantile function, and rclogis
generates random deviates.
See Also
The Censored Normal Distribution
Description
Density, distribution function, quantile function, and random generation for the left and/or right censored normal distribution.
Usage
dcnorm(x, mean = 0, sd = 1, left = -Inf, right = Inf, log = FALSE)
pcnorm(q, mean = 0, sd = 1, left = -Inf, right = Inf,
lower.tail = TRUE, log.p = FALSE)
qcnorm(p, mean = 0, sd = 1, left = -Inf, right = Inf,
lower.tail = TRUE, log.p = FALSE)
rcnorm(n, mean = 0, sd = 1, left = -Inf, right = Inf)
Arguments
x , q |
vector of quantiles. |
p |
vector of probabilities. |
n |
number of observations. If |
mean |
vector of means. |
sd |
vector of standard deviations. |
left |
left censoring point. |
right |
right censoring point. |
log , log.p |
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are P[X <= x] otherwise, P[X > x]. |
Details
If mean
or sd
are not specified they assume the default values
of 0
and 1
, respectively. left
and right
have the defaults -Inf
and Inf
respectively.
The censored normal distribution has density f(x)
:
\Phi((left - \mu)/\sigma) | if x \le left |
1 - \Phi((right - \mu)/\sigma) | if x \ge right |
\phi((x - \mu)/\sigma)/\sigma | if left < x < right
|
where \Phi
and \phi
are the cumulative distribution function
and probability density function of the standard normal distribution
respectively, \mu
is the mean of the distribution, and \sigma
the standard deviation.
Value
dcnorm
gives the density, pcnorm
gives the distribution
function, qcnorm
gives the quantile function, and rcnorm
generates random deviates.
See Also
Methods for Fitted crch Models
Description
Methods for extracting information from fitted crch
objects.
Usage
## S3 method for class 'crch'
coef(object, model = c("full", "location", "scale", "df"), ...)
## S3 method for class 'crch'
vcov(object, model = c("full", "location", "scale", "df"), ...)
## S3 method for class 'crch'
terms(x, model = c("location", "scale", "full"), ...)
## S3 method for class 'crch'
fitted(object, type = c("location", "scale"), ...)
Arguments
object , x |
an object of class |
model |
model for which coefficients shall be returned. |
type |
type of fitted values. |
... |
further arguments passed to or from other methods. |
Details
In addition to the methods above, a set of standard extractor functions for
"crch"
objects is available, including methods to the generic
functions print
, summary
,
logLik
, and residuals
.
See Also
Methods for Boosted crch Models
Description
Methods for extracting information from fitted crch.boost
objects.
Usage
## S3 method for class 'crch.boost'
coef(object, model = c("full", "location", "scale", "df"),
mstop = NULL, zero.coefficients = FALSE, standardize = FALSE, ...)
## S3 method for class 'crch.boost'
print(x, digits = max(3, getOption("digits") - 3),
mstop = NULL, zero.coefficients = FALSE, ...)
## S3 method for class 'crch.boost'
summary(object, mstop = NULL, zero.coefficients = FALSE, ...)
## S3 method for class 'crch.boost'
logLik(object, mstop = NULL, ...)
Arguments
object , x |
an object of class |
model |
model for which coefficients shall be returned. |
mstop |
stopping iteration for which coefficients shall be returned.
Can be either a character ( |
zero.coefficients |
logical whether zero coefficients are returned. |
standardize |
logical whether coefficients shall be standardized. |
digits |
the number of significant digits to use when printing. |
... |
further arguments passed to or from other methods. |
Details
In addition to the methods above, the "crch"
methods
terms
, model.frame
, model.matrix
,
residuals
, and fitted
can be used also for
"crch.boost"
objects .
See Also
Methods for Fitted hxlr Models
Description
Methods for extracting information from fitted hxlr
objects.
Usage
## S3 method for class 'hxlr'
coef(object, model = c("full", "intercept", "location", "scale"),
type = c("CLM", "latent"), ...)
## S3 method for class 'hxlr'
vcov(object, model = c("full", "intercept", "location", "scale"),
type = c("CLM", "latent"), ...)
## S3 method for class 'hxlr'
terms(x, model = c("full", "location", "scale"), ...)
Arguments
object , x |
an object of class |
model |
model for which coefficients shall be returned. |
type |
type of coefficients. Default are CLM type coefficients. For
type |
... |
further arguments passed to or from other methods. |
Details
In addition to the methods above, a set of standard extractor functions for
"hxlr"
objects is available, including methods to the generic
functions print
, summary
, and
logLik
.
See Also
Censored and Truncated Regression with Conditional Heteroscedasticy
Description
Fitting censored (tobit) or truncated regression models with conditional heteroscedasticy.
Usage
crch(formula, data, subset, na.action, weights, offset,
link.scale = c("log", "identity", "quadratic"),
dist = c("gaussian", "logistic", "student"), df = NULL,
left = -Inf, right = Inf, truncated = FALSE,
type = c("ml", "crps"), control = crch.control(...),
model = TRUE, x = FALSE, y = FALSE, ...)
trch(formula, data, subset, na.action, weights, offset,
link.scale = c("log", "identity", "quadratic"),
dist = c("gaussian", "logistic", "student"), df = NULL,
left = -Inf, right = Inf, truncated = TRUE,
type = c("ml", "crps"), control = crch.control(...),
model = TRUE, x = FALSE, y = FALSE, ...)
crch.fit(x, z, y, left, right, truncated = FALSE, dist = "gaussian",
df = NULL, link.scale = "log", type = "ml", weights = NULL, offset = NULL,
control = crch.control())
Arguments
formula |
a formula expression of the form |
data |
an optional data frame containing the variables occurring in the formulas. |
subset |
an optional vector specifying a subset of observations to be used for fitting. |
na.action |
a function which indicates what should happen when the data
contain |
weights |
optional case weights in fitting. |
offset |
optional numeric vector with a priori known component to
be included in the linear predictor for the location. For |
link.scale |
character specification of the link function in
the scale model. Currently, |
dist |
assumed distribution for the dependent variable |
df |
optional degrees of freedom for |
left |
left limit for the censored dependent variable |
right |
right limit for the censored dependent variable |
truncated |
logical. If |
type |
loss function to be optimized. Can be either |
control |
a list of control parameters passed to |
model |
logical. If |
x , y |
for |
z |
a design matrix with regressors for the scale. |
... |
arguments to be used to form the default |
Details
crch
fits censored (tobit) or truncated regression models with conditional
heteroscedasticy with maximum likelihood estimation. Student-t, Gaussian, and
logistic distributions can be fitted to left- and/or right censored or
truncated responses. Different regressors can be used to model the location
and the scale of this distribution. If control=crch.boost()
optimization is performed by boosting.
trch
is a wrapper function for crch
with default
truncated = TRUE
.
crch.fit
is the lower level function where the actual
fitting takes place.
Value
An object of class "crch"
or "crch.boost"
, i.e., a list with the
following elements.
coefficients |
list of coefficients for location, scale, and df. Scale and df coefficients are in log-scale. |
df |
if |
residuals |
the residuals, that is response minus fitted values. |
fitted.values |
list of fitted location and scale parameters. |
dist |
assumed distribution for the dependent variable |
cens |
list of censoring points. |
optim |
output from optimization from |
method |
optimization method used for |
type |
used loss function (maximum likelihood or minimum CRPS). |
control |
list of control parameters passed to |
start |
starting values of coefficients used in the optimization. |
weights |
case weights used for fitting. |
offset |
list of offsets for location and scale. |
n |
number of observations. |
nobs |
number of observations with non-zero weights. |
loglik |
log-likelihood. |
vcov |
covariance matrix. |
link |
a list with element |
truncated |
logical indicating wheter a truncated model has been fitted. |
converged |
logical variable whether optimization has converged or not. |
iterations |
number of iterations in optimization. |
call |
function call. |
formula |
the formula supplied. |
terms |
the |
levels |
list of levels of the factors used in fitting for location and scale respectively. |
contrasts |
(where relevant) the contrasts used. |
y |
if requested, the response used. |
x |
if requested, the model matrix used. |
model |
if requested, the model frame used. |
stepsize , mstop , mstopopt , standardize |
return values of boosting
optimization. See |
References
Messner JW, Mayr GJ, Zeileis A (2016). Heteroscedastic Censored and Truncated Regression with crch. The R Journal, 3(1), 173–181. doi:10.32614/RJ-2016-012
Messner JW, Zeileis A, Broecker J, Mayr GJ (2014). Probabilistic Wind Power Forecasts with an Inverse Power Curve Transformation and Censored Regression. Wind Energy, 17(11), 1753–1766. doi:10.1002/we.1666
See Also
predict.crch
, crch.control
, crch.boost
Examples
data("RainIbk", package = "crch")
## mean and standard deviation of square root transformed ensemble forecasts
RainIbk$sqrtensmean <-
apply(sqrt(RainIbk[,grep('^rainfc',names(RainIbk))]), 1, mean)
RainIbk$sqrtenssd <-
apply(sqrt(RainIbk[,grep('^rainfc',names(RainIbk))]), 1, sd)
## fit linear regression model with Gaussian distribution
CRCH <- crch(sqrt(rain) ~ sqrtensmean, data = RainIbk, dist = "gaussian")
## same as lm?
all.equal(
coef(lm(sqrt(rain) ~ sqrtensmean, data = RainIbk)),
head(coef(CRCH), -1),
tol = 1e-6)
## print
CRCH
## summary
summary(CRCH)
## left censored regression model with censoring point 0:
CRCH2 <- crch(sqrt(rain) ~ sqrtensmean, data = RainIbk,
dist = "gaussian", left = 0)
## left censored regression model with censoring point 0 and
## conditional heteroscedasticy:
CRCH3 <- crch(sqrt(rain) ~ sqrtensmean|sqrtenssd, data = RainIbk,
dist = "gaussian", left = 0)
## left censored regression model with censoring point 0 and
## conditional heteroscedasticy with logistic distribution:
CRCH4 <- crch(sqrt(rain) ~ sqrtensmean|sqrtenssd, data = RainIbk,
dist = "logistic", left = 0)
## compare AIC
AIC(CRCH, CRCH2, CRCH3, CRCH4)
Auxiliary Functions for Boosting crch Models
Description
Auxiliary functions to fit crch
models via boosting
Usage
crch.boost(maxit = 100, nu = 0.1, start = NULL, dot = "separate",
mstop = c("max", "aic", "bic", "cv"), nfolds = 10, foldid = NULL,
maxvar = NULL)
crch.boost.fit(x, z, y, left, right, truncated = FALSE, dist = "gaussian",
df = NULL, link.scale = "log", type = "ml", weights = NULL, offset = NULL,
control = crch.boost())
Arguments
maxit |
the maximum number of boosting iterations. |
nu |
boosting step size. Default is 0.1. |
start |
a previously boosted but not converged |
dot |
character specifying how to process formula parts with a dot
( |
mstop |
method to find optimum stopping iteration. Default is |
nfolds |
if |
foldid |
if |
maxvar |
Positive |
x , z , y , left , right , truncated , dist , df , link.scale , type , weights , offset , control |
see |
Details
crch.boost
extends crch
to fit censored (tobit) or
truncated regression models with conditional heteroscedasticy by
boosting. If crch.boost()
is supplied as control
in
crch
then crch.boost.fit
is used as lower level fitting
function. Note that crch.control()
with method=boosting
is equivalent to crch.boost()
. Thus, boosting can more
conveniently be called with crch(..., method = "boosting")
.
Value
For crch.boost
: A list with components named as the arguments.
For crch.boost.fit
: An object of class "crch.boost"
,
i.e., a list with the following elements.
coefficients |
list of coefficients for location and scale. Scale
coefficients are in log-scale. Coefficients are of optimum stopping
stopping iteration specified by |
df |
if |
residuals |
the residuals, that is response minus fitted values. |
fitted.values |
list of fitted location and scale parameters at
optimum stopping iteration specified by |
dist |
assumed distribution for the dependent variable |
cens |
list of censoring points. |
control |
list of control parameters. |
weights |
case weights used for fitting. |
offset |
list of offsets for location and scale. |
n |
number of observations. |
nobs |
number of observations with non-zero weights. |
loglik |
log-likelihood. |
link |
a list with element |
truncated |
logical indicating wheter a truncated model has been fitted. |
iterations |
number of boosting iterations. |
stepsize |
boosting stepsize |
mstop |
criterion used to find optimum stopping iteration. |
mstopopt |
optimum stopping iterations for different criteria. |
standardize |
list of center and scale values used to standardize response and regressors. |
References
Messner JW, Mayr GJ, Zeileis A (2017). Non-Homogeneous Boosting for Predictor Selection in Ensemble Post-Processing. Monthly Weather Review, 145(1), 137–147. doi:10.1175/MWR-D-16-0088.1
See Also
Examples
# generate data
suppressWarnings(RNGversion("3.5.0"))
set.seed(5)
x <- matrix(rnorm(1000*20),1000,20)
y <- rnorm(1000, 1 + x[,1] - 1.5 * x[,2], exp(-1 + 0.3*x[,3]))
y <- pmax(0, y)
data <- data.frame(cbind(y, x))
# fit model with maximum likelihood
CRCH <- crch(y ~ .|., data = data, dist = "gaussian", left = 0)
# fit model with boosting
boost <- crch(y ~ .|., data = data, dist = "gaussian", left = 0,
control = crch.boost(mstop = "aic"))
# more conveniently, the same model can also be fit through
# boost <- crch(y ~ .|., data = data, dist = "gaussian", left = 0,
# method = "boosting", mstop = "aic")
# AIC comparison
AIC(CRCH, boost)
# summary
summary(boost)
# plot
plot(boost)
Control Options for crch Models
Description
Auxiliary function for crch
fitting. Specifies a list of values passed
to optim
.
Usage
crch.control(method = "BFGS", maxit = NULL, hessian = NULL,
trace = FALSE, start = NULL, dot = "separate",
lower = -Inf, upper = Inf, ...)
Arguments
method |
optimization method passed to |
maxit |
the maximum number of iterations. Default is 5000 except for
|
hessian |
logical or NULL. If TRUE the numerical Hessian matrix from the
|
trace |
non-negative integer. If positive, tracing information on the progress of the optimization is produced. |
start |
initial values for the parameters to be optimized over. |
dot |
character specifying how to process formula parts with a dot
( |
lower , upper |
bounds on the variables for the |
... |
additional parameters passed to |
Value
A list with components named as the arguments.
See Also
Auxiliary Functions for Stability Selection Using Boosting
Description
Auxilirary function which allows to do stability selection on heteroscedastic
crch
models based on crch.boost
.
Usage
crch.stabsel(formula, data, ..., nu = 0.1, q, B = 100, thr = 0.9,
maxit = 2000, data_percentage = 0.5)
Arguments
formula |
a formula expression of the form |
data |
an optional data frame containing the variables occurring in the formulas. |
... |
Additional attributes to control the |
nu |
Boosting step size (see |
q |
Positive |
B |
|
thr |
|
maxit |
Positive |
data_percentage |
Percentage of data which should be sampled in each of the
iterations. Default (and suggested) is |
Details
crch.boost
allows to perform gradient boosting on heteroscedastic
additive models. crch.stabsel
is a wrapper around the core crch.boost
algorithm to perform stability selection (see references).
Half of the data set (data
) is sampled B
times to perform boosting
(based on crch.boost
). Rather than perform the boosting iterations
until a certain stopping criterion is reached (e.g., maximum number of iterations
maxit
) the algorithm stops as soon as q
parameters have been selected.
The number of parameters is computed across both parameters location and scale.
Intercepts are not counted.
Value
Returns an object of class "stabsel.crch"
containing the stability
selection summary and the new formula based on the stability selection.
table |
A table object containing the parameters which have been selected and the corresponding frequency of selection. |
formula.org |
Original formula used to perform the stability selection. |
formula.new |
New formula based including the coefficients selected during stability selection. |
family |
A list object which contains the distribution-specification from
the |
parameter |
List with the parameters used to perform the stability selection
including |
References
Meinhausen N, Buehlmann P (2010). Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417–473. doi:10.1111/j.1467-9868.2010.00740.x.
See Also
Examples
# generate data
suppressWarnings(RNGversion("3.5.0"))
set.seed(5)
x <- matrix(rnorm(1000*20),1000,20)
y <- rnorm(1000, 1 + x[,1] - 1.5 * x[,2], exp(-1 + 0.3*x[,3]))
y <- pmax(0, y)
data <- data.frame(cbind(y, x))
# fit model with maximum likelihood
CRCH1 <- crch(y ~ .|., data = data, dist = "gaussian", left = 0)
# Perform stability selection
stabsel <- crch.stabsel(y ~ .|., data = data, dist = "gaussian", left = 0,
q = 8, B = 5)
# Show stability selection summary
print(stabsel); plot(stabsel)
CRCH2 <- crch(stabsel$formula.new, data = data, dist = "gaussian", left = 0 )
BOOST <- crch(stabsel$formula.new, data = data, dist = "gaussian", left = 0,
control = crch.boost() )
### AIC comparison
sapply( list(CRCH1,CRCH2,BOOST), logLik )
The Censored Student-t Distribution
Description
Density, distribution function, quantile function, and random generation
for the left and/or right censored student-t distribution with df
degrees of freedom.
Usage
dct(x, location = 0, scale = 1, df, left = -Inf, right = Inf, log = FALSE)
pct(q, location = 0, scale = 1, df, left = -Inf, right = Inf,
lower.tail = TRUE, log.p = FALSE)
qct(p, location = 0, scale = 1, df, left = -Inf, right = Inf,
lower.tail = TRUE, log.p = FALSE)
rct(n, location = 0, scale = 1, df, left = -Inf, right = Inf)
Arguments
x , q |
vector of quantiles. |
p |
vector of probabilities. |
n |
number of observations. If |
location |
location parameter. |
scale |
scale parameter. |
df |
degrees of freedom (> 0, maybe non-integer). |
left |
left censoring point. |
right |
right censoring point. |
log , log.p |
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are P[X <= x] otherwise, P[X > x]. |
Details
If location
or scale
are not specified they assume the default values
of 0
and 1
, respectively. left
and right
have the defaults -Inf
and Inf
respectively.
The censored student-t distribution has density f(x)
:
T((left - \mu)/\sigma) | if x \le left |
1 - T((right - \mu)/\sigma) | if x \ge right |
\tau((x - \mu)/\sigma)/\sigma | if left < x < right
|
where T
and \tau
are the cumulative distribution function
and probability density function of the student-t distribution with
df
degrees of freedom respectively, \mu
is the location of the
distribution, and \sigma
the scale.
Value
dct
gives the density, pct
gives the distribution
function, qct
gives the quantile function, and rct
generates random deviates.
See Also
Heteroscedastic Extended Logistic Regression
Description
This is a wrapper function for clm
(from package
ordinal) to fit (heteroscedastic) extended logistic regression (HXLR)
models (Messner et al. 2013).
Usage
hxlr(formula, data, subset, na.action, weights, thresholds, link, control, ...)
Arguments
formula |
a formula expression of the form |
data |
an optional data frame containing the variables occurring in the formulas. |
subset |
an optional vector specifying a subset of observations to be used for fitting. |
na.action |
a function which indicates what should happen when the data
contain |
weights |
optional case weights in fitting. |
thresholds |
vector of (transformed) thresholds that are used to cut the continuous response into categories. Data frames or matrices with multiple columns are allowed as well. Then each column is used as separate predictor variable for the intercept model. |
link |
link function, i.e., the type of location-scale distribution
assumed for the latent distribution. Default is |
control |
a list of control parameters passed to |
... |
arguments to be used to form the default |
Details
Extended logistic regression (Wilks 2009) extends binary logistic regression to multi-category responses by including the thresholds, that are used to cut a continuous variable into categories, in the regression equation. Heteroscedastic extended logistic regression (Messner et al. 2013) extends this model further and allows to add additional predictor variables that are used to predict the scale of the latent logistic distribution.
Value
An object of class "hxlr"
, i.e., a list with the following elements.
coefficients |
list of CLM coefficients for intercept, location, and scale model. |
fitted.values |
list of fitted location and scale parameters. |
optim |
output from optimization from |
method |
Optimization method used for |
control |
list of control parameters passed to |
start |
starting values of coefficients used in the optimization. |
weights |
case weights used for fitting. |
n |
number of observations. |
nobs |
number of observations with non-zero weights. |
loglik |
log-likelihood. |
vcov |
covariance matrix. |
converged |
logical variable whether optimization has converged or not. |
iterations |
number of iterations in optimization. |
call |
function call. |
scale |
the formula supplied. |
terms |
the |
levels |
list of levels of the factors used in fitting for location and scale respectively. |
thresholds |
the thresholds supplied. |
References
Messner JW, Mayr GJ, Zeileis A, Wilks DS (2014). Extending Extended Logistic Regression to Effectively Utilize the Ensemble Spread. Monthly Weather Review, 142, 448–456. doi:10.1175/MWR-D-13-00271.1.
Wilks DS (2009). Extending Logistic Regression to Provide Full-Probability-Distribution MOS Forecasts. Meteorological Applications, 368, 361–368.
See Also
Examples
data("RainIbk", package = "crch")
## mean and standard deviation of square root transformed ensemble forecasts
RainIbk$sqrtensmean <-
apply(sqrt(RainIbk[,grep('^rainfc',names(RainIbk))]), 1, mean)
RainIbk$sqrtenssd <-
apply(sqrt(RainIbk[,grep('^rainfc',names(RainIbk))]), 1, sd)
## climatological deciles
q <- unique(quantile(RainIbk$rain, seq(0.1, 0.9, 0.1)))
## fit ordinary extended logistic regression with ensemble mean as
## predictor variable
XLR <- hxlr(sqrt(rain) ~ sqrtensmean, data = RainIbk, thresholds = sqrt(q))
## print
XLR
## summary
summary(XLR)
## fit ordinary extended logistic regression with ensemble mean
## and standard deviation as predictor variables
XLRS <- hxlr(sqrt(rain) ~ sqrtensmean + sqrtenssd, data = RainIbk,
thresholds = sqrt(q))
## fit heteroscedastic extended logistic regression with ensemble
## standard deviation as predictor for the scale
HXLR <- hxlr(sqrt(rain) ~ sqrtensmean | sqrtenssd, data = RainIbk,
thresholds = sqrt(q))
## compare AIC of different models
AIC(XLR, XLRS, HXLR)
## XLRS and HXLR are nested in XLR -> likelihood-ratio-tests
if(require("lmtest")) {
lrtest(XLR, XLRS)
lrtest(XLR, HXLR)
}
## Not run:
###################################################################
## Cross-validation and bootstrapping RPS for different models
## (like in Messner 2013).
N <- NROW(RainIbk)
## function that returns model fits
fits <- function(data, weights = rep(1, N)) {
list(
"XLR" = hxlr(sqrt(rain) ~ sqrtensmean, data = data,
weights = weights, thresholds = sqrt(q)),
"XLR:S" = hxlr(sqrt(rain) ~ sqrtensmean + sqrtenssd, data = data,
weights = weights, thresholds = sqrt(q)),
"XLR:SM" = hxlr(sqrt(rain) ~ sqrtensmean + I(sqrtensmean*sqrtenssd),
data = data, weights = weights, thresholds = sqrt(q)),
"HXLR" = hxlr(sqrt(rain) ~ sqrtensmean | sqrtenssd, data = data,
weights = weights, thresholds = sqrt(q)),
"HXLR:S" = hxlr(sqrt(rain) ~ sqrtensmean + sqrtenssd | sqrtenssd,
data = data, weights = weights, thresholds = sqrt(q))
)
}
## cross validation
id <- sample(1:10, N, replace = TRUE)
obs <- NULL
pred <- list(NULL)
for(i in 1:10) {
## splitting into test and training data set
trainIndex <- which(id != i)
testIndex <- which(id == i)
## weights that are used for fitting the models
weights <- as.numeric(table(factor(trainIndex, levels = c(1:N))))
## testdata
testdata <- RainIbk[testIndex,]
## observations
obs <- c(obs, RainIbk$rain[testIndex])
## estimation
modelfits <- fits(RainIbk, weights)
## Prediction
pred2 <- lapply(modelfits, predict, newdata = testdata, type = "cumprob")
pred <- mapply(rbind, pred, pred2, SIMPLIFY = FALSE)
}
names(pred) <- c(names(modelfits))
## function to compute RPS
rps <- function(pred, obs) {
OBS <- NULL
for(i in 1:N)
OBS <- rbind(OBS, rep(0:1, c(obs[i] - 1, length(q) - obs[i] + 1)))
apply((OBS-pred)^2, 1, sum)
}
## compute rps
RPS <- lapply(pred, rps, obs = as.numeric(cut(obs, c(-Inf, q, Inf))))
## bootstrapping mean rps
rpsall <- NULL
for(i in 1:250) {
index <- sample(length(obs), replace = TRUE)
rpsall <- rbind(rpsall, sapply(RPS, function(x) mean(x[index])))
}
rpssall <- 1 - rpsall/rpsall[,1]
boxplot(rpssall[,-1], ylab = "RPSS", main = "RPSS relative to XLR")
abline(h = 0, lty = 2)
## End(Not run)
Control Options for hxlr Models
Description
Auxiliary function for hxlr
fitting. Specifies a list of values passed
to optim
.
Usage
hxlr.control(method = "BFGS", maxit = 5000, hessian = TRUE,
trace = FALSE, start = NULL, ...)
Arguments
method |
optimization method used in |
maxit |
the maximum number of iterations. |
hessian |
logical. Should a numerically differentiated Hessian matrix be returned? |
trace |
non-negative integer. If positive, tracing information on the progress of the optimization is produced. |
start |
initial values for the parameters to be optimized over. |
... |
Additional parameters passed to |
Value
A list with components named as the arguments.
See Also
Visualizing Coefficient Paths for Boosted crch Models
Description
Plot paths of coefficients or log-likelihood contributions for crch.boost
models.
Usage
## S3 method for class 'crch.boost'
plot(x, loglik = FALSE,
standardize = TRUE, which = c("both", "location", "scale"),
mstop = NULL, coef.label = TRUE, col = NULL, ...)
Arguments
x |
an object of class |
loglik |
logical whether log-likelihood contribution shall be plotted instead of coefficient value. |
standardize |
logical whether coefficients shall be standardized.
Not used if |
which |
which coefficients: |
mstop |
Stopping iteration for which a vertical line is plotted.
Possible choices are |
coef.label |
logical whether paths shall be labeled. |
col |
Color(s) for the paths. If |
... |
further arguments passed to |
See Also
Predictions for Fitted crch Models
Description
Obtains various types of predictions for crch
models.
Usage
## S3 method for class 'crch'
predict(object, newdata = NULL, type = c("location", "scale",
"response", "parameter", "density", "probability", "quantile", "crps"),
na.action = na.pass, at = 0.5, left = NULL, right = NULL, ...)
## S3 method for class 'crch'
prodist(object, newdata = NULL, na.action = na.pass,
left = NULL, right = NULL, ...)
Arguments
object |
an object of class |
newdata |
an optional data frame in which to look for variables with which to predict. |
type |
type of prediction: |
na.action |
a function which indicates what should happen when the data
contain |
at |
a vector of values to evaluate the predictive density ( |
left |
left censoring or truncation point. Only used for |
right |
right censoring or truncation point. Only used for |
... |
further arguments passed to or from other methods. |
Value
The predict
method, for type "response"
, "location"
, or "scale"
,
returns a vector with either the location or the scale of the predicted distribution.
For type "quantile"
a matrix of predicted quantiles each column
corresponding to an element of at
.
The prodist
method returns the fitted/predict probability distribution object.
See Also
Predictions for Boosted crch Models
Description
Obtains various types of predictions for crch.boost
models.
Usage
## S3 method for class 'crch.boost'
predict(object, newdata = NULL, mstop = NULL, ...)
Arguments
object |
an object of class |
newdata |
an optional data frame in which to look for variables with which to predict. |
mstop |
stopping iteration. Can be either a character ( |
... |
further arguments passed to or from other methods. |
Value
For type "response"
, "location"
, or "scale"
a vector with
either the location or the scale of the predicted distribution.
For type "quantile"
a matrix of predicted quantiles each column
corresponding to an element of at
.
See Also
Predictions for Fitted hxlr Models
Description
Obtains various types of predictions/fitted values for heteroscedastic extended logistic regression (HXLR) models.
Usage
## S3 method for class 'hxlr'
predict(object, newdata = NULL, type = c("class", "probability",
"cumprob", "location", "scale"), thresholds = object$thresholds,
na.action = na.pass, ...)
## S3 method for class 'hxlr'
fitted(object, type = c("class", "probability",
"cumprob", "location", "scale"), ...)
Arguments
object |
an object of class |
newdata |
an optional data frame in which to look for variables with which to predict. |
type |
type of prediction: |
thresholds |
optional thresholds used for defining the thresholds for
types |
na.action |
A function which indicates what should happen when the data
contain |
... |
further arguments passed to or from other methods. |
Value
For type "prob"
a matrix with number of intervals (= number of
thresholds + 1) columns is produced. Each row corresponds to a row in newdata
and contains the predicted probabilities to fall in the corresponding interval.
For type "cumprob"
a matrix with number of thresholds columns is
produced. Each row corresponds to a row in newdata
and contains the
predicted probabilities to fall below the corresponding threshold.
For types "class"
, "location"
, and "scale"
a vector is
returned respectively with either the most probable categories ("class"
)
or the location ("location"
) or scale (scale
) of the latent
distribution.
See Also
Precipitation Observations and Forecasts for Innsbruck
Description
Accumulated 5-8 days precipitation amount for Innsbruck. Data includes GEFS reforecasts (Hamill et al. 2013) and observations from SYNOP station Innsbruck Airport (11120) from 2000-01-01 to 2013-09-17.
Usage
data("RainIbk", package = "crch")
Format
A data frame with 4977 rows. The first column (rain
) are 3 days
accumulated precipitation amount observations, Columns 2-12 (rainfc
)
are 5-8 days accumulated precipitation amount forecasts from the individual
ensemble members.
Source
Observations: https://www.ogimet.com/synops.phtml.en
Reforecasts: https://psl.noaa.gov/forecasts/reforecast2/
References
Hamill TM, Bates GT, Whitaker JS, Murray DR, Fiorino M, Galarneau Jr TJ, Zhu Y, Lapenta W (2013). NOAA's Second-Generation Global Medium-Range Ensemble Reforecast Data Set. Bulletin of the American Meteorological Society, 94(10), 1553-1565.
Examples
## Spread skill relationship ##
## load and prepare data
data("RainIbk", package = "crch")
## mean and standard deviation of square root transformed ensemble forecasts
RainIbk$sqrtensmean <-
apply(sqrt(RainIbk[,grep('^rainfc',names(RainIbk))]), 1, mean)
RainIbk$sqrtenssd <-
apply(sqrt(RainIbk[,grep('^rainfc',names(RainIbk))]), 1, sd)
## quintiles of sqrtenssd
sdcat <- cut(RainIbk$sqrtenssd, c(-Inf, quantile(RainIbk$sqrtenssd,
seq(0.2,0.8,0.2)), Inf), labels = c(1:5))
## mean forecast errors for each quintile
m <- NULL
for(i in levels(sdcat)) {
m <- c(m, mean((sqrt(RainIbk$rain)[sdcat == i] -
RainIbk$sqrtensmean[sdcat == i])^2, na.rm = TRUE))
}
## plot
boxplot((sqrt(rain) - sqrtensmean)^2~sdcat, RainIbk,
xlab = "Quintile of ensemble standard deviation",
ylab = "mean squared error", main = "Spread skill relationship")
The Truncated Logistic Distribution
Description
Density, distribution function, quantile function, and random generation for the left and/or right truncated logistic distribution.
Usage
dtlogis(x, location = 0, scale = 1, left = -Inf, right = Inf, log = FALSE)
ptlogis(q, location = 0, scale = 1, left = -Inf, right = Inf,
lower.tail = TRUE, log.p = FALSE)
qtlogis(p, location = 0, scale = 1, left = -Inf, right = Inf,
lower.tail = TRUE, log.p = FALSE)
rtlogis(n, location = 0, scale = 1, left = -Inf, right = Inf)
Arguments
x , q |
vector of quantiles. |
p |
vector of probabilities. |
n |
number of observations. If |
location |
location parameter. |
scale |
scale parameter. |
left |
left truncation point. |
right |
right truncation point. |
log , log.p |
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are P[X <= x] otherwise, P[X > x]. |
Details
If location
or scale
are not specified they assume the default values
of 0
and 1
, respectively. left
and right
have the defaults -Inf
and Inf
respectively.
The truncated logistic distribution has density
f(x) = 1/\sigma \lambda((x - \mu)/\sigma) / (\Lambda((right - \mu)/\sigma) - \Lambda((left - \mu)/\sigma))
for left \le x \le right
, and 0 otherwise.
\Lambda
and \lambda
are the cumulative distribution function
and probability density function of the standard logistic distribution
respectively, \mu
is the location of the distribution, and \sigma
the scale.
Value
dtlogis
gives the density, ptlogis
gives the distribution
function, qtlogis
gives the quantile function, and rtlogis
generates random deviates.
See Also
The Truncated Normal Distribution
Description
Density, distribution function, quantile function, and random generation for the left and/or right truncated normal distribution.
Usage
dtnorm(x, mean = 0, sd = 1, left = -Inf, right = Inf, log = FALSE)
ptnorm(q, mean = 0, sd = 1, left = -Inf, right = Inf,
lower.tail = TRUE, log.p = FALSE)
qtnorm(p, mean = 0, sd = 1, left = -Inf, right = Inf,
lower.tail = TRUE, log.p = FALSE)
rtnorm(n, mean = 0, sd = 1, left = -Inf, right = Inf)
Arguments
x , q |
vector of quantiles. |
p |
vector of probabilities. |
n |
number of observations. If |
mean |
vector of means. |
sd |
vector of standard deviations. |
left |
left censoring point. |
right |
right censoring point. |
log , log.p |
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are P[X <= x] otherwise, P[X > x]. |
Details
If mean
or sd
are not specified they assume the default values
of 0
and 1
, respectively. left
and right
have the defaults -Inf
and Inf
respectively.
The truncated normal distribution has density
f(x) = 1/\sigma \phi((x - \mu)/\sigma) /
(\Phi((right - \mu)/\sigma) - \Phi((left - \mu)/\sigma))
for left \le x \le right
, and 0 otherwise.
\Phi
and \phi
are the cumulative distribution function
and probability density function of the standard normal distribution
respectively, \mu
is the mean of the distribution, and \sigma
the standard deviation.
Value
dtnorm
gives the density, ptnorm
gives the distribution
function, qtnorm
gives the quantile function, and rtnorm
generates random deviates.
See Also
Create a Truncated Logistic Distribution
Description
Class and methods for left-, right-, and interval-truncated logistic distributions using the workflow from the distributions3 package.
Usage
TruncatedLogistic(location = 0, scale = 1, left = -Inf, right = Inf)
Arguments
location |
numeric. The location parameter of the underlying untruncated
logistic distribution, typically written |
scale |
numeric. The scale parameter (standard deviation) of
the underlying untruncated logistic distribution,
typically written |
left |
numeric. The left truncation point. Can be any real number,
defaults to |
right |
numeric. The right truncation point. Can be any real number,
defaults to |
Details
The constructor function TruncatedLogistic
sets up a distribution
object, representing the truncated logistic probability distribution by the
corresponding parameters: the latent mean location
= \mu
and
latent standard deviation scale
= \sigma
(i.e., the parameters
of the underlying untruncated logistic variable), the left
truncation
point (with -Inf
corresponding to untruncated), and the
right
truncation point (with Inf
corresponding to untruncated).
The truncated logistic distribution has probability density function (PDF):
f(x) = 1/\sigma \lambda((x - \mu)/\sigma) / (\Lambda((right - \mu)/\sigma) - \Lambda((left - \mu)/\sigma))
for left \le x \le right
, and 0 otherwise,
where \Lambda
and \lambda
are the cumulative distribution function
and probability density function of the standard logistic distribution,
respectively.
All parameters can also be vectors, so that it is possible to define a vector of truncated logistic distributions with potentially different parameters. All parameters need to have the same length or must be scalars (i.e., of length 1) which are then recycled to the length of the other parameters.
For the TruncatedLogistic
distribution objects there is a wide range
of standard methods available to the generics provided in the distributions3
package: pdf
and log_pdf
for the (log-)density (PDF), cdf
for the probability
from the cumulative distribution function (CDF), quantile
for quantiles,
random
for simulating random variables,
crps
for the continuous ranked probability score
(CRPS), and support
for the support interval
(minimum and maximum). Internally, these methods rely on the usual d/p/q/r
functions provided for the truncated logistic distributions in the crch
package, see dtlogis
, and the crps_tlogis
function from the scoringRules package.
The methods is_discrete
and is_continuous
can be used to query whether the distributions are discrete on the entire support
(always FALSE
) or continuous on the entire support (always TRUE
).
See the examples below for an illustration of the workflow for the class and methods.
Value
A TruncatedLogistic
distribution object.
See Also
dtlogis
, Logistic
, CensoredLogistic
,
TruncatedNormal
, TruncatedStudentsT
Examples
## package and random seed
library("distributions3")
set.seed(6020)
## three truncated logistic distributions:
## - untruncated standard logistic
## - left-truncated at zero with latent location = 1 and scale = 1
## - interval-truncated in [0, 5] with latent location = 2 and scale = 1
X <- TruncatedLogistic(
location = c( 0, 1, 2),
scale = c( 1, 1, 1),
left = c(-Inf, 0, 0),
right = c( Inf, Inf, 5)
)
X
## compute mean of the truncated distribution
mean(X)
## higher moments (variance, skewness, kurtosis) are not implemented yet
## support interval (minimum and maximum)
support(X)
## simulate random variables
random(X, 5)
## histograms of 1,000 simulated observations
x <- random(X, 1000)
hist(x[1, ], main = "untruncated")
hist(x[2, ], main = "left-truncated at zero")
hist(x[3, ], main = "interval-truncated in [0, 5]")
## probability density function (PDF) and log-density (or log-likelihood)
x <- c(0, 0, 1)
pdf(X, x)
pdf(X, x, log = TRUE)
log_pdf(X, x)
## cumulative distribution function (CDF)
cdf(X, x)
## quantiles
quantile(X, 0.5)
## cdf() and quantile() are inverses (except at truncation points)
cdf(X, quantile(X, 0.5))
quantile(X, cdf(X, 1))
## all methods above can either be applied elementwise or for
## all combinations of X and x, if length(X) = length(x),
## also the result can be assured to be a matrix via drop = FALSE
p <- c(0.05, 0.5, 0.95)
quantile(X, p, elementwise = FALSE)
quantile(X, p, elementwise = TRUE)
quantile(X, p, elementwise = TRUE, drop = FALSE)
## compare theoretical and empirical mean from 1,000 simulated observations
cbind(
"theoretical" = mean(X),
"empirical" = rowMeans(random(X, 1000))
)
## evaluate continuous ranked probability score (CRPS) using scoringRules
library("scoringRules")
crps(X, x)
Create a Truncated Normal Distribution
Description
Class and methods for left-, right-, and interval-truncated normal distributions using the workflow from the distributions3 package.
Usage
TruncatedNormal(mu = 0, sigma = 1, left = -Inf, right = Inf)
Arguments
mu |
numeric. The location parameter of the underlying untruncated
normal distribution, typically written |
sigma |
numeric. The scale parameter (standard deviation) of
the underlying untruncated normal distribution,
typically written |
left |
numeric. The left truncation point. Can be any real number,
defaults to |
right |
numeric. The right truncation point. Can be any real number,
defaults to |
Details
The constructor function TruncatedNormal
sets up a distribution
object, representing the truncated normal probability distribution by the
corresponding parameters: the latent mean mu
= \mu
and
latent standard deviation sigma
= \sigma
(i.e., the parameters
of the underlying untruncated normal variable), the left
truncation
point (with -Inf
corresponding to untruncated), and the
right
truncation point (with Inf
corresponding to untruncated).
The truncated normal distribution has probability density function (PDF):
f(x) = 1/\sigma \phi((x - \mu)/\sigma) / (\Phi((right - \mu)/\sigma) - \Phi((left - \mu)/\sigma))
for left \le x \le right
, and 0 otherwise,
where \Phi
and \phi
are the cumulative distribution function
and probability density function of the standard normal distribution
respectively.
All parameters can also be vectors, so that it is possible to define a vector of truncated normal distributions with potentially different parameters. All parameters need to have the same length or must be scalars (i.e., of length 1) which are then recycled to the length of the other parameters.
For the TruncatedNormal
distribution objects there is a wide range
of standard methods available to the generics provided in the distributions3
package: pdf
and log_pdf
for the (log-)density (PDF), cdf
for the probability
from the cumulative distribution function (CDF), quantile
for quantiles,
random
for simulating random variables,
crps
for the continuous ranked probability score
(CRPS), and support
for the support interval
(minimum and maximum). Internally, these methods rely on the usual d/p/q/r
functions provided for the truncated normal distributions in the crch
package, see dtnorm
, and the crps_tnorm
function from the scoringRules package.
The methods is_discrete
and is_continuous
can be used to query whether the distributions are discrete on the entire support
(always FALSE
) or continuous on the entire support (always TRUE
).
See the examples below for an illustration of the workflow for the class and methods.
Value
A TruncatedNormal
distribution object.
See Also
dtnorm
, Normal
, CensoredNormal
,
TruncatedLogistic
, TruncatedStudentsT
Examples
## package and random seed
library("distributions3")
set.seed(6020)
## three truncated normal distributions:
## - untruncated standard normal
## - left-truncated at zero with latent mu = 1 and sigma = 1
## - interval-truncated in [0, 5] with latent mu = 1 and sigma = 2
X <- TruncatedNormal(
mu = c( 0, 1, 1),
sigma = c( 1, 1, 2),
left = c(-Inf, 0, 0),
right = c( Inf, Inf, 5)
)
X
## compute mean and variance of the truncated distribution
mean(X)
variance(X)
## higher moments (skewness, kurtosis) are not implemented yet
## support interval (minimum and maximum)
support(X)
## simulate random variables
random(X, 5)
## histograms of 1,000 simulated observations
x <- random(X, 1000)
hist(x[1, ], main = "untruncated")
hist(x[2, ], main = "left-truncated at zero")
hist(x[3, ], main = "interval-truncated in [0, 5]")
## probability density function (PDF) and log-density (or log-likelihood)
x <- c(0, 0, 1)
pdf(X, x)
pdf(X, x, log = TRUE)
log_pdf(X, x)
## cumulative distribution function (CDF)
cdf(X, x)
## quantiles
quantile(X, 0.5)
## cdf() and quantile() are inverses
cdf(X, quantile(X, 0.5))
quantile(X, cdf(X, 1))
## all methods above can either be applied elementwise or for
## all combinations of X and x, if length(X) = length(x),
## also the result can be assured to be a matrix via drop = FALSE
p <- c(0.05, 0.5, 0.95)
quantile(X, p, elementwise = FALSE)
quantile(X, p, elementwise = TRUE)
quantile(X, p, elementwise = TRUE, drop = FALSE)
## compare theoretical and empirical mean from 1,000 simulated observations
cbind(
"theoretical" = mean(X),
"empirical" = rowMeans(random(X, 1000))
)
## evaluate continuous ranked probability score (CRPS) using scoringRules
library("scoringRules")
crps(X, x)
Create a Truncated Student's T Distribution
Description
Class and methods for left-, right-, and interval-truncated t distributions using the workflow from the distributions3 package.
Usage
TruncatedStudentsT(df, location = 0, scale = 1, left = -Inf, right = Inf)
Arguments
df |
numeric. The degrees of freedom of the underlying untruncated
t distribution. Can be any positive number, with |
location |
numeric. The location parameter of the underlying untruncated
t distribution, typically written |
scale |
numeric. The scale parameter (standard deviation) of
the underlying untruncated t distribution,
typically written |
left |
numeric. The left truncation point. Can be any real number,
defaults to |
right |
numeric. The right truncation point. Can be any real number,
defaults to |
Details
The constructor function TruncatedStudentsT
sets up a distribution
object, representing the truncated t probability distribution by the
corresponding parameters: the degrees of freedom df
, the latent mean
location
= \mu
and latent scale parameter scale
= \sigma
(i.e., the parameters of the underlying untruncated t variable),
the left
truncation point (with -Inf
corresponding to untruncated),
and the right
truncation point (with Inf
corresponding to untruncated).
The truncated t distribution has probability density function (PDF) f(x)
:
f(x) = 1/\sigma \tau((x - \mu)/\sigma) / (T((right - \mu)/\sigma) - T((left - \mu)/\sigma))
for left \le x \le right
, and 0 otherwise,
where T
and \tau
are the cumulative distribution function
and probability density function of the standard t distribution with
df
degrees of freedom, respectively.
All parameters can also be vectors, so that it is possible to define a vector of truncated t distributions with potentially different parameters. All parameters need to have the same length or must be scalars (i.e., of length 1) which are then recycled to the length of the other parameters.
For the TruncatedStudentsT
distribution objects there is a wide range
of standard methods available to the generics provided in the distributions3
package: pdf
and log_pdf
for the (log-)density (PDF), cdf
for the probability
from the cumulative distribution function (CDF), quantile
for quantiles,
random
for simulating random variables,
crps
for the continuous ranked probability score
(CRPS), and support
for the support interval
(minimum and maximum). Internally, these methods rely on the usual d/p/q/r
functions provided for the truncated t distributions in the crch
package, see dtt
, and the crps_tt
function from the scoringRules package.
The methods is_discrete
and is_continuous
can be used to query whether the distributions are discrete on the entire support
(always FALSE
) or continuous on the entire support (always TRUE
).
See the examples below for an illustration of the workflow for the class and methods.
Value
A TruncatedStudentsT
distribution object.
See Also
dtt
, StudentsT
, CensoredStudentsT
,
TruncatedNormal
, TruncatedLogistic
Examples
## package and random seed
library("distributions3")
set.seed(6020)
## three truncated t distributions:
## - untruncated standard t with 5 degrees of freedom
## - left-truncated at zero with 5 df, latent location = 1 and scale = 1
## - interval-truncated in [0, 5] with 5 df, latent location = 2 and scale = 2
X <- TruncatedStudentsT(
df = c( 5, 5, 5),
location = c( 0, 1, 2),
scale = c( 1, 1, 2),
left = c(-Inf, 0, 0),
right = c( Inf, Inf, 5)
)
X
## compute mean of the truncated distribution
mean(X)
## higher moments (variance, skewness, kurtosis) are not implemented yet
## support interval (minimum and maximum)
support(X)
## simulate random variables
random(X, 5)
## histograms of 1,000 simulated observations
x <- random(X, 1000)
hist(x[1, ], main = "untruncated")
hist(x[2, ], main = "left-truncated at zero")
hist(x[3, ], main = "interval-truncated in [0, 5]")
## probability density function (PDF) and log-density (or log-likelihood)
x <- c(0, 0, 1)
pdf(X, x)
pdf(X, x, log = TRUE)
log_pdf(X, x)
## cumulative distribution function (CDF)
cdf(X, x)
## quantiles
quantile(X, 0.5)
## cdf() and quantile() are inverses (except at truncation points)
cdf(X, quantile(X, 0.5))
quantile(X, cdf(X, 1))
## all methods above can either be applied elementwise or for
## all combinations of X and x, if length(X) = length(x),
## also the result can be assured to be a matrix via drop = FALSE
p <- c(0.05, 0.5, 0.95)
quantile(X, p, elementwise = FALSE)
quantile(X, p, elementwise = TRUE)
quantile(X, p, elementwise = TRUE, drop = FALSE)
## compare theoretical and empirical mean from 1,000 simulated observations
cbind(
"theoretical" = mean(X),
"empirical" = rowMeans(random(X, 1000))
)
## evaluate continuous ranked probability score (CRPS) using scoringRules
library("scoringRules")
crps(X, x)
The Truncated Student-t Distribution
Description
Density, distribution function, quantile function, and random generation
for the left and/or right truncated student-t distribution with df
degrees of freedom.
Usage
dtt(x, location = 0, scale = 1, df, left = -Inf, right = Inf, log = FALSE)
ptt(q, location = 0, scale = 1, df, left = -Inf, right = Inf,
lower.tail = TRUE, log.p = FALSE)
qtt(p, location = 0, scale = 1, df, left = -Inf, right = Inf,
lower.tail = TRUE, log.p = FALSE)
rtt(n, location = 0, scale = 1, df, left = -Inf, right = Inf)
Arguments
x , q |
vector of quantiles. |
p |
vector of probabilities. |
n |
number of observations. If |
location |
location parameter. |
scale |
scale parameter. |
df |
degrees of freedom (> 0, maybe non-integer). |
left |
left censoring point. |
right |
right censoring point. |
log , log.p |
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are P[X <= x] otherwise, P[X > x]. |
Details
If location
or scale
are not specified they assume the default values
of 0
and 1
, respectively. left
and right
have the defaults -Inf
and Inf
respectively.
The truncated student-t distribution has density
f(x) = 1/\sigma \tau((x - \mu)/\sigma) /
(T((right - \mu)/\sigma) - T((left - \mu)/\sigma))
for left \le x \le right
, and 0 otherwise.
where T
and \tau
are the cumulative distribution function
and probability density function of the student-t distribution with
df
degrees of freedom respectively, \mu
is the location of the
distribution, and \sigma
the scale.
Value
dtt
gives the density, ptt
gives the distribution
function, qtt
gives the quantile function, and rtt
generates random deviates.