Type: | Package |
Title: | Individual Conditional Expectation Plot Toolbox |
Version: | 1.1.5 |
Date: | 2022-08-18 |
Author: | Alex Goldstein, Adam Kapelner, Justin Bleich |
Maintainer: | Adam Kapelner <kapelner@qc.cuny.edu> |
Description: | Implements Individual Conditional Expectation (ICE) plots, a tool for visualizing the model estimated by any supervised learning algorithm. ICE plots refine Friedman's partial dependence plot by graphing the functional relationship between the predicted response and a covariate of interest for individual observations. Specifically, ICE plots highlight the variation in the fitted values across the range of a covariate of interest, suggesting where and to what extent heterogeneities may exist. |
License: | GPL-2 | GPL-3 |
Depends: | sfsmisc |
Suggests: | randomForest, MASS |
NeedsCompilation: | no |
Packaged: | 2022-08-22 14:00:12 UTC; kapel |
Repository: | CRAN |
Date/Publication: | 2022-08-22 14:20:10 UTC |
Clustering of ICE and d-ICE curves by kmeans.
Description
Clustering of ICE and d-ICE curves by kmeans. All curves are centered to have mean 0, and then kmeans is applied to the centered curves with the specified number of clusters.
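The two steps described above (center each curve to mean 0, then run kmeans) can be sketched in base R. This is only an illustration on simulated curves, not the internals of clusterICE; the toy matrix `curves` and the choice of `nstart = 10` are assumptions made for the example.

```r
set.seed(1)                                  # kmeans uses random starts
gridpts <- seq(0, 1, length.out = 20)

# Toy curves (hypothetical data): 10 rising and 10 falling lines, each with
# its own large vertical offset and a little pointwise noise.
make_curve <- function(slope) {
  slope * gridpts + rnorm(1, sd = 5) + rnorm(length(gridpts), sd = 0.05)
}
rising  <- t(sapply(1:10, function(i) make_curve(1)))
falling <- t(sapply(1:10, function(i) make_curve(-1)))
curves  <- rbind(rising, falling)            # one curve per row

# Step 1: center every curve to have mean 0, so that clustering reflects
# the shape of a curve rather than its overall level.
centered_curves <- curves - rowMeans(curves)

# Step 2: apply kmeans with the requested number of clusters.
cl <- kmeans(centered_curves, centers = 2, nstart = 10)

cl$size      # number of curves in each cluster
cl$cluster   # cluster label of each curve
```

Without the centering step, the vertical offsets would dominate the distances between curves and the clusters would mostly reflect level rather than shape.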
Usage
clusterICE(ice_obj, nClusters, plot = TRUE, plot_margin = 0.05,
colorvec, plot_pdp = FALSE, x_quantile = FALSE,
avg_lwd = 3, centered = FALSE,
plot_legend = FALSE, ...)
Arguments
ice_obj |
Object of class |
nClusters |
Number of clusters to find. |
plot |
If |
plot_margin |
Extra margin to pass to |
colorvec |
Optional vector of colors to use for each cluster. |
plot_pdp |
If |
x_quantile |
If |
avg_lwd |
Average line width to use when plotting the cluster means. Line width is proportional to the cluster's size. |
centered |
If |
plot_legend |
If |
... |
Additional arguments for plotting. |
Value
The output of the kmeans call (a list of class kmeans).
See Also
ice, dice
Examples
## Not run:
require(ICEbox)
require(randomForest)
require(MASS) #has Boston Housing data, Pima
data(Boston) #Boston Housing data
X = Boston
y = X$medv
X$medv = NULL
## build a RF:
bh_rf = randomForest(X, y)
## Create an 'ice' object for the predictor "age":
bh.ice = ice(object = bh_rf, X = X, y = y, predictor = "age",
frac_to_build = .1)
## cluster the curves into 2 groups.
clusterICE(bh.ice, nClusters = 2, plot_legend = TRUE)
## cluster the curves into 3 groups, start all at 0.
clusterICE(bh.ice, nClusters = 3, plot_legend = TRUE, centered = TRUE)
## End(Not run)
Creates an object of class dice.
Description
Estimates the partial derivative function for each curve in an ice object. See Goldstein et al. (2013) for further details.
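To make the idea of a per-curve derivative estimate concrete, here is a minimal base-R sketch that differentiates each row of a curve matrix with simple finite differences. It is an illustration only: the helper `finite_diff` and the toy `curves` matrix are hypothetical, and the estimator that dice actually uses by default is not shown here.

```r
gridpts <- seq(0, 10, length.out = 50)

# Two toy curves sampled on the grid (hypothetical data): a line and a parabola.
curves <- rbind(2 * gridpts, gridpts^2)

# Central differences in the interior, one-sided differences at the two ends.
finite_diff <- function(x, y) {
  n <- length(x)
  d <- numeric(n)
  d[1] <- (y[2] - y[1]) / (x[2] - x[1])
  d[n] <- (y[n] - y[n - 1]) / (x[n] - x[n - 1])
  d[2:(n - 1)] <- (y[3:n] - y[1:(n - 2)]) / (x[3:n] - x[1:(n - 2)])
  d
}

# One derivative curve per original curve, same dimensions as 'curves'.
d_curves <- t(apply(curves, 1, function(y) finite_diff(gridpts, y)))

dim(d_curves)
```

Raw finite differences amplify noise, so with curves from a fitted model it is usually sensible to smooth each curve before differentiating.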
Usage
dice(ice_obj, DerivEstimator)
Arguments
ice_obj |
Object of class |
DerivEstimator |
Optional function with a single argument |
Value
A list of class dice with the following elements. Most are passed directly through from ice_obj and exist to enable various plotting facilities.
d_ice_curves |
Matrix of dimension |
xj |
The actual values of |
actual_deriv |
Vector of length |
sd_deriv |
Vector of length |
logodds |
Passed from |
gridpts |
Passed from |
predictor |
Passed from |
xlab |
Passed from |
nominal_axis |
Passed from |
range_y |
Passed from |
Xice |
Passed from |
dpdp |
The estimated partial derivative of the PDP. |
References
Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E., Peeking Inside the Black Box: Visualizing Statistical Learning With Plots of Individual Conditional Expectation. (2014) Journal of Computational and Graphical Statistics, in press
Martin Maechler et al. sfsmisc: Utilities from Seminar fuer Statistik ETH Zurich. R package version 1.0-24.
See Also
plot.dice, print.dice, summary.dice
Examples
## Not run:
# same examples as for 'ice', but now create a derivative estimate as well.
require(ICEbox)
require(randomForest)
require(MASS) #has Boston Housing data, Pima
######## regression example
data(Boston) #Boston Housing data
X = Boston
y = X$medv
X$medv = NULL
## build a RF:
bhd_rf_mod = randomForest(X, y)
## Create an 'ice' object for the predictor "age":
bhd.ice = ice(object = bhd_rf_mod, X = X, y = y, predictor = "age", frac_to_build = .1)
# make a dice object:
bhd.dice = dice(bhd.ice)
#### classification example
data(Pima.te) #Pima Indians diabetes classification
y = Pima.te$type
X = Pima.te
X$type = NULL
## build a RF:
pima_rf = randomForest(x = X, y = y)
## Create an 'ice' object for the predictor "skin":
# For classification we plot the centered log-odds. If we pass a predict
# function that returns fitted probabilities, setting logodds = TRUE instructs
# the function to set each ice curve to the centered log-odds of the fitted
# probability.
pima.ice = ice(object = pima_rf, X = X, predictor = "skin", logodds = TRUE,
predictfcn = function(object, newdata){
predict(object, newdata, type = "prob")[, 2]
}
)
# make a dice object:
pima.dice = dice(pima.ice)
## End(Not run)
Creates an object of class ice.
Description
Creates an ice object with individual conditional expectation curves for the passed model object, X matrix, predictor, and response. See Goldstein et al. (2013) for further details.
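The construction of individual conditional expectation curves can be sketched by hand in base R. The sketch below is an illustration under stated assumptions, not the code inside ice: it uses a linear model on the built-in mtcars data as a stand-in for the fitted model, takes the observed values of the predictor as the grid, and computes the partial dependence curve as the pointwise average of the individual curves.

```r
# Stand-in model and design matrix (assumptions for this sketch).
fit <- lm(mpg ~ wt + hp, data = mtcars)
X   <- mtcars[, c("wt", "hp")]

# Grid of values for the predictor of interest.
gridpts <- sort(unique(X$wt))

# For each grid value, set the predictor to that value for every observation
# and predict. Each row of the result is one observation's curve.
ice_curves <- sapply(gridpts, function(g) {
  X_mod <- X
  X_mod$wt <- g
  predict(fit, newdata = X_mod)
})

# Averaging the individual curves pointwise gives the partial dependence curve.
pdp <- colMeans(ice_curves)

dim(ice_curves)   # observations by grid points
```

Because this stand-in model is additive, all of its curves are parallel; a model with interactions would produce curves of differing shapes, which is the heterogeneity that ICE plots are meant to reveal.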
Usage
ice(object, X, y, predictor, predictfcn, verbose = TRUE, frac_to_build = 1,
indices_to_build = NULL, num_grid_pts, logodds = FALSE, probit = FALSE, ...)
Arguments
object |
The fitted model to estimate ICE curves for. |
X |
The design matrix we wish to estimate ICE curves for. Rows are observations, columns are predictors. Typically this is taken to be |
y |
Optional vector of the response values |
predictor |
The column number or variable name in |
predictfcn |
Optional function that accepts two arguments, |
verbose |
If |
frac_to_build |
Number between 0 and 1, with 1 as default. For large |
indices_to_build |
Vector of indices, |
num_grid_pts |
Optional number of values in the range of |
logodds |
If |
probit |
If |
... |
Other arguments to be passed to |
Value
A list of class ice with the following elements.
gridpts |
Sorted values of |
ice_curves |
Matrix of dimension |
xj |
The actual values of |
actual_predictions |
Vector of length |
xlab |
String with the predictor name corresponding to |
nominal_axis |
If |
range_y |
If |
sd_y |
If |
Xice |
A matrix containing the subset of |
pdp |
A vector of size |
predictor |
Same as the argument, see argument description. |
logodds |
Same as the argument, see argument description. |
indices_to_build |
Same as the argument, see argument description. |
frac_to_build |
Same as the argument, see argument description. |
predictfcn |
Same as the argument, see argument description. |
References
Jerome Friedman. Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, 29(5): 1189-1232, 2001.
Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E., Peeking Inside the Black Box: Visualizing Statistical Learning With Plots of Individual Conditional Expectation. (2014) Journal of Computational and Graphical Statistics, in press
See Also
plot.ice, print.ice, summary.ice
Examples
## Not run:
require(ICEbox)
require(randomForest)
require(MASS) #has Boston Housing data, Pima
######## regression example
data(Boston) #Boston Housing data
X = Boston
y = X$medv
X$medv = NULL
## build a RF:
bhd_rf_mod = randomForest(X, y)
## Create an 'ice' object for the predictor "age":
bhd.ice = ice(object = bhd_rf_mod, X = X, y = y, predictor = "age", frac_to_build = .1)
#### classification example
data(Pima.te) #Pima Indians diabetes classification
y = Pima.te$type
X = Pima.te
X$type = NULL
## build a RF:
pima_rf_mod = randomForest(x = X, y = y)
## Create an 'ice' object for the predictor "skin":
# For classification we plot the centered log-odds. If we pass a predict
# function that returns fitted probabilities, setting logodds = TRUE instructs
# the function to set each ice curve to the centered log-odds of the fitted
# probability.
pima.ice = ice(object = pima_rf_mod, X = X, predictor = "skin", logodds = TRUE,
predictfcn = function(object, newdata){
predict(object, newdata, type = "prob")[, 2]
}
)
## End(Not run)
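The comments in the classification example mention the centered log-odds of the fitted probability. As a rough sketch, assuming the common two-class definition in which the log of each class probability is centered by the average of the two logs (so that the result equals half the ordinary logit), the transformation could look as follows. The function name `centered_logodds` and the clipping constant `eps` are hypothetical, and this is not taken from the ICEbox source.

```r
# Assumed two-class definition: log(p) - mean(log(p), log(1 - p)).
centered_logodds <- function(p, eps = 1e-4) {
  p <- pmin(pmax(p, eps), 1 - eps)   # keep probabilities away from 0 and 1
  log(p) - 0.5 * (log(p) + log(1 - p))
}

probs <- c(0.1, 0.5, 0.9)
centered_logodds(probs)

# Under this definition the result is half the usual logit.
all.equal(centered_logodds(probs), 0.5 * qlogis(probs))
```

Clipping matters for models such as random forests, whose fitted probabilities can be exactly 0 or 1 and would otherwise map to infinite log-odds.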
Create a plot of a dice object.
Description
Plotting of dice objects.
Usage
## S3 method for class 'dice'
plot(x, plot_margin = 0.05, frac_to_plot = 1,
plot_sd = TRUE, plot_orig_pts_deriv = TRUE, pts_preds_size = 1.5,
colorvec, color_by = NULL, x_quantile = TRUE, plot_dpdp = TRUE,
rug_quantile = seq(from = 0, to = 1, by = 0.1), ...)
Arguments
x |
Object of class |
plot_margin |
Extra margin to pass to |
frac_to_plot |
If |
plot_sd |
If |
plot_orig_pts_deriv |
If |
pts_preds_size |
Size of points to make if |
colorvec |
Optional vector of colors to use for each curve. |
color_by |
Optional variable name (or column number) in |
x_quantile |
If |
plot_dpdp |
If |
rug_quantile |
If not null, tick marks are drawn on the x-axis corresponding to the vector of quantiles specified by this parameter.
Forced to |
... |
Additional plotting arguments. |
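As an illustration of the rug_quantile argument described above, the tick positions for a vector of quantiles can be computed with base R's quantile function. The vector `xj` below is a stand-in taken from the built-in mtcars data; how plot.dice draws the ticks internally is not shown here.

```r
xj <- mtcars$wt                                   # stand-in for the predictor values
rug_quantile <- seq(from = 0, to = 1, by = 0.1)   # default from the usage above

# One tick position per requested quantile (here the deciles of xj).
tick_positions <- quantile(xj, probs = rug_quantile)
tick_positions
```

Positions like these could be handed to a base graphics call such as `rug(tick_positions)` on an existing plot to show where the data are dense or sparse along the x-axis.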
Value
A list with the following elements.
plot_points_indices |
Row numbers of |
legend_text |
If the |
See Also
dice
Examples
## Not run:
require(ICEbox)
require(randomForest)
require(MASS) #has Boston Housing data, Pima
data(Boston) #Boston Housing data
X = Boston
y = X$medv
X$medv = NULL
## build a RF:
bhd_rf_mod = randomForest(X, y)
## Create an 'ice' object for the predictor "age":
bhd.ice = ice(object = bhd_rf_mod, X = X, y = y, predictor = "age", frac_to_build = .1)
# estimate derivatives, then plot.
bhd.dice = dice(bhd.ice)
plot(bhd.dice)
## End(Not run)
Plotting of ice objects.
Description
Plotting of ice objects.
Usage
## S3 method for class 'ice'
plot(x, plot_margin = 0.05, frac_to_plot = 1,
plot_points_indices = NULL, plot_orig_pts_preds = TRUE,
pts_preds_size = 1.5, colorvec, color_by = NULL,
x_quantile = TRUE, plot_pdp = TRUE,
centered = FALSE, prop_range_y = TRUE,
rug_quantile = seq(from = 0, to = 1, by = 0.1),
centered_percentile = 0,
point_labels = NULL, point_labels_size = NULL,
prop_type,...)
Arguments
x |
Object of class |
plot_margin |
Extra margin to pass to |
frac_to_plot |
If |
plot_points_indices |
If not |
plot_orig_pts_preds |
If |
pts_preds_size |
Size of points to make if |
colorvec |
Optional vector of colors to use for each curve. |
color_by |
Optional variable name in |
x_quantile |
If |
plot_pdp |
If |
centered |
If |
prop_range_y |
When |
centered_percentile |
The percentile of |
point_labels |
If not |
point_labels_size |
If not |
rug_quantile |
If not |
prop_type |
Scaling factor for the right vertical axis in centered plots if |
... |
Other arguments to be passed to the |
Value
A list with the following elements.
plot_points_indices |
Row numbers of |
legend_text |
If the |
See Also
ice
Examples
## Not run:
require(ICEbox)
require(randomForest)
require(MASS) #has Boston Housing data, Pima
data(Boston) #Boston Housing data
X = Boston
y = X$medv
X$medv = NULL
## build a RF:
bhd_rf_mod = randomForest(X, y)
## Create an 'ice' object for the predictor "age":
bhd.ice = ice(object = bhd_rf_mod, X = X, y = y, predictor = "age",
frac_to_build = .1)
## plot
plot(bhd.ice, x_quantile = TRUE, plot_pdp = TRUE, frac_to_plot = 1)
## centered plot
plot(bhd.ice, x_quantile = TRUE, plot_pdp = TRUE, frac_to_plot = 1,
centered = TRUE)
## color the curves by high and low values of 'rm'.
# First create an indicator variable which is 1 if the number of
# rooms is greater than the median:
median_rm = median(X$rm)
bhd.ice$Xice$I_rm = ifelse(bhd.ice$Xice$rm > median_rm, 1, 0)
plot(bhd.ice, frac_to_plot = 1, centered = TRUE, prop_range_y = TRUE,
x_quantile = TRUE, plot_orig_pts_preds = TRUE, color_by = "I_rm")
## rebuild the 'ice' object using all observations, then color the
## curves by the response vector y:
bhd.ice = ice(object = bhd_rf_mod, X = X, y = y, predictor = "age",
frac_to_build = 1)
plot(bhd.ice, frac_to_plot = 1, centered = TRUE, prop_range_y = TRUE,
x_quantile = TRUE, plot_orig_pts_preds = TRUE, color_by = y)
## End(Not run)
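The centered plots in the examples above can be understood through a small base-R sketch. It assumes that centering means subtracting from each curve its own value at the leftmost grid point, which is what the default `centered_percentile = 0` in the usage suggests; the toy `curves` matrix is hypothetical and this is not the plotting code of the package.

```r
gridpts <- seq(0, 10, length.out = 5)

# Three toy curves (one per row) with different starting levels.
curves <- rbind(1 + 2 * gridpts,
                5 - gridpts,
                rep(3, length(gridpts)))

# Subtract each curve's value at the first grid point, so that every curve
# starts at 0 and only its change along the grid remains.
centered <- curves - curves[, 1]

centered
```

After centering, curves that differ only by a vertical shift coincide, which makes differences in shape much easier to see.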
Print method for dice objects.
Description
Prints a summary of a dice object.
Usage
## S3 method for class 'dice'
print(x, ...)
Arguments
x |
Object of class |
... |
Ignored for now. |
Print method for ice objects.
Description
Prints a summary of an ice object.
Usage
## S3 method for class 'ice'
print(x, ...)
Arguments
x |
Object of class |
... |
Ignored for now. |
Summary function for dice objects.
Description
Alias of print method.
Usage
## S3 method for class 'dice'
summary(object, ...)
Arguments
object |
Object of class |
... |
Ignored for now. |
Summary function for ice objects.
Description
Alias of print method.
Usage
## S3 method for class 'ice'
summary(object, ...)
Arguments
object |
Object of class |
... |
Ignored for now. |
Data concerning white wine.
Description
The WhiteWine data frame has 4898 rows and 12 columns and concerns white wines from a region in Portugal. The response variable, quality, is a wine quality metric, taken to be the median preference score of three blind tasters on a scale of 1-10. The 11 covariates are physicochemical metrics of wine quality such as citric acid content, sulphates, etc.
Usage
data(WhiteWine)
Format
A data frame of 4898 cases on 12 variables.
Source
K Bache and M Lichman. UCI machine learning repository, 2013. http://archive.ics.uci.edu/ml