Version: | 2.2-3 |
Date: | 2019-05-04 |
Title: | Sparse Partial Least Squares (SPLS) Regression and Classification |
Author: | Dongjun Chung <chungdon@stat.wisc.edu>, Hyonho Chun <chun@stat.wisc.edu>, Sunduz Keles <keles@stat.wisc.edu> |
Maintainer: | Valentin Todorov <valentin.todorov@chello.at> |
Depends: | R (≥ 2.14) |
Imports: | MASS, nnet, parallel, pls |
Description: | Provides functions for fitting a sparse partial least squares (SPLS) regression and classification (Chun and Keles (2010) <doi:10.1111/j.1467-9868.2009.00723.x>). |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
NeedsCompilation: | no |
Packaged: | 2019-05-04 21:53:24 UTC; Share |
Repository: | CRAN |
Date/Publication: | 2019-05-04 23:10:03 UTC |
Calculate bootstrapped confidence intervals of SPLS coefficients
Description
Calculate bootstrapped confidence intervals of coefficients of the selected predictors and generate confidence interval plots.
Usage
ci.spls( object, coverage=0.95, B=1000,
plot.it=FALSE, plot.fix="y",
plot.var=NA, K=object$K, fit=object$fit )
Arguments
object |
A fitted SPLS object. |
coverage |
Coverage of confidence intervals.
|
B |
Number of bootstrap iterations. Default is 1000. |
plot.it |
Plot confidence intervals of coefficients? |
plot.fix |
If |
plot.var |
Index vector of responses (if |
K |
Number of hidden components.
Default is to use the same |
fit |
PLS algorithm for model fitting. Alternatives are
|
Value
Invisibly returns a list with components:
cibeta |
A list with one matrix per response. Each matrix is p by 2, where the i-th row lists the upper and lower bounds of the bootstrapped confidence interval of the i-th predictor. |
betahat |
Matrix of original coefficients of the SPLS fit. |
lbmat |
Matrix of lower bounds of confidence intervals (for internal use). |
ubmat |
Matrix of upper bounds of confidence intervals (for internal use). |
Author(s)
Dongjun Chung, Hyonho Chun, and Sunduz Keles.
References
Chun H and Keles S (2010), "Sparse partial least squares for simultaneous dimension reduction and variable selection", Journal of the Royal Statistical Society - Series B, Vol. 72, pp. 3–25.
See Also
correct.spls and spls.
Examples
data(mice)
# SPLS with eta=0.6 & 1 hidden component
f <- spls( mice$x, mice$y, K=1, eta=0.6 )
# Calculate confidence intervals of coefficients
ci.f <- ci.spls( f, plot.it=TRUE, plot.fix="x", plot.var=20 )
# Bootstrapped confidence intervals
cis <- ci.f$cibeta
cis[[20]] # equivalent to cis$`1422478_a_at`
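The percentile bootstrap behind intervals of this kind can be sketched in base R. The block below is a minimal illustration, not the internals of ci.spls: it assumes simulated data, uses ordinary least squares (lm.fit) in place of an SPLS fit, and the helper name boot_ci is hypothetical.

```r
# Minimal percentile-bootstrap sketch (assumption: OLS stands in for SPLS).
set.seed(1)
n <- 60
p <- 3
x <- matrix(rnorm(n * p), n, p)
y <- drop(x %*% c(2, 0, -1)) + rnorm(n)

# boot_ci is a hypothetical helper, not part of the spls package.
boot_ci <- function(x, y, coverage = 0.95, B = 200) {
  n <- nrow(x)
  # Refit on B resampled datasets; each column holds one set of coefficients.
  est <- replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)
    lm.fit(x[idx, , drop = FALSE], y[idx])$coefficients
  })
  alpha <- 1 - coverage
  # Empirical quantiles of the bootstrap estimates give the interval bounds.
  ci <- t(apply(est, 1, quantile, probs = c(alpha / 2, 1 - alpha / 2)))
  colnames(ci) <- c("lower", "upper")
  ci
}

ci <- boot_ci(x, y)
ci
```

Each row of ci corresponds to one predictor, mirroring the p by 2 layout described for cibeta.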
Plot estimated coefficients of the SPLS object
Description
Plot estimated coefficients of the selected predictors in the SPLS object.
Usage
coefplot.spls( object, nwin=c(2,2),
xvar=c(1:length(object$A)), ylimit=NA )
Arguments
object |
A fitted SPLS object. |
nwin |
Vector of the number of rows and columns in a plotting area. Default is two rows and two columns, i.e., four plots. |
xvar |
Index of variables to be plotted among the set of the selected predictors. Default is to plot the coefficients of all the selected predictors. |
ylimit |
Range of the y axis (the coefficients) in the plot.
If |
Details
This plot is useful for visualizing the coefficient estimates of a variable across different responses. Hence, the function applies only to SPLS fits with a multivariate response.
Value
NULL.
Author(s)
Dongjun Chung, Hyonho Chun, and Sunduz Keles.
References
Chun H and Keles S (2010), "Sparse partial least squares for simultaneous dimension reduction and variable selection", Journal of the Royal Statistical Society - Series B, Vol. 72, pp. 3–25.
See Also
ci.spls, correct.spls, and plot.spls.
Examples
data(yeast)
# SPLS with eta=0.7 & 8 hidden components
f <- spls( yeast$x, yeast$y, K=8, eta=0.7 )
# Draw estimated coefficient plot of the first four variables
# among the selected predictors
coefplot.spls( f, xvar=c(1:4), nwin=c(2,2) )
Correct the initial SPLS coefficient estimates based on bootstrapped confidence intervals
Description
Correct initial SPLS coefficient estimates of the selected predictors based on bootstrapped confidence intervals and draw heatmap of original and corrected coefficient estimates.
Usage
correct.spls( object, plot.it=TRUE )
Arguments
object |
An object obtained from the function |
plot.it |
Draw the heatmap of original coefficient estimates and corrected coefficient estimates? |
Details
The set of the selected variables is updated by setting the coefficients with zero-containing confidence intervals to zero.
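The correction rule described above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package source: the helper correct_by_ci and the small matrices are hypothetical, standing in for the betahat, lbmat, and ubmat components returned by ci.spls.

```r
# Hypothetical helper: zero out coefficients whose interval contains zero.
correct_by_ci <- function(betahat, lb, ub) {
  contains_zero <- lb <= 0 & ub >= 0
  betahat[contains_zero] <- 0
  betahat
}

# Toy 2 x 2 example (values are made up for illustration).
betahat <- matrix(c(0.8, -0.1, 0.3, -0.5), 2, 2)
lb      <- matrix(c(0.2, -0.4, -0.1, -0.9), 2, 2)
ub      <- matrix(c(1.4,  0.2,  0.7, -0.1), 2, 2)

corrected <- correct_by_ci(betahat, lb, ub)
corrected
```

Only the two coefficients whose intervals exclude zero (0.8 and -0.5) survive the correction.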
Value
Invisibly returns a matrix of corrected coefficient estimates.
Author(s)
Dongjun Chung, Hyonho Chun, and Sunduz Keles.
References
Chun H and Keles S (2010), "Sparse partial least squares for simultaneous dimension reduction and variable selection", Journal of the Royal Statistical Society - Series B, Vol. 72, pp. 3–25.
See Also
Examples
data(mice)
# SPLS with eta=0.6 & 1 hidden component
f <- spls( mice$x, mice$y, K=1, eta=0.6 )
# Calculate confidence intervals of coefficients
ci.f <- ci.spls(f)
# Corrected coefficient estimates
cf <- correct.spls( ci.f )
cf[20,1:5]
Compute and plot the cross-validated error for SGPLS classification
Description
Draw heatmap of v-fold cross-validated misclassification rates and return optimal eta (thresholding parameter) and K (number of hidden components).
Usage
cv.sgpls( x, y, fold=10, K, eta, scale.x=TRUE, plot.it=TRUE,
br=TRUE, ftype='iden', n.core=8 )
Arguments
x |
Matrix of predictors. |
y |
Vector of class indices. |
fold |
Number of cross-validation folds. Default is 10. |
K |
Number of hidden components. |
eta |
Thresholding parameter. |
scale.x |
Scale predictors by dividing each predictor variable by its sample standard deviation? |
plot.it |
Draw the heatmap of cross-validated misclassification rates? |
br |
Apply Firth's bias reduction procedure? |
ftype |
Type of Firth's bias reduction procedure.
Alternatives are |
n.core |
Number of CPUs to be used when parallel computing is utilized. |
Details
Parallel computing can be utilized for faster computation. Users can change the number of CPUs to be used by changing the argument n.core.
Value
Invisibly returns a list with components:
err.mat |
Matrix of cross-validated misclassification rates.
Rows correspond to |
eta.opt |
Optimal eta. |
K.opt |
Optimal K. |
Author(s)
Dongjun Chung and Sunduz Keles.
References
Chung D and Keles S (2010), "Sparse partial least squares classification for high dimensional data", Statistical Applications in Genetics and Molecular Biology, Vol. 9, Article 17.
See Also
print.sgpls, predict.sgpls, and coef.sgpls.
Examples
data(prostate)
set.seed(1)
# misclassification rate plot. eta is searched between 0.1 and 0.9 and
# number of hidden components is searched between 1 and 5
## Not run:
cv <- cv.sgpls(prostate$x, prostate$y, K = c(1:5), eta = seq(0.1,0.9,0.1),
    scale.x=FALSE, fold=5)
# Refit with the selected parameters (requires cv from the call above)
(sgpls(prostate$x, prostate$y, eta=cv$eta.opt, K=cv$K.opt, scale.x=FALSE))
## End(Not run)
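Picking the optimal pair from a grid of cross-validated error rates amounts to locating the minimum of the error matrix. The sketch below is illustrative only: the error values are random placeholders, the layout (rows for eta, columns for K) is an assumption, and taking the first minimum on ties is a choice made here, not a documented behavior of cv.sgpls.

```r
# Search grids matching the example above.
eta_grid <- seq(0.1, 0.9, 0.1)
K_grid <- 1:5

# Placeholder misclassification rates (assumption: rows = eta, columns = K).
set.seed(1)
err_mat <- matrix(runif(length(eta_grid) * length(K_grid), 0, 0.5),
                  nrow = length(eta_grid), ncol = length(K_grid),
                  dimnames = list(paste0("eta=", eta_grid),
                                  paste0("K=", K_grid)))

# Locate the smallest error; keep the first match if there are ties.
best <- which(err_mat == min(err_mat), arr.ind = TRUE)[1, ]
eta_opt <- eta_grid[best[["row"]]]
K_opt <- K_grid[best[["col"]]]
c(eta = eta_opt, K = K_opt)
```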
Compute and plot cross-validated mean squared prediction error for SPLS regression
Description
Draw heatmap of v-fold cross-validated mean squared prediction error and return optimal eta (thresholding parameter) and K (number of hidden components).
Usage
cv.spls( x, y, fold=10, K, eta, kappa=0.5,
select="pls2", fit="simpls",
scale.x=TRUE, scale.y=FALSE, plot.it=TRUE )
Arguments
x |
Matrix of predictors. |
y |
Vector or matrix of responses. |
fold |
Number of cross-validation folds. Default is 10. |
K |
Number of hidden components. |
eta |
Thresholding parameter. |
kappa |
Parameter to control the effect of
the concavity of the objective function
and the closeness of original and surrogate direction vectors.
|
select |
PLS algorithm for variable selection.
Alternatives are |
fit |
PLS algorithm for model fitting. Alternatives are
|
scale.x |
Scale predictors by dividing each predictor variable by its sample standard deviation? |
scale.y |
Scale responses by dividing each response variable by its sample standard deviation? |
plot.it |
Draw heatmap of cross-validated mean squared prediction error? |
Value
Invisibly returns a list with components:
mspemat |
Matrix of cross-validated mean squared prediction error.
Rows correspond to |
eta.opt |
Optimal eta. |
K.opt |
Optimal K. |
Author(s)
Dongjun Chung, Hyonho Chun, and Sunduz Keles.
References
Chun H and Keles S (2010), "Sparse partial least squares for simultaneous dimension reduction and variable selection", Journal of the Royal Statistical Society - Series B, Vol. 72, pp. 3–25.
See Also
print.spls, plot.spls, predict.spls, and coef.spls.
Examples
data(yeast)
set.seed(1)
# MSPE plot. eta is searched between 0.1 and 0.9 and
# number of hidden components is searched between 1 and 10
## Not run:
cv <- cv.spls(yeast$x, yeast$y, K = c(1:10), eta = seq(0.1,0.9,0.1))
# Optimal eta and K
cv$eta.opt
cv$K.opt
(spls(yeast$x, yeast$y, eta=cv$eta.opt, K=cv$K.opt))
## End(Not run)
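For readers unfamiliar with v-fold cross-validation, the mean squared prediction error for a single parameter setting can be sketched as follows. This is a generic base R illustration on simulated data, with ordinary least squares standing in for the SPLS fit; it does not reproduce how cv.spls splits the data internally.

```r
# Simulated regression data (assumption: not one of the package datasets).
set.seed(1)
n <- 50
x <- matrix(rnorm(n * 2), n, 2)
y <- drop(x %*% c(1, -1)) + rnorm(n, sd = 0.5)

fold <- 5
# Balanced random fold labels: each observation is held out exactly once.
fold_id <- sample(rep_len(seq_len(fold), n))

sq_err <- numeric(n)
for (k in seq_len(fold)) {
  test <- fold_id == k
  # Fit on the training folds (OLS with intercept as a stand-in model).
  fit <- lm.fit(cbind(1, x[!test, , drop = FALSE]), y[!test])
  pred <- drop(cbind(1, x[test, , drop = FALSE]) %*% fit$coefficients)
  sq_err[test] <- (y[test] - pred)^2
}

# Cross-validated mean squared prediction error.
mspe <- mean(sq_err)
mspe
```

Repeating this loop over every (eta, K) pair would fill a matrix like mspemat.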
Compute and plot cross-validated error for SPLSDA classification
Description
Draw heatmap of v-fold cross-validated misclassification rates and return optimal eta (thresholding parameter) and K (number of hidden components).
Usage
cv.splsda( x, y, fold=10, K, eta, kappa=0.5,
classifier=c('lda','logistic'), scale.x=TRUE, plot.it=TRUE, n.core=8 )
Arguments
x |
Matrix of predictors. |
y |
Vector of class indices. |
fold |
Number of cross-validation folds. Default is 10. |
K |
Number of hidden components. |
eta |
Thresholding parameter. |
kappa |
Parameter to control the effect of
the concavity of the objective function
and the closeness of original and surrogate direction vectors.
|
classifier |
Classifier used in the second step of SPLSDA.
Alternatives are |
scale.x |
Scale predictors by dividing each predictor variable by its sample standard deviation? |
plot.it |
Draw the heatmap of the cross-validated misclassification rates? |
n.core |
Number of CPUs to be used when parallel computing is utilized. |
Details
Parallel computing can be utilized for faster computation. Users can change the number of CPUs to be used by changing the argument n.core.
Value
Invisibly returns a list with components:
err.mat |
Matrix of cross-validated misclassification rates.
Rows correspond to |
eta.opt |
Optimal eta. |
K.opt |
Optimal K. |
Author(s)
Dongjun Chung and Sunduz Keles.
References
Chung D and Keles S (2010), "Sparse partial least squares classification for high dimensional data", Statistical Applications in Genetics and Molecular Biology, Vol. 9, Article 17.
See Also
print.splsda, predict.splsda, and coef.splsda.
Examples
data(prostate)
set.seed(1)
# misclassification rate plot. eta is searched between 0.1 and 0.9 and
# number of hidden components is searched between 1 and 5
## Not run:
cv <- cv.splsda( prostate$x, prostate$y, K = c(1:5), eta = seq(0.1,0.9,0.1),
    scale.x=FALSE, fold=5 )
# Refit with the selected parameters (requires cv from the call above)
(splsda( prostate$x, prostate$y, eta=cv$eta.opt, K=cv$K.opt, scale.x=FALSE ))
## End(Not run)
Lymphoma Gene Expression Dataset
Description
This is the Lymphoma Gene Expression dataset used in Chung and Keles (2010).
Usage
data(lymphoma)
Format
A list with two components:
- x
Gene expression data. A matrix with 62 rows and 4026 columns.
- y
Class index. A vector with 62 elements.
Details
The lymphoma dataset consists of 42 samples of diffuse large B-cell lymphoma (DLBCL),
9 samples of follicular lymphoma (FL),
and 11 samples of chronic lymphocytic leukemia (CLL).
DLBCL, FL, and CLL classes are coded as 0, 1, and 2, respectively, in the y vector. Matrix x is gene expression data; arrays were normalized, imputed, log transformed, and standardized to zero mean and unit variance across genes as described in Dettling (2004) and Dettling and Bühlmann (2002). See Chung and Keles (2010) for more details.
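The class coding can be made readable by mapping the numeric codes back to labels. The sketch below rebuilds a vector with the class counts stated above; the ordering of samples is an assumption made for illustration and may differ from the actual lymphoma$y.

```r
# Reconstructed from the stated counts only (sample order is hypothetical).
y <- c(rep(0, 42), rep(1, 9), rep(2, 11))

# Map the numeric codes 0, 1, 2 to the class names given in Details.
class_label <- factor(y, levels = 0:2, labels = c("DLBCL", "FL", "CLL"))
table(class_label)
```

The same factor() call applied to lymphoma$y should give a labeled view of the real class vector.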
Source
Alizadeh A, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J Jr, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Levy R, Wilson W, Grever MR, Byrd JC, Botstein D, Brown PO, and Staudt LM (2000), "Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling", Nature, Vol. 403, pp. 503–511.
References
Chung D and Keles S (2010), "Sparse partial least squares classification for high dimensional data", Statistical Applications in Genetics and Molecular Biology, Vol. 9, Article 17.
Dettling M (2004), "BagBoosting for tumor classification with gene expression data", Bioinformatics, Vol. 20, pp. 3583–3593.
Dettling M and Bühlmann P (2002), "Supervised clustering of genes", Genome Biology, Vol. 3, pp. research0069.1–0069.15.
Examples
data(lymphoma)
lymphoma$x[1:5,1:5]
lymphoma$y
Mice Dataset
Description
This is the Mice dataset used in Chun and Keles (2010).
Usage
data(mice)
Format
A list with two components:
- x
Marker map data. A matrix with 60 rows and 145 columns.
- y
Gene expression data. A matrix with 60 rows and 83 columns.
Details
The Mice dataset was published by Lan et al. (2006). Matrix x is the marker map consisting of 145 microsatellite markers from 19 non-sex mouse chromosomes. Matrix y is gene expression measurements of the 83 transcripts from liver tissues of 60 mice. This group of 83 transcripts is one of the clusters analyzed by Chun and Keles (2010). See Chun and Keles (2010) for more details.
Source
Lan H, Chen M, Flowers JB, Yandell BS, Stapleton DS, Mata CM, Mui E, Flowers MT, Schueler KL, Manly KF, Williams RW, Kendziorski C, and Attie AD (2006), "Combined expression trait correlations and expression quantitative trait locus mapping", PLoS Genetics, Vol. 2, e6.
References
Chun H and Keles S (2009), "Expression quantitative trait loci mapping with multivariate sparse partial least squares regression", Genetics, Vol. 182, pp. 79–90.
Examples
data(mice)
mice$x[1:5,1:5]
mice$y[1:5,1:5]
Plot the coefficient path of SPLS regression
Description
Provide the coefficient path plot of SPLS regression as a function of the number of hidden components (K) when eta is fixed.
Usage
## S3 method for class 'spls'
plot( x, yvar=c(1:ncol(x$y)), ... )
Arguments
x |
A fitted SPLS object. |
yvar |
Index vector of responses to be plotted. |
... |
Other parameters to be passed through to generic |
Details
plot.spls provides the coefficient path plot of SPLS fits. The plot shows how estimated coefficients change as a function of the number of hidden components (K), when eta is fixed at the value used by the original SPLS fit.
Value
NULL.
Author(s)
Dongjun Chung, Hyonho Chun, and Sunduz Keles.
References
Chun H and Keles S (2010), "Sparse partial least squares for simultaneous dimension reduction and variable selection", Journal of the Royal Statistical Society - Series B, Vol. 72, pp. 3–25.
See Also
print.spls, predict.spls, and coef.spls.
Examples
data(yeast)
# SPLS with eta=0.7 & 8 hidden components
f <- spls( yeast$x, yeast$y, K=8, eta=0.7 )
# Draw coefficient path plots for the first two responses
plot( f, yvar=c(1:2) )
Make predictions or extract coefficients from a fitted SGPLS model
Description
Make predictions or extract coefficients from a fitted SGPLS object.
Usage
## S3 method for class 'sgpls'
predict( object, newx, type = c("fit","coefficient"),
fit.type = c("class","response"), ... )
## S3 method for class 'sgpls'
coef( object, ... )
Arguments
object |
A fitted SGPLS object. |
newx |
If |
type |
If |
fit.type |
If |
... |
Any arguments for |
Details
Users can input either only selected variables or all variables for newx.
Value
Matrix of coefficient estimates if type="coefficient".
Matrix of predicted responses if type="fit" (responses will be predicted classes if fit.type="class" or predicted probabilities if fit.type="response").
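The relationship between the two fit.type outputs can be illustrated generically: a predicted class is the label with the highest predicted probability. The probability matrix below is made up, and the argmax rule with first-match tie-breaking is an assumption for illustration, not a statement about the internals of predict.sgpls.

```r
# Made-up class probabilities for two observations and classes 0, 1, 2.
prob <- matrix(c(0.7, 0.2, 0.1,
                 0.1, 0.3, 0.6),
               nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("0", "1", "2")))

# Column index of the largest probability, shifted so classes start at 0.
pred_class <- max.col(prob, ties.method = "first") - 1L
pred_class
```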
Author(s)
Dongjun Chung and Sunduz Keles.
References
Chung D and Keles S (2010), "Sparse partial least squares classification for high dimensional data", Statistical Applications in Genetics and Molecular Biology, Vol. 9, Article 17.
See Also
Examples
data(prostate)
# SGPLS with eta=0.55 & 3 hidden components
f <- sgpls( prostate$x, prostate$y, K=3, eta=0.55, scale.x=FALSE )
# Print out coefficients
coef.f <- coef(f)
coef.f[ coef.f!=0, ]
# Prediction on the training dataset
(pred.f <- predict( f, type="fit" ))
Make predictions or extract coefficients from a fitted SPLS model
Description
Make predictions or extract coefficients from a fitted SPLS object.
Usage
## S3 method for class 'spls'
predict( object, newx, type = c("fit","coefficient"), ... )
## S3 method for class 'spls'
coef( object, ... )
Arguments
object |
A fitted SPLS object. |
newx |
If |
type |
If |
... |
Any arguments for |
Details
Users can input either only selected variables or all variables for newx.
Value
Matrix of coefficient estimates if type="coefficient".
Matrix of predicted responses if type="fit".
Author(s)
Dongjun Chung, Hyonho Chun, and Sunduz Keles.
References
Chun H and Keles S (2010), "Sparse partial least squares for simultaneous dimension reduction and variable selection", Journal of the Royal Statistical Society - Series B, Vol. 72, pp. 3–25.
See Also
plot.spls and print.spls.
Examples
data(yeast)
# SPLS with eta=0.7 & 8 latent components
f <- spls( yeast$x, yeast$y, K=8, eta=0.7 )
# Coefficient estimates of the SPLS fit
coef.f <- coef(f)
coef.f[1:5,]
# Prediction on the training dataset
pred.f <- predict( f, type="fit" )
pred.f[1:5,]
Make predictions or extract coefficients from a fitted SPLSDA model
Description
Make predictions or extract coefficients from a fitted SPLSDA object.
Usage
## S3 method for class 'splsda'
predict( object, newx, type = c("fit","coefficient"),
fit.type = c("class","response"), ... )
## S3 method for class 'splsda'
coef( object, ... )
Arguments
object |
A fitted SPLSDA object. |
newx |
If |
type |
If |
fit.type |
If |
... |
Any arguments for |
Details
Users can input either only selected variables or all variables for newx.
Value
Matrix of coefficient estimates if type="coefficient".
Matrix of predicted responses if type="fit" (responses will be predicted classes if fit.type="class" or predicted probabilities if fit.type="response").
Author(s)
Dongjun Chung and Sunduz Keles.
References
Chung D and Keles S (2010), "Sparse partial least squares classification for high dimensional data", Statistical Applications in Genetics and Molecular Biology, Vol. 9, Article 17.
See Also
Examples
data(prostate)
# SPLSDA with eta=0.8 & 3 hidden components
f <- splsda( prostate$x, prostate$y, K=3, eta=0.8, scale.x=FALSE )
# Print out coefficients
coef.f <- coef(f)
coef.f[ coef.f!=0, ]
# Prediction on the training dataset
(pred.f <- predict( f, type="fit" ))
Print function for a SGPLS object
Description
Print the SGPLS fit, along with the number and the list of selected predictors.
Usage
## S3 method for class 'sgpls'
print( x, ... )
Arguments
x |
A fitted SGPLS object. |
... |
Additional arguments for generic |
Value
NULL.
Author(s)
Dongjun Chung and Sunduz Keles.
References
Chung D and Keles S (2010), "Sparse partial least squares classification for high dimensional data", Statistical Applications in Genetics and Molecular Biology, Vol. 9, Article 17.
See Also
predict.sgpls and coef.sgpls.
Examples
data(prostate)
# SGPLS with eta=0.55 & 3 hidden components
f <- sgpls( prostate$x, prostate$y, K=3, eta=0.55, scale.x=FALSE )
print(f)
Print function for a SPLS object
Description
Print the SPLS fit, along with the number and the list of selected predictors.
Usage
## S3 method for class 'spls'
print( x, ... )
Arguments
x |
A fitted SPLS object. |
... |
Additional arguments for generic |
Value
NULL.
Author(s)
Dongjun Chung, Hyonho Chun, and Sunduz Keles.
References
Chun H and Keles S (2010), "Sparse partial least squares for simultaneous dimension reduction and variable selection", Journal of the Royal Statistical Society - Series B, Vol. 72, pp. 3–25.
See Also
plot.spls, predict.spls, and coef.spls.
Examples
data(yeast)
# SPLS with eta=0.7 & 8 hidden components
f <- spls( yeast$x, yeast$y, K=8, eta=0.7 )
print(f)
Print function for a SPLSDA object
Description
Print the SPLSDA fit, along with the number and the list of selected predictors.
Usage
## S3 method for class 'splsda'
print( x, ... )
Arguments
x |
A fitted SPLSDA object. |
... |
Additional arguments for generic |
Value
NULL.
Author(s)
Dongjun Chung and Sunduz Keles.
References
Chung D and Keles S (2010), "Sparse partial least squares classification for high dimensional data", Statistical Applications in Genetics and Molecular Biology, Vol. 9, Article 17.
See Also
predict.splsda and coef.splsda.
Examples
data(prostate)
# SPLSDA with eta=0.8 & 3 hidden components
f <- splsda( prostate$x, prostate$y, K=3, eta=0.8, scale.x=FALSE )
print(f)
Prostate Tumor Gene Expression Dataset
Description
This is the Prostate Tumor Gene Expression dataset used in Chung and Keles (2010).
Usage
data(prostate)
Format
A list with two components:
- x
Gene expression data. A matrix with 102 rows and 6033 columns.
- y
Class index. A vector with 102 elements.
Details
The prostate dataset consists of 52 prostate tumor and 50 normal samples.
Normal and tumor classes are coded as 0 and 1, respectively, in the y vector. Matrix x is gene expression data; arrays were normalized, log transformed, and standardized to zero mean and unit variance across genes as described in Dettling (2004) and Dettling and Bühlmann (2002). See Chung and Keles (2010) for more details.
Source
Singh D, Febbo P, Ross K, Jackson D, Manola J, Ladd C, Tamayo P, Renshaw A, D'Amico A, Richie J, Lander E, Loda M, Kantoff P, Golub T, and Sellers W (2002), "Gene expression correlates of clinical prostate cancer behavior", Cancer Cell, Vol. 1, pp. 203–209.
References
Chung D and Keles S (2010), "Sparse partial least squares classification for high dimensional data", Statistical Applications in Genetics and Molecular Biology, Vol. 9, Article 17.
Dettling M (2004), "BagBoosting for tumor classification with gene expression data", Bioinformatics, Vol. 20, pp. 3583–3593.
Dettling M and Bühlmann P (2002), "Supervised clustering of genes", Genome Biology, Vol. 3, pp. research0069.1–0069.15.
Examples
data(prostate)
prostate$x[1:5,1:5]
prostate$y
Fit SGPLS classification models
Description
Fit a SGPLS classification model.
Usage
sgpls( x, y, K, eta, scale.x=TRUE,
eps=1e-5, denom.eps=1e-20, zero.eps=1e-5, maxstep=100,
br=TRUE, ftype='iden' )
Arguments
x |
Matrix of predictors. |
y |
Vector of class indices. |
K |
Number of hidden components. |
eta |
Thresholding parameter. |
scale.x |
Scale predictors by dividing each predictor variable by its sample standard deviation? |
eps |
An effective zero for change in estimates. Default is 1e-5. |
denom.eps |
An effective zero for denominators. Default is 1e-20. |
zero.eps |
An effective zero for success probabilities. Default is 1e-5. |
maxstep |
Maximum number of Newton-Raphson iterations. Default is 100. |
br |
Apply Firth's bias reduction procedure? |
ftype |
Type of Firth's bias reduction procedure.
Alternatives are |
Details
The SGPLS method is described in detail in Chung and Keles (2010).
SGPLS provides PLS-based classification with variable selection,
by incorporating sparse partial least squares (SPLS) proposed in Chun and Keles (2010)
into a generalized linear model (GLM) framework.
y is assumed to have numerical values 0, 1, ..., G, where G is the number of classes minus one.
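Class labels stored as text can be recoded to the 0, 1, ..., G convention before fitting. The snippet below is a minimal sketch with made-up labels; the level order passed to factor() decides which class becomes 0.

```r
# Made-up labels for illustration.
labels <- c("normal", "tumor", "tumor", "normal", "tumor")

# factor() codes start at 1, so subtract one to obtain 0, 1, ..., G.
y <- as.integer(factor(labels, levels = c("normal", "tumor"))) - 1L
G <- length(unique(y)) - 1L
y
G
```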
Value
A sgpls object is returned. The print, predict, and coef methods use this object.
Author(s)
Dongjun Chung and Sunduz Keles.
References
Chung D and Keles S (2010), "Sparse partial least squares classification for high dimensional data", Statistical Applications in Genetics and Molecular Biology, Vol. 9, Article 17.
Chun H and Keles S (2010), "Sparse partial least squares for simultaneous dimension reduction and variable selection", Journal of the Royal Statistical Society - Series B, Vol. 72, pp. 3–25.
See Also
print.sgpls, predict.sgpls, and coef.sgpls.
Examples
data(prostate)
# SGPLS with eta=0.6 & 3 hidden components
(f <- sgpls(prostate$x, prostate$y, K=3, eta=0.6, scale.x=FALSE))
# Print out coefficients
coef.f <- coef(f)
coef.f[coef.f!=0, ]
Fit SPLS regression models
Description
Fit a SPLS regression model.
Usage
spls( x, y, K, eta, kappa=0.5, select="pls2", fit="simpls",
scale.x=TRUE, scale.y=FALSE, eps=1e-4, maxstep=100, trace=FALSE)
Arguments
x |
Matrix of predictors. |
y |
Vector or matrix of responses. |
K |
Number of hidden components. |
eta |
Thresholding parameter. |
kappa |
Parameter to control the effect of
the concavity of the objective function
and the closeness of original and surrogate direction vectors.
|
select |
PLS algorithm for variable selection.
Alternatives are |
fit |
PLS algorithm for model fitting. Alternatives are
|
scale.x |
Scale predictors by dividing each predictor variable by its sample standard deviation? |
scale.y |
Scale responses by dividing each response variable by its sample standard deviation? |
eps |
An effective zero. Default is 1e-4. |
maxstep |
Maximum number of iterations when fitting direction vectors. Default is 100. |
trace |
Print out the progress of variable selection? |
Details
The SPLS method is described in detail in Chun and Keles (2010).
SPLS directly imposes sparsity on the dimension reduction step of PLS
in order to achieve accurate prediction and variable selection simultaneously.
The option select refers to the PLS algorithm for variable selection. The option fit refers to the PLS algorithm for model fitting, and spls utilizes algorithms offered by the pls package for this purpose. See the help files of the function plsr in the pls package for more details. The user should install the pls package before using spls functions. The choices for select and fit are independent.
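To give a feel for how a thresholding parameter such as eta can induce sparsity, the sketch below applies soft thresholding to a direction vector, with the cutoff set to a fraction eta of the largest absolute entry. This is an illustration under that assumption, not the package's internal routine; the function name soft_threshold and the vector w are hypothetical.

```r
# Hypothetical helper: shrink entries toward zero and drop the small ones.
soft_threshold <- function(w, eta) {
  cutoff <- eta * max(abs(w))          # assumption: cutoff relative to max
  sign(w) * pmax(abs(w) - cutoff, 0)
}

w <- c(0.9, -0.5, 0.2, -0.05)
w_sparse <- soft_threshold(w, eta = 0.5)
w_sparse
```

Under this rule, larger values of eta set more entries to zero, so fewer predictors contribute to the component.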
Value
A spls object is returned. The print, plot, predict, and coef methods, as well as ci.spls and coefplot.spls, use this object.
Author(s)
Dongjun Chung, Hyonho Chun, and Sunduz Keles.
References
Chun H and Keles S (2010), "Sparse partial least squares for simultaneous dimension reduction and variable selection", Journal of the Royal Statistical Society - Series B, Vol. 72, pp. 3–25.
See Also
print.spls, plot.spls, predict.spls, coef.spls, ci.spls, and coefplot.spls.
Examples
data(yeast)
# SPLS with eta=0.7 & 8 hidden components
(f <- spls(yeast$x, yeast$y, K=8, eta=0.7))
# Print out coefficients
coef.f <- coef(f)
coef.f[,1]
# Coefficient path plot
plot(f, yvar=1)
dev.new()
# Coefficient plot of selected variables
coefplot.spls(f, xvar=c(1:4))
Internal SPLS functions
Description
Internal SPLS functions.
Usage
heatmap.spls( mat, coln=16, as='n', ... )
spls.dv( Z, eta, kappa, eps, maxstep )
ust( b, eta )
correctp( x, y, eta, K, kappa, select, fit )
cv.split( y, fold )
wpls( x, y, V, K=ncol(x), type="pls1",
center.x=TRUE, scale.x=FALSE )
sgpls.binary( x, y, K, eta, scale.x=TRUE,
eps=1e-5, denom.eps=1e-20, zero.eps=1e-5, maxstep=100,
br=TRUE, ftype='iden' )
sgpls.multi( x, y, K, eta, scale.x=TRUE,
eps=1e-5, denom.eps=1e-20, zero.eps=1e-5, maxstep=100,
br=TRUE, ftype='iden' )
cv.sgpls.binary( x, y, fold=10, K, eta, scale.x=TRUE, plot.it=TRUE,
br=TRUE, ftype='iden', n.core=8 )
cv.sgpls.multi( x, y, fold=10, K, eta, scale.x=TRUE, plot.it=TRUE,
br=TRUE, ftype='iden', n.core=8 )
Details
These are not to be called by the user.
Author(s)
Dongjun Chung, Hyonho Chun, and Sunduz Keles.
References
Chung D and Keles S (2010), "Sparse partial least squares classification for high dimensional data", Statistical Applications in Genetics and Molecular Biology, Vol. 9, Article 17.
Chun H and Keles S (2010), "Sparse partial least squares for simultaneous dimension reduction and variable selection", Journal of the Royal Statistical Society - Series B, Vol. 72, pp. 3–25.
Fit SPLSDA classification models
Description
Fit a SPLSDA classification model.
Usage
splsda( x, y, K, eta, kappa=0.5,
classifier=c('lda','logistic'), scale.x=TRUE, ... )
Arguments
x |
Matrix of predictors. |
y |
Vector of class indices. |
K |
Number of hidden components. |
eta |
Thresholding parameter. |
kappa |
Parameter to control the effect of
the concavity of the objective function
and the closeness of original and surrogate direction vectors.
|
classifier |
Classifier used in the second step of SPLSDA.
Alternatives are |
scale.x |
Scale predictors by dividing each predictor variable by its sample standard deviation? |
... |
Other parameters to be passed through to |
Details
The SPLSDA method is described in detail in Chung and Keles (2010).
SPLSDA provides a two-stage approach for PLS-based classification with variable selection,
by directly imposing sparsity on the dimension reduction step of PLS
using sparse partial least squares (SPLS) proposed in Chun and Keles (2010).
y is assumed to have numerical values 0, 1, ..., G, where G is the number of classes minus one. The option classifier refers to the classifier used in the second step of SPLSDA, and splsda utilizes algorithms offered by the MASS and nnet packages for this purpose. If classifier="logistic", then either logistic regression or multinomial regression is used. Linear discriminant analysis (LDA) is used if classifier="lda". splsda also utilizes algorithms offered by the pls package for fitting spls. The user should install the pls, MASS, and nnet packages before using splsda functions.
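The two-stage idea (sparse dimension reduction followed by a standard classifier) can be sketched with base R alone. Everything in the block is an assumption made for illustration: the data are simulated, a single thresholded direction proportional to the covariance between predictors and class stands in for the SPLS step, and glm stands in for the logistic classifier. It is not the splsda implementation.

```r
# Simulated two-class data; only the first two predictors carry signal.
set.seed(1)
n <- 100
p <- 20
y <- rep(0:1, each = n / 2)
x <- matrix(rnorm(n * p), n, p)
x[, 1:2] <- x[, 1:2] + 1.5 * y

# Stage 1 (stand-in for SPLS): one sparse direction and its latent score.
xc <- scale(x, center = TRUE, scale = FALSE)
w <- drop(crossprod(xc, y - mean(y)))
w_sparse <- sign(w) * pmax(abs(w) - 0.5 * max(abs(w)), 0)
score <- drop(xc %*% w_sparse)

# Stage 2: logistic regression on the latent score.
fit <- glm(y ~ score, family = binomial)
pred <- as.integer(fitted(fit) > 0.5)
accuracy <- mean(pred == y)
accuracy
```

The accuracy here is computed on the training data, so it is an optimistic estimate.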
Value
A splsda object is returned. The print, predict, and coef methods use this object.
Author(s)
Dongjun Chung and Sunduz Keles.
References
Chung D and Keles S (2010), "Sparse partial least squares classification for high dimensional data", Statistical Applications in Genetics and Molecular Biology, Vol. 9, Article 17.
Chun H and Keles S (2010), "Sparse partial least squares for simultaneous dimension reduction and variable selection", Journal of the Royal Statistical Society - Series B, Vol. 72, pp. 3–25.
See Also
print.splsda, predict.splsda, and coef.splsda.
Examples
data(prostate)
# SPLSDA with eta=0.8 & 3 hidden components
f <- splsda( prostate$x, prostate$y, K=3, eta=0.8, scale.x=FALSE )
print(f)
# Print out coefficients
coef.f <- coef(f)
coef.f[ coef.f!=0, ]
Yeast Cell Cycle Dataset
Description
This is the Yeast Cell Cycle dataset used in Chun and Keles (2010).
Usage
data(yeast)
Format
A list with two components:
- x
ChIP-chip data. A matrix with 542 rows and 106 columns.
- y
Cell cycle gene expression data. A matrix with 542 rows and 18 columns.
Details
Matrix y is cell cycle gene expression data (Spellman et al., 1998) of 542 genes from an α factor based experiment. Each column corresponds to mRNA levels measured every 7 minutes over 119 minutes (a total of 18 measurements). Matrix x is the chromatin immunoprecipitation on chip (ChIP-chip) data of Lee et al. (2002) and it contains the binding information for 106 transcription factors. See Chun and Keles (2010) for more details.
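The sampling schedule can be checked with a line of arithmetic: 18 measurements spaced 7 minutes apart span 119 minutes. The sketch assumes the first measurement is taken at time 0, which the description above does not state explicitly.

```r
# Assumption: sampling starts at minute 0.
time_points <- seq(from = 0, by = 7, length.out = 18)
range(time_points)
length(time_points)
```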
Source
Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thomson CM, Simon I, Zeitlinger J, Jennings EG, Murray HL, Gordon DB, Ren B, Wyrick JJ, Tagne JB, Volkert TL, Fraenkel E, Gifford DK, and Young RA (2002), "Transcriptional regulatory networks in Saccharomyces cerevisiae", Science, Vol. 298, pp. 799–804.
Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, and Futcher B (1998), "Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization", Molecular Biology of the Cell, Vol. 9, pp. 3273–3279.
References
Chun H and Keles S (2010), "Sparse partial least squares for simultaneous dimension reduction and variable selection", Journal of the Royal Statistical Society - Series B, Vol. 72, pp. 3–25.
Examples
data(yeast)
yeast$x[1:5,1:5]
yeast$y[1:5,1:5]