class: center, middle

# Basic usage of R - 4

## Introductory Computer Programming

### Deepayan Sarkar

---
layout: true

# Class, generic functions, and methods

---

* The class of an object is fundamental to how R works
* Every object in R must have a class
* This is true even if the object does not have a class attribute

```r
class(colnames)
```

```
[1] "function"
```

```r
attr(colnames, "class")
```

```
NULL
```

```r
class(VADeaths)
```

```
[1] "matrix" "array" 
```

```r
attr(VADeaths, "class")
```

```
NULL
```

---

* The main use of the class of an object is in how _generic functions_ behave
* Generic functions are intended to perform general tasks, like
    - `print()`
    - `plot()`
    - `summary()`
* But the details of what these functions should do depend on the input

---

```r
print(VADeaths)
```

```
      Rural Male Rural Female Urban Male Urban Female
50-54       11.7          8.7       15.4          8.4
55-59       18.1         11.7       24.3         13.6
60-64       26.9         20.3       37.0         19.3
65-69       41.0         30.9       54.6         35.1
70-74       66.0         54.3       71.1         50.0
```

```r
fm1 <- lm(mpg ~ wt, mtcars, subset = (am == 0))
print(fm1)
```

```

Call:
lm(formula = mpg ~ wt, data = mtcars, subset = (am == 0))

Coefficients:
(Intercept)           wt  
     31.416       -3.786  

```

---

```r
summary(VADeaths)
```

```
   Rural Male     Rural Female     Urban Male     Urban Female  
 Min.   :11.70   Min.   : 8.70   Min.   :15.40   Min.   : 8.40  
 1st Qu.:18.10   1st Qu.:11.70   1st Qu.:24.30   1st Qu.:13.60  
 Median :26.90   Median :20.30   Median :37.00   Median :19.30  
 Mean   :32.74   Mean   :25.18   Mean   :40.48   Mean   :25.28  
 3rd Qu.:41.00   3rd Qu.:30.90   3rd Qu.:54.60   3rd Qu.:35.10  
 Max.   :66.00   Max.   :54.30   Max.   :71.10   Max.   :50.00  
```

---

```r
summary(fm1)
```

```

Call:
lm(formula = mpg ~ wt, data = mtcars, subset = (am == 0))

Residuals:
    Min      1Q  Median      3Q     Max 
-3.6004 -1.5227 -0.2168  1.4816  5.0610 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  31.4161     2.9467  10.661 6.01e-09 ***
wt           -3.7859     0.7666  -4.939 0.000125 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.528 on 17 degrees of freedom
Multiple R-squared:  0.5893, Adjusted R-squared:  0.5651 
F-statistic: 24.39 on 1 and 17 DF,  p-value: 0.0001246
```

---

* But suppose we make a copy of `fm1` and remove the class attribute from it

```r
fm2 <- fm1
class(fm2) <- NULL
class(fm1)
```

```
[1] "lm"
```

```r
class(fm2)
```

```
[1] "list"
```

* `fm1` and `fm2` represent the same model fit
* But the different class means that `print()` and `summary()` behave differently

---

```r
summary(fm2)
```

```
              Length Class      Mode   
coefficients   2     -none-     numeric
residuals     19     -none-     numeric
effects       19     -none-     numeric
rank           1     -none-     numeric
fitted.values 19     -none-     numeric
assign         2     -none-     numeric
qr             5     qr         list   
df.residual    1     -none-     numeric
xlevels        0     -none-     list   
call           4     -none-     call   
terms          3     terms      call   
model          2     data.frame list   
```

---

.scrollable400[

```r
print(fm2)
```

```
$coefficients
(Intercept)          wt 
  31.416055   -3.785908 

$residuals
     Hornet 4 Drive   Hornet Sportabout             Valiant          Duster 360           Merc 240D 
        2.155637322         0.307466517        -0.216815332        -3.600365503         5.060989634 
           Merc 230            Merc 280           Merc 280C          Merc 450SE          Merc 450SL 
        3.309553333         0.807466517        -0.592533483         0.392588263         0.005379702 
        Merc 450SLC  Cadillac Fleetwood Lincoln Continental   Chrysler Imperial       Toyota Corona 
       -1.905324922        -1.140040848        -0.481292938         3.519620367        -0.583793327 
   Dodge Challenger         AMC Javelin          Camaro Z28    Pontiac Firebird 
       -2.589660880        -3.211463020        -3.578170470         2.340759068 

$effects
 (Intercept)           wt                                                                  
-74.74364610 -12.48679188  -0.61354033  -4.00004944   4.67152748   2.92116716   0.41127950 
                                                                                           
 -0.98872050  -0.02054539  -0.39860815  -2.31065774  -1.58491583  -0.93084842   3.07218994 
                                                                 
 -0.95375339  -2.98799985  -3.60751554  -3.98511725   1.93367779 

$rank
[1] 2

$fitted.values
     Hornet 4 Drive   Hornet Sportabout             Valiant          Duster 360           Merc 240D 
           19.24436            18.39253            18.31682            17.90037            19.33901 
           Merc 230            Merc 280           Merc 280C          Merc 450SE          Merc 450SL 
           19.49045            18.39253            18.39253            16.00741            17.29462 
        Merc 450SLC  Cadillac Fleetwood Lincoln Continental   Chrysler Imperial       Toyota Corona 
           17.10532            11.54004            10.88129            11.18038            22.08379 
   Dodge Challenger         AMC Javelin          Camaro Z28    Pontiac Firebird 
           18.08966            18.41146            16.87817            16.85924 

$assign
[1] 0 1

$qr
$qr
                    (Intercept)           wt
Hornet 4 Drive       -4.3588989 -16.42823129
Hornet Sportabout     0.2294157   3.29822949
Valiant               0.2294157   0.06231675
Duster 360            0.2294157   0.02896552
Merc 240D             0.2294157   0.14417885
Merc 230              0.2294157   0.15630657
Merc 280              0.2294157   0.06838061
Merc 280C             0.2294157   0.06838061
Merc 450SE            0.2294157  -0.12263097
Merc 450SL            0.2294157  -0.01954535
Merc 450SLC           0.2294157  -0.03470500
Cadillac Fleetwood    0.2294157  -0.48039867
Lincoln Continental   0.2294157  -0.53315425
Chrysler Imperial     0.2294157  -0.50920200
Toyota Corona         0.2294157   0.36399375
Dodge Challenger      0.2294157   0.04412517
AMC Javelin           0.2294157   0.06989657
Camaro Z28            0.2294157  -0.05289658
Pontiac Firebird      0.2294157  -0.05441255
attr(,"assign")
[1] 0 1

$qraux
[1] 1.229416 1.068381

$pivot
[1] 1 2

$tol
[1] 1e-07

$rank
[1] 2

attr(,"class")
[1] "qr"

$df.residual
[1] 17

$xlevels
named list()

$call
lm(formula = mpg ~ wt, data = mtcars, subset = (am == 0))

$terms
mpg ~ wt
attr(,"variables")
list(mpg, wt)
attr(,"factors")
    wt
mpg  0
wt   1
attr(,"term.labels")
[1] "wt"
attr(,"order")
[1] 1
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
attr(,"predvars")
list(mpg, wt)
attr(,"dataClasses")
      mpg        wt 
"numeric" "numeric" 

$model
                     mpg    wt
Hornet 4 Drive      21.4 3.215
Hornet Sportabout   18.7 3.440
Valiant             18.1 3.460
Duster 360          14.3 3.570
Merc 240D           24.4 3.190
Merc 230            22.8 3.150
Merc 280            19.2 3.440
Merc 280C           17.8 3.440
Merc 450SE          16.4 4.070
Merc 450SL          17.3 3.730
Merc 450SLC         15.2 3.780
Cadillac Fleetwood  10.4 5.250
Lincoln Continental 10.4 5.424
Chrysler Imperial   14.7 5.345
Toyota Corona       21.5 2.465
Dodge Challenger    15.5 3.520
AMC Javelin         15.2 3.435
Camaro Z28          13.3 3.840
Pontiac Firebird    19.2 3.845
```

]

* This kind of customized output is achieved by _methods_

---

* Methods are specific implementations of a generic function customized to its input
* The appropriate method is chosen by looking at the _class_ of the input argument

--

* The methods available for a generic function can be obtained using the `methods()` function

```r
methods("summary")
```

```
 [1] summary.aov                    summary.aovlist*               summary.aspell*               
 [4] summary.check_packages_in_dir* summary.connection             summary.data.frame            
 [7] summary.Date                   summary.default                summary.ecdf*                 
[10] summary.factor                 summary.glm                    summary.infl*                 
[13] summary.lm                     summary.loess*                 summary.manova                
[16] summary.matrix                 summary.mlm*                   summary.nls*                  
[19] summary.packageStatus*         summary.POSIXct                summary.POSIXlt               
[22] summary.ppr*                   summary.prcomp*                summary.princomp*             
[25] summary.proc_time              summary.srcfile                summary.srcref                
[28] summary.stepfun                summary.stl*                   summary.table                 
[31] summary.tukeysmooth*           summary.warnings              
see '?methods' for accessing help and source code
```

```r
methods("print") # similar but much longer list
```

---

* All available methods for a given class can be similarly obtained

```r
methods(class = "lm")
```

```
 [1] add1           alias          anova          case.names     coerce         confint       
 [7] cooks.distance deviance       dfbeta         dfbetas        drop1          dummy.coef    
[13] effects        extractAIC     family         formula        hatvalues      influence     
[19] initialize     kappa          labels         logLik         model.frame    model.matrix  
[25] nobs           plot           predict        print          proj           qr            
[31] residuals      rstandard      rstudent       show           simulate       slotsFromS3   
[37] summary        variable.names vcov          
see '?methods' for accessing help and source code
```

---

* The name of a specific method appears to have the form `generic.class`
* However, one should always call the generic function, not the method directly
* This is not OK:

```r
summary.lm(fm1)
```

* Instead, use

```r
summary(fm1)
```

* In fact, many methods cannot be called directly because they are "hidden" (these are the ones marked with `*` in the `methods()` output)

---
layout: false

# Getting help

* R has an extensive collection of functions (even more if we include add-on packages)
* It is impossible for anyone to know them all, or to remember all the details
* Fortunately, R also has an excellent help system

--

* Every function and dataset in R (and add-on packages) must be documented
* The documentation can be accessed by `help(name)` or `?name`
* For example: `help(seq)`, `help(summary)`, etc.
* A more general (but limited) search can be performed using `help.search("search-string")`

---

# Getting help

* How the help is shown depends on the _interface_ being used
* RStudio has a separate help tab (which also allows searching)

--

* However, before using the help system, you should know a bit about how methods are documented

---

# Help for generic functions and methods

* Generic functions and methods are distinct functions
* They often have different help pages
* In fact, many add-on packages define new methods for generics in another package
* These are always documented in a separate help page

---

# Help for generic functions and methods

* To get help for the generic function `summary()`, type `help(summary)`
* To get help for the `summary()` method for "matrix" objects, type `help(summary.matrix)`
* To get help for the `summary()` method for "lm" objects, type `help(summary.lm)`
* The first two happen to be the same help page, but the third is different

--

* This is slightly confusing because you are __not__ supposed to call `summary.lm()` directly
* More importantly, there may not actually be a `summary()` method for every class
* For example, "list" objects are handled by a _fallback_ method `summary.default()`

--

* The list of available methods is obtained by `methods("summary")`, as shown earlier
* All of these should have a corresponding help page

---

# Help for generic functions and methods

- The system we described is called "S3" (short for "S version 3")
- The documentation refers to specific methods implemented using this system as "S3 methods"
- To make things more complicated, there is also a different system for defining classes and methods
- We will skip the details of this system (called "S4") for now

---
class: center middle

# Questions?
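
---

# Sketch: S3 dispatch on a toy class

* The dispatch and fallback behaviour described in these slides can be tried out on a small made-up example
* The generic `describe()` and the class `"temperature"` below are hypothetical names invented for this sketch; they are not part of R or any package
* The sketch assumes only standard base R functions: `UseMethod()`, `structure()`, `unclass()`, `format()`, and `paste()`

```r
# Hypothetical generic: UseMethod() chooses a method based on class(x)
describe <- function(x, ...) UseMethod("describe")

# Fallback method, used when no class-specific method exists
describe.default <- function(x, ...) {
    paste("An object of class", class(x)[1])
}

# Method for the made-up class "temperature" (name follows generic.class)
describe.temperature <- function(x, ...) {
    paste(format(unclass(x)), "degrees Celsius")
}

# Attach a class attribute to an ordinary number
temp <- structure(21.5, class = "temperature")

describe(temp)          # dispatches to describe.temperature()
describe(unclass(temp)) # class attribute removed: falls back to describe.default()
describe(list(1, 2))    # no describe.list() exists: falls back to describe.default()
```

* As with `fm1` and `fm2`, the same underlying value is handled differently once its class attribute is removed
* After running this, `methods("describe")` should list both methods, and the calls always go through the generic rather than `describe.temperature()` directly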