class: center, middle # Basic usage of R - 2 ## Introductory Computer Programming ### Deepayan Sarkar
--- # Lists * Lists are vectors with arbitrary types of components * May or may not have names * Usual vector indexing by `[` works in the usual way * Individual elements can be extracted either by name (`$`) or by position `x[[i]]` --- # Lists: examples ```r m <- sample(1:12, 30, rep = TRUE) mf <- factor(m, levels = 1:12, labels = month.name) ml <- list(imonth = m, fmonth = mf) str(ml) ``` ``` List of 2 $ imonth: int [1:30] 4 2 9 7 3 4 3 12 8 11 ... $ fmonth: Factor w/ 12 levels "January","February",..: 4 2 9 7 3 4 3 12 8 11 ... ``` -- ```r ml$imonth ``` ``` [1] 4 2 9 7 3 4 3 12 8 11 4 12 11 1 3 12 5 3 6 6 4 3 11 3 11 6 11 9 2 12 ``` -- ```r ml[[2]] ``` ``` [1] April February September July March April March December August [10] November April December November January March December May March [19] June June April March November March November June November [28] September February December 12 Levels: January February March April May June July August September October ... December ``` --- # Lists: examples * Lists are commonly used to return analysis results (more details later) ```r tt <- t.test(rnorm(100)) str(tt, give.attr = FALSE) ``` ``` List of 10 $ statistic : Named num 1.19 $ parameter : Named num 99 $ p.value : num 0.235 $ conf.int : num [1:2] -0.0843 0.3392 $ estimate : Named num 0.127 $ null.value : Named num 0 $ stderr : num 0.107 $ alternative: chr "two.sided" $ method : chr "One Sample t-test" $ data.name : chr "rnorm(100)" ``` ```r tt$p.value ``` ``` [1] 0.235087 ``` --- layout: true # Common structures for statistical data --- * Vectors, matrices / arrays: vectors with dimension ```r VADeaths ``` ``` Rural Male Rural Female Urban Male Urban Female 50-54 11.7 8.7 15.4 8.4 55-59 18.1 11.7 24.3 13.6 60-64 26.9 20.3 37.0 19.3 65-69 41.0 30.9 54.6 35.1 70-74 66.0 54.3 71.1 50.0 ``` ```r dim(VADeaths) ``` ``` [1] 5 4 ``` --- * Indexing works in same way, but in two dimensions ```r VADeaths[1:2, c(2, 3)] ``` ``` Rural Female Urban Male 50-54 8.7 15.4 55-59 11.7 24.3 ``` * Example: Indexing by "empty" index (selects all) and name ```r VADeaths[, c("Rural Male", "Rural Female")] ``` ``` Rural Male Rural Female 50-54 11.7 8.7 55-59 18.1 11.7 60-64 26.9 20.3 65-69 41.0 30.9 70-74 66.0 54.3 ``` --- * Data frames: lists that also behave like a matrix ```r str(mtcars) ``` ``` 'data.frame': 32 obs. of 11 variables: $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ... $ cyl : num 6 6 4 6 8 6 8 4 4 6 ... $ disp: num 160 160 108 258 360 ... $ hp : num 110 110 93 110 175 105 245 62 95 123 ... $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ... $ wt : num 2.62 2.88 2.32 3.21 3.44 ... $ qsec: num 16.5 17 18.6 19.4 17 ... $ vs : num 0 0 1 1 0 1 0 1 1 1 ... $ am : num 1 1 1 0 0 0 0 0 0 0 ... $ gear: num 4 4 4 3 3 3 3 4 4 4 ... $ carb: num 4 4 1 1 2 1 4 2 2 4 ... ``` ```r dim(mtcars) ``` ``` [1] 32 11 ``` --- layout: true # Data Frames --- * Rectangular (matrix-like) structure * Each column is (usually) a vector * Different columns can have different types (unlike a matrix) * Every column must have the same length * Every column must have a name -- * Most built-in data sets in R are data frames --- * List-like behaviour: Columns can be extracted like a list ```r mtcars$mpg ``` ``` [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4 14.7 32.4 30.4 [20] 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4 ``` * Vector indexing extracts list elements ```r head(mtcars[c(1, 4, 7)]) ``` ``` mpg hp qsec Mazda RX4 21.0 110 16.46 Mazda RX4 Wag 21.0 110 17.02 Datsun 710 22.8 93 18.61 Hornet 4 Drive 21.4 110 19.44 Hornet Sportabout 18.7 175 17.02 Valiant 18.1 105 20.22 ``` --- * Matrix-like behaviour: Two-dimensional indexing ```r mtcars[1:6, c(1, 4, 7)] ``` ``` mpg hp qsec Mazda RX4 21.0 110 16.46 Mazda RX4 Wag 21.0 110 17.02 Datsun 710 22.8 93 18.61 Hornet 4 Drive 21.4 110 19.44 Hornet Sportabout 18.7 175 17.02 Valiant 18.1 105 20.22 ``` ```r mtcars[sample(nrow(mtcars), 6), c("mpg", "wt", "am", "gear")] ``` ``` mpg wt am gear AMC Javelin 15.2 3.435 0 3 Fiat 128 32.4 2.200 1 4 Merc 450SE 16.4 4.070 0 3 Lincoln Continental 10.4 5.424 0 3 Pontiac Firebird 19.2 3.845 0 3 Chrysler Imperial 14.7 5.345 0 3 ``` --- layout: true # Data input / output --- * Statistical data are usually structured like a spreadsheet (e.g., Excel) * Typical approach: read data from spreadsheet file into data frame * Easiest route: * R itself cannot read Excel files directly * Save as CSV file from Excel * Read with `read.csv()` or `read.table()` (more flexible) * Alternative option: * Use "Import Dataset" menu item in R Studio (requires add-on packages) --- * Data frames can be exported as a spreadsheet file using `write.csv()` or `write.table()` ```r data(Cars93, package = "MASS") write.csv(Cars93, file = "temp/cars93.csv") # export ``` * This creates a [text file](temp/cars93.csv) (that can be easily edited in Excel / LibreOffice) .scrollable300[ ``` cat: cars93.csv: No such file or directory ``` ] --- * We can use `read.csv()` to import it again .scrollable400[ ```r cars <- read.csv("temp/cars93.csv") # import (path relative to working directory) str(cars) ``` ``` 'data.frame': 93 obs. of 28 variables: $ X : int 1 2 3 4 5 6 7 8 9 10 ... $ Manufacturer : chr "Acura" "Acura" "Audi" "Audi" ... $ Model : chr "Integra" "Legend" "90" "100" ... $ Type : chr "Small" "Midsize" "Compact" "Midsize" ... $ Min.Price : num 12.9 29.2 25.9 30.8 23.7 14.2 19.9 22.6 26.3 33 ... $ Price : num 15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ... $ Max.Price : num 18.8 38.7 32.3 44.6 36.2 17.3 21.7 24.9 26.3 36.3 ... $ MPG.city : int 25 18 20 19 22 22 19 16 19 16 ... $ MPG.highway : int 31 25 26 26 30 31 28 25 27 25 ... $ AirBags : chr "None" "Driver & Passenger" "Driver only" "Driver & Passenger" ... $ DriveTrain : chr "Front" "Front" "Front" "Front" ... $ Cylinders : chr "4" "6" "6" "6" ... $ EngineSize : num 1.8 3.2 2.8 2.8 3.5 2.2 3.8 5.7 3.8 4.9 ... $ Horsepower : int 140 200 172 172 208 110 170 180 170 200 ... $ RPM : int 6300 5500 5500 5500 5700 5200 4800 4000 4800 4100 ... $ Rev.per.mile : int 2890 2335 2280 2535 2545 2565 1570 1320 1690 1510 ... $ Man.trans.avail : chr "Yes" "Yes" "Yes" "Yes" ... $ Fuel.tank.capacity: num 13.2 18 16.9 21.1 21.1 16.4 18 23 18.8 18 ... $ Passengers : int 5 5 5 6 4 6 6 6 5 6 ... $ Length : int 177 195 180 193 186 189 200 216 198 206 ... $ Wheelbase : int 102 115 102 106 109 105 111 116 108 114 ... $ Width : int 68 71 67 70 69 69 74 78 73 73 ... $ Turn.circle : int 37 38 37 37 39 41 42 45 41 43 ... $ Rear.seat.room : num 26.5 30 28 31 27 28 30.5 30.5 26.5 35 ... $ Luggage.room : int 11 15 14 17 13 16 17 21 14 18 ... $ Weight : int 2705 3560 3375 3405 3640 2880 3470 4105 3495 3620 ... $ Origin : chr "non-USA" "non-USA" "non-USA" "non-USA" ... $ Make : chr "Acura Integra" "Acura Legend" "Audi 90" "Audi 100" ... ``` ] --- * These are the most basic data input / output functions * There are many other other specialized functions * Low-level utilties: `scan()`, `readLines()`, `readChar()`, `readBin()` * Various packages provide import / export to formats used by other software * R has its own "serialization" format using `save()` and `load()` --- class: center middle # Questions?