class: center, middle # Basic usage of R - 1 ## Introductory Computer Programming ### Deepayan Sarkar
--- # Basics of using R R is more flexible than a regular calculator * In fact, R is a full programming language * Most standard data analysis methods are already implemented * Can be extended by writing add-on packages * Thousands of add-on packages are available --- # Major concepts * Variables (in the context of programming) * Data structures needed for data analyis * Functions (set of instructions for performing a procedure) --- # Variables * Variables are symbols that may be associated with different values * Computations involving variables are done using their current value ```r x <- 10 # assignment sqrt(x) ``` ``` [1] 3.162278 ``` ```r x <- -1 sqrt(x) ``` ``` Warning in sqrt(x): NaNs produced ``` ``` [1] NaN ``` ```r x <- -1+0i sqrt(x) ``` ``` [1] 0+1i ``` --- # Data structures for data analysis * Vectors * Matrices * Data frames (a spreadsheet-like data set) * Lists (general collection of objects) --- # Atomic vectors * Indexed collection of homogeneous scalars, can be * Numeric / Integer * Categorical (factor) * Character * Logical (`TRUE` / `FALSE`) * Missing values are allowed, indicated as `NA` * Elements are indexed starting from 1 * $i$th element of vector `x` can be extracted using `x[i]` * There are also more sophisticated forms of (vector) indexing --- # Atomic vectors: examples ```r month.name # built-in ``` ``` [1] "January" "February" "March" "April" "May" "June" "July" "August" [9] "September" "October" "November" "December" ``` ```r x <- rnorm(10) x ``` ``` [1] 0.83685412 -2.42704018 0.09692961 1.55453603 0.77415311 -0.92299041 0.53968367 0.13350713 [9] -1.44594243 0.18168865 ``` -- ```r str(x) # useful function ``` ``` num [1:10] 0.8369 -2.427 0.0969 1.5545 0.7742 ... ``` ```r str(month.name) ``` ``` chr [1:12] "January" "February" "March" "April" "May" "June" "July" "August" "September" ... ``` --- # Atomic vectors: examples ```r m <- sample(1:12, 30, rep = TRUE) m ``` ``` [1] 3 2 11 11 6 10 5 10 8 6 4 5 1 6 12 4 5 10 9 6 7 9 4 8 4 3 1 2 1 10 ``` ```r mf <- factor(m, levels = 1:12, labels = month.name) mf ``` ``` [1] March February November November June October May October August [10] June April May January June December April May October [19] September June July September April August April March January [28] February January October 12 Levels: January February March April May June July August September October ... December ``` ```r str(m) ``` ``` int [1:30] 3 2 11 11 6 10 5 10 8 6 ... ``` ```r str(mf) ``` ``` Factor w/ 12 levels "January","February",..: 3 2 11 11 6 10 5 10 8 6 ... ``` --- # Atomic vectors * "Scalars" are just vectors of length 1 ```r str(numeric(2)) ``` ``` num [1:2] 0 0 ``` ```r str(numeric(1)) ``` ``` num 0 ``` ```r str(0) ``` ``` num 0 ``` --- # Atomic vectors * Vectors can have length zero ```r numeric(0) ``` ``` numeric(0) ``` ```r logical(0) ``` ``` logical(0) ``` --- # Types of indexing * Indexing refers to extracting subsets of vectors (or other kinds of data) * R supports several kinds of indexing: * Indexing by a vector of positive integers * Indexing by a vector of negative integers * Indexing by a logical vector * Indexing by a vector of names --- # Types of indexing * The "standard" C-like indexing with a scalar (vector of length 1): ```r month.name[2] # the first index is 1, not 0 ``` ``` [1] "February" ``` * The "index" can also be an integer vector ```r month.name[c(2, 4, 6, 9, 11)] ``` ``` [1] "February" "April" "June" "September" "November" ``` * Elements can be repeated ```r month.name[c(2, 2, 6, 4, 6, 11)] ``` ``` [1] "February" "February" "June" "April" "June" "November" ``` --- # Types of indexing * "Out-of-bounds" indexing give `NA` (missing) ```r month.name[13] ``` ``` [1] NA ``` ```r seq(1, by = 2, length.out = 8) ``` ``` [1] 1 3 5 7 9 11 13 15 ``` ```r month.name[seq(1, by = 2, length.out = 8)] ``` ``` [1] "January" "March" "May" "July" "September" "November" NA NA ``` --- # Types of indexing * Negative integers omit the specified entries ```r month.name[-2] ``` ``` [1] "January" "March" "April" "May" "June" "July" "August" "September" [9] "October" "November" "December" ``` ```r month.name[-c(2, 4, 6, 9, 11)] ``` ``` [1] "January" "March" "May" "July" "August" "October" "December" ``` * Cannot be mixed with positive integers ```r month.name[c(2, -3)] ``` ``` Error in month.name[c(2, -3)]: only 0's may be mixed with negative subscripts ``` --- # Types of indexing * Zero has a special meaning - doesn't select anything ```r month.name[0] ``` ``` character(0) ``` ```r month.name[integer(0)] ## same as empty index ``` ``` character(0) ``` ```r month.name[c(1, 2, 0, 11, 12)] ``` ``` [1] "January" "February" "November" "December" ``` ```r month.name[-c(1, 2, 0, 11, 12)] ``` ``` [1] "March" "April" "May" "June" "July" "August" "September" "October" ``` --- # Types of indexing * Indexing by logical vector: select `TRUE` elements ```r month.name[c(TRUE, FALSE, FALSE)] # index replicated ``` ``` [1] "January" "April" "July" "October" ``` ```r i <- substring(month.name, 1, 1) == "J" i ``` ``` [1] TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE ``` ```r month.name[i] ``` ``` [1] "January" "June" "July" ``` --- # Types of indexing * Common use: extract subset satisfying a certain condition (also called "filtering") ```r (x <- rnorm(20)) ``` ``` [1] -2.851806639 0.711520932 -0.649975983 0.338946381 0.554063362 -1.856386243 -1.605922598 [8] -0.002954457 -0.670840417 -1.839289724 -1.222433477 0.466193505 -0.408518505 0.060178539 [15] 1.307296324 0.072137782 -4.085164248 1.385663548 -0.406905851 -0.722480560 ``` ```r x > 0 ``` ``` [1] FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE [17] FALSE TRUE FALSE FALSE ``` ```r x[x > 0] ``` ``` [1] 0.71152093 0.33894638 0.55406336 0.46619351 0.06017854 1.30729632 0.07213778 1.38566355 ``` ```r mean(x[x > 0]) ``` ``` [1] 0.612 ``` --- # Types of indexing * Sometimes logical indexing can be replaced by integer indexing using `which()` ```r i ``` ``` [1] TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE ``` ```r which(i) ``` ``` [1] 1 6 7 ``` ```r month.name[ which(i) ] ``` ``` [1] "January" "June" "July" ``` ```r month.name[ -which(i) ] # same as month.name[ !i ] ``` ``` [1] "February" "March" "April" "May" "August" "September" "October" "November" [9] "December" ``` --- # Types of indexing * But be careful about zero-length indices ```r which(substring(month.name, 1, 1) == "B") ``` ``` integer(0) ``` ```r month.name[ which( substring(month.name, 1, 1) == "B") ] ``` ``` character(0) ``` ```r -which(substring(month.name, 1, 1) == "B") ``` ``` integer(0) ``` ```r month.name[ -which( substring(month.name, 1, 1) == "B") ] ``` ``` character(0) ``` --- class: center middle # Questions?