class: center, middle # Replicate and the Apply Family of Functions - 2 ## Introductory Computer Programming ### Deepayan Sarkar
--- # The `*apply()` family of functions * `replicate()` re-evaluates the same expression multiple times * We often want to something similar: - Evaluate the same expression multiple times, but with different arguments * This is what the various `*apply()` functions are intended for --- layout: true # The `lapply()` function --- * The most basic apply-type function is `lapply()` * It takes two mandatory arguments: `lapply(X, FUN)` - `X` is a vector object, with `i`-th element given by `X[[i]]` - `FUN` is a function * The result is a _list_ object, with the same length as `X` * The `i`-th element of the result is `FUN(X[[i]])` --- * Example: ```r lapply(1:5, sample) ``` ``` [[1]] [1] 1 [[2]] [1] 1 2 [[3]] [1] 2 1 3 [[4]] [1] 4 1 2 3 [[5]] [1] 2 5 1 4 3 ``` * This is essentially a "vectorized" version of the for loop ```r ans <- list() for (i in 1:5) ans[[i]] <- sample(i) # sample(n) produces a random permutation of 1:n ``` --- * Unlike `replicate()`, the second argument of `lapply()` is not an expression * In other words, `lapply()` does not do non-standard evaluation — whereas `replicate()` does * `lapply()` _always_ returns a list — similar to `simplify = FALSE` in `replicate()` -- * The somewhat novel feature of `lapply()` is that the `FUN` argument must be a _function_ * This is actually another important difference between R and C: - Functions are not "special" - They can be assigned to variables - They can be supplied as arguments to other functions - They can even be "anonymous", i.e., they don't need to be associated with a name - In short, they behave just as any other conventional object (vectors, lists, etc.) --- * In fact, it is expected that R users will be able to write short functions when necessary * Consider our previous example: ```r lapply(1:5, sample) ``` * This is conceptually equivalent to ```r ans <- list() for (i in 1:5) ans[[i]] <- sample(i) ``` * But suppose we instead want ```r for (i in 1:5) ans[[i]] <- sample(10, i, replace = TRUE) ``` --- * `FUN = sample` will no longer serve our purpose, but we can instead define ```r sample_of_size <- function(i) { return(sample(10, i, replace = TRUE)) } lapply(1:5, FUN = sample_of_size) ``` ``` [[1]] [1] 7 [[2]] [1] 7 10 [[3]] [1] 5 9 4 [[4]] [1] 9 9 3 6 [[5]] [1] 8 9 4 9 1 ``` --- * In fact, there is no need to _name_ the function by storing it in a variable ```r lapply(1:5, FUN = function(i) sample(10, i, replace = TRUE)) ``` ``` [[1]] [1] 1 [[2]] [1] 9 6 [[3]] [1] 4 9 3 [[4]] [1] 10 1 6 3 [[5]] [1] 2 7 7 2 9 ``` * This idea is often a big conceptual stumbling block for new R users --- * A potentially useful feature of `lapply()`: ```r ans <- lapply(X, FUN, ...) # with further arguments ``` * is equivalent to ```r for (i in seq_len(length(X))) ans[[i]] <- FUN(X[[i]], ...) ``` --- * For example, this allows ```r str(sample) ``` ``` function (x, size, replace = FALSE, prob = NULL) ``` ```r lapply(1:5, FUN = sample, size = 20, replace = TRUE) ``` ``` [[1]] [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [[2]] [1] 1 1 1 2 2 1 1 1 1 2 2 1 1 2 1 2 1 1 2 1 [[3]] [1] 1 2 3 1 1 3 1 2 3 3 1 1 2 1 1 1 1 3 2 3 [[4]] [1] 2 1 3 2 1 3 2 3 2 3 1 3 3 4 4 2 4 3 1 1 [[5]] [1] 4 2 2 2 3 1 5 2 2 5 5 4 5 5 5 2 3 2 2 5 ``` --- * As named arguments match in preference to position, we can even do ```r lapply(1:6, FUN = sample, x = month.abb, replace = TRUE) # sample(x = month.abb, size = i, replace = TRUE) ``` ``` [[1]] [1] "Mar" [[2]] [1] "Dec" "Apr" [[3]] [1] "Sep" "Jun" "Mar" [[4]] [1] "Jan" "Jan" "Dec" "Mar" [[5]] [1] "Sep" "Jan" "Mar" "Aug" "Dec" [[6]] [1] "Apr" "May" "Sep" "May" "May" "Oct" ``` --- layout: false class: middle, center # Questions?