class: center, middle

# Non-Standard Evaluation - 2

## Introductory Computer Programming

### Deepayan Sarkar

---

layout: true

# Recap

---

* Lazy evaluation + `substitute()` lets a function obtain the expression supplied to an argument

* A slightly modified version of an example from last time

```r
withCall <- function(e) return(list(value = e, call = substitute(e)))
withCall(sqrt(10))
```

```
$value
[1] 3.162278

$call
sqrt(10)
```

```r
withCall(rbind(rnorm(5), rexp(5)))
```

```
$value
           [,1]        [,2]     [,3]       [,4]      [,5]
[1,] -0.2111153 -0.06793764 1.189678 0.06567616 0.9932661
[2,]  0.3719345  0.84920208 1.885201 1.47168561 4.1290767

$call
rbind(rnorm(5), rexp(5))
```

---

* For example, this is how `plot()` constructs nice axis labels

```r
x <- seq(-10, 10, length.out = 201)
plot(x = x, y = x * sin(x), type = "l")
```

![](figures/nseval-1-unnamed-chunk-14-1.png)

---

layout: true

# Non-standard evaluation

---

* Another common use of this technique is to simplify variable references

* Consider the following method to list all `choose(5, 2)` combinations

```r
g <- expand.grid(a = 1:5, b = 1:5)
g$b > g$a
```

```
 [1] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE
[17]  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE
```

```r
g[ g$b > g$a , ]
```

```
   a b
6  1 2
11 1 3
12 2 3
16 1 4
17 2 4
18 3 4
21 1 5
22 2 5
23 3 5
24 4 5
```

---

* Referring to all variables in `g` using `g$...` is cumbersome

* We can in principle simplify this (referring to `g` only once) using

```r
g[ eval(quote(b > a), g), ]
```

```
   a b
6  1 2
11 1 3
12 2 3
16 1 4
17 2 4
18 3 4
21 1 5
22 2 5
23 3 5
24 4 5
```

---

* Several R functions make this even simpler

* A general purpose (generic) function is `with()`

```r
with
```

```
function (data, expr, ...)
UseMethod("with")
```

```r
with.default
```

```
function (data, expr, ...)
eval(substitute(expr), data, enclos = parent.frame())
```

```r
with(g, b > a) # equivalent to eval(quote(b > a), envir = g)
```

```
 [1] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE
[17]  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE
```

---

* This can be used as follows

```r
g[ with(g, b > a), ]
```

```
   a b
6  1 2
11 1 3
12 2 3
16 1 4
17 2 4
18 3 4
21 1 5
22 2 5
23 3 5
24 4 5
```

---

* For this particular purpose, there is an even more convenient function

```r
subset(g, b > a)
```

```
   a b
6  1 2
11 1 3
12 2 3
16 1 4
17 2 4
18 3 4
21 1 5
22 2 5
23 3 5
24 4 5
```

---

* A popular add-on package for data manipulation, called [dplyr](https://dplyr.tidyverse.org/), has a similar function `filter()`

```r
dplyr::filter(g, b > a)
```

```
   a b
1  1 2
2  1 3
3  2 3
4  1 4
5  2 4
6  3 4
7  1 5
8  2 5
9  3 5
10 4 5
```

* [dplyr](https://dplyr.tidyverse.org/) has several other functions that behave similarly

* It is highly recommended for routine data manipulation tasks

---

* Other functions that support similar features are

    - `transform()`, `dplyr::mutate()` to add new variables that are functions of existing variables

    - `dplyr::select()` to select a subset of columns (which `subset()` can also do)

    - `lm()` and many other modeling functions

    - `xtabs()` for creating cross-tabulations

    - Data visualization packages like lattice and ggplot2

---

* Another example:

```r
suppressMessages(library(dplyr))
mtcars <- mutate(mtcars, am = factor(am))
summarise(group_by(mtcars, am), mean_mpg = mean(mpg))
```

```
`summarise()` ungrouping output (override with `.groups` argument)
```

```
# A tibble: 2 x 2
  am    mean_mpg
1 0         17.1
2 1         24.4
```

* Equivalent calculation without using add-on packages

```r
with(mtcars, tapply(mpg, am, FUN = mean))
```

```
       0        1
17.14737 24.39231
```

---

* A more complicated example:

```r
summarise(group_by(mtcars, am),
          ols_coef = coef(lm(mpg ~ wt)),
          which = c("intercept", "slope"))
```

```
`summarise()` regrouping output by 'am' (override with `.groups` argument)
```

```
# A tibble: 4 x 3
# Groups:   am [2]
  am    ols_coef which
1 0        31.4  intercept
2 0        -3.79 slope
3 1        46.3  intercept
4 1        -9.08 slope
```

* Equivalent calculation without using add-on packages

```r
sapply(split(mtcars, mtcars$am),
       function(d) coef(lm(mpg ~ wt, data = d)))
```

```
                    0         1
(Intercept) 31.416055 46.294478
wt          -3.785908 -9.084268
```

---

* Of course, all these can be done using C-style loops, but that is much more complicated and error-prone

* I will not explore dplyr any further, but you should learn more about it online

* The `*apply()` type functions are generally useful, and we will discuss them in more detail

---

* Another example: replicating a simulation experiment

* Consider this example from our previous assignment:

```r
n <- 20
x <- sample(n, 2)
max(x) %% min(x) == 0
```

```
[1] FALSE
```

* To estimate the probability of this event, we want to run it several times and compute the proportion of successes

---

* We have seen how to do this using a loop

* An alternative approach is to use the `replicate()` function

```r
replicate(10,
{
    x <- sample(n, 2)
    max(x) %% min(x) == 0
})
```

```
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
```

* `replicate()` takes the _expression_ supplied as its second argument and evaluates it multiple times

```r
s <- replicate(5000,
{
    x <- sample(n, 2)
    max(x) %% min(x) == 0
})
sum(s) / length(s) # estimated probability of success
```

```
[1] 0.2402
```

---

layout: true

# Return value of `replicate()`

---

* The return value of `replicate()` depends on what the expression returns when evaluated

* In the above example, each evaluation results in a scalar logical

* In that case, the results are aggregated to produce a vector

--

* If the result is always a vector of the same length (greater than one), the results are combined into a matrix

```r
quantile(rnorm(500), probs = c(0.25, 0.5, 0.75))
```

```
        25%         50%         75%
-0.61233300  0.04772787  0.66176972
```

```r
replicate(6, quantile(rnorm(500), probs = c(0.25, 0.5, 0.75)))
```

```
          [,1]        [,2]       [,3]        [,4]        [,5]        [,6]
25% -0.6465869 -0.64019043 -0.6402106 -0.71666089 -0.66217588 -0.69065654
50%  0.0543367  0.05821707  0.1592666  0.04837475  0.02085983 -0.01637232
75%  0.6885536  0.69380202  0.7960084  0.75286852  0.71655121  0.70818662
```

---

* If the results have inconsistent lengths, the result is a list

```r
set.seed(20200101)
replicate(6, unique(sample(10, 5, replace = TRUE)))
```

```
[[1]]
[1] 1 4 5 8

[[2]]
[1] 4 7 8 6

[[3]]
[1] 10  9  5  2  1

[[4]]
[1] 6 4 1 2

[[5]]
[1] 6 9 2 3

[[6]]
[1] 1 9 6 2 4
```

---

* This may sometimes lead to unanticipated behaviour

```r
set.seed(20200101)
replicate(2, unique(sample(10, 5, replace = TRUE)))
```

```
     [,1] [,2]
[1,]    1    4
[2,]    4    7
[3,]    5    8
[4,]    8    6
```

--

* When output length is known to be variable, it is better to explicitly disable simplification

```r
set.seed(20200101)
replicate(2, unique(sample(10, 5, replace = TRUE)),
          simplify = FALSE) # always return list
```

```
[[1]]
[1] 1 4 5 8

[[2]]
[1] 4 7 8 6
```

---

layout: false
class: middle, center

# Questions?
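
---

# Appendix: a sketch of `replicate()`-style evaluation

* The list-returning behaviour of `replicate(..., simplify = FALSE)` can be imitated with the same `substitute()` + `eval()` technique used by `with()`

* What follows is only a minimal sketch: `my_replicate()` is a hypothetical helper written for illustration, not part of base R and not necessarily how base R implements `replicate()`. It captures the expression once, then evaluates it afresh each time, and always returns a list

```r
# Hypothetical helper, for illustration only (not part of base R)
my_replicate <- function(n, expr)
{
    e <- substitute(expr)    # the unevaluated expression supplied by the caller
    caller <- parent.frame() # environment in which my_replicate() was called
    lapply(seq_len(n), function(i) {
        # a fresh child environment per run keeps assignments local to that run;
        # other variables are still looked up in the caller's environment
        eval(e, new.env(parent = caller))
    })
}

n <- 20
set.seed(20200101)
s <- my_replicate(5000,
{
    x <- sample(n, 2)
    max(x) %% min(x) == 0
})
mean(unlist(s)) # estimated probability of success
```

* Inside the expression, `n` refers to the caller's `n` (here 20), not to the first argument of `my_replicate()`, because the expression is evaluated in an environment whose parent is the caller's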