+ - 0:00:00
Notes for current slide
Notes for next slide

Expressions, Evaluation and Scoping in R

Deepayan Sarkar

1 / 62

REPL

  • R works using what is known as a REPL (Read-Eval-Print-Loop)

    • R waits for user input when it starts

    • User types input expression and presses Enter

    • R reads input and tries to evaluate

    • Evaluation is either successful or produces an error

    • R starts to wait again for further user input

  • Today's talk takes a closer look at how the evaluation step works

2 / 62

REPL

  • R works using what is known as a REPL (Read-Eval-Print-Loop)

    • R waits for user input when it starts

    • User types input expression and presses Enter

    • R reads input and tries to evaluate

    • Evaluation is either successful or produces an error

    • R starts to wait again for further user input

  • Today's talk takes a closer look at how the evaluation step works

  • Useful links: Source (R + Markdown), Makefile (conversion using knitr and pandoc), Extracted R code

2 / 62

Prerequisites

  • Basic R programming

    • You should be comfortable with writing functions

    • If you know how to use eval() and quote(), you already know too much

  • Statistical concepts used for illustration

    • Maximum likelihood, regularization

    • Probability density, histograms

    • Linear regression

3 / 62

Evaluation

  • Consider the following expression that we wish to evaluate
sqrt(x)
Error in eval(expr, envir, enclos): object 'x' not found
  • This expression involves two "symbols", sqrt and x

  • To be able to evaluate the expression, R must be able to "find" both symbols

find("sqrt")
[1] "package:base"
find("x") # not found
character(0)
4 / 62

Evaluation

  • So let us define x and try again
x <- 10
sqrt(x)
[1] 3.162278
x <- -1
sqrt(x)
Warning in sqrt(x): NaNs produced
[1] NaN
5 / 62

Evaluation

x <- -1+0i
sqrt(x)
[1] 0+1i
x <- "-1"
sqrt(x)
Error in sqrt(x): non-numeric argument to mathematical function
6 / 62

Evaluation

  • Let us check again where sqrt and x are found
find("sqrt")
[1] "package:base"
find("x")
[1] ".GlobalEnv"
7 / 62

Evaluation

  • Suppose we want to define a "smarter" sqrt()

  • First check what sqrt() does, so we can mimic:

sqrt
function (x) .Primitive("sqrt")
  • What does .Primitive() do? See help(.Primitive) (but we don't really need to know)
8 / 62

Evaluation

  • Define a "smart" sqrt() as follows:
sqrt <- function(x)
{
orig.sqrt <- .Primitive("sqrt")
if (is.character(x)) x <- as.numeric(x)
if (is.numeric(x) && any(x < 0)) x <- complex(real = x, imaginary = 0)
orig.sqrt(x)
}
  • We now have
x <- -1
sqrt(x)
[1] 0+1i
9 / 62

Multiple symbol definitions

  • Where is R finding these symbols?
find("x")
[1] ".GlobalEnv"
find("sqrt")
[1] ".GlobalEnv" "package:base"
  • Note that the "original" sqrt() is not overwritten; instead a new one is defined
10 / 62

The search path

  • Why is it using the sqrt in ".GlobalEnv" and not the one in "package:base"?
search()
[1] ".GlobalEnv" "package:knitr" "package:stats" "package:graphics"
[5] "package:grDevices" "package:utils" "package:datasets" "package:methods"
[9] "Autoloads" "package:base"
  • R searches for symbols sequentially in the "search path"
11 / 62

Environments

  • What exactly are the objects in the search path?

  • They are known as "environments", which are central to how evaluation works in R

  • In essence, an environment is a collection of symbols (variables) bound to certain values

  • Each environment usually also has an associated "enclosing environment" or "parent environment"

  • .GlobalEnv is the environment where user-defined variables are stored

12 / 62

Environments

  • The chain of enclosing environments starting from .GlobalEnv is identical to the search path
e <- .GlobalEnv
while (TRUE) { str(e, give.attr = FALSE); e <- parent.env(e) }
<environment: R_GlobalEnv>
<environment: package:knitr>
<environment: package:stats>
<environment: package:graphics>
<environment: package:grDevices>
<environment: package:utils>
<environment: package:datasets>
<environment: package:methods>
<environment: 0x7f9063aa8bb8>
<environment: base>
<environment: R_EmptyEnv>
Error in parent.env(e): the empty environment has no parent
search() # compare
[1] ".GlobalEnv" "package:knitr" "package:stats" "package:graphics"
[5] "package:grDevices" "package:utils" "package:datasets" "package:methods"
[9] "Autoloads" "package:base"
13 / 62

Creating and manipulating environments

  • new.env() explicitly creates environments

  • Variables inside environments can be accessed using $ (like lists or data frames)

e1 <- new.env()
e1$x <- seq(0, 2 * pi, length.out = 101)
e1$y <- cos(e1$x)
plot(y ~ x, data = e1, type = "l", ylim = c(-1, 1)) # Formula interface

plot of chunk unnamed-chunk-12

14 / 62

Environments are not duplicated when copied

  • Environments are an exception to R's duplicate-on-copy rule

  • A copy of an environment is the same environment, not a new one

e2 <- e1
range(e1$y)
[1] -1 1
e2$y <- abs(e2$y)
range(e1$y) # changes even though e1$y has not been directly modified
[1] 1.608123e-16 1.000000e+00
15 / 62

Inspecting the variables in an environment

  • The names of objects available in an environment can be obtained by ls()
ls(e2)
[1] "x" "y"
ls.str(e2) ## summary of all objects in e2
x : num [1:101] 0 0.0628 0.1257 0.1885 0.2513 ...
y : num [1:101] 1 0.998 0.992 0.982 0.969 ...
16 / 62

Environments and functions

  • Environments are central to how R functions work

  • Functions in R are actually "closures"

  • In addition to the argument list and body (code), a function also has an associated environment

  • This is usually the environment in which the function was defined

environment(sqrt)
<environment: R_GlobalEnv>
environment(base::sqrt) # special "primitive" (C) function, so no environment
NULL
environment(hist)
<environment: namespace:graphics>
17 / 62

What happens when a function is executed?

  • Every time a function is invoked, a new environment is created

  • This is the "evaluation environment" and contains the local variables of the function

  • The environment of the function is the environment enclosing this "evaluation environment"

18 / 62

What happens when a function is executed?

  • Every time a function is invoked, a new environment is created

  • This is the "evaluation environment" and contains the local variables of the function

  • The environment of the function is the environment enclosing this "evaluation environment"

  • While the function is running, evaluation tries to find symbols as follows:

    • First, in the evaluation environment

    • Next, in its enclosing environment, i.e., the environment of the function

    • Then, in the enclosing environment of that environment, and so on

  • This behaviour is called "lexical scoping"

18 / 62

What happens when a function is executed?

  • Every time a function is invoked, a new environment is created

  • This is the "evaluation environment" and contains the local variables of the function

  • The environment of the function is the environment enclosing this "evaluation environment"

  • While the function is running, evaluation tries to find symbols as follows:

    • First, in the evaluation environment

    • Next, in its enclosing environment, i.e., the environment of the function

    • Then, in the enclosing environment of that environment, and so on

  • This behaviour is called "lexical scoping"

  • Evaluation in the global environment follows the same rule (hence the search path)
18 / 62

Are these details important to know?

  • Depends!

  • You can use R productively with only a superficial understanding of evaluation

  • However, these details often help explain and diagnose seemingly odd behaviour

  • They can also be a lot of fun

19 / 62

Example: The Box-Cox transformation

  • Given positive-valued (possibly skewed) data, one may want to transform them to look Normal

  • The Box-Cox transformation conveniently combines power transforms and log in a single family

  • Parameterized by a single parameter λ

gλ(x)={xλ1λλ0logxλ=0
20 / 62

Scoping rules can mask errors

  • Consider the following function (which uses an undefined variable lambda)
boxcox <- function(x) # applies Box-Cox transformation on data 'x'
{
if (lambda == 0) log(x)
else (x^lambda - 1) / lambda
}
b <- function(x) # computes histogram bins given data 'x'
{
n <- sum(is.finite(x))
r <- extendrange(x)
seq(r[1], r[2], length.out = round(sqrt(n)) + 1)
}
x <- rlnorm(1000)
hist(boxcox(x), breaks = b) # error, as it should be
Error in boxcox(x): object 'lambda' not found
21 / 62

Scoping rules can mask errors

  • The same code will "work" if lambda happens to be defined
lambda <- 0
hist(boxcox(x), breaks = b) # notice that 'b' is a function! (See ?hist)

plot of chunk unnamed-chunk-18

22 / 62

But sometimes we want to write such functions

  • Normally, depending on this kind of scoping behaviour should be unnecessary

  • In fact, a well-designed function should not depend on arbitrary variables

  • However, there is one situation where this evaluation rule becomes important

  • When a function creates and returns or uses another function!

23 / 62

Do we need functions that create functions?

  • Probably not very common...

  • But only because we are limited in our thinking

  • We already use functions as arguments to other functions

  • Thinking "functionally" opens up many other interesting approaches

  • This is particularly useful for quick prototyping

24 / 62

Do we need functions that create functions?

  • A somewhat silly extension of the example above:
fboxcox <- function(lambda)
{
f <- function(x)
{
if (lambda == 0) log(x)
else (x^lambda - 1) / lambda
}
return(f)
}
boxcox.sqrt <- fboxcox(0.5)
boxcox.log <- fboxcox(0)
25 / 62

Do we need functions that create functions?

par(mfrow = c(1, 2))
hist(boxcox.sqrt(x), breaks = b) # equivalently, hist(fboxcox(0.5)(x), breaks = b)
hist(boxcox.log(x), breaks = b) # hist(fboxcox(0)(x), breaks = b)

plot of chunk unnamed-chunk-21

26 / 62

Functions returned by functions

  • Note that boxcox.sqrt and boxcox.log are both functions (created by another function fboxcox)
boxcox.sqrt
function(x)
{
if (lambda == 0) log(x)
else (x^lambda - 1) / lambda
}
<environment: 0x7f90676098a8>
boxcox.log
function(x)
{
if (lambda == 0) log(x)
else (x^lambda - 1) / lambda
}
<bytecode: 0x7f90636851f0>
<environment: 0x7f9067669b98>
  • ... that appear to be identical
27 / 62

Functions returned by functions

  • But they have different "environments"
environment(boxcox.sqrt)
<environment: 0x7f90676098a8>
environment(boxcox.log)
<environment: 0x7f9067669b98>
28 / 62

Functions returned by functions

  • This is because they were created in different executions of fboxcox()

  • Although not immediately apparent, these environments retain the value of lambda

ls.str(environment(boxcox.sqrt))
f : function (x)
lambda : num 0.5
ls.str(environment(boxcox.log))
f : function (x)
lambda : num 0
29 / 62

Functions returned by functions

  • Without lexical scoping, this would not have been possible

  • Another example from The R Paper:

y <- 123
f <- function(x) {
y <- x * x
g <- function() print(y)
g()
}
f(10)
[1] 100
  • S, which is similar to R but does not have lexical scoping, would print 123
30 / 62

A more serious example

  • Suppose we want to transform (positive-valued) data x1,x2,,xn
31 / 62

A more serious example

  • Suppose we want to transform (positive-valued) data x1,x2,,xn
  • ... to look as Normal as possible, but we don't know what the correct transformation should be

  • A standard textbook solution is to look for an "optimum" choice within the Box-Cox family

  • A maximum-likelihood approach: assume λ-transformed data follow N(μ,σ2)

31 / 62

A more serious example

  • Suppose we want to transform (positive-valued) data x1,x2,,xn
  • ... to look as Normal as possible, but we don't know what the correct transformation should be

  • A standard textbook solution is to look for an "optimum" choice within the Box-Cox family

  • A maximum-likelihood approach: assume λ-transformed data follow N(μ,σ2)

  • A little mathematics tells us that

    • There is no closed form solution for λ, but

    • The optimum (MLE) λ can be found by minimizing f(λ)=n2(log2π+logσ^2(λ)+1)(λ1)logxi

    • where σ^2(λ) is the sample variance of the transformed data

31 / 62

A more serious example

  • Suppose we want to transform (positive-valued) data x1,x2,,xn
  • ... to look as Normal as possible, but we don't know what the correct transformation should be

  • A standard textbook solution is to look for an "optimum" choice within the Box-Cox family

  • A maximum-likelihood approach: assume λ-transformed data follow N(μ,σ2)

  • A little mathematics tells us that

    • There is no closed form solution for λ, but

    • The optimum (MLE) λ can be found by minimizing f(λ)=n2(log2π+logσ^2(λ)+1)(λ1)logxi

    • where σ^2(λ) is the sample variance of the transformed data

  • Here f is a function, which is itself a function of data x1,x2,,xn
31 / 62

A quick prototype

  • Suppose we have a general numerical optimizer that can optimize a function of one argument

  • Make a function that converts data into a function suitable for optimization:

negllBoxCox <- function(x)
{
n <- length(x)
slx <- sum(log(x)) # compute only once
function(lambda) {
y <- fboxcox(lambda)(x) # Note use of fboxcox() defined earlier
sy <- mean((y - mean(y))^2)
n * log(sy) / 2 - (lambda - 1) * slx # ignoring constant terms
}
}
32 / 62

A quick prototype

  • Let's compute the "negative log likelihood" function for some simulated data
x <- rlnorm(100) # log-normal, so true lambda = 0
f <- negllBoxCox(x)
f(0)
[1] 4.311438
  • We can can now use this function in any optimizer
optimize(f, lower = -10, upper = 10)
$minimum
[1] -0.1608655
$objective
[1] 2.230694
## OR optim(par = 1, fn = f) for alternative methods
33 / 62

A quick prototype

  • We can even plot f — but first we have to "vectorize" our function

  • The Vectorize() function creates a vectorized function from a non-vectorized one

plot(Vectorize(f), from = -2, to = 2, n = 1000, las = 1)

plot of chunk unnamed-chunk-29

34 / 62

A quick prototype

  • We can also do a quick simulation study of how well this method estimates λ when the true λ=1
lambda.hat <-
replicate(1000,
optimize(negllBoxCox(rnorm(100, mean = 10)), lower = -10, upper = 10)$minimum)
hist(lambda.hat, breaks = b, las = 1)

plot of chunk unnamed-chunk-30

35 / 62

Another example: linear regression with general loss function

  • Another common use case: optimization of a loss function

  • Consider the following general approach to linear regression given data (xi,yi),i=1,,n

β^=argminβi=1nL(yiβ0β1xi)=argminβf(β)
  • Common choices for L are

    • L(u)=u2

    • L(u)=|u|

    • L(u)={u2|u|cc(2|u|c)otherwise
36 / 62

Objective functions (to be optimized)

  • The following functions takes data and L, and creates a suitable objective function
make.objective <- function(x, y, L = abs)
{
function(beta)
{
sum(L(y - beta[1] - beta[2] * x))
}
}
data(phones, package = "MASS")
obj.lad <- with(phones, make.objective(year, calls, L = abs))
obj.lse <- with(phones, make.objective(year, calls, L = function(u) u*u))
37 / 62

Objective functions (to be optimized)

obj.lad
function(beta)
{
sum(L(y - beta[1] - beta[2] * x))
}
<environment: 0x7f9061b191e8>
environment(obj.lse)$L
function(u) u*u
<environment: 0x7f9061bad478>
environment(obj.lad)$L
function (x) .Primitive("abs")
38 / 62

Results of optimization

fm.lad <- optim(obj.lad, par = c(0,0))
fm.lse <- optim(obj.lse, par = c(0,0))
str(fm.lad)
List of 5
$ par : num [1:2] -66.05 1.35
$ value : num 844
$ counts : Named int [1:2] 117 NA
..- attr(*, "names")= chr [1:2] "function" "gradient"
$ convergence: int 0
$ message : NULL
str(fm.lse)
List of 5
$ par : num [1:2] -260.12 5.04
$ value : num 69544
$ counts : Named int [1:2] 109 NA
..- attr(*, "names")= chr [1:2] "function" "gradient"
$ convergence: int 0
$ message : NULL
39 / 62

Results of optimization

plot(calls ~ year, phones)
abline(fm.lad$par, col = "blue")
abline(fm.lse$par, col = "red") # This is what we would get from lm()

plot of chunk unnamed-chunk-34

40 / 62

A more complex example

  • Suppose we want to plot a histogram

  • One important parameter is the bin width h

  • Fixing starting point, what is an optimal choice?

  • Maximum Likelihood fails because it "oversmooths", so needs regularization

  • Options

41 / 62

Minimize AIC

  • The Akaike Information Criterion is a popular model selection criterion

  • Simple definition: 2k2log(L^) where

    • k is the number of parameters (here the number of bins)

    • L^ is the maximum likelihood (here the joint density under the histogram)

  • Our prototype: make function of k from input data

42 / 62

AIC for histograms

chooseBinsAIC <- function(x, p = 0.25)
{
x <- x[is.finite(x)]
n <- length(x)
r <- extendrange(x)
d <- r[2] - r[1]
histAIC <- function(B)
{
b <- seq(r[1], r[2], length.out = B + 1)
h <- hist(x, breaks = b, plot = FALSE)
## likelihood is simply product of density^counts
keep <- h$counts > 0 # skip 0 counts (by convention 0 log 0 == 0)
logL <- with(h, sum(counts[keep] * log(density[keep])))
2 * B - 2 * logL
}
Bvec <- seq(floor(sqrt(n)*p), ceiling(sqrt(n)/p))
list(range = r, B = Bvec,
AIC = unname(sapply(Bvec, histAIC)))
}
43 / 62

AIC: implementation

xx <- c(rnorm(500), rnorm(300, mean = 6, sd = 1.5))
ll <- chooseBinsAIC(xx, p = 0.125)
str(ll)
List of 3
$ range: num [1:2] -3.79 11.35
$ B : int [1:225] 3 4 5 6 7 8 9 10 11 12 ...
$ AIC : num [1:225] 4146 4115 3876 3736 3723 ...
44 / 62

AIC: implementation

with(ll, {
plot(AIC ~ B, type = "o")
abline(v = B[which.min(AIC)])
})

plot of chunk unnamed-chunk-37

45 / 62

AIC: comparison with other "rules"

(nclass.aic <- with(ll, B[which.min(AIC)]))
[1] 26
nclass.Sturges(xx)
[1] 11
nclass.FD(xx)
[1] 12
nclass.scott(xx)
[1] 12
46 / 62

AIC: Resulting histogram

breaks <- seq(ll$range[1], ll$range[2], length.out = nclass.aic + 1)
hist(xx, breaks = breaks)

plot of chunk unnamed-chunk-39

47 / 62

Cross validation: summary

  • Denote the histogram for bin width h by f^h, and the unknown density by f

  • Want to choose h to minimize integrated square error

[f^h(x)f(x)]2dx=f^h(x)2dx2f^h(x)f(x)dx+f(x)2dx

  • The third term is unknown but constant (as h varies), so can be dropped

  • The first term can be calculated from f^h (which is piecewise constant)

  • The second term involves the unknown f, but the integral can be viewed as E(f^h(X)) when Xf

48 / 62

Cross validation: summary

  • Leave-one-out estimator: average of f^h(i)(xi)

  • The estimated integrated square error (without the constant term) simplifies to

2(n1)hn+1(n1)n2hkck2

  • where ck are bin counts
49 / 62

Cross validation: implementation

chooseBins <- function(x, p = 0.25)
{
x <- x[is.finite(x)]
n <- length(x)
r <- extendrange(x)
d <- r[2] - r[1]
CVE <- function(B)
{
h <- d / B
breaks <- seq(r[1], r[2], length.out = B+1)
k <- findInterval(x, breaks) # runtime O(n * log(B))
counts <- table(factor(k, levels = 1:B))
2 / ((n-1)*h) - sum(counts^2) * (n+1) / ((n-1) * n^2 * h)
}
Bvec <- seq(floor(sqrt(n)*p), ceiling(sqrt(n)/p))
list(range = r, B = Bvec, CVE = unname(sapply(Bvec, CVE)))
}
50 / 62

Cross validation: implementation

## xx <- c(rnorm(500), rnorm(300, mean = 6, sd = 1.5))
ll <- chooseBins(xx, p = 0.125)
str(ll)
List of 3
$ range: num [1:2] -3.79 11.35
$ B : int [1:225] 3 4 5 6 7 8 9 10 11 12 ...
$ CVE : num [1:225] -0.0831 -0.0802 -0.1034 -0.1207 -0.1195 ...
51 / 62

Cross validation: implementation

with(ll, {
plot(CVE ~ B, type = "o")
abline(v = B[which.min(CVE)])
})

plot of chunk unnamed-chunk-42

52 / 62

Cross validation: comparison with other "rules"

(nclass.cv <- with(ll, B[which.min(CVE)]))
[1] 20
nclass.Sturges(xx)
[1] 11
nclass.FD(xx)
[1] 12
nclass.scott(xx)
[1] 12
53 / 62

Cross validation: Resulting histogram

breaks <- seq(ll$range[1], ll$range[2], length.out = nclass.cv + 1)
hist(xx, breaks = breaks)

plot of chunk unnamed-chunk-44

54 / 62
  • Again, consider an expression that will cause an error when evaluated
rm(list = c("x", "y")) # remove x, y if defined earlier
plot(x, y, type = "l")
Error in plot(x, y, type = "l"): object 'x' not found
55 / 62

Quoted (unevaluated) expressions

  • R can actually work with an expression without evaluating it
e <- quote(plot(x, y, type = "l"))
e
plot(x, y, type = "l")
e[[1]]
plot
e[[2]]
x
e[[3]]
y
56 / 62

Evaluating Quoted expressions

  • A quoted expression can be evaluated using eval()
eval(e)
Error in plot(x, y, type = "l"): object 'x' not found
  • Fails because current environment (.GlobalEnv) does not contain variables x and y

  • But Recall: Environment e2 defined earlier contains

ls.str(e2)
x : num [1:101] 0 0.0628 0.1257 0.1885 0.2513 ...
y : num [1:101] 1 0.998 0.992 0.982 0.969 ...
57 / 62

Evaluating Quoted expressions

eval(e, envir = e2)

plot of chunk unnamed-chunk-49

58 / 62

Can we re-implement the replicate() function?

  • This is how functions like replicate() and with() work
my.replicate <- function(N, e)
{
ans <- numeric(N)
for (i in 1:N) ans[[i]] <- eval(e)
ans
}
  • But we have to be careful to use it
my.replicate(5, max(rnorm(10))) # not what we wanted
[1] 2.1574 2.1574 2.1574 2.1574 2.1574
my.replicate(5, quote(max(rnorm(10))))
[1] 0.5284321 0.6858938 1.3573033 0.8895072 1.6434051
59 / 62

Non-standard evaluation

  • The substitute() trick: non-standard evaluation (using lazy argument evaluation)
my.replicate <- function(N, e)
{
expr <- substitute(e) # avoids the need to quote()
ans <- numeric(N)
for (i in 1:N) ans[[i]] <- eval(expr)
ans
}
str(x <- my.replicate(1000, max(rnorm(20))))
num [1:1000] 1.648 2.524 1.636 1.346 0.729 ...
60 / 62

Lazy evaluation

  • Lazy evaluation: function arguments are only evaluated when needed
set.seed(123)
rm(sqrt) # remove the sqrt() we defined earlier
chooseOne <- function(a, b, u = runif(1))
{
if (u < 0.5) a else b
}
  • Only one of a and b is evaluated in any particular execution
chooseOne(sqrt(2), sqrt(-2)) # no warning message
[1] 1.414214
chooseOne(sqrt(2), sqrt(-2)) # warning message
Warning in sqrt(-2): NaNs produced
[1] NaN
61 / 62

Further reading

? environment
? eval
? expression
? quote
? bquote
? substitute
? with
with.default
function (data, expr, ...)
eval(substitute(expr), data, enclos = parent.frame())
<bytecode: 0x7f9061b19fb0>
<environment: namespace:base>
replicate
function (n, expr, simplify = "array")
sapply(integer(n), eval.parent(substitute(function(...) expr)),
simplify = simplify)
<bytecode: 0x7f9061ac9e08>
<environment: namespace:base>
62 / 62

REPL

  • R works using what is known as a REPL (Read-Eval-Print-Loop)

    • R waits for user input when it starts

    • User types input expression and presses Enter

    • R reads input and tries to evaluate

    • Evaluation is either successful or produces an error

    • R starts to wait again for further user input

  • Today's talk takes a closer look at how the evaluation step works

2 / 62
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow