Title: | Apply Mapping Functions in Parallel using Futures |
Version: | 0.3.1 |
Description: | Implementations of the family of map() functions from 'purrr' that can be resolved using any 'future'-supported backend, e.g. parallel on the local machine or distributed on a compute cluster. |
License: | MIT + file LICENSE |
URL: | https://github.com/DavisVaughan/furrr, https://furrr.futureverse.org/ |
BugReports: | https://github.com/DavisVaughan/furrr/issues |
Depends: | future (≥ 1.25.0), R (≥ 3.4.0) |
Imports: | globals (≥ 0.14.0), lifecycle (≥ 1.0.1), purrr (≥ 0.3.4), rlang (≥ 1.0.2), vctrs (≥ 0.4.1) |
Suggests: | carrier, covr, dplyr (≥ 0.7.4), knitr, listenv (≥ 0.6.0), magrittr, rmarkdown, testthat (≥ 3.0.0), tidyselect, withr |
Config/Needs/website: | progressr |
Config/testthat/edition: | 3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.1 |
NeedsCompilation: | no |
Packaged: | 2022-08-15 19:00:06 UTC; davis |
Author: | Davis Vaughan [aut, cre], Matt Dancho [aut], RStudio [cph, fnd] |
Maintainer: | Davis Vaughan <davis@rstudio.com> |
Repository: | CRAN |
Date/Publication: | 2022-08-15 19:40:02 UTC |
furrr: Apply Mapping Functions in Parallel using Futures
Description
Implementations of the family of map() functions from 'purrr' that can be resolved using any 'future'-supported backend, e.g. parallel on the local machine or distributed on a compute cluster.
Author(s)
Maintainer: Davis Vaughan davis@rstudio.com
Authors:
Matt Dancho mdancho@business-science.io
Other contributors:
RStudio [copyright holder, funder]
See Also
Useful links:
Report bugs at https://github.com/DavisVaughan/furrr/issues
Options to fine tune furrr
Description
These options fine tune furrr functions, such as future_map(). They
are either used by furrr directly, or are passed on to future::future().
Usage
furrr_options(
...,
stdout = TRUE,
conditions = "condition",
globals = TRUE,
packages = NULL,
seed = FALSE,
scheduling = 1,
chunk_size = NULL,
prefix = NULL
)
Arguments
... |
These dots are reserved for future extensibility and must be empty. |
stdout |
A logical. |
conditions |
A character string of conditions classes to be relayed.
The default is to relay all conditions, including messages and warnings.
Errors are always relayed. To not relay any conditions (besides errors),
use character(). |
globals |
A logical, a character vector, a named list, or NULL. For details, see the Global variables section. |
packages |
A character vector, or NULL. |
seed |
A logical, an integer of length 1 or 7, a list of pre-generated seeds of length(.x), or NULL. For details, see the Reproducible random number generation (RNG) section. |
scheduling |
A single integer, logical, or Inf.
This argument is only used if chunk_size is NULL. |
chunk_size |
A single integer, Inf, or NULL. |
prefix |
A single character string, or NULL. |
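The chunking arguments can be illustrated with a short sketch. It assumes that scheduling = Inf requests one future per element and that chunk_size sets the approximate number of elements per future; the helper square() and the input x are made up for illustration. Whatever the chunking, the result should be the same.

```r
library(furrr)  # also attaches 'future', which provides plan()

plan(multisession, workers = 2)

x <- 1:8
square <- function(v) v^2  # hypothetical helper, not part of furrr

# scheduling = 1 (the default): roughly one chunk (future) per worker
per_worker <- future_map_dbl(x, square, .options = furrr_options(scheduling = 1))

# scheduling = Inf (assumed meaning: one future per element of x)
per_element <- future_map_dbl(x, square, .options = furrr_options(scheduling = Inf))

# chunk_size = 4 (assumed meaning: about four elements per future);
# scheduling is not used when chunk_size is supplied
by_chunk <- future_map_dbl(x, square, .options = furrr_options(chunk_size = 4))

plan(sequential)  # release the background workers
```

Chunking only changes how the work is split across futures, so the choice is a matter of overhead versus load balancing rather than of correctness.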
Global variables
globals controls how globals are identified, similar to the globals
argument of future::future(). Since all function calls use the same set of
globals, furrr gathers globals upfront (once), which is more efficient than
if it was done for each future independently.
- If TRUE or NULL, then globals are automatically identified and gathered.
- If a character vector of names is specified, then those globals are gathered.
- If a named list, then those globals are used as is.
- In all cases, .f and any ... arguments are automatically passed as globals to each future created, as they are always needed.
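A minimal sketch of the first two cases, assuming a variable named offset (made up for this illustration) exists in the calling environment:

```r
library(furrr)  # also attaches 'future', which provides plan()

plan(multisession, workers = 2)

offset <- 100  # hypothetical global used inside .f

# globals = TRUE (the default): `offset` is identified and gathered automatically
auto <- future_map_dbl(1:3, ~ .x + offset)

# Character vector: only the named globals are gathered
by_name <- future_map_dbl(
  1:3,
  ~ .x + offset,
  .options = furrr_options(globals = "offset")
)

plan(sequential)  # release the background workers
```

Naming globals explicitly is mostly useful when automatic detection picks up too much or too little; in the common case the default is likely sufficient.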
Reproducible random number generation (RNG)
Unless seed = FALSE, furrr functions are guaranteed to generate
the exact same sequence of random numbers given the same initial
seed / RNG state regardless of the type of futures and scheduling
("chunking") strategy.
Setting seed = NULL is equivalent to seed = FALSE, except that the
future.rng.onMisuse option is not consulted to potentially monitor the
future for faulty random number usage. See the seed argument of
future::future() for more details.
RNG reproducibility is achieved by pre-generating the random seeds for all
iterations (over .x) by using L'Ecuyer-CMRG RNG streams. In each
iteration, these seeds are set before calling .f(.x[[i]], ...).
Note, for large length(.x) this may introduce a large overhead.
A fixed seed may be given as an integer vector, either as a full
L'Ecuyer-CMRG RNG seed of length 7, or as a seed of length 1 that
will be used to generate a full L'Ecuyer-CMRG seed.
If seed = TRUE, then .Random.seed is returned if it holds a
L'Ecuyer-CMRG RNG seed, otherwise one is created randomly.
If seed = NA, a L'Ecuyer-CMRG RNG seed is randomly created.
If none of the function calls .f(.x[[i]], ...) use random number
generation, then seed = FALSE may be used.
In addition to the above, it is possible to specify a pre-generated
sequence of RNG seeds as a list such that length(seed) == length(.x) and
where each element is an integer seed that can be assigned to .Random.seed.
Use this alternative with caution. Note that as.list(seq_along(.x)) is
not a valid set of such .Random.seed values.
In all cases but seed = FALSE, after a furrr function returns, the RNG
state of the calling R process is guaranteed to be "forwarded one step" from
the RNG state before the call. This is true regardless of the future
strategy / scheduling used. This is done in order to guarantee that an R
script calling future_map() multiple times should be numerically
reproducible given the same initial seed.
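The reproducibility guarantee can be checked with a small sketch. The seed value 123 and the variable names are arbitrary choices for this illustration; the same fixed seed is used under two different plans.

```r
library(furrr)  # also attaches 'future', which provides plan()

opts <- furrr_options(seed = 123)  # fixed seed of length 1

# Two runs on parallel workers with the same seed
plan(multisession, workers = 2)
parallel_draws <- future_map_dbl(1:4, ~ rnorm(1), .options = opts)
repeat_draws <- future_map_dbl(1:4, ~ rnorm(1), .options = opts)

# Same seed, but evaluated sequentially in the current session
plan(sequential)
sequential_draws <- future_map_dbl(1:4, ~ rnorm(1), .options = opts)
```

All three vectors should hold the same four numbers, since each element of .x receives its own pre-generated RNG stream no matter where it is evaluated.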
Examples
furrr_options()
Apply a function to each element of a vector, and its index via futures
Description
These functions work exactly the same as the purrr::imap() functions,
but allow you to map in parallel.
Usage
future_imap(
.x,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_imap_chr(
.x,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_imap_dbl(
.x,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_imap_int(
.x,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_imap_lgl(
.x,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_imap_raw(
.x,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_imap_dfr(
.x,
.f,
...,
.id = NULL,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_imap_dfc(
.x,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_iwalk(
.x,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
Arguments
.x |
A list or atomic vector. |
.f |
A function, formula, or vector (not necessarily atomic). If a function, it is used as is. If a formula, e.g. ~ .x + 2, it is converted to a function.
This syntax allows you to create very compact anonymous functions. If character vector, numeric vector, or list, it is
converted to an extractor function. Character vectors index by
name and numeric vectors index by position; use a list to index
by position and name at different levels. If a component is not
present, the value of .default will be returned. |
... |
Additional arguments passed on to the mapped function. |
.options |
The future-specific options to use with the workers. This must be the result of a call to furrr_options(). |
.env_globals |
The environment to look for globals required by .x and .... Globals required by .f are looked up in the function environment of .f. |
.progress |
A single logical. Should a progress bar be displayed? Only works with multisession, multicore, and multiprocess futures. Note that if a multicore/multisession future falls back to sequential, then a progress bar will not be displayed. Warning: The .progress argument will be deprecated and removed in a future version of furrr in favor of the progressr package. |
.id |
Either a string or NULL. Only applies to the _dfr variant. |
Value
A vector the same length as .x.
Examples
plan(multisession, workers = 2)
future_imap_chr(sample(10), ~ paste0(.y, ": ", .x))
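What .y holds depends on whether the input is named. The sketch below assumes the purrr::imap() behavior, where .y is the element name for named input and the position otherwise; the input vectors are made up for illustration.

```r
library(furrr)  # also attaches 'future', which provides plan()

plan(multisession, workers = 2)

# Named input: .y is the name of each element
named <- future_imap_chr(c(a = "x", b = "y"), ~ paste0(.y, "=", .x))

# Unnamed input: .y is the position of each element
unnamed <- future_imap_chr(c("x", "y"), ~ paste0(.y, "=", .x))

plan(sequential)  # release the background workers
```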
Invoke functions via futures
Description
These functions work exactly the same as the purrr::invoke_map() functions,
but allow you to invoke in parallel.
Usage
future_invoke_map(
.f,
.x = list(NULL),
...,
.env = NULL,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_invoke_map_chr(
.f,
.x = list(NULL),
...,
.env = NULL,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_invoke_map_dbl(
.f,
.x = list(NULL),
...,
.env = NULL,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_invoke_map_int(
.f,
.x = list(NULL),
...,
.env = NULL,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_invoke_map_lgl(
.f,
.x = list(NULL),
...,
.env = NULL,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_invoke_map_raw(
.f,
.x = list(NULL),
...,
.env = NULL,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_invoke_map_dfr(
.f,
.x = list(NULL),
...,
.env = NULL,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_invoke_map_dfc(
.f,
.x = list(NULL),
...,
.env = NULL,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
Arguments
.f |
A list of functions. |
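.x |
A list of argument-lists the same length as .f (or length 1). |
... |
Additional arguments passed to each function. |
.env |
Environment in which do.call() should evaluate a constructed expression. |
.options |
The future-specific options to use with the workers. This must be the result of a call to furrr_options(). |
.env_globals |
The environment to look for globals required by .x and .... Globals required by .f are looked up in the function environment of .f. |
.progress |
A single logical. Should a progress bar be displayed? Only works with multisession, multicore, and multiprocess futures. Note that if a multicore/multisession future falls back to sequential, then a progress bar will not be displayed. Warning: The .progress argument will be deprecated and removed in a future version of furrr in favor of the progressr package. |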
Examples
plan(multisession, workers = 2)
df <- dplyr::tibble(
  f = c("runif", "rpois", "rnorm"),
  params = list(
    list(n = 10),
    list(n = 5, lambda = 10),
    list(n = 10, mean = -3, sd = 10)
  )
)
future_invoke_map(df$f, df$params, .options = furrr_options(seed = 123))
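A smaller sketch of the typed variants, assuming that a single argument-list in .x is recycled across all functions in .f as in purrr::invoke_map(). The input values are made up, and recent purrr releases may emit a deprecation warning for the underlying invoke functions.

```r
library(furrr)  # also attaches 'future', which provides plan()

plan(multisession, workers = 2)

# One argument-list, applied to both functions (assumed to be recycled)
stats <- future_invoke_map_dbl(
  list(mean, median),
  list(list(c(1, 2, 6)))
)

plan(sequential)  # release the background workers
```

The first element is the mean of the input and the second its median.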
Apply a function to each element of a vector via futures
Description
These functions work exactly the same as purrr::map() and its variants, but
allow you to map in parallel.
Usage
future_map(
.x,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_map_chr(
.x,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_map_dbl(
.x,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_map_int(
.x,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_map_lgl(
.x,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_map_raw(
.x,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_map_dfr(
.x,
.f,
...,
.id = NULL,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_map_dfc(
.x,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_walk(
.x,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
Arguments
.x |
A list or atomic vector. |
.f |
A function, formula, or vector (not necessarily atomic). If a function, it is used as is. If a formula, e.g. ~ .x + 2, it is converted to a function.
This syntax allows you to create very compact anonymous functions. If character vector, numeric vector, or list, it is
converted to an extractor function. Character vectors index by
name and numeric vectors index by position; use a list to index
by position and name at different levels. If a component is not
present, the value of .default will be returned. |
... |
Additional arguments passed on to the mapped function. |
.options |
The future-specific options to use with the workers. This must be the result of a call to furrr_options(). |
.env_globals |
The environment to look for globals required by .x and .... Globals required by .f are looked up in the function environment of .f. |
.progress |
A single logical. Should a progress bar be displayed? Only works with multisession, multicore, and multiprocess futures. Note that if a multicore/multisession future falls back to sequential, then a progress bar will not be displayed. Warning: The .progress argument will be deprecated and removed in a future version of furrr in favor of the progressr package. |
.id |
Either a string or NULL. Only applies to the _dfr variant. |
Value
All functions return a vector the same length as .x.
- future_map() returns a list
- future_map_lgl() a logical vector
- future_map_int() an integer vector
- future_map_dbl() a double vector
- future_map_chr() a character vector
The output of .f will be automatically typed upwards, e.g. logical ->
integer -> double -> character.
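The upward typing can be seen in a short sketch. The inputs are arbitrary; only the logical -> integer and integer -> double steps are shown here.

```r
library(furrr)  # also attaches 'future', which provides plan()

plan(multisession, workers = 2)

# .f returns integers; the _dbl variant promotes them to double
as_dbl <- future_map_dbl(1:3, ~ .x * 2L)

# .f returns logicals; the _int variant promotes them to integer
as_int <- future_map_int(1:3, ~ .x > 1L)

plan(sequential)  # release the background workers
```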
Examples
library(magrittr)
plan(multisession, workers = 2)
1:10 %>%
  future_map(rnorm, n = 10, .options = furrr_options(seed = 123)) %>%
  future_map_dbl(mean)
# If each element of the output is a data frame, use
# `future_map_dfr()` to row-bind them together:
mtcars %>%
  split(.$cyl) %>%
  future_map(~ lm(mpg ~ wt, data = .x)) %>%
  future_map_dfr(~ as.data.frame(t(as.matrix(coef(.)))))
# You can be explicit about what gets exported to the workers.
# To see this, use multisession (not multicore as the forked workers
# still have access to this environment)
plan(multisession)
x <- 1
y <- 2
# This will fail, y is not exported (no black magic occurs)
try(future_map(1, ~y, .options = furrr_options(globals = "x")))
# y is exported
future_map(1, ~y, .options = furrr_options(globals = "y"))
Apply a function to each element of a vector conditionally via futures
Description
These functions work exactly the same as purrr::map_if() and
purrr::map_at(), but allow you to run them in parallel.
Usage
future_map_if(
.x,
.p,
.f,
...,
.else = NULL,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_map_at(
.x,
.at,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
Arguments
.x |
A list or atomic vector. |
.p |
A single predicate function, a formula describing such a
predicate function, or a logical vector of the same length as .x. Only those elements where .p evaluates to TRUE will be modified. |
.f |
A function, formula, or vector (not necessarily atomic). If a function, it is used as is. If a formula, e.g. ~ .x + 2, it is converted to a function.
This syntax allows you to create very compact anonymous functions. If character vector, numeric vector, or list, it is
converted to an extractor function. Character vectors index by
name and numeric vectors index by position; use a list to index
by position and name at different levels. If a component is not
present, the value of .default will be returned. |
... |
Additional arguments passed on to the mapped function. |
.else |
A function applied to elements of .x for which .p returns FALSE. |
.options |
The future-specific options to use with the workers. This must be the result of a call to furrr_options(). |
.env_globals |
The environment to look for globals required by .x and .... Globals required by .f are looked up in the function environment of .f. |
.progress |
A single logical. Should a progress bar be displayed? Only works with multisession, multicore, and multiprocess futures. Note that if a multicore/multisession future falls back to sequential, then a progress bar will not be displayed. Warning: The .progress argument will be deprecated and removed in a future version of furrr in favor of the progressr package. |
.at |
A character vector of names, positive numeric vector of
positions to include, or a negative numeric vector of positions to
exclude. Only those elements corresponding to .at will be modified. |
Value
Both functions return a list the same length as .x with the
elements conditionally transformed.
Examples
plan(multisession, workers = 2)
# Modify the even elements
future_map_if(1:5, ~.x %% 2 == 0L, ~ -1)
future_map_at(1:5, c(1, 5), ~ -1)
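The .else argument is not covered by the examples above. The sketch below assumes, as in purrr::map_if(), that .else is applied to the elements for which .p is FALSE; the input and the two transformations are made up for illustration.

```r
library(furrr)  # also attaches 'future', which provides plan()

plan(multisession, workers = 2)

# Even elements are multiplied by 10; odd elements are negated via .else
res <- future_map_if(
  1:4,
  ~ .x %% 2 == 0,
  ~ .x * 10,
  .else = ~ -.x
)

plan(sequential)  # release the background workers
```

Without .else, the elements that fail the predicate would be returned unchanged.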
Map over multiple inputs simultaneously via futures
Description
These functions work exactly the same as purrr::map2() and its variants,
but allow you to map in parallel. Note that purrr uses "parallel" to mean
iterating over multiple inputs at once. Here it additionally means that
the elements of those inputs can be processed concurrently.
Usage
future_map2(
.x,
.y,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_map2_chr(
.x,
.y,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_map2_dbl(
.x,
.y,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_map2_int(
.x,
.y,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_map2_lgl(
.x,
.y,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_map2_raw(
.x,
.y,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_map2_dfr(
.x,
.y,
.f,
...,
.id = NULL,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_map2_dfc(
.x,
.y,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_pmap(
.l,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_pmap_chr(
.l,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_pmap_dbl(
.l,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_pmap_int(
.l,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_pmap_lgl(
.l,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_pmap_raw(
.l,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_pmap_dfr(
.l,
.f,
...,
.id = NULL,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_pmap_dfc(
.l,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_walk2(
.x,
.y,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_pwalk(
.l,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
Arguments
.x , .y |
Vectors of the same length. A vector of length 1 will be recycled. |
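.f |
A function, formula, or vector (not necessarily atomic). If a function, it is used as is. If a formula, e.g. ~ .x + 2, it is converted to a function.
This syntax allows you to create very compact anonymous functions. If character vector, numeric vector, or list, it is
converted to an extractor function. Character vectors index by
name and numeric vectors index by position; use a list to index
by position and name at different levels. If a component is not
present, the value of .default will be returned. |
... |
Additional arguments passed on to the mapped function. |
.options |
The future-specific options to use with the workers. This must be the result of a call to furrr_options(). |
.env_globals |
The environment to look for globals required by .x and .... Globals required by .f are looked up in the function environment of .f. |
.progress |
A single logical. Should a progress bar be displayed? Only works with multisession, multicore, and multiprocess futures. Note that if a multicore/multisession future falls back to sequential, then a progress bar will not be displayed. Warning: The .progress argument will be deprecated and removed in a future version of furrr in favor of the progressr package. |
.id |
Either a string or NULL. Only applies to the _dfr variant. |
.l |
A list of vectors, such as a data frame. The length of .l determines the number of arguments that .f will be called with. List names will be used if present. |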
Value
An atomic vector, list, or data frame, depending on the suffix.
Atomic vectors and lists will be named if .x or the first element of .l
is named.
If all input is length 0, the output will be length 0. If any input is length 1, it will be recycled to the length of the longest.
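The recycling rule can be sketched as follows. The second call also assumes the purrr::pmap() convention that named elements of .l are matched to the arguments of .f by name; the values are made up for illustration.

```r
library(furrr)  # also attaches 'future', which provides plan()

plan(multisession, workers = 2)

# .y has length 1, so it is recycled against every element of .x
sums <- future_map2_dbl(1:3, 10, ~ .x + .y)

# Named elements of .l are matched to the arguments of .f by name
# (assumed from purrr::pmap()), so the order inside the list does not matter
diffs <- future_pmap_dbl(
  list(b = 4:6, a = 1:3),
  function(a, b) a - b
)

plan(sequential)  # release the background workers
```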
Examples
plan(multisession, workers = 2)
x <- list(1, 10, 100)
y <- list(1, 2, 3)
z <- list(5, 50, 500)
future_map2(x, y, ~ .x + .y)
# Split into pieces, fit model to each piece, then predict
by_cyl <- split(mtcars, mtcars$cyl)
mods <- future_map(by_cyl, ~ lm(mpg ~ wt, data = .))
future_map2(mods, by_cyl, predict)
future_pmap(list(x, y, z), sum)
# Matching arguments by position
future_pmap(list(x, y, z), function(a, b, c) a / (b + c))
# Vectorizing a function over multiple arguments
df <- data.frame(
  x = c("apple", "banana", "cherry"),
  pattern = c("p", "n", "h"),
  replacement = c("x", "f", "q"),
  stringsAsFactors = FALSE
)
future_pmap(df, gsub)
future_pmap_chr(df, gsub)
Modify elements selectively via futures
Description
These functions work exactly the same as the purrr::modify() functions, but
allow you to modify in parallel.
Usage
future_modify(
.x,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_modify_at(
.x,
.at,
.f,
...,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
future_modify_if(
.x,
.p,
.f,
...,
.else = NULL,
.options = furrr_options(),
.env_globals = parent.frame(),
.progress = FALSE
)
Arguments
.x |
A list or atomic vector. |
.f |
A function, formula, or vector (not necessarily atomic). If a function, it is used as is. If a formula, e.g. ~ .x + 2, it is converted to a function.
This syntax allows you to create very compact anonymous functions. If character vector, numeric vector, or list, it is
converted to an extractor function. Character vectors index by
name and numeric vectors index by position; use a list to index
by position and name at different levels. If a component is not
present, the value of .default will be returned. |
... |
Additional arguments passed on to the mapped function. |
.options |
The future-specific options to use with the workers. This must be the result of a call to furrr_options(). |
.env_globals |
The environment to look for globals required by .x and .... Globals required by .f are looked up in the function environment of .f. |
.progress |
A single logical. Should a progress bar be displayed? Only works with multisession, multicore, and multiprocess futures. Note that if a multicore/multisession future falls back to sequential, then a progress bar will not be displayed. Warning: The .progress argument will be deprecated and removed in a future version of furrr in favor of the progressr package. |
.at |
A character vector of names, positive numeric vector of
positions to include, or a negative numeric vector of positions to
exclude. Only those elements corresponding to .at will be modified. |
.p |
A single predicate function, a formula describing such a
predicate function, or a logical vector of the same length as .x. Only those elements where .p evaluates to TRUE will be modified. |
.else |
A function applied to elements of .x for which .p returns FALSE. |
Details
From purrr:
Since the transformation can alter the structure of the input,
it's your responsibility to ensure that the transformation produces a valid
output. For example, if you're modifying a data frame, .f must preserve the
length of the input.
Value
An object the same class as .x
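A minimal sketch of the class-preserving behavior, using a small data frame made up for illustration. The transformation keeps each column at its original length, as the Details section requires.

```r
library(furrr)  # also attaches 'future', which provides plan()

plan(multisession, workers = 2)

df <- data.frame(a = 1:2, b = c(0.5, 1.5))  # hypothetical input

# Each column is transformed on a worker; the result is still a data frame
doubled <- future_modify(df, ~ .x * 2)

plan(sequential)  # release the background workers
```

By contrast, future_map(df, ~ .x * 2) would return a plain list with one element per column.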
Examples
library(magrittr)
plan(multisession, workers = 2)
# Convert each col to character, in parallel
future_modify(mtcars, as.character)
iris %>%
  future_modify_if(is.factor, as.character) %>%
  str()
mtcars %>%
  future_modify_at(c(1, 4, 5), as.character) %>%
  str()
Deprecated furrr options
Description
As of furrr 0.3.0, future_options() is defunct in favor of
furrr_options().
Usage
future_options(globals = TRUE, packages = NULL, seed = FALSE, scheduling = 1)
Arguments
globals |
A logical, a character vector, a named list, or NULL. |
packages |
A character vector, or NULL. |
seed |
A logical, an integer of length 1 or 7, a list of pre-generated seeds of length(.x), or NULL. |
scheduling |
A single integer, logical, or Inf.
This argument is only used if chunk_size is NULL. |
Examples
try(future_options())
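Migrating old code should mostly be a matter of renaming the call, since the four arguments above also exist in furrr_options(). A sketch, with variable names chosen for illustration:

```r
library(furrr)  # also attaches 'future', which provides plan()

# The defunct function is expected to signal an error when called
old_call <- try(future_options(), silent = TRUE)

# Replacement: the same arguments, passed to furrr_options() instead
opts <- furrr_options(
  globals = TRUE,
  packages = NULL,
  seed = FALSE,
  scheduling = 1
)

# No plan() is set here, so this runs sequentially in the current session
res <- future_map_dbl(1:3, ~ .x + 1, .options = opts)
```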