Title: | Tidy Messy Data |
Version: | 1.3.1 |
Description: | Tools to help to create tidy data, where each column is a variable, each row is an observation, and each cell contains a single value. 'tidyr' contains tools for changing the shape (pivoting) and hierarchy (nesting and 'unnesting') of a dataset, turning deeply nested lists into rectangular data frames ('rectangling'), and extracting values out of string columns. It also includes tools for working with missing values (both implicit and explicit). |
License: | MIT + file LICENSE |
URL: | https://tidyr.tidyverse.org, https://github.com/tidyverse/tidyr |
BugReports: | https://github.com/tidyverse/tidyr/issues |
Depends: | R (≥ 3.6) |
Imports: | cli (≥ 3.4.1), dplyr (≥ 1.0.10), glue, lifecycle (≥ 1.0.3), magrittr, purrr (≥ 1.0.1), rlang (≥ 1.1.1), stringr (≥ 1.5.0), tibble (≥ 2.1.1), tidyselect (≥ 1.2.0), utils, vctrs (≥ 0.5.2) |
Suggests: | covr, data.table, knitr, readr, repurrrsive (≥ 1.1.0), rmarkdown, testthat (≥ 3.0.0) |
LinkingTo: | cpp11 (≥ 0.4.0) |
VignetteBuilder: | knitr |
Config/Needs/website: | tidyverse/tidytemplate |
Config/testthat/edition: | 3 |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.0 |
NeedsCompilation: | yes |
Packaged: | 2024-01-23 14:27:23 UTC; hadleywickham |
Author: | Hadley Wickham [aut, cre], Davis Vaughan [aut], Maximilian Girlich [aut], Kevin Ushey [ctb], Posit Software, PBC [cph, fnd] |
Maintainer: | Hadley Wickham <hadley@posit.co> |
Repository: | CRAN |
Date/Publication: | 2024-01-24 14:50:09 UTC |
tidyr: Tidy Messy Data
Description
Tools to help to create tidy data, where each column is a variable, each row is an observation, and each cell contains a single value. 'tidyr' contains tools for changing the shape (pivoting) and hierarchy (nesting and 'unnesting') of a dataset, turning deeply nested lists into rectangular data frames ('rectangling'), and extracting values out of string columns. It also includes tools for working with missing values (both implicit and explicit).
Author(s)
Maintainer: Hadley Wickham hadley@posit.co
Authors:
Davis Vaughan davis@posit.co
Maximilian Girlich
Other contributors:
Kevin Ushey kevin@posit.co [contributor]
Posit Software, PBC [copyright holder, funder]
See Also
Useful links:
Report bugs at https://github.com/tidyverse/tidyr/issues
Pipe operator
Description
See %>% for more details.
Usage
lhs %>% rhs
Song rankings for Billboard top 100 in the year 2000
Description
Song rankings for Billboard top 100 in the year 2000
Usage
billboard
Format
A dataset with variables:
- artist
Artist name
- track
Song name
- date.enter
Date the song entered the top 100
- wk1 – wk76
Rank of the song in each week after it entered
Source
The "Whitburn" project, https://waxy.org/2008/05/the_whitburn_project/, (downloaded April 2008)
Check assumptions about a pivot spec
Description
check_pivot_spec() is a developer facing helper function for validating the pivot spec used in pivot_longer_spec() or pivot_wider_spec(). It is only useful if you are extending pivot_longer() or pivot_wider() with new S3 methods.
check_pivot_spec() makes the following assertions:
- spec must be a data frame.
- spec must have a character column named .name.
- spec must have a character column named .value.
- The .name column must be unique.
- The .name and .value columns must be the first two columns in the data frame, and will be reordered if that is not true.
Usage
check_pivot_spec(spec, call = caller_env())
Arguments
spec |
A specification data frame. This is useful for more complex pivots because it gives you greater control on how metadata stored in the columns become column names in the result. Must be a data frame containing character |
Examples
# A valid spec
spec <- tibble(.name = "a", .value = "b", foo = 1)
check_pivot_spec(spec)
spec <- tibble(.name = "a")
try(check_pivot_spec(spec))
# `.name` and `.value` are forced to be the first two columns
spec <- tibble(foo = 1, .value = "b", .name = "a")
check_pivot_spec(spec)
Chop and unchop
Description
Chopping and unchopping preserve the width of a data frame, changing its length. chop() makes df shorter by converting rows within each group into list-columns. unchop() makes df longer by expanding list-columns so that each element of the list-column gets its own row in the output.
chop() and unchop() are building blocks for more complicated functions (like unnest(), unnest_longer(), and unnest_wider()) and are generally more suitable for programming than interactive data analysis.
Usage
chop(data, cols, ..., error_call = current_env())
unchop(
data,
cols,
...,
keep_empty = FALSE,
ptype = NULL,
error_call = current_env()
)
Arguments
data |
A data frame. |
cols |
< For |
... |
These dots are for future extensions and must be empty. |
error_call |
The execution environment of a currently
running function, e.g. |
keep_empty |
By default, you get one row of output for each element
of the list that you are unchopping/unnesting. This means that if there's a
size-0 element (like |
ptype |
Optionally, a named list of column name-prototype pairs to
coerce |
Details
Generally, unchopping is more useful than chopping because it simplifies a complex data structure, and nest()ing is usually more appropriate than chop()ing since it better preserves the connections between observations.
chop() creates list-columns of class vctrs::list_of() to ensure consistent behaviour when the chopped data frame is emptied. For instance, this helps recover the original column types after a chop() and unchop() roundtrip. Because <list_of> keeps track of the type of its elements, unchop() is able to reconstitute the correct vector type even for empty list-columns.
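The roundtrip described above can be sketched as follows. This is a minimal illustration on made-up data (the tibble df is an assumption, not part of the tidyr documentation); it shows that the integer type of y comes back after unchop(), even when the chopped data frame has zero rows.

```r
library(tidyr)

# Hypothetical example data, not taken from the tidyr documentation
df <- tibble::tibble(x = c(1, 1, 2), y = 1:3)

# chop() collapses `y` into a list-column, one row per value of `x`
chopped <- chop(df, y)

# Keep zero rows: the list-column is empty but still records its element type
empty <- chopped[0, ]

# unchop() restores an integer column in both cases
roundtrip <- unchop(chopped, y)
roundtrip_empty <- unchop(empty, y)

typeof(roundtrip$y)
typeof(roundtrip_empty$y)
```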
Examples
# Chop ----------------------------------------------------------------------
df <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1)
# Note that we get one row of output for each unique combination of
# non-chopped variables
df %>% chop(c(y, z))
# cf nest
df %>% nest(data = c(y, z))
# Unchop --------------------------------------------------------------------
df <- tibble(x = 1:4, y = list(integer(), 1L, 1:2, 1:3))
df %>% unchop(y)
df %>% unchop(y, keep_empty = TRUE)
# unchop will error if the types are not compatible:
df <- tibble(x = 1:2, y = list("1", 1:3))
try(df %>% unchop(y))
# Unchopping a list-col of data frames must generate a df-col because
# unchop leaves the column names unchanged
df <- tibble(x = 1:3, y = list(NULL, tibble(x = 1), tibble(y = 1:2)))
df %>% unchop(y)
df %>% unchop(y, keep_empty = TRUE)
Data from the Centers for Medicare & Medicaid Services
Description
Two datasets from public data provided by the Centers for Medicare & Medicaid Services, https://data.cms.gov.
- cms_patient_experience contains some lightly cleaned data from "Hospice - Provider Data", which provides a list of hospice agencies along with some data on quality of patient care, https://data.cms.gov/provider-data/dataset/252m-zfp9.
- cms_patient_care: "Doctors and Clinicians Quality Payment Program PY 2020 Virtual Group Public Reporting", https://data.cms.gov/provider-data/dataset/8c70-d353
Usage
cms_patient_experience
cms_patient_care
Format
cms_patient_experience is a data frame with 500 observations and five variables:
- org_pac_id,org_nm
Organisation ID and name
- measure_cd,measure_title
Measure code and title
- prf_rate
Measure performance rate
cms_patient_care is a data frame with 252 observations and five variables:
- ccn,facility_name
Facility ID and name
- measure_abbr
Abbreviated measurement title, suitable for use as variable name
- score
Measure score
- type
Whether score refers to the rating out of 100 ("observed"), or the maximum possible value of the raw score ("denominator")
Examples
cms_patient_experience %>%
dplyr::distinct(measure_cd, measure_title)
cms_patient_experience %>%
pivot_wider(
id_cols = starts_with("org"),
names_from = measure_cd,
values_from = prf_rate
)
cms_patient_care %>%
pivot_wider(
names_from = type,
values_from = score
)
cms_patient_care %>%
pivot_wider(
names_from = measure_abbr,
values_from = score
)
cms_patient_care %>%
pivot_wider(
names_from = c(measure_abbr, type),
values_from = score
)
Complete a data frame with missing combinations of data
Description
Turns implicit missing values into explicit missing values. This is a wrapper around expand(), dplyr::full_join() and replace_na() that's useful for completing missing combinations of data.
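As a rough sketch of what "wrapper" means here, the three steps can be spelled out by hand and compared with complete(). The data frame and column names below are made up for illustration, and the manual pipeline is only an approximation of the idea, not tidyr's actual implementation.

```r
library(tidyr)

# Hypothetical data: the combinations (a, 2) and (b, 1) are implicitly missing
df <- tibble::tibble(g = c("a", "b"), t = c(1, 2), v = c(10, 20))

# One call to complete()
via_complete <- complete(df, g, t, fill = list(v = 0))

# The same idea in three explicit steps
all_combos <- expand(df, g, t)                                 # every g/t pair
by_hand <- dplyr::full_join(all_combos, df, by = c("g", "t"))  # adds NA rows
by_hand <- replace_na(by_hand, list(v = 0))                    # fill the NAs

via_complete
by_hand
```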
Usage
complete(data, ..., fill = list(), explicit = TRUE)
Arguments
data |
A data frame. |
... |
<
When used with factors, When used with continuous variables, you may need to fill in values
that do not appear in the data: to do so use expressions like
|
fill |
A named list that for each variable supplies a single value to
use instead of |
explicit |
Should both implicit (newly created) and explicit
(pre-existing) missing values be filled by |
Grouped data frames
With grouped data frames created by dplyr::group_by(), complete() operates within each group. Because of this, you cannot complete a grouping column.
Examples
df <- tibble(
group = c(1:2, 1, 2),
item_id = c(1:2, 2, 3),
item_name = c("a", "a", "b", "b"),
value1 = c(1, NA, 3, 4),
value2 = 4:7
)
df
# Combinations --------------------------------------------------------------
# Generate all possible combinations of `group`, `item_id`, and `item_name`
# (whether or not they appear in the data)
df %>% complete(group, item_id, item_name)
# Cross all possible `group` values with the unique pairs of
# `(item_id, item_name)` that already exist in the data
df %>% complete(group, nesting(item_id, item_name))
# Within each `group`, generate all possible combinations of
# `item_id` and `item_name` that occur in that group
df %>%
dplyr::group_by(group) %>%
complete(item_id, item_name)
# Supplying values for new rows ---------------------------------------------
# Use `fill` to replace NAs with some value. By default, affects both new
# (implicit) and pre-existing (explicit) missing values.
df %>%
complete(
group,
nesting(item_id, item_name),
fill = list(value1 = 0, value2 = 99)
)
# Limit the fill to only the newly created (i.e. previously implicit)
# missing values with `explicit = FALSE`
df %>%
complete(
group,
nesting(item_id, item_name),
fill = list(value1 = 0, value2 = 99),
explicit = FALSE
)
Completed construction in the US in 2018
Description
Completed construction in the US in 2018
Usage
construction
Format
A dataset with variables:
- Year,Month
Record date
- 1 unit, 2 to 4 units, 5 units or more
Number of completed units of each size
- Northeast,Midwest,South,West
Number of completed units in each region
Source
Completions of "New Residential Construction" found in Table 5 at https://www.census.gov/construction/nrc/xls/newresconst.xls (downloaded March 2019)
Deprecated SE versions of main verbs
Description
tidyr used to offer twin versions of each verb suffixed with an underscore. These versions had standard evaluation (SE) semantics: rather than taking arguments by code, like NSE verbs, they took arguments by value. Their purpose was to make it possible to program with tidyr. However, tidyr now uses tidy evaluation semantics. NSE verbs still capture their arguments, but you can now unquote parts of these arguments. This offers full programmability with NSE verbs. Thus, the underscored versions are now superfluous.
Unquoting triggers immediate evaluation of its operand and inlines the result within the captured expression. This result can be a value or an expression to be evaluated later with the rest of the argument. See vignette("programming", "dplyr") for more information.
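A small sketch of programming with the regular verbs instead of the underscored ones. The data frame and the variable name col are assumptions made for illustration; the column name is held in a variable and either unquoted with !! or wrapped in all_of().

```r
library(tidyr)

# Hypothetical data
df <- tibble::tibble(x = c(1, 2, NA), y = c("a", NA, "b"))

# The column to use is only known as a string at run time
col_name <- "y"

# Option 1: turn the string into a symbol and unquote it
col <- rlang::sym(col_name)
res_unquote <- drop_na(df, !!col)

# Option 2: select by name with a tidyselect helper
res_all_of <- drop_na(df, all_of(col_name))

res_unquote
res_all_of
```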
Usage
complete_(data, cols, fill = list(), ...)
drop_na_(data, vars)
expand_(data, dots, ...)
crossing_(x)
nesting_(x)
extract_(
data,
col,
into,
regex = "([[:alnum:]]+)",
remove = TRUE,
convert = FALSE,
...
)
fill_(data, fill_cols, .direction = c("down", "up"))
gather_(
data,
key_col,
value_col,
gather_cols,
na.rm = FALSE,
convert = FALSE,
factor_key = FALSE
)
nest_(...)
separate_rows_(data, cols, sep = "[^[:alnum:].]+", convert = FALSE)
separate_(
data,
col,
into,
sep = "[^[:alnum:]]+",
remove = TRUE,
convert = FALSE,
extra = "warn",
fill = "warn",
...
)
spread_(
data,
key_col,
value_col,
fill = NA,
convert = FALSE,
drop = TRUE,
sep = NULL
)
unite_(data, col, from, sep = "_", remove = TRUE)
unnest_(...)
Arguments
data |
A data frame |
fill |
A named list that for each variable supplies a single value to
use instead of |
... |
<
When used with factors, When used with continuous variables, you may need to fill in values
that do not appear in the data: to do so use expressions like
|
vars , cols , col |
Name of columns. |
x |
For |
into |
Names of new variables to create as character vector.
Use |
regex |
A string representing a regular expression used to extract the
desired values. There should be one group (defined by |
remove |
If |
convert |
If NB: this will cause string |
fill_cols |
Character vector of column names. |
.direction |
Direction in which to fill missing values. Currently either "down" (the default), "up", "downup" (i.e. first down and then up) or "updown" (first up and then down). |
key_col , value_col |
Strings giving names of key and value cols. |
gather_cols |
Character vector giving column names to be gathered into pair of key-value columns. |
na.rm |
If |
factor_key |
If |
sep |
Separator delimiting collapsed values. |
extra |
If
|
drop |
If |
from |
Names of existing columns as character vector |
Drop rows containing missing values
Description
drop_na() drops rows where any column specified by ... contains a missing value.
Usage
drop_na(data, ...)
Arguments
data |
A data frame. |
... |
< |
Details
Another way to interpret drop_na() is that it only keeps the "complete" rows (rows that contain no missing values in the selected columns). Internally, this completeness is computed through vctrs::vec_detect_complete().
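To make the notion of a "complete" row concrete, the sketch below calls vctrs::vec_detect_complete() directly on a small made-up data frame and compares the result with drop_na(). It is an illustration of the idea, not a copy of tidyr's internals.

```r
library(tidyr)

# Hypothetical data: only the first row has no missing values
df <- tibble::tibble(x = c(1, 2, NA), y = c("a", NA, "b"))

# One logical value per row: TRUE when every column is non-missing
keep <- vctrs::vec_detect_complete(df)
keep

# Subsetting by that vector matches drop_na() with no columns selected
manual <- df[keep, ]
dropped <- drop_na(df)
```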
Examples
df <- tibble(x = c(1, 2, NA), y = c("a", NA, "b"))
df %>% drop_na()
df %>% drop_na(x)
vars <- "y"
df %>% drop_na(x, any_of(vars))
Expand data frame to include all possible combinations of values
Description
expand() generates all combinations of variables found in a dataset. It is paired with nesting() and crossing() helpers. crossing() is a wrapper around expand_grid() that de-duplicates and sorts its inputs; nesting() is a helper that only finds combinations already present in the data.
expand() is often useful in conjunction with joins:
- use it with right_join() to convert implicit missing values to explicit missing values (e.g., fill in gaps in your data frame).
- use it with anti_join() to figure out which combinations are missing (e.g., identify gaps in your data frame).
Usage
expand(data, ..., .name_repair = "check_unique")
crossing(..., .name_repair = "check_unique")
nesting(..., .name_repair = "check_unique")
Arguments
data |
A data frame. |
... |
<
When used with factors, When used with continuous variables, you may need to fill in values
that do not appear in the data: to do so use expressions like
|
.name_repair |
Treatment of problematic column names:
This argument is passed on as |
Grouped data frames
With grouped data frames created by dplyr::group_by(), expand() operates within each group. Because of this, you cannot expand on a grouping column.
See Also
complete() to expand list objects.
expand_grid() to input vectors rather than a data frame.
Examples
# Finding combinations ------------------------------------------------------
fruits <- tibble(
type = c("apple", "orange", "apple", "orange", "orange", "orange"),
year = c(2010, 2010, 2012, 2010, 2011, 2012),
size = factor(
c("XS", "S", "M", "S", "S", "M"),
levels = c("XS", "S", "M", "L")
),
weights = rnorm(6, as.numeric(size) + 2)
)
# All combinations, including factor levels that are not used
fruits %>% expand(type)
fruits %>% expand(size)
fruits %>% expand(type, size)
fruits %>% expand(type, size, year)
# Only combinations that already appear in the data
fruits %>% expand(nesting(type))
fruits %>% expand(nesting(size))
fruits %>% expand(nesting(type, size))
fruits %>% expand(nesting(type, size, year))
# Other uses ----------------------------------------------------------------
# Use with `full_seq()` to fill in values of continuous variables
fruits %>% expand(type, size, full_seq(year, 1))
fruits %>% expand(type, size, 2010:2013)
# Use `anti_join()` to determine which observations are missing
all <- fruits %>% expand(type, size, year)
all
all %>% dplyr::anti_join(fruits)
# Use with `right_join()` to fill in missing rows (like `complete()`)
fruits %>% dplyr::right_join(all)
# Use with `group_by()` to expand within each group
fruits %>%
dplyr::group_by(type) %>%
expand(year, size)
Create a tibble from all combinations of inputs
Description
expand_grid() is heavily motivated by expand.grid(). Compared to expand.grid(), it:
- Produces sorted output (by varying the first column the slowest, rather than the fastest).
- Returns a tibble, not a data frame.
- Never converts strings to factors.
- Does not add any additional attributes.
- Can expand any generalised vector, including data frames.
Usage
expand_grid(..., .name_repair = "check_unique")
Arguments
... |
Name-value pairs. The name will become the column name in the output. |
.name_repair |
Treatment of problematic column names:
This argument is passed on as |
Value
A tibble with one column for each input in the ... argument. The output will have one row for each combination of the inputs, i.e. the size will be equal to the product of the sizes of the inputs. This implies that if any input has length 0, the output will have zero rows.
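A quick check of the size rule on made-up inputs: the row count is the product of the input sizes, the first column varies slowest, and a zero-length input yields zero rows.

```r
library(tidyr)

# Three inputs of sizes 3, 2 and 2 give 3 * 2 * 2 = 12 rows
g <- expand_grid(x = 1:3, y = 1:2, z = c("a", "b"))
nrow(g)

# The first column varies the slowest
g$x[1:4]

# Any zero-length input gives a zero-row result
g0 <- expand_grid(x = 1:3, y = integer())
nrow(g0)
```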
Examples
expand_grid(x = 1:3, y = 1:2)
expand_grid(l1 = letters, l2 = LETTERS)
# Can also expand data frames
expand_grid(df = tibble(x = 1:2, y = c(2, 1)), z = 1:3)
# And matrices
expand_grid(x1 = matrix(1:4, nrow = 2), x2 = matrix(5:8, nrow = 2))
Extract a character column into multiple columns using regular expression groups
Description
extract() has been superseded in favour of separate_wider_regex() because it has a more polished API and better handling of problems. Superseded functions will not go away, but will only receive critical bug fixes.
Given a regular expression with capturing groups, extract() turns each group into a new column. If the groups don't match, or the input is NA, the output will be NA.
Usage
extract(
data,
col,
into,
regex = "([[:alnum:]]+)",
remove = TRUE,
convert = FALSE,
...
)
Arguments
data |
A data frame. |
col |
< |
into |
Names of new variables to create as character vector.
Use |
regex |
A string representing a regular expression used to extract the
desired values. There should be one group (defined by |
remove |
If |
convert |
If NB: this will cause string |
... |
Additional arguments passed on to methods. |
See Also
separate() to split up by a separator.
Examples
df <- tibble(x = c(NA, "a-b", "a-d", "b-c", "d-e"))
df %>% extract(x, "A")
df %>% extract(x, c("A", "B"), "([[:alnum:]]+)-([[:alnum:]]+)")
# Now recommended
df %>%
separate_wider_regex(
x,
patterns = c(A = "[[:alnum:]]+", "-", B = "[[:alnum:]]+")
)
# If no match, NA:
df %>% extract(x, c("A", "B"), "([a-d]+)-([a-d]+)")
Extract numeric component of variable.
Description
DEPRECATED: please use readr::parse_number() instead.
Usage
extract_numeric(x)
Arguments
x |
A character vector (or a factor). |
Fill in missing values with previous or next value
Description
Fills missing values in selected columns using the next or previous entry. This is useful in the common output format where values are not repeated, and are only recorded when they change.
Usage
fill(data, ..., .direction = c("down", "up", "downup", "updown"))
Arguments
data |
A data frame. |
... |
< |
.direction |
Direction in which to fill missing values. Currently either "down" (the default), "up", "downup" (i.e. first down and then up) or "updown" (first up and then down). |
Details
Missing values are replaced in atomic vectors; NULLs are replaced in lists.
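A minimal sketch of both cases on made-up data: val is an atomic column where NA marks a missing value, and lst is a list-column where NULL plays that role.

```r
library(tidyr)

# Hypothetical data with one atomic column and one list-column
df <- tibble::tibble(
  id = 1:3,
  val = c(1, NA, NA),
  lst = list("a", NULL, NULL)
)

# Fill both columns downwards (the default direction)
filled <- fill(df, val, lst)

filled$val
filled$lst
```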
Grouped data frames
With grouped data frames created by dplyr::group_by(), fill() will be applied within each group, meaning that it won't fill across group boundaries.
Examples
# direction = "down" --------------------------------------------------------
# Value (year) is recorded only when it changes
sales <- tibble::tribble(
~quarter, ~year, ~sales,
"Q1", 2000, 66013,
"Q2", NA, 69182,
"Q3", NA, 53175,
"Q4", NA, 21001,
"Q1", 2001, 46036,
"Q2", NA, 58842,
"Q3", NA, 44568,
"Q4", NA, 50197,
"Q1", 2002, 39113,
"Q2", NA, 41668,
"Q3", NA, 30144,
"Q4", NA, 52897,
"Q1", 2004, 32129,
"Q2", NA, 67686,
"Q3", NA, 31768,
"Q4", NA, 49094
)
# `fill()` defaults to replacing missing data from top to bottom
sales %>% fill(year)
# direction = "up" ----------------------------------------------------------
# Value (pet_type) is missing above
tidy_pets <- tibble::tribble(
~rank, ~pet_type, ~breed,
1L, NA, "Boston Terrier",
2L, NA, "Retrievers (Labrador)",
3L, NA, "Retrievers (Golden)",
4L, NA, "French Bulldogs",
5L, NA, "Bulldogs",
6L, "Dog", "Beagles",
1L, NA, "Persian",
2L, NA, "Maine Coon",
3L, NA, "Ragdoll",
4L, NA, "Exotic",
5L, NA, "Siamese",
6L, "Cat", "American Short"
)
# For values that are missing above you can use `.direction = "up"`
tidy_pets %>%
fill(pet_type, .direction = "up")
# direction = "downup" ------------------------------------------------------
# Value (n_squirrels) is missing above and below within a group
squirrels <- tibble::tribble(
~group, ~name, ~role, ~n_squirrels,
1, "Sam", "Observer", NA,
1, "Mara", "Scorekeeper", 8,
1, "Jesse", "Observer", NA,
1, "Tom", "Observer", NA,
2, "Mike", "Observer", NA,
2, "Rachael", "Observer", NA,
2, "Sydekea", "Scorekeeper", 14,
2, "Gabriela", "Observer", NA,
3, "Derrick", "Observer", NA,
3, "Kara", "Scorekeeper", 9,
3, "Emily", "Observer", NA,
3, "Danielle", "Observer", NA
)
# The values are inconsistently missing by position within the group
# Use .direction = "downup" to fill missing values in both directions
squirrels %>%
dplyr::group_by(group) %>%
fill(n_squirrels, .direction = "downup") %>%
dplyr::ungroup()
# Using `.direction = "updown"` accomplishes the same goal in this example
Fish encounters
Description
Information about fish swimming down a river: each station represents an autonomous monitor that records if a tagged fish was seen at that location. Fish travel in one direction (migrating downstream). Information about misses is just as important as hits, but is not directly recorded in this form of the data.
Usage
fish_encounters
Format
A dataset with variables:
- fish
Fish identifier
- station
Measurement station
- seen
Was the fish seen? (1 if yes, and true for all rows)
Source
Dataset provided by Myfanwy Johnston; more details at https://fishsciences.github.io/post/visualizing-fish-encounter-histories/
Create the full sequence of values in a vector
Description
This is useful if you want to fill in missing values that should have been observed but weren't. For example, full_seq(c(1, 2, 4, 6), 1) will return 1:6.
Usage
full_seq(x, period, tol = 1e-06)
Arguments
x |
A numeric vector. |
period |
Gap between each observation. The existing data will be checked to ensure that it is actually of this periodicity. |
tol |
Numerical tolerance for checking periodicity. |
Examples
full_seq(c(1, 2, 4, 5, 10), 1)
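The period argument also acts as a check on the input. The sketch below, using made-up vectors, shows a period other than 1 and what happens when the data do not fit the stated period; try() is used so the error does not stop the script.

```r
library(tidyr)

# Observations every 10 units, with 20 missing
tens <- full_seq(c(0, 10, 30), 10)
tens

# 2.5 does not lie on a grid with period 1 starting at 1, so this errors
bad <- try(full_seq(c(1, 2.5, 4), 1), silent = TRUE)
inherits(bad, "try-error")
```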
Gather columns into key-value pairs
Description
Development on gather() is complete, and for new code we recommend switching to pivot_longer(), which is easier to use, more featureful, and still under active development.
df %>% gather("key", "value", x, y, z)
is equivalent to
df %>% pivot_longer(c(x, y, z), names_to = "key", values_to = "value")
See more details in vignette("pivot").
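A short sketch of that equivalence on made-up data. One practical detail when migrating: the two functions return the same rows but in a different order (gather() stacks column by column, pivot_longer() goes row by row), so the comparison below sorts both results first.

```r
library(tidyr)

# Hypothetical wide data
df <- tibble::tibble(id = 1:2, x = c(1, 2), y = c(3, 4), z = c(5, 6))

old <- gather(df, "key", "value", x, y, z)
new <- pivot_longer(df, c(x, y, z), names_to = "key", values_to = "value")

# Same content once both are put in a common order
old_sorted <- dplyr::arrange(old, id, key)
new_sorted <- dplyr::arrange(new, id, key)

old_sorted
new_sorted
```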
Usage
gather(
data,
key = "key",
value = "value",
...,
na.rm = FALSE,
convert = FALSE,
factor_key = FALSE
)
Arguments
data |
A data frame. |
key , value |
Names of new key and value columns, as strings or symbols. This argument is passed by expression and supports
quasiquotation (you can unquote strings
and symbols). The name is captured from the expression with
|
... |
A selection of columns. If empty, all variables are
selected. You can supply bare variable names, select all
variables between x and z with |
na.rm |
If |
convert |
If |
factor_key |
If |
Rules for selection
Arguments for selecting columns are passed to tidyselect::vars_select() and are treated specially. Unlike other verbs, selecting functions make a strict distinction between data expressions and context expressions.
- A data expression is either a bare name like x or an expression like x:y or c(x, y). In a data expression, you can only refer to columns from the data frame.
- Everything else is a context expression in which you can only refer to objects that you have defined with <-.
For instance, col1:col3 is a data expression that refers to data columns, while seq(start, end) is a context expression that refers to objects from the context.
If you need to refer to contextual objects from a data expression, you can use all_of() or any_of(). These functions are used to select data-variables whose names are stored in an env-variable. For instance, all_of(a) selects the variables listed in the character vector a.
For more details, see the tidyselect::select_helpers() documentation.
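A brief sketch of the distinction on made-up data: X:Y is a data expression naming columns directly, while wanted is an ordinary character vector in the calling environment that is brought into the selection with all_of(). The names stocks and wanted are assumptions chosen for illustration.

```r
library(tidyr)

# Hypothetical data
stocks <- tibble::tibble(
  time = 1:3,
  X = c(1, 2, 3),
  Y = c(4, 5, 6),
  Z = c(7, 8, 9)
)

# Data expression: refers to columns of `stocks`
by_range <- gather(stocks, "stock", "price", X:Y)

# Env-variable holding column names, used through all_of()
wanted <- c("X", "Y")
by_name <- gather(stocks, "stock", "price", all_of(wanted))

by_name
```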
Examples
# From https://stackoverflow.com/questions/1181060
stocks <- tibble(
time = as.Date("2009-01-01") + 0:9,
X = rnorm(10, 0, 1),
Y = rnorm(10, 0, 2),
Z = rnorm(10, 0, 4)
)
gather(stocks, "stock", "price", -time)
stocks %>% gather("stock", "price", -time)
# get first observation for each Species in iris data -- base R
mini_iris <- iris[c(1, 51, 101), ]
# gather Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
gather(mini_iris, key = "flower_att", value = "measurement",
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
# same result but less verbose
gather(mini_iris, key = "flower_att", value = "measurement", -Species)
Hoist values out of list-columns
Description
hoist() allows you to selectively pull components of a list-column into their own top-level columns, using the same syntax as purrr::pluck().
Learn more in vignette("rectangle").
Usage
hoist(
.data,
.col,
...,
.remove = TRUE,
.simplify = TRUE,
.ptype = NULL,
.transform = NULL
)
Arguments
.data |
A data frame. |
.col |
< |
... |
< The column names must be unique in a call to |
.remove |
If |
.simplify |
If |
.ptype |
Optionally, a named list of prototypes declaring the desired output type of each component. Alternatively, a single empty prototype can be supplied, which will be applied to all components. Use this argument if you want to check that each element has the type you expect when simplifying. If a |
.transform |
Optionally, a named list of transformation functions applied to each component. Alternatively, a single function can be supplied, which will be applied to all components. Use this argument if you want to transform or parse individual elements as they are extracted. When both |
See Also
Other rectangling: unnest_longer(), unnest_wider(), unnest()
df <- tibble(
character = c("Toothless", "Dory"),
metadata = list(
list(
species = "dragon",
color = "black",
films = c(
"How to Train Your Dragon",
"How to Train Your Dragon 2",
"How to Train Your Dragon: The Hidden World"
)
),
list(
species = "blue tang",
color = "blue",
films = c("Finding Nemo", "Finding Dory")
)
)
)
df
# Extract only specified components
df %>% hoist(metadata,
"species",
first_film = list("films", 1L),
third_film = list("films", 3L)
)
Household data
Description
This dataset is based on an example in vignette("datatable-reshape", package = "data.table")
Usage
household
Format
A data frame with 5 rows and 5 columns:
- family
Family identifier
- dob_child1
Date of birth of first child
- dob_child2
Date of birth of second child
- name_child1
Name of first child
- name_child2
Name of second child
Nest rows into a list-column of data frames
Description
Nesting creates a list-column of data frames; unnesting flattens it back out into regular columns. Nesting is implicitly a summarising operation: you get one row for each group defined by the non-nested columns. This is useful in conjunction with other summaries that work with whole datasets, most notably models.
Learn more in vignette("nest").
Usage
nest(.data, ..., .by = NULL, .key = NULL, .names_sep = NULL)
Arguments
.data |
A data frame. |
... |
< Specified using name-variable pairs of the form
If not supplied, then
|
.by |
<
If not supplied, then |
.key |
The name of the resulting nested column. Only applicable when
If |
.names_sep |
If |
Details
If neither ... nor .by are supplied, nest() will nest all variables, and will use the column name supplied through .key.
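A minimal sketch of this default on made-up data, assuming the tidyr version documented here (1.3.1): with no columns and no .by, everything ends up in a single list-column, named data unless .key says otherwise.

```r
library(tidyr)

# Hypothetical data
df <- tibble::tibble(x = 1:3, y = 4:6)

# Neither `...` nor `.by`: all columns are nested into one row
all_nested <- nest(df)
names(all_nested)
dim(all_nested$data[[1]])

# `.key` sets the name of the resulting list-column
renamed <- nest(df, .key = "everything")
names(renamed)
```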
New syntax
tidyr 1.0.0 introduced a new syntax for nest() and unnest() that's designed to be more similar to other functions. Converting to the new syntax should be straightforward (guided by the message you'll receive) but if you just need to run an old analysis, you can easily revert to the previous behaviour using nest_legacy() and unnest_legacy() as follows:
library(tidyr)
nest <- nest_legacy
unnest <- unnest_legacy
Grouped data frames
df %>% nest(data = c(x, y)) specifies the columns to be nested; i.e. the columns that will appear in the inner data frame. df %>% nest(.by = c(x, y)) specifies the columns to nest by; i.e. the columns that will remain in the outer data frame. An alternative way to achieve the latter is to nest() a grouped data frame created by dplyr::group_by(). The grouping variables remain in the outer data frame and the others are nested. The result preserves the grouping of the input.
Variables supplied to nest() will override grouping variables so that df %>% group_by(x, y) %>% nest(data = !z) will be equivalent to df %>% nest(data = !z).
You can't supply .by with a grouped data frame, as the groups already represent what you are nesting by.
Examples
df <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1)
# Specify variables to nest using name-variable pairs.
# Note that we get one row of output for each unique combination of
# non-nested variables.
df %>% nest(data = c(y, z))
# Specify variables to nest by (rather than variables to nest) using `.by`
df %>% nest(.by = x)
# In this case, since `...` isn't used you can specify the resulting column
# name with `.key`
df %>% nest(.by = x, .key = "cols")
# Use tidyselect syntax and helpers, just like in `dplyr::select()`
df %>% nest(data = any_of(c("y", "z")))
# `...` and `.by` can be used together to drop columns you no longer need,
# or to include the columns you are nesting by in the inner data frame too.
# This drops `z`:
df %>% nest(data = y, .by = x)
# This includes `x` in the inner data frame:
df %>% nest(data = everything(), .by = x)
# Multiple nesting structures can be specified at once
iris %>%
nest(petal = starts_with("Petal"), sepal = starts_with("Sepal"))
iris %>%
nest(width = contains("Width"), length = contains("Length"))
# Nesting a grouped data frame nests all variables apart from the group vars
fish_encounters %>%
dplyr::group_by(fish) %>%
nest()
# That is similar to `nest(.by = )`, except here the result isn't grouped
fish_encounters %>%
nest(.by = fish)
# Nesting is often useful for creating per group models
mtcars %>%
nest(.by = cyl) %>%
dplyr::mutate(models = lapply(data, function(df) lm(mpg ~ wt, data = df)))
Legacy versions of nest() and unnest()
Description
tidyr 1.0.0 introduced a new syntax for nest() and unnest(). The majority of existing usage should be automatically translated to the new syntax with a warning. However, if you need to quickly roll back to the previous behaviour, these functions provide the previous interface. To make old code work as is, add the following code to the top of your script:
library(tidyr)
nest <- nest_legacy
unnest <- unnest_legacy
Usage
nest_legacy(data, ..., .key = "data")
unnest_legacy(data, ..., .drop = NA, .id = NULL, .sep = NULL, .preserve = NULL)
Arguments
data |
A data frame. |
... |
Specification of columns to unnest. Use bare variable names or functions of variables. If omitted, defaults to all list-cols. |
.key |
The name of the new column, as a string or symbol. This argument
is passed by expression and supports
quasiquotation (you can unquote strings and
symbols). The name is captured from the expression with |
.drop |
Should additional list columns be dropped? By default,
|
.id |
Data frame identifier - if supplied, will create a new column with
name |
.sep |
If non- |
.preserve |
Optionally, list-columns to preserve in the output. These
will be duplicated in the same way as atomic vectors. This has
|
Examples
# Nest and unnest are inverses
df <- tibble(x = c(1, 1, 2), y = 3:1)
df %>% nest_legacy(y)
df %>% nest_legacy(y) %>% unnest_legacy()
# nesting -------------------------------------------------------------------
as_tibble(iris) %>% nest_legacy(!Species)
as_tibble(chickwts) %>% nest_legacy(weight)
# unnesting -----------------------------------------------------------------
df <- tibble(
x = 1:2,
y = list(
tibble(z = 1),
tibble(z = 3:4)
)
)
df %>% unnest_legacy(y)
# You can also unnest multiple columns simultaneously
df <- tibble(
a = list(c("a", "b"), "c"),
b = list(1:2, 3),
c = c(11, 22)
)
df %>% unnest_legacy(a, b)
# If you omit the column names, it'll unnest all list-cols
df %>% unnest_legacy()
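The difference between the two interfaces can be sketched side by side. This is a minimal illustration that assumes tidyr is attached; the object names (old, new, old_flat, new_flat) are made up for the example.

```r
library(tidyr)

df <- tibble(x = c(1, 1, 2), y = 3:1)

# Legacy interface: list the columns to nest; the list-column name
# comes from `.key`, which defaults to "data"
old <- df %>% nest_legacy(y)

# Current interface: name the new list-column and select its contents
new <- df %>% nest(data = y)

# Unnesting either result recovers the original rows
old_flat <- old %>% unnest_legacy()
new_flat <- new %>% unnest(data)
```

Both calls produce one row per value of x, with the remaining column stored in a list-column called data.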
Pack and unpack
Description
Packing and unpacking preserve the length of a data frame, changing its
width. pack() makes df narrow by collapsing a set of columns into a
single df-column. unpack() makes data wider by expanding df-columns
back out into individual columns.
Usage
pack(.data, ..., .names_sep = NULL, .error_call = current_env())
unpack(
data,
cols,
...,
names_sep = NULL,
names_repair = "check_unique",
error_call = current_env()
)
Arguments
... |
For For |
data , .data |
A data frame. |
cols |
< |
names_sep , .names_sep |
If If a string, the inner and outer names will be used together. In
|
names_repair |
Used to check that output data frame has valid names. Must be one of the following options:
See |
error_call , .error_call |
The execution environment of a currently
running function, e.g. |
Details
Generally, unpacking is more useful than packing because it simplifies a complex data structure. Currently, few functions work with df-cols, and they are mostly a curiosity, but seem worth exploring further because they mimic the nested column headers that are so popular in Excel.
Examples
# Packing -------------------------------------------------------------------
# It's not currently clear why you would ever want to pack columns
# since few functions work with this sort of data.
df <- tibble(x1 = 1:3, x2 = 4:6, x3 = 7:9, y = 1:3)
df
df %>% pack(x = starts_with("x"))
df %>% pack(x = c(x1, x2, x3), y = y)
# .names_sep allows you to strip off common prefixes; this
# acts as a natural inverse to names_sep in unpack()
iris %>%
as_tibble() %>%
pack(
Sepal = starts_with("Sepal"),
Petal = starts_with("Petal"),
.names_sep = "."
)
# Unpacking -----------------------------------------------------------------
df <- tibble(
x = 1:3,
y = tibble(a = 1:3, b = 3:1),
z = tibble(X = c("a", "b", "c"), Y = runif(3), Z = c(TRUE, FALSE, NA))
)
df
df %>% unpack(y)
df %>% unpack(c(y, z))
df %>% unpack(c(y, z), names_sep = "_")
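As a rough sketch of how .names_sep in pack() and names_sep in unpack() mirror each other, the following round trip packs prefixed columns and then restores them. The data frame and its column names (id, x_a, x_b) are invented for illustration.

```r
library(tidyr)

df <- tibble(id = 1:3, x_a = 4:6, x_b = 7:9)

# Pack the `x_*` columns into one df-column `x`; `.names_sep = "_"`
# strips the shared "x_" prefix from the inner column names
packed <- df %>% pack(x = starts_with("x_"), .names_sep = "_")

# Unpack again; `names_sep = "_"` pastes the outer and inner names together
unpacked <- packed %>% unpack(x, names_sep = "_")
```

The number of rows is unchanged at every step; only the width and the column hierarchy differ.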
Pivot data from wide to long
Description
pivot_longer() "lengthens" data, increasing the number of rows and
decreasing the number of columns. The inverse transformation is
pivot_wider().
Learn more in vignette("pivot").
Usage
pivot_longer(
data,
cols,
...,
cols_vary = "fastest",
names_to = "name",
names_prefix = NULL,
names_sep = NULL,
names_pattern = NULL,
names_ptypes = NULL,
names_transform = NULL,
names_repair = "check_unique",
values_to = "value",
values_drop_na = FALSE,
values_ptypes = NULL,
values_transform = NULL
)
Arguments
data |
A data frame to pivot. |
cols |
< |
... |
Additional arguments passed on to methods. |
cols_vary |
When pivoting
|
names_to |
A character vector specifying the new column or columns to
create from the information stored in the column names of
|
names_prefix |
A regular expression used to remove matching text from the start of each variable name. |
names_sep , names_pattern |
If
If these arguments do not give you enough control, use
|
names_ptypes , values_ptypes |
Optionally, a list of column name-prototype
pairs. Alternatively, a single empty prototype can be supplied, which will
be applied to all columns. A prototype (or ptype for short) is a
zero-length vector (like |
names_transform , values_transform |
Optionally, a list of column
name-function pairs. Alternatively, a single function can be supplied,
which will be applied to all columns. Use these arguments if you need to
change the types of specific columns. For example, If not specified, the type of the columns generated from |
names_repair |
What happens if the output has invalid column names?
The default, |
values_to |
A string specifying the name of the column to create
from the data stored in cell values. If |
values_drop_na |
If |
Details
pivot_longer() is an updated approach to gather(), designed to be both
simpler to use and to handle more use cases. We recommend you use
pivot_longer() for new code; gather() isn't going away but is no longer
under active development.
Examples
# See vignette("pivot") for examples and explanation
# Simplest case where column names are character data
relig_income
relig_income %>%
pivot_longer(!religion, names_to = "income", values_to = "count")
# Slightly more complex case where columns have common prefix,
# and missing values are structural so should be dropped.
billboard
billboard %>%
pivot_longer(
cols = starts_with("wk"),
names_to = "week",
names_prefix = "wk",
values_to = "rank",
values_drop_na = TRUE
)
# Multiple variables stored in column names
who %>% pivot_longer(
cols = new_sp_m014:newrel_f65,
names_to = c("diagnosis", "gender", "age"),
names_pattern = "new_?(.*)_(.)(.*)",
values_to = "count"
)
# Multiple observations per row. Since all columns are used in the pivoting
# process, we'll use `cols_vary` to keep values from the original columns
# close together in the output.
anscombe
anscombe %>%
pivot_longer(
everything(),
cols_vary = "slowest",
names_to = c(".value", "set"),
names_pattern = "(.)(.)"
)
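The names_transform argument can be illustrated with a small, made-up table (ranks is not one of the package datasets). The sketch below strips the "wk" prefix and then converts the remaining text to integers, so the new week column is numeric rather than character.

```r
library(tidyr)

ranks <- tibble(
  track = c("A", "B"),
  wk1 = c(10, 20),
  wk2 = c(8, NA)
)

long <- ranks %>%
  pivot_longer(
    cols = starts_with("wk"),
    names_to = "week",
    names_prefix = "wk",
    # Convert the stripped names ("1", "2") to integers
    names_transform = list(week = as.integer),
    values_to = "rank",
    # Drop the row created by the missing wk2 value for track B
    values_drop_na = TRUE
  )
```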
Pivot data from wide to long using a spec
Description
This is a low level interface to pivoting, inspired by the cdata package, that allows you to describe pivoting with a data frame.
Usage
pivot_longer_spec(
data,
spec,
...,
cols_vary = "fastest",
names_repair = "check_unique",
values_drop_na = FALSE,
values_ptypes = NULL,
values_transform = NULL,
error_call = current_env()
)
build_longer_spec(
data,
cols,
...,
names_to = "name",
values_to = "value",
names_prefix = NULL,
names_sep = NULL,
names_pattern = NULL,
names_ptypes = NULL,
names_transform = NULL,
error_call = current_env()
)
Arguments
data |
A data frame to pivot. |
spec |
A specification data frame. This is useful for more complex pivots because it gives you greater control on how metadata stored in the column names turns into columns in the result. Must be a data frame containing character |
... |
These dots are for future extensions and must be empty. |
cols_vary |
When pivoting
|
names_repair |
What happens if the output has invalid column names?
The default, |
values_drop_na |
If |
error_call |
The execution environment of a currently
running function, e.g. |
cols |
< |
names_to |
A character vector specifying the new column or columns to
create from the information stored in the column names of
|
values_to |
A string specifying the name of the column to create
from the data stored in cell values. If |
names_prefix |
A regular expression used to remove matching text from the start of each variable name. |
names_sep , names_pattern |
If
If these arguments do not give you enough control, use
|
names_ptypes , values_ptypes |
Optionally, a list of column name-prototype
pairs. Alternatively, a single empty prototype can be supplied, which will
be applied to all columns. A prototype (or ptype for short) is a
zero-length vector (like |
names_transform , values_transform |
Optionally, a list of column
name-function pairs. Alternatively, a single function can be supplied,
which will be applied to all columns. Use these arguments if you need to
change the types of specific columns. For example, If not specified, the type of the columns generated from |
Examples
# See vignette("pivot") for examples and explanation
# Use `build_longer_spec()` to build `spec` using similar syntax to `pivot_longer()`
# and run `pivot_longer_spec()` based on `spec`.
spec <- relig_income %>% build_longer_spec(
cols = !religion,
names_to = "income",
values_to = "count"
)
spec
pivot_longer_spec(relig_income, spec)
# Is equivalent to:
relig_income %>% pivot_longer(
cols = !religion,
names_to = "income",
values_to = "count"
)
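A spec does not have to come from build_longer_spec(). As a hedged sketch, the following writes one by hand for an invented data frame (wide), using the .name and .value columns plus one extra column (year) that becomes a variable in the result.

```r
library(tidyr)

wide <- tibble(id = 1:2, x_2019 = c(1, 2), x_2020 = c(3, 4))

# One row per input column: `.name` is the existing column name,
# `.value` is the output column that receives its cell values,
# and `year` is metadata recorded alongside each value
spec <- tibble(
  .name = c("x_2019", "x_2020"),
  .value = "x",
  year = c(2019L, 2020L)
)

long <- pivot_longer_spec(wide, spec)
```

Because year is supplied directly as an integer column, no names_transform step is needed.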
Pivot data from long to wide
Description
pivot_wider() "widens" data, increasing the number of columns and
decreasing the number of rows. The inverse transformation is
pivot_longer().
Learn more in vignette("pivot").
Usage
pivot_wider(
data,
...,
id_cols = NULL,
id_expand = FALSE,
names_from = name,
names_prefix = "",
names_sep = "_",
names_glue = NULL,
names_sort = FALSE,
names_vary = "fastest",
names_expand = FALSE,
names_repair = "check_unique",
values_from = value,
values_fill = NULL,
values_fn = NULL,
unused_fn = NULL
)
Arguments
data |
A data frame to pivot. |
... |
Additional arguments passed on to methods. |
id_cols |
< Defaults to all columns in |
id_expand |
Should the values in the |
names_from , values_from |
< If |
names_prefix |
String added to the start of every variable name. This is
particularly useful if |
names_sep |
If |
names_glue |
Instead of |
names_sort |
Should the column names be sorted? If |
names_vary |
When
|
names_expand |
Should the values in the |
names_repair |
What happens if the output has invalid column names?
The default, |
values_fill |
Optionally, a (scalar) value that specifies what each
This can be a named list if you want to apply different fill values to different value columns. |
values_fn |
Optionally, a function applied to the value in each cell
in the output. You will typically use this when the combination of
This can be a named list if you want to apply different aggregations
to different |
unused_fn |
Optionally, a function applied to summarize the values from
the unused columns (i.e. columns not identified by The default drops all unused columns from the result. This can be a named list if you want to apply different aggregations to different unused columns.
This is similar to grouping by the |
Details
pivot_wider() is an updated approach to spread(), designed to be both
simpler to use and to handle more use cases. We recommend you use
pivot_wider() for new code; spread() isn't going away but is no longer
under active development.
See Also
pivot_wider_spec() to pivot "by hand" with a data frame that
defines a pivoting specification.
Examples
# See vignette("pivot") for examples and explanation
fish_encounters
fish_encounters %>%
pivot_wider(names_from = station, values_from = seen)
# Fill in missing values
fish_encounters %>%
pivot_wider(names_from = station, values_from = seen, values_fill = 0)
# Generate column names from multiple variables
us_rent_income
us_rent_income %>%
pivot_wider(
names_from = variable,
values_from = c(estimate, moe)
)
# You can control whether `names_from` values vary fastest or slowest
# relative to the `values_from` column names using `names_vary`.
us_rent_income %>%
pivot_wider(
names_from = variable,
values_from = c(estimate, moe),
names_vary = "slowest"
)
# When there are multiple `names_from` or `values_from`, you can
# use `names_sep` or `names_glue` to control the output variable names
us_rent_income %>%
pivot_wider(
names_from = variable,
names_sep = ".",
values_from = c(estimate, moe)
)
us_rent_income %>%
pivot_wider(
names_from = variable,
names_glue = "{variable}_{.value}",
values_from = c(estimate, moe)
)
# Can perform aggregation with `values_fn`
warpbreaks <- as_tibble(warpbreaks[c("wool", "tension", "breaks")])
warpbreaks
warpbreaks %>%
pivot_wider(
names_from = wool,
values_from = breaks,
values_fn = mean
)
# Can pass an anonymous function to `values_fn` when you
# need to supply additional arguments
warpbreaks$breaks[1] <- NA
warpbreaks %>%
pivot_wider(
names_from = wool,
values_from = breaks,
values_fn = ~ mean(.x, na.rm = TRUE)
)
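The examples above do not exercise unused_fn. A minimal sketch on invented data (readings) shows how a column that is neither an id column nor used in names_from or values_from can be summarised instead of dropped.

```r
library(tidyr)

readings <- tibble(
  id = c(1, 1, 2, 2),
  when = c(10, 20, 30, 40),
  key = c("a", "b", "a", "b"),
  value = c(0.1, 0.2, 0.3, 0.4)
)

wide <- readings %>%
  pivot_wider(
    id_cols = id,
    names_from = key,
    values_from = value,
    # `when` is unused by the pivot; keep its maximum per `id`
    unused_fn = list(when = max)
  )
```

Without unused_fn, the when column would be dropped from the result because id_cols is restricted to id.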
Pivot data from long to wide using a spec
Description
This is a low level interface to pivoting, inspired by the cdata package, that allows you to describe pivoting with a data frame.
Usage
pivot_wider_spec(
data,
spec,
...,
names_repair = "check_unique",
id_cols = NULL,
id_expand = FALSE,
values_fill = NULL,
values_fn = NULL,
unused_fn = NULL,
error_call = current_env()
)
build_wider_spec(
data,
...,
names_from = name,
values_from = value,
names_prefix = "",
names_sep = "_",
names_glue = NULL,
names_sort = FALSE,
names_vary = "fastest",
names_expand = FALSE,
error_call = current_env()
)
Arguments
data |
A data frame to pivot. |
spec |
A specification data frame. This is useful for more complex pivots because it gives you greater control on how metadata stored in the columns become column names in the result. Must be a data frame containing character |
... |
These dots are for future extensions and must be empty. |
names_repair |
What happens if the output has invalid column names?
The default, |
id_cols |
< |
id_expand |
Should the values in the |
values_fill |
Optionally, a (scalar) value that specifies what each
This can be a named list if you want to apply different fill values to different value columns. |
values_fn |
Optionally, a function applied to the value in each cell
in the output. You will typically use this when the combination of
This can be a named list if you want to apply different aggregations
to different |
unused_fn |
Optionally, a function applied to summarize the values from
the unused columns (i.e. columns not identified by The default drops all unused columns from the result. This can be a named list if you want to apply different aggregations to different unused columns.
This is similar to grouping by the |
error_call |
The execution environment of a currently
running function, e.g. |
names_from , values_from |
< If |
names_prefix |
String added to the start of every variable name. This is
particularly useful if |
names_sep |
If |
names_glue |
Instead of |
names_sort |
Should the column names be sorted? If |
names_vary |
When
|
names_expand |
Should the values in the |
Examples
# See vignette("pivot") for examples and explanation
us_rent_income
spec1 <- us_rent_income %>%
build_wider_spec(names_from = variable, values_from = c(estimate, moe))
spec1
us_rent_income %>%
pivot_wider_spec(spec1)
# Is equivalent to
us_rent_income %>%
pivot_wider(names_from = variable, values_from = c(estimate, moe))
# `pivot_wider_spec()` provides more control over column names and output format
# instead of creating columns with estimate_ and moe_ prefixes,
# keep original variable name for estimates and attach _moe as suffix
spec2 <- tibble(
.name = c("income", "rent", "income_moe", "rent_moe"),
.value = c("estimate", "estimate", "moe", "moe"),
variable = c("income", "rent", "income", "rent")
)
us_rent_income %>%
pivot_wider_spec(spec2)
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- tibble
- tidyselect: all_of, any_of, contains, ends_with, everything, last_col, matches, num_range, one_of, starts_with
Pew religion and income survey
Description
Pew religion and income survey
Usage
relig_income
Format
A dataset with variables:
- religion: Name of religion
- <$10k-Don't know/refused: Number of respondees with income range in column name
Source
Downloaded from https://www.pewresearch.org/religion/religious-landscape-study/ (downloaded November 2009)
Replace NAs with specified values
Description
Replace NAs with specified values
Usage
replace_na(data, replace, ...)
Arguments
data |
A data frame or vector. |
replace |
If If |
... |
Additional arguments for methods. Currently unused. |
Value
replace_na() returns an object with the same type as data.
See Also
dplyr::na_if() to replace specified values with NAs;
dplyr::coalesce() to replace NAs with values from other vectors.
Examples
# Replace NAs in a data frame
df <- tibble(x = c(1, 2, NA), y = c("a", NA, "b"))
df %>% replace_na(list(x = 0, y = "unknown"))
# Replace NAs in a vector
df %>% dplyr::mutate(x = replace_na(x, 0))
# OR
df$x %>% replace_na(0)
df$y %>% replace_na("unknown")
# Replace NULLs in a list: NULLs are the list-col equivalent of NAs
df_list <- tibble(z = list(1:5, NULL, 10:20))
df_list %>% replace_na(list(z = list(5)))
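One detail worth seeing in isolation is that, for a data frame, only the columns named in replace are touched. The sketch below uses an invented three-column tibble in which z is deliberately left out of the list.

```r
library(tidyr)

df <- tibble(x = c(1, NA), y = c(NA, "b"), z = c(NA, TRUE))

# Only `x` and `y` appear in `replace`, so the NA in `z` is kept
filled <- df %>% replace_na(list(x = 0, y = "unknown"))
```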
Separate a character column into multiple columns with a regular expression or numeric locations
Description
separate() has been superseded in favour of separate_wider_position()
and separate_wider_delim() because the two functions make the two uses
more obvious, the API is more polished, and the handling of problems is
better. Superseded functions will not go away, but will only receive
critical bug fixes.
Given either a regular expression or a vector of character positions,
separate() turns a single character column into multiple columns.
Usage
separate(
data,
col,
into,
sep = "[^[:alnum:]]+",
remove = TRUE,
convert = FALSE,
extra = "warn",
fill = "warn",
...
)
Arguments
data |
A data frame. |
col |
< |
into |
Names of new variables to create as character vector.
Use |
sep |
Separator between columns. If character, If numeric, |
remove |
If |
convert |
If NB: this will cause string |
extra |
If
|
fill |
If
|
... |
Additional arguments passed on to methods. |
See Also
unite(), the complement; extract(), which uses regular
expression capturing groups.
Examples
# If you want to split by any non-alphanumeric value (the default):
df <- tibble(x = c(NA, "x.y", "x.z", "y.z"))
df %>% separate(x, c("A", "B"))
# If you just want the second variable:
df %>% separate(x, c(NA, "B"))
# We now recommend separate_wider_delim() instead:
df %>% separate_wider_delim(x, ".", names = c("A", "B"))
df %>% separate_wider_delim(x, ".", names = c(NA, "B"))
# Controlling uneven splits -------------------------------------------------
# If every row doesn't split into the same number of pieces, use
# the extra and fill arguments to control what happens:
df <- tibble(x = c("x", "x y", "x y z", NA))
df %>% separate(x, c("a", "b"))
# The same behaviour as previous, but drops the c without warnings:
df %>% separate(x, c("a", "b"), extra = "drop", fill = "right")
# Opposite of previous, keeping the c and filling left:
df %>% separate(x, c("a", "b"), extra = "merge", fill = "left")
# Or you can keep all three:
df %>% separate(x, c("a", "b", "c"))
# To only split a specified number of times use extra = "merge":
df <- tibble(x = c("x: 123", "y: error: 7"))
df %>% separate(x, c("key", "value"), ": ", extra = "merge")
# Controlling column types --------------------------------------------------
# convert = TRUE detects column classes:
df <- tibble(x = c("x:1", "x:2", "y:4", "z", NA))
df %>% separate(x, c("key", "value"), ":") %>% str()
df %>% separate(x, c("key", "value"), ":", convert = TRUE) %>% str()
Split a string into rows
Description
Each of these functions takes a string and splits it into multiple rows:
- separate_longer_delim() splits by a delimiter.
- separate_longer_position() splits by a fixed width.
Usage
separate_longer_delim(data, cols, delim, ...)
separate_longer_position(data, cols, width, ..., keep_empty = FALSE)
Arguments
data |
A data frame. |
cols |
< |
delim |
For |
... |
These dots are for future extensions and must be empty. |
width |
For |
keep_empty |
By default, you'll get |
Value
A data frame based on data. It has the same columns, but different
rows.
Examples
df <- tibble(id = 1:4, x = c("x", "x y", "x y z", NA))
df %>% separate_longer_delim(x, delim = " ")
# You can separate multiple columns at once if they have the same structure
df <- tibble(id = 1:3, x = c("x", "x y", "x y z"), y = c("a", "a b", "a b c"))
df %>% separate_longer_delim(c(x, y), delim = " ")
# Or instead split by a fixed length
df <- tibble(id = 1:3, x = c("ab", "def", ""))
df %>% separate_longer_position(x, 1)
df %>% separate_longer_position(x, 2)
df %>% separate_longer_position(x, 2, keep_empty = TRUE)
Separate a collapsed column into multiple rows
Description
separate_rows() has been superseded in favour of separate_longer_delim()
because it has a more consistent API with other separate functions.
Superseded functions will not go away, but will only receive critical bug
fixes.
If a variable contains observations with multiple delimited values,
separate_rows() separates the values and places each one in its own row.
Usage
separate_rows(data, ..., sep = "[^[:alnum:].]+", convert = FALSE)
Arguments
data |
A data frame. |
... |
< |
sep |
Separator delimiting collapsed values. |
convert |
If |
Examples
df <- tibble(
x = 1:3,
y = c("a", "d,e,f", "g,h"),
z = c("1", "2,3,4", "5,6")
)
separate_rows(df, y, z, convert = TRUE)
# Now recommended
df %>%
separate_longer_delim(c(y, z), delim = ",")
Split a string into columns
Description
Each of these functions takes a string column and splits it into multiple new columns:
- separate_wider_delim() splits by delimiter.
- separate_wider_position() splits at fixed widths.
- separate_wider_regex() splits with regular expression matches.
These functions are equivalent to separate() and extract(), but use
stringr as the underlying string manipulation engine, and their
interfaces reflect what we've learned from unnest_wider() and
unnest_longer().
Usage
separate_wider_delim(
data,
cols,
delim,
...,
names = NULL,
names_sep = NULL,
names_repair = "check_unique",
too_few = c("error", "debug", "align_start", "align_end"),
too_many = c("error", "debug", "drop", "merge"),
cols_remove = TRUE
)
separate_wider_position(
data,
cols,
widths,
...,
names_sep = NULL,
names_repair = "check_unique",
too_few = c("error", "debug", "align_start"),
too_many = c("error", "debug", "drop"),
cols_remove = TRUE
)
separate_wider_regex(
data,
cols,
patterns,
...,
names_sep = NULL,
names_repair = "check_unique",
too_few = c("error", "debug", "align_start"),
cols_remove = TRUE
)
Arguments
data |
A data frame. |
cols |
< |
delim |
For |
... |
These dots are for future extensions and must be empty. |
names |
For |
names_sep |
If supplied, output names will be composed
of the input column name followed by the separator followed by the
new column name. Required when For |
names_repair |
Used to check that output data frame has valid names. Must be one of the following options:
See |
too_few |
What should happen if a value separates into too few pieces?
|
too_many |
What should happen if a value separates into too many pieces?
|
cols_remove |
Should the input |
widths |
A named numeric vector where the names become column names, and the values specify the column width. Unnamed components will match, but not be included in the output. |
patterns |
A named character vector where the names become column names and the values are regular expressions that match the contents of the vector. Unnamed components will match, but not be included in the output. |
Value
A data frame based on data. It has the same rows, but different
columns:
- The primary purpose of the functions is to create new columns from components of the string. For separate_wider_delim() the names of new columns come from names. For separate_wider_position() the names come from the names of widths. For separate_wider_regex() the names come from the names of patterns.
- If too_few or too_many is "debug", the output will contain additional columns useful for debugging:
  - {col}_ok: a logical vector which tells you if the input was ok or not. Use to quickly find the problematic rows.
  - {col}_remainder: any text remaining after separation.
  - {col}_pieces, {col}_width, {col}_matches: number of pieces, number of characters, and number of matches for separate_wider_delim(), separate_wider_position() and separate_wider_regex() respectively.
- If cols_remove = TRUE (the default), the input cols will be removed from the output.
Examples
df <- tibble(id = 1:3, x = c("m-123", "f-455", "f-123"))
# There are three basic ways to split up a string into pieces:
# 1. with a delimiter
df %>% separate_wider_delim(x, delim = "-", names = c("gender", "unit"))
# 2. by length
df %>% separate_wider_position(x, c(gender = 1, 1, unit = 3))
# 3. defining each component with a regular expression
df %>% separate_wider_regex(x, c(gender = ".", ".", unit = "\\d+"))
# Sometimes you split on the "last" delimiter
df <- tibble(var = c("race_1", "race_2", "age_bucket_1", "age_bucket_2"))
# _delim won't help because it always splits on the first delimiter
try(df %>% separate_wider_delim(var, "_", names = c("var1", "var2")))
df %>% separate_wider_delim(var, "_", names = c("var1", "var2"), too_many = "merge")
# Instead, you can use _regex
df %>% separate_wider_regex(var, c(var1 = ".*", "_", var2 = ".*"))
# this works because * is greedy; you can mimic the _delim behaviour with .*?
df %>% separate_wider_regex(var, c(var1 = ".*?", "_", var2 = ".*"))
# If the number of components varies, it's most natural to split into rows
df <- tibble(id = 1:4, x = c("x", "x y", "x y z", NA))
df %>% separate_longer_delim(x, delim = " ")
# But separate_wider_delim() provides some tools to deal with the problem
# The default behaviour tells you that there's a problem
try(df %>% separate_wider_delim(x, delim = " ", names = c("a", "b")))
# You can get additional insight by using the debug options
df %>%
separate_wider_delim(
x,
delim = " ",
names = c("a", "b"),
too_few = "debug",
too_many = "debug"
)
# But you can suppress the warnings
df %>%
separate_wider_delim(
x,
delim = " ",
names = c("a", "b"),
too_few = "align_start",
too_many = "merge"
)
# Or choose to automatically name the columns, producing as many as needed
df %>% separate_wider_delim(x, delim = " ", names_sep = "", too_few = "align_start")
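The debug columns described in the Value section are mainly useful for filtering. As a small sketch, the following keeps only the rows that did not split cleanly, using the {col}_ok column (here x_ok); the object names debug and problems are illustrative.

```r
library(tidyr)

df <- tibble(id = 1:4, x = c("x", "x y", "x y z", NA))

# Debug mode keeps every row and adds x_ok, x_pieces and x_remainder;
# it also emits a warning to signal that debug mode is active
debug <- df %>%
  separate_wider_delim(
    x,
    delim = " ",
    names = c("a", "b"),
    too_few = "debug",
    too_many = "debug"
  )

# Rows whose input did not produce exactly two pieces
problems <- debug[!debug$x_ok, ]
```

Once the problematic rows are understood, the call can be rerun with too_few and too_many set to one of the non-debug options.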
Some data about the Smith family
Description
A small demo dataset describing John and Mary Smith.
Usage
smiths
Format
A data frame with 2 rows and 5 columns.
Spread a key-value pair across multiple columns
Description
Development on spread() is complete, and for new code we recommend
switching to pivot_wider(), which is easier to use, more featureful, and
still under active development.
df %>% spread(key, value) is equivalent to
df %>% pivot_wider(names_from = key, values_from = value).
See more details in vignette("pivot").
Usage
spread(data, key, value, fill = NA, convert = FALSE, drop = TRUE, sep = NULL)
Arguments
data |
A data frame. |
key , value |
< |
fill |
If set, missing values will be replaced with this value. Note
that there are two types of missingness in the input: explicit missing
values (i.e. |
convert |
If |
drop |
If |
sep |
If |
Examples
stocks <- tibble(
time = as.Date("2009-01-01") + 0:9,
X = rnorm(10, 0, 1),
Y = rnorm(10, 0, 2),
Z = rnorm(10, 0, 4)
)
stocksm <- stocks %>% gather(stock, price, -time)
stocksm %>% spread(stock, price)
stocksm %>% spread(time, price)
# Spread and gather are complements
df <- tibble(x = c("a", "b"), y = c(3, 4), z = c(5, 6))
df %>%
spread(x, y) %>%
gather("x", "y", a:b, na.rm = TRUE)
# Use 'convert = TRUE' to produce variables of mixed type
df <- tibble(
row = rep(c(1, 51), each = 3),
var = rep(c("Sepal.Length", "Species", "Species_num"), 2),
value = c(5.1, "setosa", 1, 7.0, "versicolor", 2)
)
df %>% spread(var, value) %>% str()
df %>% spread(var, value, convert = TRUE) %>% str()
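The equivalence stated in the Description can be checked on a small example. This sketch uses an invented long table (stocks_long) with fixed values rather than random ones, so the two results are easy to compare.

```r
library(tidyr)

stocks_long <- tibble(
  time = rep(1:2, each = 2),
  stock = rep(c("X", "Y"), times = 2),
  price = c(1.5, 2.5, 3.5, 4.5)
)

# Superseded interface
via_spread <- stocks_long %>% spread(stock, price)

# Recommended replacement
via_pivot <- stocks_long %>%
  pivot_wider(names_from = stock, values_from = price)
```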
Example tabular representations
Description
Data sets that demonstrate multiple ways to layout the same tabular data.
Usage
table1
table2
table3
table4a
table4b
table5
Details
table1, table2, table3, table4a, table4b,
and table5 all display the number of TB cases documented by the World
Health Organization in Afghanistan, Brazil, and China between 1999 and 2000.
The data contains values associated with four variables (country, year,
cases, and population), but each table organizes the values in a different
layout.
The data is a subset of the data contained in the World Health Organization Global Tuberculosis Report.
Source
https://www.who.int/teams/global-tuberculosis-programme/data
Argument type: data-masking
Description
This page describes the <data-masking> argument modifier which
indicates that the argument uses data masking, a sub-type of
tidy evaluation. If you've never heard of tidy evaluation before,
start with the practical introduction in
https://r4ds.hadley.nz/functions.html#data-frame-functions
then read more about the underlying theory in
https://rlang.r-lib.org/reference/topic-data-mask.html.
Key techniques
- To allow the user to supply the column name in a function argument, embrace the argument, e.g. filter(df, {{ var }}).

    dist_summary <- function(df, var) {
      df %>%
        summarise(n = n(), min = min({{ var }}), max = max({{ var }}))
    }
    mtcars %>% dist_summary(mpg)
    mtcars %>% group_by(cyl) %>% dist_summary(mpg)

- To work with a column name recorded as a string, use the .data pronoun, e.g. summarise(df, mean = mean(.data[[var]])).

    for (var in names(mtcars)) {
      mtcars %>% count(.data[[var]]) %>% print()
    }

    lapply(names(mtcars), function(var) mtcars %>% count(.data[[var]]))

- To suppress R CMD check NOTEs about unknown variables use .data$var instead of var:

    # has NOTE
    df %>% mutate(z = x + y)

    # no NOTE
    df %>% mutate(z = .data$x + .data$y)

  You'll also need to import .data from rlang with (e.g.) @importFrom rlang .data.
Dot-dot-dot (...)
... automatically provides indirection, so you can use it as is
(i.e. without embracing) inside a function:

    grouped_mean <- function(df, var, ...) {
      df %>%
        group_by(...) %>%
        summarise(mean = mean({{ var }}))
    }

You can also use := instead of = to enable a glue-like syntax for
creating variables from user supplied data:

    var_name <- "l100km"
    mtcars %>% mutate("{var_name}" := 235 / mpg)

    summarise_mean <- function(df, var) {
      df %>%
        summarise("mean_of_{{var}}" := mean({{ var }}))
    }
    mtcars %>% group_by(cyl) %>% summarise_mean(mpg)
Learn more in https://rlang.r-lib.org/reference/topic-data-mask-programming.html.
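The same techniques apply to tidyr functions whose arguments are evaluated in the context of the data, such as the weights argument of uncount(). The wrapper names below (repeat_rows, repeat_rows_by_name) are hypothetical and exist only to illustrate embracing and the .data pronoun.

```r
library(tidyr)

# Embrace the argument so callers can pass a bare column name
repeat_rows <- function(df, times) {
  uncount(df, {{ times }})
}

# Use the .data pronoun when the column name arrives as a string
repeat_rows_by_name <- function(df, times_col) {
  uncount(df, .data[[times_col]])
}

df <- tibble(x = c("a", "b"), n = c(1, 2))

by_symbol <- repeat_rows(df, n)
by_string <- repeat_rows_by_name(df, "n")
```

Both calls duplicate the second row once, giving three rows in total.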
Legacy name repair
Description
Ensures all column names are unique using the approach found in
tidyr 0.8.3 and earlier. Only use this function if you want to preserve
the naming strategy, otherwise you're better off adopting the new
tidyverse standard with name_repair = "universal".
Usage
tidyr_legacy(nms, prefix = "V", sep = "")
Arguments
nms |
Character vector of names |
prefix |
Prefix to use for unnamed column |
sep |
Separator to use between name and unique suffix |
Examples
df <- tibble(x = 1:2, y = list(tibble(x = 3:5), tibble(x = 4:7)))
# Doesn't work because it would produce a data frame with two
# columns called x
## Not run:
unnest(df, y)
## End(Not run)
# The new tidyverse standard:
unnest(df, y, names_repair = "universal")
# The old tidyr approach
unnest(df, y, names_repair = tidyr_legacy)
Argument type: tidy-select
Description
This page describes the <tidy-select> argument modifier which
indicates that the argument uses tidy selection, a sub-type of
tidy evaluation. If you've never heard of tidy evaluation before,
start with the practical introduction in
https://r4ds.hadley.nz/functions.html#data-frame-functions
then read more about the underlying theory in
https://rlang.r-lib.org/reference/topic-data-mask.html.
Overview of selection features
tidyselect implements a DSL for selecting variables. It provides helpers for selecting variables:
- var1:var10: variables lying between var1 on the left and var10 on the right.
- starts_with("a"): names that start with "a".
- ends_with("z"): names that end with "z".
- contains("b"): names that contain "b".
- matches("x.y"): names that match regular expression x.y.
- num_range(x, 1:4): names following the pattern, x1, x2, ..., x4.
- all_of(vars)/any_of(vars): matches names stored in the character vector vars. all_of(vars) will error if the variables aren't present; any_of(vars) will match just the variables that exist.
- everything(): all variables.
- last_col(): furthest column on the right.
- where(is.numeric): all variables where is.numeric() returns TRUE.
As well as operators for combining those selections:
- !selection: only variables that don't match selection.
- selection1 & selection2: only variables included in both selection1 and selection2.
- selection1 | selection2: all variables that match either selection1 or selection2.
Key techniques
- If you want the user to supply a tidyselect specification in a function argument, you need to tunnel the selection through the function argument. This is done by embracing the function argument {{ }}, e.g. unnest(df, {{ vars }}).
- If you have a character vector of column names, use all_of() or any_of(), depending on whether or not you want unknown variable names to cause an error, e.g. unnest(df, all_of(vars)), unnest(df, !any_of(vars)).
- To suppress R CMD check NOTEs about unknown variables use "var" instead of var:
# has NOTE
df %>% select(x, y, z)
# no NOTE
df %>% select("x", "y", "z")
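The first two techniques can be sketched together as follows. The wrapper name unnest_cols() and the example data are hypothetical, introduced only for illustration.

```r
library(tidyr)

# Hypothetical wrapper: `cols` is tunnelled through to unnest() with {{ }}
unnest_cols <- function(df, cols) {
  unnest(df, {{ cols }})
}

df <- tibble(id = 1:2, y = list(tibble(a = 1), tibble(a = 2:3)))

# The caller supplies a bare column name, as with unnest() itself
by_symbol <- unnest_cols(df, y)

# With column names stored in a character vector, use all_of()
vars <- c("y")
by_string <- unnest(df, all_of(vars))

identical(by_symbol, by_string)
```

Both calls select the same list-column, so the two results should match.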
"Uncount" a data frame
Description
Performs the opposite operation to dplyr::count(), duplicating rows
according to a weighting variable (or expression).
Usage
uncount(data, weights, ..., .remove = TRUE, .id = NULL)
Arguments
data |
A data frame, tibble, or grouped tibble. |
weights |
A vector of weights. Evaluated in the context of data; supports quasiquotation. |
... |
Additional arguments passed on to methods. |
.remove |
If TRUE, and weights is the name of a column in data, then this column is removed. |
.id |
Supply a string to create a new variable which gives a unique identifier for each created row. |
Examples
df <- tibble(x = c("a", "b"), n = c(1, 2))
uncount(df, n)
uncount(df, n, .id = "id")
# You can also use constants
uncount(df, 2)
# Or expressions
uncount(df, 2 / n)
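Because uncount() is described as the opposite of dplyr::count(), a round trip is a useful check. A small sketch, assuming dplyr is also attached; the data is made up:

```r
library(tidyr)
library(dplyr)

df <- tibble(x = c("a", "b", "b"))

counted <- count(df, x)          # one row per value of x, with the count in n
restored <- uncount(counted, n)  # duplicate each row n times, then drop n

restored$x  # "a" "b" "b"
```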
Unite multiple columns into one by pasting strings together
Description
Convenience function to paste together multiple columns into one.
Usage
unite(data, col, ..., sep = "_", remove = TRUE, na.rm = FALSE)
Arguments
data |
A data frame. |
col |
The name of the new column, as a string or symbol. This argument is passed by expression and supports
quasiquotation (you can unquote strings
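and symbols). The name is captured from the expression with
rlang::ensym() (note that this kind of interface where symbols do not represent actual objects is now discouraged in the tidyverse; we support it here for backward compatibility). |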
... |
<tidy-select> Columns to unite |
sep |
Separator to use between values. |
remove |
If TRUE, remove input columns from output data frame. |
na.rm |
If TRUE, missing values will be removed prior to uniting each value. |
See Also
separate(), the complement.
Examples
df <- expand_grid(x = c("a", NA), y = c("b", NA))
df
df %>% unite("z", x:y, remove = FALSE)
# To remove missing values:
df %>% unite("z", x:y, na.rm = TRUE, remove = FALSE)
# Separate is almost the complement of unite
df %>%
  unite("xy", x:y) %>%
  separate(xy, c("x", "y"))
# (but note `x` and `y` contain now "NA" not NA)
Unnest a list-column of data frames into rows and columns
Description
Unnest expands a list-column containing data frames into rows and columns.
Usage
unnest(
  data,
  cols,
  ...,
  keep_empty = FALSE,
  ptype = NULL,
  names_sep = NULL,
  names_repair = "check_unique",
  .drop = deprecated(),
  .id = deprecated(),
  .sep = deprecated(),
  .preserve = deprecated()
)
Arguments
data |
A data frame. |
cols |
<tidy-select> List-columns to unnest. When selecting multiple columns, values from the same row will be recycled to their common size. |
... |
|
keep_empty |
By default, you get one row of output for each element
of the list that you are unchopping/unnesting. This means that if there's a
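size-0 element (like NULL or an empty data frame or vector), then that entire row will be dropped from the output. If you want to preserve all rows, use keep_empty = TRUE to replace size-0 elements with a single row of missing values. |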
ptype |
Optionally, a named list of column name-prototype pairs to
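coerce cols to, overriding the default that will be guessed from combining the individual values. Alternatively, a single empty ptype can be supplied, which will be applied to all cols. |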
names_sep |
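If NULL, the default, the outer names will come from the inner names. If a string, the outer names will be formed by pasting together the outer and the inner column names, separated by names_sep. |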
names_repair |
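Used to check that output data frame has valid names. Must be one of the following options:
- "minimal": no name repair or checks, beyond basic existence,
- "unique": make sure names are unique and not empty,
- "check_unique": (the default), no name repair, but check they are unique,
- "universal": make the names unique and syntactic,
- a function: apply custom name repair,
- tidyr_legacy: use the name repair from tidyr 0.8,
- a formula: a purrr-style anonymous function (see rlang::as_function()).
See vctrs::vec_as_names() for more details on these terms and the strategies used to enforce them. |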
.drop , .preserve |
|
.id |
|
.sep |
New syntax
tidyr 1.0.0 introduced a new syntax for nest() and unnest() that's
designed to be more similar to other functions. Converting to the new syntax
should be straightforward (guided by the message you'll receive) but if
you just need to run an old analysis, you can easily revert to the previous
behaviour using nest_legacy() and unnest_legacy() as follows:
library(tidyr)
nest <- nest_legacy
unnest <- unnest_legacy
See Also
Other rectangling: hoist(), unnest_longer(), unnest_wider()
Examples
# unnest() is designed to work with lists of data frames
df <- tibble(
  x = 1:3,
  y = list(
    NULL,
    tibble(a = 1, b = 2),
    tibble(a = 1:3, b = 3:1, c = 4)
  )
)
# unnest() recycles input rows for each row of the list-column
# and adds a column for each column
df %>% unnest(y)
# input rows with 0 rows in the list-column will usually disappear,
# but you can keep them (generating NAs) with keep_empty = TRUE:
df %>% unnest(y, keep_empty = TRUE)
# Multiple columns ----------------------------------------------------------
# You can unnest multiple columns simultaneously
df <- tibble(
  x = 1:2,
  y = list(
    tibble(a = 1, b = 2),
    tibble(a = 3:4, b = 5:6)
  ),
  z = list(
    tibble(c = 1, d = 2),
    tibble(c = 3:4, d = 5:6)
  )
)
df %>% unnest(c(y, z))
# Compare with unnesting one column at a time, which generates
# the Cartesian product
df %>%
  unnest(y) %>%
  unnest(z)
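The names_sep argument from the usage above is not exercised by these examples. As a small illustrative sketch (the data is made up), supplying a string prefixes each inner column name with the name of the list-column it came from:

```r
library(tidyr)

df <- tibble(
  x = 1:2,
  y = list(
    tibble(a = 1, b = 2),
    tibble(a = 3:4, b = 5:6)
  )
)

# Inner names a and b are combined with the outer name y
out <- unnest(df, y, names_sep = "_")
names(out)  # "x" "y_a" "y_b"
```

This can help avoid name clashes when an inner column shares its name with an existing outer column.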
Automatically call unnest_wider() or unnest_longer()
Description
unnest_auto() picks between unnest_wider() or unnest_longer() by inspecting the inner names of the list-col:
- If all elements are unnamed, it uses unnest_longer(indices_include = FALSE).
- If all elements are named, and there's at least one name in common across all components, it uses unnest_wider().
- Otherwise, it falls back to unnest_longer(indices_include = TRUE).
It's handy for very rapid interactive exploration but I don't recommend using it in scripts, because it will succeed even if the underlying data radically changes.
Usage
unnest_auto(data, col)
Arguments
data |
A data frame. |
col |
<tidy-select> List-column to unnest. |
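This page has no examples of its own, so the following is a minimal sketch of the first two rules in the description. The data frame and column names are hypothetical; unnest_auto() also reports which function it picked in a message.

```r
library(tidyr)

df <- tibble(
  id = 1:2,
  unnamed = list(1:2, 3L),
  named = list(list(a = 1, b = "x"), list(a = 2, b = "y"))
)

# No element of `unnamed` has names, so the longer form is used
longer <- unnest_auto(df, unnamed)
nrow(longer)  # 3

# Every element of `named` shares the names a and b, so the wider form is used
wider <- unnest_auto(df, named)
c("a", "b") %in% names(wider)  # TRUE TRUE
```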
Unnest a list-column into rows
Description
unnest_longer() turns each element of a list-column into a row. It
is most naturally suited to list-columns where the elements are unnamed
and the length of each element varies from row to row.
unnest_longer() generally preserves the number of columns of data while
modifying the number of rows.
Learn more in vignette("rectangle").
Usage
unnest_longer(
  data,
  col,
  values_to = NULL,
  indices_to = NULL,
  indices_include = NULL,
  keep_empty = FALSE,
  names_repair = "check_unique",
  simplify = TRUE,
  ptype = NULL,
  transform = NULL
)
Arguments
data |
A data frame. |
col |
<tidy-select> List-column(s) to unnest. When selecting multiple columns, values from the same row will be recycled to their common size. |
values_to |
A string giving the column name (or names) to store the
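unnested values in. If multiple columns are specified in col, this can also be a glue string containing "{col}" to provide a template for the column names. The default, NULL, gives the output columns the same names as the input columns. |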
indices_to |
A string giving the column name (or names) to store the
inner names or positions (if not named) of the values. If multiple columns
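are specified in col, this can also be a glue string containing "{col}" to provide a template for the column names. The default, NULL, gives the output columns the same names as values_to, but suffixed with "_id". |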
indices_include |
A single logical value specifying whether or not to
add an index column. If any value has inner names, the index column will be
a character vector of those names, otherwise it will be an integer vector
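of positions. If NULL, defaults to TRUE if any value has inner names or if indices_to is provided. If indices_to is provided, then indices_include can't be FALSE. |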
keep_empty |
By default, you get one row of output for each element
of the list that you are unchopping/unnesting. This means that if there's a
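size-0 element (like NULL or an empty data frame or vector), then that entire row will be dropped from the output. If you want to preserve all rows, use keep_empty = TRUE to replace size-0 elements with a single row of missing values. |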
names_repair |
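Used to check that output data frame has valid names. Must be one of the following options:
- "minimal": no name repair or checks, beyond basic existence,
- "unique": make sure names are unique and not empty,
- "check_unique": (the default), no name repair, but check they are unique,
- "universal": make the names unique and syntactic,
- a function: apply custom name repair,
- tidyr_legacy: use the name repair from tidyr 0.8,
- a formula: a purrr-style anonymous function (see rlang::as_function()).
See vctrs::vec_as_names() for more details on these terms and the strategies used to enforce them. |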
simplify |
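If TRUE, will attempt to simplify lists of length-1 vectors to an atomic vector. Can also be a named list containing TRUE or FALSE declaring whether or not to attempt to simplify a particular column. If a named list is provided, the default for any unspecified columns is TRUE. |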
ptype |
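Optionally, a named list of prototypes declaring the desired output type of each component. Alternatively, a single empty prototype can be supplied, which will be applied to all components. Use this argument if you want to check that each element has the type you expect when simplifying. If a ptype has been specified, but simplify = FALSE or simplification isn't possible, then a list-of column will be returned and each element will have type ptype. |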
transform |
Optionally, a named list of transformation functions applied to each component. Alternatively, a single function can be supplied, which will be applied to all components. Use this argument if you want to transform or parse individual elements as they are extracted. When both ptype and transform are supplied, the transform is applied before the ptype. |
See Also
Other rectangling: hoist(), unnest_wider(), unnest()
Examples
# `unnest_longer()` is useful when each component of the list should
# form a row
df <- tibble(
  x = 1:4,
  y = list(NULL, 1:3, 4:5, integer())
)
df %>% unnest_longer(y)
# Note that empty values like `NULL` and `integer()` are dropped by
# default. If you'd like to keep them, set `keep_empty = TRUE`.
df %>% unnest_longer(y, keep_empty = TRUE)
# If the inner vectors are named, the names are copied to an `_id` column
df <- tibble(
  x = 1:2,
  y = list(c(a = 1, b = 2), c(a = 10, b = 11, c = 12))
)
df %>% unnest_longer(y)
# Multiple columns ----------------------------------------------------------
# If columns are aligned, you can unnest simultaneously
df <- tibble(
  x = 1:2,
  y = list(1:2, 3:4),
  z = list(5:6, 7:8)
)
df %>%
  unnest_longer(c(y, z))
# This is important because sequential unnesting would generate the
# Cartesian product of the rows
df %>%
  unnest_longer(y) %>%
  unnest_longer(z)
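The values_to and indices_to arguments listed in the usage control the names of the output columns. A short sketch on made-up data, where "value" and "name" are arbitrary column names chosen for illustration:

```r
library(tidyr)

df <- tibble(
  x = 1:2,
  y = list(c(a = 1, b = 2), c(a = 10, b = 11, c = 12))
)

# Store the unnested values in `value` and the inner names in `name`
out <- unnest_longer(df, y, values_to = "value", indices_to = "name")

out$name   # "a" "b" "a" "b" "c"
out$value  # 1 2 10 11 12
```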
Unnest a list-column into columns
Description
unnest_wider() turns each element of a list-column into a column. It
is most naturally suited to list-columns where every element is named,
and the names are consistent from row-to-row.
unnest_wider() preserves the rows of data while modifying the columns.
Learn more in vignette("rectangle").
Usage
unnest_wider(
  data,
  col,
  names_sep = NULL,
  simplify = TRUE,
  strict = FALSE,
  names_repair = "check_unique",
  ptype = NULL,
  transform = NULL
)
Arguments
data |
A data frame. |
col |
<tidy-select> List-column(s) to unnest. When selecting multiple columns, values from the same row will be recycled to their common size. |
names_sep |
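If NULL, the default, the names will be left as is. If a string, the outer and inner names will be pasted together using names_sep as a separator. If any values being unnested are unnamed, then names_sep must be supplied, otherwise an error is thrown. When names_sep is supplied, names are automatically generated for unnamed values as an increasing sequence of integers. |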
simplify |
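If TRUE, will attempt to simplify lists of length-1 vectors to an atomic vector. Can also be a named list containing TRUE or FALSE declaring whether or not to attempt to simplify a particular column. If a named list is provided, the default for any unspecified columns is TRUE. |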
strict |
A single logical specifying whether or not to apply strict
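vctrs typing rules. If FALSE, typed empty values (like list() or integer()) nested within list-columns will be treated like NULL and will not contribute to the type of the unnested column. This is useful when working with JSON, where empty values tend to lose their type information and show up as list(). |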
names_repair |
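Used to check that output data frame has valid names. Must be one of the following options:
- "minimal": no name repair or checks, beyond basic existence,
- "unique": make sure names are unique and not empty,
- "check_unique": (the default), no name repair, but check they are unique,
- "universal": make the names unique and syntactic,
- a function: apply custom name repair,
- tidyr_legacy: use the name repair from tidyr 0.8,
- a formula: a purrr-style anonymous function (see rlang::as_function()).
See vctrs::vec_as_names() for more details on these terms and the strategies used to enforce them. |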
ptype |
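Optionally, a named list of prototypes declaring the desired output type of each component. Alternatively, a single empty prototype can be supplied, which will be applied to all components. Use this argument if you want to check that each element has the type you expect when simplifying. If a ptype has been specified, but simplify = FALSE or simplification isn't possible, then a list-of column will be returned and each element will have type ptype. |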
transform |
Optionally, a named list of transformation functions applied to each component. Alternatively, a single function can be supplied, which will be applied to all components. Use this argument if you want to transform or parse individual elements as they are extracted. When both ptype and transform are supplied, the transform is applied before the ptype. |
See Also
Other rectangling: hoist(), unnest_longer(), unnest()
Examples
df <- tibble(
  character = c("Toothless", "Dory"),
  metadata = list(
    list(
      species = "dragon",
      color = "black",
      films = c(
        "How to Train Your Dragon",
        "How to Train Your Dragon 2",
        "How to Train Your Dragon: The Hidden World"
      )
    ),
    list(
      species = "blue tang",
      color = "blue",
      films = c("Finding Nemo", "Finding Dory")
    )
  )
)
df
# Turn all components of metadata into columns
df %>% unnest_wider(metadata)
# Choose not to simplify list-cols of length-1 elements
df %>% unnest_wider(metadata, simplify = FALSE)
df %>% unnest_wider(metadata, simplify = list(color = FALSE))
# You can also widen unnamed list-cols:
df <- tibble(
  x = 1:3,
  y = list(NULL, 1:3, 4:5)
)
# but you must supply `names_sep` to do so, which generates automatic names:
df %>% unnest_wider(y, names_sep = "_")
# 0-length elements ---------------------------------------------------------
# The defaults of `unnest_wider()` treat empty types (like `list()`) as `NULL`.
json <- list(
  list(x = 1:2, y = 1:2),
  list(x = list(), y = 3:4),
  list(x = 3L, y = list())
)
df <- tibble(json = json)
df %>%
  unnest_wider(json)
# To instead enforce strict vctrs typing rules, use `strict`
df %>%
  unnest_wider(json, strict = TRUE)
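The transform argument is not covered by the examples above. The sketch below assumes a list-column whose n component arrives as text (as can happen with parsed JSON); the data and column names are invented for illustration.

```r
library(tidyr)

df <- tibble(
  id = 1:2,
  meta = list(
    list(n = "1", tag = "a"),
    list(n = "2", tag = "b")
  )
)

# Parse the `n` component as it is extracted; `tag` is left untouched
out <- unnest_wider(df, meta, transform = list(n = as.integer))

out$n  # 1 2, stored as integers
```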
US rent and income data
Description
Captured from the 2017 American Community Survey using the tidycensus package.
Usage
us_rent_income
Format
A dataset with variables:
- GEOID
FIPS state identifier
- NAME
Name of state
- variable
Variable name: income = median yearly income, rent = median monthly rent
- estimate
Estimated value
- moe
90% margin of error
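Because variable holds two different measures per state, a common first step is to spread them into separate columns. A sketch of one way to do that with pivot_wider(); the name wide is arbitrary.

```r
library(tidyr)

# One row per state, with an estimate and a margin of error for each measure
wide <- pivot_wider(
  us_rent_income,
  names_from = variable,
  values_from = c(estimate, moe)
)

names(wide)
```

The new column names combine the value column with the variable name, for example estimate_income and moe_rent.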
World Health Organization TB data
Description
A subset of data from the World Health Organization Global Tuberculosis
Report, and accompanying global populations. who uses the original
codes from the World Health Organization. The column names for columns
5 through 60 are made by combining new_ with:
- the method of diagnosis (rel = relapse, sn = negative pulmonary smear, sp = positive pulmonary smear, ep = extrapulmonary),
- gender (f = female, m = male), and
- age group (014 = 0-14 years of age, 1524 = 15-24, 2534 = 25-34, 3544 = 35-44, 4554 = 45-54, 5564 = 55-64, 65 = 65 years or older).
who2 is a lightly modified version that makes teaching the basics
easier by tweaking the variables to be slightly more consistent and
dropping iso2 and iso3. newrel is replaced by new_rel, and a
_ is added after the gender.
Usage
who
who2
population
Format
who
A data frame with 7,240 rows and 60 columns:
- country
Country name
- iso2, iso3
2 & 3 letter ISO country codes
- year
Year
- new_sp_m014 - new_rel_f65
Counts of new TB cases recorded by group. Column names encode three variables that describe the group.
who2
A data frame with 7,240 rows and 58 columns.
population
A data frame with 4,060 rows and three columns:
- country
Country name
- year
Year
- population
Population
Source
https://www.who.int/teams/global-tuberculosis-programme/data
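Since the case-count column names in who encode diagnosis, gender and age group, they can be decoded with pivot_longer(). This is a sketch rather than the only approach: the output column names are arbitrary, and the regular expression assumes the naming scheme described above (an optional underscore after new, then the diagnosis, an underscore, a one-letter gender and the age code).

```r
library(tidyr)

long <- pivot_longer(
  who,
  cols = starts_with("new"),
  names_to = c("diagnosis", "gender", "age"),
  names_pattern = "new_?(.*)_(.)(.*)",
  values_to = "count"
)

unique(long$diagnosis)
unique(long$gender)
```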
Population data from the World Bank
Description
Data about population from the World Bank.
Usage
world_bank_pop
Format
A dataset with variables:
- country
Three letter country code
- indicator
Indicator name: SP.POP.GROW = population growth, SP.POP.TOTL = total population, SP.URB.GROW = urban population growth, SP.URB.TOTL = total urban population
- 2000-2018
Value for each year
Source
Dataset from the World Bank data bank: https://data.worldbank.org
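The year columns hold values rather than variables, so a typical first step is to gather them into a single column. A minimal sketch; selecting everything except country and indicator avoids hard-coding the year range, and the names year and value are arbitrary choices.

```r
library(tidyr)

long <- pivot_longer(
  world_bank_pop,
  cols = !c(country, indicator),
  names_to = "year",
  values_to = "value"
)

names(long)  # "country" "indicator" "year" "value"
```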