Help for package vctrs

Title:

Vector Helpers

Version:

0.6.5

Description:

Defines new notions of prototype and size that are used to provide tools for consistent and well-founded type-coercion and size-recycling, and are in turn connected to ideas of type- and size-stability useful for analysing function interfaces.

License:

MIT + file LICENSE

URL:

https://vctrs.r-lib.org/, https://github.com/r-lib/vctrs

BugReports:

https://github.com/r-lib/vctrs/issues

Depends:

R (≥ 3.5.0)

Imports:

cli (≥ 3.4.0), glue, lifecycle (≥ 1.0.3), rlang (≥ 1.1.0)

Suggests:

bit64, covr, crayon, dplyr (≥ 0.8.5), generics, knitr, pillar (≥ 1.4.4), pkgdown (≥ 2.0.1), rmarkdown, testthat (≥ 3.0.0), tibble (≥ 3.1.3), waldo (≥ 0.2.0), withr, xml2, zeallot

VignetteBuilder:

knitr

Config/Needs/website:

tidyverse/tidytemplate

Config/testthat/edition:

Encoding:

UTF-8

Language:

en-GB

RoxygenNote:

7.2.3

NeedsCompilation:

yes

Packaged:

2023-12-01 16:27:12 UTC; davis

Author:

Hadley Wickham [aut], Lionel Henry [aut], Davis Vaughan [aut, cre], data.table team [cph] (Radix sort based on data.table's forder() and their contribution to R's order()), Posit Software, PBC [cph, fnd]

Maintainer:

Davis Vaughan <davis@posit.co>

Repository:

CRAN

Date/Publication:

2023-12-01 23:50:02 UTC

vctrs: Vector Helpers

Description

Author(s)

Maintainer: Davis Vaughan davis@posit.co

Authors:

Hadley Wickham hadley@posit.co
Lionel Henry lionel@posit.co

Other contributors:

data.table team (Radix sort based on data.table's forder() and their contribution to R's order()) [copyright holder]
Posit Software, PBC [copyright holder, funder]

Default value for empty vectors

Description

Use this inline operator when you need to provide a default value for empty (as defined by vec_is_empty()) vectors.

Usage

x %0% y

Arguments

x

A vector

y

Value to use if x is empty. To preserve type-stability, should be the same type as x.

Examples

1:10 %0% 5
integer() %0% 5
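As a small illustration of the type-stability advice for y (a sketch, not one of the package's own examples), the default below is an integer so that the result is an integer whether or not x is empty. The variable names are made up for this example.

```r
library(vctrs)

# Non-empty input: `x` is returned unchanged
non_empty <- 1:3 %0% 0L

# Empty input: the default value is used instead
empty <- integer() %0% 0L

# Both results are integer because the default has the same type as `x`
typeof(non_empty)
typeof(empty)
```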

AsIs S3 class

Description

These functions help the base AsIs class fit into the vctrs type system by providing coercion and casting functions.

Usage

## S3 method for class 'AsIs'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")

Construct a data frame

Description

data_frame() constructs a data frame. It is similar to base::data.frame(), but there are a few notable differences that make it more in line with vctrs principles. The Properties section outlines these.

Usage

data_frame(
  ...,
  .size = NULL,
  .name_repair = c("check_unique", "unique", "universal", "minimal", "unique_quiet",
    "universal_quiet"),
  .error_call = current_env()
)

Arguments

...

Vectors to become columns in the data frame. When inputs are named, those names are used for column names.

.size

The number of rows in the data frame. If NULL, this will be computed as the common size of the inputs.

.name_repair

One of "check_unique", "unique", "universal", "minimal", "unique_quiet", or "universal_quiet". See vec_as_names() for the meaning of these options.

.error_call

The execution environment of a currently running function, e.g. caller_env(). The function will be mentioned in error messages as the source of the error. See the call argument of abort() for more information.

Details

If no column names are supplied, "" will be used as a default name for all columns. This is applied before name repair occurs, so the default name repair of "check_unique" will error if any unnamed inputs are supplied and "unique" (or "unique_quiet") will repair the empty string column names appropriately. If the column names don't matter, use a "minimal" name repair for convenience and performance.
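The interaction between unnamed inputs and name repair can be sketched as follows. This is an illustrative example only; the variable names are not part of the package.

```r
library(vctrs)

# Default "check_unique" repair: unnamed inputs get "" names, which is an error
unnamed_fails <- tryCatch(
  {
    data_frame(1, 2)
    FALSE
  },
  error = function(cnd) TRUE
)

# "unique_quiet" repairs the empty names without emitting a message
repaired <- data_frame(1, 2, .name_repair = "unique_quiet")
names(repaired)

# "minimal" leaves the empty names untouched
minimal <- data_frame(1, 2, .name_repair = "minimal")
names(minimal)
```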

Properties

Inputs are recycled to a common size with vec_recycle_common().
With the exception of data frames, inputs are not modified in any way. Character vectors are never converted to factors, and lists are stored as-is for easy creation of list-columns.
Unnamed data frame inputs are automatically unpacked. Named data frame inputs are stored unmodified as data frame columns.
NULL inputs are completely ignored.
The dots are dynamic, allowing for splicing of lists with !!! and unquoting.
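The last two properties can be sketched with a short, illustrative example. The list `cols` is made up for this purpose.

```r
library(vctrs)

cols <- list(x = 1:2, y = c("a", "b"))

# `!!!` splices the list elements in as individual columns,
# and the trailing `NULL` input is ignored
df <- data_frame(!!!cols, NULL)
names(df)

# Character input stays character
class(df$y)
```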

Examples

data_frame(x = 1, y = 2)

# Inputs are recycled using tidyverse recycling rules
data_frame(x = 1, y = 1:3)

# Strings are never converted to factors
class(data_frame(x = "foo")$x)

# List columns can be easily created
df <- data_frame(x = list(1:2, 2, 3:4), y = 3:1)

# However, the base print method is suboptimal for displaying them,
# so it is recommended to convert them to tibble
if (rlang::is_installed("tibble")) {
  tibble::as_tibble(df)
}

# Named data frame inputs create data frame columns
df <- data_frame(x = data_frame(y = 1:2, z = "a"))

# The `x` column itself is another data frame
df$x

# Again, it is recommended to convert these to tibbles for a better
# print method
if (rlang::is_installed("tibble")) {
  tibble::as_tibble(df)
}

# Unnamed data frame input is automatically unpacked
data_frame(x = 1, data_frame(y = 1:2, z = "a"))

Collect columns for data frame construction

Description

df_list() constructs the data structure underlying a data frame, a named list of equal-length vectors. It is often used in combination with new_data_frame() to safely and consistently create a helper function for data frame subclasses.

Usage

df_list(
  ...,
  .size = NULL,
  .unpack = TRUE,
  .name_repair = c("check_unique", "unique", "universal", "minimal", "unique_quiet",
    "universal_quiet"),
  .error_call = current_env()
)

Arguments

...

Vectors of equal-length. When inputs are named, those names are used for names of the resulting list.

.size

The common size of vectors supplied in .... If NULL, this will be computed as the common size of the inputs.

.unpack

Should unnamed data frame inputs be unpacked? Defaults to TRUE.

.name_repair

One of "check_unique", "unique", "universal", "minimal", "unique_quiet", or "universal_quiet". See vec_as_names() for the meaning of these options.

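.error_call

The execution environment of a currently running function, e.g. caller_env(). The function will be mentioned in error messages as the source of the error. See the call argument of abort() for more information.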

Properties

Inputs are recycled to a common size with vec_recycle_common().
With the exception of data frames, inputs are not modified in any way. Character vectors are never converted to factors, and lists are stored as-is for easy creation of list-columns.
Unnamed data frame inputs are automatically unpacked. Named data frame inputs are stored unmodified as data frame columns.
NULL inputs are completely ignored.
The dots are dynamic, allowing for splicing of lists with !!! and unquoting.
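A minimal sketch of these properties for df_list() itself, assuming only the arguments documented above: the result is a plain named list with recycled, unpacked columns, which can then be handed to new_data_frame().

```r
library(vctrs)

# `x` is recycled to size 2 and the unnamed data frame is unpacked
cols <- df_list(x = 1, data_frame(y = 1:2, z = "a"))

# The result is a plain list, not yet a data frame
is.data.frame(cols)
names(cols)

# Turn the list into a data frame
df <- new_data_frame(cols)
```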

Examples

# `new_data_frame()` can be used to create custom data frame constructors
new_fancy_df <- function(x = list(), n = NULL, ..., class = NULL) {
  new_data_frame(x, n = n, ..., class = c(class, "fancy_df"))
}

# Combine this constructor with `df_list()` to create a safe,
# consistent helper function for your data frame subclass
fancy_df <- function(...) {
  data <- df_list(...)
  new_fancy_df(data)
}

df <- fancy_df(x = 1)
class(df)

Coercion between two data frames

Description

df_ptype2() and df_cast() are the two functions you need to call from vec_ptype2() and vec_cast() methods for data frame subclasses. See ?howto-faq-coercion-data-frame. Their main job is to determine the common type of two data frames, adding and coercing columns as needed, or throwing an incompatible type error when the columns are not compatible.

Usage

df_ptype2(x, y, ..., x_arg = "", y_arg = "", call = caller_env())

df_cast(x, to, ..., x_arg = "", to_arg = "", call = caller_env())

tib_ptype2(x, y, ..., x_arg = "", y_arg = "", call = caller_env())

tib_cast(x, to, ..., x_arg = "", to_arg = "", call = caller_env())

Arguments

x, y, to

Subclasses of data frame.

...

If you call df_ptype2() or df_cast() from a vec_ptype2() or vec_cast() method, you must forward the dots passed to your method on to df_ptype2() or df_cast().

x_arg, y_arg

Argument names for x and y. These are used in error messages to inform the user about the locations of incompatible types (see stop_incompatible_type()).

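call

The execution environment of a currently running function, e.g. caller_env(). The function will be mentioned in error messages as the source of the error. See the call argument of abort() for more information.

to_arg

Argument name for to, used in error messages to inform the user about the locations of incompatible types (see stop_incompatible_type()).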

Value

When x and y are not compatible, an error of class vctrs_error_incompatible_type is thrown.
When x and y are compatible, df_ptype2() returns the common type as a bare data frame. tib_ptype2() returns the common type as a bare tibble.
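The two outcomes can be illustrated by calling the helpers directly on bare data frames. This is a sketch for exploration only; in a package these functions would normally be called from vec_ptype2() and vec_cast() methods. The data frames are made up for the example.

```r
library(vctrs)

ints <- data.frame(x = 1L)
dbls <- data.frame(x = 1.5, y = "a", stringsAsFactors = FALSE)

# Compatible: the common type has the union of the columns and zero rows
ptype <- df_ptype2(ints, dbls)

# Casting to the common type coerces `x` and adds a missing `y` column
converted <- df_cast(ints, ptype)

# Incompatible columns: an incompatible type error is thrown
incompatible <- tryCatch(
  df_ptype2(ints, data.frame(x = "a", stringsAsFactors = FALSE)),
  vctrs_error_incompatible_type = function(cnd) class(cnd)
)
```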

FAQ - How is the compatibility of vector types decided?

Description

Two vectors are compatible when you can safely:

Combine them into one larger vector.
Assign values from one of the vectors into the other vector.

Examples of compatible types are integer and double vectors. On the other hand, integer and character vectors are not compatible.
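Both notions of compatibility can be checked directly with vctrs functions. The following is a small illustrative sketch:

```r
library(vctrs)

# Integer and double are compatible: they can be combined...
combined <- vec_c(1L, 2.5)

# ...and a double value can be assigned into an integer vector
# (10 has no fractional part, so the conversion is lossless)
assigned <- vec_assign(1:3, 2, 10)

# Integer and character are not compatible
incompatible <- tryCatch(
  vec_c(1L, "a"),
  vctrs_error_incompatible_type = function(cnd) "incompatible"
)
```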

Common type of multiple vectors

There are two possible outcomes when multiple vectors of different types are combined into a larger vector:

An incompatible type error is thrown because some of the types are not compatible:

df1 <- data.frame(x = 1:3)
df2 <- data.frame(x = "foo")
dplyr::bind_rows(df1, df2)
#> Error in `dplyr::bind_rows()`:
#> ! Can't combine `..1$x` <integer> and `..2$x` <character>.

The vectors are combined into a vector that has the common type of all inputs. In this example, the common type of integer and logical is integer:
```
df1 <- data.frame(x = 1:3)
df2 <- data.frame(x = FALSE)
dplyr::bind_rows(df1, df2)
#>   x
#> 1 1
#> 2 2
#> 3 3
#> 4 0
```

In general, the common type is the richer type, in other words the type that can represent the most values. Logical vectors are at the bottom of the hierarchy of numeric types because they can only represent two values (not counting missing values). Then come integer vectors, and then doubles. Here is the vctrs type hierarchy for the fundamental vectors:

[Figure: vctrs type hierarchy for the fundamental vectors]
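The logical, integer and double part of this hierarchy can be explored with vec_ptype_common(), as in this short sketch:

```r
library(vctrs)

# Logical and integer: integer is the richer type
lgl_int <- vec_ptype_common(TRUE, 1L)

# Adding a double moves the common type further up the hierarchy
lgl_int_dbl <- vec_ptype_common(TRUE, 1L, 2.5)

# Combining values converts them to the common type
values <- vec_c(TRUE, 2L)
```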

Type conversion and lossy cast errors

Type compatibility does not necessarily mean that you can convert one type to the other type. That’s because one of the types might support a larger set of possible values. For instance, integer and double vectors are compatible, but double vectors can’t be converted to integer if they contain fractional values.

When vctrs can’t convert a vector because the target type is not as rich as the source type, it throws a lossy cast error. Assigning a fractional number to an integer vector is a typical example of a lossy cast error:

int_vector <- 1:3
vec_assign(int_vector, 2, 0.001)
#> Error in `vec_assign()`:
#> ! Can't convert from <double> to <integer> due to loss of precision.
#> * Locations: 1
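The same distinction shows up with vec_cast(). In this sketch, a double vector holding whole numbers converts to integer, whereas one with a fractional value signals a lossy cast error:

```r
library(vctrs)

# Whole numbers stored as doubles convert to integer without loss
ok <- vec_cast(c(1, 2), integer())

# A fractional value cannot be represented as an integer
lossy <- tryCatch(
  vec_cast(c(1, 2.5), integer()),
  vctrs_error_cast_lossy = function(cnd) "lossy cast"
)
```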

How to make two vector classes compatible?

If you encounter two vector types that you think should be compatible, they might need to implement coercion methods. Reach out to the author(s) of the classes and ask them if it makes sense for their classes to be compatible.

These developer FAQ items provide guides for implementing coercion methods:

For an example of implementing coercion methods for simple vectors, see ?howto-faq-coercion.
For an example of implementing coercion methods for data frame subclasses, see ?howto-faq-coercion-data-frame.

FAQ - Error/Warning: Some attributes are incompatible

Description

This error occurs when vec_ptype2() or vec_cast() is supplied vectors of the same class with different attributes. In this case, vctrs doesn't know how to combine the inputs.

To fix this error, the maintainer of the class should implement self-to-self coercion methods for vec_ptype2() and vec_cast().

Implementing coercion methods

For an overview of how these generics work and their roles in vctrs, see ?theory-faq-coercion.
For an example of implementing coercion methods for simple vectors, see ?howto-faq-coercion.
For an example of implementing coercion methods for data frame subclasses, see ?howto-faq-coercion-data-frame.
For a tutorial about implementing vctrs classes from scratch, see vignette("s3-vector").

FAQ - Error: Input must be a vector

Description

This error occurs when a function expects a vector and gets a scalar object instead. This commonly happens when some code attempts to assign a scalar object as a column in a data frame:

fn <- function() NULL
tibble::tibble(x = fn)
#> Error in `tibble::tibble()`:
#> ! All columns in a tibble must be vectors.
#> x Column `x` is a function.

fit <- lm(1:3 ~ 1)
tibble::tibble(x = fit)
#> Error in `tibble::tibble()`:
#> ! All columns in a tibble must be vectors.
#> x Column `x` is a `lm` object.

Vectorness in base R and in the tidyverse

In base R, almost everything is a vector or behaves like a vector. In the tidyverse we have chosen to be a bit stricter about what is considered a vector. The main question we ask ourselves to decide on the vectorness of a type is whether it makes sense to include that object as a column in a data frame.

The main difference is that S3 lists are considered vectors by base R but in the tidyverse that’s not the case by default:

fit <- lm(1:3 ~ 1)

typeof(fit)
#> [1] "list"
class(fit)
#> [1] "lm"

# S3 lists can be subset like a vector using base R:
fit[c(1, 4)]
#> $coefficients
#> (Intercept) 
#>           2 
#> 
#> $rank
#> [1] 1

# But not in vctrs
vctrs::vec_slice(fit, c(1, 4))
#> Error in `vctrs::vec_slice()`:
#> ! `x` must be a vector, not a <lm> object.

Defused function calls are another (more esoteric) example:

call <- quote(foo(bar = TRUE, baz = FALSE))
call
#> foo(bar = TRUE, baz = FALSE)

# They can be subset like a vector using base R:
call[1:2]
#> foo(bar = TRUE)
lapply(call, function(x) x)
#> [[1]]
#> foo
#> 
#> $bar
#> [1] TRUE
#> 
#> $baz
#> [1] FALSE

# But not with vctrs:
vctrs::vec_slice(call, 1:2)
#> Error in `vctrs::vec_slice()`:
#> ! `x` must be a vector, not a call.
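To check ahead of time whether vctrs treats an object as a vector, obj_is_vector() can be used. The sketch below applies it to the objects discussed above. The `checks` vector is only a convenience for this example.

```r
library(vctrs)

fit <- lm(1:3 ~ 1)

checks <- c(
  integer    = obj_is_vector(1:3),
  bare_list  = obj_is_vector(list(1, "a")),
  data_frame = obj_is_vector(data.frame(x = 1)),
  lm_object  = obj_is_vector(fit),           # S3 list: not a vector by default
  call       = obj_is_vector(quote(foo())),  # defused call
  fn         = obj_is_vector(mean)           # function
)
checks
```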

I get a scalar type error but I think this is a bug

It’s possible the author of the class needs to do some work to declare their class a vector. Consider reaching out to the author. We have written a developer FAQ page to help them fix the issue.

Tools for accessing the fields of a record.

Description

A rcrd behaves like a vector, so length(), names(), and $ cannot provide access to the fields of the underlying list. These helpers do: fields() is equivalent to names(); n_fields() is equivalent to length(); field() is equivalent to $.
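The difference between the vector view and the field view can be sketched as follows. The record and its field names are invented for this illustration.

```r
library(vctrs)

# A record with two fields, each holding three values
r <- new_rcrd(list(lat = c(48.9, 40.7, 35.7), lon = c(2.4, -74.0, 139.7)))

# `length()` reports the size of the record as a vector
length(r)

# The field helpers look at the underlying list instead
n_fields(r)
fields(r)
field(r, "lat")
```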

Usage

fields(x)

n_fields(x)

field(x, i)

field(x, i) <- value

Arguments

x

A rcrd, i.e. a list of equal length vectors with unique names.

Examples

x <- new_rcrd(list(x = 1:3, y = 3:1, z = letters[1:3]))
n_fields(x)
fields(x)

field(x, "y")
field(x, "y") <- runif(3)
field(x, "y")

FAQ - How to implement ptype2 and cast methods?

Description

This guide illustrates how to implement vec_ptype2() and vec_cast() methods for existing classes. Related topics:

For an overview of how these generics work and their roles in vctrs, see ?theory-faq-coercion.
For an example of implementing coercion methods for data frame subclasses, see ?howto-faq-coercion-data-frame.
For a tutorial about implementing vctrs classes from scratch, see vignette("s3-vector").

The natural number class

We’ll illustrate how to implement coercion methods with a simple class that represents natural numbers. In this scenario we have an existing class that already features a constructor and methods for print() and subset.

#' @export
new_natural <- function(x) {
  if (is.numeric(x) || is.logical(x)) {
    stopifnot(is_whole(x))
    x <- as.integer(x)
  } else {
    stop("Can't construct natural from unknown type.")
  }
  structure(x, class = "my_natural")
}
is_whole <- function(x) {
  all(x %% 1 == 0 | is.na(x))
}

#' @export
print.my_natural <- function(x, ...) {
  cat("<natural>\n")
  x <- unclass(x)
  NextMethod()
}
#' @export
`[.my_natural` <- function(x, i, ...) {
  new_natural(NextMethod())
}

new_natural(1:3)
#> <natural>
#> [1] 1 2 3
new_natural(c(1, NA))
#> <natural>
#> [1]  1 NA

Roxygen workflow

To implement methods for generics, first import the generics in your namespace and redocument:

#' @importFrom vctrs vec_ptype2 vec_cast
NULL

Note that for each batch of methods that you add to your package, you need to export the methods and redocument immediately, even during development. Otherwise they won’t be in scope when you run unit tests e.g. with testthat.

Implementing double dispatch methods is very similar to implementing regular S3 methods. In these examples we are using roxygen2 tags to register the methods, but you can also register the methods manually in your NAMESPACE file or lazily with s3_register().

Implementing `vec_ptype2()`

The self-self method

The first method to implement is the one that signals that your class is compatible with itself:

#' @export
vec_ptype2.my_natural.my_natural <- function(x, y, ...) {
  x
}

vec_ptype2(new_natural(1), new_natural(2:3))
#> <natural>
#> integer(0)

vec_ptype2() implements a fallback to try and be compatible with simple classes, so it may seem that you don’t need to implement the self-self coercion method. However, you must implement it explicitly because this is how vctrs knows that a class is implementing vctrs methods (for instance, this disables fallbacks to base::c()). It also makes your class a bit more efficient.

The parent and children methods

Our natural number class is conceptually a parent of <logical> and a child of <integer>, but the class is not compatible with logical, integer, or double vectors yet:

vec_ptype2(TRUE, new_natural(2:3))
#> Error:
#> ! Can't combine `TRUE` <logical> and `new_natural(2:3)` <my_natural>.

vec_ptype2(new_natural(1), 2:3)
#> Error:
#> ! Can't combine `new_natural(1)` <my_natural> and `2:3` <integer>.

We’ll specify the twin methods for each of these classes, returning the richer class in each case.

#' @export
vec_ptype2.my_natural.logical <- function(x, y, ...) {
  # The order of the classes in the method name follows the order of
  # the arguments in the function signature, so `x` is the natural
  # number and `y` is the logical
  x
}
#' @export
vec_ptype2.logical.my_natural <- function(x, y, ...) {
  # In this case `y` is the richer natural number
  y
}

Between a natural number and an integer, the latter is the richer class:

#' @export
vec_ptype2.my_natural.integer <- function(x, y, ...) {
  y
}
#' @export
vec_ptype2.integer.my_natural <- function(x, y, ...) {
  x
}

We no longer get common type errors for logical and integer:

vec_ptype2(TRUE, new_natural(2:3))
#> <natural>
#> integer(0)

vec_ptype2(new_natural(1), 2:3)
#> integer(0)

We are not done yet. Pairwise coercion methods must be implemented for all the connected nodes in the coercion hierarchy, which include double vectors further up. The coercion methods for grand-parent types must be implemented separately:

#' @export
vec_ptype2.my_natural.double <- function(x, y, ...) {
  y
}
#' @export
vec_ptype2.double.my_natural <- function(x, y, ...) {
  x
}

Incompatible attributes

Most of the time, inputs are incompatible because they have different classes for which no vec_ptype2() method is implemented. More rarely, inputs could be incompatible because of their attributes. In that case incompatibility is signalled by calling stop_incompatible_type().

In the following example, we implement a self-self ptype2 method for a hypothetical subclass of <factor> that has stricter combination semantics. The method throws an error when the levels of the two factors are not compatible.

#' @export
vec_ptype2.my_strict_factor.my_strict_factor <- function(x, y, ..., x_arg = "", y_arg = "") {
  if (!setequal(levels(x), levels(y))) {
    stop_incompatible_type(x, y, x_arg = x_arg, y_arg = y_arg)
  }

  x
}

Note how the methods need to take x_arg and y_arg parameters and pass them on to stop_incompatible_type(). These argument tags help create more informative error messages when the common type determination is for a column of a data frame. They are part of the generic signature but can usually be left out if not used.
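To try the strict factor method interactively, something like the sketch below could be used. The constructor new_strict_factor() is hypothetical and exists only for this illustration, and the method is defined at top level (in the global environment) so that vctrs can find it without package registration.

```r
library(vctrs)

# Hypothetical constructor for the strict factor subclass (illustration only)
new_strict_factor <- function(x, levels) {
  structure(factor(x, levels = levels), class = c("my_strict_factor", "factor"))
}

# Same method as in the guide
vec_ptype2.my_strict_factor.my_strict_factor <- function(x, y, ..., x_arg = "", y_arg = "") {
  if (!setequal(levels(x), levels(y))) {
    stop_incompatible_type(x, y, x_arg = x_arg, y_arg = y_arg)
  }
  x
}

ab <- new_strict_factor("a", levels = c("a", "b"))
ba <- new_strict_factor("b", levels = c("b", "a"))
cd <- new_strict_factor("c", levels = c("c", "d"))

# Same set of levels: the common type is the strict factor
same_levels <- vec_ptype2(ab, ba)

# Different levels: an incompatible type error is signalled
different_levels <- tryCatch(
  vec_ptype2(ab, cd),
  vctrs_error_incompatible_type = function(cnd) "incompatible"
)
```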

Implementing `vec_cast()`

Corresponding vec_cast() methods must be implemented for all vec_ptype2() methods. The general pattern is to convert the argument x to the type of to. The methods should validate the values in x and make sure they conform to the values of to.

Please note that for historical reasons, the order of the classes in the method name is in reverse order of the arguments in the function signature. The first class represents to, whereas the second class represents x.

The self-self method is easy in this case: it just returns the input unchanged.

#' @export
vec_cast.my_natural.my_natural <- function(x, to, ...) {
  x
}

The other types need to be validated. We perform input validation in the new_natural() constructor, so that’s a good fit for our vec_cast() implementations.

#' @export
vec_cast.my_natural.logical <- function(x, to, ...) {
  # The order of the classes in the method name is in reverse order
  # of the arguments in the function signature, so `to` is the natural
  # number and `x` is the logical
  new_natural(x)
}
#' @export
vec_cast.my_natural.integer <- function(x, to, ...) {
  new_natural(x)
}
#' @export
vec_cast.my_natural.double <- function(x, to, ...) {
  new_natural(x)
}

With these methods, vctrs is now able to combine logical and natural vectors. It properly returns the richer type of the two, a natural vector:

vec_c(TRUE, new_natural(1), FALSE)
#> <natural>
#> [1] 1 1 0

Because we haven’t implemented conversions from natural, it still doesn’t know how to combine natural with the richer integer and double types:

vec_c(new_natural(1), 10L)
#> Error in `vec_c()`:
#> ! Can't convert `..1` <my_natural> to <integer>.
vec_c(1.5, new_natural(1))
#> Error in `vec_c()`:
#> ! Can't convert `..2` <my_natural> to <double>.

This is quick work which completes the implementation of coercion methods for vctrs:

#' @export
vec_cast.logical.my_natural <- function(x, to, ...) {
  # In this case `to` is the logical and `x` is the natural number
  attributes(x) <- NULL
  as.logical(x)
}
#' @export
vec_cast.integer.my_natural <- function(x, to, ...) {
  attributes(x) <- NULL
  as.integer(x)
}
#' @export
vec_cast.double.my_natural <- function(x, to, ...) {
  attributes(x) <- NULL
  as.double(x)
}

And we now get the expected combinations.

vec_c(new_natural(1), 10L)
#> [1]  1 10

vec_c(1.5, new_natural(1))
#> [1] 1.5 1.0
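As a condensed sanity check of the double and natural pairing, the sketch below re-declares a compact version of the constructor and only the four methods involved. The condensed constructor and the variable names are assumptions made for brevity, and the methods are defined at top level (in the global environment) so that vctrs can find them outside of a package.

```r
library(vctrs)

# Condensed variant of the constructor from this guide (validation included)
new_natural <- function(x) {
  stopifnot(is.numeric(x) || is.logical(x), all(x %% 1 == 0 | is.na(x)))
  structure(as.integer(x), class = "my_natural")
}

# Double is the richer type in both argument orders
vec_ptype2.my_natural.double <- function(x, y, ...) y
vec_ptype2.double.my_natural <- function(x, y, ...) x

# Method names are `vec_cast.<to>.<x>`
vec_cast.my_natural.double <- function(x, to, ...) new_natural(x)
vec_cast.double.my_natural <- function(x, to, ...) as.double(unclass(x))

# A whole number converts to a natural number
whole <- vec_cast(2, to = new_natural(1))

# A fractional number is rejected by the validation in `new_natural()`
fractional <- tryCatch(
  vec_cast(2.5, to = new_natural(1)),
  error = function(cnd) "rejected"
)

# Combining with a double returns the richer double type
combined <- vec_c(1.5, new_natural(1))
```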

FAQ - How to implement ptype2 and cast methods? (Data frames)

Description

This guide provides a practical recipe for implementing vec_ptype2() and vec_cast() methods for coercions of data frame subclasses. Related topics:

For an overview of the coercion mechanism in vctrs, see ?theory-faq-coercion.
For an example of implementing coercion methods for simple vectors, see ?howto-faq-coercion.

Coercion of data frames occurs when different data frame classes are combined in some way. The two main methods of combination are currently row-binding with vec_rbind() and col-binding with vec_cbind() (which are in turn used by a number of dplyr and tidyr functions). These functions take multiple data frame inputs and automatically coerce them to their common type.
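For bare data frames, the automatic coercion performed by these two functions can be sketched as follows. The inputs are made up for the illustration.

```r
library(vctrs)

ints <- data.frame(x = 1L)
mixed <- data.frame(x = 2.5, y = "a", stringsAsFactors = FALSE)

# Row-binding: `x` is coerced to double and the missing `y` is filled with NA
rows <- vec_rbind(ints, mixed)

# Column-binding: the size 1 input is recycled to the common size
cols <- vec_cbind(
  data.frame(x = 1:2),
  data.frame(y = "a", stringsAsFactors = FALSE)
)
```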

vctrs is generally strict about the kind of automatic coercions that are performed when combining inputs. In the case of data frames we have decided to be a bit less strict for convenience. Instead of throwing an incompatible type error, we fall back to a base data frame or a tibble if we don’t know how to combine two data frame subclasses. It is still a good idea to specify the proper coercion behaviour for your data frame subclasses as soon as possible.

We will see two examples in this guide. The first example is about a data frame subclass that has no particular attributes to manage. In the second example, we implement coercion methods for a tibble subclass that includes potentially incompatible attributes.

Roxygen workflow

To implement methods for generics, first import the generics in your namespace and redocument:

#' @importFrom vctrs vec_ptype2 vec_cast
NULL

Parent methods

Most of the common type determination should be performed by the parent class. In vctrs, double dispatch is implemented in such a way that you need to call the methods for the parent class manually. For vec_ptype2() this means you need to call df_ptype2() (for data frame subclasses) or tib_ptype2() (for tibble subclasses). Similarly, df_cast() and tib_cast() are the workhorses for vec_cast() methods of subtypes of data.frame and tbl_df. These functions take the union of the columns in x and y, and ensure shared columns have the same type.

These functions are much less strict than vec_ptype2() and vec_cast() as they accept any subclass of data frame as input. They always return a data.frame or a tbl_df. You will probably want to write similar functions for your subclass to avoid repetition in your code. You may want to export them as well if you are expecting other people to derive from your class.
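The point that the parent helpers accept any data frame subclass but return a bare data frame can be sketched like this. The class "my_df" is hypothetical and has no methods of its own.

```r
library(vctrs)

# Hypothetical data frame subclass used only for illustration
sub <- structure(data.frame(x = 1L), class = c("my_df", "data.frame"))
plain <- data.frame(x = 2.5, y = "a", stringsAsFactors = FALSE)

# The common type takes the union of the columns and drops the subclass
ptype <- df_ptype2(sub, plain)
class(ptype)
names(ptype)

# Casting also returns a bare data frame
converted <- df_cast(sub, ptype)
class(converted)
```

This is why a wrapper such as the dt_ptype2() helper in the next section has to restore the subclass after calling the parent helper.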

A `data.table` example

This example is the actual implementation of vctrs coercion methods for data.table. This is a simple example because we don’t have to keep track of attributes for this class or manage incompatibilities. See the tibble section for a more complicated example.

We first create the dt_ptype2() and dt_cast() helpers. They wrap around the parent methods df_ptype2() and df_cast(), and transform the common type or converted input to a data table. You may want to export these helpers if you expect other packages to derive from your data frame class.

These helpers should always return data tables. To this end we use the conversion generic as.data.table(). Depending on the tools available for the particular class at hand, a constructor might be appropriate as well.

dt_ptype2 <- function(x, y, ...) {
  as.data.table(df_ptype2(x, y, ...))
}
dt_cast <- function(x, to, ...) {
  as.data.table(df_cast(x, to, ...))
}

We start with the self-self method:

#' @export
vec_ptype2.data.table.data.table <- function(x, y, ...) {
  dt_ptype2(x, y, ...)
}

Between a data frame and a data table, we consider the richer type to be data table. This decision is not based on the value coverage of each data structure, but on the idea that data tables have richer behaviour. Since data tables are the richer type, we call dt_ptype2() from the vec_ptype2() method. It always returns a data table, no matter the order of arguments:

#' @export
vec_ptype2.data.table.data.frame <- function(x, y, ...) {
  dt_ptype2(x, y, ...)
}
#' @export
vec_ptype2.data.frame.data.table <- function(x, y, ...) {
  dt_ptype2(x, y, ...)
}

The vec_cast() methods follow the same pattern, but note how the method for coercing to data frame uses df_cast() rather than dt_cast().

Also, please note that for historical reasons, the order of the classes in the method name is in reverse order of the arguments in the function signature. The first class represents to, whereas the second class represents x.

#' @export
vec_cast.data.table.data.table <- function(x, to, ...) {
  dt_cast(x, to, ...)
}
#' @export
vec_cast.data.table.data.frame <- function(x, to, ...) {
  # `x` is a data.frame to be converted to a data.table
  dt_cast(x, to, ...)
}
#' @export
vec_cast.data.frame.data.table <- function(x, to, ...) {
  # `x` is a data.table to be converted to a data.frame
  df_cast(x, to, ...)
}

With these methods vctrs is now able to combine data tables with data frames:

vec_cbind(data.frame(x = 1:3), data.table(y = "foo"))
#>    x   y
#> 1: 1 foo
#> 2: 2 foo
#> 3: 3 foo

A tibble example

In this example we implement coercion methods for a tibble subclass that carries a colour as scalar metadata:

# User constructor
my_tibble <- function(colour = NULL, ...) {
  new_my_tibble(tibble::tibble(...), colour = colour)
}
# Developer constructor
new_my_tibble <- function(x, colour = NULL) {
  stopifnot(is.data.frame(x))
  tibble::new_tibble(
    x,
    colour = colour,
    class = "my_tibble",
    nrow = nrow(x)
  )
}

df_colour <- function(x) {
  if (inherits(x, "my_tibble")) {
    attr(x, "colour")
  } else {
    NULL
  }
}

#' @export
print.my_tibble <- function(x, ...) {
  cat(sprintf("<%s: %s>\n", class(x)[[1]], df_colour(x)))
  cli::cat_line(format(x)[-1])
}

This subclass is very simple. All it does is modify the header.

red <- my_tibble("red", x = 1, y = 1:2)
red
#> <my_tibble: red>
#>       x     y
#>   <dbl> <int>
#> 1     1     1
#> 2     1     2

red[2]
#> <my_tibble: red>
#>       y
#>   <int>
#> 1     1
#> 2     2

green <- my_tibble("green", z = TRUE)
green
#> <my_tibble: green>
#>   z    
#>   <lgl>
#> 1 TRUE

Combinations do not work properly out of the box. Instead, vctrs falls back to a bare tibble:

vec_rbind(red, tibble::tibble(x = 10:12))
#> # A tibble: 5 x 2
#>       x     y
#>   <dbl> <int>
#> 1     1     1
#> 2     1     2
#> 3    10    NA
#> 4    11    NA
#> 5    12    NA

Instead of falling back to a data frame, we would like to return a <my_tibble> when combined with a data frame or a tibble. Because this subclass has more metadata than normal data frames (it has a colour), it is a supertype of tibble and data frame, i.e. it is the richer type. This is similar to how a grouped tibble is a more general type than a tibble or a data frame. Conceptually, the latter are pinned to a single constant group.

The coercion methods for data frames operate in two steps:

They check for compatible subclass attributes. In our case the tibble colour has to be the same, or be undefined.
They call their parent methods, in this case tib_ptype2() and tib_cast() because we have a subclass of tibble. This eventually calls the data frame methods df_ptype2() and df_cast(), which match the columns and their types.

This process should usually be wrapped in two functions to avoid repetition. Consider exporting these if you expect your class to be derived by other subclasses.

We first implement a helper to determine if two data frames have compatible colours. We use the df_colour() accessor which returns NULL when the data frame colour is undefined.

# `%||%` is provided by rlang (and by base R from version 4.4.0)
has_compatible_colours <- function(x, y) {
  x_colour <- df_colour(x) %||% df_colour(y)
  y_colour <- df_colour(y) %||% x_colour
  identical(x_colour, y_colour)
}
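The behaviour of this helper can be checked in isolation. The sketch below is self-contained under a few assumptions: it defines a local stand-in for `%||%`, repeats df_colour() and has_compatible_colours(), and uses a hypothetical fake_my_tibble() helper that only mimics the class and attribute of a my_tibble without requiring the tibble package.

```r
# Stand-in for rlang's `%||%` so that the sketch has no dependencies
`%||%` <- function(x, y) if (is.null(x)) y else x

df_colour <- function(x) {
  if (inherits(x, "my_tibble")) attr(x, "colour") else NULL
}

has_compatible_colours <- function(x, y) {
  x_colour <- df_colour(x) %||% df_colour(y)
  y_colour <- df_colour(y) %||% x_colour
  identical(x_colour, y_colour)
}

# Hypothetical helper: mimics a coloured `my_tibble` with a plain data frame
fake_my_tibble <- function(colour) {
  structure(
    data.frame(x = 1),
    colour = colour,
    class = c("my_tibble", "data.frame")
  )
}

red <- fake_my_tibble("red")
green <- fake_my_tibble("green")
plain <- data.frame(x = 1)

has_compatible_colours(red, red)    # same colour
has_compatible_colours(red, plain)  # undefined colour on one side
has_compatible_colours(plain, red)  # argument order does not matter
has_compatible_colours(red, green)  # conflicting colours
```

An undefined colour is treated as compatible with any colour, so only two different, defined colours count as a conflict.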

Next we implement the coercion helpers. If the colours are not compatible, we call stop_incompatible_cast() or stop_incompatible_type(). These strict coercion semantics are justified because in this class colour is a data attribute. If it were a non-essential detail attribute, like the timezone in a datetime, we would just standardise it to the value of the left-hand side.

In simpler cases (like the data.table example), these methods do not need to take the arguments suffixed in _arg. Here we do need to take these arguments so we can pass them to the stop_ functions when we detect an incompatibility. They should also be passed to the parent methods.

#' @export
my_tib_cast <- function(x, to, ..., x_arg = "", to_arg = "") {
  out <- tib_cast(x, to, ..., x_arg = x_arg, to_arg = to_arg)

  if (!has_compatible_colours(x, to)) {
    stop_incompatible_cast(
      x,
      to,
      x_arg = x_arg,
      to_arg = to_arg,
      details = "Can't combine colours."
    )
  }

  colour <- df_colour(x) %||% df_colour(to)
  new_my_tibble(out, colour = colour)
}
#' @export
my_tib_ptype2 <- function(x, y, ..., x_arg = "", y_arg = "") {
  out <- tib_ptype2(x, y, ..., x_arg = x_arg, y_arg = y_arg)

  if (!has_compatible_colours(x, y)) {
    stop_incompatible_type(
      x,
      y,
      x_arg = x_arg,
      y_arg = y_arg,
      details = "Can't combine colours."
    )
  }

  colour <- df_colour(x) %||% df_colour(y)
  new_my_tibble(out, colour = colour)
}

Let’s now implement the coercion methods, starting with the self-self methods.

#' @export
vec_ptype2.my_tibble.my_tibble <- function(x, y, ...) {
  my_tib_ptype2(x, y, ...)
}
#' @export
vec_cast.my_tibble.my_tibble <- function(x, to, ...) {
  my_tib_cast(x, to, ...)
}

We can now combine compatible instances of our class!

vec_rbind(red, red)
#> <my_tibble: red>
#>       x     y
#>   <dbl> <int>
#> 1     1     1
#> 2     1     2
#> 3     1     1
#> 4     1     2

vec_rbind(green, green)
#> <my_tibble: green>
#>   z    
#>   <lgl>
#> 1 TRUE 
#> 2 TRUE

vec_rbind(green, red)
#> Error in `my_tib_ptype2()`:
#> ! Can't combine `..1` <my_tibble> and `..2` <my_tibble>.
#> Can't combine colours.

The methods for combining our class with tibbles follow the same pattern. For ptype2 we return our class in both cases because it is the richer type:

#' @export
vec_ptype2.my_tibble.tbl_df <- function(x, y, ...) {
  my_tib_ptype2(x, y, ...)
}
#' @export
vec_ptype2.tbl_df.my_tibble <- function(x, y, ...) {
  my_tib_ptype2(x, y, ...)
}

For cast, we are careful to return a tibble when casting to a tibble. Note the call to vctrs::tib_cast():

#' @export
vec_cast.my_tibble.tbl_df <- function(x, to, ...) {
  my_tib_cast(x, to, ...)
}
#' @export
vec_cast.tbl_df.my_tibble <- function(x, to, ...) {
  tib_cast(x, to, ...)
}

From this point, we get correct combinations with tibbles:

vec_rbind(red, tibble::tibble(x = 10:12))
#> <my_tibble: red>
#>       x     y
#>   <dbl> <int>
#> 1     1     1
#> 2     1     2
#> 3    10    NA
#> 4    11    NA
#> 5    12    NA

However, we are not done yet. Because the coercion hierarchy is different from the class hierarchy, there is no inheritance of coercion methods. We’re not getting correct behaviour for data frames yet because we haven’t explicitly specified the methods for this class:

vec_rbind(red, data.frame(x = 10:12))
#> # A tibble: 5 x 2
#>       x     y
#>   <dbl> <int>
#> 1     1     1
#> 2     1     2
#> 3    10    NA
#> 4    11    NA
#> 5    12    NA

Let’s finish up the boilerplate:

#' @export
vec_ptype2.my_tibble.data.frame <- function(x, y, ...) {
  my_tib_ptype2(x, y, ...)
}
#' @export
vec_ptype2.data.frame.my_tibble <- function(x, y, ...) {
  my_tib_ptype2(x, y, ...)
}

#' @export
vec_cast.my_tibble.data.frame <- function(x, to, ...) {
  my_tib_cast(x, to, ...)
}
#' @export
vec_cast.data.frame.my_tibble <- function(x, to, ...) {
  df_cast(x, to, ...)
}

This completes the implementation:

vec_rbind(red, data.frame(x = 10:12))
#> <my_tibble: red>
#>       x     y
#>   <dbl> <int>
#> 1     1     1
#> 2     1     2
#> 3    10    NA
#> 4    11    NA
#> 5    12    NA

FAQ - Why isn't my class treated as a vector?

Description

The tidyverse is a bit stricter than base R regarding what kind of objects are considered as vectors (see the user FAQ about this topic). Sometimes vctrs won’t treat your class as a vector when it should.

Why isn’t my list class considered a vector?

By default, S3 lists are not considered to be vectors by vctrs:

my_list <- structure(list(), class = "my_class")

vctrs::vec_is(my_list)
#> [1] FALSE

To be treated as a vector, the class must either inherit from "list" explicitly:

my_explicit_list <- structure(list(), class = c("my_class", "list"))
vctrs::vec_is(my_explicit_list)
#> [1] TRUE

Or it should implement a vec_proxy() method that returns its input if explicit inheritance is not possible or troublesome:

#' @export
vec_proxy.my_class <- function(x, ...) x

vctrs::vec_is(my_list)
#> [1] TRUE

Note that explicit inheritance is the preferred way because this makes it possible for your class to dispatch on list methods of S3 generics:

my_generic <- function(x) UseMethod("my_generic")
my_generic.list <- function(x) "dispatched!"

my_generic(my_list)
#> Error in UseMethod("my_generic"): no applicable method for 'my_generic' applied to an object of class "my_class"

my_generic(my_explicit_list)
#> [1] "dispatched!"

Why isn’t my data frame class considered a vector?

If you get an “Input must be a vector” error with a data frame subclass, the most likely explanation is that the data frame has not been properly constructed. The main cause of these errors is a data frame whose base class is not "data.frame":

my_df <- data.frame(x = 1)
class(my_df) <- c("data.frame", "my_class")

vctrs::obj_check_vector(my_df)
#> Error:
#> ! `my_df` must be a vector, not a <data.frame/my_class> object.

This is problematic as many tidyverse functions won’t work properly:

dplyr::slice(my_df, 1)
#> Error in `vec_slice()`:
#> ! `x` must be a vector, not a <data.frame/my_class> object.

It is generally not appropriate to declare your class to be a superclass of another class. We generally consider this undefined behaviour (UB). To fix these errors, you can simply change the construction of your data frame class so that "data.frame" is a base class, i.e. it should come last in the class vector:

class(my_df) <- c("my_class", "data.frame")

vctrs::obj_check_vector(my_df)

dplyr::slice(my_df, 1)
#>   x
#> 1 1
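One way to avoid this class-ordering mistake is to build the subclass with new_data_frame(), which places any extra classes in front of "data.frame". The sketch below is only an illustration: the constructor name new_my_df() is hypothetical and not part of vctrs.

```r
library(vctrs)

# Hypothetical constructor for the "my_class" data frame subclass.
new_my_df <- function(x = list(), n = NULL) {
  # `class` is prepended, so "data.frame" remains the base (last) class.
  new_data_frame(x, n = n, class = "my_class")
}

my_df <- new_my_df(list(x = 1))

class(my_df)
#> [1] "my_class"   "data.frame"

obj_is_vector(my_df)
#> [1] TRUE
```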

Internal FAQ - Implementation of `vec_locate_matches()`

Description

vec_locate_matches() is similar to vec_match(), but detects all matches by default, and can match on conditions other than equality (like >= and <). There are also various other arguments to limit or adjust exactly which kinds of matches are returned. Here is an example:

x <- c("a", "b", "a", "c", "d")
y <- c("d", "b", "a", "d", "a", "e")

# For each value of `x`, find all matches in `y`
# - The "c" in `x` doesn't have a match, so it gets an NA location by default
# - The "e" in `y` isn't matched by anything in `x`, so it is dropped by default
vec_locate_matches(x, y)
#>   needles haystack
#> 1       1        3
#> 2       1        5
#> 3       2        2
#> 4       3        3
#> 5       3        5
#> 6       4       NA
#> 7       5        1
#> 8       5        4

Algorithm description

Overview and `==`

The simplest (approximate) way to think about the algorithm that df_locate_matches_recurse() uses is that it sorts both inputs, and then starts at the midpoint in needles and uses a binary search to find each needle in haystack. Since there might be multiple of the same needle, we find the location of the lower and upper duplicate of that needle to handle all duplicates of that needle at once. Similarly, if there are duplicates of a matching haystack value, we find the lower and upper duplicates of the match.

If the condition is ==, that is pretty much all we have to do. For each needle, we then record 3 things: the location of the needle, the location of the lower match in the haystack, and the match size (i.e. loc_upper_match - loc_lower_match + 1). This later gets expanded in expand_compact_indices() into the actual output.
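The three recorded values can be illustrated with a small base R sketch. This is not the vctrs implementation (which is written in C); locate_run() is a hypothetical helper that uses findInterval() as a stand-in for the binary search over a sorted haystack.

```r
# Hypothetical helper: find the run of `needle` in a sorted `haystack`.
locate_run <- function(needle, haystack) {
  # Number of haystack values <= needle, i.e. the upper match location.
  upper <- findInterval(needle, haystack)
  # Number of haystack values < needle, plus one: the lower match location.
  lower <- findInterval(needle, haystack, left.open = TRUE) + 1L
  # When there is no match, `lower` ends up greater than `upper`.
  list(lower = lower, upper = upper, size = max(upper - lower + 1L, 0L))
}

haystack_x <- c(1, 1, 2, 2, 3)

str(locate_run(2, haystack_x))
#> List of 3
#>  $ lower: int 3
#>  $ upper: int 4
#>  $ size : int 2

# No exact match: lower (5) > upper (4), so the match size is 0
str(locate_run(2.5, haystack_x))
#> List of 3
#>  $ lower: int 5
#>  $ upper: int 4
#>  $ size : int 0
```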

After recording the matches for a single needle, we perform the same procedure on the LHS and RHS of that needle (remember we started on the midpoint needle). i.e. from ⁠[1, loc_needle-1]⁠ and ⁠[loc_needle+1, size_needles]⁠, again taking the midpoint of those two ranges, finding their respective needle in the haystack, recording matches, and continuing on to the next needle. This iteration proceeds until we run out of needles.

When we have a data frame with multiple columns, we add a layer of recursion to this. For the first column, we find the locations of the lower/upper duplicate of the current needle, and we find the locations of the lower/upper matches in the haystack. If we are on the final column in the data frame, we record the matches, otherwise we pass this information on to another call to df_locate_matches_recurse(), bumping the column index and using these refined lower/upper bounds as the starting bounds for the next column.

I think an example would be useful here, so below I step through this process for a few iterations:

# these are sorted already for simplicity
needles <- data_frame(x = c(1, 1, 2, 2, 2, 3), y = c(1, 2, 3, 4, 5, 3))
haystack <- data_frame(x = c(1, 1, 2, 2, 3), y = c(2, 3, 4, 4, 1))

needles
#>   x y
#> 1 1 1
#> 2 1 2
#> 3 2 3
#> 4 2 4
#> 5 2 5
#> 6 3 3

haystack
#>   x y
#> 1 1 2
#> 2 1 3
#> 3 2 4
#> 4 2 4
#> 5 3 1

## Column 1, iteration 1

# start at midpoint in needles
# this corresponds to x==2
loc_mid_needles <- 3L

# finding all x==2 values in needles gives us:
loc_lower_duplicate_needles <- 3L
loc_upper_duplicate_needles <- 5L

# finding matches in haystack give us:
loc_lower_match_haystack <- 3L
loc_upper_match_haystack <- 4L

# compute LHS/RHS bounds for next needle
lhs_loc_lower_bound_needles <- 1L # original lower bound
lhs_loc_upper_bound_needles <- 2L # lower_duplicate-1

rhs_loc_lower_bound_needles <- 6L # upper_duplicate+1
rhs_loc_upper_bound_needles <- 6L # original upper bound

# We still have a 2nd column to check. So recurse and pass on the current
# duplicate and match bounds to start the 2nd column with.

## Column 2, iteration 1

# midpoint of [3, 5]
# value y==4
loc_mid_needles <- 4L

loc_lower_duplicate_needles <- 4L
loc_upper_duplicate_needles <- 4L

loc_lower_match_haystack <- 3L
loc_upper_match_haystack <- 4L

# last column, so record matches
# - this was location 4 in needles
# - lower match in haystack is at loc 3
# - match size is 2

# Now handle LHS and RHS of needle midpoint
lhs_loc_lower_bound_needles <- 3L # original lower bound
lhs_loc_upper_bound_needles <- 3L # lower_duplicate-1

rhs_loc_lower_bound_needles <- 5L # upper_duplicate+1
rhs_loc_upper_bound_needles <- 5L # original upper bound

## Column 2, iteration 2 (using LHS bounds)

# midpoint of [3,3]
# value of y==3
loc_mid_needles <- 3L

loc_lower_duplicate_needles <- 3L
loc_upper_duplicate_needles <- 3L

# no match! no y==3 in haystack for x==2
# lower-match will always end up > upper-match in this case
loc_lower_match_haystack <- 3L
loc_upper_match_haystack <- 2L

# no LHS or RHS needle values to do, so we are done here

## Column 2, iteration 3 (using RHS bounds)

# same as above, range of [5,5], value of y==5, which has no match in haystack

## Column 1, iteration 2 (LHS of first x needle)

# Now we are done with the x needles from [3,5], so move on to the LHS and RHS
# of that. Here we would do the LHS:

# midpoint of [1,2]
loc_mid_needles <- 1L

# ...

## Column 1, iteration 3 (RHS of first x needle)

# midpoint of [6,6]
loc_mid_needles <- 6L

# ...

In the real code, rather than comparing the double values of the columns directly, we replace each column with pseudo "joint ranks" computed between the i-th column of needles and the i-th column of haystack. It is approximately like doing vec_rank(vec_c(needles$x, haystack$x), ties = "dense"), then splitting the resulting ranks back up into their corresponding needle/haystack columns. This keeps the recursion code simpler, because we only have to worry about comparing integers.
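The joint-rank idea can be approximated at the R level as follows. This is a sketch of the concept only, using made-up input values; the internal code computes the ranks differently.

```r
library(vctrs)

# Made-up columns for illustration
needles_x <- c(0.5, 2.5, 2.5, 10)
haystack_x <- c(2.5, 7, 10)

# Rank both columns together so that equal values share the same integer rank
ranks <- vec_rank(vec_c(needles_x, haystack_x), ties = "dense")

# Split the joint ranks back into their needle and haystack parts
n <- vec_size(needles_x)
needles_rank <- ranks[seq_len(n)]
haystack_rank <- ranks[-seq_len(n)]

needles_rank
#> [1] 1 2 2 4
haystack_rank
#> [1] 2 3 4
```

Comparing needles_rank with haystack_rank gives the same answers as comparing the original doubles, but only integers are involved.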

Non-equi conditions and containers

At this point we can talk about non-equi conditions like < or >=. The general idea is pretty simple, and just builds on the above algorithm. For example, start with the x column from needles/haystack above:

needles$x
#> [1] 1 1 2 2 2 3

haystack$x
#> [1] 1 1 2 2 3

If we used a condition of <=, then we'd do everything the same as before:

Midpoint in needles is location 3, value x==2
Find lower/upper duplicates in needles, giving locations ⁠[3, 5]⁠
Find lower/upper exact match in haystack, giving locations ⁠[3, 4]⁠

At this point, we need to "adjust" the haystack match bounds to account for the condition. Since haystack is ordered, our "rule" for <= is to keep the lower match location the same, but extend the upper match location to the upper bound, so we end up with ⁠[3, 5]⁠. We know we can extend the upper match location because every haystack value after the exact match must be greater than the needle. Then we just record the matches and continue on normally.

This approach is really nice, because we only have to exactly match the needle in haystack. We don't have to compare each needle against every value in haystack, which would take a massive amount of time.
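For a single sorted column, the rule can be checked against a brute-force comparison. The following base R sketch assumes the haystack is already sorted and uses findInterval() in place of the internal binary search; it is an illustration, not the vctrs code.

```r
haystack_x <- c(1, 1, 2, 2, 3)
needle <- 2

# Lower bound: first haystack location whose value is >= the needle
lower <- findInterval(needle, haystack_x, left.open = TRUE) + 1L

# Rule for `<=`: extend the upper bound to the end of the haystack
upper <- length(haystack_x)

# Locations implied by the rule (empty if lower > upper)
rule_locs <- seq_len(upper - lower + 1L) + lower - 1L
rule_locs
#> [1] 3 4 5

# Brute force: compare the needle against every haystack value
brute_locs <- which(needle <= haystack_x)
identical(rule_locs, brute_locs)
#> [1] TRUE
```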

However, it gets slightly more complex with data frames with multiple columns. Let's go back to our original needles and haystack data frames and apply the condition <= to each column. Here is another worked example, which shows a case where our "rule" falls apart on the second column.

needles
#>   x y
#> 1 1 1
#> 2 1 2
#> 3 2 3
#> 4 2 4
#> 5 2 5
#> 6 3 3

haystack
#>   x y
#> 1 1 2
#> 2 1 3
#> 3 2 4
#> 4 2 4
#> 5 3 1

# `condition = c("<=", "<=")`

## Column 1, iteration 1

# x == 2
loc_mid_needles <- 3L

loc_lower_duplicate_needles <- 3L
loc_upper_duplicate_needles <- 5L

# finding exact matches in haystack give us:
loc_lower_match_haystack <- 3L
loc_upper_match_haystack <- 4L

# because haystack is ordered we know we can expand the upper bound automatically
# to include everything past the match. i.e. needle of x==2 must be less than
# the haystack value at loc 5, which we can check by seeing that it is x==3.
loc_lower_match_haystack <- 3L
loc_upper_match_haystack <- 5L

## Column 2, iteration 1

# needles range of [3, 5]
# y == 4
loc_mid_needles <- 4L

loc_lower_duplicate_needles <- 4L
loc_upper_duplicate_needles <- 4L

# finding exact matches in haystack give us:
loc_lower_match_haystack <- 3L
loc_upper_match_haystack <- 4L

# lets try using our rule, which tells us we should be able to extend the upper
# bound:
loc_lower_match_haystack <- 3L
loc_upper_match_haystack <- 5L

# but the haystack value of y at location 5 is y==1, which is less than y==4
# in the needles, so it is not a match! looks like our rule failed us.

If you read through the above example, you'll see that the rule didn't work here. The problem is that while haystack is ordered (by vec_order()'s standards), each column isn't ordered independently of the others. Instead, each column is ordered within the "group" created by previous columns. Concretely, haystack here has an ordered x column, but if you look at haystack$y by itself, it isn't ordered (because of that 1 at the end). That is what causes the rule to fail.

haystack
#>   x y
#> 1 1 2
#> 2 1 3
#> 3 2 4
#> 4 2 4
#> 5 3 1

To fix this, we need to create haystack "containers" where the values within each container are all totally ordered. For haystack that would create 2 containers and look like:

haystack[1:4,]
#> # A tibble: 4 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1     2
#> 2     1     3
#> 3     2     4
#> 4     2     4

haystack[5,]
#> # A tibble: 1 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     3     1

This is essentially what compute_nesting_container_ids() does. You can actually see these ids with the helper, compute_nesting_container_info():

haystack2 <- haystack

# we really pass along the integer ranks, but in this case that is equivalent
# to converting our double columns to integers
haystack2$x <- as.integer(haystack2$x)
haystack2$y <- as.integer(haystack2$y)

info <- compute_nesting_container_info(haystack2, condition = c("<=", "<="))

# the ids are in the second slot.
# container ids break haystack into [1, 4] and [5, 5].
info[[2]]
#> [1] 0 0 0 0 1

So the idea is that for each needle, we look in each haystack container and find all the matches, then we aggregate all of the matches once at the end. df_locate_matches_with_containers() has the job of iterating over the containers.

Computing totally ordered containers can be expensive, but luckily it doesn't happen very often in normal usage.

If there are all == conditions, we don't need containers (i.e. any equi join)
If there is only 1 non-equi condition and no conditions after it, we don't need containers (i.e. most rolling joins)
Otherwise the typical case where we need containers is if we have something like ⁠date >= lower, date <= upper⁠. Even so, the computation cost generally scales with the number of columns in haystack you compute containers with (here 2), and it only really slows down around 4 columns or so, which I haven't ever seen a real life example of.
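To see that the container-based approach gives the right answer for the worked example, the result of vec_locate_matches() can be compared with a brute-force scan of every needle against every haystack row. This is only a cross-check sketch; the variable names brute and fast are illustrative.

```r
library(vctrs)

needles <- data_frame(x = c(1, 1, 2, 2, 2, 3), y = c(1, 2, 3, 4, 5, 3))
haystack <- data_frame(x = c(1, 1, 2, 2, 3), y = c(2, 3, 4, 4, 1))

# Brute force: for each needle row, test both conditions on every haystack row
brute <- lapply(seq_len(nrow(needles)), function(i) {
  which(needles$x[i] <= haystack$x & needles$y[i] <= haystack$y)
})

matches <- vec_locate_matches(needles, haystack, condition = c("<=", "<="))

# Collect the matched haystack locations per needle, dropping the NA
# locations that mark unmatched needles
fast <- lapply(seq_len(nrow(needles)), function(i) {
  locs <- matches$haystack[matches$needles == i]
  sort(locs[!is.na(locs)])
})

# Both approaches should agree
identical(brute, fast)
```

Needle 4 (x == 2, y == 4) matches haystack rows 3 and 4 only; row 5 is excluded because its y value is 1.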

Internal FAQ - `vec_ptype2()`, `NULL`, and unspecified vectors

Description

Promotion monoid

Promotions (i.e. automatic coercions) should always transform inputs to their richer type to avoid losing values or precision. vec_ptype2() returns the richer type of two vectors, or throws an incompatible type error if neither of the two vector types includes the other. For example, the richer type of integer and double is the latter because double covers a larger range of values than integer.

vec_ptype2() is a monoid over vectors, which in practical terms means that it is a well behaved operation for reduction. Reduction is an important operation for promotions because that is how the richer type of multiple elements is computed. As a monoid, vec_ptype2() needs an identity element, i.e. a value that doesn’t change the result of the reduction. vctrs has two identity values, NULL and unspecified vectors.

The `NULL` identity

As an identity element that shouldn’t influence the determination of the common type of a set of vectors, NULL is promoted to any type:

vec_ptype2(NULL, "")
#> character(0)
vec_ptype2(1L, NULL)
#> integer(0)

The common type of NULL and NULL is the identity NULL:

vec_ptype2(NULL, NULL)
#> NULL

This way the result of vec_ptype2(NULL, NULL) does not influence subsequent promotions:

vec_ptype2(
  vec_ptype2(NULL, NULL),
  ""
)
#> character(0)

Unspecified vectors

In the vctrs coercion system, logical vectors of missing values are also automatically promoted to the type of any other vector, just like NULL. We call these vectors unspecified. The special coercion semantics of unspecified vectors serve two purposes:

It makes it possible to assign vectors of NA inside any type of vectors, even when they are not coercible with logical:
```
x <- letters[1:5]
vec_assign(x, 1:2, c(NA, NA))
#> [1] NA  NA  "c" "d" "e"
```
We can’t put NULL in a data frame, so we need an identity element that behaves more like a vector. Logical vectors of NA seem a natural fit for this.

Unspecified vectors are thus promoted to any other type, just like NULL:

vec_ptype2(NA, "")
#> character(0)
vec_ptype2(1L, c(NA, NA))
#> integer(0)

Finalising common types

vctrs has an internal vector type of class vctrs_unspecified. Users normally don’t see such vectors in the wild, but they do come up when taking the common type of an unspecified vector with another identity value:

vec_ptype2(NA, NA)
#> <unspecified> [0]
vec_ptype2(NA, NULL)
#> <unspecified> [0]
vec_ptype2(NULL, NA)
#> <unspecified> [0]

We can’t return NA here because vec_ptype2() normally returns empty vectors. We also can’t return NULL because unspecified vectors need to be recognised as logical vectors if they haven’t been promoted at the end of the reduction.

vec_ptype_finalise(vec_ptype2(NULL, NA))
#> logical(0)

See the output of vec_ptype_common() which performs the reduction and finalises the type, ready to be used by the caller:

vec_ptype_common(NULL, NULL)
#> NULL
vec_ptype_common(NA, NULL)
#> logical(0)

Note that partial types in vctrs make use of the same mechanism. They are finalised with vec_ptype_finalise().
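The reduction and finalisation steps can be written out by hand with base R's Reduce(). This is a sketch of the idea, starting from the NULL identity; vec_ptype_common() remains the function to use in practice.

```r
library(vctrs)

xs <- list(1L, NULL, 2.5, NA)

# Pairwise reduction, starting from the NULL identity
reduced <- Reduce(vec_ptype2, xs, NULL)
final <- vec_ptype_finalise(reduced)
final
#> numeric(0)

# With only identity values, the result stays unspecified until finalised
ids <- Reduce(vec_ptype2, list(NULL, NA, NA), NULL)
vec_ptype_finalise(ids)
#> logical(0)
```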

Drop empty elements from a list

Description

list_drop_empty() removes empty elements from a list. This includes NULL elements along with empty vectors, like integer(0). This is equivalent to, but faster than, vec_slice(x, list_sizes(x) != 0L).

Usage

list_drop_empty(x)

Arguments

x

A list.

Dependencies

vec_slice()

Examples

x <- list(1, NULL, integer(), 2)
list_drop_empty(x)
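
The equivalence stated in the description can be checked directly. A small sketch:

```r
library(vctrs)

x <- list(1, NULL, integer(), 2)

# Sizes of each element; NULL and integer() both have size 0
list_sizes(x)
#> [1] 1 0 0 1

identical(
  list_drop_empty(x),
  vec_slice(x, list_sizes(x) != 0L)
)
#> [1] TRUE
```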

`list_of` S3 class for homogeneous lists

Description

A list_of object is a list where each element has the same type. Modifying the list with $, [, and [[ preserves the constraint by coercing all input items.

Usage

list_of(..., .ptype = NULL)

as_list_of(x, ...)

is_list_of(x)

## S3 method for class 'vctrs_list_of'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")

## S3 method for class 'vctrs_list_of'
vec_cast(x, to, ...)

Arguments

...

Vectors to coerce.

.ptype

If NULL, the default, the output type is determined by computing the common type across all elements of ....

Alternatively, you can supply .ptype to give the output a known type. If getOption("vctrs.no_guessing") is TRUE you must supply this value: this is a convenient way to make production code demand fixed types.

x

For as_list_of(), a vector to be coerced to list_of.

y, to

Arguments to vec_ptype2() and vec_cast().

x_arg, y_arg

Argument names for x and y. These are used in error messages to inform the user about the locations of incompatible types (see stop_incompatible_type()).

Details

Unlike regular lists, setting a list element to NULL using [[ does not remove it.
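As a short illustration of both points (coercion on assignment, and NULL assignment keeping the element), consider the following sketch:

```r
library(vctrs)

x <- list_of(1:3, 5:6)

# Assigned values are cast to the list_of's element type (integer here)
x[[1]] <- c(10, 20)
typeof(x[[1]])
#> [1] "integer"

# Setting an element to NULL keeps the slot rather than shortening the list
x[[2]] <- NULL
vec_size(x)
#> [1] 2
```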

Examples

x <- list_of(1:3, 5:6, 10:15)
if (requireNamespace("tibble", quietly = TRUE)) {
  tibble::tibble(x = x)
}

vec_c(list_of(1, 2), list_of(FALSE, TRUE))

Lossy cast error

Description

By default, lossy casts are an error. Use allow_lossy_cast() to silence these errors and continue with the partial results. In this case the lost values are typically set to NA or to a lower value resolution, depending on the type of cast.

Lossy cast errors are thrown by maybe_lossy_cast(). Unlike functions prefixed with stop_, maybe_lossy_cast() usually returns a result. If a lossy cast is detected, it throws an error, unless it's been wrapped in allow_lossy_cast(). In that case, it returns the result silently.
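For instance, casting a double with a fractional part to integer is lossy. The sketch below shows the default error and the opt-in behaviour; the exact value substituted for a lossy element depends on the cast, so only the type of the result is shown here.

```r
library(vctrs)

# Lossy by default: 1.5 cannot be represented as an integer
res <- try(vec_cast(c(1, 1.5), integer()), silent = TRUE)
inherits(res, "try-error")
#> [1] TRUE

# Opting in returns the partial result silently
out <- allow_lossy_cast(vec_cast(c(1, 1.5), integer()))
typeof(out)
#> [1] "integer"
```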

Usage

maybe_lossy_cast(
  result,
  x,
  to,
  lossy = NULL,
  locations = NULL,
  ...,
  loss_type = c("precision", "generality"),
  x_arg,
  to_arg,
  call = caller_env(),
  details = NULL,
  message = NULL,
  class = NULL,
  .deprecation = FALSE
)

Arguments

result

The result of a potentially lossy cast.

x

Vectors to cast.

to

Type to cast to.

lossy

A logical vector indicating which elements of result were lossy.

Can also be a single TRUE, but note that locations picks up locations from this vector by default. In this case, supply your own location vector, possibly empty.

locations

An optional integer vector giving the locations where x lost information.

..., class

Only use these fields when creating a subclass.

loss_type

The kind of lossy cast to be mentioned in error messages. Can be loss of precision (for instance from double to integer) or loss of generality (from character to factor).

x_arg

Argument name for x, used in error messages to inform the user about the locations of incompatible types (see stop_incompatible_type()).

to_arg

Argument name for to, used in error messages to inform the user about the locations of incompatible types (see stop_incompatible_type()).

call

details

Any additional human readable details.

message

An overriding message for the error. details and message are mutually exclusive, supplying both is an error.

.deprecation

If TRUE, the error is downgraded to a deprecation warning. This is useful for transitioning your class to a stricter conversion scheme. The warning advises your users to wrap their code with allow_lossy_cast().

Missing values

Description

vec_detect_missing() returns a logical vector the same size as x. For each element of x, it returns TRUE if the element is missing, and FALSE otherwise.
vec_any_missing() returns a single TRUE or FALSE depending on whether or not x has any missing values.

Differences with `is.na()`

Data frame rows are only considered missing if every element in the row is missing. Similarly, record vector elements are only considered missing if every field in the record is missing. Put another way, rows with any missing values are considered incomplete, but only rows with all missing values are considered missing.

List elements are only considered missing if they are NULL.
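A brief comparison sketch of these differences:

```r
library(vctrs)

# Lists: only NULL elements are missing for vctrs, whereas is.na() flags
# length-one NA elements instead
x <- list(1, NULL, NA)
vec_detect_missing(x)
#> [1] FALSE  TRUE FALSE
is.na(x)
#> [1] FALSE FALSE  TRUE

# Data frames: a row is missing only if every column is missing in that row
df <- data_frame(a = c(1, NA, NA), b = c("x", "y", NA))
vec_detect_missing(df)
#> [1] FALSE FALSE  TRUE
```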

Usage

vec_detect_missing(x)

vec_any_missing(x)

Arguments

x

A vector

Value

vec_detect_missing() returns a logical vector the same size as x.
vec_any_missing() returns a single TRUE or FALSE.

Dependencies

vec_proxy_equal()

Examples

x <- c(1, 2, NA, 4, NA)

vec_detect_missing(x)
vec_any_missing(x)

# Data frames are iterated over rowwise, and only report a row as missing
# if every element of that row is missing. If a row is only partially
# missing, it is said to be incomplete, but not missing.
y <- c("a", "b", NA, "d", "e")
df <- data_frame(x = x, y = y)

df$missing <- vec_detect_missing(df)
df$incomplete <- !vec_detect_complete(df)
df

Name specifications

Description

A name specification describes how to combine inner and outer names. This sort of name combination arises when concatenating vectors or flattening lists. There are two possible cases:

Named vector:

vec_c(outer = c(inner1 = 1, inner2 = 2))

Unnamed vector:
```
vec_c(outer = 1:2)
```

In r-lib and tidyverse packages, these cases are errors by default, because there's no behaviour that works well for every case. Instead, you can provide a name specification that describes how to combine the inner and outer names of inputs. Name specifications can refer to:

outer: The external name recycled to the size of the input vector.
inner: Either the names of the input vector, or a sequence of integers from 1 to the size of the vector if it is unnamed.

Arguments

name_spec, .name_spec

A name specification for combining inner and outer names. This is relevant for inputs passed with a name, when these inputs are themselves named, like outer = c(inner = 1), or when they have length greater than 1: outer = 1:2. By default, these cases trigger an error. You can resolve the error by providing a specification that describes how to combine the names or the indices of the inner vector with the name of the input. This specification can be:

A function of two arguments. The outer name is passed as a string to the first argument, and the inner names or positions are passed as second argument.
An anonymous function as a purrr-style formula.
A glue specification of the form "{outer}_{inner}".
An rlang::zap() object, in which case both outer and inner names are ignored and the result is unnamed.

See the name specification topic.

Examples

# By default, named inputs must be length 1:
vec_c(name = 1)         # ok
try(vec_c(name = 1:3))  # bad

# They also can't have internal names, even if scalar:
try(vec_c(name = c(internal = 1)))  # bad

# Pass a name specification to work around this. A specification
# can be a glue string referring to `outer` and `inner`:
vec_c(name = 1:3, other = 4:5, .name_spec = "{outer}")
vec_c(name = 1:3, other = 4:5, .name_spec = "{outer}_{inner}")

# They can also be functions:
my_spec <- function(outer, inner) paste(outer, inner, sep = "_")
vec_c(name = 1:3, other = 4:5, .name_spec = my_spec)

# Or purrr-style formulas for anonymous functions:
vec_c(name = 1:3, other = 4:5, .name_spec = ~ paste0(.x, .y))
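
The names produced by a glue specification, and the effect of rlang::zap(), can be inspected as in this sketch:

```r
library(vctrs)

# A glue spec combines the outer name with the inner position
out <- vec_c(name = 1:3, other = 4:5, .name_spec = "{outer}_{inner}")
names(out)
#> [1] "name_1"  "name_2"  "name_3"  "other_1" "other_2"

# rlang::zap() ignores both outer and inner names, so the result is unnamed
zapped <- vec_c(name = 1:3, other = 4:5, .name_spec = rlang::zap())
names(zapped)
#> NULL
```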

Assemble attributes for data frame construction

Description

new_data_frame() constructs a new data frame from an existing list. It is meant to be performant, and does not check the inputs for correctness in any way. It is only safe to use after a call to df_list(), which collects and validates the columns used to construct the data frame.

Usage

new_data_frame(x = list(), n = NULL, ..., class = NULL)

Arguments

x

A named list of equal-length vectors. The lengths are not checked; it is the responsibility of the caller to make sure they are equal.

n

Number of rows. If NULL, will be computed from the length of the first element of x.

..., class

Additional arguments for creating subclasses.

The following attributes have special behavior:

"names" is preferred if provided, overriding existing names in x.
"row.names" is preferred if provided, overriding both n and the size implied by x.

Examples

new_data_frame(list(x = 1:10, y = 10:1))
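
Because new_data_frame() does no checking, a typical pattern is to validate and recycle the columns with df_list() first. A minimal sketch, in which the subclass name "my_df" is hypothetical:

```r
library(vctrs)

# df_list() checks the columns and recycles `y` to the common size of 3
cols <- df_list(x = 1:3, y = "a")

df <- new_data_frame(cols, class = "my_df")

class(df)
#> [1] "my_df"      "data.frame"
vec_size(df)
#> [1] 3
```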

Date, date-time, and duration S3 classes

Description

A date (Date) is a double vector. Its value represents the number of days since the Unix "epoch", 1970-01-01. It has no attributes.
A datetime (POSIXct) is a double vector. Its value represents the number of seconds since the Unix "epoch", 1970-01-01. It has a single attribute: the timezone (tzone).
A duration (difftime)

Usage

new_date(x = double())

new_datetime(x = double(), tzone = "")

new_duration(x = double(), units = c("secs", "mins", "hours", "days", "weeks"))

## S3 method for class 'Date'
vec_ptype2(x, y, ...)

## S3 method for class 'POSIXct'
vec_ptype2(x, y, ...)

## S3 method for class 'POSIXlt'
vec_ptype2(x, y, ...)

## S3 method for class 'difftime'
vec_ptype2(x, y, ...)

## S3 method for class 'Date'
vec_cast(x, to, ...)

## S3 method for class 'POSIXct'
vec_cast(x, to, ...)

## S3 method for class 'POSIXlt'
vec_cast(x, to, ...)

## S3 method for class 'difftime'
vec_cast(x, to, ...)

## S3 method for class 'Date'
vec_arith(op, x, y, ...)

## S3 method for class 'POSIXct'
vec_arith(op, x, y, ...)

## S3 method for class 'POSIXlt'
vec_arith(op, x, y, ...)

## S3 method for class 'difftime'
vec_arith(op, x, y, ...)

Arguments

x

A double vector representing the number of days since UNIX epoch for new_date(), number of seconds since UNIX epoch for new_datetime(), and number of units for new_duration().

tzone

Time zone. A character vector of length 1. Either "" for the local time zone, or a value from OlsonNames()

units

Units of duration.

Details

These functions help the base Date, POSIXct, and difftime classes fit into the vctrs type system by providing constructors, coercion functions, and casting functions.

Examples

new_date(0)
new_datetime(0, tzone = "UTC")
new_duration(1, "hours")
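
The underlying representations described above can be seen by formatting a few constructed values. A small sketch:

```r
library(vctrs)

# One day after the epoch
d <- new_date(1)
format(d)
#> [1] "1970-01-02"

# 3600 seconds after the epoch, displayed in UTC
dt <- new_datetime(3600, tzone = "UTC")
format(dt, "%Y-%m-%d %H:%M:%S")
#> [1] "1970-01-01 01:00:00"

# A duration stores a number of units
dur <- new_duration(90, units = "mins")
units(dur)
#> [1] "mins"
```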

Factor/ordered factor S3 class

Description

A factor is an integer with attribute levels, a character vector. There should be one level for each integer between 1 and max(x). An ordered factor has the same properties as a factor, but possesses an extra class that marks levels as having a total ordering.

Usage

new_factor(x = integer(), levels = character(), ..., class = character())

new_ordered(x = integer(), levels = character())

## S3 method for class 'factor'
vec_ptype2(x, y, ...)

## S3 method for class 'ordered'
vec_ptype2(x, y, ...)

## S3 method for class 'factor'
vec_cast(x, to, ...)

## S3 method for class 'ordered'
vec_cast(x, to, ...)

Arguments

x

Integer values which index into levels.

levels

Character vector of labels.

..., class

Used for subclasses.

Details

These functions help the base factor and ordered factor classes fit into the vctrs type system by providing constructors, coercion functions, and casting functions. new_factor() and new_ordered() are low-level constructors: they only check that types, but not values, are valid, so they are for expert use only.
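
A short sketch of the two constructors, using made-up levels:

```r
library(vctrs)

# Integer codes index into `levels`
f <- new_factor(c(1L, 2L, 1L), levels = c("a", "b"))
as.character(f)
#> [1] "a" "b" "a"

# An ordered factor additionally supports comparisons between levels
o <- new_ordered(c(2L, 1L), levels = c("low", "high"))
is.ordered(o)
#> [1] TRUE
o[1] > o[2]
#> [1] TRUE
```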

Create list_of subclass

Description

Create list_of subclass

Usage

new_list_of(x = list(), ptype = logical(), ..., class = character())

Arguments

x

A list

ptype

The prototype which every element of x belongs to

...

Additional attributes used by subclass

class

Optional subclass name
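
A minimal usage sketch, in which the subclass name "my_int_list" is hypothetical. Note that this low-level constructor is given elements that already match ptype:

```r
library(vctrs)

x <- new_list_of(
  list(1:2, 3L),
  ptype = integer(),
  class = "my_int_list"
)

is_list_of(x)
#> [1] TRUE
inherits(x, "my_int_list")
#> [1] TRUE
```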

Partial type

Description

Use new_partial() when constructing a new partial type subclass; and use is_partial() to test if a type is partial. All subclasses need to provide a vec_ptype_finalise() method.

Usage

new_partial(..., class = character())

is_partial(x)

vec_ptype_finalise(x, ...)

Arguments

...

Attributes of the partial type

class

Name of subclass.

Details

As the name suggests, a partial type partially specifies a type, and it must be combined with data to yield a full type. A useful example of a partial type is partial_frame(), which makes it possible to specify the type of just a few columns in a data frame. Use this constructor if you're making your own partial type.
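
As an illustration of the partial_frame() case mentioned above, the sketch below fixes the type of column x while leaving the other columns to be determined by the data:

```r
library(vctrs)

pf <- partial_frame(x = double())
is_partial(pf)
#> [1] TRUE

# `x` is forced to double; the remaining columns come from the inputs
out <- vec_rbind(
  data.frame(x = 1L, y = "a"),
  data.frame(x = FALSE, z = 10),
  .ptype = pf
)
typeof(out$x)
#> [1] "double"
```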

rcrd (record) S3 class

Description

The rcrd class extends vctr. A rcrd is composed of 1 or more fields, which must be vectors of the same length. It is designed specifically for classes that can naturally be decomposed into multiple vectors of the same length, like POSIXlt, but where the organisation should be considered an implementation detail invisible to the user (unlike a data.frame).

Usage

new_rcrd(fields, ..., class = character())

Arguments

fields

A list or a data frame. Lists must be rectangular (same sizes), and contain uniquely named vectors (at least one). fields is validated with df_list() to ensure uniquely named vectors.

...

Additional attributes

class

Name of subclass.
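
A minimal sketch of a record with two fields, where the subclass name "my_rcrd" is hypothetical. A real class would usually also provide a format() method so that it prints nicely:

```r
library(vctrs)

r <- new_rcrd(
  list(x = 1:3, y = c("a", "b", "c")),
  class = "my_rcrd"
)

# The record has one element per "row" of fields, not one per field
length(r)
#> [1] 3

fields(r)
#> [1] "x" "y"

field(r, "x")
#> [1] 1 2 3
```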

vctr (vector) S3 class

Description

This abstract class provides a set of useful default methods that makes it considerably easier to get started with a new S3 vector class. See vignette("s3-vector") to learn how to use it to create your own S3 vector classes.

Usage

new_vctr(.data, ..., class = character(), inherit_base_type = NULL)

Arguments

.data

Foundation of class. Must be a vector

...

Name-value pairs defining attributes

class

Name of subclass.

inherit_base_type

A single logical, or NULL. Does this class extend the base type of .data? i.e. does the resulting object extend the behaviour of the underlying type? Defaults to FALSE for all types except lists, which are required to inherit from the base type.

Details

List vctrs are special cases. When created through new_vctr(), the resulting list vctr should always be recognized as a list by obj_is_list(). Because of this, if inherit_base_type is FALSE an error is thrown.
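The sketch below illustrates the difference between atomic and list foundations. The class names "example_int" and "example_list" are hypothetical.

```r
library(vctrs)

# Atomic foundation: the base type is not part of the class by default.
x <- new_vctr(1:3, class = "example_int")
class(x)

# List foundation: the class always inherits from "list".
l <- new_vctr(list(1, "a"), class = "example_list")
obj_is_list(l)

# Opting out of the base type is an error for lists.
err <- tryCatch(
  new_vctr(list(), class = "example_list", inherit_base_type = FALSE),
  error = function(cnd) cnd
)
```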

Base methods

The vctr class provides methods for many base generics using a smaller set of generics defined by this package. Generally, you should think carefully before overriding any of the methods that vctrs implements for you as they've been carefully planned to be internally consistent.

[[ and [ use NextMethod() dispatch to the underlying base function, then restore attributes with vec_restore(). rep() and length<- work similarly.
[[<- and [<- cast value to same type as x, then call NextMethod().
as.logical(), as.integer(), as.numeric(), as.character(), as.Date() and as.POSIXct() methods call vec_cast(). The as.list() method calls [[ repeatedly, and the as.data.frame() method uses a standard technique to wrap a vector in a data frame.
as.factor(), as.ordered() and as.difftime() are not generic functions in base R, but have been reimplemented as generics in the generics package. vctrs extends these and calls vec_cast(). To inherit this behaviour in a package, import and re-export the generic of interest from generics.
==, !=, unique(), anyDuplicated(), and is.na() use vec_proxy().
<, <=, >=, >, min(), max(), range(), median(), quantile(), and xtfrm() methods use vec_proxy_compare().
+, -, /, *, ^, %%, %/%, !, &, and | operators use vec_arith().
Mathematical operations including the Summary group generics (prod(), sum(), any(), all()), the Math group generics (abs(), sign(), etc), mean(), is.nan(), is.finite(), and is.infinite() use vec_math().
dim(), dim<-, dimnames(), dimnames<-, levels(), and levels<- methods throw errors.
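As a small illustration of the inherited methods, the sketch below subsets, repeats, and compares a bare vctr without defining any methods. The class name "example_num" is hypothetical.

```r
library(vctrs)

x <- new_vctr(c(10, 20, 30), class = "example_num")

# `[` subsets the underlying data and restores the class.
sub <- x[2:3]
class(sub)

# rep() also keeps the class.
rep2 <- rep(x, 2)

# `==` compares element by element.
eq <- x == x
```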

List checks

Description

obj_is_list() tests if x is considered a list in the vctrs sense. It returns TRUE if:
- x is a bare list with no class.
- x is a list explicitly inheriting from "list".
list_all_vectors() takes a list and returns TRUE if all elements of that list are vectors.
list_all_size() takes a list and returns TRUE if all elements of that list have the same size.
obj_check_list(), list_check_all_vectors(), and list_check_all_size() use the above functions, but throw a standardized and informative error if they return FALSE.

Usage

obj_is_list(x)

obj_check_list(x, ..., arg = caller_arg(x), call = caller_env())

list_all_vectors(x)

list_check_all_vectors(x, ..., arg = caller_arg(x), call = caller_env())

list_all_size(x, size)

list_check_all_size(x, size, ..., arg = caller_arg(x), call = caller_env())

Arguments

x

For obj_*() functions, an object. For list_*() functions, a list.

...

These dots are for future extensions and must be empty.

arg

An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem.

call

size

The size to check each element for.

Details

Notably, data frames and S3 record style classes like POSIXlt are not considered lists.

Examples

obj_is_list(list())
obj_is_list(list_of(1))
obj_is_list(data.frame())

list_all_vectors(list(1, mtcars))
list_all_vectors(list(1, environment()))

list_all_size(list(1:2, 2:3), 2)
list_all_size(list(1:2, 2:4), 2)

# `list_`-prefixed functions assume a list:
try(list_all_vectors(environment()))

`print()` and `str()` generics.

Description

These are constructed to be more easily extensible since you can override the _header(), _data() or _footer() components individually. The default methods are built on top of format().

Usage

obj_print(x, ...)

obj_print_header(x, ...)

obj_print_data(x, ...)

obj_print_footer(x, ...)

obj_str(x, ...)

obj_str_header(x, ...)

obj_str_data(x, ...)

obj_str_footer(x, ...)

Arguments

x

A vector

...

Additional arguments passed on to methods. See print() and str() for commonly used options
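A sketch of overriding a single component: only the footer is customised, and the header and data come from the defaults. The class name "example_counts" and the footer text are hypothetical. The method is registered explicitly with base R's registerS3method() so that the sketch does not depend on where it is evaluated; inside a package you would export the method instead.

```r
library(vctrs)

x <- new_vctr(1:3, class = "example_counts")

# Hypothetical footer method: print the total below the data.
footer <- function(x, ...) {
  cat("# total: ", sum(vec_data(x)), "\n", sep = "")
  invisible(x)
}
registerS3method(
  "obj_print_footer", "example_counts", footer,
  envir = asNamespace("vctrs")
)

# Header and data use the default methods; the footer is ours.
out <- capture.output(obj_print(x))
writeLines(out)
```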

Order and sort vectors

Description

vec_order_radix() computes the order of x. For data frames, the order is computed along the rows by computing the order of the first column and using subsequent columns to break ties.

vec_sort_radix() sorts x. It is equivalent to vec_slice(x, vec_order_radix(x)).

Usage

vec_order_radix(
  x,
  ...,
  direction = "asc",
  na_value = "largest",
  nan_distinct = FALSE,
  chr_proxy_collate = NULL
)

vec_sort_radix(
  x,
  ...,
  direction = "asc",
  na_value = "largest",
  nan_distinct = FALSE,
  chr_proxy_collate = NULL
)

Arguments

x

A vector

...

These dots are for future extensions and must be empty.

direction

Direction to sort in.

A single "asc" or "desc" for ascending or descending order respectively.
For data frames, a length 1 or ncol(x) character vector containing only "asc" or "desc", specifying the direction for each column.

na_value

Ordering of missing values.

A single "largest" or "smallest" for ordering missing values as the largest or smallest values respectively.
For data frames, a length 1 or ncol(x) character vector containing only "largest" or "smallest", specifying how missing values should be ordered within each column.

nan_distinct

A single logical specifying whether or not NaN should be considered distinct from NA for double and complex vectors. If TRUE, NaN will always be ordered between NA and non-missing numbers.

chr_proxy_collate

A function generating an alternate representation of character vectors to use for collation, often used for locale-aware ordering.

If NULL, no transformation is done.
Otherwise, this must be a function of one argument. If the input contains a character vector, it will be passed to this function after it has been translated to UTF-8. This function should return a character vector with the same length as the input. The result should sort as expected in the C-locale, regardless of encoding.

For data frames, chr_proxy_collate will be applied to all character columns.

Common transformation functions include: tolower() for case-insensitive ordering and stringi::stri_sort_key() for locale-aware ordering.

Value

vec_order_radix() returns an integer vector the same size as x.
vec_sort_radix() returns a vector with the same size and type as x.

Differences with `order()`

Unlike the na.last argument of order() which decides the positions of missing values irrespective of the decreasing argument, the na_value argument of vec_order_radix() interacts with direction. If missing values are considered the largest value, they will appear last in ascending order, and first in descending order.

Character vectors are ordered in the C-locale. This is different from base::order(), which respects base::Sys.setlocale(). Sorting in a consistent locale can produce more reproducible results between different sessions and platforms. However, the results of sorting in the C-locale can be surprising. For example, capital letters sort before lower case letters. Sorting c("b", "C", "a") with vec_sort_radix() will return c("C", "a", "b"), but with base::order() will return c("a", "b", "C") unless base::order(method = "radix") is explicitly set, which also uses the C-locale. While sorting with the C-locale can be useful for algorithmic efficiency, in many real world uses it can be the cause of data analysis mistakes. To balance these trade-offs, you can supply a chr_proxy_collate function to transform character vectors into an alternative representation that orders in the C-locale in a less surprising way. For example, providing base::tolower() as a transform will order the original vector in a case-insensitive manner. Locale-aware ordering can be achieved by providing stringi::stri_sort_key() as a transform, setting the collation options as appropriate for your locale.

Character vectors are always translated to UTF-8 before ordering, and before any transform is applied by chr_proxy_collate.

For complex vectors, if either the real or imaginary component is NA or NaN, then the entire observation is considered missing.
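The C-locale behaviour and the idea behind a collation proxy can be sketched with base R alone, using the radix method mentioned above. This is an analogue for illustration, not a call into vec_order_radix() itself.

```r
x <- c("b", "C", "a")

# The radix method sorts in the C-locale: capitals come first.
c_locale <- sort(x, method = "radix")

# Ordering by a lower-cased proxy gives a case-insensitive result, which is
# the role `chr_proxy_collate = tolower` is described as playing.
case_insensitive <- x[order(tolower(x), method = "radix")]
```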

Dependencies of `vec_order_radix()`

vec_proxy_order()

Examples

if (FALSE) {

x <- round(sample(runif(5), 9, replace = TRUE), 3)
x <- c(x, NA)

vec_order_radix(x)
vec_sort_radix(x)
vec_sort_radix(x, direction = "desc")

# Can also handle data frames
df <- data.frame(g = sample(2, 10, replace = TRUE), x = x)
vec_order_radix(df)
vec_sort_radix(df)
vec_sort_radix(df, direction = "desc")

# For data frames, `direction` and `na_value` are allowed to be vectors
# with length equal to the number of columns in the data frame
vec_sort_radix(
  df,
  direction = c("desc", "asc"),
  na_value = c("largest", "smallest")
)

# Character vectors are ordered in the C locale, which orders capital letters
# below lowercase ones
y <- c("B", "A", "a")
vec_sort_radix(y)

# To order in a case-insensitive manner, provide a `chr_proxy_collate`
# function that transforms the strings to all lowercase
vec_sort_radix(y, chr_proxy_collate = tolower)

}

Partially specify a factor

Description

This special class can be passed as a ptype in order to specify that the result should be a factor that contains at least the specified levels.

Usage

partial_factor(levels = character())

Arguments

levels

Character vector of labels.

Examples

pf <- partial_factor(levels = c("x", "y"))
pf

vec_ptype_common(factor("v"), factor("w"), .ptype = pf)

Partially specify columns of a data frame

Description

This special class can be passed to .ptype in order to specify the types of only some of the columns in a data frame.

Usage

partial_frame(...)

Arguments

...

Attributes of subclass

Examples

pf <- partial_frame(x = double())
pf

vec_rbind(
  data.frame(x = 1L, y = "a"),
  data.frame(x = FALSE, z = 10),
  .ptype = partial_frame(x = double(), a = character())
)

FAQ - Is my class compatible with vctrs?

Description

vctrs provides a framework for working with vector classes in a generic way. However, it implements several compatibility fallbacks to base R methods. In this reference you will find how vctrs tries to be compatible with your vector class, and what base methods you need to implement for compatibility.

If you’re starting from scratch, we think you’ll find it easier to start using new_vctr() as documented in vignette("s3-vector"). This guide is aimed at developers with existing vector classes.

Aggregate operations with fallbacks

All vctrs operations are based on four primitive generics described in the next section. However there are many higher level operations. The most important ones implement fallbacks to base generics for maximum compatibility with existing classes.

vec_slice() falls back to the base [ generic if no vec_proxy() method is implemented. This way foreign classes that do not implement vec_restore() can restore attributes based on the new subsetted contents.
vec_c() and vec_rbind() now fall back to base::c() if the inputs have a common parent class with a c() method (only if they have no self-to-self vec_ptype2() method).

vctrs works hard to make your c() method succeed in various situations (with NULL and NA inputs, even as first input which would normally prevent dispatch to your method). The main downside compared to using vctrs primitives is that you can’t combine vectors of different classes since there is no extensible mechanism of coercion in c(), and it is less efficient in some cases.

The vctrs primitives

Most functions in vctrs are aggregate operations: they call other vctrs functions which themselves call other vctrs functions. The dependencies of a vctrs function are listed in the Dependencies section of its documentation page. Take a look at vec_count() for an example.

These dependencies form a tree whose leaves are the four vctrs primitives.

[Diagram: dependency tree for vec_count()]

The coercion generics

The coercion mechanism in vctrs is based on two generics:

vec_ptype2()
vec_cast()

See the theory overview (?theory-faq-coercion).

Two objects with the same class and the same attributes are always considered compatible by ptype2 and cast. If the attributes or classes differ, they throw an incompatible type error.
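A sketch of this rule with a foreign class that implements no vctrs methods. The class name "example_length" and its unit attribute are hypothetical.

```r
library(vctrs)

# Same class and same attributes: compatible without any methods.
a <- structure(1:2, class = "example_length", unit = "cm")
b <- structure(3L, class = "example_length", unit = "cm")
same <- vec_ptype2(a, b)

# Same class but a different attribute: incompatible type error.
other <- structure(3L, class = "example_length", unit = "in")
err <- tryCatch(vec_ptype2(a, other), error = function(cnd) cnd)
class(err)
```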

Coercion errors are the main source of incompatibility with vctrs. See the howto guide (?howto-faq-coercion) if you need to implement methods for these generics.

The proxy and restoration generics

vec_proxy()
vec_restore()

These generics are essential for vctrs but mostly optional. vec_proxy() defaults to an identity function and you normally don’t need to implement it. The proxy of a vector must be one of the atomic vector types, a list, or a data frame. By default, S3 lists that do not inherit from "list" do not have an identity proxy. In that case, you need to explicitly implement vec_proxy() or make your class inherit from list.
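To make the proxy idea concrete, the sketch below looks at the proxy of a bare vector and of a record built with new_rcrd(). The class name "example_range" is hypothetical.

```r
library(vctrs)

# For a bare vector the proxy is the vector itself.
identical(vec_proxy(1:3), 1:3)

# For a record the proxy is a data frame with one column per field.
r <- new_rcrd(list(lo = 1:2, hi = 3:4), class = "example_range")
p <- vec_proxy(r)

# vec_restore() turns a proxy back into the original class.
back <- vec_restore(p, r)
```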

Runs

Description

vec_identify_runs() returns a vector of identifiers for the elements of x that indicate which run of repeated values they fall in. The number of runs is also returned as an attribute, n.
vec_run_sizes() returns an integer vector corresponding to the size of each run. This is identical to the times column from vec_unrep(), but is faster if you don't need the run keys.
vec_unrep() is a generalized base::rle(). It is documented alongside the "repeat" functions of vec_rep() and vec_rep_each(); look there for more information.

Usage

vec_identify_runs(x)

vec_run_sizes(x)

Arguments

x

A vector.

Details

Unlike base::rle(), adjacent missing values are considered identical when constructing runs. For example, vec_identify_runs(c(NA, NA)) will return c(1, 1), not c(1, 2).

Value

For vec_identify_runs(), an integer vector with the same size as x. A scalar integer attribute, n, is attached.
For vec_run_sizes(), an integer vector with size equal to the number of runs in x.

Examples

x <- c("a", "z", "z", "c", "a", "a")

vec_identify_runs(x)
vec_run_sizes(x)
vec_unrep(x)

y <- c(1, 1, 1, 2, 2, 3)

# With multiple columns, the runs are constructed rowwise
df <- data_frame(
  x = x,
  y = y
)

vec_identify_runs(df)
vec_run_sizes(df)
vec_unrep(df)

Register a method for a suggested dependency

Description

Generally, the recommended way to register an S3 method is to use the S3Method() namespace directive (often generated automatically by the @export roxygen2 tag). However, this technique requires that the generic be in an imported package, and sometimes you want to suggest a package, and only provide a method when that package is loaded. s3_register() can be called from your package's .onLoad() to dynamically register a method only if the generic's package is loaded.

Arguments

generic

Name of the generic in the form pkg::generic.

class

Name of the class

method

Optionally, the implementation of the method. By default, this will be found by looking for a function called generic.class in the package environment.

Note that providing method can be dangerous if you use devtools. When the namespace of the method is reloaded by devtools::load_all(), the function will keep inheriting from the old namespace. This might cause crashes because of dangling .Call() pointers.

Details

For R 3.5.0 and later, s3_register() is also useful when demonstrating class creation in a vignette, since method lookup no longer always involves the lexical scope. For R 3.6.0 and later, you can achieve a similar effect by using "delayed method registration", i.e. placing the following in your NAMESPACE file:

if (getRversion() >= "3.6.0") {
  S3method(package::generic, class)
}

Usage in other packages

To avoid taking a dependency on vctrs, you can copy the source of s3_register() into your own package. It is licensed under the permissive unlicense to make it crystal clear that we're happy for you to do this. There's no need to include the license or even credit us when using this function.

Examples

# A typical use case is to dynamically register tibble/pillar methods
# for your class. That way you avoid creating a hard dependency on packages
# that are not essential, while still providing finer control over
# printing when they are used.

.onLoad <- function(...) {
  s3_register("pillar::pillar_shaft", "vctrs_vctr")
  s3_register("tibble::type_sum", "vctrs_vctr")
}

Table S3 class

Description

These functions help the base table class fit into the vctrs type system by providing coercion and casting functions.

FAQ - How does coercion work in vctrs?

Description

This is an overview of the usage of vec_ptype2() and vec_cast() and their role in the vctrs coercion mechanism. Related topics:

For an example of implementing coercion methods for simple vectors, see ?howto-faq-coercion.
For an example of implementing coercion methods for data frame subclasses, see ?howto-faq-coercion-data-frame.
For a tutorial about implementing vctrs classes from scratch, see vignette("s3-vector").

Combination mechanism in vctrs

The coercion system in vctrs is designed to make combination of multiple inputs consistent and extensible. Combinations occur in many places, such as row-binding, joins, subset-assignment, or grouped summary functions that use the split-apply-combine strategy. For example:

vec_c(TRUE, 1)
#> [1] 1 1

vec_c("a", 1)
#> Error in `vec_c()`:
#> ! Can't combine `..1` <character> and `..2` <double>.

vec_rbind(
  data.frame(x = TRUE),
  data.frame(x = 1, y = 2)
)
#>   x  y
#> 1 1 NA
#> 2 1  2

vec_rbind(
  data.frame(x = "a"),
  data.frame(x = 1, y = 2)
)
#> Error in `vec_rbind()`:
#> ! Can't combine `..1$x` <character> and `..2$x` <double>.

One major goal of vctrs is to provide a central place for implementing the coercion methods that make generic combinations possible. The two relevant generics are vec_ptype2() and vec_cast(). They both take two arguments and perform double dispatch, meaning that a method is selected based on the classes of both inputs.

The general mechanism for combining multiple inputs is:

1. Find the common type of a set of inputs by reducing (as in base::Reduce() or purrr::reduce()) the vec_ptype2() binary function over the set.
2. Convert all inputs to the common type with vec_cast().
3. Initialise the output vector as an instance of this common type with vec_init().
4. Fill the output vector with the elements of the inputs using vec_assign().

The last two steps may require vec_proxy() and vec_restore() implementations, unless the attributes of your class are constant and do not depend on the contents of the vector. We focus here on the first two steps, which require vec_ptype2() and vec_cast() implementations.
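The four steps can be sketched by hand for three bare inputs. This is an illustration of the mechanism, not how vec_c() is implemented internally; the variable names are illustrative.

```r
library(vctrs)

inputs <- list(TRUE, 2L, 3.5)

# Step 1: reduce vec_ptype2() over the inputs to find the common type.
ptype <- Reduce(vec_ptype2, inputs)

# Step 2: cast every input to the common type.
casted <- lapply(inputs, vec_cast, to = ptype)

# Step 3: initialise an output of the common type and the total size.
sizes <- vapply(inputs, vec_size, integer(1))
out <- vec_init(ptype, sum(sizes))

# Step 4: fill the output with the elements of each input in turn.
pos <- 0L
for (i in seq_along(casted)) {
  idx <- pos + seq_len(sizes[[i]])
  out <- vec_assign(out, idx, casted[[i]])
  pos <- pos + sizes[[i]]
}

out
```

The result matches what vec_c(TRUE, 2L, 3.5) returns.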

`vec_ptype2()`

Methods for vec_ptype2() are passed two prototypes, i.e. two inputs emptied of their elements. They implement two behaviours:

If the types of their inputs are compatible, indicate which of them is the richer type by returning it. If the types are of equal resolution, return any of the two.
Throw an error with stop_incompatible_type() when it can be determined from the attributes that the types of the inputs are not compatible.

Type compatibility

A type is compatible with another type if the values it represents are a subset or a superset of the values of the other type. The notion of “value” is to be interpreted at a high level, in particular it is not the same as the memory representation. For example, factors are represented in memory with integers but their values are more related to character vectors than to round numbers:

# Two factors are compatible
vec_ptype2(factor("a"), factor("b"))
#> factor()
#> Levels: a b

# Factors are compatible with a character
vec_ptype2(factor("a"), "b")
#> character(0)

# But they are incompatible with integers
vec_ptype2(factor("a"), 1L)
#> Error:
#> ! Can't combine `factor("a")` <factor<4d52a>> and `1L` <integer>.

Richness of type

Richness of type is not a very precise notion. It can be about richer data (for instance a double vector covers more values than an integer vector), richer behaviour (a data.table has richer behaviour than a data.frame), or both. If you have trouble determining which one of the two types is richer, it probably means they shouldn’t be automatically coercible.

Let’s look again at what happens when we combine a factor and a character:

vec_ptype2(factor("a"), "b")
#> character(0)

The ptype2 method for <character> and <factor<"a">> returns <character> because the former is a richer type. The factor can only contain "a" strings, whereas the character can contain any strings. In this sense, factors are a subset of character.

Note that another valid behaviour would be to throw an incompatible type error. This is what a strict factor implementation would do. We have decided to be laxer in vctrs because it is easy to inadvertently create factors instead of character vectors, especially with older versions of R where stringsAsFactors is still true by default.

Consistency and symmetry on permutation

Each ptype2 method should strive to have exactly the same behaviour when the inputs are permuted. This is not always possible, for example factor levels are aggregated in order:

vec_ptype2(factor(c("a", "c")), factor("b"))
#> factor()
#> Levels: a c b

vec_ptype2(factor("b"), factor(c("a", "c")))
#> factor()
#> Levels: b a c

In any case, permuting the input should not return a fundamentally different type or introduce an incompatible type error.

Coercion hierarchy

The classes that you can coerce together form a coercion (or subtyping) hierarchy. Below is a schema of the hierarchy for the base types like integer and factor. In this diagram the directions of the arrows express which type is richer. They flow from the bottom (more constrained types) to the top (richer types).

A coercion hierarchy is distinct from the structural hierarchy implied by memory types and classes. For instance, in a structural hierarchy, factors are built on top of integers. But in the coercion hierarchy they are more related to character vectors. Similarly, subclasses are not necessarily coercible with their superclasses because the coercion and structural hierarchies are separate.

Implementing a coercion hierarchy

As a class implementor, you have two options. The simplest is to create an entirely separate hierarchy. The date and date-time classes are an example of an S3-based hierarchy that is completely separate. Alternatively, you can integrate your class in an existing hierarchy, typically by adding parent nodes on top of the hierarchy (your class is richer), by adding child nodes at the root of the hierarchy (your class is more constrained), or by inserting a node in the tree.

These coercion hierarchies are implicit, in the sense that they are implied by the vec_ptype2() implementations. There is no structured way to create or modify a hierarchy. Instead, you need to implement the appropriate coercion methods for all the types in your hierarchy, and diligently return the richer type in each case. The vec_ptype2() implementations are neither transitive nor inherited, so all pairwise methods between classes lying on a given path must be implemented manually. This is something we might make easier in the future.
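As a sketch of integrating a class into the existing numeric hierarchy, the example below places a hypothetical "example_meters" class above double by writing each pairwise method by hand. The class and constructor names are made up, and the methods are registered with base R's registerS3method() only so the sketch is self-contained; a package would export them instead.

```r
library(vctrs)

# Hypothetical class built on doubles.
new_meters <- function(x = double()) new_vctr(x, class = "example_meters")

ns <- asNamespace("vctrs")

# Common type: the richer type (meters) wins in both argument orders.
registerS3method("vec_ptype2", "example_meters.double",
  function(x, y, ...) new_meters(), envir = ns)
registerS3method("vec_ptype2", "double.example_meters",
  function(x, y, ...) new_meters(), envir = ns)

# Casts are named vec_cast.<to>.<x>, one per direction.
registerS3method("vec_cast", "example_meters.double",
  function(x, to, ...) new_meters(x), envir = ns)
registerS3method("vec_cast", "double.example_meters",
  function(x, to, ...) vec_data(x), envir = ns)

combined <- vec_c(new_meters(1), 2)
plain <- vec_cast(new_meters(3), double())
```

Nothing here makes meters coercible with integer: that pair would need its own methods, which is the pairwise cost described above.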

`vec_cast()`

The second generic, vec_cast(), is the one that looks at the data and actually performs the conversion. Because it has access to more information than vec_ptype2(), it may be stricter and cause an error in more cases. vec_cast() has three possible behaviours:

Determine that the prototypes of the two inputs are not compatible. This must be decided in exactly the same way as for vec_ptype2(). Call stop_incompatible_cast() if you can determine from the attributes that the types are not compatible.
Detect incompatible values. Usually this is because the target type is too restricted for the values supported by the input type. For example, a fractional number can’t be converted to an integer. The method should throw an error in that case.
Return the input vector converted to the target type if all values are compatible. Whereas vec_ptype2() must return the same type when the inputs are permuted, vec_cast() is directional. It always returns the type of the right-hand side, or dies trying.
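The three behaviours can be observed with base types. The sketch below captures the errors instead of letting them stop the script.

```r
library(vctrs)

# Compatible types and compatible values: the input is converted.
ok <- vec_cast(c(1, 2), integer())

# Compatible types but an incompatible value: a lossy cast error.
lossy <- tryCatch(vec_cast(1.5, integer()), error = function(cnd) cnd)

# Incompatible types: decided from the prototypes alone.
incompatible <- tryCatch(vec_cast("a", integer()), error = function(cnd) cnd)
```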

Double dispatch

The dispatch mechanism for vec_ptype2() and vec_cast() looks like S3 but is actually a custom mechanism. Compared to S3, it has the following differences:

It dispatches on the classes of the first two inputs.
There is no inheritance of ptype2 and cast methods. This is because the S3 class hierarchy is not necessarily the same as the coercion hierarchy.
NextMethod() does not work. Parent methods must be called explicitly if necessary.
The default method is hard-coded.

Data frames

The determination of the common type of data frames with vec_ptype2() happens in three steps:

1. Match the columns of the two input data frames. If some columns don’t exist, they are created and filled with adequately typed NA values.
2. Find the common type for each column by calling vec_ptype2() on each pair of matched columns.
3. Find the common data frame type. For example the common type of a grouped tibble and a tibble is a grouped tibble because the former is the richer type. The common type of a data table and a data frame is a data table.

vec_cast() operates similarly. If a data frame is cast to a target type that has fewer columns, this is an error.

If you are implementing coercion methods for data frames, you will need to explicitly call the parent methods that perform the common type determination or the type conversion described above. These are exported as df_ptype2() and df_cast().
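A short sketch of these rules with base data frames; the column names are illustrative.

```r
library(vctrs)

# Columns are matched by name and each pair gets its own common type.
p <- vec_ptype2(
  data.frame(x = 1L),
  data.frame(x = 1.5, y = "a", stringsAsFactors = FALSE)
)
str(p)

# Casting to a wider target fills the missing column with typed NA values.
wider <- vec_cast(
  data.frame(x = 1L),
  data.frame(x = double(), y = character(), stringsAsFactors = FALSE)
)

# Casting to a target with fewer columns is an error.
err <- tryCatch(
  vec_cast(data.frame(x = 1, y = 2), data.frame(x = double())),
  error = function(cnd) cnd
)
```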

Data frame fallbacks

Being too strict with data frame combinations would cause too much pain because there are many data frame subclasses in the wild that don’t implement vctrs methods. We have decided to implement a special fallback behaviour for foreign data frames. Incompatible data frames fall back to a base data frame:

df1 <- data.frame(x = 1)
df2 <- structure(df1, class = c("foreign_df", "data.frame"))

vec_rbind(df1, df2)
#>   x
#> 1 1
#> 2 1

When a tibble is involved, we fall back to tibble:

df3 <- tibble::as_tibble(df1)

vec_rbind(df1, df3)
#> # A tibble: 2 x 1
#>       x
#>   <dbl>
#> 1     1
#> 2     1

These fallbacks are not ideal but they make sense because all data frames share a common data structure. This is not generally the case for vectors. For example factors and characters have different representations, and it is not possible to find a fallback type mechanically.

However this fallback has a big downside: implementing vctrs methods for your data frame subclass is a breaking behaviour change. The proper coercion behaviour for your data frame class should be specified as soon as possible to limit the consequences of changing the behaviour of your class in R scripts.

FAQ - How does recycling work in vctrs and the tidyverse?

Description

Recycling describes the concept of repeating elements of one vector to match the size of another. There are two rules that underlie the “tidyverse” recycling rules:

Vectors of size 1 will be recycled to the size of any other vector
Otherwise, all vectors must have the same size

Examples

Vectors of size 1 are recycled to the size of any other vector:

tibble(x = 1:3, y = 1L)
#> # A tibble: 3 x 2
#>       x     y
#>   <int> <int>
#> 1     1     1
#> 2     2     1
#> 3     3     1

This includes vectors of size 0:

tibble(x = integer(), y = 1L)
#> # A tibble: 0 x 2
#> # i 2 variables: x <int>, y <int>

If vectors aren’t size 1, they must all be the same size. Otherwise, an error is thrown:

tibble(x = 1:3, y = 4:7)
#> Error in `tibble()`:
#> ! Tibble columns must have compatible sizes.
#> * Size 3: Existing data.
#> * Size 4: Column `y`.
#> i Only values of size one are recycled.

vctrs backend

Packages in r-lib and the tidyverse generally use vec_size_common() and vec_recycle_common() as the backends for handling recycling rules.

vec_size_common() returns the common size of multiple vectors, after applying the recycling rules
vec_recycle_common() goes one step further, and actually recycles the vectors to their common size

vec_size_common(1:3, "x")
#> [1] 3

vec_recycle_common(1:3, "x")
#> [[1]]
#> [1] 1 2 3
#> 
#> [[2]]
#> [1] "x" "x" "x"

vec_size_common(1:3, c("x", "y"))
#> Error:
#> ! Can't recycle `..1` (size 3) to match `..2` (size 2).

Base R recycling rules

The recycling rules described here are stricter than the ones generally used by base R, which are:

If any vector is length 0, the output will be length 0
Otherwise, the output will be length max(length_x, length_y), and a warning will be thrown if the length of the longer vector is not an integer multiple of the length of the shorter vector.

We explore the base R rules in detail in vignette("type-size").
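The contrast between the two sets of rules can be sketched with integer addition in base R and vec_size_common() in vctrs.

```r
library(vctrs)

# Base R: a length-0 input gives a length-0 output.
zero <- 1:3 + integer()

# Base R: the shorter vector is silently repeated when lengths divide evenly.
even <- 1:4 + 1:2

# Base R: a warning when the longer length is not a multiple of the shorter.
warn <- tryCatch(1:3 + 1:2, warning = function(cnd) cnd)

# vctrs: sizes 4 and 2 are simply incompatible.
err <- tryCatch(vec_size_common(1:4, 1:2), error = function(cnd) cnd)
```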

A 1d vector of unspecified type

Description

This is a partial type used to represent logical vectors that only contain NA. These require special handling because we want to allow NA to specify missingness without requiring a type.

Usage

unspecified(n = 0)

Arguments

n

Length of vector

Examples

vec_ptype_show()
vec_ptype_show(NA)

vec_c(NA, factor("x"))
vec_c(NA, Sys.Date())
vec_c(NA, Sys.time())
vec_c(NA, list(1:3, 4:5))

Custom conditions for vctrs package

Description

These functions are called for their side effect of raising errors and warnings. These conditions have custom classes and structures to make testing easier.

Usage

stop_incompatible_type(
  x,
  y,
  ...,
  x_arg,
  y_arg,
  action = c("combine", "convert"),
  details = NULL,
  message = NULL,
  class = NULL,
  call = caller_env()
)

stop_incompatible_cast(
  x,
  to,
  ...,
  x_arg,
  to_arg,
  details = NULL,
  message = NULL,
  class = NULL,
  call = caller_env()
)

stop_incompatible_op(
  op,
  x,
  y,
  details = NULL,
  ...,
  message = NULL,
  class = NULL,
  call = caller_env()
)

stop_incompatible_size(
  x,
  y,
  x_size,
  y_size,
  ...,
  x_arg,
  y_arg,
  details = NULL,
  message = NULL,
  class = NULL,
  call = caller_env()
)

allow_lossy_cast(expr, x_ptype = NULL, to_ptype = NULL)

Arguments

x, y, to

Vectors

..., class

Only use these fields when creating a subclass.

x_arg, y_arg, to_arg

Argument names for x, y, and to. Used in error messages to inform the user about the locations of incompatible types.

action

An option to customize the incompatible type message depending on the context. Errors thrown from vec_ptype2() use "combine" and those thrown from vec_cast() use "convert".

details

Any additional human readable details.

message

An overriding message for the error. details and message are mutually exclusive, supplying both is an error.

call

x_ptype, to_ptype

Suppress only the casting errors where x or to match these prototypes.

Value

stop_incompatible_*() unconditionally raise an error of class "vctrs_error_incompatible_*" and "vctrs_error_incompatible".

Examples


# Most of the time, `maybe_lossy_cast()` returns its input normally:
maybe_lossy_cast(
  c("foo", "bar"),
  NA,
  "",
  lossy = c(FALSE, FALSE),
  x_arg = "",
  to_arg = ""
)

# If `lossy` has any `TRUE`, an error is thrown:
try(maybe_lossy_cast(
  c("foo", "bar"),
  NA,
  "",
  lossy = c(FALSE, TRUE),
  x_arg = "",
  to_arg = ""
))

# Unless lossy casts are allowed:
allow_lossy_cast(
  maybe_lossy_cast(
    c("foo", "bar"),
    NA,
    "",
    lossy = c(FALSE, TRUE),
    x_arg = "",
    to_arg = ""
  )
)
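Because the conditions are classed, a handler can target either the specific class or the shared parent class. A minimal sketch, with illustrative argument names "a" and "b":

```r
library(vctrs)

# Catch on the shared parent class; the specific class is still available.
cnd <- tryCatch(
  stop_incompatible_size(
    1:2, 1:3,
    x_size = 2L, y_size = 3L,
    x_arg = "a", y_arg = "b"
  ),
  vctrs_error_incompatible = function(cnd) cnd
)

class(cnd)
```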

vctrs methods for data frames

Description

These functions help the base data.frame class fit into the vctrs type system by providing coercion and casting functions.

Usage

## S3 method for class 'data.frame'
vec_ptype2(x, y, ...)

## S3 method for class 'data.frame'
vec_cast(x, to, ...)

Arithmetic operations

Description

This generic provides a common double dispatch mechanism for all infix operators (+, -, /, *, ^, %%, %/%, !, &, |). It is used to power the default arithmetic and boolean operators for vctrs objects, overcoming the limitations of the base Ops generic.

Usage

vec_arith(op, x, y, ...)

## Default S3 method:
vec_arith(op, x, y, ...)

## S3 method for class 'logical'
vec_arith(op, x, y, ...)

## S3 method for class 'numeric'
vec_arith(op, x, y, ...)

vec_arith_base(op, x, y)

MISSING()

Arguments

op

An arithmetic operator as a string

x, y

A pair of vectors. For !, unary + and unary -, y will be a sentinel object of class MISSING, as created by MISSING().

...

These dots are for future extensions and must be empty.

Details

vec_arith_base() is provided as a convenience for writing methods. It recycles x and y to common length then calls the base operator with the underlying vec_data().

vec_arith() is also used in the diff.vctrs_vctr() method via -.
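
As a small sketch of the behaviour described above, the calls below show vec_arith_base() recycling its inputs, and a unary minus expressed through the MISSING() sentinel. The input values are arbitrary and chosen only for illustration.

```r
library(vctrs)

# `y` is recycled to the size of `x` before base `+` is applied.
vec_arith_base("+", 1:3, 10L)

# Sizes that cannot be recycled to a common length raise an error.
res <- try(vec_arith_base("+", 1:3, 1:2), silent = TRUE)
inherits(res, "try-error")

# Unary minus: `y` is the sentinel created by MISSING().
neg <- vec_arith("-", 1:3, MISSING())
neg
```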

Examples

d <- as.Date("2018-01-01")
dt <- as.POSIXct("2018-01-02 12:00")
t <- as.difftime(12, units = "hours")

vec_arith("-", dt, 1)
vec_arith("-", dt, t)
vec_arith("-", dt, d)

vec_arith("+", dt, 86400)
vec_arith("+", dt, t)
vec_arith("+", t, t)

vec_arith("/", t, t)
vec_arith("/", t, 2)

vec_arith("*", t, 2)

Convert to an index vector

Description

vec_as_index() has been renamed to vec_as_location() and is deprecated as of vctrs 0.2.2.

Usage

vec_as_index(i, n, names = NULL)

Arguments

i

An integer, character or logical vector specifying the locations or names of the observations to get/set. Specify TRUE to index all elements (as in x[]), or NULL, FALSE or integer() to index none (as in x[NULL]).

n

A single integer representing the total size of the object that i is meant to index into.

names

If i is a character vector, names should be a character vector that i will be matched against to construct the index. Otherwise, not used. The default value of NULL will result in an error if i is a character vector.

Create a vector of locations

Description

These helpers provide a means of standardizing common indexing methods such as integer, character or logical indexing.

vec_as_location() accepts integer, character, or logical vectors of any size. The output is always an integer vector that is suitable for subsetting with [ or vec_slice(). It might be a different size than the input because negative selections are transformed to positive ones and logical vectors are transformed to a vector of indices for the TRUE locations.
vec_as_location2() accepts a single number or string. It returns a single location as an integer vector of size 1. This is suitable for extracting with [[.
num_as_location() and num_as_location2() are specialized variants that have extra options for numeric indices.

Usage

vec_as_location(
  i,
  n,
  names = NULL,
  ...,
  missing = c("propagate", "remove", "error"),
  arg = caller_arg(i),
  call = caller_env()
)

num_as_location(
  i,
  n,
  ...,
  missing = c("propagate", "remove", "error"),
  negative = c("invert", "error", "ignore"),
  oob = c("error", "remove", "extend"),
  zero = c("remove", "error", "ignore"),
  arg = caller_arg(i),
  call = caller_env()
)

vec_as_location2(
  i,
  n,
  names = NULL,
  ...,
  missing = c("error", "propagate"),
  arg = caller_arg(i),
  call = caller_env()
)

num_as_location2(
  i,
  n,
  ...,
  negative = c("error", "ignore"),
  missing = c("error", "propagate"),
  arg = caller_arg(i),
  call = caller_env()
)

Arguments

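i

An integer, character or logical vector specifying the locations or names of the observations to get/set. Specify TRUE to index all elements (as in x[]), or NULL, FALSE or integer() to index none (as in x[NULL]).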

n

A single integer representing the total size of the object that i is meant to index into.

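names

If i is a character vector, names should be a character vector that i will be matched against to construct the index. Otherwise, not used. The default value of NULL will result in an error if i is a character vector.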

...

These dots are for future extensions and must be empty.

missing

How should missing i values be handled?

"error" throws an error.
"propagate" returns them as is.
"remove" removes them.

By default, vector subscripts propagate missing values but scalar subscripts error on them.

Propagated missing values can't be combined with negative indices when negative = "invert", because they can't be meaningfully inverted.

arg

The argument name to be displayed in error messages.

call

negative

How should negative i values be handled?

"error" throws an error.
"ignore" returns them as is.
"invert" returns the positive location generated by inverting the negative location. When inverting, positive and negative locations can't be mixed. This option is only applicable for num_as_location().

oob

How should out-of-bounds i values be handled?

"error" throws an error.
"remove" removes both positive and negative out-of-bounds locations.
"extend" allows positive out-of-bounds locations if they directly follow the end of a vector. This can be used to implement extendable vectors, like letters[1:30].

zero

How should zero i values be handled?

"error" throws an error.
"remove" removes them.
"ignore" returns them as is.

Value

vec_as_location() and num_as_location() return an integer vector that can be used as an index in a subsetting operation.
vec_as_location2() and num_as_location2() return an integer of size 1 that can be used as a scalar index for extracting an element.
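
The following sketch exercises the missing, zero, negative and oob options described in the arguments above. The index values and the size of 3 are arbitrary choices made for illustration.

```r
library(vctrs)

n <- 3L

# Vector subscripts propagate missing values by default,
# but they can also be removed.
vec_as_location(c(TRUE, NA, FALSE), n)
vec_as_location(c(1, NA), n, missing = "remove")

# Zeros are removed by default and can be kept with "ignore".
num_as_location(c(1, 0, 2), n)
num_as_location(c(1, 0, 2), n, zero = "ignore")

# Negative locations are inverted by default.
num_as_location(-1, n)
num_as_location(-1, n, negative = "ignore")

# Out-of-bounds locations: extend past the end, or drop them.
num_as_location(4, n, oob = "extend")
num_as_location(c(1, 5), n, oob = "remove")
```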

Examples

x <- array(1:6, c(2, 3))
dimnames(x) <- list(c("r1", "r2"), c("c1", "c2", "c3"))

# The most common use case validates row indices
vec_as_location(1, vec_size(x))

# Negative indices can be used to index from the back
vec_as_location(-1, vec_size(x))

# Character vectors can be used if `names` are provided
vec_as_location("r2", vec_size(x), rownames(x))

# You can also construct an index for dimensions other than the first
vec_as_location(c("c2", "c1"), ncol(x), colnames(x))

Retrieve and repair names

Description

vec_as_names() takes a character vector of names and repairs it according to the repair argument. It is the r-lib and tidyverse equivalent of base::make.names().

vctrs deals with a few levels of name repair:

minimal names exist. The names attribute is not NULL. The name of an unnamed element is "" and never NA. For instance, vec_as_names() always returns minimal names and data frames created by the tibble package have names that are, at least, minimal.
unique names are minimal, have no duplicates, and can be used where a variable name is expected. Empty names, ..., and .. followed by a sequence of digits are banned.
- All columns can be accessed by name via df[["name"]] and df$`name` and with(df, `name`).
universal names are unique and syntactic (see Details for more).
- Names work everywhere, without quoting: df$name and with(df, name) and lm(name1 ~ name2, data = df) and dplyr::select(df, name) all work.

universal implies unique, unique implies minimal. These levels are nested.

Usage

vec_as_names(
  names,
  ...,
  repair = c("minimal", "unique", "universal", "check_unique", "unique_quiet",
    "universal_quiet"),
  repair_arg = NULL,
  quiet = FALSE,
  call = caller_env()
)

Arguments

names

A character vector.

...

These dots are for future extensions and must be empty.

repair

Either a string or a function. If a string, it must be one of "check_unique", "minimal", "unique", "universal", "unique_quiet", or "universal_quiet". If a function, it is invoked with a vector of minimal names and must return minimal names, otherwise an error is thrown.

Minimal names are never NULL or NA. When an element doesn't have a name, its minimal name is an empty string.
Unique names are unique. A suffix is appended to duplicate names to make them unique.
Universal names are unique and syntactic, meaning that you can safely use the names as variables without causing a syntax error.

The "check_unique" option doesn't perform any name repair. Instead, an error is raised if the names don't suit the "unique" criteria.

The options "unique_quiet" and "universal_quiet" are here to help the user who calls this function indirectly, via another function which exposes repair but not quiet. Specifying repair = "unique_quiet" is like specifying repair = "unique", quiet = TRUE. When the "*_quiet" options are used, any setting of quiet is silently overridden.
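
As a minimal sketch of the function form of repair, the calls below pass base functions as the repair strategy. Both toupper() and the make.unique() wrapper are stand-ins chosen for illustration, not strategies recommended by vctrs.

```r
library(vctrs)

# The repair function receives minimal names and must return minimal names.
upper <- vec_as_names(c("a", "b"), repair = toupper)
upper

# A hypothetical custom strategy built on base::make.unique().
dedup <- vec_as_names(
  c("a", "a"),
  repair = function(nms) make.unique(nms, sep = "_")
)
dedup
```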

repair_arg

If specified and repair = "check_unique", any errors will include a hint to set the repair_arg.

quiet

By default, the user is informed of any renaming caused by repairing the names. This only concerns unique and universal repairing. Set quiet to TRUE to silence the messages.

Users can silence the name repair messages by setting the "rlib_name_repair_verbosity" global option to "quiet".

call

`minimal` names

minimal names exist. The names attribute is not NULL. The name of an unnamed element is "" and never NA.

Examples:

Original names of a vector with length 3: NULL
                           minimal names: "" "" ""

                          Original names: "x" NA
                           minimal names: "x" ""

`unique` names

unique names are minimal, have no duplicates, and can be used (possibly with backticks) in contexts where a variable is expected. Empty names, ..., and .. followed by a sequence of digits are banned. If a data frame has unique names, you can index it by name, and also access the columns by name. In particular, df[["name"]] and df$`name` and also with(df, `name`) always work.

There are many ways to make names unique. We append a suffix of the form ...j to any name that is "" or a duplicate, where j is the position. We also change ..# and ... to ...#.

Example:

Original names:     ""     "x"     "" "y"     "x"  "..2"  "..."
  unique names: "...1" "x...2" "...3" "y" "x...5" "...6" "...7"

Pre-existing suffixes of the form ...j are always stripped, prior to making names unique, i.e. reconstructing the suffixes. If this interacts poorly with your names, you should take control of name repair.
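
The table above can be reproduced with the call below, which uses the "unique_quiet" option to avoid the renaming messages. The second call is an illustrative input showing how a pre-existing ...j suffix is stripped and then recomputed from the positions.

```r
library(vctrs)

nms <- c("", "x", "", "y", "x", "..2", "...")
repaired <- vec_as_names(nms, repair = "unique_quiet")
repaired

# "x...5" is first stripped to "x", then both duplicates get new suffixes.
restripped <- vec_as_names(c("x...5", "x"), repair = "unique_quiet")
restripped
```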

`universal` names

universal names are unique and syntactic, meaning they:

Are never empty (inherited from unique).
Have no duplicates (inherited from unique).
Are not "..." and do not have the form ..i, where i is a number (inherited from unique).
Consist of letters, numbers, and the dot . or underscore _ characters.
Start with a letter or start with the dot . not followed by a number.
Are not a reserved word, e.g., if or function or TRUE.

If a vector has universal names, variable names can be used "as is" in code. They work well with nonstandard evaluation, e.g., df$name works.

vctrs has a different method of making names syntactic than base::make.names(). In general, vctrs prepends one or more dots . until the name is syntactic.

Examples:

 Original names:     ""     "x"    NA      "x"
universal names: "...1" "x...2" "...3" "x...4"

  Original names: "(y)"  "_z"  ".2fa"  "FALSE"
 universal names: ".y." "._z" "..2fa" ".FALSE"
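
To see the difference from base R, the sketch below runs the second set of names from the table above through both vec_as_names() and base::make.names(). The side-by-side comparison is added here for illustration only.

```r
library(vctrs)

nms <- c("(y)", "_z", ".2fa", "FALSE")

# vctrs prepends dots until each name is syntactic.
universal <- vec_as_names(nms, repair = "universal_quiet")
universal

# base R uses a different scheme for the same inputs.
base_names <- make.names(nms)
base_names
```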

Examples

# By default, `vec_as_names()` returns minimal names:
vec_as_names(c(NA, NA, "foo"))

# You can make them unique:
vec_as_names(c(NA, NA, "foo"), repair = "unique")

# Universal repairing fixes any non-syntactic name:
vec_as_names(c("_foo", "+"), repair = "universal")

Repair names with legacy method

Description

This standardises names with the legacy approach that was used in tidyverse packages (such as tibble, tidyr, and readxl) before vec_as_names() was implemented. This tool is meant to help with the transition to the new name repairing standard and will be deprecated and removed from the package some time in the future.

Usage

vec_as_names_legacy(names, prefix = "V", sep = "")

Arguments

names

A character vector.

prefix, sep

Prefix and separator for repaired names.

Examples

if (rlang::is_installed("tibble")) {

library(tibble)

# Names repair is turned off by default in tibble:
try(tibble(a = 1, a = 2))

# You can turn it on by supplying a repair method:
tibble(a = 1, a = 2, .name_repair = "universal")

# If you prefer the legacy method, use `vec_as_names_legacy()`:
tibble(a = 1, a = 2, .name_repair = vec_as_names_legacy)

}

Convert to a base subscript type

Description

Convert i to the base type expected by vec_as_location() or vec_as_location2(). The values of the subscript type are not checked in any way (length, missingness, negative elements).

Usage

vec_as_subscript(
  i,
  ...,
  logical = c("cast", "error"),
  numeric = c("cast", "error"),
  character = c("cast", "error"),
  arg = NULL,
  call = caller_env()
)

vec_as_subscript2(
  i,
  ...,
  numeric = c("cast", "error"),
  character = c("cast", "error"),
  arg = NULL,
  call = caller_env()
)

Arguments

i

...

These dots are for future extensions and must be empty.

logical, numeric, character

How to handle logical, numeric, and character subscripts.

If "cast" and the subscript is not one of the three base types (logical, integer or character), the subscript is cast to the relevant base type, e.g. factors are coerced to character. NULL is treated as an empty integer vector, and is thus coercible depending on the setting of numeric. Symbols are treated as character vectors and thus coercible depending on the setting of character.

If "error", the subscript type is disallowed and triggers an informative error.
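
A brief sketch of the two settings follows. The inputs are arbitrary examples: a double and a factor are cast to their base subscript types, while a logical subscript is rejected once logical = "error" is set.

```r
library(vctrs)

# With the default "cast", a whole-number double becomes an integer
# and a factor becomes a character vector.
num <- vec_as_subscript(2)
chr <- vec_as_subscript(factor("a"))
num
chr

# With "error", the subscript type is disallowed.
res <- try(vec_as_subscript(TRUE, logical = "error"), silent = TRUE)
inherits(res, "try-error")
```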

arg

The argument name to be displayed in error messages.

call

Assert an argument has known prototype and/or size

Description

vec_is() is a predicate that checks if its input is a vector that conforms to a prototype and/or a size.
vec_assert() throws an error when the input is not a vector or doesn't conform.

Usage

vec_assert(
  x,
  ptype = NULL,
  size = NULL,
  arg = caller_arg(x),
  call = caller_env()
)

vec_is(x, ptype = NULL, size = NULL)

Arguments

x

A vector argument to check.

ptype

Prototype to compare against. If the prototype has a class, its vec_ptype() is compared to that of x with identical(). Otherwise, its typeof() is compared to that of x with ==.

size

A single integer size against which to compare.

arg

Name of argument being checked. This is used in error messages. The label of the expression passed as x is taken as default.

call

Value

vec_is() returns TRUE or FALSE. vec_assert() either throws a typed error (see section on error types) or returns x, invisibly.

Error types

vec_is() never throws. vec_assert() throws the following errors:

If the input is not a vector, an error of class "vctrs_error_scalar_type" is raised.
If the prototype doesn't match, an error of class "vctrs_error_assert_ptype" is raised.
If the size doesn't match, an error of class "vctrs_error_assert_size" is raised.

The last two errors inherit from "vctrs_error_assert".
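
The error classes listed above can be told apart with base tryCatch(), as in this sketch. The helper assert_class() is hypothetical and exists only for this illustration.

```r
library(vctrs)

# Hypothetical helper: report which typed error, if any, was raised.
assert_class <- function(expr) {
  tryCatch(
    {
      expr
      "ok"
    },
    vctrs_error_scalar_type  = function(cnd) "vctrs_error_scalar_type",
    vctrs_error_assert_ptype = function(cnd) "vctrs_error_assert_ptype",
    vctrs_error_assert_size  = function(cnd) "vctrs_error_assert_size"
  )
}

ok        <- assert_class(vec_assert(1:3, ptype = integer(), size = 3L))
not_vec   <- assert_class(vec_assert(quote(x)))
bad_ptype <- assert_class(vec_assert(1:3, ptype = double()))
bad_size  <- assert_class(vec_assert(1:3, size = 2L))

c(ok, not_vec, bad_ptype, bad_size)
```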

Lifecycle

Both vec_is() and vec_assert() are questioning because their ptype arguments have semantics that are challenging to define clearly and are rarely useful.

Use obj_is_vector() or obj_check_vector() for vector checks
Use vec_check_size() for size checks
Use vec_cast(), inherits(), or simple type predicates like rlang::is_logical() for specific type checks

Vectors and scalars

Informally, a vector is a collection that makes sense to use as a column in a data frame. The following rules define whether or not x is considered a vector.

If no vec_proxy() method has been registered, x is a vector if:

The base type of the object is atomic: "logical", "integer", "double", "complex", "character", or "raw".
x is a list, as defined by obj_is_list().
x is a data.frame.

If a vec_proxy() method has been registered, x is a vector if:

The proxy satisfies one of the above conditions.
The base type of the proxy is "list", regardless of its class. S3 lists are thus treated as scalars unless they implement a vec_proxy() method.

Otherwise an object is treated as scalar and cannot be used as a vector. In particular:

NULL is not a vector.
S3 lists like lm objects are treated as scalars by default.
Objects of type expression are not treated as vectors.
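
The rules above can be checked directly with obj_is_vector(). The objects below are illustrative picks, one for each case mentioned in this section.

```r
library(vctrs)

# Atomic vectors, lists and data frames are vectors.
obj_is_vector(1:3)
obj_is_vector(list(1, "a"))
obj_is_vector(mtcars)

# NULL, S3 lists such as lm objects, and expressions are not.
obj_is_vector(NULL)
obj_is_vector(lm(mpg ~ wt, data = mtcars))
obj_is_vector(expression(x))
```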

Combine many data frames into one data frame

Description

This pair of functions binds together data frames (and vectors), either row-wise or column-wise. Row-binding creates a data frame with common type across all arguments. Column-binding creates a data frame with common length across all arguments.

Usage

vec_rbind(
  ...,
  .ptype = NULL,
  .names_to = rlang::zap(),
  .name_repair = c("unique", "universal", "check_unique", "unique_quiet",
    "universal_quiet"),
  .name_spec = NULL,
  .error_call = current_env()
)

vec_cbind(
  ...,
  .ptype = NULL,
  .size = NULL,
  .name_repair = c("unique", "universal", "check_unique", "minimal", "unique_quiet",
    "universal_quiet"),
  .error_call = current_env()
)

Arguments

...

Data frames or vectors.

When the inputs are named:

vec_rbind() assigns names to row names unless .names_to is supplied. In that case the names are assigned in the column defined by .names_to.
vec_cbind() creates packed data frame columns with named inputs.

NULL inputs are silently ignored. Empty (e.g. zero row) inputs will not appear in the output, but will affect the derived .ptype.

.ptype

If NULL, the default, the output type is determined by computing the common type across all elements of ....

.names_to

This controls what to do with input names supplied in ....

By default, input names are zapped.
If a string, specifies a column where the input names will be copied. These names are often useful to identify rows with their original input. If a column name is supplied and ... is not named, an integer column is used instead.
If NULL, the input names are used as row names.
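
A short sketch of the first two .names_to settings, using two small data frames and a column name "id" made up for the illustration:

```r
library(vctrs)

a <- data.frame(x = 1)
b <- data.frame(x = 2)

# Default: input names are zapped, so no extra column appears.
zapped <- vec_rbind(a = a, b = b)
names(zapped)

# A string copies the input names into a new column.
named <- vec_rbind(a = a, b = b, .names_to = "id")
named

# Without input names, an integer column is used instead.
unnamed <- vec_rbind(a, b, .names_to = "id")
unnamed
```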

.name_repair

One of "unique", "universal", "check_unique", "unique_quiet", or "universal_quiet". See vec_as_names() for the meaning of these options.

With vec_rbind(), the repair function is applied to all inputs separately. This is because vec_rbind() needs to align their columns before binding the rows, and thus needs all inputs to have unique names. On the other hand, vec_cbind() applies the repair function after all inputs have been concatenated together in a final data frame. Hence vec_cbind() allows the more permissive minimal names repair.

.name_spec

A name specification (as documented in vec_c()) for combining the outer input names in ... and the inner row names of the inputs. This only has an effect when .names_to is set to NULL, which causes the input names to be assigned as row names.

.error_call

.size

If NULL, the default, the number of rows in the vec_cbind() output is determined using the tidyverse recycling rules.

Alternatively, specify the desired number of rows, and any inputs of length 1 will be recycled appropriately.

Value

A data frame, or subclass of data frame.

If ... is a mix of different data frame subclasses, vec_ptype2() will be used to determine the output type. For vec_rbind(), this will determine the type of the container and the type of each column; for vec_cbind() it only determines the type of the output container. If there are no non-NULL inputs, the result will be data.frame().

Invariants

All inputs are first converted to a data frame. The conversion for 1d vectors depends on the direction of binding:

For vec_rbind(), each element of the vector becomes a column in a single row.
For vec_cbind(), each element of the vector becomes a row in a single column.

Once the inputs have all become data frames, the following invariants are observed for row-binding:

vec_size(vec_rbind(x, y)) == vec_size(x) + vec_size(y)
vec_ptype(vec_rbind(x, y)) == vec_ptype_common(x, y)

Note that if an input is an empty vector, it is first converted to a 1-row data frame with 0 columns. Despite being empty, its effective size for the total number of rows is 1.

For column-binding, the following invariants apply:

vec_size(vec_cbind(x, y)) == vec_size_common(x, y)
vec_ptype(vec_cbind(x, y)) == vec_cbind(vec_ptype(x), vec_ptype(y))
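
The invariants can be spot-checked on small inputs, as in this sketch. The data frames are invented for the purpose, the column-binding check uses the prototype of each input in turn, and all.equal() is used as a tolerant comparison of the prototypes.

```r
library(vctrs)

x <- data.frame(a = 1:2)
y <- data.frame(a = 3.5, b = "z")

# Row-binding: sizes add up and the type is the common type.
rb <- vec_rbind(x, y)
rb_size_ok  <- vec_size(rb) == vec_size(x) + vec_size(y)
rb_ptype_ok <- isTRUE(all.equal(vec_ptype(rb), vec_ptype_common(x, y)))

# Column-binding: the size is the common size and the type is the
# column-binding of the input prototypes.
z <- data.frame(c = TRUE)
cb <- vec_cbind(x, z)
cb_size_ok  <- vec_size(cb) == vec_size_common(x, z)
cb_ptype_ok <- isTRUE(all.equal(
  vec_ptype(cb),
  vec_cbind(vec_ptype(x), vec_ptype(z))
))

c(rb_size_ok, rb_ptype_ok, cb_size_ok, cb_ptype_ok)
```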

Dependencies

vctrs dependencies

vec_cast_common()
vec_proxy()
vec_init()
vec_assign()
vec_restore()

base dependencies of `vec_rbind()`

base::c()

If columns to combine inherit from a common class, vec_rbind() falls back to base::c() if there exists a c() method implemented for this class hierarchy.

Examples

# row binding -----------------------------------------

# common columns are coerced to common class
vec_rbind(
  data.frame(x = 1),
  data.frame(x = FALSE)
)

# unique columns are filled with NAs
vec_rbind(
  data.frame(x = 1),
  data.frame(y = "x")
)

# null inputs are ignored
vec_rbind(
  data.frame(x = 1),
  NULL,
  data.frame(x = 2)
)

# bare vectors are treated as rows
vec_rbind(
  c(x = 1, y = 2),
  c(x = 3)
)

# default names will be supplied if arguments are not named
vec_rbind(
  1:2,
  1:3,
  1:4
)

# column binding --------------------------------------

# each input is recycled to have common length
vec_cbind(
  data.frame(x = 1),
  data.frame(y = 1:3)
)

# bare vectors are treated as columns
vec_cbind(
  data.frame(x = 1),
  y = letters[1:3]
)

# if you supply a named data frame, it is packed in a single column
data <- vec_cbind(
  x = data.frame(a = 1, b = 2),
  y = 1
)
data

# Packed data frames are nested in a single column. This makes it
# possible to access it through a single name:
data$x

# since the base print method is suboptimal with packed data
# frames, it is recommended to use tibble to work with these:
if (rlang::is_installed("tibble")) {
  vec_cbind(x = tibble::tibble(a = 1, b = 2), y = 1)
}

# duplicate names are flagged
vec_cbind(x = 1, x = 2)

Combine many vectors into one vector

Description

Combine all arguments into a new vector of common type.

Usage

vec_c(
  ...,
  .ptype = NULL,
  .name_spec = NULL,
  .name_repair = c("minimal", "unique", "check_unique", "universal", "unique_quiet",
    "universal_quiet"),
  .error_arg = "",
  .error_call = current_env()
)

Arguments

...

Vectors to coerce.

.ptype

If NULL, the default, the output type is determined by computing the common type across all elements of ....

.name_spec

A name specification for combining inner and outer names. It can be one of the following:

A function of two arguments. The outer name is passed as a string to the first argument, and the inner names or positions are passed as second argument.
An anonymous function as a purrr-style formula.
A glue specification of the form "{outer}_{inner}".
An rlang::zap() object, in which case both outer and inner names are ignored and the result is unnamed.

See the name specification topic.

.name_repair

How to repair names, see repair options in vec_as_names().

.error_arg

An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem.

.error_call

Value

A vector with class given by .ptype, and length equal to the sum of the vec_size() of the contents of ....

The vector will have names if the individual components have names (inner names) or if the arguments are named (outer names). If both inner and outer names are present, an error is thrown unless a .name_spec is provided.
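
The sketch below illustrates how inner and outer names interact. The input c(x = 1, y = 2) and the "." separator are arbitrary choices for the example.

```r
library(vctrs)

inner <- c(x = 1, y = 2)

# Both inner and outer names, no specification: an error.
res <- try(vec_c(a = inner), silent = TRUE)
inherits(res, "try-error")

# A function of two arguments.
by_fn <- vec_c(a = inner, .name_spec = function(outer, inner) {
  paste(outer, inner, sep = ".")
})
names(by_fn)

# A glue specification.
by_glue <- vec_c(a = inner, .name_spec = "{outer}_{inner}")
names(by_glue)

# rlang::zap() drops all names.
zapped <- vec_c(a = inner, .name_spec = rlang::zap())
names(zapped)
```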

Invariants

vec_size(vec_c(x, y)) == vec_size(x) + vec_size(y)
vec_ptype(vec_c(x, y)) == vec_ptype_common(x, y)

Dependencies

vctrs dependencies

vec_cast_common() with fallback
vec_proxy()
vec_restore()

base dependencies

base::c()

If inputs inherit from a common class hierarchy, vec_c() falls back to base::c() if there exists a c() method implemented for this class hierarchy.

Examples

vec_c(FALSE, 1L, 1.5)

# Date/times --------------------------
c(Sys.Date(), Sys.time())
c(Sys.time(), Sys.Date())

vec_c(Sys.Date(), Sys.time())
vec_c(Sys.time(), Sys.Date())

# Factors -----------------------------
c(factor("a"), factor("b"))
vec_c(factor("a"), factor("b"))


# By default, named inputs must be length 1:
vec_c(name = 1)
try(vec_c(name = 1:3))

# Pass a name specification to work around this:
vec_c(name = 1:3, .name_spec = "{outer}_{inner}")

# See `?name_spec` for more examples of name specifications.

Cast a vector to a specified type

Description

vec_cast() provides directional conversions from one type of vector to another. Along with vec_ptype2(), this generic forms the foundation of type coercions in vctrs.

Usage

vec_cast(x, to, ..., x_arg = caller_arg(x), to_arg = "", call = caller_env())

vec_cast_common(..., .to = NULL, .arg = "", .call = caller_env())

## S3 method for class 'logical'
vec_cast(x, to, ...)

## S3 method for class 'integer'
vec_cast(x, to, ...)

## S3 method for class 'double'
vec_cast(x, to, ...)

## S3 method for class 'complex'
vec_cast(x, to, ...)

## S3 method for class 'raw'
vec_cast(x, to, ...)

## S3 method for class 'character'
vec_cast(x, to, ...)

## S3 method for class 'list'
vec_cast(x, to, ...)

Arguments

x

Vectors to cast.

to, .to

Type to cast to. If NULL, x will be returned as is.

...

For vec_cast_common(), vectors to cast. For vec_cast(), vec_cast_default(), and vec_restore(), these dots are only for future extensions and should be empty.

x_arg

Argument name for x, used in error messages to inform the user about the locations of incompatible types (see stop_incompatible_type()).

to_arg

Argument name for to, used in error messages to inform the user about the locations of incompatible types (see stop_incompatible_type()).

call, .call

.arg

An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem.

Value

A vector the same length as x with the same type as to, or an error if the cast is not possible. An error is generated if information is lost when casting between compatible types (i.e. when there is no 1-to-1 mapping for a specific value).
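
The two failure modes can be distinguished by condition class, as sketched below. The class name "vctrs_error_cast_lossy" is not stated in this topic and is used here as an assumption about how vctrs labels lossy-cast errors. The inputs are illustrative.

```r
library(vctrs)

# Incompatible types: no cast from character to integer is defined.
incompatible <- tryCatch(
  vec_cast("a", integer()),
  vctrs_error_incompatible_type = function(cnd) cnd
)
inherits(incompatible, "vctrs_error_incompatible")

# Compatible types, but 1.5 has no exact integer representation.
# Assumption: lossy casts signal the class "vctrs_error_cast_lossy".
lossy <- tryCatch(
  vec_cast(c(1, 1.5), integer()),
  vctrs_error_cast_lossy = function(cnd) cnd
)
inherits(lossy, "condition")
```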

Implementing coercion methods

For an overview of how these generics work and their roles in vctrs, see ?theory-faq-coercion.
For an example of implementing coercion methods for simple vectors, see ?howto-faq-coercion.
For an example of implementing coercion methods for data frame subclasses, see ?howto-faq-coercion-data-frame.
For a tutorial about implementing vctrs classes from scratch, see vignette("s3-vector").

Dependencies of `vec_cast_common()`

vctrs dependencies

vec_ptype2()
vec_cast()

base dependencies

Some functions enable a base-class fallback for vec_cast_common(). In that case the inputs are deemed compatible when they have the same base type and inherit from the same base class.
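
Conceptually, vec_cast_common() finds a common type and then casts each input to it. The sketch below spells out that two-step reading by hand for two atomic inputs. It is a simplified illustration, not the actual implementation, and it ignores the base-class fallback.

```r
library(vctrs)

# Step 1: find the common type. Step 2: cast every input to it.
ptype <- vec_ptype_common(1L, 2.5)
manual <- lapply(list(1L, 2.5), vec_cast, to = ptype)

# The one-step equivalent.
direct <- vec_cast_common(1L, 2.5)
identical(manual, direct)

# `.to` overrides the computed common type.
forced <- vec_cast_common(TRUE, 1L, .to = double())
forced
```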

Examples

# x is a double, but no information is lost
vec_cast(1, integer())

# When information is lost the cast fails
try(vec_cast(c(1, 1.5), integer()))
try(vec_cast(c(1, 2), logical()))

# You can suppress this error and get the partial results
allow_lossy_cast(vec_cast(c(1, 1.5), integer()))
allow_lossy_cast(vec_cast(c(1, 2), logical()))

# By default this suppresses all lossy cast errors without
# distinction, but you can be specific about what cast is allowed
# by supplying prototypes
allow_lossy_cast(vec_cast(c(1, 1.5), integer()), to_ptype = integer())
try(allow_lossy_cast(vec_cast(c(1, 2), logical()), to_ptype = integer()))

# No sensible coercion is possible so an error is generated
try(vec_cast(1.5, factor("a")))

# Cast to common type
vec_cast_common(factor("a"), factor(c("a", "b")))

Frame prototype

Description

This is an experimental generic that returns zero-column variants of a data frame. It is needed for vec_cbind(), to work around the lack of colwise primitives in vctrs. Expect changes.

Usage

vec_cbind_frame_ptype(x, ...)

Arguments

x

A data frame.

...

These dots are for future extensions and must be empty.

Chopping

Description

vec_chop() provides an efficient method to repeatedly slice a vector. It captures the pattern of map(indices, vec_slice, x = x). When no indices are supplied, it is generally equivalent to as.list().
list_unchop() combines a list of vectors into a single vector, placing elements in the output according to the locations specified by indices. It is similar to vec_c(), but gives greater control over how the elements are combined. When no indices are supplied, it is identical to vec_c(), but typically a little faster.

If indices selects every value in x exactly once, in any order, then list_unchop() is the inverse of vec_chop() and the following invariant holds:

list_unchop(vec_chop(x, indices = indices), indices = indices) == x

Usage

vec_chop(x, ..., indices = NULL, sizes = NULL)

list_unchop(
  x,
  ...,
  indices = NULL,
  ptype = NULL,
  name_spec = NULL,
  name_repair = c("minimal", "unique", "check_unique", "universal", "unique_quiet",
    "universal_quiet"),
  error_arg = "x",
  error_call = current_env()
)

Arguments

x

A vector

...

These dots are for future extensions and must be empty.

indices

For vec_chop(), a list of positive integer vectors to slice x with, or NULL. Can't be used if sizes is already specified. If both indices and sizes are NULL, x is split into its individual elements, equivalent to using an indices of as.list(vec_seq_along(x)).

For list_unchop(), a list of positive integer vectors specifying the locations to place elements of x in. Each element of x is recycled to the size of the corresponding index vector. The size of indices must match the size of x. If NULL, x is combined in the order it is provided in, which is equivalent to using vec_c().

sizes

An integer vector of non-negative sizes representing sequential indices to slice x with, or NULL. Can't be used if indices is already specified.

For example, sizes = c(2, 4) is equivalent to indices = list(1:2, 3:6), but is typically faster.

sum(sizes) must be equal to vec_size(x), i.e. sizes must completely partition x, but an individual size is allowed to be 0.
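
The equivalence between sizes and indices, and the partition requirement, can be checked as follows. The vectors are chosen only for illustration.

```r
library(vctrs)

x <- 1:6

# `sizes = c(2, 4)` slices the same pieces as `indices = list(1:2, 3:6)`.
by_sizes <- vec_chop(x, sizes = c(2, 4))
by_indices <- vec_chop(x, indices = list(1:2, 3:6))
identical(by_sizes, by_indices)

# An individual size may be 0, producing an empty piece.
with_empty <- vec_chop(1:3, sizes = c(0, 3))
with_empty

# Sizes that do not sum to vec_size(x) are rejected.
res <- try(vec_chop(x, sizes = c(2, 3)), silent = TRUE)
inherits(res, "try-error")
```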

ptype

If NULL, the default, the output type is determined by computing the common type across all elements of x. Alternatively, you can supply ptype to give the output a known type.

name_spec

A name specification for combining inner and outer names. It can be one of the following:

A function of two arguments. The outer name is passed as a string to the first argument, and the inner names or positions are passed as second argument.
An anonymous function as a purrr-style formula.
A glue specification of the form "{outer}_{inner}".
An rlang::zap() object, in which case both outer and inner names are ignored and the result is unnamed.

See the name specification topic.

name_repair

How to repair names, see repair options in vec_as_names().

error_arg

An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem.

error_call

Value

vec_chop(): A list where each element has the same type as x. The size of the list is equal to vec_size(indices), vec_size(sizes), or vec_size(x) depending on whether or not indices or sizes is provided.
list_unchop(): A vector of type vec_ptype_common(!!!x), or ptype, if specified. The size is computed as vec_size_common(!!!indices) unless the indices are NULL, in which case the size is vec_size_common(!!!x).

Dependencies of `vec_chop()`

vec_slice()

Dependencies of `list_unchop()`

vec_c()

Examples

vec_chop(1:5)

# These two are equivalent
vec_chop(1:5, indices = list(1:2, 3:5))
vec_chop(1:5, sizes = c(2, 3))

# Can also be used on data frames
vec_chop(mtcars, indices = list(1:3, 4:6))

# If `indices` selects every value in `x` exactly once,
# in any order, then `list_unchop()` inverts `vec_chop()`
x <- c("a", "b", "c", "d")
indices <- list(2, c(3, 1), 4)
vec_chop(x, indices = indices)
list_unchop(vec_chop(x, indices = indices), indices = indices)

# When unchopping, size 1 elements of `x` are recycled
# to the size of the corresponding index
list_unchop(list(1, 2:3), indices = list(c(1, 3, 5), c(2, 4)))

# Names are retained, and outer names can be combined with inner
# names through the use of a `name_spec`
lst <- list(x = c(a = 1, b = 2), y = 1)
list_unchop(lst, indices = list(c(3, 2), c(1, 4)), name_spec = "{outer}_{inner}")

# An alternative implementation of `ave()` can be constructed using
# `vec_chop()` and `list_unchop()` in combination with `vec_group_loc()`
ave2 <- function(.x, .by, .f, ...) {
  indices <- vec_group_loc(.by)$loc
  chopped <- vec_chop(.x, indices = indices)
  out <- lapply(chopped, .f, ...)
  list_unchop(out, indices = indices)
}

breaks <- warpbreaks$breaks
wool <- warpbreaks$wool

ave2(breaks, wool, mean)

identical(
  ave2(breaks, wool, mean),
  ave(breaks, wool, FUN = mean)
)

# If you know your input is sorted and you'd like to split on the groups,
# `vec_run_sizes()` can be efficiently combined with `sizes`
df <- data_frame(
  g = c(2, 5, 5, 6, 6, 6, 6, 8, 9, 9),
  x = 1:10
)
vec_chop(df, sizes = vec_run_sizes(df$g))

# If you have a list of homogeneous vectors, sometimes it can be useful to
# unchop, apply a function to the flattened vector, and then rechop according
# to the original indices. This can be done efficiently with `list_sizes()`.
x <- list(c(1, 2, 1), c(3, 1), 5, double())
x_flat <- list_unchop(x)
x_flat <- x_flat + max(x_flat)
vec_chop(x_flat, sizes = list_sizes(x))

Compare two vectors

Description

Compare two vectors

Usage

vec_compare(x, y, na_equal = FALSE, .ptype = NULL)

Arguments

x, y

Vectors with compatible types and lengths.

na_equal

Should NA values be considered equal?

.ptype

Override to optionally specify common type

Value

An integer vector with values -1 for x < y, 0 if x == y, and 1 if x > y. If na_equal is FALSE, the result will be NA if either x or y is NA.
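
As a minimal sketch of the return value described above (the input values are illustrative only), the three-way result can be inspected directly or used as an index:

```r
library(vctrs)

# With the default `na_equal = FALSE`, a missing input gives a missing result
cmp <- vec_compare(c(1, 5, 9, NA), 5)
cmp
#> [1] -1  0  1 NA

# Shifting the -1/0/1 codes by 2 turns them into positions 1/2/3
labels <- c("below", "equal", "above")[cmp + 2L]
labels
#> [1] "below" "equal" "above" NA
```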

S3 dispatch

vec_compare() is not generic for performance; instead it uses vec_proxy_compare() to create a proxy that is used in the comparison.

Dependencies

vec_cast_common() with fallback
vec_recycle_common()
vec_proxy_compare()

Examples

vec_compare(c(TRUE, FALSE, NA), FALSE)
vec_compare(c(TRUE, FALSE, NA), FALSE, na_equal = TRUE)

vec_compare(1:10, 5)
vec_compare(runif(10), 0.5)
vec_compare(letters[1:10], "d")

df <- data.frame(x = c(1, 1, 1, 2), y = c(0, 1, 2, 1))
vec_compare(df, data.frame(x = 1, y = 1))

Count unique values in a vector

Description

Count the number of unique values in a vector. vec_count() has two important differences to table(): it returns a data frame, and when given multiple inputs (as a data frame), it only counts combinations that appear in the input.

Usage

vec_count(x, sort = c("count", "key", "location", "none"))

Arguments

x

A vector (including a data frame).

sort

One of "count", "key", "location", or "none".

"count", the default, puts the most frequent values at the top.
"key", orders by the output key column (i.e. the unique values of x).
"location", orders by the location where each key is first seen. This is useful if you want to match the counts up to other unique/duplicated functions.
"none", leaves unordered. This is not guaranteed to produce the same ordering across R sessions, but is the fastest method.

Value

A data frame with columns key (same type as x) and count (an integer vector).
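
The following sketch, using a small made-up character vector, illustrates the shape of the result and how the sort options change the row order:

```r
library(vctrs)

x <- c("b", "a", "b", "c", "b", "a")

# Default: most frequent keys first
by_count <- vec_count(x)

# Ordered by the key itself
by_key <- vec_count(x, sort = "key")

# Ordered by where each key first appears in `x`
by_location <- vec_count(x, sort = "location")

by_count$key
#> [1] "b" "a" "c"
by_key$count
#> [1] 2 3 1
```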

Dependencies

vec_proxy_equal()
vec_slice()
vec_order()

Examples

vec_count(mtcars$vs)
vec_count(iris$Species)

# If you count a data frame you'll get a data frame
# column in the output
str(vec_count(mtcars[c("vs", "am")]))

# Sorting ---------------------------------------

x <- letters[rpois(100, 6)]
# default is to sort by frequency
vec_count(x)

# but you can sort by key
vec_count(x, sort = "key")

# or location of first value
vec_count(x, sort = "location")
head(x)

# or not at all
vec_count(x, sort = "none")

Extract underlying data

Description

Extract the data underlying an S3 vector object, i.e. the underlying (named) atomic vector, data frame, or list.

Usage

vec_data(x)

Arguments

x

A vector or object implementing vec_proxy().

Value

The data underlying x, free from any attributes except names, dims, and dimnames.

Difference with `vec_proxy()`

vec_data() returns unstructured data. The only attributes preserved are names, dims, and dimnames. Currently, due to the underlying memory architecture of R, this creates a full copy of the data for atomic vectors.

vec_proxy() may return structured data. This generic is the main customisation point for accessing memory values in vctrs, along with vec_restore().

Methods must return a vector type. Records and data frames will be processed rowwise.
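
A short sketch of the difference, using a named Date as an illustrative S3 vector (the date and name are arbitrary):

```r
library(vctrs)

d <- as.Date("1970-01-11")
names(d) <- "a"

# `vec_data()` drops the class and keeps only the names
bare <- vec_data(d)
bare
#>  a
#> 10

# Compare with the proxy, which is allowed to keep structure
vec_proxy(d)
```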

Default cast and ptype2 methods

Description

These functions are automatically called when no vec_ptype2() or vec_cast() method is implemented for a pair of types.

They apply special handling if one of the inputs is of type AsIs or sfc.
They attempt a number of fallbacks in cases where it would be too inconvenient to be strict:
- If the class and attributes are the same they are considered compatible. vec_default_cast() returns x in this case.
- In case of incompatible data frame classes, they fall back to data.frame. If an incompatible subclass of tibble is involved, they fall back to tbl_df.
Otherwise, an error is thrown with stop_incompatible_type() or stop_incompatible_cast().

Usage

vec_default_cast(x, to, ..., x_arg = "", to_arg = "", call = caller_env())

vec_default_ptype2(x, y, ..., x_arg = "", y_arg = "", call = caller_env())

Arguments

x

Vectors to cast.

to

Type to cast to. If NULL, x will be returned as is.

...

For vec_cast_common(), vectors to cast. For vec_cast(), vec_default_cast(), and vec_restore(), these dots are only for future extensions and should be empty.

x_arg

Argument name for x, used in error messages to inform the user about the locations of incompatible types (see stop_incompatible_type()).

to_arg

Argument name for to, used in error messages to inform the user about the locations of incompatible types (see stop_incompatible_type()).

call

Complete

Description

vec_detect_complete() detects "complete" observations. An observation is considered complete if it is non-missing. For most vectors, this implies that vec_detect_complete(x) == !vec_detect_missing(x).

For data frames and matrices, a row is only considered complete if all elements of that row are non-missing. To compare, !vec_detect_missing(x) detects rows that are partially complete (they have at least one non-missing value).

Usage

vec_detect_complete(x)

Arguments

x

A vector

Details

A record type vector is similar to a data frame, and is only considered complete if all fields are non-missing.
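
To illustrate the record case, here is a small sketch built with new_rcrd(); the field names and values are made up for the example:

```r
library(vctrs)

r <- new_rcrd(list(
  a = c(1, NA, NA),
  b = c("x", "y", NA)
))

# Complete only where every field is non-missing
complete <- vec_detect_complete(r)
complete
#> [1]  TRUE FALSE FALSE

# Missing only where every field is missing, so the negation differs
not_missing <- !vec_detect_missing(r)
not_missing
#> [1]  TRUE  TRUE FALSE
```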

Value

A logical vector with the same size as x.

Examples

x <- c(1, 2, NA, 4, NA)

# For most vectors, this is identical to `!vec_detect_missing(x)`
vec_detect_complete(x)
!vec_detect_missing(x)

df <- data_frame(
  x = x,
  y = c("a", "b", NA, "d", "e")
)

# This returns `TRUE` where all elements of the row are non-missing.
# Compare that with `!vec_detect_missing()`, which detects rows that have at
# least one non-missing value.
df2 <- df
df2$all_non_missing <- vec_detect_complete(df)
df2$any_non_missing <- !vec_detect_missing(df)
df2

Find duplicated values

Description

vec_duplicate_any(): detects the presence of duplicated values, similar to anyDuplicated().
vec_duplicate_detect(): returns a logical vector describing if each element of the vector is duplicated elsewhere. Unlike duplicated(), it reports all duplicated values, not just the second and subsequent repetitions.
vec_duplicate_id(): returns an integer vector giving the location of the first occurrence of the value.

Usage

vec_duplicate_any(x)

vec_duplicate_detect(x)

vec_duplicate_id(x)

Arguments

x

A vector (including a data frame).

Value

vec_duplicate_any(): a logical vector of length 1.
vec_duplicate_detect(): a logical vector the same length as x.
vec_duplicate_id(): an integer vector the same length as x.

Missing values

In most cases, missing values are not considered to be equal, i.e. NA == NA is not TRUE. This behaviour would be unappealing here, so these functions consider all NAs to be equal. (Similarly, all NaN are also considered to be equal.)
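
A minimal sketch of this behaviour with an illustrative double vector: the NA values are treated as duplicates of each other, and likewise the NaN values, but an NA is not a duplicate of a NaN.

```r
library(vctrs)

x <- c(NA, 1, NA, NaN, NaN)

detected <- vec_duplicate_detect(x)
detected
#> [1]  TRUE FALSE  TRUE  TRUE  TRUE

# Each element is identified by the location of its first occurrence
ids <- vec_duplicate_id(x)
ids
#> [1] 1 2 1 4 4
```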

Dependencies

vec_proxy_equal()

Examples

vec_duplicate_any(1:10)
vec_duplicate_any(c(1, 1:10))

x <- c(10, 10, 20, 30, 30, 40)
vec_duplicate_detect(x)
# Note that `duplicated()` doesn't consider the first instance to
# be a duplicate
duplicated(x)

# Identify elements of a vector by the location of the first element that
# they're equal to:
vec_duplicate_id(x)
# Location of the unique values:
vec_unique_loc(x)
# Equivalent to `!duplicated(x)`, i.e. TRUE at the first occurrence of each value:
vec_duplicate_id(x) == seq_along(x)

Is a vector empty

Description

This function is defunct; please use vec_is_empty() instead.

Usage

vec_empty(x)

Arguments

x

An object.

Equality

Description

vec_equal() tests if two vectors are equal.

Usage

vec_equal(x, y, na_equal = FALSE, .ptype = NULL)

Arguments

x, y

Vectors with compatible types and lengths.

na_equal

Should NA values be considered equal?

.ptype

Override to optionally specify common type

Value

A logical vector the same size as the common size of x and y. Will only contain NAs if na_equal is FALSE.
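
A brief sketch of how na_equal affects the result, using two illustrative vectors that share a missing value at the same position:

```r
library(vctrs)

x <- c(1, NA, 3)
y <- c(1, NA, 4)

# Default: comparisons involving a missing value are themselves missing
strict <- vec_equal(x, y)
strict
#> [1]  TRUE    NA FALSE

# With `na_equal = TRUE`, two missing values compare as equal
lenient <- vec_equal(x, y, na_equal = TRUE)
lenient
#> [1]  TRUE  TRUE FALSE
```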

Dependencies

vec_cast_common() with fallback
vec_recycle_common()
vec_proxy_equal()

Examples

vec_equal(c(TRUE, FALSE, NA), FALSE)
vec_equal(c(TRUE, FALSE, NA), FALSE, na_equal = TRUE)

vec_equal(5, 1:10)
vec_equal("d", letters[1:10])

df <- data.frame(x = c(1, 1, 2, 1), y = c(1, 2, 1, NA))
vec_equal(df, data.frame(x = 1, y = 2))

Missing values

Description

vec_equal_na() has been renamed to vec_detect_missing() and is deprecated as of vctrs 0.5.0.

Usage

vec_equal_na(x)

Arguments

x

A vector

Value

A logical vector the same size as x.

Create a data frame from all combinations of the inputs

Description

vec_expand_grid() creates a new data frame by creating a grid of all possible combinations of the input vectors. It is inspired by expand.grid(). Compared with expand.grid(), it:

Produces sorted output by default by varying the first column the slowest, rather than the fastest. Control this with .vary.
Never converts strings to factors.
Does not add additional attributes.
Drops NULL inputs.
Can expand any vector type, including data frames and records.

Usage

vec_expand_grid(
  ...,
  .vary = "slowest",
  .name_repair = "check_unique",
  .error_call = current_env()
)

Arguments

...

Name-value pairs. The name will become the column name in the resulting data frame.

.vary

One of:

"slowest" to vary the first column slowest. This produces sorted output and is generally the most useful.
"fastest" to vary the first column fastest. This matches the behavior of expand.grid().

.name_repair

One of "check_unique", "unique", "universal", "minimal", "unique_quiet", or "universal_quiet". See vec_as_names() for the meaning of these options.

.error_call

Details

If any input is empty (i.e. size 0), then the result will have 0 rows.

If no inputs are provided, the result is a 1 row data frame with 0 columns. This is consistent with the fact that prod() with no inputs returns 1.
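
The two edge cases above can be checked with a small sketch (the column names are illustrative):

```r
library(vctrs)

# One empty input makes the whole grid empty, but the columns remain
empty <- vec_expand_grid(x = 1:2, y = character())
c(rows = vec_size(empty), cols = ncol(empty))
#> rows cols
#>    0    2

# No inputs: one row, zero columns, mirroring `prod()` returning 1
none <- vec_expand_grid()
c(rows = vec_size(none), cols = ncol(none))
#> rows cols
#>    1    0
```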

Value

A data frame with as many columns as there are inputs in ... and as many rows as the prod() of the sizes of the inputs.

Examples

vec_expand_grid(x = 1:2, y = 1:3)

# Use `.vary` to match `expand.grid()`:
vec_expand_grid(x = 1:2, y = 1:3, .vary = "fastest")

# Can also expand data frames
vec_expand_grid(
  x = data_frame(a = 1:2, b = 3:4),
  y = 1:4
)

Fill in missing values with the previous or following value

Description

vec_fill_missing() fills gaps of missing values with the previous or following non-missing value.

Usage

vec_fill_missing(
  x,
  direction = c("down", "up", "downup", "updown"),
  max_fill = NULL
)

Arguments

x

A vector

direction

Direction in which to fill missing values. Must be either "down", "up", "downup", or "updown".

max_fill

A single positive integer specifying the maximum number of sequential missing values that will be filled. If NULL, there is no limit.

Examples

x <- c(NA, NA, 1, NA, NA, NA, 3, NA, NA)

# Filling down replaces missing values with the previous non-missing value
vec_fill_missing(x, direction = "down")

# To also fill leading missing values, use `"downup"`
vec_fill_missing(x, direction = "downup")

# Limit the number of sequential missing values to fill with `max_fill`
vec_fill_missing(x, max_fill = 1)

# Data frames are filled rowwise. Rows are only considered missing
# if all elements of that row are missing.
y <- c(1, NA, 2, NA, NA, 3, 4, NA, 5)
df <- data_frame(x = x, y = y)
df

vec_fill_missing(df)

Identify groups

Description

vec_group_id() returns an identifier for the group that each element of x falls in, constructed in the order that they appear. The number of groups is also returned as an attribute, n.
vec_group_loc() returns a data frame containing a key column with the unique groups, and a loc column with the locations of each group in x.
vec_group_rle() locates groups in x and returns them run length encoded in the order that they appear. The return value is a rcrd object with fields for the group identifiers and the run length of the corresponding group. The number of groups is also returned as an attribute, n.

Usage

vec_group_id(x)

vec_group_loc(x)

vec_group_rle(x)

Arguments

x

A vector

Value

vec_group_id(): An integer vector with the same size as x.
vec_group_loc(): A two column data frame with size equal to vec_size(vec_unique(x)).
- A key column of type vec_ptype(x)
- A loc column of type list, with elements of type integer.
vec_group_rle(): A vctrs_group_rle rcrd object with two integer vector fields: group and length.

Note that when using vec_group_loc() for complex types, the default data.frame print method will be suboptimal, and you will want to coerce into a tibble to better understand the output.
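
As a sketch of how these return values relate to each other, the following uses a small illustrative character vector and rebuilds the group identifiers from the locations:

```r
library(vctrs)

x <- c("a", "b", "a", "c", "a")

id <- vec_group_id(x)
as.integer(id)
#> [1] 1 2 1 3 1
attr(id, "n")
#> [1] 3

loc <- vec_group_loc(x)
loc$key
#> [1] "a" "b" "c"

# Each element of `loc$loc` holds the positions of one group,
# so writing the group number into those positions recovers `id`
rebuilt <- integer(vec_size(x))
for (i in seq_along(loc$loc)) {
  rebuilt[loc$loc[[i]]] <- i
}
identical(rebuilt, as.integer(id))
#> [1] TRUE
```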

Dependencies

vec_proxy_equal()

Examples

purrr <- c("p", "u", "r", "r", "r")
vec_group_id(purrr)
vec_group_rle(purrr)

groups <- mtcars[c("vs", "am")]
vec_group_id(groups)

group_rle <- vec_group_rle(groups)
group_rle

# Access fields with `field()`
field(group_rle, "group")
field(group_rle, "length")

# `vec_group_id()` is equivalent to
vec_match(groups, vec_unique(groups))

vec_group_loc(mtcars$vs)
vec_group_loc(mtcars[c("vs", "am")])

if (require("tibble")) {
  as_tibble(vec_group_loc(mtcars[c("vs", "am")]))
}

Initialize a vector

Description

Initialize a vector

Usage

vec_init(x, n = 1L)

Arguments

x

Template of vector to initialize.

n

Desired size of result.

Dependencies

vec_slice()

Examples

vec_init(1:10, 3)
vec_init(Sys.Date(), 5)
vec_init(mtcars, 2)

Interleave many vectors into one vector

Description

vec_interleave() combines multiple vectors together, much like vec_c(), but does so in such a way that the elements of each vector are interleaved together.

It is a more efficient equivalent to the following usage of vec_c():

vec_interleave(x, y) == vec_c(x[1], y[1], x[2], y[2], ..., x[n], y[n])
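
Spelled out for two illustrative size 3 vectors, the equivalence looks like this. The last call is a sketch of how a size 1 input is recycled before interleaving:

```r
library(vctrs)

x <- 1:3
y <- 4:6

interleaved <- vec_interleave(x, y)
manual <- vec_c(x[1], y[1], x[2], y[2], x[3], y[3])

interleaved
#> [1] 1 4 2 5 3 6
identical(interleaved, manual)
#> [1] TRUE

# A size 1 input is recycled to the common size first
padded <- vec_interleave(1:3, 0L)
padded
#> [1] 1 0 2 0 3 0
```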

Usage

vec_interleave(
  ...,
  .ptype = NULL,
  .name_spec = NULL,
  .name_repair = c("minimal", "unique", "check_unique", "universal", "unique_quiet",
    "universal_quiet")
)

Arguments

...

Vectors to interleave. These will be recycled to a common size.

.ptype

If NULL, the default, the output type is determined by computing the common type across all elements of ....

.name_spec

A name specification for combining inner and outer names. One of:

A function of two arguments. The outer name is passed as a string to the first argument, and the inner names or positions are passed as second argument.
An anonymous function as a purrr-style formula.
A glue specification of the form "{outer}_{inner}".
An rlang::zap() object, in which case both outer and inner names are ignored and the result is unnamed.

See the name specification topic.

.name_repair

How to repair names, see repair options in vec_as_names().

Dependencies

vctrs dependencies

list_unchop()

Examples

# The most common case is to interleave two vectors
vec_interleave(1:3, 4:6)

# But you aren't restricted to just two
vec_interleave(1:3, 4:6, 7:9, 10:12)

# You can also interleave data frames
x <- data_frame(x = 1:2, y = c("a", "b"))
y <- data_frame(x = 3:4, y = c("c", "d"))

vec_interleave(x, y)

List checks

Description

These functions have been deprecated as of vctrs 0.6.0.

vec_is_list() has been renamed to obj_is_list().
vec_check_list() has been renamed to obj_check_list().

Usage

vec_is_list(x)

vec_check_list(x, ..., arg = caller_arg(x), call = caller_env())

Arguments

x

For vec_*() functions, an object. For list_*() functions, a list.

...

These dots are for future extensions and must be empty.

arg

An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem.

call

Locate observations matching specified conditions

Description

vec_locate_matches() is a more flexible version of vec_match() used to identify locations where each value of needles matches one or multiple values in haystack. Unlike vec_match(), vec_locate_matches() returns all matches by default, and can match on binary conditions other than equality, such as >, >=, <, and <=.

Usage

vec_locate_matches(
  needles,
  haystack,
  ...,
  condition = "==",
  filter = "none",
  incomplete = "compare",
  no_match = NA_integer_,
  remaining = "drop",
  multiple = "all",
  relationship = "none",
  nan_distinct = FALSE,
  chr_proxy_collate = NULL,
  needles_arg = "needles",
  haystack_arg = "haystack",
  error_call = current_env()
)

Arguments

needles, haystack

Vectors used for matching.

needles represents the vector to search for.
haystack represents the vector to search in.

Prior to comparison, needles and haystack are coerced to the same type.

...

These dots are for future extensions and must be empty.

condition

Condition controlling how needles should be compared against haystack to identify a successful match.

One of: "==", ">", ">=", "<", or "<=".
For data frames, a length 1 or ncol(needles) character vector containing only the above options, specifying how matching is determined for each column.

filter

Filter to be applied to the matched results.

"none" doesn't apply any filter.
"min" returns only the minimum haystack value matching the current needle.
"max" returns only the maximum haystack value matching the current needle.
For data frames, a length 1 or ncol(needles) character vector containing only the above options, specifying a filter to apply to each column.

Filters don't have any effect on "==" conditions, but are useful for computing "rolling" matches with other conditions.

A filter can return multiple haystack matches for a particular needle if the maximum or minimum haystack value is duplicated in haystack. These can be further controlled with multiple.

incomplete

Handling of missing and incomplete values in needles.

"compare" uses condition to determine whether or not a missing value in needles matches a missing value in haystack. If condition is ==, >=, or <=, then missing values will match.
"match" always allows missing values in needles to match missing values in haystack, regardless of the condition.
"drop" drops incomplete values in needles from the result.
"error" throws an error if any needles are incomplete.
If a single integer is provided, this represents the value returned in the haystack column for values of needles that are incomplete. If no_match = NA, setting incomplete = NA forces incomplete values in needles to be treated like unmatched values.

nan_distinct determines whether a NA is allowed to match a NaN.

no_match

Handling of needles without a match.

"drop" drops needles with zero matches from the result.
"error" throws an error if any needles have zero matches.
If a single integer is provided, this represents the value returned in the haystack column for values of needles that have zero matches. The default represents an unmatched needle with NA.

remaining

Handling of haystack values that needles never matched.

"drop" drops remaining haystack values from the result. Typically, this is the desired behavior if you only care when needles has a match.
"error" throws an error if there are any remaining haystack values.
If a single integer is provided (often NA), this represents the value returned in the needles column for the remaining haystack values that needles never matched. Remaining haystack values are always returned at the end of the result.

multiple

Handling of needles with multiple matches. For each needle:

"all" returns all matches detected in haystack.
"any" returns any match detected in haystack with no guarantees on which match will be returned. It is often faster than "first" and "last" if you just need to detect if there is at least one match.
"first" returns the first match detected in haystack.
"last" returns the last match detected in haystack.

relationship

Handling of the expected relationship between needles and haystack. If the expectations chosen from the list below are invalidated, an error is thrown.

"none" doesn't perform any relationship checks.
"one-to-one" expects:
- Each value in needles matches at most 1 value in haystack.
- Each value in haystack matches at most 1 value in needles.
"one-to-many" expects:
- Each value in needles matches any number of values in haystack.
- Each value in haystack matches at most 1 value in needles.
"many-to-one" expects:
- Each value in needles matches at most 1 value in haystack.
- Each value in haystack matches any number of values in needles.
"many-to-many" expects:
- Each value in needles matches any number of values in haystack.
- Each value in haystack matches any number of values in needles.
This performs no checks, and is identical to "none", but is provided to allow you to be explicit about this relationship if you know it exists.
"warn-many-to-many" doesn't assume there is any known relationship, but will warn if needles and haystack have a many-to-many relationship (which is typically unexpected), encouraging you to either take a closer look at your inputs or make this relationship explicit by specifying "many-to-many".

relationship is applied after filter and multiple to allow potential multiple matches to be filtered out first.

relationship doesn't handle cases where there are zero matches. For that, see no_match and remaining.

nan_distinct

A single logical specifying whether or not NaN should be considered distinct from NA for double and complex vectors. If TRUE, NaN will always be ordered between NA and non-missing numbers.

chr_proxy_collate

A function generating an alternate representation of character vectors to use for collation, often used for locale-aware ordering.

If NULL, no transformation is done.
Otherwise, this must be a function of one argument. If the input contains a character vector, it will be passed to this function after it has been translated to UTF-8. This function should return a character vector with the same length as the input. The result should sort as expected in the C-locale, regardless of encoding.

For data frames, chr_proxy_collate will be applied to all character columns.

Common transformation functions include: tolower() for case-insensitive ordering and stringi::stri_sort_key() for locale-aware ordering.

needles_arg, haystack_arg

Argument tags for needles and haystack used in error messages.

error_call

Details

vec_match() is identical to (but often slightly faster than):

vec_locate_matches(
  needles,
  haystack,
  condition = "==",
  multiple = "first",
  nan_distinct = TRUE
)
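
A quick sketch of this correspondence with illustrative inputs. Note that vec_match() returns a bare integer vector, so the comparison is against the haystack column of the data frame returned by vec_locate_matches():

```r
library(vctrs)

needles <- c("b", "z", "a")
haystack <- c("a", "b", "a", "b")

via_match <- vec_match(needles, haystack)

via_locate <- vec_locate_matches(
  needles,
  haystack,
  condition = "==",
  multiple = "first",
  nan_distinct = TRUE
)

via_match
#> [1]  2 NA  1
identical(via_match, via_locate$haystack)
#> [1] TRUE
```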

vec_locate_matches() is extremely similar to a SQL join between needles and haystack, with the default being most similar to a left join.
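
The join analogy can be sketched as follows. The labels "left", "inner", and "full" are informal names chosen for this example, not arguments of the function, and the input values are illustrative:

```r
library(vctrs)

needles <- c(1, 2, 3)
haystack <- c(2, 2, 4)

# Default: every needle is kept, unmatched ones get `NA` (like a left join)
left <- vec_locate_matches(needles, haystack)

# Dropping unmatched needles resembles an inner join
inner <- vec_locate_matches(needles, haystack, no_match = "drop")

# Also reporting unmatched haystack values resembles a full join;
# they are appended at the end with `NA` in the `needles` column
full <- vec_locate_matches(needles, haystack, remaining = NA_integer_)

left$haystack
#> [1] NA  1  2 NA
inner$needles
#> [1] 2 2
full$haystack
#> [1] NA  1  2 NA  3
```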

Be very careful when specifying match conditions. If a condition is misspecified, it is very easy to accidentally generate an exponentially large number of matches.

Value

A two column data frame containing the locations of the matches.

needles is an integer vector containing the location of the needle currently being matched.
haystack is an integer vector containing the location of the corresponding match in the haystack for the current needle.

Dependencies of `vec_locate_matches()`

vec_order_radix()
vec_detect_complete()

Examples

x <- c(1, 2, NA, 3, NaN)
y <- c(2, 1, 4, NA, 1, 2, NaN)

# By default, for each value of `x`, all matching locations in `y` are
# returned
matches <- vec_locate_matches(x, y)
matches

# The result can be used to slice the inputs to align them
data_frame(
  x = vec_slice(x, matches$needles),
  y = vec_slice(y, matches$haystack)
)

# If multiple matches are present, control which is returned with `multiple`
vec_locate_matches(x, y, multiple = "first")
vec_locate_matches(x, y, multiple = "last")
vec_locate_matches(x, y, multiple = "any")

# Use `relationship` to add constraints and error on multiple matches if
# they aren't expected
try(vec_locate_matches(x, y, relationship = "one-to-one"))

# In this case, the `NA` in `y` matches two rows in `x`
try(vec_locate_matches(x, y, relationship = "one-to-many"))

# By default, `NA` is treated as being identical to `NaN`.
# Using `nan_distinct = TRUE` treats `NA` and `NaN` as different values, so
# `NA` can only match `NA`, and `NaN` can only match `NaN`.
vec_locate_matches(x, y, nan_distinct = TRUE)

# If you never want missing values to match, set `incomplete = NA` to return
# `NA` in the `haystack` column anytime there was an incomplete value
# in `needles`.
vec_locate_matches(x, y, incomplete = NA)

# Using `incomplete = NA` allows us to enforce the one-to-many relationship
# that we couldn't before
vec_locate_matches(x, y, relationship = "one-to-many", incomplete = NA)

# `no_match` allows you to specify the returned value for a needle with
# zero matches. Note that this is different from an incomplete value,
# so specifying `no_match` allows you to differentiate between incomplete
# values and unmatched values.
vec_locate_matches(x, y, incomplete = NA, no_match = 0L)

# If you want to require that every `needle` has at least 1 match, set
# `no_match` to `"error"`:
try(vec_locate_matches(x, y, incomplete = NA, no_match = "error"))

# By default, `vec_locate_matches()` detects equality between `needles` and
# `haystack`. Using `condition`, you can detect where an inequality holds
# true instead. For example, to find every location where `x[[i]] >= y`:
matches <- vec_locate_matches(x, y, condition = ">=")

data_frame(
  x = vec_slice(x, matches$needles),
  y = vec_slice(y, matches$haystack)
)

# You can limit which matches are returned with a `filter`. For example,
# with the above example you can filter the matches returned by `x[[i]] >= y`
# down to only the ones containing the maximum `y` value of those matches.
matches <- vec_locate_matches(x, y, condition = ">=", filter = "max")

# Here, the matches for the `3` needle value have been filtered down to
# only include the maximum haystack value of those matches, `2`. This is
# often referred to as a rolling join.
data_frame(
  x = vec_slice(x, matches$needles),
  y = vec_slice(y, matches$haystack)
)

# In the very rare case that you need to generate locations for a
# cross match, where every value of `x` is forced to match every
# value of `y` regardless of what the actual values are, you can
# replace `x` and `y` with integer vectors of the same size that contain
# a single value and match on those instead.
x_proxy <- vec_rep(1L, vec_size(x))
y_proxy <- vec_rep(1L, vec_size(y))
nrow(vec_locate_matches(x_proxy, y_proxy))
vec_size(x) * vec_size(y)

# By default, missing values will match other missing values when using
# `==`, `>=`, or `<=` conditions, but not when using `>` or `<` conditions.
# This is similar to how `vec_compare(x, y, na_equal = TRUE)` works.
x <- c(1, NA)
y <- c(NA, 2)

vec_locate_matches(x, y, condition = "<=")
vec_locate_matches(x, y, condition = "<")

# You can force missing values to match regardless of the `condition`
# by using `incomplete = "match"`
vec_locate_matches(x, y, condition = "<", incomplete = "match")

# You can also use data frames for `needles` and `haystack`. The
# `condition` will be recycled to the number of columns in `needles`, or
# you can specify varying conditions per column. In this example, we take
# a vector of date `values` and find all locations where each value is
# between lower and upper bounds specified by the `haystack`.
values <- as.Date("2019-01-01") + 0:9
needles <- data_frame(lower = values, upper = values)

set.seed(123)
lower <- as.Date("2019-01-01") + sample(10, 10, replace = TRUE)
upper <- lower + sample(3, 10, replace = TRUE)
haystack <- data_frame(lower = lower, upper = upper)

# (values >= lower) & (values <= upper)
matches <- vec_locate_matches(needles, haystack, condition = c(">=", "<="))

data_frame(
  lower = vec_slice(lower, matches$haystack),
  value = vec_slice(values, matches$needles),
  upper = vec_slice(upper, matches$haystack)
)

Locate sorted groups

Description

vec_locate_sorted_groups() returns a data frame containing a key column with sorted unique groups, and a loc column with the locations of each group in x. It is similar to vec_group_loc(), except the groups are returned sorted rather than by first appearance.

Usage

vec_locate_sorted_groups(
  x,
  ...,
  direction = "asc",
  na_value = "largest",
  nan_distinct = FALSE,
  chr_proxy_collate = NULL
)

Arguments

x

A vector

...

These dots are for future extensions and must be empty.

direction

Direction to sort in.

A single "asc" or "desc" for ascending or descending order respectively.
For data frames, a length 1 or ncol(x) character vector containing only "asc" or "desc", specifying the direction for each column.

na_value

Ordering of missing values.

A single "largest" or "smallest" for ordering missing values as the largest or smallest values respectively.
For data frames, a length 1 or ncol(x) character vector containing only "largest" or "smallest", specifying how missing values should be ordered within each column.

nan_distinct

A single logical specifying whether or not NaN should be considered distinct from NA for double and complex vectors. If TRUE, NaN will always be ordered between NA and non-missing numbers.

chr_proxy_collate

A function generating an alternate representation of character vectors to use for collation, often used for locale-aware ordering.

If NULL, no transformation is done.
Otherwise, this must be a function of one argument. If the input contains a character vector, it will be passed to this function after it has been translated to UTF-8. This function should return a character vector with the same length as the input. The result should sort as expected in the C-locale, regardless of encoding.

For data frames, chr_proxy_collate will be applied to all character columns.

Common transformation functions include: tolower() for case-insensitive ordering and stringi::stri_sort_key() for locale-aware ordering.

Details

vec_locate_sorted_groups(x) is equivalent to, but faster than:

info <- vec_group_loc(x)
vec_slice(info, vec_order(info$key))
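
A small sketch checking this equivalence column by column on an illustrative character vector:

```r
library(vctrs)

x <- c("b", "a", "b", "c")

sorted <- vec_locate_sorted_groups(x)

info <- vec_group_loc(x)
manual <- vec_slice(info, vec_order(info$key))

sorted$key
#> [1] "a" "b" "c"
identical(sorted$key, manual$key)
#> [1] TRUE
identical(sorted$loc, manual$loc)
#> [1] TRUE
```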

Value

A two column data frame with size equal to vec_size(vec_unique(x)).

A key column of type vec_ptype(x).
A loc column of type list, with elements of type integer.

Dependencies of `vec_locate_sorted_groups()`

vec_proxy_order()

Examples

df <- data.frame(
  g = sample(2, 10, replace = TRUE),
  x = c(NA, sample(5, 9, replace = TRUE))
)

# `vec_locate_sorted_groups()` is similar to `vec_group_loc()`, except keys
# are returned ordered rather than by first appearance.
vec_locate_sorted_groups(df)

vec_group_loc(df)

Find matching observations across vectors

Description

vec_in() returns a logical vector based on whether needle is found in haystack. vec_match() returns an integer vector giving location of needle in haystack, or NA if it's not found.

Usage

vec_match(
  needles,
  haystack,
  ...,
  na_equal = TRUE,
  needles_arg = "",
  haystack_arg = ""
)

vec_in(
  needles,
  haystack,
  ...,
  na_equal = TRUE,
  needles_arg = "",
  haystack_arg = ""
)

Arguments

needles, haystack

Vector of needles to search for in vector haystack. haystack should usually be unique; if not, vec_match() will only return the location of the first match.

needles and haystack are coerced to the same type prior to comparison.

...

These dots are for future extensions and must be empty.

na_equal

If TRUE, missing values in needles can be matched to missing values in haystack. If FALSE, they propagate: missing values in needles are represented as NA in the return value.

needles_arg, haystack_arg

Argument tags for needles and haystack used in error messages.

Details

vec_in() is equivalent to %in%; vec_match() is equivalent to match().

Value

A vector the same length as needles. vec_in() returns a logical vector; vec_match() returns an integer vector.

Missing values

In most places in R, missing values are not considered to be equal, i.e. NA == NA is not TRUE. The exception is in matching functions like match() and merge(), where an NA will match another NA. By default, vec_match() and vec_in() will match NAs, but you can control this behaviour with the na_equal argument.
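
A minimal sketch of the effect of na_equal, with illustrative inputs that each contain one missing value:

```r
library(vctrs)

needles <- c(NA, 1)
haystack <- c(1, NA)

# Default: the missing needle matches the missing haystack value
vec_match(needles, haystack)
#> [1] 2 1
vec_in(needles, haystack)
#> [1] TRUE TRUE

# With `na_equal = FALSE`, the missing needle propagates as `NA`
vec_match(needles, haystack, na_equal = FALSE)
#> [1] NA  1
vec_in(needles, haystack, na_equal = FALSE)
#> [1]   NA TRUE
```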

Dependencies

vec_cast_common() with fallback
vec_proxy_equal()

Examples

hadley <- strsplit("hadley", "")[[1]]
vec_match(hadley, letters)

vowels <- c("a", "e", "i", "o", "u")
vec_match(hadley, vowels)
vec_in(hadley, vowels)

# Only the first index of duplicates is returned
vec_match(c("a", "b"), c("a", "b", "a", "b"))

Mathematical operations

Description

This generic provides a common dispatch mechanism for all regular unary mathematical functions. It is used as a common wrapper around many of the Summary group generics, the Math group generics, and a handful of other mathematical functions like mean() (but not var() or sd()).

Usage

vec_math(.fn, .x, ...)

vec_math_base(.fn, .x, ...)

Arguments

.fn

A mathematical function from the base package, as a string.

.x

A vector.

...

Additional arguments passed to .fn.

Details

vec_math_base() is provided as a convenience for writing methods. It calls the base .fn on the underlying vec_data().
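
A minimal sketch of how vec_math_base() might be used inside a vec_math() method. The class name example_pct, its constructor, and the rule that mean() returns a bare double are assumptions invented for this illustration; they are not part of vctrs.

```r
library(vctrs)

# Hypothetical class built on a double vector
new_example_pct <- function(x = double()) {
  new_vctr(x, class = "example_pct")
}

# Hypothetical method: mean() returns a bare double, everything else
# is computed on the underlying data and wrapped in the class again
vec_math.example_pct <- function(.fn, .x, ...) {
  switch(.fn,
    mean = vec_math_base("mean", .x, ...),
    new_example_pct(vec_math_base(.fn, .x, ...))
  )
}

x <- new_example_pct(c(0.1, 0.25, 0.4))
m <- vec_math("mean", x)
cs <- vec_math("cumsum", x)
```

Here m is a plain double while cs keeps the example_pct class.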

Included functions

From the Summary group generic: prod(), sum(), any(), all().
From the Math group generic: abs(), sign(), sqrt(), ceiling(), floor(), trunc(), cummax(), cummin(), cumprod(), cumsum(), log(), log10(), log2(), log1p(), acos(), acosh(), asin(), asinh(), atan(), atanh(), exp(), expm1(), cos(), cosh(), cospi(), sin(), sinh(), sinpi(), tan(), tanh(), tanpi(), gamma(), lgamma(), digamma(), trigamma().
Additional generics: mean(), is.nan(), is.finite(), is.infinite().

Note that median() is currently not implemented, and sd() and var() are currently not generic and so do not support custom classes.

Examples

x <- new_vctr(c(1, 2.5, 10))
x

abs(x)
sum(x)
cumsum(x)

Get or set the names of a vector

Description

These functions work like rlang::names2(), names() and names<-(), except that they return or modify the rowwise names of the vector. These are:

The usual names() for atomic vectors and lists
The row names for data frames and matrices
The names of the first dimension for arrays

Rowwise names are size consistent: the length of the names always equals vec_size().

vec_names2() returns the repaired names from a vector, even if it is unnamed. See vec_as_names() for details on name repair.

vec_names() is a bare-bones version that returns NULL if the vector is unnamed.

vec_set_names() sets the names or removes them.

Usage

vec_names2(
  x,
  ...,
  repair = c("minimal", "unique", "universal", "check_unique", "unique_quiet",
    "universal_quiet"),
  quiet = FALSE
)

vec_names(x)

vec_set_names(x, names)

Arguments

x

A vector with names

...

These dots are for future extensions and must be empty.

repair

Minimal names are never NULL or NA. When an element doesn't have a name, its minimal name is an empty string.
Unique names are unique. A suffix is appended to duplicate names to make them unique.
Universal names are unique and syntactic, meaning that you can safely use the names as variables without causing a syntax error.

The "check_unique" option doesn't perform any name repair. Instead, an error is raised if the names don't suit the "unique" criteria.

quiet

By default, the user is informed of any renaming caused by repairing the names. This only concerns unique and universal repairing. Set quiet to TRUE to silence the messages.

Users can silence the name repair messages by setting the "rlib_name_repair_verbosity" global option to "quiet".

names

A character vector, or NULL.

Value

vec_names2() returns the names of x, repaired. vec_names() returns the names of x or NULL if unnamed. vec_set_names() returns x with names updated.

Examples

vec_names2(1:3)
vec_names2(1:3, repair = "unique")
vec_names2(c(a = 1, b = 2))

# `vec_names()` consistently returns the rowwise names of data frames and arrays:
vec_names(data.frame(a = 1, b = 2))
names(data.frame(a = 1, b = 2))
vec_names(mtcars)
names(mtcars)
vec_names(Titanic)
names(Titanic)

vec_set_names(1:3, letters[1:3])
vec_set_names(data.frame(a = 1:3), letters[1:3])

Order and sort vectors

Description

Order and sort vectors

Usage

vec_order(
  x,
  ...,
  direction = c("asc", "desc"),
  na_value = c("largest", "smallest")
)

vec_sort(
  x,
  ...,
  direction = c("asc", "desc"),
  na_value = c("largest", "smallest")
)

Arguments

x

A vector

...

These dots are for future extensions and must be empty.

direction

Direction to sort in. Defaults to ascending.

na_value

Should NAs be treated as the largest or smallest values?

Value

vec_order() returns an integer vector the same size as x.
vec_sort() returns a vector with the same size and type as x.

Differences with `order()`

Unlike the na.last argument of order() which decides the positions of missing values irrespective of the decreasing argument, the na_value argument of vec_order() interacts with direction. If missing values are considered the largest value, they will appear last in ascending order, and first in descending order.
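
The interaction described above can be sketched with a short comparison against base order(). The input vector x is made up for this example.

```r
library(vctrs)

x <- c(1, NA, 3)

# base: NA stays last even when sorting in decreasing order
order(x, decreasing = TRUE)

# vctrs: NA is the "largest" value by default, so it comes first
# in descending order
vec_order(x, direction = "desc")

# Treating NA as the smallest value puts it last in descending order
vec_order(x, direction = "desc", na_value = "smallest")
```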

Dependencies of `vec_order()`

vec_proxy_order()

Dependencies of `vec_sort()`

vec_proxy_order()
vec_order()
vec_slice()

Examples

x <- round(c(runif(9), NA), 3)
vec_order(x)
vec_sort(x)
vec_sort(x, direction = "desc")

# Can also handle data frames
df <- data.frame(g = sample(2, 10, replace = TRUE), x = x)
vec_order(df)
vec_sort(df)
vec_sort(df, direction = "desc")

# Missing values interpreted as largest values are last when
# in increasing order:
vec_order(c(1, NA), na_value = "largest", direction = "asc")
vec_order(c(1, NA), na_value = "largest", direction = "desc")

Proxy and restore

Description

vec_proxy() returns the data structure containing the values of a vector. This data structure is usually the vector itself. In this case the proxy is the identity function, which is the default vec_proxy() method.

Only experts should implement special vec_proxy() methods, for these cases:

A vector has vectorised attributes, i.e. metadata for each element of the vector. These record types are implemented in vctrs by returning a data frame in the proxy method. If you're starting your class from scratch, consider deriving from the rcrd class. It implements the appropriate data frame proxy and is generally the preferred way to create a record class.
When you're implementing a vector on top of a non-vector type, like an environment or an S4 object. This is currently only partially supported.
S3 lists are considered scalars by default. This is the safe choice for list objects such as returned by stats::lm(). To declare that your S3 list class is a vector, you normally add "list" to the right of your class vector. Explicit inheritance from list is generally the preferred way to declare an S3 list in R, for instance it makes it possible to dispatch on generic.list S3 methods.

If you can't modify your class vector, you can implement an identity proxy (i.e. a proxy method that just returns its input) to let vctrs know this is a vector list and not a scalar.
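
A minimal sketch of the scalar versus vector distinction for S3 lists. The class name example_bag is hypothetical and used only for illustration.

```r
library(vctrs)

# An S3 list that does not inherit from "list" is treated as a scalar
bag <- structure(list(1, "a"), class = "example_bag")
vec_is(bag)

# Adding "list" to the right of the class vector declares it a vector
bag_vec <- structure(list(1, "a"), class = c("example_bag", "list"))
vec_is(bag_vec)
vec_size(bag_vec)
```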

vec_restore() is the inverse operation of vec_proxy(). It should only be called on vector proxies.

It undoes the transformations of vec_proxy().
It restores attributes and classes. These may be lost when the memory values are manipulated. For example slicing a subset of a vector's proxy causes a new proxy to be allocated.

By default vctrs restores all attributes and classes automatically. You only need to implement a vec_restore() method if your class has attributes that depend on the data.

Usage

vec_proxy(x, ...)

vec_restore(x, to, ...)

Arguments

x

A vector.

...

These dots are for future extensions and must be empty.

to

The original vector to restore to.

Proxying

You should only implement vec_proxy() when your type is designed around a non-vector class. I.e. anything that is not either:

An atomic vector
A bare list
A data frame

In this case, implement vec_proxy() to return such a vector class. The vctrs operations such as vec_slice() are applied on the proxy and vec_restore() is called to restore the original representation of your type.

The most common case where you need to implement vec_proxy() is for S3 lists. In vctrs, S3 lists are treated as scalars by default. This way we don't treat objects like model fits as vectors. To prevent vctrs from treating your S3 list as a scalar, unclass it in the vec_proxy() method. For instance, here is the definition for list_of:

vec_proxy.vctrs_list_of <- function(x, ...) {
  unclass(x)
}

Another case where you need to implement a proxy is record types. Record types should return a data frame, as in the POSIXlt method:

vec_proxy.POSIXlt <- function(x, ...) {
  new_data_frame(unclass(x))
}

Note that you don't need to implement vec_proxy() when your class inherits from vctrs_vctr or vctrs_rcrd.

Restoring

A restore is a specialised type of cast, primarily used in conjunction with NextMethod() or a C-level function that works on the underlying data structure. A vec_restore() method can make the following assumptions about x:

It has the correct type.
It has the correct names.
It has the correct dim and dimnames attributes.
It is unclassed. This way you can call vctrs generics with x without triggering an infinite loop of restoration.

The length may be different (for example after vec_slice() has been called), and all other attributes may have been lost. The method should restore all attributes so that after restoration, vec_restore(vec_data(x), x) yields x.

To understand the difference between vec_cast() and vec_restore() think about factors: it doesn't make sense to cast an integer to a factor, but if NextMethod() or another low-level function has stripped attributes, you still need to be able to restore them.

The default method copies across all attributes so you only need to provide your own method if your attributes require special care (i.e. they are dependent on the data in some way). When implementing your own method, bear in mind that many R users add attributes to track additional metadata that is important to them, so you should preserve any attributes that don't require special handling for your class.
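
As an illustration of an attribute that depends on the data, the following sketch defines a class that caches the sum of its values. The class example_cached_sum, its constructor new_cached_sum(), and the total attribute are hypothetical names. The sketch assumes it is run at the top level of a script or session so that the method can be found.

```r
library(vctrs)

# Hypothetical class that stores the sum of its values in an attribute
new_cached_sum <- function(x = double()) {
  new_vctr(x, total = sum(x), class = "example_cached_sum")
}

# The cached total depends on the data, so it must be recomputed
# rather than copied from `to`
vec_restore.example_cached_sum <- function(x, to, ...) {
  new_cached_sum(x)
}

x <- new_cached_sum(c(1, 2, 3))
y <- vec_slice(x, 1:2)

attr(x, "total")
attr(y, "total")
```

Without the method, the default restoration would copy the stale total from the original vector.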

Dependencies

x must be a vector in the vctrs sense (see vec_is())
By default the underlying data is returned as is (identity proxy)

All vector classes have a proxy, even those that don't implement any vctrs methods. The exception is S3 lists that don't inherit from "list" explicitly. These might have to implement an identity proxy for compatibility with vctrs (see discussion above).

Comparison and order proxy

Description

vec_proxy_compare() and vec_proxy_order() return proxy objects, i.e. an atomic vector or data frame of atomic vectors.

For vctrs_vctr objects:

vec_proxy_compare() determines the behavior of <, >, >= and <= (via vec_compare()); and min(), max(), median(), and quantile().
vec_proxy_order() determines the behavior of order() and sort() (via xtfrm()).

Usage

vec_proxy_compare(x, ...)

vec_proxy_order(x, ...)

Arguments

x

A vector x.

...

These dots are for future extensions and must be empty.

Details

The default method of vec_proxy_compare() assumes that all classes built on top of atomic vectors or records are comparable. Internally the default calls vec_proxy_equal(). If your class is not comparable, you will need to provide a vec_proxy_compare() method that throws an error.

The behavior of vec_proxy_order() is identical to vec_proxy_compare(), with the exception of lists. Lists are not comparable, as comparing elements of different types is undefined. However, to allow ordering of data frames containing list-columns, the ordering proxy of a list is generated as an integer vector that can be used to order list elements by first appearance.

If a class implements a vec_proxy_compare() method, it usually doesn't need to provide a vec_proxy_order() method, because the latter is implemented by forwarding to vec_proxy_compare() by default. Classes inheriting from list are an exception: due to the default vec_proxy_order() implementation, vec_proxy_compare() and vec_proxy_order() should be provided for such classes (with identical implementations) to avoid mismatches between comparison and sorting.
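
A minimal sketch of a vec_proxy_compare() method, assuming a hypothetical class example_abs whose values are ordered by absolute value. Because the default vec_proxy_order() forwards to vec_proxy_compare(), no separate ordering method is defined. The sketch assumes it is run at the top level of a script or session.

```r
library(vctrs)

# Hypothetical class whose values compare by absolute value
new_example_abs <- function(x = double()) {
  new_vctr(x, class = "example_abs")
}

vec_proxy_compare.example_abs <- function(x, ...) {
  abs(vec_data(x))
}

x <- new_example_abs(c(-3, 1, -2))

# Ordering uses the proxy (3, 1, 2), not the raw values
ord <- vec_order(x)
sorted <- vec_sort(x)
```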

Value

A 1d atomic vector or a data frame.

Dependencies

vec_proxy_equal() called by default in vec_proxy_compare()
vec_proxy_compare() called by default in vec_proxy_order()

Data frames

If the proxy for x is a data frame, the proxy function is automatically recursively applied on all columns as well. After applying the proxy recursively, if there are any data frame columns present in the proxy, then they are unpacked. Finally, if the resulting data frame only has a single column, then it is unwrapped and a vector is returned as the proxy.

Examples

# Lists are not comparable
x <- list(1:2, 1, 1:2, 3)
try(vec_compare(x, x))

# But lists are orderable by first appearance to allow for
# ordering data frames with list-cols
df <- new_data_frame(list(x = x))
vec_sort(df)

Equality proxy

Description

Returns a proxy object (i.e. an atomic vector or data frame of atomic vectors). For vctrs, this determines the behaviour of == and != (via vec_equal()); unique(), duplicated() (via vec_unique() and vec_duplicate_detect()); is.na() and anyNA() (via vec_detect_missing()).

Usage

vec_proxy_equal(x, ...)

Arguments

x

A vector x.

...

These dots are for future extensions and must be empty.

Details

The default method calls vec_proxy(), as the default underlying vector data should be equal-able in most cases. If your class is not equal-able, provide a vec_proxy_equal() method that throws an error.
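
A minimal sketch of a custom equality proxy, assuming a hypothetical class example_ci of case-insensitive strings. The class name and constructor are invented for this illustration, and the sketch assumes it is run at the top level of a script or session.

```r
library(vctrs)

# Hypothetical class of case-insensitive strings
new_example_ci <- function(x = character()) {
  new_vctr(x, class = "example_ci")
}

# Equality is decided on the lower-cased values
vec_proxy_equal.example_ci <- function(x, ...) {
  tolower(vec_data(x))
}

x <- new_example_ci(c("a", "A", "b"))

n_unique <- vec_unique_count(x)
dups <- vec_duplicate_detect(x)
```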

Value

A 1d atomic vector or a data frame.

Data frames

Dependencies

vec_proxy() called by default

Find the prototype of a set of vectors

Description

vec_ptype() returns the unfinalised prototype of a single vector. vec_ptype_common() finds the common type of multiple vectors. vec_ptype_show() nicely prints the common type of any number of inputs, and is designed for interactive exploration.

Usage

vec_ptype(x, ..., x_arg = "", call = caller_env())

vec_ptype_common(..., .ptype = NULL, .arg = "", .call = caller_env())

vec_ptype_show(...)

Arguments

x

A vector

...

For vec_ptype(), these dots are for future extensions and must be empty.

For vec_ptype_common() and vec_ptype_show(), vector inputs.

x_arg

Argument name for x. This is used in error messages to inform the user about the locations of incompatible types.

call, .call

.ptype

If NULL, the default, the output type is determined by computing the common type across all elements of ....

.arg

An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem.

Value

vec_ptype() and vec_ptype_common() return a prototype (a size-0 vector)

`vec_ptype()`

vec_ptype() returns size 0 vectors potentially containing attributes but no data. Generally, this is just vec_slice(x, 0L), but some inputs require special handling.

While you can't slice NULL, the prototype of NULL is itself. This is because we treat NULL as an identity value in the vec_ptype2() monoid.
The prototype of logical vectors that only contain missing values is the special unspecified type, which can be coerced to any other 1d type. This allows bare NAs to represent missing values for any 1d vector type.

See internal-faq-ptype2-identity for more information about identity values.

vec_ptype() is a performance generic. It is not necessary to implement it because the default method will work for any vctrs type. However the default method builds around other vctrs primitives like vec_slice() which incurs performance costs. If your class has a static prototype, you might consider implementing a custom vec_ptype() method that returns a constant. This will improve the performance of your class in many cases (common type imputation in particular).

Because it may contain unspecified vectors, the prototype returned by vec_ptype() is said to be unfinalised. Call vec_ptype_finalise() to finalise it. Commonly you will need the finalised prototype as returned by vec_slice(x, 0L).

`vec_ptype_common()`

vec_ptype_common() first finds the prototype of each input, then successively calls vec_ptype2() to find a common type. It returns a finalised prototype.
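
The difference between unfinalised and finalised prototypes can be sketched as follows, using only bare NA, NULL and an integer as inputs.

```r
library(vctrs)

# A bare NA has the special unspecified prototype
p <- vec_ptype(NA)
class(p)

# Finalising turns the unspecified prototype into logical
vec_ptype_finalise(p)

# vec_ptype_common() finalises for you; the unspecified NA
# adopts the type of the other input
vec_ptype_common(NA)
vec_ptype_common(NA, 1L)

# The prototype of NULL is NULL
vec_ptype(NULL)
```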

Dependencies of `vec_ptype()`

vec_slice() for returning an empty slice

Dependencies of `vec_ptype_common()`

vec_ptype2()
vec_ptype_finalise()

Examples

# Unknown types ------------------------------------------
vec_ptype_show()
vec_ptype_show(NA)
vec_ptype_show(NULL)

# Vectors ------------------------------------------------
vec_ptype_show(1:10)
vec_ptype_show(letters)
vec_ptype_show(TRUE)

vec_ptype_show(Sys.Date())
vec_ptype_show(Sys.time())
vec_ptype_show(factor("a"))
vec_ptype_show(ordered("a"))

# Matrices -----------------------------------------------
# The prototype of a matrix includes the number of columns
vec_ptype_show(array(1, dim = c(1, 2)))
vec_ptype_show(array("x", dim = c(1, 2)))

# Data frames --------------------------------------------
# The prototype of a data frame includes the prototype of
# every column
vec_ptype_show(iris)

# The prototype of multiple data frames includes the prototype
# of every column that appears in any data frame
vec_ptype_show(
  data.frame(x = TRUE),
  data.frame(y = 2),
  data.frame(z = "a")
)

Vector type as a string

Description

vec_ptype_full() displays the full type of the vector. vec_ptype_abbr() provides an abbreviated summary suitable for use in a column heading.

Usage

vec_ptype_full(x, ...)

vec_ptype_abbr(x, ..., prefix_named = FALSE, suffix_shape = TRUE)

Arguments

x

A vector.

...

These dots are for future extensions and must be empty.

prefix_named

If TRUE, add a prefix for named vectors.

suffix_shape

If TRUE (the default), append the shape of the vector.

Value

A string.

S3 dispatch

The default method for vec_ptype_full() uses the first element of the class vector. Override this method if your class has parameters that should be prominently displayed.

The default method for vec_ptype_abbr() abbreviate()s vec_ptype_full() to 8 characters. You should almost always override, aiming for 4-6 characters where possible.

These arguments are handled by the generic and not passed to methods:

prefix_named
suffix_shape
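
A minimal sketch of overriding both methods for a class with a parameter. The class example_money, its constructor, and the currency attribute are hypothetical names, and the sketch assumes it is run at the top level of a script or session.

```r
library(vctrs)

# Hypothetical class with a `currency` parameter
new_example_money <- function(x = double(), currency = "USD") {
  new_vctr(x, currency = currency, class = "example_money")
}

# Show the parameter prominently in the full type
vec_ptype_full.example_money <- function(x, ...) {
  paste0("money<", attr(x, "currency"), ">")
}

# Short label intended for column headings
vec_ptype_abbr.example_money <- function(x, ...) {
  "mny"
}

x <- new_example_money(c(1.5, 20), currency = "EUR")
full <- vec_ptype_full(x)
abbr <- vec_ptype_abbr(x)
```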

Examples

cat(vec_ptype_full(1:10))
cat(vec_ptype_full(iris))

cat(vec_ptype_abbr(1:10))

64 bit integers

Description

An integer64 is a 64-bit integer vector, implemented in the bit64 package.

Usage

## S3 method for class 'integer64'
vec_ptype_full(x, ...)

## S3 method for class 'integer64'
vec_ptype_abbr(x, ...)

## S3 method for class 'integer64'
vec_ptype2(x, y, ...)

## S3 method for class 'integer64'
vec_cast(x, to, ...)

Details

These functions help integrate the integer64 class from bit64 into the vctrs type system by providing coercion and casting functions.

Find the common type for a pair of vectors

Description

vec_ptype2() defines the coercion hierarchy for a set of related vector types. Along with vec_cast(), this generic forms the foundation of type coercions in vctrs.

vec_ptype2() is relevant when you are implementing vctrs methods for your class, but it should not usually be called directly. If you need to find the common type of a set of inputs, call vec_ptype_common() instead. This function supports multiple inputs and finalises the common type.
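
As a brief illustration of the coercion hierarchy for base types, the sketch below calls vec_ptype_common(), as recommended above, rather than vec_ptype2() directly.

```r
library(vctrs)

# logical -> integer -> double form a chain of compatible types
vec_ptype_common(TRUE, 1L)
vec_ptype_common(1L, 2.5)

# double and character have no common type, so an error is thrown
err <- tryCatch(
  vec_ptype_common(1, "a"),
  error = function(e) e
)
class(err)
```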

Usage

## S3 method for class 'logical'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")

## S3 method for class 'integer'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")

## S3 method for class 'double'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")

## S3 method for class 'complex'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")

## S3 method for class 'character'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")

## S3 method for class 'raw'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")

## S3 method for class 'list'
vec_ptype2(x, y, ..., x_arg = "", y_arg = "")

vec_ptype2(
  x,
  y,
  ...,
  x_arg = caller_arg(x),
  y_arg = caller_arg(y),
  call = caller_env()
)

Arguments

x, y

Vector types.

...

These dots are for future extensions and must be empty.

x_arg, y_arg

Argument names for x and y. These are used in error messages to inform the user about the locations of incompatible types (see stop_incompatible_type()).

call

Implementing coercion methods

For an overview of how these generics work and their roles in vctrs, see ?theory-faq-coercion.
For an example of implementing coercion methods for simple vectors, see ?howto-faq-coercion.
For an example of implementing coercion methods for data frame subclasses, see ?howto-faq-coercion-data-frame.
For a tutorial about implementing vctrs classes from scratch, see vignette("s3-vector").

Dependencies

vec_ptype() is applied to x and y

Compute ranks

Description

vec_rank() computes the sample ranks of a vector. For data frames, ranks are computed along the rows, using all columns after the first to break ties.

Usage

vec_rank(
  x,
  ...,
  ties = c("min", "max", "sequential", "dense"),
  incomplete = c("rank", "na"),
  direction = "asc",
  na_value = "largest",
  nan_distinct = FALSE,
  chr_proxy_collate = NULL
)

Arguments

x

A vector

...

These dots are for future extensions and must be empty.

ties

Ranking of duplicate values.

"min": Use the current rank for all duplicates. The next non-duplicate value will have a rank incremented by the number of duplicates present.
"max": Use the current rank + n_duplicates - 1 for all duplicates. The next non-duplicate value will have a rank incremented by the number of duplicates present.
"sequential": Use an increasing sequence of ranks starting at the current rank, applied to duplicates in order of appearance.
"dense": Use the current rank for all duplicates. The next non-duplicate value will have a rank incremented by 1, effectively removing any gaps in the ranking.

incomplete

Ranking of missing and incomplete observations.

"rank": Rank incomplete observations normally. Missing values within incomplete observations will be affected by na_value and nan_distinct.
"na": Don't rank incomplete observations at all. Instead, they are given a rank of NA. In this case, na_value and nan_distinct have no effect.

direction

Direction to sort in.

A single "asc" or "desc" for ascending or descending order respectively.
For data frames, a length 1 or ncol(x) character vector containing only "asc" or "desc", specifying the direction for each column.

na_value

Ordering of missing values.

A single "largest" or "smallest" for ordering missing values as the largest or smallest values respectively.
For data frames, a length 1 or ncol(x) character vector containing only "largest" or "smallest", specifying how missing values should be ordered within each column.

nan_distinct

A single logical specifying whether or not NaN should be considered distinct from NA for double and complex vectors. If TRUE, NaN will always be ordered between NA and non-missing numbers.

chr_proxy_collate

A function generating an alternate representation of character vectors to use for collation, often used for locale-aware ordering.

If NULL, no transformation is done.
Otherwise, this must be a function of one argument. If the input contains a character vector, it will be passed to this function after it has been translated to UTF-8. This function should return a character vector with the same length as the input. The result should sort as expected in the C-locale, regardless of encoding.

For data frames, chr_proxy_collate will be applied to all character columns.

Common transformation functions include: tolower() for case-insensitive ordering and stringi::stri_sort_key() for locale-aware ordering.

Details

Unlike base::rank(), when incomplete = "rank" all missing values are given the same rank, rather than an increasing sequence of ranks. When nan_distinct = FALSE, NaN values are given the same rank as NA, otherwise they are given a rank that differentiates them from NA.
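
The difference from base::rank() for missing values can be sketched with a small made-up vector:

```r
library(vctrs)

x <- c(2, NA, NA)

# base::rank() gives each NA its own increasing rank
base_ranks <- rank(x)

# vec_rank() gives all missing values the same rank
vctrs_ranks <- vec_rank(x)
```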

Like vec_order_radix(), ordering is done in the C-locale. This can affect the ranks of character vectors, especially regarding how uppercase and lowercase letters are ranked. See the documentation of vec_order_radix() for more information.

Dependencies

vec_order_radix()
vec_slice()

Examples

x <- c(5L, 6L, 3L, 3L, 5L, 3L)

vec_rank(x, ties = "min")
vec_rank(x, ties = "max")

# Sequential ranks use an increasing sequence for duplicates
vec_rank(x, ties = "sequential")

# Dense ranks remove gaps between distinct values,
# even if there are duplicates
vec_rank(x, ties = "dense")

y <- c(NA, x, NA, NaN)

# Incomplete values match other incomplete values by default, and their
# overall position can be adjusted with `na_value`
vec_rank(y, na_value = "largest")
vec_rank(y, na_value = "smallest")

# NaN can be ranked separately from NA if required
vec_rank(y, nan_distinct = TRUE)

# Rank in descending order. Since missing values are the largest value,
# they are given a rank of `1` when ranking in descending order.
vec_rank(y, direction = "desc", na_value = "largest")

# Give incomplete values a rank of `NA` by setting `incomplete = "na"`
vec_rank(y, incomplete = "na")

# Can also rank data frames, using columns after the first to break ties
z <- c(2L, 3L, 4L, 4L, 5L, 2L)
df <- data_frame(x = x, z = z)
df

vec_rank(df)

Vector recycling

Description

vec_recycle(x, size) recycles a single vector to a given size. vec_recycle_common(...) recycles multiple vectors to their common size. All functions obey the vctrs recycling rules, and will throw an error if recycling is not possible. See vec_size() for the precise definition of size.
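
A short sketch of vec_recycle() with a single input. Under the vctrs recycling rules, only size 1 inputs are recycled, so the last call is expected to fail.

```r
library(vctrs)

# A size 1 vector is recycled to the requested size
vec_recycle(1L, 3L)

# A vector that already has the requested size is returned unchanged
vec_recycle(1:3, 3L)

# Any other size cannot be recycled and throws an error
err <- tryCatch(vec_recycle(1:2, 3L), error = function(e) e)
```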

Usage

vec_recycle(x, size, ..., x_arg = "", call = caller_env())

vec_recycle_common(..., .size = NULL, .arg = "", .call = caller_env())

Arguments

x

A vector to recycle.

size

Desired output size.

...

Depending on the function used:

For vec_recycle_common(), vectors to recycle.
For vec_recycle(), these dots should be empty.

x_arg

Argument name for x. This is used in error messages to inform the user about which argument has an incompatible size.

call, .call

.size

Desired output size. If omitted, will use the common size from vec_size_common().

.arg

An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem.

Dependencies

vec_slice()

Examples

# Inputs with 1 observation are recycled
vec_recycle_common(1:5, 5)
vec_recycle_common(integer(), 5)
## Not run: 
vec_recycle_common(1:5, 1:2)

## End(Not run)

# Data frames and matrices are recycled along their rows
vec_recycle_common(data.frame(x = 1), 1:5)
vec_recycle_common(array(1:2, c(1, 2)), 1:5)
vec_recycle_common(array(1:3, c(1, 3, 1)), 1:5)

Expand the length of a vector

Description

vec_repeat() has been replaced with vec_rep() and vec_rep_each() and is deprecated as of vctrs 0.3.0.

Usage

vec_repeat(x, each = 1L, times = 1L)

Arguments

x

A vector.

each

Number of times to repeat each element of x.

times

Number of times to repeat the whole vector of x.

Value

A vector the same type as x with size vec_size(x) * times * each.
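
A brief sketch of the replacement functions named above. It avoids calling the deprecated vec_repeat() itself and only shows how its each and times arguments map onto vec_rep_each() and vec_rep().

```r
library(vctrs)

x <- 1:2

# Repeat each element (the role of `each`)
vec_rep_each(x, 2)

# Repeat the whole vector (the role of `times`)
vec_rep(x, 2)

# Combining both: size is vec_size(x) * times * each
out <- vec_rep(vec_rep_each(x, 2), 3)
vec_size(out)
```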

Useful sequences

Description

vec_seq_along() is equivalent to seq_along() but uses size, not length. vec_init_along() creates a vector of missing values with size matching an existing object.

Usage

vec_seq_along(x)

vec_init_along(x, y = x)

Arguments

x, y

Vectors

Value

vec_seq_along() returns an integer vector with the same size as x.
vec_init_along() returns a vector with the same type as x and the same size as y.
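
The difference between size and length can be sketched with a data frame, where seq_along() counts columns and vec_seq_along() counts rows:

```r
library(vctrs)

# seq_along() follows length(): one index per column
seq_along(mtcars)

# vec_seq_along() follows vec_size(): one index per row
vec_seq_along(mtcars)

# Missing values with the type of `x` and the size of `y`
vec_init_along(1L, 1:3)
```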

Examples

vec_seq_along(mtcars)
vec_init_along(head(mtcars))

Number of observations

Description

vec_size(x) returns the size of a vector. vec_is_empty() returns TRUE if the size is zero, FALSE otherwise.

The size is distinct from the length() of a vector because it generalises to the "number of observations" for 2d structures, i.e. it's the number of rows in a matrix or a data frame. This definition has the important property that every column of a data frame (even data frame and matrix columns) has the same size. vec_size_common(...) returns the common size of multiple vectors.

list_sizes() returns an integer vector containing the size of each element of a list. It is nearly equivalent to, but faster than, map_int(x, vec_size), with the exception that list_sizes() will error on non-list inputs, as defined by obj_is_list(). list_sizes() is to vec_size() as lengths() is to length().

Usage

vec_size(x)

vec_size_common(
  ...,
  .size = NULL,
  .absent = 0L,
  .arg = "",
  .call = caller_env()
)

list_sizes(x)

vec_is_empty(x)

Arguments

x, ...

Vector inputs or NULL.

.size

If NULL, the default, the output size is determined by recycling the lengths of all elements of .... Alternatively, you can supply .size to force a known size; in this case, x and ... are ignored.

.absent

The size used when no input is provided, or when all input is NULL. If left as NULL when no input is supplied, an error is thrown.

.arg

An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem.

.call

Details

There is no vctrs helper that retrieves the number of columns, as this is a property of the type.

vec_size() is equivalent to NROW() but has a name that is easier to pronounce, and throws an error when passed non-vector inputs.

Value

An integer (or double for long vectors).

vec_size_common() returns .absent if all inputs are NULL or absent, 0L by default.

Invariants

vec_size(dataframe) == vec_size(dataframe[[i]])
vec_size(matrix) == vec_size(matrix[, i, drop = FALSE])
vec_size(vec_c(x, y)) == vec_size(x) + vec_size(y)
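
The invariants above can be checked on small inputs. The objects df, m, a and b are made up for this sketch.

```r
library(vctrs)

df <- data.frame(x = 1:3, y = letters[1:3])
m <- matrix(1:6, nrow = 2)
a <- 1:3
b <- 4:5

# A data frame has the same size as each of its columns
vec_size(df) == vec_size(df[[1]])

# A matrix has the same size as any of its column slices
vec_size(m) == vec_size(m[, 1, drop = FALSE])

# Sizes add up under combination
vec_size(vec_c(a, b)) == vec_size(a) + vec_size(b)

# By contrast, length() counts columns for the data frame
length(df)
```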

The size of NULL

The size of NULL is hard-coded to 0L in vec_size(). vec_size_common() returns .absent when all inputs are NULL (if only some inputs are NULL, they are simply ignored).

A default size of 0 makes sense because sizes are most often queried in order to compute a total size while assembling a collection of vectors. Since we treat NULL as an absent input by principle, we return the identity of sizes under addition to reflect that an absent input doesn't take up any size.

Note that other defaults might make sense under different circumstances. For instance, a default size of 1 makes sense for finding the common size because 1 is the identity of the recycling rules.
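
A short sketch of how NULL inputs are handled, including a non-default .absent value:

```r
library(vctrs)

# NULL has size 0
vec_size(NULL)

# All inputs NULL: `.absent` is returned (0L by default)
vec_size_common(NULL)
vec_size_common(NULL, .absent = 1L)

# Some inputs NULL: they are ignored
vec_size_common(NULL, 1:3)
```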

Dependencies

vec_proxy()

Examples

vec_size(1:100)
vec_size(mtcars)
vec_size(array(dim = c(3, 5, 10)))

vec_size_common(1:10, 1:10)
vec_size_common(1:10, 1)
vec_size_common(integer(), 1)

list_sizes(list("a", 1:5, letters))

Get or set observations in a vector

Description

This provides a common interface to extracting and modifying observations for all vector types, regardless of dimensionality. It is an analog to [ that matches vec_size() instead of length().

Usage

vec_slice(x, i, ..., error_call = current_env())

vec_slice(x, i) <- value

vec_assign(x, i, value, ..., x_arg = "", value_arg = "")

Arguments

x

A vector

i

...

These dots are for future extensions and must be empty.

error_call

value

Replacement values. value is cast to the type of x, but only if they have a common type. See below for examples of this rule.

x_arg, value_arg

Argument names for x and value. These are used in error messages to inform the user about the locations of incompatible types and sizes (see stop_incompatible_type() and stop_incompatible_size()).

Value

A vector of the same type as x.

Genericity

Support for S3 objects depends on whether the object implements a vec_proxy() method.

When a vec_proxy() method exists, the proxy is sliced and vec_restore() is called on the result.
Otherwise vec_slice() falls back to the base generic [.

Note that S3 lists are treated as scalars by default, and will cause an error if they don't implement a vec_proxy() method.

Differences with base R subsetting

vec_slice() only slices along one dimension. For two-dimensional types, the first dimension is subsetted.
vec_slice() preserves attributes by default.
vec_slice<-() is type-stable and always returns the same type as the LHS.

Dependencies

vctrs dependencies

vec_proxy()
vec_restore()

base dependencies

base::`[`

If a non-data-frame vector class doesn't have a vec_proxy() method, the vector is sliced with [ instead.
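
The differences from base R subsetting listed earlier in this topic can be sketched on a matrix and on a vector with a custom attribute. The units attribute is made up for this example.

```r
library(vctrs)

m <- matrix(1:6, nrow = 2)

# vec_slice() slices rows and keeps the matrix shape
dim(vec_slice(m, 1))

# base `[` drops the dimensions by default
dim(m[1, ])

x <- structure(1:3, units = "cm")

# base `[` drops the custom attribute, vec_slice() keeps it
attr(x[1:2], "units")
attr(vec_slice(x, 1:2), "units")
```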

Examples

x <- sample(10)
x
vec_slice(x, 1:3)

# You can assign with the infix variant:
vec_slice(x, 2) <- 100
x

# Or with the regular variant that doesn't modify the original input:
y <- vec_assign(x, 3, 500)
y
x


# Slicing objects of higher dimension:
vec_slice(mtcars, 1:3)

# Type stability --------------------------------------------------

# The assign variant is type stable. It always returns the same
# type as the input.
x <- 1:5
vec_slice(x, 2) <- 20.0

# `x` is still an integer vector because the RHS was cast to the
# type of the LHS:
vec_ptype(x)

# Compare to `[<-`:
x[2] <- 20.0
vec_ptype(x)


# Note that the types must be coercible for the cast to happen.
# For instance, you can cast a double vector of whole numbers to an
# integer vector:
vec_cast(1, integer())

# But not fractional doubles:
try(vec_cast(1.5, integer()))

# For this reason you can't assign fractional values in an integer
# vector:
x <- 1:3
try(vec_slice(x, 2) <- 1.5)

Split a vector into groups

Description

This is a generalisation of split() that can split by any type of vector, not just factors. Instead of returning the keys in the character names, they are returned in a separate parallel vector.

Usage

vec_split(x, by)

Arguments

x

Vector to divide into groups.

by

Vector whose unique values define the groups.

Value

A data frame with two columns and size equal to vec_size(vec_unique(by)). The key column has the same type as by, and the val column is a list containing elements of type vec_ptype(x).

Note that for complex types, the default data.frame print method will be suboptimal; coerce the result to a tibble to better understand the output.
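To make the shape of the return value concrete, here is a small sketch with made-up inputs. The second half rebuilds the val column from vec_group_loc() and vec_chop(), the two dependencies listed for this function; it is an illustration of how the pieces relate, not a description of the internal implementation.

```r
library(vctrs)

x <- c("a", "b", "c", "d", "e")
by <- c(1, 2, 1, 3, 2)

out <- vec_split(x, by)
out$key  # unique values of `by`, in order of first appearance
out$val  # list holding the elements of `x` that belong to each key

# The same grouping assembled by hand.
loc <- vec_group_loc(by)
manual <- vec_chop(x, indices = loc$loc)
identical(out$val, manual)
```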

Dependencies

vec_group_loc()
vec_chop()

Examples

vec_split(mtcars$cyl, mtcars$vs)
vec_split(mtcars$cyl, mtcars[c("vs", "am")])

if (require("tibble")) {
  as_tibble(vec_split(mtcars$cyl, mtcars[c("vs", "am")]))
  as_tibble(vec_split(mtcars, mtcars[c("vs", "am")]))
}

Deprecated type functions

Description

These functions have been renamed:

vec_type() => vec_ptype()
vec_type2() => vec_ptype2()
vec_type_common() => vec_ptype_common()

Usage

vec_type(x)

vec_type_common(..., .ptype = NULL)

vec_type2(x, y, ...)

Arguments

x, y, ..., .ptype

Arguments for deprecated functions.

Chopping

Description

vec_unchop() has been renamed to list_unchop() and is deprecated as of vctrs 0.5.0.

Usage

vec_unchop(
  x,
  indices = NULL,
  ptype = NULL,
  name_spec = NULL,
  name_repair = c("minimal", "unique", "check_unique", "universal")
)

Arguments

x

A vector

indices

ptype

If NULL, the default, the output type is determined by computing the common type across all elements of x. Alternatively, you can supply ptype to give the output a known type.

name_spec

A name specification for combining inner and outer names. This argument accepts:

A function of two arguments. The outer name is passed as a string to the first argument, and the inner names or positions are passed as the second argument.
An anonymous function as a purrr-style formula.
A glue specification of the form "{outer}_{inner}".
An rlang::zap() object, in which case both outer and inner names are ignored and the result is unnamed.

See the name specification topic.

name_repair

How to repair names, see repair options in vec_as_names().

Value

vec_chop(): A list where each element has the same type as x. The size of the list is equal to vec_size(indices), vec_size(sizes), or vec_size(x) depending on whether or not indices or sizes is provided.
list_unchop(): A vector of type vec_ptype_common(!!!x), or ptype, if specified. The size is computed as vec_size_common(!!!indices) unless the indices are NULL, in which case the size is vec_size_common(!!!x).
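Since vec_unchop() is deprecated, new code would normally call list_unchop() directly. The sketch below uses made-up inputs to show how indices and a glue-style name_spec affect the result.

```r
library(vctrs)

x <- list(1:2, 3:4)

# Without indices, the elements are combined in order.
list_unchop(x)

# With indices, each element is placed at the given locations of the output:
# 1 goes to position 4, 2 to position 1, 3 to position 2, 4 to position 3.
list_unchop(x, indices = list(c(4, 1), c(2, 3)))

# A glue specification combines the outer names with the inner positions.
list_unchop(list(a = 1:2, b = 3:4), name_spec = "{outer}_{inner}")
```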

Find and count unique values

Description

vec_unique(): the unique values. Equivalent to unique().
vec_unique_loc(): the locations of the unique values.
vec_unique_count(): the number of unique values.

Usage

vec_unique(x)

vec_unique_loc(x)

vec_unique_count(x)

Arguments

x

A vector (including a data frame).

Value

vec_unique(): a vector the same type as x containing only unique values.
vec_unique_loc(): an integer vector, giving locations of unique values.
vec_unique_count(): an integer vector of length 1, giving the number of unique values.
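The three return values are related, as the following sketch with made-up data suggests: slicing x at the locations from vec_unique_loc() gives the same values as vec_unique().

```r
library(vctrs)

x <- c(3, 1, 3, 2, 1)

loc <- vec_unique_loc(x)
loc                 # locations where each unique value first appears
vec_slice(x, loc)   # same values as vec_unique(x)
vec_unique_count(x)

# With a data frame, whole rows are compared.
df <- data.frame(a = c(1, 1, 2), b = c("x", "x", "y"))
vec_unique(df)
```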

Dependencies

vec_proxy_equal()

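Missing values

Missing values are normally not considered equal to each other (NA == NA is not TRUE), but these functions treat all missing values as equal when determining uniqueness.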

Examples

x <- rpois(100, 8)
vec_unique(x)
vec_unique_loc(x)
vec_unique_count(x)

# `vec_unique()` returns values in the order that it encounters them.
# Use `sort = "location"` to match the result of `vec_count()`.
head(vec_unique(x))
head(vec_count(x, sort = "location"))

# Normally missing values are not considered to be equal
NA == NA

# But they are for the purposes of considering uniqueness
vec_unique(c(NA, NA, NA, NA, 1, 2, 1))

Repeat a vector

Description

vec_rep() repeats an entire vector a set number of times.
vec_rep_each() repeats each element of a vector a set number of times.
vec_unrep() compresses a vector with repeated values. The repeated values are returned as a key alongside the number of times each key is repeated.

Usage

vec_rep(
  x,
  times,
  ...,
  error_call = current_env(),
  x_arg = "x",
  times_arg = "times"
)

vec_rep_each(
  x,
  times,
  ...,
  error_call = current_env(),
  x_arg = "x",
  times_arg = "times"
)

vec_unrep(x)

Arguments

x

A vector.

times

For vec_rep(), a single integer for the number of times to repeat the entire vector.

For vec_rep_each(), an integer vector of the number of times to repeat each element of x. times will be recycled to the size of x.

...

These dots are for future extensions and must be empty.

error_call

x_arg, times_arg

Argument names for errors.

Details

Using vec_unrep() and vec_rep_each() together is similar to using base::rle() and base::inverse.rle(). The following invariant shows the relationship between the two functions:

compressed <- vec_unrep(x)
identical(x, vec_rep_each(compressed$key, compressed$times))

There are two main differences between vec_unrep() and base::rle():

vec_unrep() treats adjacent missing values as equivalent, while rle() treats them as different values.
vec_unrep() works along the size of x, while rle() works along its length. This means that vec_unrep() works on data frames by compressing repeated rows.
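The second difference can be sketched with a small, made-up data frame: runs of identical adjacent rows are compressed into one key row each, and vec_rep_each() expands them again.

```r
library(vctrs)

df <- data.frame(
  x = c(1, 1, 2, 2, 2),
  y = c("a", "a", "b", "b", "c")
)

out <- vec_unrep(df)
out$key    # one row per run of identical adjacent rows
out$times  # length of each run

# Expanding the keys again gives back the original rows.
roundtrip <- vec_rep_each(out$key, out$times)
all(vec_equal(roundtrip, df))
```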

Value

For vec_rep(), a vector the same type as x with size vec_size(x) * times.

For vec_rep_each(), a vector the same type as x with size sum(vec_recycle(times, vec_size(x))).

For vec_unrep(), a data frame with two columns, key and times. key is a vector with the same type as x, and times is an integer vector.

Dependencies

vec_slice()

Examples

# Repeat the entire vector
vec_rep(1:2, 3)

# Repeat each element of the vector
vec_rep_each(1:2, 3)
x <- vec_rep_each(1:2, c(3, 4))
x

# After using `vec_rep_each()`, you can recover the original vector
# with `vec_unrep()`
vec_unrep(x)

df <- data.frame(x = 1:2, y = 3:4)

# `rep()` repeats columns of data frames, and returns lists
rep(df, each = 2)

# `vec_rep()` and `vec_rep_each()` repeat rows, and return data frames
vec_rep(df, 2)
vec_rep_each(df, 2)

# `rle()` treats adjacent missing values as different
y <- c(1, NA, NA, 2)
rle(y)

# `vec_unrep()` treats them as equivalent
vec_unrep(y)

Set operations

Description

vec_set_intersect() returns all values in both x and y.
vec_set_difference() returns all values in x but not y. Note that this is an asymmetric set difference, meaning it is not commutative.
vec_set_union() returns all values in either x or y.
vec_set_symmetric_difference() returns all values in either x or y but not both. This is a commutative difference.

Because these are set operations, these functions only return unique values from x and y, returned in the order they first appeared in the original input. Names of x and y are retained on the result, but names are always taken from x if the value appears in both inputs.

These functions work similarly to intersect(), setdiff(), and union(), but don't strip attributes and can be used with data frames.
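As a brief illustration of the ordering and naming rules above, consider two made-up named vectors. Duplicates are dropped, values keep the order of their first appearance, and when a value occurs in both inputs its name comes from the first argument.

```r
library(vctrs)

x <- c(a = 1, b = 2, c = 2)
y <- c(d = 2, e = 3, f = 1)

# 1 and 2 come from `x` (named "a" and "b"); 3 only appears in `y`.
vec_set_union(x, y)

# Swapping the arguments changes both the order and the names used.
vec_set_intersect(y, x)
```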

Usage

vec_set_intersect(
  x,
  y,
  ...,
  ptype = NULL,
  x_arg = "x",
  y_arg = "y",
  error_call = current_env()
)

vec_set_difference(
  x,
  y,
  ...,
  ptype = NULL,
  x_arg = "x",
  y_arg = "y",
  error_call = current_env()
)

vec_set_union(
  x,
  y,
  ...,
  ptype = NULL,
  x_arg = "x",
  y_arg = "y",
  error_call = current_env()
)

vec_set_symmetric_difference(
  x,
  y,
  ...,
  ptype = NULL,
  x_arg = "x",
  y_arg = "y",
  error_call = current_env()
)

Arguments

x, y

A pair of vectors.

...

These dots are for future extensions and must be empty.

ptype

If NULL, the default, the output type is determined by computing the common type between x and y. If supplied, both x and y will be cast to this type.

x_arg, y_arg

Argument names for x and y. These are used in error messages.

error_call

Details

Missing values are treated as equal to other missing values. For doubles and complexes, NaN values are equal to other NaN values, but not to NA.
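A short sketch of this rule with made-up double vectors:

```r
library(vctrs)

x <- c(NA, 1, NaN)
y <- c(NaN, 2, NA)

# NA matches NA and NaN matches NaN, so both are in the intersection.
both <- vec_set_intersect(x, y)
both

# NaN is not equal to NA, so removing NA leaves NaN in place.
rest <- vec_set_difference(x, NA_real_)
rest
```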

Value

A vector of the common type of x and y (or ptype, if supplied) containing the result of the corresponding set function.

Dependencies

Examples

x <- c(1, 2, 1, 4, 3)
y <- c(2, 5, 5, 1)

# All unique values in both `x` and `y`.
# Duplicates in `x` and `y` are always removed.
vec_set_intersect(x, y)

# All unique values in `x` but not `y`
vec_set_difference(x, y)

# All unique values in either `x` or `y`
vec_set_union(x, y)

# All unique values in either `x` or `y` but not both
vec_set_symmetric_difference(x, y)

# These functions can also be used with data frames
x <- data_frame(
  a = c(2, 3, 2, 2),
  b = c("j", "k", "j", "l")
)
y <- data_frame(
  a = c(1, 2, 2, 2, 3),
  b = c("j", "l", "j", "l", "j")
)

vec_set_intersect(x, y)
vec_set_difference(x, y)
vec_set_union(x, y)
vec_set_symmetric_difference(x, y)

# Vector names don't affect set membership, but if you'd like to force
# them to, you can transform the vector into a two column data frame
x <- c(a = 1, b = 2, c = 2, d = 3)
y <- c(c = 2, b = 1, a = 3, d = 3)

vec_set_intersect(x, y)

x <- data_frame(name = names(x), value = unname(x))
y <- data_frame(name = names(y), value = unname(y))

vec_set_intersect(x, y)

Vector checks

Description

obj_is_vector() tests if x is considered a vector in the vctrs sense. See Vectors and scalars below for the exact details.
obj_check_vector() uses obj_is_vector() and throws a standardized and informative error if it returns FALSE.
vec_check_size() tests if x has size size, and throws an informative error if it doesn't.

Usage

obj_is_vector(x)

obj_check_vector(x, ..., arg = caller_arg(x), call = caller_env())

vec_check_size(x, size, ..., arg = caller_arg(x), call = caller_env())

Arguments

x

For obj_*() functions, an object. For vec_*() functions, a vector.

...

These dots are for future extensions and must be empty.

arg

An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem.

call

size

The size to check for.

Value

obj_is_vector() returns a single TRUE or FALSE.
obj_check_vector() returns NULL invisibly, or errors.
vec_check_size() returns NULL invisibly, or errors.
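Because both checking functions return NULL invisibly on success, they are typically called for their side effect at the top of a function. The helper below, weighted(), is hypothetical and not part of vctrs; it is only a sketch of that pattern.

```r
library(vctrs)

# Hypothetical helper, not part of vctrs: multiplies `x` by per-element weights.
weighted <- function(x, weights) {
  obj_check_vector(x)
  vec_check_size(weights, size = vec_size(x))
  x * weights
}

weighted(1:3, c(0.5, 1, 2))

# A size mismatch is reported as an error.
try(weighted(1:3, c(0.5, 1)))

# NULL is not a vector, so the first check fails.
try(weighted(NULL, numeric()))
```

Note that vec_check_size() requires an exact match, so a size 1 weights argument would also be rejected here unless x itself has size 1.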

Vectors and scalars

Informally, a vector is a collection that makes sense to use as a column in a data frame. The following rules define whether or not x is considered a vector.

If no vec_proxy() method has been registered, x is a vector if:

The base type of the object is atomic: "logical", "integer", "double", "complex", "character", or "raw".
x is a list, as defined by obj_is_list().
x is a data.frame.

If a vec_proxy() method has been registered, x is a vector if:

The proxy satisfies one of the above conditions.
The base type of the proxy is "list", regardless of its class. S3 lists are thus treated as scalars unless they implement a vec_proxy() method.

Otherwise an object is treated as scalar and cannot be used as a vector. In particular:

NULL is not a vector.
S3 lists like lm objects are treated as scalars by default.
Objects of type expression are not treated as vectors.
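The rules above can be checked directly with obj_is_vector(). This sketch uses the built-in mtcars data and a throwaway linear model purely as examples.

```r
library(vctrs)

# Treated as vectors: atomic types, bare lists, and data frames.
obj_is_vector(1:3)
obj_is_vector(list(1, "a"))
obj_is_vector(mtcars)

# Treated as scalars: NULL, expressions, functions, and S3 lists such as
# lm objects that don't implement a vec_proxy() method.
obj_is_vector(NULL)
obj_is_vector(expression(x + 1))
obj_is_vector(mean)
obj_is_vector(lm(mpg ~ wt, data = mtcars))
```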

Technical limitations

Support for S4 vectors is currently limited to objects that inherit from an atomic type.
Subclasses of data.frame that append their class to the back of the "class" attribute are not treated as vectors. If you inherit from an S3 class, always prepend your class to the front of the "class" attribute for correct dispatch. This matches our general principle of allowing subclasses but not mixins.

Examples

obj_is_vector(1)

# Data frames are vectors
obj_is_vector(data_frame())

# Bare lists are vectors
obj_is_vector(list())

# S3 lists are vectors if they explicitly inherit from `"list"`
x <- structure(list(), class = c("my_list", "list"))
obj_is_list(x)
obj_is_vector(x)

# But if they don't explicitly inherit from `"list"`, they aren't
# automatically considered to be vectors. Instead, vctrs considers this
# to be a scalar object, like a linear model returned from `lm()`.
y <- structure(list(), class = "my_list")
obj_is_list(y)
obj_is_vector(y)

# `obj_check_vector()` throws an informative error if the input
# isn't a vector
try(obj_check_vector(y))

# `vec_check_size()` throws an informative error if the size of the
# input doesn't match `size`
vec_check_size(1:5, size = 5)
try(vec_check_size(1:5, size = 4))
