Title: | Read and Write Rectangular Text Data Quickly |
Version: | 1.6.5 |
Description: | The goal of 'vroom' is to read and write data (like 'csv', 'tsv' and 'fwf') quickly. When reading, it uses a quick initial indexing step, then reads the values lazily, so only the data you actually use needs to be read. The writer formats the data in parallel and writes to disk asynchronously from formatting. |
License: | MIT + file LICENSE |
URL: | https://vroom.r-lib.org, https://github.com/tidyverse/vroom |
BugReports: | https://github.com/tidyverse/vroom/issues |
Depends: | R (≥ 3.6) |
Imports: | bit64, cli (≥ 3.2.0), crayon, glue, hms, lifecycle (≥ 1.0.3), methods, rlang (≥ 0.4.2), stats, tibble (≥ 2.0.0), tidyselect, tzdb (≥ 0.1.1), vctrs (≥ 0.2.0), withr |
Suggests: | archive, bench (≥ 1.1.0), covr, curl, dplyr, forcats, fs, ggplot2, knitr, patchwork, prettyunits, purrr, rmarkdown, rstudioapi, scales, spelling, testthat (≥ 2.1.0), tidyr, utils, waldo, xml2 |
LinkingTo: | cpp11 (≥ 0.2.0), progress (≥ 1.2.1), tzdb (≥ 0.1.1) |
VignetteBuilder: | knitr |
Config/Needs/website: | nycflights13, tidyverse/tidytemplate |
Config/testthat/edition: | 3 |
Config/testthat/parallel: | false |
Copyright: | file COPYRIGHTS |
Encoding: | UTF-8 |
Language: | en-US |
RoxygenNote: | 7.2.3.9000 |
NeedsCompilation: | yes |
Packaged: | 2023-12-05 16:46:59 UTC; jenny |
Author: | Jim Hester |
Maintainer: | Jennifer Bryan <jenny@posit.co> |
Repository: | CRAN |
Date/Publication: | 2023-12-05 23:50:02 UTC |
vroom: Read and Write Rectangular Text Data Quickly
Description
The goal of 'vroom' is to read and write data (like 'csv', 'tsv' and 'fwf') quickly. When reading, it uses a quick initial indexing step, then reads the values lazily, so only the data you actually use needs to be read. The writer formats the data in parallel and writes to disk asynchronously from formatting.
Author(s)
Maintainer: Jennifer Bryan jenny@posit.co (ORCID)
Authors:
Jim Hester (ORCID)
Hadley Wickham hadley@posit.co (ORCID)
Other contributors:
Shelby Bearrows [contributor]
https://github.com/mandreyel/ (mio library) [copyright holder]
Jukka Jylänki (grisu3 implementation) [copyright holder]
Mikkel Jørgensen (grisu3 implementation) [copyright holder]
Posit Software, PBC [copyright holder, funder]
See Also
Useful links:
- https://vroom.r-lib.org
- https://github.com/tidyverse/vroom
- Report bugs at https://github.com/tidyverse/vroom/issues
Coerce to a column specification
Description
This is most useful for generating a specification using the short form or coercing from a list.
Usage
as.col_spec(x)
Arguments
x |
Input object |
Examples
as.col_spec("cccnnn")
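as.col_spec() also accepts a named list, as the description notes. The sketch below assumes the list elements use the same short names as cols(); the column names a and b are made up for illustration.

```r
library(vroom)

# Coerce a named list of short type codes into a full col_spec object
spec_from_list <- as.col_spec(list(a = "i", b = "c"))
spec_from_list

# Each short code has been turned into a collector for its column
class(spec_from_list$cols$a)
```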
Create column specification
Description
cols() includes all columns in the input data, guessing the column types as the default. cols_only() includes only the columns you explicitly specify, skipping the rest.
Usage
cols(..., .default = col_guess(), .delim = NULL)
cols_only(...)
col_logical(...)
col_integer(...)
col_big_integer(...)
col_double(...)
col_character(...)
col_skip(...)
col_number(...)
col_guess(...)
col_factor(levels = NULL, ordered = FALSE, include_na = FALSE, ...)
col_datetime(format = "", ...)
col_date(format = "", ...)
col_time(format = "", ...)
Arguments
... |
Either column objects created by |
.default |
Any named columns not explicitly overridden in |
.delim |
The delimiter to use when parsing. If the |
levels |
Character vector of the allowed levels. When |
ordered |
Is it an ordered factor? |
include_na |
If |
format |
A format specification, as described below. If set to "", date times are parsed as ISO8601, and dates and times use the date and time formats specified in the locale. Unlike |
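A short sketch of the format and levels arguments in use. The inline data and column names (when, size) are invented for illustration; the format string uses %m/%d/%Y for month/day/year.

```r
library(vroom)

# Literal CSV data: US-style dates and a categorical column
raw <- I("when,size\n03/15/2021,small\n12/01/2020,large\n")

df <- vroom(raw, delim = ",", col_types = cols(
  # An explicit format string instead of the locale's default date format
  when = col_date(format = "%m/%d/%Y"),
  # Fix the factor levels (and their order) up front
  size = col_factor(levels = c("small", "large"))
))
df
```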
Details
The available specifications are listed below, with their long names (in quotes) and short names (single-character string abbreviations):
function | long name | short name | description |
col_logical() | "logical" | "l" | Logical values containing only T, F, TRUE or FALSE. |
col_integer() | "integer" | "i" | Integer numbers. |
col_big_integer() | "big_integer" | "I" | Big integers (64-bit); requires the bit64 package. |
col_double() | "double", "numeric" | "d" | 64-bit double floating point numbers. |
col_character() | "character" | "c" | Character string data. |
col_factor(levels, ordered) | "factor" | "f" | A fixed set of values. |
col_date(format = "") | "date" | "D" | Calendar dates formatted with the locale's date_format. |
col_time(format = "") | "time" | "t" | Times formatted with the locale's time_format. |
col_datetime(format = "") | "datetime", "POSIXct" | "T" | ISO8601 date times. |
col_number() | "number" | "n" | Human-readable numbers containing the grouping_mark. |
col_skip() | "skip", "NULL" | "_", "-" | Skip and don't import this column. |
col_guess() | "guess", "NA" | "?" | Parse using the "best" guessed type based on the input. |
Examples
cols(a = col_integer())
cols_only(a = col_integer())
# You can also use the standard abbreviations
cols(a = "i")
cols(a = "i", b = "d", c = "_")
# Or long names (like utils::read.csv)
cols(a = "integer", b = "double", c = "skip")
# You can also use multiple sets of column definitions by combining
# them like so:
t1 <- cols(
column_one = col_integer(),
column_two = col_number())
t2 <- cols(
column_three = col_character())
t3 <- t1
t3$cols <- c(t1$cols, t2$cols)
t3
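To see the practical difference between cols() and cols_only(), the sketch below reads the same made-up literal data with each. It assumes only the behaviour described above: cols() keeps every column, while cols_only() drops anything not listed.

```r
library(vroom)

data <- I("a,b,c\n1,x,2.5\n2,y,3.5\n")

# cols(): all three columns are read; only `a` has an explicit type
all_cols <- vroom(data, delim = ",", col_types = cols(a = col_integer()))

# cols_only(): columns that are not listed are skipped
only_a <- vroom(data, delim = ",", col_types = cols_only(a = col_integer()))

names(all_cols)
names(only_a)
```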
Examine the column specifications for a data frame
Description
cols_condense() takes a spec object and condenses its definition by setting the default column type to the most frequent type and only listing columns with a different type.
spec() extracts the full column specification from a tibble created by vroom (or readr).
Usage
cols_condense(x)
spec(x)
Arguments
x |
The data frame object to extract from |
Value
A col_spec object.
Examples
df <- vroom(vroom_example("mtcars.csv"))
s <- spec(df)
s
cols_condense(s)
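A common use of spec() is to capture the guessed types once and pass them back through col_types, so later reads skip guessing and stay consistent. This is a minimal sketch using the bundled mtcars.csv example.

```r
library(vroom)

path <- vroom_example("mtcars.csv")

# Let vroom guess the types once, without printing the specification
first <- vroom(path, show_col_types = FALSE)
s <- spec(first)

# Pass the captured spec back in: no guessing, and the same types every time
second <- vroom(path, col_types = s)
```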
Create or retrieve date names
Description
When parsing dates, you often need to know how days of the week and months are represented as text. These functions allow you to either create your own names or retrieve them from a standard list. The standard list is derived from ICU (https://site.icu-project.org) via the stringi package.
Usage
date_names(mon, mon_ab = mon, day, day_ab = day, am_pm = c("AM", "PM"))
date_names_lang(language)
date_names_langs()
Arguments
mon , mon_ab |
Full and abbreviated month names. |
day , day_ab |
Full and abbreviated week day names. Starts with Sunday. |
am_pm |
Names used for AM and PM. |
language |
A BCP 47 locale, made up of a language and a region,
e.g. |
Examples
date_names_lang("en")
date_names_lang("ko")
date_names_lang("fr")
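The names returned by date_names_lang() are what the %B directive matches when parsing. The sketch below assumes locale("fr") picks up the French names shown above; the input line is built from those names rather than typed by hand, so it does not depend on their exact spelling.

```r
library(vroom)

fr <- date_names_lang("fr")
fr$mon[1]

# Build a one-column input that uses the French name for January
raw <- I(paste0("d\n05 ", fr$mon[1], " 2020\n"))

# %B matches full month names taken from the locale's date names
df <- vroom(raw, delim = ",", col_types = cols(d = col_date("%d %B %Y")),
            locale = locale("fr"))
df$d
```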
Generate a random tibble
Description
This is useful for benchmarking, but also for bug reports when you cannot share the real dataset.
Usage
gen_tbl(
rows,
cols = NULL,
col_types = NULL,
locale = default_locale(),
missing = 0
)
Arguments
rows |
Number of rows to generate |
cols |
Number of columns to generate, if |
col_types |
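Column types, either as a column specification (created by cols() or a list of column objects) or as a compact string where each character represents one column, using the short names listed for cols() (for example "dddd" for four double columns). If NULL, random column types are used. |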
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
missing |
The proportion (from 0 to 1) of missing values to generate |
Details
There is also a family of functions to generate individual vectors of each type.
See Also
generators to generate individual vectors.
Examples
# random 10 x 5 table with random column types
rand_tbl <- gen_tbl(10, 5)
rand_tbl
# all double 25 x 4 table
dbl_tbl <- gen_tbl(25, 4, col_types = "dddd")
dbl_tbl
# Use the dots in long form column types to change the random function and options
types <- rep(times = 4, list(col_double(f = stats::runif, min = -10, max = 25)))
types
dbl_tbl2 <- gen_tbl(25, 4, col_types = types)
dbl_tbl2
Generate individual vectors of the types supported by vroom
Description
Generate individual vectors of the types supported by vroom
Usage
gen_character(n, min = 5, max = 25, values = c(letters, LETTERS, 0:9), ...)
gen_double(n, f = stats::rnorm, ...)
gen_number(n, f = stats::rnorm, ...)
gen_integer(n, min = 1L, max = .Machine$integer.max, prob = NULL, ...)
gen_factor(
n,
levels = NULL,
ordered = FALSE,
num_levels = gen_integer(1L, 1L, 25L),
...
)
gen_time(n, min = 0, max = hms::hms(days = 1), fractional = FALSE, ...)
gen_date(n, min = as.Date("2001-01-01"), max = as.Date("2021-01-01"), ...)
gen_datetime(
n,
min = as.POSIXct("2001-01-01"),
max = as.POSIXct("2021-01-01"),
tz = "UTC",
...
)
gen_logical(n, ...)
gen_name(n)
Arguments
n |
The size of the vector to generate |
min |
The minimum range for the vector |
max |
The maximum range for the vector |
values |
The explicit values to use. |
... |
Additional arguments passed to internal generation functions |
f |
The random function to use. |
prob |
a vector of probability weights for obtaining the elements of the vector being sampled. |
levels |
The explicit levels to use, if |
ordered |
Should the factors be ordered factors? |
num_levels |
The number of factor levels to generate |
fractional |
Whether to generate times with fractional seconds |
tz |
The time zone to use for the generated date-times |
Examples
# characters
gen_character(4)
# factors
gen_factor(4)
# logical
gen_logical(4)
# numbers
gen_double(4)
gen_integer(4)
# temporal data
gen_time(4)
gen_date(4)
gen_datetime(4)
Guess the type of a vector
Description
Guess the type of a vector
Usage
guess_type(
x,
na = c("", "NA"),
locale = default_locale(),
guess_integer = FALSE
)
Arguments
x |
Character vector of values to parse. |
na |
Character vector of strings to interpret as missing values. Set this
option to |
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
guess_integer |
If |
Examples
# Logical vectors
guess_type(c("FALSE", "TRUE", "F", "T"))
# Integers and doubles
guess_type(c("1","2","3"))
guess_type(c("1.6","2.6","3.4"))
# Numbers containing grouping mark
guess_type("1,234,566")
# ISO 8601 date times
guess_type(c("2010-10-10"))
guess_type(c("2010-10-10 01:02:03"))
guess_type(c("01:02:03 AM"))
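The guess_integer and na arguments change what guess_type() reports. A small sketch; the return values are column collectors like those created by col_double() and col_integer().

```r
library(vroom)

# By default, whole numbers are guessed as doubles...
as_default <- guess_type(c("1", "2", "3"))
# ...unless guess_integer = TRUE
as_integer <- guess_type(c("1", "2", "3"), guess_integer = TRUE)
# Strings listed in `na` ("" and "NA" by default) are ignored while guessing
with_missing <- guess_type(c("NA", "1.5", ""))

as_default
as_integer
with_missing
```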
Create locales
Description
A locale object tries to capture all the defaults that can vary between countries. You set the locale once, and the details are automatically passed down to the column parsers. The defaults have been chosen to match R (i.e. US English) as closely as possible. See vignette("locales") for more details.
Usage
locale(
date_names = "en",
date_format = "%AD",
time_format = "%AT",
decimal_mark = ".",
grouping_mark = ",",
tz = "UTC",
encoding = "UTF-8"
)
default_locale()
Arguments
date_names |
Character representations of day and month names. Either
the language code as string (passed on to |
date_format , time_format |
Default date and time formats. |
decimal_mark , grouping_mark |
Symbols used to indicate the decimal
place, and to chunk larger numbers. Decimal mark can only be |
tz |
Default tz. This is used both for input (if the time zone isn't present in individual strings), and for output (to control the default display). The default is to use "UTC", a time zone that does not use daylight savings time (DST) and hence is typically most useful for data. The absence of time zones makes it approximately 50x faster to generate UTC times than any other time zone. Use For a complete list of possible time zones, see |
encoding |
Default encoding. |
Examples
locale()
locale("fr")
# South American locale
locale("es", decimal_mark = ",")
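A decimal_mark of "," matters when reading data written in many European conventions. This sketch assumes semicolon-separated literal data, made up for illustration.

```r
library(vroom)

# Semicolon-separated data with commas as decimal marks
raw <- I("x;y\n1,5;2\n3,25;4\n")

eu <- vroom(raw, delim = ";", col_types = "dd",
            locale = locale(decimal_mark = ","))
eu
```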
Preprocess column for output
Description
This is a generic function that is applied to each column before it is saved to disk. It provides a hook for S3 classes that need special handling.
Usage
output_column(x)
Arguments
x |
A vector |
Examples
# Most types are returned unchanged
output_column(1)
output_column("x")
# datetimes are formatted in ISO 8601
output_column(Sys.Date())
output_column(Sys.time())
Retrieve parsing problems
Description
vroom will only fail to parse a file if the file is invalid in a way that is unrecoverable. However, there are a number of non-fatal problems that you might want to know about. You can retrieve a data frame of these problems with this function.
Usage
problems(x = .Last.value, lazy = FALSE)
Arguments
x |
A data frame from |
lazy |
If |
Value
A data frame with one row for each problem and five columns:
row, col - Row and column number that caused the problem, referencing the original input
expected - What vroom expected to find
actual - What it actually found
file - The file with the problem
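A minimal sketch of how a parsing problem surfaces: the literal data below has one value that cannot be read as an integer. vroom is expected to warn, return NA for that cell, and record the failure for problems().

```r
library(vroom)

# The last value cannot be parsed as an integer
df <- vroom(I("x\n1\n2\nabc\n"), delim = ",", col_types = "i")

# The unparseable value becomes NA...
df$x
# ...and is recorded as a problem
probs <- problems(df)
probs
```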
Objects exported from other packages
Description
These objects are imported from other packages and re-exported by vroom. See the tidyselect documentation for details.
- tidyselect: contains, ends_with, everything, last_col, matches, num_range, one_of, starts_with
Read a delimited file into a tibble
Description
Read a delimited file into a tibble
Usage
vroom(
file,
delim = NULL,
col_names = TRUE,
col_types = NULL,
col_select = NULL,
id = NULL,
skip = 0,
n_max = Inf,
na = c("", "NA"),
quote = "\"",
comment = "",
skip_empty_rows = TRUE,
trim_ws = TRUE,
escape_double = TRUE,
escape_backslash = FALSE,
locale = default_locale(),
guess_max = 100,
altrep = TRUE,
altrep_opts = deprecated(),
num_threads = vroom_threads(),
progress = vroom_progress(),
show_col_types = NULL,
.name_repair = "unique"
)
Arguments
file |
Either a path to a file, a connection, or literal data (either a
single string or a raw vector). Files ending in Literal data is most useful for examples and tests. To be recognised as
literal data, wrap the input with |
delim |
One or more characters used to delimit fields within a
file. If |
col_names |
Either If If Missing ( |
col_types |
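Column types, either as a column specification (created by cols() or a list of column objects) or as a compact string where each character represents one column, using the short names listed for cols() (for example "dc" for a double column followed by a character column). If NULL, all column types are guessed. |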
col_select |
Columns to include in the results. You can use the same
mini-language as |
id |
Either a string or 'NULL'. If a string, the output will contain a variable with that name with the filename(s) as the value. If 'NULL', the default, no variable will be created. |
skip |
Number of lines to skip before reading data. If |
n_max |
Maximum number of lines to read. |
na |
Character vector of strings to interpret as missing values. Set this
option to |
quote |
Single character used to quote strings. |
comment |
A string used to identify comments. Any text after the comment characters will be silently ignored. |
skip_empty_rows |
Should blank rows be ignored altogether? i.e. If this
option is |
trim_ws |
Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it? |
escape_double |
Does the file escape quotes by doubling them?
i.e. If this option is |
escape_backslash |
Does the file use backslashes to escape special
characters? This is more general than |
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
guess_max |
Maximum number of lines to use for guessing column types.
See |
altrep |
Control which column types use Altrep representations,
either a character vector of types, |
altrep_opts |
|
num_threads |
Number of threads to use when reading and materializing vectors. If your data contains newlines within fields the parser will automatically be forced to use a single thread only. |
progress |
Display a progress bar? By default it will only display
in an interactive session and not while knitting a document. The automatic
progress bar can be disabled by setting option |
show_col_types |
Control showing the column specifications. If |
.name_repair |
Handling of column names. The default behaviour is to
ensure column names are
This argument is passed on as |
Examples
# get path to example file
input_file <- vroom_example("mtcars.csv")
input_file
# Input sources -------------------------------------------------------------
# Read from a path
vroom(input_file)
# You can also use paths directly
# vroom("mtcars.csv")
## Not run:
# Including remote paths
vroom("https://github.com/tidyverse/vroom/raw/main/inst/extdata/mtcars.csv")
## End(Not run)
# Or directly from a string with `I()`
vroom(I("x,y\n1,2\n3,4\n"))
# Column selection ----------------------------------------------------------
# Pass column names or indexes directly to select them
vroom(input_file, col_select = c(model, cyl, gear))
vroom(input_file, col_select = c(1, 3, 11))
# Or use the selection helpers
vroom(input_file, col_select = starts_with("d"))
# You can also rename specific columns
vroom(input_file, col_select = c(car = model, everything()))
# Column types --------------------------------------------------------------
# By default, vroom guesses the column types from `guess_max` rows (100 by
# default) sampled throughout the dataset.
# You can specify them explicitly with a compact specification:
vroom(I("x,y\n1,2\n3,4\n"), col_types = "dc")
# Or with a list of column types:
vroom(I("x,y\n1,2\n3,4\n"), col_types = list(col_double(), col_character()))
# File types ----------------------------------------------------------------
# csv
vroom(I("a,b\n1.0,2.0\n"), delim = ",")
# tsv
vroom(I("a\tb\n1.0\t2.0\n"))
# Other delimiters
vroom(I("a|b\n1.0|2.0\n"), delim = "|")
# Read datasets across multiple files ---------------------------------------
mtcars_by_cyl <- vroom_example(vroom_examples("mtcars-"))
mtcars_by_cyl
# Pass the filenames directly to vroom, they are efficiently combined
vroom(mtcars_by_cyl)
# If you need to extract data from the filenames, use `id` to request a
# column that reveals the underlying file path
dat <- vroom(mtcars_by_cyl, id = "source")
dat$source <- basename(dat$source)
dat
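The examples above do not exercise skip or na. This sketch assumes a file with one metadata line above the header and "-" used for missing values; the data is invented.

```r
library(vroom)

# A metadata line sits above the header, and "-" marks missing values
raw <- I("exported from some system\nx,y\n1,-\n2,5\n")

df <- vroom(raw, delim = ",", skip = 1, na = c("", "NA", "-"),
            col_types = "ii")
df
```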
Show which column types are using Altrep
Description
vroom_altrep() can be used directly as input to the altrep argument of vroom().
Usage
vroom_altrep(which = NULL)
Arguments
which |
A character vector of column types to use Altrep for. Can also
take |
Details
Alternatively, there is also a family of environment variables to control use of the Altrep framework. These can be set in your .Renviron file, e.g. with usethis::edit_r_environ(). For versions of R where the Altrep framework is unavailable (R < 3.5.0) they are automatically turned off and the variables have no effect. The variables can take one of true, false, TRUE, FALSE, 1, or 0.
- VROOM_USE_ALTREP_NUMERICS - If set, use Altrep for all numeric types (default false).
There are also individual variables for each type. Currently only VROOM_USE_ALTREP_CHR defaults to true.
- VROOM_USE_ALTREP_CHR
- VROOM_USE_ALTREP_FCT
- VROOM_USE_ALTREP_INT
- VROOM_USE_ALTREP_BIG_INT
- VROOM_USE_ALTREP_DBL
- VROOM_USE_ALTREP_NUM
- VROOM_USE_ALTREP_LGL
- VROOM_USE_ALTREP_DTTM
- VROOM_USE_ALTREP_DATE
- VROOM_USE_ALTREP_TIME
Examples
vroom_altrep()
vroom_altrep(c("chr", "fct", "int"))
vroom_altrep(TRUE)
vroom_altrep(FALSE)
Show which column types are using Altrep
Description
This function is deprecated in favor of vroom_altrep().
Usage
vroom_altrep_opts(which = NULL)
Arguments
which |
A character vector of column types to use Altrep for. Can also
take |
Get path to vroom examples
Description
vroom comes bundled with a number of sample files in its 'inst/extdata' directory. Use vroom_examples() to list all the available examples and vroom_example() to retrieve the path to one example.
Usage
vroom_example(path)
vroom_examples(pattern = NULL)
Arguments
path |
Name of file. |
pattern |
A regular expression of filenames to match. If |
Examples
# List all available examples
vroom_examples()
# Get path to one example
vroom_example("mtcars.csv")
Convert a data frame to a delimited string
Description
This is equivalent to vroom_write(), but instead of writing to disk, it returns a string. It is primarily useful for examples and for testing.
Usage
vroom_format(
x,
delim = "\t",
eol = "\n",
na = "NA",
col_names = TRUE,
escape = c("double", "backslash", "none"),
quote = c("needed", "all", "none"),
bom = FALSE,
num_threads = vroom_threads()
)
Arguments
x |
A data frame or tibble to format. |
delim |
Delimiter used to separate values. Defaults to |
eol |
The end of line character to use. Most commonly either |
na |
String used for missing values. Defaults to 'NA'. |
col_names |
If |
escape |
The type of escape to use when quotes are in the data.
|
quote |
How to handle fields which contain characters that need to be quoted.
|
bom |
If |
num_threads |
Number of threads to use when formatting the data for output. |
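A short sketch of vroom_format() returning text instead of writing a file. The data frame is made up; the second value of y contains the delimiter, so with the default quote = "needed" it is expected to be wrapped in quotes.

```r
library(vroom)

df <- data.frame(x = c(1, 2), y = c("a", "b,c"), stringsAsFactors = FALSE)

# quote = "needed" (the default) quotes only fields that need it,
# such as a field containing the delimiter
out <- vroom_format(df, delim = ",")
cat(out)
```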
Read a fixed width file into a tibble
Description
Read a fixed width file into a tibble
Usage
vroom_fwf(
file,
col_positions = fwf_empty(file, skip, n = guess_max),
col_types = NULL,
col_select = NULL,
id = NULL,
locale = default_locale(),
na = c("", "NA"),
comment = "",
skip_empty_rows = TRUE,
trim_ws = TRUE,
skip = 0,
n_max = Inf,
guess_max = 100,
altrep = TRUE,
altrep_opts = deprecated(),
num_threads = vroom_threads(),
progress = vroom_progress(),
show_col_types = NULL,
.name_repair = "unique"
)
fwf_empty(file, skip = 0, col_names = NULL, comment = "", n = 100L)
fwf_widths(widths, col_names = NULL)
fwf_positions(start, end = NULL, col_names = NULL)
fwf_cols(...)
Arguments
file |
Either a path to a file, a connection, or literal data (either a single string or a raw vector). Files ending in Literal data is most useful for examples and tests. To be recognised as
literal data, the input must be either wrapped with Using a value of |
col_positions |
Column positions, as created by |
col_types |
Column types, either as a column specification (created by cols() or a list of column objects) or as a compact string where each character represents one column, using the short names listed for cols(). If NULL, all column types are guessed.
By default, reading a file without a column specification will print a
message showing what |
col_select |
Columns to include in the results. You can use the same
mini-language as |
id |
The name of a column in which to store the file path. This is
useful when reading multiple input files and there is data in the file
paths, such as the data collection date. If |
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
na |
Character vector of strings to interpret as missing values. Set this
option to |
comment |
A string used to identify comments. Any text after the comment characters will be silently ignored. |
skip_empty_rows |
Should blank rows be ignored altogether? i.e. If this
option is |
trim_ws |
Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it? |
skip |
Number of lines to skip before reading data. |
n_max |
Maximum number of lines to read. |
guess_max |
Maximum number of lines to use for guessing column types.
Will never use more than the number of lines read.
See |
altrep |
Control which column types use Altrep representations,
either a character vector of types, |
altrep_opts |
|
num_threads |
The number of processing threads to use for initial
parsing and lazy reading of data. If your data contains newlines within
fields the parser should automatically detect this and fall back to using
one thread only. However if you know your file has newlines within quoted
fields it is safest to set |
progress |
Display a progress bar? By default it will only display
in an interactive session and not while knitting a document. The automatic
progress bar can be disabled by setting option |
show_col_types |
If |
.name_repair |
Handling of column names. The default behaviour is to
ensure column names are
This argument is passed on as |
col_names |
Either NULL, or a character vector of column names. |
n |
Number of lines the tokenizer will read to determine file structure. By default it is set to 100. |
widths |
Width of each field. Use NA as width of last field when reading a ragged fwf file. |
start , end |
Starting and ending (inclusive) positions of each field. Use NA as last end field when reading a ragged fwf file. |
... |
If the first element is a data frame,
then it must have all numeric columns and either one or two rows.
The column names are the variable names. The column values are the
variable widths if a length one vector, and if length two, variable start and end
positions. The elements of |
Details
Note: fwf_empty() cannot take an R connection such as a URL as input, as this would result in reading from the connection twice. In these cases it is better to download the file first before reading.
Examples
fwf_sample <- vroom_example("fwf-sample.txt")
writeLines(vroom_lines(fwf_sample))
# You can specify column positions in several ways:
# 1. Guess based on position of empty columns
vroom_fwf(fwf_sample, fwf_empty(fwf_sample, col_names = c("first", "last", "state", "ssn")))
# 2. A vector of field widths
vroom_fwf(fwf_sample, fwf_widths(c(20, 10, 12), c("name", "state", "ssn")))
# 3. Paired vectors of start and end positions
vroom_fwf(fwf_sample, fwf_positions(c(1, 30), c(20, 42), c("name", "ssn")))
# 4. Named arguments with start and end positions
vroom_fwf(fwf_sample, fwf_cols(name = c(1, 20), ssn = c(30, 42)))
# 5. Named arguments with column widths
vroom_fwf(fwf_sample, fwf_cols(name = 20, state = 10, ssn = 12))
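The widths argument notes that NA can be used as the width of the last field in a ragged file. A minimal sketch with made-up literal data, assuming vroom_fwf() treats the I() wrapper as literal input like vroom() does.

```r
library(vroom)

# The last field has a different width on every line ("ragged")
raw <- I("ab 1\ncd 22\nef 333\n")

# NA as the final width lets the last field run to the end of each line;
# trim_ws = TRUE (the default) removes the padding space from `id`
df <- vroom_fwf(raw, fwf_widths(c(3, NA), c("id", "val")), col_types = "ci")
df
```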
Read lines from a file
Description
vroom_lines() is similar to readLines(); however, it reads the lines lazily like vroom(), so operations like length(), head(), tail() and sample() can be done much more efficiently without reading all the data into R.
Usage
vroom_lines(
file,
n_max = Inf,
skip = 0,
na = character(),
skip_empty_rows = FALSE,
locale = default_locale(),
altrep = TRUE,
altrep_opts = deprecated(),
num_threads = vroom_threads(),
progress = vroom_progress()
)
Arguments
file |
Either a path to a file, a connection, or literal data (either a
single string or a raw vector). Files ending in Literal data is most useful for examples and tests. To be recognised as
literal data, wrap the input with |
n_max |
Maximum number of lines to read. |
skip |
Number of lines to skip before reading data. If |
na |
Character vector of strings to interpret as missing values. Set this
option to |
skip_empty_rows |
Should blank rows be ignored altogether? i.e. If this
option is |
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
altrep |
Control which column types use Altrep representations,
either a character vector of types, |
altrep_opts |
|
num_threads |
Number of threads to use when reading and materializing vectors. If your data contains newlines within fields the parser will automatically be forced to use a single thread only. |
progress |
Display a progress bar? By default it will only display
in an interactive session and not while knitting a document. The automatic
progress bar can be disabled by setting option |
Examples
lines <- vroom_lines(vroom_example("mtcars.csv"))
length(lines)
head(lines, n = 2)
tail(lines, n = 2)
sample(lines, size = 2)
Determine whether progress bars should be shown
Description
By default, vroom shows progress bars. However, progress reporting is suppressed if any of the following conditions hold:
- The bar is explicitly disabled by setting the environment variable VROOM_SHOW_PROGRESS to "false".
- The code is run in a non-interactive session, as determined by rlang::is_interactive().
- The code is run in an RStudio notebook chunk, as determined by getOption("rstudio.notebook.executing").
Usage
vroom_progress()
Examples
vroom_progress()
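A small sketch of the first condition above, using withr::with_envvar() to set VROOM_SHOW_PROGRESS only for one call. Independently of vroom_progress(), the progress argument of vroom() can also be set per call.

```r
library(vroom)

# Setting VROOM_SHOW_PROGRESS to "false" disables the bar, even interactively
show <- withr::with_envvar(c(VROOM_SHOW_PROGRESS = "false"), vroom_progress())
show

# Progress can also be switched off for a single call
df <- vroom(vroom_example("mtcars.csv"), progress = FALSE,
            show_col_types = FALSE)
```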
Structure of objects
Description
Similar to str() but with more information for Altrep objects.
Usage
vroom_str(x)
Arguments
x |
a vector |
Examples
# when used on non-altrep objects altrep will always be false
vroom_str(mtcars)
mt <- vroom(vroom_example("mtcars.csv"), ",", altrep = c("chr", "dbl"))
vroom_str(mt)
Write a data frame to a delimited file
Description
Write a data frame to a delimited file
Usage
vroom_write(
x,
file,
delim = "\t",
eol = "\n",
na = "NA",
col_names = !append,
append = FALSE,
quote = c("needed", "all", "none"),
escape = c("double", "backslash", "none"),
bom = FALSE,
num_threads = vroom_threads(),
progress = vroom_progress(),
path = deprecated()
)
Arguments
x |
A data frame or tibble to write to disk. |
file |
File or connection to write to. |
delim |
Delimiter used to separate values. Defaults to |
eol |
The end of line character to use. Most commonly either |
na |
String used for missing values. Defaults to 'NA'. |
col_names |
If |
append |
If |
quote |
How to handle fields which contain characters that need to be quoted.
|
escape |
The type of escape to use when quotes are in the data.
|
bom |
If |
num_threads |
Number of threads to use when formatting the data for output. |
progress |
Display a progress bar? By default it will only display
in an interactive session and not while knitting a document. The display
is updated every 50,000 values and will only display if estimated reading
time is 5 seconds or more. The automatic progress bar can be disabled by
setting option |
path |
Examples
# Write to a temporary file. If you give only a bare file name instead,
# vroom_write() writes the file to your current working directory.
out_file <- tempfile(fileext = ".csv")
vroom_write(mtcars, out_file, ",")
# You can also use a literal filename
# vroom_write(mtcars, "mtcars.tsv")
# If you add a compression extension to the file name, vroom_write() will
# automatically compress the output.
# vroom_write(mtcars, "mtcars.tsv.gz")
# vroom_write(mtcars, "mtcars.tsv.bz2")
# vroom_write(mtcars, "mtcars.tsv.xz")
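Because col_names defaults to !append, appending rows does not repeat the header. This sketch writes to temporary files (names generated by tempfile()) and reads them back; the .gz round trip relies on the extension-based compression described above.

```r
library(vroom)

out_file <- tempfile(fileext = ".csv")
vroom_write(mtcars[1:2, 1:3], out_file, delim = ",")
# col_names defaults to !append, so the appended rows get no second header
vroom_write(mtcars[3:4, 1:3], out_file, delim = ",", append = TRUE)
combined <- vroom(out_file, delim = ",", show_col_types = FALSE)

# A ".gz" extension is compressed on write and decompressed on read
gz_file <- tempfile(fileext = ".tsv.gz")
vroom_write(mtcars, gz_file)
roundtrip <- vroom(gz_file, show_col_types = FALSE)
```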
Write lines to a file
Description
Write lines to a file
Usage
vroom_write_lines(
x,
file,
eol = "\n",
na = "NA",
append = FALSE,
num_threads = vroom_threads()
)
Arguments
x |
A character vector. |
file |
File or connection to write to. |
eol |
The end of line character to use. Most commonly either |
na |
String used for missing values. Defaults to 'NA'. |
append |
If |
num_threads |
Number of threads to use when formatting the data for output. |
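A minimal sketch of writing and appending lines with vroom_write_lines(), using a temporary file; missing values are written as the na string ("NA" by default).

```r
library(vroom)

tf <- tempfile(fileext = ".txt")
vroom_write_lines(c("first", NA, "third"), tf)

# append = TRUE adds lines to the end of the existing file
vroom_write_lines("fourth", tf, append = TRUE)

lines <- readLines(tf)
lines
```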