Title: | Read Rectangular Text Data |
Version: | 2.1.5 |
Description: | The goal of 'readr' is to provide a fast and friendly way to read rectangular data (like 'csv', 'tsv', and 'fwf'). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes. |
License: | MIT + file LICENSE |
URL: | https://readr.tidyverse.org, https://github.com/tidyverse/readr |
BugReports: | https://github.com/tidyverse/readr/issues |
Depends: | R (≥ 3.6) |
Imports: | cli (≥ 3.2.0), clipr, crayon, hms (≥ 0.4.1), lifecycle (≥ 0.2.0), methods, R6, rlang, tibble, utils, vroom (≥ 1.6.0) |
Suggests: | covr, curl, datasets, knitr, rmarkdown, spelling, stringi, testthat (≥ 3.2.0), tzdb (≥ 0.1.1), waldo, withr, xml2 |
LinkingTo: | cpp11, tzdb (≥ 0.1.1) |
VignetteBuilder: | knitr |
Config/Needs/website: | tidyverse, tidyverse/tidytemplate |
Config/testthat/edition: | 3 |
Config/testthat/parallel: | false |
Encoding: | UTF-8 |
Language: | en-US |
RoxygenNote: | 7.2.3 |
NeedsCompilation: | yes |
Packaged: | 2024-01-10 21:03:49 UTC; jenny |
Author: | Hadley Wickham [aut],
Jim Hester [aut],
Romain Francois [ctb],
Jennifer Bryan |
Maintainer: | Jennifer Bryan <jenny@posit.co> |
Repository: | CRAN |
Date/Publication: | 2024-01-10 23:20:02 UTC |
readr: Read Rectangular Text Data
Description
The goal of 'readr' is to provide a fast and friendly way to read rectangular data (like 'csv', 'tsv', and 'fwf'). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes.
Author(s)
Maintainer: Jennifer Bryan jenny@posit.co (ORCID)
Authors:
Hadley Wickham hadley@posit.co
Jim Hester
Other contributors:
Romain Francois [contributor]
Shelby Bearrows [contributor]
Posit Software, PBC [copyright holder, funder]
https://github.com/mandreyel/ (mio library) [copyright holder]
Jukka Jylänki (grisu3 implementation) [contributor, copyright holder]
Mikkel Jørgensen (grisu3 implementation) [contributor, copyright holder]
See Also
Useful links:
Report bugs at https://github.com/tidyverse/readr/issues
Generate a column specification
Description
This is most useful for generating a specification using the short form.
Usage
as.col_spec(x)
Arguments
x |
Input object |
Examples
as.col_spec("cccnnn")
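As a rough illustration of the short form, the sketch below passes a compact string directly as col_types and then expands the same string with as.col_spec(). The inline CSV and its column names are invented for the example.

```r
library(readr)

# Hypothetical inline data; I() marks the string as literal data.
csv <- I("id,count,price\nA1,3,\"1,200\"\n")

# "cin" = character, integer, number: one letter per column, in order.
df <- read_csv(csv, col_types = "cin")
str(df)

# as.col_spec() expands the same compact string into a full specification.
spec <- as.col_spec("cin")
print(spec)
```

The compact string is positional, so it only makes sense when the column order of the input is known in advance.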
Callback classes
Description
These classes are used to define callback behaviors.
Details
- ChunkCallback
Callback interface definition; all callback functions should inherit from this class.
- SideEffectChunkCallback
Callback function that is used only for side effects; no results are returned.
- DataFrameCallback
Callback function that combines each result together at the end.
- AccumulateCallback
Callback function that accumulates a single result. Requires the parameter acc to specify the initial value of the accumulator. The parameter acc is NULL by default.
See Also
Other chunked: melt_delim_chunked(), read_delim_chunked(), read_lines_chunked()
Examples
## If given a regular function it is converted to a SideEffectChunkCallback
# view structure of each chunk
read_lines_chunked(readr_example("mtcars.csv"), str, chunk_size = 5)
# Print starting line of each chunk
f <- function(x, pos) print(pos)
read_lines_chunked(readr_example("mtcars.csv"), SideEffectChunkCallback$new(f), chunk_size = 5)
# If combined results are desired you can use the DataFrameCallback
# Cars with 3 gears
f <- function(x, pos) subset(x, gear == 3)
read_csv_chunked(readr_example("mtcars.csv"), DataFrameCallback$new(f), chunk_size = 5)
# The ListCallback can be used for more flexible output
f <- function(x, pos) x$mpg[x$hp > 100]
read_csv_chunked(readr_example("mtcars.csv"), ListCallback$new(f), chunk_size = 5)
# The AccumulateCallback accumulates results from each chunk
f <- function(x, pos, acc) sum(x$mpg) + acc
read_csv_chunked(readr_example("mtcars.csv"), AccumulateCallback$new(f, acc = 0), chunk_size = 5)
Returns values from the clipboard
Description
This is useful in the read_delim() functions to read from the clipboard.
Usage
clipboard()
See Also
read_delim
Skip a column
Description
Use this function to ignore a column when reading in a file.
To skip all columns not otherwise specified, use cols_only().
Usage
col_skip()
See Also
Other parsers: cols_condense(), cols(), parse_datetime(), parse_factor(), parse_guess(), parse_logical(), parse_number(), parse_vector()
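Examples
A minimal sketch contrasting col_skip() with cols_only(), assuming a small inline CSV whose column names (a, b, c) are made up for illustration:

```r
library(readr)

# Hypothetical three-column input, passed as literal data via I().
csv <- I("a,b,c\n1,2,3\n4,5,6\n")

# Drop only column b; the remaining columns are guessed as usual.
skipped <- read_csv(csv, col_types = cols(b = col_skip()))
names(skipped)

# Keep only column a; every column not listed is skipped.
only_a <- read_csv(csv, col_types = cols_only(a = col_integer()))
names(only_a)
```

col_skip() suits the case where a few columns are unwanted, while cols_only() is shorter when only a few columns are wanted.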
Create column specification
Description
cols() includes all columns in the input data, guessing the column types as the default. cols_only() includes only the columns you explicitly specify, skipping the rest. In general you can substitute list() for cols() without changing the behavior.
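The interchangeability of list() and cols() can be checked with a short sketch. The inline data and column names here are assumptions chosen only for the demonstration.

```r
library(readr)

# Hypothetical two-column input.
csv <- "a,b\n1,x\n2,y\n"

# The same column types, given once through cols() and once through list().
with_cols <- read_csv(I(csv), col_types = cols(a = col_integer(), b = col_character()))
with_list <- read_csv(I(csv), col_types = list(a = col_integer(), b = col_character()))

# Both calls should yield the same column contents and types.
str(with_cols)
str(with_list)
```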
Usage
cols(..., .default = col_guess())
cols_only(...)
Arguments
... |
Either column objects created by |
.default |
Any named columns not explicitly overridden in |
Details
The available specifications are: (with string abbreviations in brackets)
- col_logical() [l], containing only T, F, TRUE or FALSE.
- col_integer() [i], integers.
- col_double() [d], doubles.
- col_character() [c], everything else.
- col_factor(levels, ordered) [f], a fixed set of values.
- col_date(format = "") [D]: with the locale's date_format.
- col_time(format = "") [t]: with the locale's time_format.
- col_datetime(format = "") [T]: ISO8601 date times.
- col_number() [n], numbers containing the grouping_mark.
- col_skip() [_, -], don't import this column.
- col_guess() [?], parse using the "best" type based on the input.
See Also
Other parsers: col_skip(), cols_condense(), parse_datetime(), parse_factor(), parse_guess(), parse_logical(), parse_number(), parse_vector()
Examples
cols(a = col_integer())
cols_only(a = col_integer())
# You can also use the standard abbreviations
cols(a = "i")
cols(a = "i", b = "d", c = "_")
# You can also use multiple sets of column definitions by combining
# them like so:
t1 <- cols(
column_one = col_integer(),
column_two = col_number()
)
t2 <- cols(
column_three = col_character()
)
t3 <- t1
t3$cols <- c(t1$cols, t2$cols)
t3
Examine the column specifications for a data frame
Description
cols_condense() takes a spec object and condenses its definition by setting the default column type to the most frequent type and only listing columns with a different type.
spec() extracts the full column specification from a tibble created by readr.
Usage
cols_condense(x)
spec(x)
Arguments
x |
The data frame object to extract from |
Value
A col_spec object.
See Also
Other parsers: col_skip(), cols(), parse_datetime(), parse_factor(), parse_guess(), parse_logical(), parse_number(), parse_vector()
Examples
df <- read_csv(readr_example("mtcars.csv"))
s <- spec(df)
s
cols_condense(s)
Count the number of fields in each line of a file
Description
This is useful for diagnosing problems with functions that fail to parse correctly.
Usage
count_fields(file, tokenizer, skip = 0, n_max = -1L)
Arguments
file |
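Either a path to a file, a connection, or literal data (either a single string or a raw vector). Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with http://, https://, ftp://, or ftps:// will be automatically downloaded. Literal data is most useful for examples and tests. To be recognised as literal data, the input must be either wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line. Using a value of clipboard() will read from the system clipboard. |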
tokenizer |
A tokenizer that specifies how to break the |
skip |
Number of lines to skip before reading data. |
n_max |
Optionally, maximum number of rows to count fields for. |
Examples
count_fields(readr_example("mtcars.csv"), tokenizer_csv())
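To show the diagnostic use described above, the following sketch counts fields in a deliberately ragged literal input and reports which lines differ from the header. The data is invented for the example.

```r
library(readr)

# Hypothetical ragged input: the second line is missing a field.
ragged <- "a,b,c\n1,2\n3,4,5"

# One count per line, header included.
n_fields <- count_fields(ragged, tokenizer_csv())
n_fields

# Lines whose field count differs from the header line.
which(n_fields != n_fields[1])
```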
Create a source object.
Description
Create a source object.
Usage
datasource(
file,
skip = 0,
skip_empty_rows = FALSE,
comment = "",
skip_quote = TRUE
)
Arguments
file |
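Either a path to a file, a connection, or literal data (either a single string or a raw vector). Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with http://, https://, ftp://, or ftps:// will be automatically downloaded. Literal data is most useful for examples and tests. To be recognised as literal data, the input must be either wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line. Using a value of clipboard() will read from the system clipboard. |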
skip |
Number of lines to skip before reading data. |
Examples
# Literal csv
datasource("a,b,c\n1,2,3")
datasource(charToRaw("a,b,c\n1,2,3"))
# Strings
datasource(readr_example("mtcars.csv"))
datasource(readr_example("mtcars.csv.bz2"))
datasource(readr_example("mtcars.csv.zip"))
## Not run:
datasource("https://github.com/tidyverse/readr/raw/main/inst/extdata/mtcars.csv")
## End(Not run)
# Connection
con <- rawConnection(charToRaw("abc\n123"))
datasource(con)
close(con)
Create or retrieve date names
Description
When parsing dates, you often need to know how days of the week and months are represented as text. This pair of functions allows you to either create your own, or retrieve from a standard list. The standard list is derived from ICU (http://site.icu-project.org) via the stringi package.
Usage
date_names(mon, mon_ab = mon, day, day_ab = day, am_pm = c("AM", "PM"))
date_names_lang(language)
date_names_langs()
Arguments
mon , mon_ab |
Full and abbreviated month names. |
day , day_ab |
Full and abbreviated week day names. Starts with Sunday. |
am_pm |
Names used for AM and PM. |
language |
A BCP 47 locale, made up of a language and a region,
e.g. |
Examples
date_names_lang("en")
date_names_lang("ko")
date_names_lang("fr")
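A small sketch of how a retrieved set of date names can feed into parsing, assuming the French names from the standard list. The date string is only an illustration.

```r
library(readr)

# Retrieve the standard French names and inspect the first few months.
fr <- date_names_lang("fr")
fr$mon[1:3]

# Pass the date names object to a locale so that %B matches French month names.
d <- parse_date("1 janvier 2015", "%d %B %Y", locale = locale(date_names = fr))
d
```

Passing the language code directly, as in locale("fr"), is the shorter route; building the object explicitly is mainly useful when the names need to be inspected or customised first.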
Retrieve the currently active edition
Description
Retrieve the currently active edition
Usage
edition_get()
Value
An integer corresponding to the currently active edition.
Examples
edition_get()
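The sketch below reads the active edition and then temporarily switches it. It relies on with_edition(), a readr helper that is not documented in this section, so treat its use here as an assumption about the wider package API.

```r
library(readr)

# The edition active in the current session.
current <- edition_get()
current

# Evaluate code under the first edition; the change applies only to this call.
first <- with_edition(1, edition_get())
first

# Afterwards the session is back on the original edition.
edition_get()
```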
Convert a data frame to a delimited string
Description
These functions are equivalent to write_csv() etc., but instead of writing to disk, they return a string.
Usage
format_delim(
x,
delim,
na = "NA",
append = FALSE,
col_names = !append,
quote = c("needed", "all", "none"),
escape = c("double", "backslash", "none"),
eol = "\n",
quote_escape = deprecated()
)
format_csv(
x,
na = "NA",
append = FALSE,
col_names = !append,
quote = c("needed", "all", "none"),
escape = c("double", "backslash", "none"),
eol = "\n",
quote_escape = deprecated()
)
format_csv2(
x,
na = "NA",
append = FALSE,
col_names = !append,
quote = c("needed", "all", "none"),
escape = c("double", "backslash", "none"),
eol = "\n",
quote_escape = deprecated()
)
format_tsv(
x,
na = "NA",
append = FALSE,
col_names = !append,
quote = c("needed", "all", "none"),
escape = c("double", "backslash", "none"),
eol = "\n",
quote_escape = deprecated()
)
Arguments
x |
A data frame. |
delim |
Delimiter used to separate values. Defaults to |
na |
String used for missing values. Defaults to NA. Missing values
will never be quoted; strings with the same value as |
append |
If |
col_names |
If |
quote |
How to handle fields which contain characters that need to be quoted.
|
escape |
The type of escape to use when quotes are in the data.
|
eol |
The end of line character to use. Most commonly either |
quote_escape |
Value
A string.
Output
Factors are coerced to character. Doubles are formatted to a decimal string using the grisu3 algorithm. POSIXct values are formatted as ISO8601 with a UTC timezone. Note: POSIXct objects in local or non-UTC timezones will be converted to UTC time before writing.
All columns are encoded as UTF-8. write_excel_csv() and write_excel_csv2() also include a UTF-8 Byte order mark which indicates to Excel the csv is UTF-8 encoded.
write_excel_csv2() and write_csv2() were created to allow users with different locale settings to save .csv files using their default settings (e.g. ; as the column separator and , as the decimal separator). This is common in some European countries.
Values are only quoted if they contain a comma, quote or newline.
The write_*() functions will automatically compress outputs if an appropriate extension is given. Three extensions are currently supported: .gz for gzip compression, .bz2 for bzip2 compression and .xz for lzma compression. See the examples for more information.
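The UTC conversion of POSIXct values can be seen in a short sketch. The timestamp and the America/Chicago time zone are assumptions picked for illustration, and the example relies on that zone being available in the system time zone database.

```r
library(readr)

# Hypothetical timestamp: noon in Chicago, which is six hours behind UTC in January.
df <- data.frame(when = as.POSIXct("2020-01-01 12:00:00", tz = "America/Chicago"))

# The formatted output carries the UTC equivalent, not the local clock time.
out <- format_csv(df)
cat(out)
```

Anyone reading the file back needs to reapply the original time zone if local clock times matter.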
References
Florian Loitsch, Printing Floating-Point Numbers Quickly and Accurately with Integers, PLDI '10, http://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf
Examples
# format_*() functions are useful for testing and reprexes
cat(format_csv(mtcars))
cat(format_tsv(mtcars))
cat(format_delim(mtcars, ";"))
# Specifying missing values
df <- data.frame(x = c(1, NA, 3))
format_csv(df, na = "missing")
# Quotes are automatically added as needed
df <- data.frame(x = c("a ", '"', ",", "\n"))
cat(format_csv(df))
Guess encoding of file
Description
Uses stringi::stri_enc_detect(): see the documentation there for caveats.
Usage
guess_encoding(file, n_max = 10000, threshold = 0.2)
Arguments
file |
A character string specifying an input as specified in
|
n_max |
Number of lines to read. If |
threshold |
Only report guesses above this threshold of certainty. |
Value
A tibble
Examples
guess_encoding(readr_example("mtcars.csv"))
guess_encoding(read_lines_raw(readr_example("mtcars.csv")))
guess_encoding(read_file_raw(readr_example("mtcars.csv")))
guess_encoding("a\n\u00b5\u00b5")
Create locales
Description
A locale object tries to capture all the defaults that can vary between countries. You set the locale once, and the details are automatically passed on down to the column parsers. The defaults have been chosen to match R (i.e. US English) as closely as possible. See vignette("locales") for more details.
Usage
locale(
date_names = "en",
date_format = "%AD",
time_format = "%AT",
decimal_mark = ".",
grouping_mark = ",",
tz = "UTC",
encoding = "UTF-8",
asciify = FALSE
)
default_locale()
Arguments
date_names |
Character representations of day and month names. Either
the language code as string (passed on to |
date_format , time_format |
Default date and time formats. |
decimal_mark , grouping_mark |
Symbols used to indicate the decimal
place, and to chunk larger numbers. Decimal mark can only be |
tz |
Default tz. This is used both for input (if the time zone isn't present in individual strings), and for output (to control the default display). The default is to use "UTC", a time zone that does not use daylight savings time (DST) and hence is typically most useful for data. The absence of time zones makes it approximately 50x faster to generate UTC times than any other time zone. Use "" to use the system default time zone, but beware that this will not be reproducible across systems. For a complete list of possible time zones, see OlsonNames(). |
encoding |
Default encoding. This only affects how the file is read - readr always converts the output to UTF-8. |
asciify |
Should diacritics be stripped from date names and converted to ASCII? This is useful if you're dealing with ASCII data where the correct spellings have been lost. Requires the stringi package. |
Examples
locale()
locale("fr")
# South American locale
locale("es", decimal_mark = ",")
Return melted data for each token in a delimited file (including csv & tsv)
Description
This function has been superseded in readr and moved to the meltr package.
Usage
melt_delim(
file,
delim,
quote = "\"",
escape_backslash = FALSE,
escape_double = TRUE,
locale = default_locale(),
na = c("", "NA"),
quoted_na = TRUE,
comment = "",
trim_ws = FALSE,
skip = 0,
n_max = Inf,
progress = show_progress(),
skip_empty_rows = FALSE
)
melt_csv(
file,
locale = default_locale(),
na = c("", "NA"),
quoted_na = TRUE,
quote = "\"",
comment = "",
trim_ws = TRUE,
skip = 0,
n_max = Inf,
progress = show_progress(),
skip_empty_rows = FALSE
)
melt_csv2(
file,
locale = default_locale(),
na = c("", "NA"),
quoted_na = TRUE,
quote = "\"",
comment = "",
trim_ws = TRUE,
skip = 0,
n_max = Inf,
progress = show_progress(),
skip_empty_rows = FALSE
)
melt_tsv(
file,
locale = default_locale(),
na = c("", "NA"),
quoted_na = TRUE,
quote = "\"",
comment = "",
trim_ws = TRUE,
skip = 0,
n_max = Inf,
progress = show_progress(),
skip_empty_rows = FALSE
)
Arguments
file |
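Either a path to a file, a connection, or literal data (either a single string or a raw vector). Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with http://, https://, ftp://, or ftps:// will be automatically downloaded. Literal data is most useful for examples and tests. To be recognised as literal data, the input must be either wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line. Using a value of clipboard() will read from the system clipboard. |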
delim |
Single character used to separate fields within a record. |
quote |
Single character used to quote strings. |
escape_backslash |
Does the file use backslashes to escape special
characters? This is more general than |
escape_double |
Does the file escape quotes by doubling them?
i.e. If this option is |
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
na |
Character vector of strings to interpret as missing values. Set this
option to |
quoted_na |
|
comment |
A string used to identify comments. Any text after the comment characters will be silently ignored. |
trim_ws |
Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it? |
skip |
Number of lines to skip before reading data. If |
n_max |
Maximum number of lines to read. |
progress |
Display a progress bar? By default it will only display
in an interactive session and not while knitting a document. The automatic
progress bar can be disabled by setting option |
skip_empty_rows |
Should blank rows be ignored altogether? i.e. If this
option is |
Details
For certain non-rectangular data formats, it can be useful to parse the data into a melted format where each row represents a single token.
melt_csv() and melt_tsv() are special cases of the general melt_delim(). They're useful for reading the most common types of flat file data, comma separated values and tab separated values, respectively. melt_csv2() uses ; for the field separator and , for the decimal point. This is common in some European countries.
Value
A tibble() of four columns:
- row, the row that the token comes from in the original file
- col, the column that the token comes from in the original file
- data_type, the data type of the token, e.g. "integer", "character", "date", guessed in a similar way to the guess_parser() function.
- value, the token itself as a character string, unchanged from its representation in the original file.
If there are parsing problems, a warning tells you how many, and you can retrieve the details with problems().
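The four-column layout can be inspected with a tiny literal input. This is only a sketch: the data is invented, and the call is wrapped in suppressWarnings() on the assumption that the superseded melt_*() functions may emit a lifecycle deprecation warning.

```r
library(readr)

# Hypothetical two-by-two input; the header line is melted like any other row.
melted <- suppressWarnings(melt_csv("x,y\n1,a"))

# One row per token, described by row, col, data_type and value.
names(melted)
melted
```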
See Also
read_delim() for the conventional way to read rectangular data from delimited files.
Examples
# Input sources -------------------------------------------------------------
# Read from a path
melt_csv(readr_example("mtcars.csv"))
melt_csv(readr_example("mtcars.csv.zip"))
melt_csv(readr_example("mtcars.csv.bz2"))
## Not run:
melt_csv("https://github.com/tidyverse/readr/raw/main/inst/extdata/mtcars.csv")
## End(Not run)
# Or directly from a string (must contain a newline)
melt_csv("x,y\n1,2\n3,4")
# To import empty cells as 'empty' rather than `NA`
melt_csv("x,y\n,NA,\"\",''", na = "NA")
# File types ----------------------------------------------------------------
melt_csv("a,b\n1.0,2.0")
melt_csv2("a;b\n1,0;2,0")
melt_tsv("a\tb\n1.0\t2.0")
melt_delim("a|b\n1.0|2.0", delim = "|")
Melt a delimited file by chunks
Description
For certain non-rectangular data formats, it can be useful to parse the data into a melted format where each row represents a single token.
Usage
melt_delim_chunked(
file,
callback,
chunk_size = 10000,
delim,
quote = "\"",
escape_backslash = FALSE,
escape_double = TRUE,
locale = default_locale(),
na = c("", "NA"),
quoted_na = TRUE,
comment = "",
trim_ws = FALSE,
skip = 0,
progress = show_progress(),
skip_empty_rows = FALSE
)
melt_csv_chunked(
file,
callback,
chunk_size = 10000,
locale = default_locale(),
na = c("", "NA"),
quoted_na = TRUE,
quote = "\"",
comment = "",
trim_ws = TRUE,
skip = 0,
progress = show_progress(),
skip_empty_rows = FALSE
)
melt_csv2_chunked(
file,
callback,
chunk_size = 10000,
locale = default_locale(),
na = c("", "NA"),
quoted_na = TRUE,
quote = "\"",
comment = "",
trim_ws = TRUE,
skip = 0,
progress = show_progress(),
skip_empty_rows = FALSE
)
melt_tsv_chunked(
file,
callback,
chunk_size = 10000,
locale = default_locale(),
na = c("", "NA"),
quoted_na = TRUE,
quote = "\"",
comment = "",
trim_ws = TRUE,
skip = 0,
progress = show_progress(),
skip_empty_rows = FALSE
)
Arguments
file |
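Either a path to a file, a connection, or literal data (either a single string or a raw vector). Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with http://, https://, ftp://, or ftps:// will be automatically downloaded. Literal data is most useful for examples and tests. To be recognised as literal data, the input must be either wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line. Using a value of clipboard() will read from the system clipboard. |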
callback |
A callback function to call on each chunk |
chunk_size |
The number of rows to include in each chunk |
delim |
Single character used to separate fields within a record. |
quote |
Single character used to quote strings. |
escape_backslash |
Does the file use backslashes to escape special
characters? This is more general than |
escape_double |
Does the file escape quotes by doubling them?
i.e. If this option is |
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
na |
Character vector of strings to interpret as missing values. Set this
option to |
quoted_na |
|
comment |
A string used to identify comments. Any text after the comment characters will be silently ignored. |
trim_ws |
Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it? |
skip |
Number of lines to skip before reading data. If |
progress |
Display a progress bar? By default it will only display
in an interactive session and not while knitting a document. The automatic
progress bar can be disabled by setting option |
skip_empty_rows |
Should blank rows be ignored altogether? i.e. If this
option is |
Details
melt_delim_chunked() and the specialisations melt_csv_chunked(), melt_csv2_chunked() and melt_tsv_chunked() read files by a chunk of rows at a time, executing a given function on one chunk before reading the next.
See Also
Other chunked: callback, read_delim_chunked(), read_lines_chunked()
Examples
# Keep only the tokens guessed as integers from each chunk
f <- function(x, pos) subset(x, data_type == "integer")
melt_csv_chunked(readr_example("mtcars.csv"), DataFrameCallback$new(f), chunk_size = 5)
Return melted data for each token in a fixed width file
Description
This function has been superseded in readr and moved to the meltr package.
Usage
melt_fwf(
file,
col_positions,
locale = default_locale(),
na = c("", "NA"),
comment = "",
trim_ws = TRUE,
skip = 0,
n_max = Inf,
progress = show_progress(),
skip_empty_rows = FALSE
)
Arguments
file |
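Either a path to a file, a connection, or literal data (either a single string or a raw vector). Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with http://, https://, ftp://, or ftps:// will be automatically downloaded. Literal data is most useful for examples and tests. To be recognised as literal data, the input must be either wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line. Using a value of clipboard() will read from the system clipboard. |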
col_positions |
Column positions, as created by |
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
na |
Character vector of strings to interpret as missing values. Set this
option to |
comment |
A string used to identify comments. Any text after the comment characters will be silently ignored. |
trim_ws |
Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it? |
skip |
Number of lines to skip before reading data. |
n_max |
Maximum number of lines to read. |
progress |
Display a progress bar? By default it will only display
in an interactive session and not while knitting a document. The automatic
progress bar can be disabled by setting option |
skip_empty_rows |
Should blank rows be ignored altogether? i.e. If this
option is |
Details
For certain non-rectangular data formats, it can be useful to parse the data into a melted format where each row represents a single token.
melt_fwf() parses each token of a fixed width file into a single row, but it still requires that each field is in the same position in every row of the source file.
See Also
melt_table() to melt fixed width files where each column is separated by whitespace, and read_fwf() for the conventional way to read rectangular data from fixed width files.
Examples
fwf_sample <- readr_example("fwf-sample.txt")
cat(read_lines(fwf_sample))
# You can specify column positions in several ways:
# 1. Guess based on position of empty columns
melt_fwf(fwf_sample, fwf_empty(fwf_sample, col_names = c("first", "last", "state", "ssn")))
# 2. A vector of field widths
melt_fwf(fwf_sample, fwf_widths(c(20, 10, 12), c("name", "state", "ssn")))
# 3. Paired vectors of start and end positions
melt_fwf(fwf_sample, fwf_positions(c(1, 30), c(10, 42), c("name", "ssn")))
# 4. Named arguments with start and end positions
melt_fwf(fwf_sample, fwf_cols(name = c(1, 10), ssn = c(30, 42)))
# 5. Named arguments with column widths
melt_fwf(fwf_sample, fwf_cols(name = 20, state = 10, ssn = 12))
Return melted data for each token in a whitespace-separated file
Description
This function has been superseded in readr and moved to the meltr package.
For certain non-rectangular data formats, it can be useful to parse the data into a melted format where each row represents a single token.
melt_table() and melt_table2() are designed to read the type of textual data where each column is separated by one (or more) columns of space.
melt_table2() allows any number of whitespace characters between columns, and the lines can be of different lengths.
melt_table() is more strict: each line must be the same length, and each field is in the same position in every line. It first finds empty columns and then parses like a fixed width file.
Usage
melt_table(
file,
locale = default_locale(),
na = "NA",
skip = 0,
n_max = Inf,
guess_max = min(n_max, 1000),
progress = show_progress(),
comment = "",
skip_empty_rows = FALSE
)
melt_table2(
file,
locale = default_locale(),
na = "NA",
skip = 0,
n_max = Inf,
progress = show_progress(),
comment = "",
skip_empty_rows = FALSE
)
Arguments
file |
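Either a path to a file, a connection, or literal data (either a single string or a raw vector). Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with http://, https://, ftp://, or ftps:// will be automatically downloaded. Literal data is most useful for examples and tests. To be recognised as literal data, the input must be either wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line. Using a value of clipboard() will read from the system clipboard. |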
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
na |
Character vector of strings to interpret as missing values. Set this
option to |
skip |
Number of lines to skip before reading data. |
n_max |
Maximum number of lines to read. |
guess_max |
Maximum number of lines to use for guessing column types.
Will never use more than the number of lines read.
See |
progress |
Display a progress bar? By default it will only display
in an interactive session and not while knitting a document. The automatic
progress bar can be disabled by setting option |
comment |
A string used to identify comments. Any text after the comment characters will be silently ignored. |
skip_empty_rows |
Should blank rows be ignored altogether? i.e. If this
option is |
See Also
melt_fwf() to melt fixed width files where each column is not separated by whitespace. melt_fwf() is also useful for reading tabular data with non-standard formatting. read_table() is the conventional way to read tabular data from whitespace-separated files.
Examples
fwf <- readr_example("fwf-sample.txt")
writeLines(read_lines(fwf))
melt_table(fwf)
ws <- readr_example("whitespace-sample.txt")
writeLines(read_lines(ws))
melt_table2(ws)
Preprocess column for output
Description
This is a generic function that is applied to each column before it is saved to disk. It provides a hook for S3 classes that need special handling.
Usage
output_column(x, name)
Arguments
x |
A vector |
Examples
# Most columns are not altered, but POSIXct are converted to ISO8601.
x <- parse_datetime("2016-01-01")
str(output_column(x))
Parse logicals, integers, and reals
Description
Use parse_*() if you have a character vector you want to parse. Use col_*() in conjunction with a read_*() function to parse the values as they're read in.
Usage
parse_logical(x, na = c("", "NA"), locale = default_locale(), trim_ws = TRUE)
parse_integer(x, na = c("", "NA"), locale = default_locale(), trim_ws = TRUE)
parse_double(x, na = c("", "NA"), locale = default_locale(), trim_ws = TRUE)
parse_character(x, na = c("", "NA"), locale = default_locale(), trim_ws = TRUE)
col_logical()
col_integer()
col_double()
col_character()
Arguments
x |
Character vector of values to parse. |
na |
Character vector of strings to interpret as missing values. Set this
option to |
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
trim_ws |
Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it? |
See Also
Other parsers: col_skip(), cols_condense(), cols(), parse_datetime(), parse_factor(), parse_guess(), parse_number(), parse_vector()
Examples
parse_integer(c("1", "2", "3"))
parse_double(c("1", "2", "3.123"))
parse_number("$1,123,456.00")
# Use locale to override default decimal and grouping marks
es_MX <- locale("es", decimal_mark = ",")
parse_number("$1.123.456,00", locale = es_MX)
# Invalid values are replaced with missing values with a warning.
x <- c("1", "2", "3", "-")
parse_double(x)
# Or flag values as missing
parse_double(x, na = "-")
Parse date/times
Description
Parse date/times
Usage
parse_datetime(
x,
format = "",
na = c("", "NA"),
locale = default_locale(),
trim_ws = TRUE
)
parse_date(
x,
format = "",
na = c("", "NA"),
locale = default_locale(),
trim_ws = TRUE
)
parse_time(
x,
format = "",
na = c("", "NA"),
locale = default_locale(),
trim_ws = TRUE
)
col_datetime(format = "")
col_date(format = "")
col_time(format = "")
Arguments
x |
A character vector of dates to parse. |
format |
A format specification, as described below. If set to "", date times are parsed as ISO8601, while dates and times use the date and time formats specified in the locale(). Unlike strptime(), the format specification must match the complete string. |
na |
Character vector of strings to interpret as missing values. Set this
option to |
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
trim_ws |
Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it? |
Value
A POSIXct() vector with tzone attribute set to tz. Elements that could not be parsed (or did not generate valid dates) will be set to NA, and a warning message will inform you of the total number of failures.
Format specification
readr uses a format specification similar to strptime(). There are three types of element:
Date components are specified with "%" followed by a letter. For example "%Y" matches a 4 digit year, "%m" matches a 2 digit month and "%d" matches a 2 digit day. Month and day default to 1 (i.e. Jan 1st) if not present, for example if only a year is given.
Whitespace is any sequence of zero or more whitespace characters.
Any other character is matched exactly.
parse_datetime() recognises the following format specifications:
Year: "%Y" (4 digits). "%y" (2 digits); 00-69 -> 2000-2069, 70-99 -> 1970-1999.
Month: "%m" (2 digits), "%b" (abbreviated name in current locale), "%B" (full name in current locale).
Day: "%d" (2 digits), "%e" (optional leading space), "%a" (abbreviated name in current locale).
Hour: "%H" or "%I" or "%h", use I (and not H) with AM/PM, use h (and not H) if your times represent durations longer than one day.
Minutes: "%M"
Seconds: "%S" (integer seconds), "%OS" (partial seconds)
Time zone: "%Z" (as name, e.g. "America/Chicago"), "%z" (as offset from UTC, e.g. "+0800")
AM/PM indicator: "%p".
Non-digits: "%." skips one non-digit character, "%+" skips one or more non-digit characters, "%*" skips any number of non-digit characters.
Automatic parsers: "%AD" parses with a flexible YMD parser, "%AT" parses with a flexible HMS parser.
Time since the Unix epoch: "%s" decimal seconds since the Unix epoch.
Shortcuts: "%D" = "%m/%d/%y", "%F" = "%Y-%m-%d", "%R" = "%H:%M", "%T" = "%H:%M:%S", "%x" = "%y/%m/%d".
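A few of the specifications listed above can be combined as in the following sketch. The input strings are invented, and each call is meant as an illustration of one specifier rather than a recommended format.

```r
library(readr)

# Fractional seconds need %OS rather than %S.
stamp <- parse_datetime("2015-02-03 04:05:06.5", "%Y-%m-%d %H:%M:%OS")

# %D is shorthand for %m/%d/%y; two-digit years 00-69 map to 2000-2069.
day <- parse_date("02/03/15", "%D")

# %I pairs with %p for 12-hour clock times.
clock <- parse_time("02:30 PM", "%I:%M %p")

# %. skips exactly one non-digit character, whatever the separator is.
mixed <- parse_date("2015.02-03", "%Y%.%m%.%d")

stamp
day
clock
mixed
```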
ISO8601 support
Currently, readr does not support all of ISO8601. Missing features:
Week & weekday specifications, e.g. "2013-W05", "2013-W05-10".
Ordinal dates, e.g. "2013-095".
Using commas instead of a period for decimal separator.
The parser is also a little laxer than ISO8601:
Dates and times can be separated with a space, not just T.
Mostly correct specifications like "2009-05-19 14:" and "200912-01" work.
See Also
Other parsers: col_skip(), cols_condense(), cols(), parse_factor(), parse_guess(), parse_logical(), parse_number(), parse_vector()
Examples
# Format strings --------------------------------------------------------
parse_datetime("01/02/2010", "%d/%m/%Y")
parse_datetime("01/02/2010", "%m/%d/%Y")
# Handle any separator
parse_datetime("01/02/2010", "%m%.%d%.%Y")
# Dates look the same, but internally they use the number of days since
# 1970-01-01 instead of the number of seconds. This avoids a whole lot
# of troubles related to time zones, so use it if you can.
parse_date("01/02/2010", "%d/%m/%Y")
parse_date("01/02/2010", "%m/%d/%Y")
# You can parse timezones from strings (as listed in OlsonNames())
parse_datetime("2010/01/01 12:00 US/Central", "%Y/%m/%d %H:%M %Z")
# Or from offsets
parse_datetime("2010/01/01 12:00 -0600", "%Y/%m/%d %H:%M %z")
# Use the locale parameter to control the default time zone
# (but note UTC is considerably faster than other options)
parse_datetime("2010/01/01 12:00", "%Y/%m/%d %H:%M",
locale = locale(tz = "US/Central")
)
parse_datetime("2010/01/01 12:00", "%Y/%m/%d %H:%M",
locale = locale(tz = "US/Eastern")
)
# Unlike strptime, the format specification must match the complete
# string (ignoring leading and trailing whitespace). This avoids common
# errors:
strptime("01/02/2010", "%d/%m/%y")
parse_datetime("01/02/2010", "%d/%m/%y")
# Failures -------------------------------------------------------------
parse_datetime("01/01/2010", "%d/%m/%Y")
parse_datetime(c("01/ab/2010", "32/01/2010"), "%d/%m/%Y")
# Locales --------------------------------------------------------------
# By default, readr expects English date/times, but that's easy to change
parse_datetime("1 janvier 2015", "%d %B %Y", locale = locale("fr"))
parse_datetime("1 enero 2015", "%d %B %Y", locale = locale("es"))
# ISO8601 --------------------------------------------------------------
# With separators
parse_datetime("1979-10-14")
parse_datetime("1979-10-14T10")
parse_datetime("1979-10-14T10:11")
parse_datetime("1979-10-14T10:11:12")
parse_datetime("1979-10-14T10:11:12.12345")
# Without separators
parse_datetime("19791014")
parse_datetime("19791014T101112")
# Time zones
us_central <- locale(tz = "US/Central")
parse_datetime("1979-10-14T1010", locale = us_central)
parse_datetime("1979-10-14T1010-0500", locale = us_central)
parse_datetime("1979-10-14T1010Z", locale = us_central)
# Your current time zone
parse_datetime("1979-10-14T1010", locale = locale(tz = ""))
Parse factors
Description
parse_factor() is similar to factor(), but generates a warning if levels have been specified and some elements of x are not found in those levels.
Usage
parse_factor(
x,
levels = NULL,
ordered = FALSE,
na = c("", "NA"),
locale = default_locale(),
include_na = TRUE,
trim_ws = TRUE
)
col_factor(levels = NULL, ordered = FALSE, include_na = FALSE)
Arguments
x |
Character vector of values to parse. |
levels |
Character vector of the allowed levels. When |
ordered |
Is it an ordered factor? |
na |
Character vector of strings to interpret as missing values. Set this
option to |
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
include_na |
If |
trim_ws |
Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it? |
See Also
Other parsers: col_skip(), cols_condense(), cols(), parse_datetime(), parse_guess(), parse_logical(), parse_number(), parse_vector()
Examples
# discover the levels from the data
parse_factor(c("a", "b"))
parse_factor(c("a", "b", "-99"))
parse_factor(c("a", "b", "-99"), na = c("", "NA", "-99"))
parse_factor(c("a", "b", "-99"), na = c("", "NA", "-99"), include_na = FALSE)
# provide the levels explicitly
parse_factor(c("a", "b"), levels = letters[1:5])
x <- c("cat", "dog", "caw")
animals <- c("cat", "dog", "cow")
# base::factor() silently converts elements that do not match any levels to
# NA
factor(x, levels = animals)
# parse_factor() generates same factor as base::factor() but throws a warning
# and reports problems
parse_factor(x, levels = animals)
Parse using the "best" type
Description
parse_guess() returns the parsed vector; guess_parser() returns the name of the parser. These functions use a number of heuristics to determine which type of vector is "best". Generally they try to err on the side of safety, as it's straightforward to override the parsing choice if needed.
Usage
parse_guess(
x,
na = c("", "NA"),
locale = default_locale(),
trim_ws = TRUE,
guess_integer = FALSE
)
col_guess()
guess_parser(
x,
locale = default_locale(),
guess_integer = FALSE,
na = c("", "NA")
)
Arguments
x |
Character vector of values to parse. |
na |
Character vector of strings to interpret as missing values. Set this
option to |
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
trim_ws |
Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it? |
guess_integer |
If |
See Also
Other parsers: col_skip(), cols_condense(), cols(), parse_datetime(), parse_factor(), parse_logical(), parse_number(), parse_vector()
Examples
# Logical vectors
parse_guess(c("FALSE", "TRUE", "F", "T"))
# Integers and doubles
parse_guess(c("1", "2", "3"))
parse_guess(c("1.6", "2.6", "3.4"))
# Numbers containing grouping mark
guess_parser("1,234,566")
parse_guess("1,234,566")
# ISO 8601 date times
guess_parser(c("2010-10-10"))
parse_guess(c("2010-10-10"))
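As a further sketch of the guessing heuristics, guess_parser() can be used to see how guess_integer and na change the outcome. The input vectors here are invented for illustration.

```r
library(readr)

# Whole numbers are guessed as double unless guess_integer = TRUE
whole_default <- guess_parser(c("1", "2", "3"))
whole_integer <- guess_parser(c("1", "2", "3"), guess_integer = TRUE)

# One value that is not a number makes the whole vector character
with_text <- guess_parser(c("1", "2", "three"))

# Strings listed in `na` are ignored when guessing
with_na <- guess_parser(c("1", "2", "-"), na = c("", "NA", "-"))

c(whole_default, whole_integer, with_text, with_na)
```

Checking the guess with guess_parser() first is a cheap way to decide whether an explicit parser such as parse_integer() is needed.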
Parse numbers, flexibly
Description
This parses the first number it finds, dropping any non-numeric characters before the first number and all characters after the first number. The grouping mark specified by the locale is ignored inside the number.
Usage
parse_number(x, na = c("", "NA"), locale = default_locale(), trim_ws = TRUE)
col_number()
Arguments
x |
Character vector of values to parse. |
na |
Character vector of strings to interpret as missing values. Set this
option to |
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
trim_ws |
Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it? |
Value
A numeric vector (double) of parsed numbers.
See Also
Other parsers: col_skip(), cols_condense(), cols(), parse_datetime(), parse_factor(), parse_guess(), parse_logical(), parse_vector()
Examples
## The next three calls all return 1000
parse_number("$1,000") ## leading `$` and grouping character `,` ignored
parse_number("euro1,000") ## leading non-numeric euro ignored
parse_number("t1000t1000") ## only parses first number found
parse_number("1,234.56")
## explicit locale specifying European grouping and decimal marks
parse_number("1.234,56", locale = locale(decimal_mark = ",", grouping_mark = "."))
## SI/ISO 31-0 standard spaces for number grouping
parse_number("1 234.56", locale = locale(decimal_mark = ".", grouping_mark = " "))
## Specifying strings for NAs
parse_number(c("1", "2", "3", "NA"))
parse_number(c("1", "2", "3", "NA", "Nothing"), na = c("NA", "Nothing"))
Parse a character vector.
Description
Parse a character vector.
Usage
parse_vector(
x,
collector,
na = c("", "NA"),
locale = default_locale(),
trim_ws = TRUE
)
Arguments
x |
Character vector of elements to parse. |
collector |
Column specification. |
na |
Character vector of strings to interpret as missing values. Set this
option to |
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
trim_ws |
Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it? |
See Also
Other parsers: col_skip(), cols_condense(), cols(), parse_datetime(), parse_factor(), parse_guess(), parse_logical(), parse_number()
Examples
x <- c("1", "2", "3", "NA")
parse_vector(x, col_integer())
parse_vector(x, col_double())
Retrieve parsing problems
Description
Readr functions will only throw an error if parsing fails in an unrecoverable way. However, there are lots of potential problems that you might want to know about - these are stored in the problems attribute of the output, which you can easily access with this function.
stop_for_problems() will throw an error if there are any parsing problems: this is useful for automated scripts where you want to throw an error as soon as you encounter a problem.
Usage
problems(x = .Last.value)
stop_for_problems(x)
Arguments
x |
A data frame (from |
Value
A data frame with one row for each problem and four columns:
row , col |
Row and column of problem |
expected |
What readr expected to find |
actual |
What it actually got |
Examples
x <- parse_integer(c("1X", "blah", "3"))
problems(x)
y <- parse_integer(c("1", "2", "3"))
problems(y)
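The automated-script use of stop_for_problems() described above might look like the following sketch. The tryCatch() wrapper and the "clean" marker are illustrative choices, not part of readr.

```r
library(readr)

# suppressWarnings() only keeps the sketch quiet; the problems are still recorded
x <- suppressWarnings(parse_integer(c("1X", "blah", "3")))
n_bad <- nrow(problems(x))

# In a script, turn any recorded problem into an error and handle it
outcome <- tryCatch(
  {
    stop_for_problems(x)
    "clean"
  },
  error = function(e) conditionMessage(e)
)
outcome
```

If the input had parsed without problems, stop_for_problems() would return quietly and outcome would be "clean".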
Read built-in object from package
Description
Consistent wrapper around data() that forces the promise. This is also a stronger parallel to loading data from a file.
Usage
read_builtin(x, package = NULL)
Arguments
x |
Name (character string) of data set to read. |
package |
Name of package from which to find data set. By default, all attached packages are searched and then the 'data' subdirectory (if present) of the current working directory. |
Value
An object of the built-in class of x.
Examples
read_builtin("mtcars", "datasets")
Read a delimited file (including CSV and TSV) into a tibble
Description
read_csv() and read_tsv() are special cases of the more general read_delim(). They're useful for reading the most common types of flat file data, comma separated values and tab separated values, respectively. read_csv2() uses ; for the field separator and , for the decimal point. This format is common in some European countries.
Usage
read_delim(
file,
delim = NULL,
quote = "\"",
escape_backslash = FALSE,
escape_double = TRUE,
col_names = TRUE,
col_types = NULL,
col_select = NULL,
id = NULL,
locale = default_locale(),
na = c("", "NA"),
quoted_na = TRUE,
comment = "",
trim_ws = FALSE,
skip = 0,
n_max = Inf,
guess_max = min(1000, n_max),
name_repair = "unique",
num_threads = readr_threads(),
progress = show_progress(),
show_col_types = should_show_types(),
skip_empty_rows = TRUE,
lazy = should_read_lazy()
)
read_csv(
file,
col_names = TRUE,
col_types = NULL,
col_select = NULL,
id = NULL,
locale = default_locale(),
na = c("", "NA"),
quoted_na = TRUE,
quote = "\"",
comment = "",
trim_ws = TRUE,
skip = 0,
n_max = Inf,
guess_max = min(1000, n_max),
name_repair = "unique",
num_threads = readr_threads(),
progress = show_progress(),
show_col_types = should_show_types(),
skip_empty_rows = TRUE,
lazy = should_read_lazy()
)
read_csv2(
file,
col_names = TRUE,
col_types = NULL,
col_select = NULL,
id = NULL,
locale = default_locale(),
na = c("", "NA"),
quoted_na = TRUE,
quote = "\"",
comment = "",
trim_ws = TRUE,
skip = 0,
n_max = Inf,
guess_max = min(1000, n_max),
progress = show_progress(),
name_repair = "unique",
num_threads = readr_threads(),
show_col_types = should_show_types(),
skip_empty_rows = TRUE,
lazy = should_read_lazy()
)
read_tsv(
file,
col_names = TRUE,
col_types = NULL,
col_select = NULL,
id = NULL,
locale = default_locale(),
na = c("", "NA"),
quoted_na = TRUE,
quote = "\"",
comment = "",
trim_ws = TRUE,
skip = 0,
n_max = Inf,
guess_max = min(1000, n_max),
progress = show_progress(),
name_repair = "unique",
num_threads = readr_threads(),
show_col_types = should_show_types(),
skip_empty_rows = TRUE,
lazy = should_read_lazy()
)
Arguments
file |
Either a path to a file, a connection, or literal data (either a single string or a raw vector). Files ending in Literal data is most useful for examples and tests. To be recognised as
literal data, the input must be either wrapped with Using a value of |
delim |
Single character used to separate fields within a record. |
quote |
Single character used to quote strings. |
escape_backslash |
Does the file use backslashes to escape special
characters? This is more general than |
escape_double |
Does the file escape quotes by doubling them?
i.e. If this option is |
col_names |
Either If If Missing ( |
col_types |
One of If Column specifications created by Alternatively, you can use a compact string representation where each character represents one column:
By default, reading a file without a column specification will print a
message showing what |
col_select |
Columns to include in the results. You can use the same
mini-language as |
id |
The name of a column in which to store the file path. This is
useful when reading multiple input files and there is data in the file
paths, such as the data collection date. If |
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
na |
Character vector of strings to interpret as missing values. Set this
option to |
quoted_na |
|
comment |
A string used to identify comments. Any text after the comment characters will be silently ignored. |
trim_ws |
Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it? |
skip |
Number of lines to skip before reading data. If |
n_max |
Maximum number of lines to read. |
guess_max |
Maximum number of lines to use for guessing column types.
Will never use more than the number of lines read.
See |
name_repair |
Handling of column names. The default behaviour is to
ensure column names are
This argument is passed on as |
num_threads |
The number of processing threads to use for initial
parsing and lazy reading of data. If your data contains newlines within
fields the parser should automatically detect this and fall back to using
one thread only. However if you know your file has newlines within quoted
fields it is safest to set |
progress |
Display a progress bar? By default it will only display
in an interactive session and not while knitting a document. The automatic
progress bar can be disabled by setting option |
show_col_types |
If |
skip_empty_rows |
Should blank rows be ignored altogether? i.e. If this
option is |
lazy |
Read values lazily? By default, this is Learn more in |
Value
A tibble(). If there are parsing problems, a warning will alert you. You can retrieve the full details by calling problems() on your dataset.
Examples
# Input sources -------------------------------------------------------------
# Read from a path
read_csv(readr_example("mtcars.csv"))
read_csv(readr_example("mtcars.csv.zip"))
read_csv(readr_example("mtcars.csv.bz2"))
## Not run:
# Including remote paths
read_csv("https://github.com/tidyverse/readr/raw/main/inst/extdata/mtcars.csv")
## End(Not run)
# Read from multiple file paths at once
continents <- c("africa", "americas", "asia", "europe", "oceania")
filepaths <- vapply(
paste0("mini-gapminder-", continents, ".csv"),
FUN = readr_example,
FUN.VALUE = character(1)
)
read_csv(filepaths, id = "file")
# Or directly from a string with `I()`
read_csv(I("x,y\n1,2\n3,4"))
# Column selection-----------------------------------------------------------
# Pass column names or indexes directly to select them
read_csv(readr_example("chickens.csv"), col_select = c(chicken, eggs_laid))
read_csv(readr_example("chickens.csv"), col_select = c(1, 3:4))
# Or use the selection helpers
read_csv(
readr_example("chickens.csv"),
col_select = c(starts_with("c"), last_col())
)
# You can also rename specific columns
read_csv(
readr_example("chickens.csv"),
col_select = c(egg_yield = eggs_laid, everything())
)
# Column types --------------------------------------------------------------
# By default, readr guesses the column types, looking at `guess_max` rows.
# You can override with a compact specification:
read_csv(I("x,y\n1,2\n3,4"), col_types = "dc")
# Or with a list of column types:
read_csv(I("x,y\n1,2\n3,4"), col_types = list(col_double(), col_character()))
# If there are parsing problems, you get a warning, and can extract
# more details with problems()
y <- read_csv(I("x\n1\n2\nb"), col_types = list(col_double()))
y
problems(y)
# Column names --------------------------------------------------------------
# By default, readr duplicate name repair is noisy
read_csv(I("x,x\n1,2\n3,4"))
# Same default repair strategy, but quiet
read_csv(I("x,x\n1,2\n3,4"), name_repair = "unique_quiet")
# There's also a global option that controls verbosity of name repair
withr::with_options(
list(rlib_name_repair_verbosity = "quiet"),
read_csv(I("x,x\n1,2\n3,4"))
)
# Or use "minimal" to turn off name repair
read_csv(I("x,x\n1,2\n3,4"), name_repair = "minimal")
# File types ----------------------------------------------------------------
read_csv(I("a,b\n1.0,2.0"))
read_csv2(I("a;b\n1,0;2,0"))
read_tsv(I("a\tb\n1.0\t2.0"))
read_delim(I("a|b\n1.0|2.0"), delim = "|")
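The comment, na, and n_max arguments described above can be combined in one call. This is a minimal sketch on made-up literal data; the "-" missing-value marker and the leading comment line are assumptions about a hypothetical export format.

```r
library(readr)

raw_text <- I("# exported by a hypothetical tool\nid,score\n1,10\n2,-\n3,30\n")

scores <- read_csv(
  raw_text,
  comment = "#",             # drop the leading comment line
  na = c("", "NA", "-"),     # treat "-" as missing as well
  n_max = 2,                 # read at most two data rows
  show_col_types = FALSE     # silence the column specification message
)
scores
```

Because "-" is declared as a missing value, the score column can still be guessed as numeric rather than falling back to character.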
Read a delimited file by chunks
Description
Read a delimited file by chunks
Usage
read_delim_chunked(
file,
callback,
delim = NULL,
chunk_size = 10000,
quote = "\"",
escape_backslash = FALSE,
escape_double = TRUE,
col_names = TRUE,
col_types = NULL,
locale = default_locale(),
na = c("", "NA"),
quoted_na = TRUE,
comment = "",
trim_ws = FALSE,
skip = 0,
guess_max = chunk_size,
progress = show_progress(),
show_col_types = should_show_types(),
skip_empty_rows = TRUE
)
read_csv_chunked(
file,
callback,
chunk_size = 10000,
col_names = TRUE,
col_types = NULL,
locale = default_locale(),
na = c("", "NA"),
quoted_na = TRUE,
quote = "\"",
comment = "",
trim_ws = TRUE,
skip = 0,
guess_max = chunk_size,
progress = show_progress(),
show_col_types = should_show_types(),
skip_empty_rows = TRUE
)
read_csv2_chunked(
file,
callback,
chunk_size = 10000,
col_names = TRUE,
col_types = NULL,
locale = default_locale(),
na = c("", "NA"),
quoted_na = TRUE,
quote = "\"",
comment = "",
trim_ws = TRUE,
skip = 0,
guess_max = chunk_size,
progress = show_progress(),
show_col_types = should_show_types(),
skip_empty_rows = TRUE
)
read_tsv_chunked(
file,
callback,
chunk_size = 10000,
col_names = TRUE,
col_types = NULL,
locale = default_locale(),
na = c("", "NA"),
quoted_na = TRUE,
quote = "\"",
comment = "",
trim_ws = TRUE,
skip = 0,
guess_max = chunk_size,
progress = show_progress(),
show_col_types = should_show_types(),
skip_empty_rows = TRUE
)
Arguments
file |
Either a path to a file, a connection, or literal data (either a single string or a raw vector). Files ending in Literal data is most useful for examples and tests. To be recognised as
literal data, the input must be either wrapped with Using a value of |
callback |
A callback function to call on each chunk |
delim |
Single character used to separate fields within a record. |
chunk_size |
The number of rows to include in each chunk |
quote |
Single character used to quote strings. |
escape_backslash |
Does the file use backslashes to escape special
characters? This is more general than |
escape_double |
Does the file escape quotes by doubling them?
i.e. If this option is |
col_names |
Either If If Missing ( |
col_types |
One of If Column specifications created by Alternatively, you can use a compact string representation where each character represents one column:
By default, reading a file without a column specification will print a
message showing what |
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
na |
Character vector of strings to interpret as missing values. Set this
option to |
quoted_na |
|
comment |
A string used to identify comments. Any text after the comment characters will be silently ignored. |
trim_ws |
Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it? |
skip |
Number of lines to skip before reading data. If |
guess_max |
Maximum number of lines to use for guessing column types.
Will never use more than the number of lines read.
See |
progress |
Display a progress bar? By default it will only display
in an interactive session and not while knitting a document. The automatic
progress bar can be disabled by setting option |
show_col_types |
If |
skip_empty_rows |
Should blank rows be ignored altogether? i.e. If this
option is |
Details
The number of lines in file can exceed the maximum integer value in R (~2 billion).
See Also
Other chunked: callback, melt_delim_chunked(), read_lines_chunked()
Examples
# Cars with 3 gears
f <- function(x, pos) subset(x, gear == 3)
read_csv_chunked(readr_example("mtcars.csv"), DataFrameCallback$new(f), chunk_size = 5)
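When only a summary is needed, a chunked read does not have to keep any rows. The sketch below uses readr's AccumulateCallback to carry a running total between chunks; count_rows is a hypothetical helper written for this example.

```r
library(readr)

# Callback for AccumulateCallback: receives the chunk, its starting row,
# and the running value returned by the previous call
count_rows <- function(x, pos, acc) acc + nrow(x)

total_rows <- read_csv_chunked(
  readr_example("mtcars.csv"),
  AccumulateCallback$new(count_rows, acc = 0),
  chunk_size = 10,
  show_col_types = FALSE
)
total_rows
```

Only one chunk of chunk_size rows is passed to the callback at a time, which is the point of the chunked interface for files that are too large to load whole.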
Read/write a complete file
Description
read_file() reads a complete file into a single object: either a character vector of length one, or a raw vector. write_file() takes a single string, or a raw vector, and writes it exactly as is. Raw vectors are useful when dealing with binary data, or if you have text data with unknown encoding.
Usage
read_file(file, locale = default_locale())
read_file_raw(file)
write_file(x, file, append = FALSE, path = deprecated())
Arguments
file |
Either a path to a file, a connection, or literal data (either a single string or a raw vector). Files ending in Literal data is most useful for examples and tests. To be recognised as
literal data, the input must be either wrapped with Using a value of |
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
x |
A single string, or a raw vector to write to disk. |
append |
If |
path |
Value
read_file: A length 1 character vector.
read_file_raw: A raw vector.
Examples
read_file(file.path(R.home("doc"), "AUTHORS"))
read_file_raw(file.path(R.home("doc"), "AUTHORS"))
tmp <- tempfile()
x <- format_csv(mtcars[1:6, ])
write_file(x, tmp)
identical(x, read_file(tmp))
read_lines(I(x))
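A short sketch of the raw-vector and append behaviour described above. The byte values are chosen for illustration and spell "hi" followed by a newline.

```r
library(readr)

tmp <- tempfile()

# Raw vectors are written byte for byte
bytes <- as.raw(c(0x68, 0x69, 0x0a))  # "hi\n"
write_file(bytes, tmp)
same_bytes <- identical(read_file_raw(tmp), bytes)

# append = TRUE adds to the end instead of overwriting
write_file("more\n", tmp, append = TRUE)
contents <- read_file(tmp)

unlink(tmp)
```

Reading the file back with read_file_raw() is the safer choice when the encoding of the content is not known in advance.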
Read a fixed width file into a tibble
Description
A fixed width file can be a very compact representation of numeric data. It's also very fast to parse, because every field is in the same place in every line. Unfortunately, it's painful to parse because you need to describe the length of every field. Readr aims to make it as easy as possible by providing a number of different ways to describe the field structure.
- fwf_empty() - Guesses based on the positions of empty columns.
- fwf_widths() - Supply the widths of the columns.
- fwf_positions() - Supply paired vectors of start and end positions.
- fwf_cols() - Supply named arguments of paired start and end positions or column widths.
Usage
read_fwf(
file,
col_positions = fwf_empty(file, skip, n = guess_max),
col_types = NULL,
col_select = NULL,
id = NULL,
locale = default_locale(),
na = c("", "NA"),
comment = "",
trim_ws = TRUE,
skip = 0,
n_max = Inf,
guess_max = min(n_max, 1000),
progress = show_progress(),
name_repair = "unique",
num_threads = readr_threads(),
show_col_types = should_show_types(),
lazy = should_read_lazy(),
skip_empty_rows = TRUE
)
fwf_empty(
file,
skip = 0,
skip_empty_rows = FALSE,
col_names = NULL,
comment = "",
n = 100L
)
fwf_widths(widths, col_names = NULL)
fwf_positions(start, end = NULL, col_names = NULL)
fwf_cols(...)
Arguments
file |
Either a path to a file, a connection, or literal data (either a single string or a raw vector). Files ending in Literal data is most useful for examples and tests. To be recognised as
literal data, the input must be either wrapped with Using a value of |
col_positions |
Column positions, as created by |
col_types |
One of If Column specifications created by Alternatively, you can use a compact string representation where each character represents one column:
By default, reading a file without a column specification will print a
message showing what |
col_select |
Columns to include in the results. You can use the same
mini-language as |
id |
The name of a column in which to store the file path. This is
useful when reading multiple input files and there is data in the file
paths, such as the data collection date. If |
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
na |
Character vector of strings to interpret as missing values. Set this
option to |
comment |
A string used to identify comments. Any text after the comment characters will be silently ignored. |
trim_ws |
Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it? |
skip |
Number of lines to skip before reading data. |
n_max |
Maximum number of lines to read. |
guess_max |
Maximum number of lines to use for guessing column types.
Will never use more than the number of lines read.
See |
progress |
Display a progress bar? By default it will only display
in an interactive session and not while knitting a document. The automatic
progress bar can be disabled by setting option |
name_repair |
Handling of column names. The default behaviour is to
ensure column names are
This argument is passed on as |
num_threads |
The number of processing threads to use for initial
parsing and lazy reading of data. If your data contains newlines within
fields the parser should automatically detect this and fall back to using
one thread only. However if you know your file has newlines within quoted
fields it is safest to set |
show_col_types |
If |
lazy |
Read values lazily? By default, this is Learn more in |
skip_empty_rows |
Should blank rows be ignored altogether? i.e. If this
option is |
col_names |
Either NULL, or a character vector of column names. |
n |
Number of lines the tokenizer will read to determine file structure. By default it is set to 100. |
widths |
Width of each field. Use NA as width of last field when reading a ragged fwf file. |
start , end |
Starting and ending (inclusive) positions of each field. Use NA as last end field when reading a ragged fwf file. |
... |
If the first element is a data frame,
then it must have all numeric columns and either one or two rows.
The column names are the variable names. The column values are the
variable widths if a length one vector, and if length two, variable start and end
positions. The elements of |
Second edition changes
Comments are no longer looked for anywhere in the file. They are now only ignored at the start of a line.
See Also
read_table() to read fixed width files where each column is separated by whitespace.
Examples
fwf_sample <- readr_example("fwf-sample.txt")
writeLines(read_lines(fwf_sample))
# You can specify column positions in several ways:
# 1. Guess based on position of empty columns
read_fwf(fwf_sample, fwf_empty(fwf_sample, col_names = c("first", "last", "state", "ssn")))
# 2. A vector of field widths
read_fwf(fwf_sample, fwf_widths(c(20, 10, 12), c("name", "state", "ssn")))
# 3. Paired vectors of start and end positions
read_fwf(fwf_sample, fwf_positions(c(1, 30), c(20, 42), c("name", "ssn")))
# 4. Named arguments with start and end positions
read_fwf(fwf_sample, fwf_cols(name = c(1, 20), ssn = c(30, 42)))
# 5. Named arguments with column widths
read_fwf(fwf_sample, fwf_cols(name = 20, state = 10, ssn = 12))
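The widths argument accepts NA for the last field of a ragged file. The following is a minimal sketch on invented literal data, where the final field has a different length on each line.

```r
library(readr)

# Hypothetical ragged input: the last field varies in length
ragged <- I("ab 1 short\ncd 2 a much longer note\n")

notes <- read_fwf(
  ragged,
  fwf_widths(c(3, 2, NA), c("id", "n", "note")),  # NA width: read to end of line
  show_col_types = FALSE
)
notes
```

The first two widths include the trailing space after each field; with trim_ws = TRUE (the default) that padding is removed before parsing.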
Read/write lines to/from a file
Description
read_lines() reads up to n_max lines from a file. New lines are not included in the output. read_lines_raw() produces a list of raw vectors, and is useful for handling data with unknown encoding. write_lines() takes a character vector or list of raw vectors, appending a new line after each entry.
Usage
read_lines(
file,
skip = 0,
skip_empty_rows = FALSE,
n_max = Inf,
locale = default_locale(),
na = character(),
lazy = should_read_lazy(),
num_threads = readr_threads(),
progress = show_progress()
)
read_lines_raw(
file,
skip = 0,
n_max = -1L,
num_threads = readr_threads(),
progress = show_progress()
)
write_lines(
x,
file,
sep = "\n",
na = "NA",
append = FALSE,
num_threads = readr_threads(),
path = deprecated()
)
Arguments
file |
Either a path to a file, a connection, or literal data (either a single string or a raw vector). Files ending in Literal data is most useful for examples and tests. To be recognised as
literal data, the input must be either wrapped with Using a value of |
skip |
Number of lines to skip before reading data. |
skip_empty_rows |
Should blank rows be ignored altogether? i.e. If this
option is |
n_max |
Number of lines to read. If |
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
na |
Character vector of strings to interpret as missing values. Set this
option to |
lazy |
Read values lazily? By default, this is Learn more in |
num_threads |
The number of processing threads to use for initial
parsing and lazy reading of data. If your data contains newlines within
fields the parser should automatically detect this and fall back to using
one thread only. However if you know your file has newlines within quoted
fields it is safest to set |
progress |
Display a progress bar? By default it will only display
in an interactive session and not while knitting a document. The automatic
progress bar can be disabled by setting option |
x |
A character vector or list of raw vectors to write to disk. |
sep |
The line separator. Defaults to |
append |
If |
path |
Value
read_lines(): A character vector with one element for each line.
read_lines_raw(): A list containing a raw vector for each line.
write_lines() returns x, invisibly.
Examples
read_lines(file.path(R.home("doc"), "AUTHORS"), n_max = 10)
read_lines_raw(file.path(R.home("doc"), "AUTHORS"), n_max = 10)
tmp <- tempfile()
write_lines(rownames(mtcars), tmp)
read_lines(tmp, lazy = FALSE)
read_file(tmp) # note trailing \n
write_lines(airquality$Ozone, tmp, na = "-1")
read_lines(tmp)
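The skip, n_max, and sep arguments can be illustrated with a small round trip. This is a sketch using a temporary file; the values written are arbitrary.

```r
library(readr)

tmp <- tempfile()
write_lines(c("a", "b", "c", "d"), tmp)

# Skip the first line, then read at most two
middle <- read_lines(tmp, skip = 1, n_max = 2)

# A custom separator is appended after every entry, including the last
write_lines(c("a", "b"), tmp, sep = "\r\n")
crlf <- read_file(tmp)

unlink(tmp)
```

read_file() is used for the second check because read_lines() strips the line endings that the sketch is trying to show.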
Read lines from a file or string by chunk.
Description
Read lines from a file or string by chunk.
Usage
read_lines_chunked(
file,
callback,
chunk_size = 10000,
skip = 0,
locale = default_locale(),
na = character(),
progress = show_progress()
)
read_lines_raw_chunked(
file,
callback,
chunk_size = 10000,
skip = 0,
progress = show_progress()
)
Arguments
file |
Either a path to a file, a connection, or literal data (either a single string or a raw vector). Files ending in Literal data is most useful for examples and tests. To be recognised as
literal data, the input must be either wrapped with Using a value of |
callback |
A callback function to call on each chunk |
chunk_size |
The number of rows to include in each chunk |
skip |
Number of lines to skip before reading data. |
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
na |
Character vector of strings to interpret as missing values. Set this
option to |
progress |
Display a progress bar? By default it will only display
in an interactive session and not while knitting a document. The automatic
progress bar can be disabled by setting option |
See Also
Other chunked: callback, melt_delim_chunked(), read_delim_chunked()
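Examples
A minimal sketch of read_lines_chunked(), assuming a temporary file of 26 lines written just for this illustration. It uses readr's ListCallback to collect one value per chunk.

```r
library(readr)

tmp <- tempfile()
write_lines(letters, tmp)  # 26 lines, one letter each

# ListCallback collects the value returned for each chunk into a list
chunk_sizes <- read_lines_chunked(
  tmp,
  ListCallback$new(function(x, pos) length(x)),
  chunk_size = 10
)
unlist(chunk_sizes)

unlink(tmp)
```

The callback receives each chunk as a character vector x together with pos, the line number at which the chunk starts.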
Read common/combined log file into a tibble
Description
This is a fairly standard format for log files - it uses both quotes and square brackets for quoting, and there may be literal quotes embedded in a quoted string. The dash, "-", is used for missing values.
Usage
read_log(
file,
col_names = FALSE,
col_types = NULL,
trim_ws = TRUE,
skip = 0,
n_max = Inf,
show_col_types = should_show_types(),
progress = show_progress()
)
Arguments
file |
Either a path to a file, a connection, or literal data (either a single string or a raw vector). Files ending in Literal data is most useful for examples and tests. To be recognised as
literal data, the input must be either wrapped with Using a value of |
col_names |
Either If If Missing ( |
col_types |
One of If Column specifications created by Alternatively, you can use a compact string representation where each character represents one column:
By default, reading a file without a column specification will print a
message showing what |
trim_ws |
Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it? |
skip |
Number of lines to skip before reading data. If |
n_max |
Maximum number of lines to read. |
show_col_types |
If |
progress |
Display a progress bar? By default it will only display
in an interactive session and not while knitting a document. The automatic
progress bar can be disabled by setting option |
Examples
read_log(readr_example("example.log"))
Read/write RDS files.
Description
Consistent wrapper around saveRDS() and readRDS(). write_rds() does not compress by default as space is generally cheaper than time.
Usage
read_rds(file, refhook = NULL)
write_rds(
x,
file,
compress = c("none", "gz", "bz2", "xz"),
version = 2,
refhook = NULL,
text = FALSE,
path = deprecated(),
...
)
Arguments
file |
The file path to read from/write to. |
refhook |
A function to handle reference objects. |
x |
R object to serialise. |
compress |
Compression method to use: "none", "gz", "bz2", or "xz". |
version |
Serialization format version to be used. The default value is 2,
as it is compatible with R versions prior to 3.5.0. See |
text |
If |
path |
|
... |
Additional arguments to connection function. For example, control
the space-time trade-off of different compression methods with
|
Value
write_rds() returns x, invisibly.
Examples
temp <- tempfile()
write_rds(mtcars, temp)
read_rds(temp)
## Not run:
write_rds(mtcars, "compressed_mtc.rds", "xz", compression = 9L)
## End(Not run)
Read whitespace-separated columns into a tibble
Description
read_table() is designed to read the type of textual data where each column is separated by one (or more) columns of space.
read_table() is like read.table(): it allows any number of whitespace characters between columns, and the lines can be of different lengths.
spec_table() returns the column specifications rather than a data frame.
Usage
read_table(
file,
col_names = TRUE,
col_types = NULL,
locale = default_locale(),
na = "NA",
skip = 0,
n_max = Inf,
guess_max = min(n_max, 1000),
progress = show_progress(),
comment = "",
show_col_types = should_show_types(),
skip_empty_rows = TRUE
)
Arguments
file |
Either a path to a file, a connection, or literal data (either a single string or a raw vector). Files ending in Literal data is most useful for examples and tests. To be recognised as
literal data, the input must be either wrapped with Using a value of |
col_names |
Either If If Missing ( |
col_types |
One of If Column specifications created by Alternatively, you can use a compact string representation where each character represents one column:
By default, reading a file without a column specification will print a
message showing what |
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
na |
Character vector of strings to interpret as missing values. Set this
option to |
skip |
Number of lines to skip before reading data. |
n_max |
Maximum number of lines to read. |
guess_max |
Maximum number of lines to use for guessing column types.
Will never use more than the number of lines read.
See |
progress |
Display a progress bar? By default it will only display
in an interactive session and not while knitting a document. The automatic
progress bar can be disabled by setting option |
comment |
A string used to identify comments. Any text after the comment characters will be silently ignored. |
show_col_types |
If |
skip_empty_rows |
Should blank rows be ignored altogether? i.e. If this
option is |
See Also
read_fwf() to read fixed width files where each column is not separated by whitespace. read_fwf() is also useful for reading tabular data with non-standard formatting.
Examples
ws <- readr_example("whitespace-sample.txt")
writeLines(read_lines(ws))
read_table(ws)
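To illustrate the tolerance for varying amounts of whitespace, here is a minimal sketch using literal data. The input string and the name ragged are made up for illustration; I() marks the string as data rather than a file path.

```r
library(readr)

# Hypothetical literal input with uneven spacing between the columns.
ragged <- I("x     y\n1 2\n10        20\n")

# show_col_types = FALSE silences the column specification message.
tbl <- read_table(ragged, show_col_types = FALSE)
tbl
```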
Read whitespace-separated columns into a tibble
Description
This function is deprecated because we renamed it to read_table() and removed the old read_table() function, which was too strict for most cases and was analogous to just using read_fwf().
Usage
read_table2(
file,
col_names = TRUE,
col_types = NULL,
locale = default_locale(),
na = "NA",
skip = 0,
n_max = Inf,
guess_max = min(n_max, 1000),
progress = show_progress(),
comment = "",
skip_empty_rows = TRUE
)
Get path to readr example
Description
readr comes bundled with a number of sample files in its inst/extdata directory. This function makes them easy to access.
Usage
readr_example(file = NULL)
Arguments
file |
Name of file. If |
Examples
readr_example()
readr_example("challenge.csv")
Determine how many threads readr should use when processing
Description
The number of threads returned can be set by:
- The global option readr.num_threads
- The environment variable VROOM_THREADS
- The value of parallel::detectCores()
Usage
readr_threads()
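A minimal sketch of controlling the thread count through the global option, assuming the option takes effect whenever it is set. The value 2 and the variable names are arbitrary choices for illustration.

```r
library(readr)

# Set the option for this session, remembering the previous value.
old <- options(readr.num_threads = 2)

n_threads <- readr_threads()
n_threads

# Restore the previous setting.
options(old)
```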
Determine whether to read a file lazily
Description
This function consults the option readr.read_lazy to figure out whether to do lazy reading or not. If the option is unset, the default is FALSE, meaning readr will read files eagerly, not lazily. If you want to use this option to express a preference for lazy reading, do this:
options(readr.read_lazy = TRUE)
Typically, one would use the option to control lazy reading at the session, file, or user level. The lazy argument of functions like read_csv() can be used to control laziness in an individual call.
Usage
should_read_lazy()
See Also
The blog post "Eager vs lazy reading in readr 2.1.0" explains the benefits (and downsides) of lazy reading.
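The two levels of control described above can be sketched as follows. The variable names are illustrative, and the option is restored afterwards so the sketch does not change session behaviour.

```r
library(readr)

# Session-level preference, expressed through the option.
old <- options(readr.read_lazy = TRUE)
lazy_pref <- should_read_lazy()
options(old)

# Per-call control: read eagerly in this call, whatever the option says.
cars <- read_csv(
  readr_example("mtcars.csv"),
  lazy = FALSE,
  show_col_types = FALSE
)
```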
Determine whether column types should be shown
Description
Wrapper around getOption("readr.show_col_types") that implements some fallback logic if the option is unset. This returns:
- TRUE if the option is set to TRUE
- FALSE if the option is set to FALSE
- FALSE if the option is unset and we appear to be running tests
- NULL otherwise, in which case the caller determines whether to show column types based on context, e.g. whether show_col_types or actual col_types were explicitly specified
Usage
should_show_types()
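As a small illustration of the second case in the list above, the sketch below sets the option to FALSE and checks what the helper reports. The name quiet is an arbitrary choice.

```r
library(readr)

# With the option explicitly set to FALSE, the helper should return FALSE.
old <- options(readr.show_col_types = FALSE)
quiet <- should_show_types()
options(old)

quiet
```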
Determine whether progress bars should be shown
Description
By default, readr shows progress bars. However, progress reporting is suppressed if any of the following conditions hold:
- The bar is explicitly disabled by setting options(readr.show_progress = FALSE).
- The code is run in a non-interactive session, as determined by rlang::is_interactive().
- The code is run in an RStudio notebook chunk, as determined by getOption("rstudio.notebook.executing").
Usage
show_progress()
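A brief sketch of the first condition, plus the per-call alternative of passing progress = FALSE directly. The variable names are illustrative.

```r
library(readr)

# Disable progress bars through the option and confirm the helper agrees.
old <- options(readr.show_progress = FALSE)
shown <- show_progress()
options(old)

# Alternatively, suppress the bar for one call only.
cars <- read_csv(
  readr_example("mtcars.csv"),
  progress = FALSE,
  show_col_types = FALSE
)
```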
Generate a column specification
Description
When printed, only the first 20 columns are shown by default. To override this, set options(readr.num_columns); a value of 0 turns off printing.
Usage
spec_delim(
file,
delim = NULL,
quote = "\"",
escape_backslash = FALSE,
escape_double = TRUE,
col_names = TRUE,
col_types = list(),
col_select = NULL,
id = NULL,
locale = default_locale(),
na = c("", "NA"),
quoted_na = TRUE,
comment = "",
trim_ws = FALSE,
skip = 0,
n_max = 0,
guess_max = 1000,
name_repair = "unique",
num_threads = readr_threads(),
progress = show_progress(),
show_col_types = should_show_types(),
skip_empty_rows = TRUE,
lazy = should_read_lazy()
)
spec_csv(
file,
col_names = TRUE,
col_types = list(),
col_select = NULL,
id = NULL,
locale = default_locale(),
na = c("", "NA"),
quoted_na = TRUE,
quote = "\"",
comment = "",
trim_ws = TRUE,
skip = 0,
n_max = 0,
guess_max = 1000,
name_repair = "unique",
num_threads = readr_threads(),
progress = show_progress(),
show_col_types = should_show_types(),
skip_empty_rows = TRUE,
lazy = should_read_lazy()
)
spec_csv2(
file,
col_names = TRUE,
col_types = list(),
col_select = NULL,
id = NULL,
locale = default_locale(),
na = c("", "NA"),
quoted_na = TRUE,
quote = "\"",
comment = "",
trim_ws = TRUE,
skip = 0,
n_max = 0,
guess_max = 1000,
progress = show_progress(),
name_repair = "unique",
num_threads = readr_threads(),
show_col_types = should_show_types(),
skip_empty_rows = TRUE,
lazy = should_read_lazy()
)
spec_tsv(
file,
col_names = TRUE,
col_types = list(),
col_select = NULL,
id = NULL,
locale = default_locale(),
na = c("", "NA"),
quoted_na = TRUE,
quote = "\"",
comment = "",
trim_ws = TRUE,
skip = 0,
n_max = 0,
guess_max = 1000,
progress = show_progress(),
name_repair = "unique",
num_threads = readr_threads(),
show_col_types = should_show_types(),
skip_empty_rows = TRUE,
lazy = should_read_lazy()
)
spec_table(
file,
col_names = TRUE,
col_types = list(),
locale = default_locale(),
na = "NA",
skip = 0,
n_max = 0,
guess_max = 1000,
progress = show_progress(),
comment = "",
show_col_types = should_show_types(),
skip_empty_rows = TRUE
)
Arguments
file |
Either a path to a file, a connection, or literal data (either a single string or a raw vector). Files ending in Literal data is most useful for examples and tests. To be recognised as
literal data, the input must be either wrapped with Using a value of |
delim |
Single character used to separate fields within a record. |
quote |
Single character used to quote strings. |
escape_backslash |
Does the file use backslashes to escape special
characters? This is more general than |
escape_double |
Does the file escape quotes by doubling them?
i.e. If this option is |
col_names |
Either If If Missing ( |
col_types |
One of If Column specifications created by Alternatively, you can use a compact string representation where each character represents one column:
By default, reading a file without a column specification will print a
message showing what |
col_select |
Columns to include in the results. You can use the same
mini-language as |
id |
The name of a column in which to store the file path. This is
useful when reading multiple input files and there is data in the file
paths, such as the data collection date. If |
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
na |
Character vector of strings to interpret as missing values. Set this
option to |
quoted_na |
|
comment |
A string used to identify comments. Any text after the comment characters will be silently ignored. |
trim_ws |
Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it? |
skip |
Number of lines to skip before reading data. If |
n_max |
Maximum number of lines to read. |
guess_max |
Maximum number of lines to use for guessing column types.
Will never use more than the number of lines read.
See |
name_repair |
Handling of column names. The default behaviour is to
ensure column names are
This argument is passed on as |
num_threads |
The number of processing threads to use for initial
parsing and lazy reading of data. If your data contains newlines within
fields the parser should automatically detect this and fall back to using
one thread only. However if you know your file has newlines within quoted
fields it is safest to set |
progress |
Display a progress bar? By default it will only display
in an interactive session and not while knitting a document. The automatic
progress bar can be disabled by setting option |
show_col_types |
If |
skip_empty_rows |
Should blank rows be ignored altogether? i.e. If this
option is |
lazy |
Read values lazily? By default, this is Learn more in |
Value
The col_spec generated for the file.
Examples
# Input sources -------------------------------------------------------------
# Retrieve specs from a path
spec_csv(system.file("extdata/mtcars.csv", package = "readr"))
spec_csv(system.file("extdata/mtcars.csv.zip", package = "readr"))
# Or directly from a string (must contain a newline)
spec_csv(I("x,y\n1,2\n3,4"))
# Column types --------------------------------------------------------------
# By default, readr guesses the column types, looking at 1000 rows
# throughout the file.
# You can specify the number of rows used with guess_max.
spec_csv(system.file("extdata/mtcars.csv", package = "readr"), guess_max = 20)
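One possible use of the returned col_spec, sketched under the assumption that you want later reads to reuse the guessed types rather than guess again. The variable names are illustrative.

```r
library(readr)

path <- readr_example("mtcars.csv")

# Capture the guessed specification once...
spec <- spec_csv(path)

# ...then pass it back as col_types so the read uses exactly those types.
cars <- read_csv(path, col_types = spec)
```

Supplying col_types explicitly also means the read is more likely to surface a problem if the file's contents change unexpectedly.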
Tokenize a file/string.
Description
Turns input into a character vector. Usually the tokenization is done purely in C++, and never exposed to R (because that requires a copy). This function is useful for testing, or when a file doesn't parse correctly and you want to see the underlying tokens.
Usage
tokenize(file, tokenizer = tokenizer_csv(), skip = 0, n_max = -1L)
Arguments
file |
Either a path to a file, a connection, or literal data (either a single string or a raw vector). Files ending in Literal data is most useful for examples and tests. To be recognised as
literal data, the input must be either wrapped with Using a value of |
tokenizer |
A tokenizer specification. |
skip |
Number of lines to skip before reading data. |
n_max |
Optionally, maximum number of rows to tokenize. |
Examples
tokenize("1,2\n3,4,5\n\n6")
# Only tokenize first two lines
tokenize("1,2\n3,4,5\n\n6", n_max = 2)
Tokenizers.
Description
Explicitly create tokenizer objects. Usually you will not call these functions, but will instead use one of the user-friendly wrappers like read_csv().
Usage
tokenizer_delim(
delim,
quote = "\"",
na = "NA",
quoted_na = TRUE,
comment = "",
trim_ws = TRUE,
escape_double = TRUE,
escape_backslash = FALSE,
skip_empty_rows = TRUE
)
tokenizer_csv(
na = "NA",
quoted_na = TRUE,
quote = "\"",
comment = "",
trim_ws = TRUE,
skip_empty_rows = TRUE
)
tokenizer_tsv(
na = "NA",
quoted_na = TRUE,
quote = "\"",
comment = "",
trim_ws = TRUE,
skip_empty_rows = TRUE
)
tokenizer_line(na = character(), skip_empty_rows = TRUE)
tokenizer_log(trim_ws)
tokenizer_fwf(
begin,
end,
na = "NA",
comment = "",
trim_ws = TRUE,
skip_empty_rows = TRUE
)
tokenizer_ws(na = "NA", comment = "", skip_empty_rows = TRUE)
Arguments
Examples
tokenizer_csv()
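A tokenizer object is mostly useful when handed to tokenize(). The sketch below uses a made-up pipe-delimited string to show the two functions working together; the input and the name tokens are assumptions for illustration.

```r
library(readr)

# Hypothetical pipe-delimited literal input. It contains a newline, so it
# is treated as data rather than as a file path.
tokens <- tokenize("a|b\n1|2\n", tokenizer = tokenizer_delim("|"))

# One character vector of fields per input line.
tokens
```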
Re-convert character columns in existing data frame
Description
This is useful if you need to do some manual munging - you can read the columns in as character, clean them up with (e.g.) regular expressions and then let readr take another stab at parsing them. The name is a homage to the base utils::type.convert().
Usage
type_convert(
df,
col_types = NULL,
na = c("", "NA"),
trim_ws = TRUE,
locale = default_locale(),
guess_integer = FALSE
)
Arguments
df |
A data frame. |
col_types |
One of If |
na |
Character vector of strings to interpret as missing values. Set this
option to |
trim_ws |
Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it? |
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
guess_integer |
If |
Note
type_convert() removes a 'spec' attribute, because it likely modifies the column data types (see spec() for more information about column specifications).
Examples
df <- data.frame(
x = as.character(runif(10)),
y = as.character(sample(10)),
stringsAsFactors = FALSE
)
str(df)
str(type_convert(df))
df <- data.frame(x = c("NA", "10"), stringsAsFactors = FALSE)
str(type_convert(df))
# Type convert can be used to infer types from an entire dataset
# first read the data as character
data <- read_csv(readr_example("mtcars.csv"),
col_types = list(.default = col_character())
)
str(data)
# Then convert it with type_convert
type_convert(data)
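The guess_integer argument shown in the usage can be sketched as follows, assuming it controls whether whole-number strings are guessed as integer rather than double. The data frame and variable names are made up for illustration.

```r
library(readr)

df <- data.frame(n = c("1", "2", "3"), stringsAsFactors = FALSE)

# Default: whole numbers are guessed as doubles.
as_double <- type_convert(df, guess_integer = FALSE)

# With guess_integer = TRUE the same column is guessed as integer.
as_integer <- type_convert(df, guess_integer = TRUE)

class(as_double$n)
class(as_integer$n)
```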
Temporarily change the active readr edition
Description
with_edition() allows you to change the active edition of readr for a given block of code. local_edition() allows you to change the active edition of readr until the end of the current function or file.
Usage
with_edition(edition, code)
local_edition(edition, env = parent.frame())
Arguments
edition |
Should be a single integer, such as |
code |
Code to run with the changed edition. |
env |
Environment that controls scope of changes. For expert use only. |
Examples
with_edition(1, edition_get())
with_edition(2, edition_get())
# readr 1e and 2e behave differently when input rows have a different
# number of fields
with_edition(1, read_csv("1,2\n3,4,5", col_names = c("X", "Y", "Z")))
with_edition(2, read_csv("1,2\n3,4,5", col_names = c("X", "Y", "Z")))
# local_edition() applies in a specific scope, for example, inside a function
read_csv_1e <- function(...) {
local_edition(1)
read_csv(...)
}
read_csv("1,2\n3,4,5", col_names = c("X", "Y", "Z")) # 2e behaviour
read_csv_1e("1,2\n3,4,5", col_names = c("X", "Y", "Z")) # 1e behaviour
read_csv("1,2\n3,4,5", col_names = c("X", "Y", "Z")) # 2e behaviour
Write a data frame to a delimited file
Description
The write_*() family of functions are an improvement on analogous functions such as write.csv() because they are approximately twice as fast. Unlike write.csv(), these functions do not include row names as a column in the written file. A generic function, output_column(), is applied to each variable to coerce columns to suitable output.
Usage
write_delim(
x,
file,
delim = " ",
na = "NA",
append = FALSE,
col_names = !append,
quote = c("needed", "all", "none"),
escape = c("double", "backslash", "none"),
eol = "\n",
num_threads = readr_threads(),
progress = show_progress(),
path = deprecated(),
quote_escape = deprecated()
)
write_csv(
x,
file,
na = "NA",
append = FALSE,
col_names = !append,
quote = c("needed", "all", "none"),
escape = c("double", "backslash", "none"),
eol = "\n",
num_threads = readr_threads(),
progress = show_progress(),
path = deprecated(),
quote_escape = deprecated()
)
write_csv2(
x,
file,
na = "NA",
append = FALSE,
col_names = !append,
quote = c("needed", "all", "none"),
escape = c("double", "backslash", "none"),
eol = "\n",
num_threads = readr_threads(),
progress = show_progress(),
path = deprecated(),
quote_escape = deprecated()
)
write_excel_csv(
x,
file,
na = "NA",
append = FALSE,
col_names = !append,
delim = ",",
quote = "all",
escape = c("double", "backslash", "none"),
eol = "\n",
num_threads = readr_threads(),
progress = show_progress(),
path = deprecated(),
quote_escape = deprecated()
)
write_excel_csv2(
x,
file,
na = "NA",
append = FALSE,
col_names = !append,
delim = ";",
quote = "all",
escape = c("double", "backslash", "none"),
eol = "\n",
num_threads = readr_threads(),
progress = show_progress(),
path = deprecated(),
quote_escape = deprecated()
)
write_tsv(
x,
file,
na = "NA",
append = FALSE,
col_names = !append,
quote = "none",
escape = c("double", "backslash", "none"),
eol = "\n",
num_threads = readr_threads(),
progress = show_progress(),
path = deprecated(),
quote_escape = deprecated()
)
Arguments
x |
A data frame or tibble to write to disk. |
file |
File or connection to write to. |
delim |
Delimiter used to separate values. Defaults to |
na |
String used for missing values. Defaults to NA. Missing values
will never be quoted; strings with the same value as |
append |
If |
col_names |
If |
quote |
How to handle fields which contain characters that need to be quoted.
|
escape |
The type of escape to use when quotes are in the data.
|
eol |
The end of line character to use. Most commonly either |
num_threads |
Number of threads to use when reading and materializing vectors. If your data contains newlines within fields the parser will automatically be forced to use a single thread only. |
progress |
Display a progress bar? By default it will only display
in an interactive session and not while knitting a document. The display
is updated every 50,000 values and will only display if estimated reading
time is 5 seconds or more. The automatic progress bar can be disabled by
setting option |
path |
|
quote_escape |
Value
write_*() returns the input x invisibly.
Output
Factors are coerced to character. Doubles are formatted to a decimal string using the grisu3 algorithm. POSIXct values are formatted as ISO8601 with a UTC timezone. Note: POSIXct objects in local or non-UTC timezones will be converted to UTC time before writing.
All columns are encoded as UTF-8. write_excel_csv() and write_excel_csv2() also include a UTF-8 Byte order mark which indicates to Excel the csv is UTF-8 encoded.
write_excel_csv2() and write_csv2() were created to allow users with different locale settings to save .csv files using their default settings (e.g. ; as the column separator and , as the decimal separator). This is common in some European countries.
Values are only quoted if they contain a comma, quote or newline.
The write_*() functions will automatically compress outputs if an appropriate extension is given. Three extensions are currently supported: .gz for gzip compression, .bz2 for bzip2 compression and .xz for lzma compression. See the examples for more information.
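The coercion rules above can be checked with a small sketch that writes a factor and a non-UTC POSIXct value and then inspects the raw lines. The data frame, the time zone, and the variable names are illustrative assumptions.

```r
library(readr)

# Hypothetical data: a noon timestamp in New York (UTC-5 in January)
# and a one-level factor.
df <- data.frame(
  when = as.POSIXct("2024-01-10 12:00:00", tz = "America/New_York"),
  grp = factor("a")
)

path <- tempfile(fileext = ".csv")
write_csv(df, path)

# Inspect the raw text: the time should appear shifted to UTC and the
# factor should appear as a plain string.
lines <- read_lines(path)
lines
```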
References
Florian Loitsch, Printing Floating-Point Numbers Quickly and Accurately with Integers, PLDI '10, http://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf
Examples
# If only a file name is specified, write_*() will write
# the file to the current working directory.
write_csv(mtcars, "mtcars.csv")
write_tsv(mtcars, "mtcars.tsv")
# If you add an extension to the file name, write_*() will
# automatically compress the output.
write_tsv(mtcars, "mtcars.tsv.gz")
write_tsv(mtcars, "mtcars.tsv.bz2")
write_tsv(mtcars, "mtcars.tsv.xz")
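To avoid leaving files in the working directory, the same idea can be sketched with a temporary file, followed by a read to confirm the compressed output round-trips. The temporary path and the variable names are illustrative.

```r
library(readr)

# The .gz extension requests gzip compression on write.
path <- tempfile(fileext = ".csv.gz")
write_csv(head(mtcars), path)

# read_csv() decompresses based on the same extension.
back <- read_csv(path, show_col_types = FALSE)

# Row names are not written, so only the 11 data columns come back.
dim(back)
```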