Version: | 0.2.5 |
Date: | 2022-12-06 |
Title: | Read Large Text Files |
Description: | Read large text files by splitting them in smaller files. Package 'bigreadr' also provides some convenient wrappers around fread() and fwrite() from package 'data.table'. |
License: | GPL-3 |
Encoding: | UTF-8 |
ByteCompile: | true |
RoxygenNote: | 6.1.0 |
Imports: | bigassertr (≥ 0.1.1), data.table, parallelly, Rcpp, utils |
Suggests: | spelling, testthat, covr, RSQLite |
LinkingTo: | Rcpp |
Language: | en-US |
URL: | https://github.com/privefl/bigreadr |
BugReports: | https://github.com/privefl/bigreadr/issues |
NeedsCompilation: | yes |
Packaged: | 2022-12-06 14:52:49 UTC; au639593 |
Author: | Florian Privé [aut, cre] |
Maintainer: | Florian Privé <florian.prive.21@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2022-12-06 15:50:02 UTC |
bigreadr: Read Large Text Files
Description
Read large text files by splitting them in smaller files. Package 'bigreadr' also provides some convenient wrappers around fread() and fwrite() from package 'data.table'.
Author(s)
Maintainer: Florian Privé <florian.prive.21@gmail.com>
See Also
Useful links:
- https://github.com/privefl/bigreadr
- Report bugs at https://github.com/privefl/bigreadr/issues
big_fread1: Read large text file
Description
Read large text file by splitting lines.
Usage
big_fread1(file, every_nlines, .transform = identity,
           .combine = rbind_df, skip = 0, ..., print_timings = TRUE)
Arguments
file | Path to file that you want to read. |
every_nlines | Maximum number of lines in new file parts. |
.transform | Function to transform each data frame corresponding to each part of the file. Default doesn't change anything. |
.combine | Function to combine results (list of data frames). |
skip | Number of lines to skip at the beginning of file. |
... | Other arguments to be passed to data.table::fread, except |
print_timings | Whether to print timings. Default is TRUE. |
Value
A data.frame by default; a data.table when data.table = TRUE.
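A minimal sketch of a call, assuming the package is installed. The input file is written with fwrite2(iris), as in the other examples of this manual; the block size of 50 lines and the row filter passed as .transform are illustrative choices, not recommendations.

```r
library(bigreadr)

# Write iris to a temporary text file (header + 150 rows)
tmp <- fwrite2(iris)

# Illustrative transform: keep only rows with Sepal.Length above 5
keep_long <- function(df) df[df$Sepal.Length > 5, ]

# Read the file in blocks of at most 50 lines, filter each block,
# then let the default .combine (rbind_df) stack the results
iris_long <- big_fread1(tmp, every_nlines = 50,
                        .transform = keep_long,
                        print_timings = FALSE)

str(iris_long)
```

Because .transform runs on each block before the blocks are combined, a filter like this one keeps only the retained rows in memory.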
big_fread2: Read large text file
Description
Read large text file by splitting columns.
Usage
big_fread2(file, nb_parts = NULL, .transform = identity,
           .combine = cbind_df, skip = 0, select = NULL, progress = FALSE,
           part_size = 500 * 1024^2, ...)
Arguments
file | Path to file that you want to read. |
nb_parts | Number of parts in which to split reading (and transforming). Parts refer to blocks of selected columns. Default uses part_size to determine the number of parts. |
.transform | Function to transform each data frame corresponding to each block of selected columns. Default doesn't change anything. |
.combine | Function to combine results (list of data frames). |
skip | Number of lines to skip at the beginning of file. |
select | Indices of columns to keep (sorted). Default keeps them all. |
progress | Show progress? Default is FALSE. |
part_size | Size of the parts if nb_parts is not supplied. Default is 500 * 1024^2. |
... | Other arguments to be passed to data.table::fread, except |
Value
The outputs of fread2 + .transform, combined with .combine.
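A minimal sketch of reading by blocks of columns, assuming the package is installed. The value nb_parts = 2 and the selected column indices are arbitrary choices for illustration.

```r
library(bigreadr)

# Write iris to a temporary text file (5 columns)
tmp <- fwrite2(iris)

# Read all columns in two blocks; the default .combine (cbind_df)
# puts the blocks back side by side
iris2 <- big_fread2(tmp, nb_parts = 2)
dim(iris2)

# Read only columns 1 and 5 (indices must be sorted)
sub <- big_fread2(tmp, nb_parts = 2, select = c(1L, 5L))
names(sub)
```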
cbind_df: Merge data frames
Description
Merge data frames
Usage
cbind_df(list_df)
Arguments
list_df | A list of multiple data frames with the same observations in the same order. |
Value
One merged data frame.
Examples
str(iris)
str(cbind_df(list(iris, iris)))
fread2: Read text file(s)
Description
Read text file(s)
Usage
fread2(input, ..., data.table = FALSE,
       nThread = getOption("bigreadr.nThread"))
Arguments
input | Path to the file(s) that you want to read from. This can also be a command, some text or a URL. If a vector of inputs is provided, resulting data frames are appended. |
... | Other arguments to be passed to data.table::fread. |
data.table | Whether to return a data.table rather than a data.frame. Default is FALSE. |
nThread | Number of threads to use. Default uses all threads minus one. |
Value
A data.frame by default; a data.table when data.table = TRUE.
Examples
tmp <- fwrite2(iris)
iris2 <- fread2(tmp)
all.equal(iris2, iris) ## fread doesn't use factors
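Two further cases described in the arguments can be sketched as follows, assuming the package is installed: a vector of inputs, and data.table = TRUE. Reading the same temporary file twice is only a convenient stand-in for several files with the same columns.

```r
library(bigreadr)

tmp <- fwrite2(iris)

# A vector of inputs: the resulting data frames are appended row-wise
twice <- fread2(c(tmp, tmp))
dim(twice)

# Ask for a data.table instead of the default data.frame
dt <- fread2(tmp, data.table = TRUE)
class(dt)
```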
fwrite2: Write a data frame to a text file
Description
Write a data frame to a text file
Usage
fwrite2(x, file = tempfile(), ..., quote = FALSE,
        nThread = getOption("bigreadr.nThread"))
Arguments
x | Data frame to write. |
file | Path to the file that you want to write to. Default uses tempfile(). |
... | Other arguments to be passed to data.table::fwrite. |
quote | Whether to quote strings (default is FALSE). |
nThread | Number of threads to use. Default uses all threads minus one. |
Value
Input parameter file, invisibly.
Examples
tmp <- fwrite2(iris)
iris2 <- fread2(tmp)
all.equal(iris2, iris) ## fread doesn't use factors
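A small sketch of writing to an explicit path with extra options, assuming the package is installed. The output path and the tab separator are hypothetical choices for this illustration; sep is an argument of data.table::fwrite reached through the dots.

```r
library(bigreadr)

# Hypothetical output path, chosen for this illustration only
out <- tempfile(fileext = ".tsv")

# `sep` is forwarded to data.table::fwrite() through `...`;
# `quote = TRUE` overrides the default of not quoting strings
res <- fwrite2(iris, file = out, sep = "\t", quote = TRUE)

# The returned value is the path that was written to
identical(res, out)
head(fread2(out))
```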
nlines: Number of lines
Description
Get the number of lines of a file.
Usage
nlines(file)
Arguments
file | Path of the file. |
Value
The number of lines as one integer.
Examples
tmp <- fwrite2(iris)
nlines(tmp)
rbind_df: Merge data frames
Description
Merge data frames
Usage
rbind_df(list_df)
Arguments
list_df | A list of multiple data frames with the same variables in the same order. |
Value
One merged data frame with the names of the first input data frame.
Examples
str(iris)
str(rbind_df(list(iris, iris)))
split_file: Split file every nlines
Description
split_file: split a file into parts of at most every_nlines lines.
get_split_files: get the paths of the files created by the split.
Usage
split_file(file, every_nlines, prefix_out = tempfile(),
           repeat_header = FALSE)
get_split_files(split_file_out)
Arguments
file | Path to file that you want to split. |
every_nlines | Maximum number of lines in new file parts. |
prefix_out | Prefix for created files. Default uses tempfile(). |
repeat_header | Whether to repeat the header row in each file. Default is FALSE. |
split_file_out | Output of split_file. |
Value
For split_file, a list with
- name_in: input parameter file,
- prefix_out: input parameter prefix_out,
- nfiles: number of files (parts) created,
- nlines_part: input parameter every_nlines,
- nlines_all: total number of lines of file.
For get_split_files, a vector of file paths created by split_file.
Examples
tmp <- fwrite2(iris)
infos <- split_file(tmp, 100)
str(infos)
get_split_files(infos)
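The repeat_header argument can be sketched as follows, assuming the package is installed. With the header repeated, each part can be read on its own and the parts stacked again; the block size of 40 lines is an arbitrary choice.

```r
library(bigreadr)

tmp <- fwrite2(iris)

# Split into parts, repeating the header row at the top of each part
infos <- split_file(tmp, every_nlines = 40, repeat_header = TRUE)
parts <- get_split_files(infos)

# Each part has its own header, so fread2() can read it independently
dfs <- lapply(parts, fread2)

# Stack the parts back into one data frame
all_rows <- rbind_df(dfs)
dim(all_rows)

# Remove the temporary part files
unlink(parts)
```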