Type: | Package |
Title: | A 'dplyr' Back End for Databases |
Version: | 2.5.0 |
Description: | A 'dplyr' back end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features work with any database that has a 'DBI' back end; more advanced features require 'SQL' translation to be provided by the package author. |
License: | MIT + file LICENSE |
URL: | https://dbplyr.tidyverse.org/, https://github.com/tidyverse/dbplyr |
BugReports: | https://github.com/tidyverse/dbplyr/issues |
Depends: | R (≥ 3.6) |
Imports: | blob (≥ 1.2.0), cli (≥ 3.6.1), DBI (≥ 1.1.3), dplyr (≥ 1.1.2), glue (≥ 1.6.2), lifecycle (≥ 1.0.3), magrittr, methods, pillar (≥ 1.9.0), purrr (≥ 1.0.1), R6 (≥ 2.2.2), rlang (≥ 1.1.1), tibble (≥ 3.2.1), tidyr (≥ 1.3.0), tidyselect (≥ 1.2.1), utils, vctrs (≥ 0.6.3), withr (≥ 2.5.0) |
Suggests: | bit64, covr, knitr, Lahman, nycflights13, odbc (≥ 1.4.2), RMariaDB (≥ 1.2.2), rmarkdown, RPostgres (≥ 1.4.5), RPostgreSQL, RSQLite (≥ 2.3.1), testthat (≥ 3.1.10) |
VignetteBuilder: | knitr |
Config/Needs/website: | tidyverse/tidytemplate |
Config/testthat/edition: | 3 |
Config/testthat/parallel: | TRUE |
Encoding: | UTF-8 |
Language: | en-gb |
RoxygenNote: | 7.3.1 |
Collate: | 'db-sql.R' 'utils-check.R' 'import-standalone-types-check.R' 'import-standalone-obj-type.R' 'utils.R' 'sql.R' 'escape.R' 'translate-sql-cut.R' 'translate-sql-quantile.R' 'translate-sql-string.R' 'translate-sql-paste.R' 'translate-sql-helpers.R' 'translate-sql-window.R' 'translate-sql-conditional.R' 'backend-.R' 'backend-access.R' 'backend-hana.R' 'backend-hive.R' 'backend-impala.R' 'verb-copy-to.R' 'backend-mssql.R' 'backend-mysql.R' 'backend-odbc.R' 'backend-oracle.R' 'backend-postgres.R' 'backend-postgres-old.R' 'backend-redshift.R' 'backend-snowflake.R' 'backend-spark-sql.R' 'backend-sqlite.R' 'backend-teradata.R' 'build-sql.R' 'data-cache.R' 'data-lahman.R' 'data-nycflights13.R' 'db-escape.R' 'db-io.R' 'db.R' 'dbplyr.R' 'explain.R' 'ident.R' 'import-standalone-s3-register.R' 'join-by-compat.R' 'join-cols-compat.R' 'lazy-join-query.R' 'lazy-ops.R' 'lazy-query.R' 'lazy-select-query.R' 'lazy-set-op-query.R' 'memdb.R' 'optimise-utils.R' 'pillar.R' 'progress.R' 'sql-build.R' 'query-join.R' 'query-select.R' 'query-semi-join.R' 'query-set-op.R' 'query.R' 'reexport.R' 'remote.R' 'rows.R' 'schema.R' 'simulate.R' 'sql-clause.R' 'sql-expr.R' 'src-sql.R' 'src_dbi.R' 'table-name.R' 'tbl-lazy.R' 'tbl-sql.R' 'test-frame.R' 'testthat.R' 'tidyeval-across.R' 'tidyeval.R' 'translate-sql.R' 'utils-format.R' 'verb-arrange.R' 'verb-compute.R' 'verb-count.R' 'verb-distinct.R' 'verb-do-query.R' 'verb-do.R' 'verb-expand.R' 'verb-fill.R' 'verb-filter.R' 'verb-group_by.R' 'verb-head.R' 'verb-joins.R' 'verb-mutate.R' 'verb-pivot-longer.R' 'verb-pivot-wider.R' 'verb-pull.R' 'verb-select.R' 'verb-set-ops.R' 'verb-slice.R' 'verb-summarise.R' 'verb-uncount.R' 'verb-window.R' 'zzz.R' |
NeedsCompilation: | no |
Packaged: | 2024-03-15 17:27:18 UTC; hadleywickham |
Author: | Hadley Wickham [aut, cre], Maximilian Girlich [aut], Edgar Ruiz [aut], Posit Software, PBC [cph, fnd] |
Maintainer: | Hadley Wickham <hadley@posit.co> |
Repository: | CRAN |
Date/Publication: | 2024-03-19 17:00:05 UTC |
dbplyr: A 'dplyr' Back End for Databases
Description
A 'dplyr' back end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features work with any database that has a 'DBI' back end; more advanced features require 'SQL' translation to be provided by the package author.
Author(s)
Maintainer: Hadley Wickham hadley@posit.co
Authors:
Maximilian Girlich
Edgar Ruiz
Other contributors:
Posit Software, PBC [copyright holder, funder]
See Also
Useful links:
https://dbplyr.tidyverse.org/
https://github.com/tidyverse/dbplyr
Report bugs at https://github.com/tidyverse/dbplyr/issues
Arrange rows by column values
Description
This is a method for the dplyr arrange() generic. It generates the ORDER BY clause of the SQL query. It also affects the window_order() of windowed expressions in mutate.tbl_lazy().
Note that ORDER BY clauses cannot generally appear in subqueries, which means that you should arrange() as late as possible in your pipelines.
Usage
## S3 method for class 'tbl_lazy'
arrange(.data, ..., .by_group = FALSE)
Arguments
.data | A lazy data frame backed by a database query. |
... | <data-masking> Variables, or functions of variables. Use desc() to sort a variable in descending order. |
.by_group | If TRUE, will sort first by grouping variables. Applies to grouped data frames only. |
Value
Another tbl_lazy. Use show_query() to see the generated query, and use collect() to execute the query and return data to R.
Missing values
Unlike R, most databases sorts NA
(NULL
s) at the front. You can
can override this behaviour by explicitly sorting on is.na(x)
.
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(a = c(3, 4, 1, 2), b = c(5, 1, 2, NA))
db %>% arrange(a) %>% show_query()
# Note that NAs are sorted first
db %>% arrange(b)
# override by sorting on is.na() first
db %>% arrange(is.na(b), b)
Backend: MS Access
Description
See vignette("translation-function")
and vignette("translation-verb")
for
details of overall translation technology. Key differences for this backend
are:
- SELECT uses TOP, not LIMIT
- Non-standard types and mathematical functions
- String concatenation uses &
- No ANALYZE equivalent
- TRUE and FALSE converted to 1 and 0

Use simulate_access() with lazy_frame() to see simulated SQL without converting to a live Access database.
Usage
simulate_access()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(x = 1, y = 2, z = "a", con = simulate_access())
lf %>% head()
lf %>% mutate(y = as.numeric(y), z = sqrt(x^2 + 10))
lf %>% mutate(a = paste0(z, " times"))
Backend: SAP HANA
Description
See vignette("translation-function")
and vignette("translation-verb")
for
details of overall translation technology. Key differences for this backend
are:
- Temporary tables get # prefix and use LOCAL TEMPORARY COLUMN.
- No table analysis performed in copy_to().
- paste() uses ||
- Note that you can't create new boolean columns from logical expressions; you need to wrap with an explicit ifelse(): ifelse(x > y, TRUE, FALSE).

Use simulate_hana() with lazy_frame() to see simulated SQL without converting to a live database.
Usage
simulate_hana()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = simulate_hana())
lf %>% transmute(x = paste0(d, " times"))
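The boolean restriction noted above can also be previewed in simulated SQL; a minimal sketch (no live database needed, column names are illustrative):

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr)

lf <- lazy_frame(x = 1, y = 2, con = simulate_hana())
# wrap the logical expression in an explicit ifelse() to create a column
lf %>% mutate(gt = ifelse(x > y, TRUE, FALSE))
```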
Backend: Hive
Description
See vignette("translation-function")
and vignette("translation-verb")
for
details of overall translation technology. Key differences for this backend
are a scattering of custom translations provided by users.
Use simulate_hive() with lazy_frame() to see simulated SQL without converting to a live database.
Usage
simulate_hive()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, d = 2, c = "z", con = simulate_hive())
lf %>% transmute(x = cot(b))
lf %>% transmute(x = bitwShiftL(c, 1L))
lf %>% transmute(x = str_replace_all(c, "a", "b"))
lf %>% summarise(x = median(d, na.rm = TRUE))
lf %>% summarise(x = var(c, na.rm = TRUE))
Backend: Impala
Description
See vignette("translation-function")
and vignette("translation-verb")
for
details of overall translation technology. Key differences for this backend
are a scattering of custom translations provided by users, mostly focussed
on bitwise operations.
Use simulate_impala() with lazy_frame() to see simulated SQL without converting to a live database.
Usage
simulate_impala()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = simulate_impala())
lf %>% transmute(X = bitwNot(bitwOr(b, c)))
Backend: SQL server
Description
See vignette("translation-function")
and vignette("translation-verb")
for
details of overall translation technology. Key differences for this backend
are:
- SELECT uses TOP, not LIMIT
- Automatically prefixes # to create temporary tables. Add the prefix yourself to avoid the message.
- String basics: paste(), substr(), nchar()
- Custom types for as.* functions
- Lubridate extraction functions: year(), month(), day(), etc.
- Semi-automated bit <-> boolean translation (see below)

Use simulate_mssql() with lazy_frame() to see simulated SQL without converting to a live database.
Usage
simulate_mssql(version = "15.0")
Arguments
version | Version of MS SQL to simulate. Currently the only difference is that 15.0 and above will use TRY_CAST() instead of CAST(). |
Bit vs boolean
SQL Server uses two incompatible types to represent TRUE and FALSE values:

- The BOOLEAN type is the result of logical comparisons (e.g. x > y) and can be used in WHERE but not to create new columns in SELECT. https://docs.microsoft.com/en-us/sql/t-sql/language-elements/comparison-operators-transact-sql
- The BIT type is a special type of numeric column used to store TRUE and FALSE values, but can't be used in WHERE clauses. https://learn.microsoft.com/en-us/sql/t-sql/data-types/bit-transact-sql?view=sql-server-ver15
dbplyr does its best to automatically create the correct type when needed, but can't do it 100% correctly because it does not have a full type inference system. This means that you may need to do conversions manually from time to time.
- To convert from bit to boolean use x == 1
- To convert from boolean to bit use as.logical(if(x, 0, 1))
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = simulate_mssql())
lf %>% head()
lf %>% transmute(x = paste(b, c, d))
# Can use boolean as is:
lf %>% filter(c > d)
# Need to convert from boolean to bit:
lf %>% transmute(x = c > d)
# Can use boolean as is:
lf %>% transmute(x = ifelse(c > d, "c", "d"))
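The bit/boolean conversions described above can also be previewed with simulated SQL; a sketch (illustrative column names, not the only way to write the conversions):

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr)

lf <- lazy_frame(a = 1, b = 2, con = simulate_mssql())
# bit -> boolean: compare against 1 so the result is usable in WHERE
lf %>% filter(a == 1)
# boolean -> bit: dbplyr wraps the comparison so it can appear in SELECT
lf %>% transmute(x = a > b)
```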
Backend: MySQL/MariaDB
Description
See vignette("translation-function")
and vignette("translation-verb")
for
details of overall translation technology. Key differences for this backend
are:
- paste() uses CONCAT_WS()
- String translations for str_detect(), str_locate(), and str_replace_all()
- Clear error message for unsupported full joins

Use simulate_mysql() with lazy_frame() to see simulated SQL without converting to a live database.
Usage
simulate_mysql()
simulate_mariadb()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = simulate_mysql())
lf %>% transmute(x = paste0(d, " times"))
Backend: ODBC
Description
See vignette("translation-function")
and vignette("translation-verb")
for
details of overall translation technology. Key differences for this backend
are minor translations for common data types.
Use simulate_odbc() with lazy_frame() to see simulated SQL without converting to a live database.
Usage
simulate_odbc()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, d = 2, c = "z", con = simulate_odbc())
lf %>% transmute(x = as.numeric(b))
lf %>% transmute(x = as.integer(b))
lf %>% transmute(x = as.character(b))
Backend: Oracle
Description
See vignette("translation-function")
and vignette("translation-verb")
for
details of overall translation technology. Key differences for this backend
are:
- Use FETCH FIRST instead of LIMIT
- Custom types
- paste() uses ||
- Custom subquery generation (no AS)
- setdiff() uses MINUS instead of EXCEPT

Note that versions of Oracle prior to 23c have limited support for TRUE and FALSE and you may need to use 1 and 0 instead. See https://oracle-base.com/articles/23c/boolean-data-type-23c for more details.

Use simulate_oracle() with lazy_frame() to see simulated SQL without converting to a live database.
Usage
simulate_oracle()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = simulate_oracle())
lf %>% transmute(x = paste0(c, " times"))
lf %>% setdiff(lf)
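For Oracle versions before 23c, the note above suggests using 1 and 0 in place of TRUE and FALSE; a minimal sketch with a simulated connection (column names are illustrative):

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr)

lf <- lazy_frame(x = 1, con = simulate_oracle())
# use 1/0 instead of TRUE/FALSE on pre-23c Oracle
lf %>% mutate(flag = ifelse(x > 0, 1, 0))
```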
Backend: PostgreSQL
Description
See vignette("translation-function")
and vignette("translation-verb")
for
details of overall translation technology. Key differences for this backend
are:
- Many stringr functions
- lubridate date-time extraction functions
- More standard statistical summaries

Use simulate_postgres() with lazy_frame() to see simulated SQL without converting to a live database.
Usage
simulate_postgres()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = simulate_postgres())
lf %>% summarise(x = sd(b, na.rm = TRUE))
lf %>% summarise(y = cor(b, c), z = cov(b, c))
Backend: Redshift
Description
Base translations come from the PostgreSQL backend. There are generally few differences, apart from string manipulation.
Use simulate_redshift() with lazy_frame() to see simulated SQL without converting to a live database.
Usage
simulate_redshift()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = simulate_redshift())
lf %>% transmute(x = paste(c, " times"))
lf %>% transmute(x = substr(c, 2, 3))
lf %>% transmute(x = str_replace_all(c, "a", "z"))
Backend: Snowflake
Description
See vignette("translation-function")
and vignette("translation-verb")
for
details of overall translation technology.
Use simulate_snowflake() with lazy_frame() to see simulated SQL without converting to a live database.
Usage
simulate_snowflake()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = simulate_snowflake())
lf %>% transmute(x = paste0(d, " times"))
Backend: Databricks Spark SQL
Description
See vignette("translation-function")
and vignette("translation-verb")
for
details of overall translation technology. Key differences for this backend
are better translation of statistical aggregate functions
(e.g. var()
, median()
) and use of temporary views instead of temporary
tables when copying data.
Use simulate_spark_sql() with lazy_frame() to see simulated SQL without converting to a live database.
Usage
simulate_spark_sql()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, d = 2, c = "z", con = simulate_spark_sql())
lf %>% summarise(x = median(d, na.rm = TRUE))
lf %>% summarise(x = var(c, na.rm = TRUE), .by = d)
lf %>% mutate(x = first(c))
lf %>% mutate(x = first(c), .by = d)
Backend: SQLite
Description
See vignette("translation-function")
and vignette("translation-verb")
for
details of overall translation technology. Key differences for this backend
are:
- Uses non-standard LOG() function
- Date-time extraction functions from lubridate
- Custom median translation
- Right and full joins are simulated using left joins

Use simulate_sqlite() with lazy_frame() to see simulated SQL without converting to a live database.
Usage
simulate_sqlite()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = simulate_sqlite())
lf %>% transmute(x = paste(c, " times"))
lf %>% transmute(x = log(b), y = log(b, base = 2))
Backend: Teradata
Description
See vignette("translation-function")
and vignette("translation-verb")
for
details of overall translation technology. Key differences for this backend
are:
- Uses TOP instead of LIMIT
- Selection of user-supplied translations

Use simulate_teradata() with lazy_frame() to see simulated SQL without converting to a live database.
Usage
simulate_teradata()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = simulate_teradata())
lf %>% head()
Build a SQL string.
Description
This is a convenience function that should prevent SQL injection attacks (which in the context of dplyr are most likely to be accidental rather than deliberate) by automatically escaping all expressions in the input, while treating bare strings as SQL. This is unlikely to prevent any serious attack, but should make it unlikely that you produce invalid SQL.
Usage
build_sql(..., .env = parent.frame(), con = sql_current_con())
Arguments
... | Input to convert to SQL. Use sql() to preserve user input as is (dangerous), and ident() to label user input as SQL identifiers (safe). |
.env |
the environment in which to evaluate the arguments. Should not be needed in typical use. |
con |
database connection; used to select correct quoting characters. |
Details
This function should be used only when generating SELECT clauses, other high-level queries, or other syntax that has no R equivalent. For individual function translations, prefer sql_expr().
Examples
con <- simulate_dbi()
build_sql("SELECT * FROM TABLE", con = con)
x <- "TABLE"
build_sql("SELECT * FROM ", x, con = con)
build_sql("SELECT * FROM ", ident(x), con = con)
build_sql("SELECT * FROM ", sql(x), con = con)
# http://xkcd.com/327/
name <- "Robert'); DROP TABLE Students;--"
build_sql("INSERT INTO Students (Name) VALUES (", name, ")", con = con)
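For single expressions, the Details above recommend sql_expr() over build_sql(); translate_sql() offers a similar quick way to see how one R expression is rendered (a sketch using the simulated connection, with an illustrative expression):

```r
library(dbplyr)

con <- simulate_dbi()
# translate a single R expression rather than building a full clause
translate_sql(x + 1L, con = con)
```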
Compute results of a query
Description
These are methods for the dplyr generics collapse()
, compute()
,
and collect()
. collapse()
creates a subquery, compute()
stores
the results in a remote table, and collect()
executes the query and
downloads the data into R.
Usage
## S3 method for class 'tbl_sql'
collapse(x, ...)
## S3 method for class 'tbl_sql'
compute(
x,
name = NULL,
temporary = TRUE,
unique_indexes = list(),
indexes = list(),
analyze = TRUE,
...,
cte = FALSE
)
## S3 method for class 'tbl_sql'
collect(x, ..., n = Inf, warn_incomplete = TRUE, cte = FALSE)
Arguments
x |
A lazy data frame backed by a database query. |
... |
other parameters passed to methods. |
name |
Table name in remote database. |
temporary | Should the table be temporary (TRUE, the default) or persistent (FALSE)? |
unique_indexes |
a list of character vectors. Each element of the list will create a new unique index over the specified column(s). Duplicate rows will result in failure. |
indexes |
a list of character vectors. Each element of the list will create a new index. |
analyze | if TRUE (the default), will automatically ANALYZE the new table so that the query optimiser has useful information. |
cte |
|
n | Number of rows to fetch. Defaults to Inf, meaning all rows. |
warn_incomplete | Warn if n is less than the number of result rows? |
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(a = c(3, 4, 1, 2), b = c(5, 1, 2, NA))
db %>% filter(a <= 2) %>% collect()
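The three verbs differ in where the work happens; a small sketch contrasting them on the same in-memory SQLite table as above:

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr)

db <- memdb_frame(a = c(3, 4, 1, 2))
# collapse(): wraps the current query in a subquery; nothing is executed
db %>% filter(a <= 2) %>% collapse()
# compute(): materialises the result in a temporary remote table
db %>% filter(a <= 2) %>% compute()
# collect(): runs the query and returns a local tibble
db %>% filter(a <= 2) %>% collect()
```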
Complete a SQL table with missing combinations of data
Description
Turns implicit missing values into explicit missing values. This is a method
for the tidyr::complete()
generic.
Usage
## S3 method for class 'tbl_lazy'
complete(data, ..., fill = list())
Arguments
data |
A lazy data frame backed by a database query. |
... |
Specification of columns to expand. See tidyr::expand for more details. |
fill |
A named list that for each variable supplies a single value to use instead of NA for missing combinations. |
Value
Another tbl_lazy. Use show_query() to see the generated query, and use collect() to execute the query and return data to R.
Examples
df <- memdb_frame(
group = c(1:2, 1),
item_id = c(1:2, 2),
item_name = c("a", "b", "b"),
value1 = 1:3,
value2 = 4:6
)
df %>% tidyr::complete(group, nesting(item_id, item_name))
# You can also choose to fill in missing values
df %>% tidyr::complete(group, nesting(item_id, item_name), fill = list(value1 = 0))
Use a local data frame in a dbplyr query
Description
This is an alternative to copy_to()
that does not need write access and
is faster for small data.
Usage
copy_inline(con, df, types = NULL)
Arguments
con |
A database connection. |
df |
A local data frame. The data is written directly in the SQL query so it should be small. |
types | A named character vector of SQL data types to use for the columns. The data types are backend specific. |
Details
It writes the data directly in the SQL query via the VALUES
clause.
Value
A tbl_lazy.
See Also
copy_to()
to copy the data into a new database table.
Examples
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_inline(con, df)
copy_inline(con, df) %>% dplyr::show_query()
Copy a local data frame to a remote database
Description
This is an implementation of the dplyr copy_to() generic and is mostly a wrapper around DBI::dbWriteTable().
It is useful for copying small amounts of data to a database for examples, experiments, and joins. By default, it creates temporary tables which are only visible within the current connection to the database.
Usage
## S3 method for class 'src_sql'
copy_to(
dest,
df,
name = deparse(substitute(df)),
overwrite = FALSE,
types = NULL,
temporary = TRUE,
unique_indexes = NULL,
indexes = NULL,
analyze = TRUE,
...,
in_transaction = TRUE
)
Arguments
dest |
remote data source |
df | A local data frame, a tbl_sql from the same source, or a tbl_sql from another source. If from another source, all data must transition through R in one pass. |
name | Name of new remote table. Use a string to create the table in the current catalog/schema. |
overwrite | If TRUE, will overwrite an existing table with name name. If FALSE, will throw an error if name already exists. |
types |
a character vector giving variable types to use for the columns. See https://www.sqlite.org/datatype3.html for available types. |
temporary | if TRUE, will create a temporary table that is local to this connection and will be automatically deleted when the connection expires. |
unique_indexes |
a list of character vectors. Each element of the list will create a new unique index over the specified column(s). Duplicate rows will result in failure. |
indexes |
a list of character vectors. Each element of the list will create a new index. |
analyze | if TRUE (the default), will automatically ANALYZE the new table so that the query optimiser has useful information. |
... |
other parameters passed to methods. |
in_transaction |
Should the table creation be wrapped in a transaction? This typically makes things faster, but you may want to suppress if the database doesn't support transactions, or you're wrapping in a transaction higher up (and your database doesn't support nested transactions.) |
Value
Another tbl_lazy. Use show_query() to see the generated query, and use collect() to execute the query and return data to R.
See Also
copy_inline()
to use small data in an SQL query without actually
writing to a table.
Examples
library(dplyr, warn.conflicts = FALSE)
df <- data.frame(x = 1:5, y = letters[5:1])
db <- copy_to(src_memdb(), df)
db
df2 <- data.frame(y = c("a", "d"), fruit = c("apple", "date"))
# copy_to() is called automatically if you set copy = TRUE
# in the join functions
db %>% left_join(df2, copy = TRUE)
Count observations by group
Description
These are methods for the dplyr count()
and tally()
generics. They
wrap up group_by.tbl_lazy()
, summarise.tbl_lazy()
and, optionally,
arrange.tbl_lazy()
.
Usage
## S3 method for class 'tbl_lazy'
count(x, ..., wt = NULL, sort = FALSE, name = NULL)
## S3 method for class 'tbl_lazy'
add_count(x, ..., wt = NULL, sort = FALSE, name = NULL, .drop = NULL)
## S3 method for class 'tbl_lazy'
tally(x, wt = NULL, sort = FALSE, name = NULL)
Arguments
x |
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
... | <data-masking> Variables to group by. |
wt | <data-masking> Frequency weights. Can be NULL or a variable: if NULL (the default), counts the number of rows in each group; if a variable, computes sum(wt) for each group. |
sort | If TRUE, will show the largest groups at the top. |
name | The name of the new column in the output. If omitted, it will default to n. If there's already a column called n, it will use nn. If there's a column called n and nn, it'll use nnn, and so on, adding ns until it gets a new name. |
.drop |
Not supported for lazy tables. |
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(g = c(1, 1, 1, 2, 2), x = c(4, 3, 6, 9, 2))
db %>% count(g) %>% show_query()
db %>% count(g, wt = x) %>% show_query()
db %>% count(g, wt = x, sort = TRUE) %>% show_query()
Database I/O generics
Description
These generics are responsible for getting data into and out of the database. They should be used as a last resort - only use them when you can't make a backend work by providing methods for DBI generics, or for dbplyr's SQL generation generics. They tend to be most needed when a backend has special handling of temporary tables.
- db_copy_to() implements copy_to.src_sql() by calling db_write_table() (which calls DBI::dbWriteTable()) to transfer the data, then optionally adds indexes (via sql_table_index()) and analyses (via sql_table_analyze()).
- db_compute() implements compute.tbl_sql() by calling sql_query_save() to create the table, then optionally adds indexes (via sql_table_index()) and analyses (via sql_table_analyze()).
- db_collect() implements collect.tbl_sql() using DBI::dbSendQuery() and DBI::dbFetch().
- db_table_temporary() is used for databases that have special naming schemes for temporary tables (e.g. SQL Server and SAP HANA require temporary tables to start with #).
Usage
db_copy_to(
con,
table,
values,
...,
overwrite = FALSE,
types = NULL,
temporary = TRUE,
unique_indexes = NULL,
indexes = NULL,
analyze = TRUE,
in_transaction = TRUE
)
db_compute(
con,
table,
sql,
...,
overwrite = FALSE,
temporary = TRUE,
unique_indexes = list(),
indexes = list(),
analyze = TRUE,
in_transaction = TRUE
)
db_collect(con, sql, n = -1, warn_incomplete = TRUE, ...)
db_table_temporary(con, table, temporary, ...)
See Also
Other generic:
db-sql
,
db_connection_describe()
,
sql_escape_logical()
Miscellaneous database generics
Description
- db_connection_describe() provides a short string describing the database connection, helping users tell which database a table comes from. It should be a single line, and ideally less than 60 characters wide.
Usage
db_connection_describe(con, ...)
sql_join_suffix(con, suffix, ...)
db_sql_render(con, sql, ..., cte = FALSE, sql_options = NULL)
db_col_types(con, table, call)
dbplyr_edition(con)
Details
- dbplyr_edition() declares which version of the dbplyr API you want. See below for more details.
- db_col_types() returns the column types of a table.
dbplyr 2.0.0
dbplyr 2.0.0 renamed a number of generics so that they could be cleanly moved from dplyr to dbplyr. If you have an existing backend, you'll need to rename the following methods.
- dplyr::db_desc() -> dbplyr::db_connection_describe() (also note that the argument name changed from x to con).
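A minimal S3 method sketch for db_connection_describe() on a hypothetical backend (the MyDbConnection class and its fields are assumptions for illustration, not a real driver):

```r
library(dbplyr)

# hypothetical connection object for illustration only
con <- structure(list(host = "localhost", version = "1.0"),
                 class = "MyDbConnection")

# one short line, ideally under 60 characters, as recommended above
db_connection_describe.MyDbConnection <- function(con, ...) {
  paste0("mydb ", con$version, " [", con$host, "]")
}

db_connection_describe(con)
```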
See Also
Other generic:
db-sql
,
db_copy_to()
,
sql_escape_logical()
SQL escaping/quoting generics
Description
These generics translate individual values into SQL. The core generics are DBI::dbQuoteIdentifier() and DBI::dbQuoteString() for quoting identifiers and strings, but dbplyr needs additional tools for inserting logical, date, date-time, and raw values into queries.
Usage
sql_escape_logical(con, x)
sql_escape_date(con, x)
sql_escape_datetime(con, x)
sql_escape_raw(con, x)
See Also
Other generic:
db-sql
,
db_connection_describe()
,
db_copy_to()
Examples
con <- simulate_dbi()
sql_escape_logical(con, c(TRUE, FALSE, NA))
sql_escape_date(con, Sys.Date())
sql_escape_datetime(con, Sys.time())
sql_escape_raw(con, charToRaw("hi"))
SQL generation generics
Description
SQL translation:
- sql_expr_matches(con, x, y) generates an alternative to x = y when a pair of NULLs should match. The default translation uses a CASE WHEN as described in https://modern-sql.com/feature/is-distinct-from.
- sql_translation(con) generates a SQL translation environment.
- Deprecated: sql_random(con) generates SQL to get a random number which can be used to select random rows in slice_sample(). This is now replaced by adding a translation for runif(n()).
- supports_window_clause(con): does the backend support named windows?
- db_supports_table_alias_with_as(con): does the backend support using AS when using a table alias?
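The null-safe equality described above can be previewed with a simulated connection; a sketch (the exact SQL emitted depends on the backend):

```r
library(dbplyr)

con <- simulate_dbi()
# default translation of x = y where a pair of NULLs should match
sql_expr_matches(con, sql("x"), sql("y"))
```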
Tables:
- sql_table_analyze(con, table) generates SQL that "analyzes" the table, ensuring that the database has up-to-date statistics for use in the query planner. It is called from copy_to() when analyze = TRUE.
- sql_table_index() generates SQL for adding an index to a table.
Query manipulation:
- sql_query_explain(con, sql) generates SQL that "explains" a query, i.e. generates a query plan describing which indexes etc. the database will use.
- sql_query_fields() generates SQL for a 0-row result that is used to capture field names in tbl_sql().
- sql_query_save(con, sql) generates SQL for saving a query into a (temporary) table.
- sql_query_wrap(con, from) generates SQL for wrapping a query into a subquery.
Query indentation:
- sql_indent_subquery(from, con, lvl) helps indent a subquery.
Query generation:
- sql_query_select() generates SQL for a SELECT query
- sql_query_join() generates SQL for joins
- sql_query_semi_join() generates SQL for semi- and anti-joins
- sql_query_set_op() generates SQL for UNION, INTERSECT, and EXCEPT queries
Query generation for manipulation:
- sql_query_insert() and sql_query_append() generate SQL for an INSERT FROM query.
- sql_query_update_from() generates SQL for an UPDATE FROM query.
- sql_query_upsert() generates SQL for an UPSERT query.
- sql_query_delete() generates SQL for a DELETE FROM query.
- sql_returning_cols() generates SQL for a RETURNING clause.
Usage
sql_expr_matches(con, x, y, ...)
sql_translation(con)
sql_random(con)
sql_table_analyze(con, table, ...)
sql_table_index(
con,
table,
columns,
name = NULL,
unique = FALSE,
...,
call = caller_env()
)
sql_query_explain(con, sql, ...)
sql_query_fields(con, sql, ...)
sql_query_save(con, sql, name, temporary = TRUE, ...)
sql_query_wrap(con, from, name = NULL, ..., lvl = 0)
sql_indent_subquery(from, con, lvl = 0)
sql_query_rows(con, sql, ...)
supports_window_clause(con)
db_supports_table_alias_with_as(con)
sql_query_select(
con,
select,
from,
where = NULL,
group_by = NULL,
having = NULL,
window = NULL,
order_by = NULL,
limit = NULL,
distinct = FALSE,
...,
subquery = FALSE,
lvl = 0
)
sql_query_join(
con,
x,
y,
select,
type = "inner",
by = NULL,
na_matches = FALSE,
...,
lvl = 0
)
sql_query_multi_join(con, x, joins, table_names, by_list, select, ..., lvl = 0)
sql_query_semi_join(con, x, y, anti, by, where, vars, ..., lvl = 0)
sql_query_set_op(con, x, y, method, ..., all = FALSE, lvl = 0)
sql_query_union(con, x, unions, ..., lvl = 0)
sql_returning_cols(con, cols, table, ...)
dbplyr 2.0.0
Many dplyr::db_* generics have been replaced by dbplyr::sql_* generics. To update your backend, you'll need to extract the SQL generation out of your existing code, and place it in a new method for a dbplyr sql_* generic.
- dplyr::db_analyze() is replaced by dbplyr::sql_table_analyze()
- dplyr::db_explain() is replaced by dbplyr::sql_query_explain()
- dplyr::db_create_index() is replaced by dbplyr::sql_table_index()
- dplyr::db_query_fields() is replaced by dbplyr::sql_query_fields()
- dplyr::db_query_rows() is no longer used; you can delete it
- dplyr::db_save_query() is replaced by dbplyr::sql_query_save()
The query generating functions have also changed names. Their behaviour is unchanged, so you just need to rename the generic and import from dbplyr instead of dplyr.
- dplyr::sql_select() is replaced by dbplyr::sql_query_select()
- dplyr::sql_join() is replaced by dbplyr::sql_query_join()
- dplyr::sql_semi_join() is replaced by dbplyr::sql_query_semi_join()
- dplyr::sql_set_op() is replaced by dbplyr::sql_query_set_op()
- dplyr::sql_subquery() is replaced by dbplyr::sql_query_wrap()

Learn more in vignette("backend-2.0").
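Renaming a backend method is mostly mechanical; a sketch for a hypothetical MyCon connection class (the class name and the delegation to NextMethod() are assumptions for illustration):

```r
library(dbplyr)

# old (dplyr-era generic):
# sql_select.MyCon <- function(con, select, from, ...) { ... }

# new: same body, renamed generic, imported from dbplyr instead of dplyr
sql_query_select.MyCon <- function(con, select, from, ...,
                                   subquery = FALSE, lvl = 0) {
  # delegate to the default SQL generation in this sketch
  NextMethod()
}
```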
See Also
Other generic:
db_connection_describe()
,
db_copy_to()
,
sql_escape_logical()
"Uncount" a database table
Description
This is a method for the tidyr uncount()
generic. It uses a temporary
table, so your database user needs permissions to create one.
Usage
dbplyr_uncount(data, weights, .remove = TRUE, .id = NULL)
Arguments
data |
A lazy data frame backed by a database query. |
weights | A vector of weights. Evaluated in the context of data. |
.remove | If TRUE, and weights is the name of a column in data, then this column is removed. |
.id |
Supply a string to create a new variable which gives a unique identifier for each created row. |
Examples
df <- memdb_frame(x = c("a", "b"), n = c(1, 2))
dbplyr_uncount(df, n)
dbplyr_uncount(df, n, .id = "id")
# You can also use constants
dbplyr_uncount(df, 2)
# Or expressions
dbplyr_uncount(df, 2 / n)
Subset rows using their positions
Description
These are methods for the dplyr generics slice_min(), slice_max(), and slice_sample(). They are translated to SQL using filter() and window functions (ROW_NUMBER, MIN_RANK, or CUME_DIST depending on arguments). slice(), slice_head(), and slice_tail() are not supported since database tables have no intrinsic order.
If data is grouped, the operation will be performed on each group so that (e.g.) slice_min(db, x, n = 3) will select the three rows with the smallest value of x in each group.
Usage
## S3 method for class 'tbl_lazy'
slice_min(
.data,
order_by,
...,
n,
prop,
by = NULL,
with_ties = TRUE,
na_rm = TRUE
)
## S3 method for class 'tbl_lazy'
slice_max(
.data,
order_by,
...,
n,
by = NULL,
prop,
with_ties = TRUE,
na_rm = TRUE
)
## S3 method for class 'tbl_lazy'
slice_sample(.data, ..., n, prop, by = NULL, weight_by = NULL, replace = FALSE)
Arguments
.data |
A lazy data frame backed by a database query. |
order_by |
Variable or function of variables to order by. |
... |
Not used. |
n, prop | Provide either n, the number of rows, or prop, the proportion of rows to select. If neither are supplied, n = 1 will be used. If n is greater than the number of rows in the group (or prop > 1), the result will be silently truncated to the group size. |
by | <tidy-select> Optionally, a selection of columns to group by for just this operation, functioning as an alternative to group_by(). |
with_ties | Should ties be kept together? The default, TRUE, may return more rows than you request. Use FALSE to ignore ties, and return the first n rows. |
na_rm | Should missing values in order_by be removed from the result? If FALSE, NA values are sorted to the end (like in arrange()), so they will only be included if there are insufficient non-missing values to reach n/prop. |
weight_by , replace |
Not supported for database backends. |
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(x = 1:3, y = c(1, 1, 2))
db %>% slice_min(x) %>% show_query()
db %>% slice_max(x) %>% show_query()
db %>% slice_sample() %>% show_query()
db %>% group_by(y) %>% slice_min(x) %>% show_query()
# By default, ties are included so you may get more rows
# than you expect
db %>% slice_min(y, n = 1)
db %>% slice_min(y, n = 1, with_ties = FALSE)
# Non-integer group sizes are rounded down
db %>% slice_min(x, prop = 0.5)
Subset distinct/unique rows
Description
This is a method for the dplyr distinct()
generic. It adds the
DISTINCT
clause to the SQL query.
Usage
## S3 method for class 'tbl_lazy'
distinct(.data, ..., .keep_all = FALSE)
Arguments
.data |
A lazy data frame backed by a database query. |
... |
< |
.keep_all |
If |
Value
Another tbl_lazy
. Use show_query()
to see the generated
query, and use collect()
to execute the query
and return data to R.
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(x = c(1, 1, 2, 2), y = c(1, 2, 1, 1))
db %>% distinct() %>% show_query()
db %>% distinct(x) %>% show_query()
Perform arbitrary computation on remote backend
Description
Perform arbitrary computation on remote backend
Usage
## S3 method for class 'tbl_sql'
do(.data, ..., .chunk_size = 10000L)
Arguments
.data |
a tbl |
... |
Expressions to apply to each group. If named, results will be
stored in a new column. If unnamed, must return a data frame. You can
use |
.chunk_size |
The size of each chunk to pull into R. If this number is too big, the process will be slow because R has to allocate and free a lot of memory. If it's too small, it will be slow, because of the overhead of talking to the database. |
Escape/quote a string.
Description
escape()
requires you to provide a database connection to control the
details of escaping. escape_ansi()
uses the SQL 92 ANSI standard.
Usage
escape(x, parens = NA, collapse = " ", con = NULL)
escape_ansi(x, parens = NA, collapse = "")
sql_vector(x, parens = NA, collapse = " ", con = NULL)
Arguments
x |
An object to escape. Existing sql vectors will be left as is,
character vectors are escaped with single quotes, numeric vectors have
trailing |
parens , collapse |
Controls behaviour when multiple values are supplied.
Default behaviour: lists are always wrapped in parens and separated by commas, identifiers are separated by commas and never wrapped, atomic vectors are separated by spaces and wrapped in parens if needed. |
con |
Database connection. |
Examples
# Doubles vs. integers
escape_ansi(1:5)
escape_ansi(c(1, 5.4))
# Strings vs. known sql vs. sql identifiers
escape_ansi("X")
escape_ansi(sql("X"))
escape_ansi(ident("X"))
# Escaping is idempotent
escape_ansi("X")
escape_ansi(escape_ansi("X"))
escape_ansi(escape_ansi(escape_ansi("X")))
Expand SQL tables to include all possible combinations of values
Description
This is a method for the tidyr::expand() generic. It doesn't sort the
result explicitly, so the order might be different to what expand()
returns for data frames.
Usage
## S3 method for class 'tbl_lazy'
expand(data, ..., .name_repair = "check_unique")
Arguments
data |
A lazy data frame backed by a database query. |
... |
Specification of columns to expand. See tidyr::expand for more details. |
.name_repair |
Treatment of problematic column names:
This argument is passed on as |
Value
Another tbl_lazy
. Use show_query()
to see the generated
query, and use collect()
to execute the query
and return data to R.
Examples
fruits <- memdb_frame(
type = c("apple", "orange", "apple", "orange", "orange", "orange"),
year = c(2010, 2010, 2012, 2010, 2010, 2012),
size = c("XS", "S", "M", "S", "S", "M"),
weights = rnorm(6)
)
# All possible combinations ---------------------------------------
fruits %>% tidyr::expand(type)
fruits %>% tidyr::expand(type, size)
# Only combinations that already appear in the data ---------------
fruits %>% tidyr::expand(nesting(type, size))
Fill in missing values with previous or next value
Description
Fill in missing values with previous or next value
Usage
## S3 method for class 'tbl_lazy'
fill(.data, ..., .direction = c("down", "up", "updown", "downup"))
Arguments
.data |
A lazy data frame backed by a database query. |
... |
Columns to fill. |
.direction |
Direction in which to fill missing values. Currently
either "down" (the default) or "up". Note that "up" does not work when
|
Examples
squirrels <- tibble::tribble(
~group, ~name, ~role, ~n_squirrels, ~ n_squirrels2,
1, "Sam", "Observer", NA, 1,
1, "Mara", "Scorekeeper", 8, NA,
1, "Jesse", "Observer", NA, NA,
1, "Tom", "Observer", NA, 4,
2, "Mike", "Observer", NA, NA,
2, "Rachael", "Observer", NA, 6,
2, "Sydekea", "Scorekeeper", 14, NA,
2, "Gabriela", "Observer", NA, NA,
3, "Derrick", "Observer", NA, NA,
3, "Kara", "Scorekeeper", 9, 10,
3, "Emily", "Observer", NA, NA,
3, "Danielle", "Observer", NA, NA
)
squirrels$id <- 1:12
tbl_memdb(squirrels) %>%
window_order(id) %>%
tidyr::fill(
n_squirrels,
n_squirrels2,
)
Subset rows using column values
Description
This is a method for the dplyr filter()
generic. It generates the
WHERE
clause of the SQL query.
Usage
## S3 method for class 'tbl_lazy'
filter(.data, ..., .by = NULL, .preserve = FALSE)
Arguments
.data |
A lazy data frame backed by a database query. |
... |
< |
.by |
< |
.preserve |
Not supported by this method. |
Value
Another tbl_lazy
. Use show_query()
to see the generated
query, and use collect()
to execute the query
and return data to R.
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(x = c(2, NA, 5, NA, 10), y = 1:5)
db %>% filter(x < 5) %>% show_query()
db %>% filter(is.na(x)) %>% show_query()
Extract and check the RETURNING
rows
Description
get_returned_rows()
extracts the RETURNING
rows produced by
rows_insert()
, rows_append()
, rows_update()
, rows_upsert()
,
or rows_delete()
if these are called with the returning
argument.
An error is raised if this information is not available.
has_returned_rows()
checks if x
has stored RETURNING rows produced by
rows_insert()
, rows_append()
, rows_update()
, rows_upsert()
,
or rows_delete()
.
Usage
get_returned_rows(x)
has_returned_rows(x)
Arguments
x |
A lazy tbl. |
Value
For get_returned_rows()
, a tibble.
For has_returned_rows()
, a scalar logical.
Examples
library(dplyr)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbExecute(con, "CREATE TABLE Info (
id INTEGER PRIMARY KEY AUTOINCREMENT,
number INTEGER
)")
info <- tbl(con, "Info")
rows1 <- copy_inline(con, data.frame(number = c(1, 5)))
rows_insert(info, rows1, conflict = "ignore", in_place = TRUE)
info
# If the table has an auto-incrementing primary key, you can use
# the returning argument + `get_returned_rows()` to retrieve its value
rows2 <- copy_inline(con, data.frame(number = c(13, 27)))
info <- rows_insert(
info,
rows2,
conflict = "ignore",
in_place = TRUE,
returning = id
)
info
get_returned_rows(info)
Group by one or more variables
Description
This is a method for the dplyr group_by()
generic. It is translated to
the GROUP BY
clause of the SQL query when used with
summarise()
and to the PARTITION BY
clause of
window functions when used with mutate()
.
Usage
## S3 method for class 'tbl_lazy'
group_by(.data, ..., .add = FALSE, add = deprecated(), .drop = TRUE)
Arguments
.data |
A lazy data frame backed by a database query. |
... |
< |
.add |
When This argument was previously called |
add |
Deprecated. Please use |
.drop |
Not supported by this method. |
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(g = c(1, 1, 1, 2, 2), x = c(4, 3, 6, 9, 2))
db %>%
group_by(g) %>%
summarise(n()) %>%
show_query()
db %>%
group_by(g) %>%
mutate(x2 = x / sum(x, na.rm = TRUE)) %>%
show_query()
Subset the first rows
Description
This is a method for the head()
generic. It is usually translated to the
LIMIT
clause of the SQL query. Because LIMIT
is not an official part of
the SQL specification, some databases use other clauses like TOP
or
FETCH ROWS
.
Note that databases don't really have a sense of row order, so what "first"
means is subject to interpretation. Most databases will respect ordering
performed with arrange()
, but it's not guaranteed. tail()
is not
supported at all because the situation is even murkier for the "last" rows.
Usage
## S3 method for class 'tbl_lazy'
head(x, n = 6L, ...)
Arguments
x |
A lazy data frame backed by a database query. |
n |
Number of rows to return |
... |
Not used. |
Value
Another tbl_lazy
. Use show_query()
to see the generated
query, and use collect()
to execute the query
and return data to R.
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(x = 1:100)
db %>% head() %>% show_query()
# Pretend we have data in a SQL Server database
db2 <- lazy_frame(x = 1:100, con = simulate_mssql())
db2 %>% head() %>% show_query()
Flag a character vector as SQL identifiers
Description
ident() takes strings and turns them into database identifiers (e.g. table
or column names), quoting them using the identifier rules for your database.
ident_q() does the same, but assumes the names have already been
quoted, preventing them from being quoted again.
These are generally for internal use only; if you need to supply a
table name that is qualified with schema or catalog, or has already been
quoted for some other reason, use I().
Usage
ident(...)
is.ident(x)
Arguments
... |
A character vector, or name-value pairs. |
x |
An object. |
Examples
# SQL92 quotes strings with '
escape_ansi("x")
# And identifiers with "
ident("x")
escape_ansi(ident("x"))
# You can supply multiple inputs
ident(a = "x", b = "y")
ident_q(a = "x", b = "y")
Declare an identifier as being pre-quoted.
Description
No longer needed; please use sql()
instead.
Usage
ident_q(...)
Refer to a table in another schema/catalog
Description
in_schema()
and in_catalog()
can be used to refer to tables outside of
the current catalog/schema. However, we now recommend using I()
as it's
typically less typing.
Usage
in_schema(schema, table)
in_catalog(catalog, schema, table)
Arguments
catalog , schema , table |
Names of catalog, schema, and table.
These will be automatically quoted; use |
Examples
# Previously:
in_schema("my_schema", "my_table")
in_catalog("my_catalog", "my_schema", "my_table")
in_schema(sql("my_schema"), sql("my_table"))
# Now
I("my_schema.my_table")
I("my_catalog.my_schema.my_table")
I("my_schema.my_table")
# Example using schemas with SQLite
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
# Add auxiliary schema
tmp <- tempfile()
DBI::dbExecute(con, paste0("ATTACH '", tmp, "' AS aux"))
library(dplyr, warn.conflicts = FALSE)
copy_to(con, iris, "df", temporary = FALSE)
copy_to(con, mtcars, I("aux.df"), temporary = FALSE)
con %>% tbl("df")
con %>% tbl(I("aux.df"))
SQL set operations
Description
These are methods for the dplyr generics dplyr::intersect()
,
dplyr::union()
, and dplyr::setdiff()
. They are translated to
INTERSECT
, UNION
, and EXCEPT
respectively.
Usage
## S3 method for class 'tbl_lazy'
intersect(x, y, copy = FALSE, ..., all = FALSE)
## S3 method for class 'tbl_lazy'
union(x, y, copy = FALSE, ..., all = FALSE)
## S3 method for class 'tbl_lazy'
union_all(x, y, copy = FALSE, ...)
## S3 method for class 'tbl_lazy'
setdiff(x, y, copy = FALSE, ..., all = FALSE)
Arguments
x , y |
A pair of lazy data frames backed by database queries. |
copy |
If This allows you to join tables across srcs, but it's a potentially expensive operation, so you must opt into it. |
... |
Not currently used; provided for future extensions. |
all |
If |
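The set operations above can be inspected without running anything against real data; a minimal sketch using the in-memory SQLite helpers documented elsewhere on this page (the exact SQL text varies by backend, so none is asserted here):

```r
library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

db1 <- memdb_frame(x = 1:3)
db2 <- memdb_frame(x = 2:4)

# Translated to INTERSECT, UNION, and EXCEPT respectively
db1 %>% intersect(db2) %>% show_query()
db1 %>% union(db2) %>% show_query()
db1 %>% setdiff(db2) %>% show_query()
```

Each verb returns another lazy table, so you can keep composing before collecting.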
Table paths
Description
dbplyr standardises all the ways of referring to a table (i.e. a single
string, a string wrapped in I()
, a DBI::Id()
and the results of
in_schema()
and in_catalog()
) into a table "path" of the form
table
, schema.table
, or catalog.schema.table
. A table path is
always suitable for inlining into a query, so user input is quoted unless
it is wrapped in I()
.
This is primarily for internal usage, but you may need to work with it if you're implementing a backend, and you need to compute with the table path, not just pass it on unchanged to some other dbplyr function.
- is_table_path() returns TRUE if the object is a table_path.
- as_table_path() coerces known table identifiers to a table_path.
- check_table_path() throws an error if the object is not a table_path.
- table_path_name() returns the last component of the table path (i.e. the name of the table).
- table_path_components() returns a list containing the components of each table path.
A table_path
object can technically be a vector of table paths, but
you will never see this in table paths constructed from user inputs.
Usage
is_table_path(x)
table_path_name(x, con)
table_path_components(x, con)
check_table_path(x, error_arg = caller_arg(x), error_call = caller_env())
as_table_path(x, con, error_arg = caller_arg(x), error_call = caller_env())
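Since these helpers are primarily internal, a hedged sketch of how a backend author might exercise them, using the functions from the Usage block above with a simulated connection (the exact printed forms are backend-dependent and not asserted):

```r
library(dbplyr)

con <- simulate_dbi()

# A string wrapped in I() is treated as already-quoted user input
path <- as_table_path(I("my_schema.my_table"), con)
is_table_path(path)

# Last component of the path, i.e. the table's own name
table_path_name(path, con)

# All components, split per path
table_path_components(path, con)
```

check_table_path() would be used in the same spots where you want an informative error rather than a silent coercion.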
Join SQL tables
Description
These are methods for the dplyr join generics. They are translated to the following SQL queries:
- inner_join(x, y): SELECT * FROM x JOIN y ON x.a = y.a
- left_join(x, y): SELECT * FROM x LEFT JOIN y ON x.a = y.a
- right_join(x, y): SELECT * FROM x RIGHT JOIN y ON x.a = y.a
- full_join(x, y): SELECT * FROM x FULL JOIN y ON x.a = y.a
- semi_join(x, y): SELECT * FROM x WHERE EXISTS (SELECT 1 FROM y WHERE x.a = y.a)
- anti_join(x, y): SELECT * FROM x WHERE NOT EXISTS (SELECT 1 FROM y WHERE x.a = y.a)
Usage
## S3 method for class 'tbl_lazy'
inner_join(
x,
y,
by = NULL,
copy = FALSE,
suffix = NULL,
...,
keep = NULL,
na_matches = c("never", "na"),
multiple = NULL,
unmatched = "drop",
relationship = NULL,
sql_on = NULL,
auto_index = FALSE,
x_as = NULL,
y_as = NULL
)
## S3 method for class 'tbl_lazy'
left_join(
x,
y,
by = NULL,
copy = FALSE,
suffix = NULL,
...,
keep = NULL,
na_matches = c("never", "na"),
multiple = NULL,
unmatched = "drop",
relationship = NULL,
sql_on = NULL,
auto_index = FALSE,
x_as = NULL,
y_as = NULL
)
## S3 method for class 'tbl_lazy'
right_join(
x,
y,
by = NULL,
copy = FALSE,
suffix = NULL,
...,
keep = NULL,
na_matches = c("never", "na"),
multiple = NULL,
unmatched = "drop",
relationship = NULL,
sql_on = NULL,
auto_index = FALSE,
x_as = NULL,
y_as = NULL
)
## S3 method for class 'tbl_lazy'
full_join(
x,
y,
by = NULL,
copy = FALSE,
suffix = NULL,
...,
keep = NULL,
na_matches = c("never", "na"),
multiple = NULL,
relationship = NULL,
sql_on = NULL,
auto_index = FALSE,
x_as = NULL,
y_as = NULL
)
## S3 method for class 'tbl_lazy'
cross_join(
x,
y,
...,
copy = FALSE,
suffix = c(".x", ".y"),
x_as = NULL,
y_as = NULL
)
## S3 method for class 'tbl_lazy'
semi_join(
x,
y,
by = NULL,
copy = FALSE,
...,
na_matches = c("never", "na"),
sql_on = NULL,
auto_index = FALSE,
x_as = NULL,
y_as = NULL
)
## S3 method for class 'tbl_lazy'
anti_join(
x,
y,
by = NULL,
copy = FALSE,
...,
na_matches = c("never", "na"),
sql_on = NULL,
auto_index = FALSE,
x_as = NULL,
y_as = NULL
)
Arguments
x , y |
A pair of lazy data frames backed by database queries. |
by |
A join specification created with If To join on different variables between To join by multiple variables, use a
For simple equality joins, you can alternatively specify a character vector
of variable names to join by. For example, To perform a cross-join, generating all combinations of |
copy |
If This allows you to join tables across srcs, but it's a potentially expensive operation, so you must opt into it. |
suffix |
If there are non-joined duplicate variables in |
... |
Other parameters passed onto methods. |
keep |
Should the join keys from both
|
na_matches |
Should NA (NULL) values match one another?
The default, "never", is how databases usually work. |
multiple , unmatched |
Unsupported in database backends. As a workaround for multiple use a unique key and for unmatched a foreign key constraint. |
relationship |
Unsupported in database backends. |
sql_on |
A custom join predicate as an SQL expression.
Usually joins use column equality, but you can perform more complex
queries by supply |
auto_index |
if |
x_as , y_as |
Alias to use for |
Value
Another tbl_lazy
. Use show_query()
to see the generated
query, and use collect()
to execute the query
and return data to R.
Examples
library(dplyr, warn.conflicts = FALSE)
band_db <- tbl_memdb(dplyr::band_members)
instrument_db <- tbl_memdb(dplyr::band_instruments)
band_db %>% left_join(instrument_db) %>% show_query()
# Can join with local data frames by setting copy = TRUE
band_db %>%
left_join(dplyr::band_instruments, copy = TRUE)
# Unlike R, joins in SQL don't usually match NAs (NULLs)
db <- memdb_frame(x = c(1, 2, NA))
label <- memdb_frame(x = c(1, NA), label = c("one", "missing"))
db %>% left_join(label, by = "x")
# But you can activate R's usual behaviour with the na_matches argument
db %>% left_join(label, by = "x", na_matches = "na")
# By default, joins are equijoins, but you can use `sql_on` to
# express richer relationships
db1 <- memdb_frame(x = 1:5)
db2 <- memdb_frame(x = 1:3, y = letters[1:3])
db1 %>% left_join(db2) %>% show_query()
db1 %>% left_join(db2, sql_on = "LHS.x < RHS.x") %>% show_query()
Cache and retrieve an src_sqlite
of the Lahman baseball database.
Description
This creates an interesting database using data from the Lahman baseball data source, provided by Sean Lahman, and made easily available in R through the Lahman package by Michael Friendly, Dennis Murphy and Martin Monkman. See the documentation for that package for documentation of the individual tables.
Usage
lahman_sqlite(path = NULL)
lahman_postgres(dbname = "lahman", host = "localhost", ...)
lahman_mysql(dbname = "lahman", ...)
copy_lahman(con, ...)
has_lahman(type, ...)
lahman_srcs(..., quiet = NULL)
Arguments
... |
Other arguments passed to |
type |
src type. |
quiet |
if |
Examples
# Connect to a local sqlite database, if already created
library(dplyr)
if (has_lahman("sqlite")) {
lahman_sqlite()
batting <- tbl(lahman_sqlite(), "Batting")
batting
}
# Connect to a local postgres database with lahman database, if available
if (has_lahman("postgres")) {
lahman_postgres()
batting <- tbl(lahman_postgres(), "Batting")
}
Build and render SQL from a sequence of lazy operations
Description
sql_build()
creates a select_query
S3 object, that is rendered
to a SQL string by sql_render()
. The output from sql_build()
is
designed to be easy to test, as it's database agnostic, and has
a hierarchical structure. Outside of testing, however, you should
always call sql_render()
.
Usage
lazy_multi_join_query(
x,
joins,
table_names,
vars,
group_vars = op_grps(x),
order_vars = op_sort(x),
frame = op_frame(x),
call = caller_env()
)
lazy_rf_join_query(
x,
y,
type,
by,
table_names,
vars,
group_vars = op_grps(x),
order_vars = op_sort(x),
frame = op_frame(x),
call = caller_env()
)
lazy_semi_join_query(
x,
y,
vars,
anti,
by,
where,
group_vars = op_grps(x),
order_vars = op_sort(x),
frame = op_frame(x),
call = caller_env()
)
lazy_query(
query_type,
x,
...,
group_vars = op_grps(x),
order_vars = op_sort(x),
frame = op_frame(x)
)
lazy_select_query(
x,
select = NULL,
where = NULL,
group_by = NULL,
having = NULL,
order_by = NULL,
limit = NULL,
distinct = FALSE,
group_vars = NULL,
order_vars = NULL,
frame = NULL,
select_operation = c("select", "mutate", "summarise"),
message_summarise = NULL
)
lazy_set_op_query(x, y, type, all, call = caller_env())
lazy_union_query(x, unions, call = caller_env())
sql_build(op, con = NULL, ..., sql_options = NULL)
sql_render(
query,
con = NULL,
...,
sql_options = NULL,
subquery = FALSE,
lvl = 0
)
sql_optimise(x, con = NULL, ..., subquery = FALSE)
join_query(
x,
y,
select,
...,
type = "inner",
by = NULL,
suffix = c(".x", ".y"),
na_matches = FALSE
)
select_query(
from,
select = sql("*"),
where = character(),
group_by = character(),
having = character(),
window = character(),
order_by = character(),
limit = NULL,
distinct = FALSE,
from_alias = NULL
)
semi_join_query(
x,
y,
vars,
anti = FALSE,
by = NULL,
where = NULL,
na_matches = FALSE
)
set_op_query(x, y, type, all = FALSE)
union_query(x, unions)
Arguments
... |
Other arguments passed on to the methods. Not currently used. |
op |
A sequence of lazy operations |
con |
A database connection. The default |
sql_options |
SQL rendering options generated by |
subquery |
Is this SQL going to be used in a subquery? This is important because you can place a bare table name in a subquery and ORDER BY does not work in subqueries. |
Details
sql_build()
is generic over the lazy operations, lazy_ops,
and generates an S3 object that represents the query. sql_render()
takes a query object and then calls a function that is generic
over the database. For example, sql_build.op_mutate()
generates
a select_query
, and sql_render.select_query()
calls
sql_select()
, which has different methods for different databases.
The default methods should generate ANSI 92 SQL where possible, so
backends only need to override the methods if the backend is not ANSI
compliant.
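A minimal sketch of the build/render pipeline described above, using a simulated connection; the class name follows the description of sql_build()'s output, and the rendered SQL itself is treated as illustrative rather than asserted:

```r
library(dbplyr)

lf <- lazy_frame(x = 1:3, con = simulate_dbi())

# sql_build() produces a database-agnostic query object...
qry <- sql_build(dplyr::filter(lf, x > 1L))
class(qry)  # a select_query

# ...which sql_render() turns into SQL for a particular connection
sql_render(qry, con = simulate_dbi())
```

This separation is what makes the query objects convenient to test: you can assert on the hierarchical structure without pinning down one backend's SQL dialect.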
Lazy operations
Description
This set of S3 classes describes the action of dplyr verbs. These are currently used for SQL sources to separate the description of operations in R from their computation in SQL. This API is very new, so it is likely to evolve in the future.
Usage
lazy_base_query(x, vars, class = character(), ...)
op_grps(op)
op_vars(op)
op_sort(op)
op_frame(op)
Details
op_vars()
and op_grps()
compute the variables and groups from
a sequence of lazy operations. op_sort()
and op_frame()
tracks the
order and frame for use in window functions.
Create a database table in temporary in-memory database.
Description
memdb_frame()
works like tibble::tibble()
, but instead of creating a new
data frame in R, it creates a table in src_memdb()
.
Usage
memdb_frame(..., .name = unique_table_name())
tbl_memdb(df, name = deparse(substitute(df)))
src_memdb()
Arguments
... |
< Arguments are evaluated sequentially.
You can refer to previously created elements directly or using the .data
pronoun.
To refer explicitly to objects in the calling environment, use |
df |
Data frame to copy |
name , .name |
Name of table in database: defaults to a random name that's unlikely to conflict with an existing table. |
Examples
library(dplyr)
df <- memdb_frame(x = runif(100), y = runif(100))
df %>% arrange(x)
df %>% arrange(x) %>% show_query()
mtcars_db <- tbl_memdb(mtcars)
mtcars_db %>% group_by(cyl) %>% summarise(n = n()) %>% show_query()
Create, modify, and delete columns
Description
These are methods for the dplyr mutate()
and transmute()
generics.
They are translated to computed expressions in the SELECT
clause of
the SQL query.
Usage
## S3 method for class 'tbl_lazy'
mutate(
.data,
...,
.by = NULL,
.keep = c("all", "used", "unused", "none"),
.before = NULL,
.after = NULL
)
Arguments
.data |
A lazy data frame backed by a database query. |
... |
< |
.by |
< |
.keep |
Control which columns from
|
.before , .after |
< |
Value
Another tbl_lazy
. Use show_query()
to see the generated
query, and use collect()
to execute the query
and return data to R.
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(x = 1:5, y = 5:1)
db %>%
mutate(a = (x + y) / 2, b = sqrt(x^2L + y^2L)) %>%
show_query()
# dbplyr automatically creates subqueries as needed
db %>%
mutate(x1 = x + 1, x2 = x1 * 2) %>%
show_query()
Provides comma-separated string out of the parameters
Description
Provides comma-separated string out of the parameters
Usage
named_commas(x)
Database versions of the nycflights13 data
Description
These functions cache the data from the nycflights13
database in
a local database, for use in examples and vignettes. Indexes are created
to make joining tables on natural keys efficient.
Usage
nycflights13_sqlite(path = NULL)
nycflights13_postgres(dbname = "nycflights13", ...)
has_nycflights13(type = c("sqlite", "postgres"), ...)
copy_nycflights13(con, ...)
Arguments
path |
location of SQLite database file |
dbname , ... |
Arguments passed on to |
Partially evaluate an expression.
Description
This function partially evaluates an expression, using information from the tbl to determine whether names refer to local expressions or remote variables. This simplifies SQL translation because expressions don't need to carry around their environment - all relevant information is incorporated into the expression.
Usage
partial_eval(call, data, env = caller_env(), vars = deprecated(), error_call)
Arguments
call |
an unevaluated expression, as produced by |
data |
A lazy data frame backed by a database query. |
env |
environment in which to search for local values |
vars |
Symbol substitution
partial_eval()
needs to guess if you're referring to a variable on the
server (remote), or in the current environment (local). It's not possible to
do this 100% perfectly. partial_eval()
uses the following heuristic:
1. If the tbl variables are known, and the symbol matches a tbl variable, then remote.
2. If the symbol is defined locally, local.
3. Otherwise, remote.
You can override the guesses using local()
and remote()
to force
computation, or by using the .data
and .env
pronouns of tidy evaluation.
Examples
lf <- lazy_frame(year = 1980, id = 1)
partial_eval(quote(year > 1980), data = lf)
ids <- c("ansonca01", "forceda01", "mathebo01")
partial_eval(quote(id %in% ids), lf)
# cf.
partial_eval(quote(id == .data$id), lf)
# You can use local() or .env to disambiguate between local and remote
# variables: otherwise remote is always preferred
year <- 1980
partial_eval(quote(year > year), lf)
partial_eval(quote(year > local(year)), lf)
partial_eval(quote(year > .env$year), lf)
# Functions are always assumed to be remote. Use local to force evaluation
# in R.
f <- function(x) x + 1
partial_eval(quote(year > f(1980)), lf)
partial_eval(quote(year > local(f(1980))), lf)
Pivot data from wide to long
Description
pivot_longer()
"lengthens" data, increasing the number of rows and
decreasing the number of columns. The inverse transformation is
tidyr::pivot_wider()
.
Learn more in vignette("pivot", "tidyr")
.
While most functionality is identical there are some differences to
pivot_longer()
on local data frames:
- the output is sorted differently/not explicitly,
- the coercion of mixed column types is left to the database,
- values_ptypes is not supported.
Note that build_longer_spec()
and pivot_longer_spec()
do not work with
remote tables.
Usage
## S3 method for class 'tbl_lazy'
pivot_longer(
data,
cols,
...,
cols_vary,
names_to = "name",
names_prefix = NULL,
names_sep = NULL,
names_pattern = NULL,
names_ptypes = NULL,
names_transform = NULL,
names_repair = "check_unique",
values_to = "value",
values_drop_na = FALSE,
values_ptypes,
values_transform = NULL
)
Arguments
data |
A data frame to pivot. |
cols |
Columns to pivot into longer format. |
... |
Additional arguments passed on to methods. |
cols_vary |
Unsupported; included for compatibility with the generic. |
names_to |
A string specifying the name of the column to create
from the data stored in the column names of |
names_prefix |
A regular expression used to remove matching text from the start of each variable name. |
names_sep , names_pattern |
If |
names_ptypes |
A list of column name-prototype pairs. |
names_transform , values_transform |
A list of column name-function pairs. |
names_repair |
What happens if the output has invalid column names? |
values_to |
A string specifying the name of the column to create
from the data stored in cell values. If |
values_drop_na |
If |
values_ptypes |
Not supported. |
Details
The SQL translation basically works as follows:
- Split the specification by its key columns, i.e. by the variables crammed into the column names.
- For each part of the split specification, transmute() the data into the following columns: the id columns (i.e. columns that are not pivoted), the key columns, and the value columns (i.e. columns that are pivoted).
- Combine all the parts with union_all().
Examples
# See vignette("pivot") for examples and explanation
# Simplest case where column names are character data
memdb_frame(
id = c("a", "b"),
x = 1:2,
y = 3:4
) %>%
tidyr::pivot_longer(-id)
Pivot data from long to wide
Description
pivot_wider()
"widens" data, increasing the number of columns and
decreasing the number of rows. The inverse transformation is
pivot_longer()
.
Learn more in vignette("pivot", "tidyr")
.
Note that pivot_wider()
is not and cannot be lazy because we need to look
at the data to figure out what the new column names will be.
If you have a long running query you have two options:
- (Temporarily) store the result of the query via compute().
- Create a spec beforehand and use dbplyr_pivot_wider_spec(), dbplyr's version of tidyr::pivot_wider_spec(). Note that this function is only a temporary solution until pivot_wider_spec() becomes a generic; it will then be removed soon afterwards.
Usage
## S3 method for class 'tbl_lazy'
pivot_wider(
data,
...,
id_cols = NULL,
id_expand = FALSE,
names_from = name,
names_prefix = "",
names_sep = "_",
names_glue = NULL,
names_sort = FALSE,
names_vary = "fastest",
names_expand = FALSE,
names_repair = "check_unique",
values_from = value,
values_fill = NULL,
values_fn = ~max(.x, na.rm = TRUE),
unused_fn = NULL
)
dbplyr_pivot_wider_spec(
data,
spec,
...,
names_repair = "check_unique",
id_cols = NULL,
id_expand = FALSE,
values_fill = NULL,
values_fn = ~max(.x, na.rm = TRUE),
unused_fn = NULL,
error_call = current_env()
)
Arguments
data |
A lazy data frame backed by a database query. |
... |
Unused; included for compatibility with generic. |
id_cols |
A set of columns that uniquely identifies each observation. |
id_expand |
Unused; included for compatibility with the generic. |
names_from , values_from |
A pair of
arguments describing which column (or columns) to get the name of the
output column ( If |
names_prefix |
String added to the start of every variable name. |
names_sep |
If |
names_glue |
Instead of |
names_sort |
Should the column names be sorted? If |
names_vary |
When
|
names_expand |
Should the values in the |
names_repair |
What happens if the output has invalid column names? |
values_fill |
Optionally, a (scalar) value that specifies what each
|
values_fn |
A function, the default is |
unused_fn |
Optionally, a function applied to summarize the values from
the unused columns (i.e. columns not identified by The default drops all unused columns from the result. This can be a named list if you want to apply different aggregations to different unused columns.
This is similar to grouping by the |
spec |
A specification data frame. This is useful for more complex pivots because it gives you greater control on how metadata stored in the columns become column names in the result. Must be a data frame containing character |
error_call |
The execution environment of a currently
running function, e.g. |
Details
The big difference to pivot_wider()
for local data frames is that
values_fn
must not be NULL
. By default it is max()
which yields
the same results as for local data frames if the combination of id_cols
and value
column uniquely identify an observation.
Mind that you also do not get a warning if an observation is not uniquely
identified.
The translation to SQL code basically works as follows:
1. Get the unique keys in the names_from column.
2. For each key value, generate an expression of the form:
value_fn(CASE WHEN (`names from column` == `key value`) THEN (`value column`) END) AS `output column`
3. Group the data by the id columns.
4. Summarise the grouped data with the expressions from step 2.
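The CASE WHEN translation outlined above can be inspected with show_query(); a sketch (output column set follows from the key values, but the exact SQL text varies by backend):

```r
library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

memdb_frame(
  id = 1,
  key = c("x", "y"),
  value = 1:2
) %>%
  tidyr::pivot_wider(names_from = key, values_from = value) %>%
  show_query()
```

Note that the unique key values are retrieved eagerly to name the columns, which is why this verb cannot be fully lazy.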
Examples
memdb_frame(
id = 1,
key = c("x", "y"),
value = 1:2
) %>%
tidyr::pivot_wider(
id_cols = id,
names_from = key,
values_from = value
)
Extract a single column
Description
This is a method for the dplyr pull()
generic. It evaluates the query
retrieving just the specified column.
Usage
## S3 method for class 'tbl_sql'
pull(.data, var = -1, name = NULL, ...)
Arguments
.data |
A lazy data frame backed by a database query. |
var |
A variable specified as:
The default returns the last column (on the assumption that's the column you've created most recently). This argument is taken by expression and supports quasiquotation (you can unquote column names and column locations). |
name |
An optional parameter that specifies the column to be used
as names for a named vector. Specified in a similar manner as |
... |
< |
Value
A vector of data.
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(x = 1:5, y = 5:1)
db %>%
mutate(z = x + y * 2) %>%
pull()
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- magrittr
Metadata about a remote table
Description
remote_name()
gives the unescaped name of the remote table, or NULL
if it
is a query (created by sql()
) or already escaped (created by ident_q()
).
remote_table()
gives the remote table or the query.
remote_query()
gives the text of the query, and remote_query_plan()
the query plan (as computed by the remote database). remote_src()
and
remote_con()
give the dplyr source and DBI connection respectively.
Usage
remote_name(x, null_if_local = TRUE)
remote_table(x, null_if_local = TRUE)
remote_src(x)
remote_con(x)
remote_query(x, cte = FALSE, sql_options = NULL)
remote_query_plan(x, ...)
Arguments
x |
Remote table, currently must be a tbl_sql. |
null_if_local |
Return |
cte |
|
sql_options |
|
... |
Additional arguments passed on to methods. |
Value
The value, or NULL
if it is not a remote table or the property is not applicable.
For example, computed queries do not have a "name".
Examples
mf <- memdb_frame(x = 1:5, y = 5:1, .name = "blorp")
remote_name(mf)
remote_src(mf)
remote_con(mf)
remote_query(mf)
mf2 <- dplyr::filter(mf, x > 3)
remote_name(mf2)
remote_src(mf2)
remote_con(mf2)
remote_query(mf2)
Replace NAs with specified values
Description
This is a method for the tidyr::replace_na()
generic.
Usage
## S3 method for class 'tbl_lazy'
replace_na(data, replace = list(), ...)
Arguments
data |
A lazy data frame backed by a database query. |
replace |
A named list of values, with one value for each column that has NA values to be replaced. |
... |
Unused; included for compatibility with generic. |
Value
Another tbl_lazy
. Use show_query()
to see the generated
query, and use collect()
to execute the query
and return data to R.
Examples
df <- memdb_frame(x = c(1, 2, NA), y = c("a", NA, "b"))
df %>% tidyr::replace_na(list(x = 0, y = "unknown"))
Edit individual rows in the underlying database table
Description
These are methods for the dplyr rows_insert()
, rows_append()
,
rows_update()
, rows_patch()
, rows_upsert()
, and rows_delete()
generics.
When in_place = TRUE
these verbs do not generate SELECT
queries, but
instead directly modify the underlying data using INSERT
, UPDATE
, or
DELETE
operators. This will require that you have write access to
the database: the connection needs permission to insert, modify or delete
rows, but not to alter the structure of the table.
The default, in_place = FALSE
, generates equivalent lazy tables (using
SELECT
queries) that allow previewing the result without actually
modifying the underlying table on the database.
Usage
## S3 method for class 'tbl_lazy'
rows_insert(
x,
y,
by = NULL,
...,
conflict = c("error", "ignore"),
copy = FALSE,
in_place = FALSE,
returning = NULL,
method = NULL
)
## S3 method for class 'tbl_lazy'
rows_append(x, y, ..., copy = FALSE, in_place = FALSE, returning = NULL)
## S3 method for class 'tbl_lazy'
rows_update(
x,
y,
by = NULL,
...,
unmatched = c("error", "ignore"),
copy = FALSE,
in_place = FALSE,
returning = NULL
)
## S3 method for class 'tbl_lazy'
rows_patch(
x,
y,
by = NULL,
...,
unmatched = c("error", "ignore"),
copy = FALSE,
in_place = FALSE,
returning = NULL
)
## S3 method for class 'tbl_lazy'
rows_upsert(
x,
y,
by = NULL,
...,
copy = FALSE,
in_place = FALSE,
returning = NULL,
method = NULL
)
## S3 method for class 'tbl_lazy'
rows_delete(
x,
y,
by = NULL,
...,
unmatched = c("error", "ignore"),
copy = FALSE,
in_place = FALSE,
returning = NULL
)
Arguments
x |
A lazy table.
For |
y |
A lazy table, data frame, or data frame extensions (e.g. a tibble). |
by |
An unnamed character vector giving the key columns. The key columns
must exist in both By default, we use the first column in |
... |
Other parameters passed onto methods. |
conflict |
For One of:
|
copy |
If |
in_place |
Should |
returning |
Columns to return. See |
method |
A string specifying the method to use. This is only relevant for
|
unmatched |
For One of:
|
Value
A new tbl_lazy
of the modified data.
With in_place = FALSE
, the result is a lazy query that prints visibly,
because the purpose of this operation is to preview the results.
With in_place = TRUE
, x
is returned invisibly,
because the purpose of this operation is the side effect of modifying rows
in the table behind x
.
Examples
library(dplyr)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbExecute(con, "CREATE TABLE Ponies (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT,
cutie_mark TEXT
)")
ponies <- tbl(con, "Ponies")
applejack <- copy_inline(con, data.frame(
name = "Apple Jack",
cutie_mark = "three apples"
))
# The default behavior is to generate a SELECT query
rows_insert(ponies, applejack, conflict = "ignore")
# And the original table is left unchanged:
ponies
# You can also choose to modify the table with in_place = TRUE:
rows_insert(ponies, applejack, conflict = "ignore", in_place = TRUE)
# In this case `rows_insert()` returns nothing and the underlying
# data is modified
ponies
Subset, rename, and reorder columns using their names
Description
These are methods for the dplyr select()
, rename()
, and relocate()
generics. They generate the SELECT
clause of the SQL query.
These functions do not support predicate functions, i.e. you can
not use where(is.numeric)
to select all numeric variables.
Usage
## S3 method for class 'tbl_lazy'
select(.data, ...)
## S3 method for class 'tbl_lazy'
rename(.data, ...)
## S3 method for class 'tbl_lazy'
rename_with(.data, .fn, .cols = everything(), ...)
## S3 method for class 'tbl_lazy'
relocate(.data, ..., .before = NULL, .after = NULL)
Arguments
.data |
A lazy data frame backed by a database query. |
... |
< |
.fn |
A function used to transform the selected |
.cols |
< |
.before , .after |
< |
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(x = 1, y = 2, z = 3)
db %>% select(-y) %>% show_query()
db %>% relocate(z) %>% show_query()
db %>% rename(first = x, last = z) %>% show_query()
Simulate database connections
Description
These functions generate S3 objects that have been designed to simulate the action of a database connection, without actually having the database available. Obviously, this simulation can only be incomplete, but most importantly it allows us to simulate SQL generation for any database without actually connecting to it.
Usage
simulate_dbi(class = character(), ...)
Details
Simulated SQL always quotes identifiers with `x`
, and strings with
'x'
.
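For example, a minimal check of these quoting rules, using translate_sql() as elsewhere in this manual:

```r
library(dbplyr)

con <- simulate_dbi()
# Identifiers come back in backticks, string literals in single quotes
translate_sql(x == "a", con = con)
#> <SQL> `x` = 'a'
```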
SQL escaping.
Description
These functions are critical when writing functions that translate R functions to SQL functions. Typically a conversion function should escape all its inputs and return an sql object.
Usage
sql(...)
is.sql(x)
as.sql(x, con)
Arguments
... |
Character vectors that will be combined into a single SQL expression. |
x |
Object to coerce |
con |
Needed when |
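This page has no examples of its own; a brief sketch of how the pieces fit together (output shapes assumed from the quoting rules described under simulate_dbi()):

```r
library(dbplyr)

con <- simulate_dbi()

# sql() marks a string as already-escaped SQL, so it passes through
# untouched in later translation steps
x <- sql("SELECT * FROM foo")
is.sql(x)
#> [1] TRUE

# A plain character vector is not sql; as.sql() coerces it to an
# identifier, which is why it needs a connection for the quoting rules
is.sql("SELECT * FROM foo")
#> [1] FALSE
as.sql("foo", con = con)
```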
Generate SQL from R expressions
Description
Low-level building block for generating SQL from R expressions.
Strings are escaped; names become bare SQL identifiers. User infix
functions have %
stripped.
Usage
sql_expr(x, con = sql_current_con())
sql_call2(.fn, ..., con = sql_current_con())
Arguments
x |
A quasiquoted expression |
con |
Connection to use for escaping. Will be set automatically when called from a function translation. |
.fn |
Function name (as string, call, or symbol) |
... |
Arguments to function |
Details
Using sql_expr()
in a package will require use of globalVariables()
to avoid R CMD check
NOTES. This is a small amount of additional pain,
which I think is worthwhile because it leads to more readable translation
code.
Examples
con <- simulate_dbi() # not necessary when writing translations
sql_expr(f(x + 1), con = con)
sql_expr(f("x", "y"), con = con)
sql_expr(f(x, y), con = con)
x <- ident("x")
sql_expr(f(!!x, y), con = con)
sql_expr(cast("x" %as% DECIMAL), con = con)
sql_expr(round(x) %::% numeric, con = con)
sql_call2("+", quote(x), 1, con = con)
sql_call2("+", "x", 1, con = con)
Options for generating SQL
Description
Options for generating SQL
Usage
sql_options(cte = FALSE, use_star = TRUE, qualify_all_columns = FALSE)
Arguments
cte |
If |
use_star |
If |
qualify_all_columns |
If |
Value
A <dbplyr_sql_options> object.
Examples
library(dplyr, warn.conflicts = FALSE)
lf1 <- lazy_frame(key = 1, a = 1, b = 2)
lf2 <- lazy_frame(key = 1, a = 1, c = 3)
result <- left_join(lf1, lf2, by = "key") %>%
filter(c >= 3)
show_query(result)
sql_options <- sql_options(cte = TRUE, qualify_all_columns = TRUE)
show_query(result, sql_options = sql_options)
Generate SQL for Insert, Update, Upsert, and Delete
Description
These functions generate the SQL used in rows_*(in_place = TRUE)
.
Usage
sql_query_insert(
con,
table,
from,
insert_cols,
by,
...,
conflict = c("error", "ignore"),
returning_cols = NULL,
method = NULL
)
sql_query_append(con, table, from, insert_cols, ..., returning_cols = NULL)
sql_query_update_from(
con,
table,
from,
by,
update_values,
...,
returning_cols = NULL
)
sql_query_upsert(
con,
table,
from,
by,
update_cols,
...,
returning_cols = NULL,
method = NULL
)
sql_query_delete(con, table, from, by, ..., returning_cols = NULL)
Arguments
con |
Database connection. |
table |
Table to update. Must be a table identifier.
Use a string to refer to tables in the current schema/catalog or
|
from |
Table or query that contains the new data. Either a table identifier or SQL. |
insert_cols |
Names of columns to insert. |
by |
An unnamed character vector giving the key columns. The key columns
must exist in both By default, we use the first column in |
... |
Other parameters passed onto methods. |
conflict |
For One of:
|
returning_cols |
Optional. Names of columns to return. |
method |
Optional. The method to use. |
update_values |
A named SQL vector that specify how to update the columns. |
update_cols |
Names of columns to update. |
Details
Insert Methods
"where_not_exists"
The default for most databases.
INSERT INTO x_name SELECT * FROM y WHERE NOT EXISTS <match on by columns>
"on_conflict"
Supported by:
Postgres
SQLite
This method uses the ON CONFLICT
clause and therefore requires a unique
index on the columns specified in by
.
Upsert Methods
"merge"
The upsert method according to the SQL standard. It uses the MERGE
statement
MERGE INTO x_name USING y ON <match on by columns> WHEN MATCHED THEN UPDATE SET ... WHEN NOT MATCHED THEN INSERT ...
"on_conflict"
Supported by:
Postgres
SQLite
This method uses the ON CONFLICT
clause and therefore requires a unique
index on the columns specified in by
.
"cte_update"
Supported by:
Postgres
SQLite
Oracle
The classical way to upsert in Postgres and SQLite before support for
ON CONFLICT
was added. The update is done in a CTE clause and the unmatched
values are then inserted outside of the CTE.
Value
A SQL query.
Examples
sql_query_upsert(
con = simulate_postgres(),
table = ident("airlines"),
from = ident("df"),
by = "carrier",
update_cols = "name"
)
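sql_query_insert() can be sketched in the same way; the table and column names here are hypothetical, and the default "where_not_exists" insert method described above is assumed:

```r
# Generates an INSERT INTO ... SELECT ... WHERE NOT EXISTS query,
# skipping rows whose `carrier` key already exists in the target table
sql_query_insert(
  con = simulate_postgres(),
  table = ident("airlines"),
  from = ident("df"),
  insert_cols = c("carrier", "name"),
  by = "carrier",
  conflict = "ignore"
)
```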
Helper function for quoting sql elements.
Description
If the quote character is present in the string, it will be doubled.
NA
s will be replaced with NULL.
Usage
sql_quote(x, quote)
Arguments
x |
Character vector to escape. |
quote |
Single quoting character. |
Examples
sql_quote("abc", "'")
sql_quote("I've had a good day", "'")
sql_quote(c("abc", NA), "'")
Create an sql translator
Description
When creating a package that maps to a new SQL-based src, you'll often want to provide some additional mappings from common R commands to the commands that your tbl provides. These functions make that easy.
Usage
sql_substr(f = "SUBSTR")
sql_str_sub(subset_f = "SUBSTR", length_f = "LENGTH", optional_length = TRUE)
sql_paste(default_sep, f = "CONCAT_WS")
sql_paste_infix(default_sep, op, cast)
sql_variant(
scalar = sql_translator(),
aggregate = sql_translator(),
window = sql_translator()
)
sql_translator(..., .funs = list(), .parent = new.env(parent = emptyenv()))
sql_infix(f, pad = TRUE)
sql_prefix(f, n = NULL)
sql_aggregate(f, f_r = f)
sql_aggregate_2(f)
sql_aggregate_n(f, f_r = f)
sql_not_supported(f)
sql_cast(type)
sql_try_cast(type)
sql_log()
sql_cot()
sql_runif(rand_expr, n = n(), min = 0, max = 1)
base_scalar
base_agg
base_win
base_no_win
base_odbc_scalar
base_odbc_agg
base_odbc_win
Arguments
f |
the name of the sql function as a string |
scalar , aggregate , window |
The three families of functions that an SQL variant can supply. |
... , .funs |
named functions, used to add custom converters from standard
R functions to sql functions. Specify individually in |
.parent |
the sql variant that this variant should inherit from.
Defaults to |
pad |
If |
n |
for |
f_r |
the name of the r function being translated as a string |
Helper functions
sql_infix()
and sql_prefix()
create default SQL infix and prefix
functions given the name of the SQL function. They don't perform any input
checking, but do correctly escape their input, and are useful for
quickly providing default wrappers for a new SQL variant.
See Also
win_over()
for helper functions for window functions.
sql()
for an example of a more customised sql
conversion function.
Examples
# An example of adding some mappings for the statistical functions that
# postgresql provides: http://bit.ly/K5EdTn
postgres_agg <- sql_translator(.parent = base_agg,
cor = sql_aggregate_2("CORR"),
cov = sql_aggregate_2("COVAR_SAMP"),
sd = sql_aggregate("STDDEV_SAMP", "sd"),
var = sql_aggregate("VAR_SAMP", "var")
)
# Next we have to simulate a connection that uses this variant
con <- simulate_dbi("TestCon")
sql_translation.TestCon <- function(x) {
sql_variant(
base_scalar,
postgres_agg,
base_no_win
)
}
translate_sql(cor(x, y), con = con, window = FALSE)
translate_sql(sd(income / years), con = con, window = FALSE)
# Any functions not explicitly listed in the converter will be translated
# to sql as is, so you don't need to convert all functions.
translate_sql(regr_intercept(y, x), con = con)
Database src
Description
Since you can generate a tbl()
directly from a DBI connection, we no longer
recommend using src_dbi()
.
Usage
src_dbi(con, auto_disconnect = FALSE)
Arguments
con |
An object that inherits from DBI::DBIConnection, typically generated by DBI::dbConnect |
auto_disconnect |
Should the connection be automatically closed when
the src is deleted? Set to |
Value
An S3 object with class src_dbi
, src_sql
, src
.
Create a "sql src" object
Description
Deprecated: please use a DBIConnection
object directly instead.
Usage
src_sql(subclass, con, ...)
Arguments
subclass |
name of subclass. "src_sql" is an abstract base class, so you
must supply this value. |
con |
the connection object |
... |
fields used by object |
Summarise each group to one row
Description
This is a method for the dplyr summarise()
generic. It generates the
SELECT
clause of the SQL query, and generally needs to be combined with
group_by()
.
Usage
## S3 method for class 'tbl_lazy'
summarise(.data, ..., .by = NULL, .groups = NULL)
Arguments
.data |
A lazy data frame backed by a database query. |
... |
< |
.by |
< |
.groups |
When In addition, a message informs you of that choice, unless the result is ungrouped,
the option "dplyr.summarise.inform" is set to |
Value
Another tbl_lazy
. Use show_query()
to see the generated
query, and use collect()
to execute the query
and return data to R.
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(g = c(1, 1, 1, 2, 2), x = c(4, 3, 6, 9, 2))
db %>%
summarise(n()) %>%
show_query()
db %>%
group_by(g) %>%
summarise(n()) %>%
show_query()
Create a local lazy tibble
Description
These functions are useful for testing SQL generation without having to
have an active database connection. See simulate_dbi()
for a list of
available database simulations.
Usage
tbl_lazy(df, con = NULL, ..., name = "df")
lazy_frame(..., con = NULL, .name = "df")
Examples
library(dplyr)
df <- data.frame(x = 1, y = 2)
df_sqlite <- tbl_lazy(df, con = simulate_sqlite())
df_sqlite %>% summarise(x = sd(x, na.rm = TRUE)) %>% show_query()
Create an SQL tbl (abstract)
Description
Generally, you should no longer need to provide a custom tbl()
method.
The default tbl.DBIConnection
method should work in most cases.
Usage
tbl_sql(subclass, src, from, ..., vars = NULL, check_from = deprecated())
Arguments
subclass |
name of subclass |
... |
needed for agreement with generic. Not otherwise used. |
vars |
Provide column names as a character vector
to avoid retrieving them from the database.
Mainly useful for better performance when creating
multiple |
check_from |
Use dplyr verbs with a remote database table
Description
All data manipulation on SQL tbls is lazy: the verbs will not actually
run the query or retrieve the data unless you ask for it; they all return
a new tbl_dbi
object. Use compute()
to run the query and save the
results in a temporary table in the database, or use collect()
to retrieve the
to retrieve the
results to R. You can see the query with show_query()
.
Usage
## S3 method for class 'src_dbi'
tbl(src, from, ...)
Arguments
src |
A |
from |
Either a table identifier or a literal Use a string to identify a table in the current schema/catalog. We
recommend using |
... |
Passed on to |
Details
For best performance, the database should have an index on the variables
that you are grouping by. Use explain()
to check that the database is using
the indexes that you expect.
There is one verb that is not lazy: do()
is eager because it must pull
the data into R.
Examples
library(dplyr)
# Connect to a temporary in-memory SQLite database
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
# Add some data
copy_to(con, mtcars)
DBI::dbListTables(con)
# To retrieve a single table from a source, use `tbl()`
con %>% tbl("mtcars")
# Use `I()` for qualified table names
con %>% tbl(I("temp.mtcars")) %>% head(1)
# You can also pass raw SQL if you want a more sophisticated query
con %>% tbl(sql("SELECT * FROM mtcars WHERE cyl = 8"))
# If you just want a temporary in-memory database, use src_memdb()
src2 <- src_memdb()
# To show off the full features of dplyr's database integration,
# we'll copy the Lahman batting data into the database.
if (requireNamespace("Lahman", quietly = TRUE)) {
batting <- copy_to(con, Lahman::Batting)
batting
# Basic data manipulation verbs work in the same way as with a tibble
batting %>% filter(yearID > 2005, G > 130)
batting %>% select(playerID:lgID)
batting %>% arrange(playerID, desc(yearID))
batting %>% summarise(G = mean(G), n = n())
# There are a few exceptions. For example, databases give integer results
# when dividing one integer by another. Multiply by 1 to fix the problem
batting %>%
select(playerID:lgID, AB, R, G) %>%
mutate(
R_per_game1 = R / G,
R_per_game2 = R * 1.0 / G
)
# All operations are lazy: they don't do anything until you request the
# data, either by `print()`ing it (which shows the first ten rows),
# or by `collect()`ing the results locally.
system.time(recent <- filter(batting, yearID > 2010))
system.time(collect(recent))
# You can see the query that dplyr creates with show_query()
batting %>%
filter(G > 0) %>%
group_by(playerID) %>%
summarise(n = n()) %>%
show_query()
}
Infrastructure for testing dplyr
Description
Register testing sources, then use test_load()
to load an existing
data frame into each source. To create a new table in each source,
use test_frame()
.
Usage
test_register_src(name, src)
test_register_con(name, ...)
src_test(name)
test_load(
df,
name = unique_table_name(),
srcs = test_srcs$get(),
ignore = character()
)
test_frame(..., srcs = test_srcs$get(), ignore = character())
Examples
## Not run:
test_register_src("sqlite", {
DBI::dbConnect(RSQLite::SQLite(), ":memory:", create = TRUE)
})
test_frame(x = 1:3, y = 3:1)
test_load(mtcars)
## End(Not run)
Translate an expression to SQL
Description
dbplyr translates commonly used base functions including logical
(!
, &
, |
), arithmetic (^
), and comparison (!=
) operators, as well
as common summary (mean()
, var()
), and transformation (log()
)
functions. All other functions will be preserved as is. R's infix functions
(e.g. %like%
) will be converted to their SQL equivalents (e.g. LIKE
).
Learn more in vignette("translation-function")
.
Usage
translate_sql(
...,
con,
vars_group = NULL,
vars_order = NULL,
vars_frame = NULL,
window = TRUE
)
translate_sql_(
dots,
con,
vars_group = NULL,
vars_order = NULL,
vars_frame = NULL,
window = TRUE,
context = list()
)
Arguments
... , dots |
Expressions to translate. |
con |
An optional database connection to control the details of
the translation. The default, |
vars_group , vars_order , vars_frame |
Parameters used in the |
window |
Use |
context |
Use to carry information for special translation cases. For example, MS SQL needs a different conversion for is.na() in WHERE vs. SELECT clauses. Expects a list. |
Examples
con <- simulate_dbi()
# Regular maths is translated in a very straightforward way
translate_sql(x + 1, con = con)
translate_sql(sin(x) + tan(y), con = con)
# Note that all variable names are escaped
translate_sql(like == "x", con = con)
# In ANSI SQL: "" quotes variable _names_, '' quotes strings
# Logical operators are converted to their sql equivalents
translate_sql(x < 5 & !(y >= 5), con = con)
# xor() doesn't have a direct SQL equivalent
translate_sql(xor(x, y), con = con)
# If is translated into case when
translate_sql(if (x > 5) "big" else "small", con = con)
# Infix functions are passed onto SQL with % removed
translate_sql(first %like% "Had%", con = con)
translate_sql(first %is% NA, con = con)
translate_sql(first %in% c("John", "Roger", "Robert"), con = con)
# And be careful if you really want integers
translate_sql(x == 1, con = con)
translate_sql(x == 1L, con = con)
# If you have an already quoted object, use translate_sql_:
x <- quote(y + 1 / sin(t))
translate_sql_(list(x), con = simulate_dbi())
# Windowed translation --------------------------------------------
# Known window functions automatically get OVER()
translate_sql(mpg > mean(mpg), con = con)
# Suppress this with window = FALSE
translate_sql(mpg > mean(mpg), window = FALSE, con = con)
# vars_group controls partition:
translate_sql(mpg > mean(mpg), vars_group = "cyl", con = con)
# and vars_order controls ordering for those functions that need it
translate_sql(cumsum(mpg), con = con)
translate_sql(cumsum(mpg), vars_order = "mpg", con = con)
Generate SQL expression for window functions
Description
win_over()
makes it easy to generate the window function specification.
win_absent()
, win_rank()
, win_aggregate()
, and win_cumulative()
provide helpers for constructing common types of window functions.
win_current_group()
and win_current_order()
allow you to access
the grouping and order context set up by group_by()
and arrange()
.
Usage
win_over(
expr,
partition = NULL,
order = NULL,
frame = NULL,
con = sql_current_con()
)
win_rank(f, empty_order = FALSE)
win_aggregate(f)
win_aggregate_2(f)
win_cumulative(f)
win_absent(f)
win_current_group()
win_current_order()
win_current_frame()
win_rank_tdata(f)
Arguments
expr |
The window expression |
partition |
Variables to partition over |
order |
Variables to order by |
frame |
A numeric vector of length two defining the frame. |
f |
The name of an sql function as a string |
empty_order |
A logical value indicating whether to order by NULL if |
Examples
con <- simulate_dbi()
win_over(sql("avg(x)"), con = con)
win_over(sql("avg(x)"), "y", con = con)
win_over(sql("avg(x)"), order = "y", con = con)
win_over(sql("avg(x)"), order = c("x", "y"), con = con)
win_over(sql("avg(x)"), frame = c(-Inf, 0), order = "y", con = con)
Override window order and frame
Description
These allow you to override the PARTITION BY
and ORDER BY
clauses
of window functions generated by grouped mutates.
Usage
window_order(.data, ...)
window_frame(.data, from = -Inf, to = Inf)
Arguments
.data |
A lazy data frame backed by a database query. |
... |
Variables to order by |
from , to |
Bounds of the frame. |
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(g = rep(1:2, each = 5), y = runif(10), z = 1:10)
db %>%
window_order(y) %>%
mutate(z = cumsum(y)) %>%
show_query()
db %>%
group_by(g) %>%
window_frame(-3, 0) %>%
window_order(z) %>%
mutate(z = sum(y)) %>%
show_query()