ds.merge.Rd
Merges (links) two data frames together based on common values in defined vectors in each data frame.
ds.merge(
  x.name = NULL,
  y.name = NULL,
  by.x.names = NULL,
  by.y.names = NULL,
  all.x = FALSE,
  all.y = FALSE,
  sort = TRUE,
  suffixes = c(".x", ".y"),
  no.dups = TRUE,
  incomparables = NULL,
  newobj = NULL,
  datasources = NULL
)
x.name: a character string specifying the name of the first data frame to be merged. The string must be shorter than the nfilter.stringShort threshold, which is one of the disclosure prevention checks in DataSHIELD.
y.name: a character string specifying the name of the second data frame to be merged. The string must be shorter than the nfilter.stringShort threshold, which is one of the disclosure prevention checks in DataSHIELD.
by.x.names: a character string or a vector of names specifying the column(s) in data frame x.name to use for merging.
by.y.names: a character string or a vector of names specifying the column(s) in data frame y.name to use for merging.
all.x: logical. If TRUE, extra rows are added to the output, one for each row in x.name that has no matching row in y.name. If FALSE, only rows with data from both data frames are included in the output. Default FALSE.
all.y: logical. If TRUE, extra rows are added to the output, one for each row in y.name that has no matching row in x.name. If FALSE, only rows with data from both data frames are included in the output. Default FALSE.
sort: logical. If TRUE, the merged result is sorted on the elements of the by.x.names and by.y.names columns. Default TRUE.
suffixes: a character vector of length 2 specifying the suffixes used to make column names unique when the same non-key column name appears in both input data frames and both columns appear in the merged data frame. Default c(".x", ".y").
no.dups: logical. If TRUE, suffixes are appended in more cases to avoid duplicated column names in the merged data frame. Default TRUE (FALSE before R version 3.5.0).
incomparables: values that cannot be matched. This is intended to be used for merging on one column, so these are incomparable values of that column. For more information, see match in the help for the native R merge function.
newobj: a character string that provides the name for the output variable that is stored on the data servers. Default merge.newobj.
datasources: a list of DSConnection-class objects obtained after login. If the datasources argument is not specified, the default set of connections will be used: see datashield.connections_default.
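The effect of all.x, all.y and suffixes can be previewed locally with the native R merge function, whose arguments ds.merge mirrors. The sketch below uses two small made-up data frames (x, y and their columns are illustrative, not part of DataSHIELD) and assumes that ds.merge applies the same semantics on the server side.

```r
# Illustrative local data frames; the names and values are made up.
x <- data.frame(id = c(1, 2, 3), value = c(10, 20, 30))
y <- data.frame(id = c(2, 3, 4), value = c(0.2, 0.3, 0.4))

# all.x = FALSE, all.y = FALSE: only ids present in both inputs are kept.
inner <- merge(x, y, by.x = "id", by.y = "id")

# all.x = TRUE: every row of x is kept; unmatched rows get NA in y's columns.
left <- merge(x, y, by.x = "id", by.y = "id", all.x = TRUE)

# all.x = TRUE and all.y = TRUE: every row of both inputs is kept.
full <- merge(x, y, by.x = "id", by.y = "id", all.x = TRUE, all.y = TRUE)

# Both inputs have a non-key column called "value", so the default
# suffixes ".x" and ".y" are appended to keep the column names unique.
names(full)

# Custom suffixes replace the defaults.
custom <- merge(x, y, by.x = "id", by.y = "id", suffixes = c(".lab", ".bmi"))
names(custom)
```

Here inner has two rows, left has three and full has four, which is the row-count pattern to expect from the corresponding all.x and all.y settings.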
ds.merge returns the merged data frame, which is written on the server-side. In addition, two validity messages are returned to the client-side, indicating whether the new object has been created in each data source and, if so, whether it is in a valid form.
This function is similar to the native R function merge. It differs from the native function in how the variables used to merge the data frames are chosen. The native merge function is very flexible: for example, you can choose to merge using all vectors that appear in both data frames. For ds.merge in DataSHIELD, however, all the vectors that dictate the merging must be explicitly identified for both data frames using the by.x.names and by.y.names arguments.
Server function called: mergeDS
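The difference in how merge columns are chosen can be seen with native R merge on local data. This is a minimal sketch with made-up data frames (x, y and their columns are illustrative); it shows the implicit behaviour that ds.merge does not offer next to the explicit style that ds.merge requires.

```r
# Illustrative local data frames sharing two column names: "id" and "site".
x <- data.frame(id = c(1, 2, 3), site = c("A", "A", "B"), a = c(10, 20, 30))
y <- data.frame(id = c(1, 2, 3), site = c("A", "B", "B"), b = c(0.1, 0.2, 0.3))

# Native merge with no 'by' arguments: all columns common to both data
# frames ("id" and "site") are used as keys.
implicit <- merge(x, y)

# Explicit keys, as ds.merge requires through by.x.names and by.y.names:
# only "id" is used, so "site" is treated as an ordinary column.
explicit <- merge(x, y, by.x = "id", by.y = "id")

nrow(implicit)
nrow(explicit)
names(explicit)
```

The implicit merge keeps only the two rows that agree on both id and site, while the explicit merge keeps all three ids and carries site through as site.x and site.y. Naming the keys explicitly removes any ambiguity about which of these results is intended.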
if (FALSE) { # \dontrun{
  ## Version 6, for version 5 see the Wiki

  # Connect to the Opal servers
  require('DSI')
  require('DSOpal')
  require('dsBaseClient')

  builder <- DSI::newDSLoginBuilder()
  builder$append(server = "study1",
                 url = "http://192.168.56.100:8080/",
                 user = "administrator", password = "datashield_test&",
                 table = "CNSIM.CNSIM1", driver = "OpalDriver")
  builder$append(server = "study2",
                 url = "http://192.168.56.100:8080/",
                 user = "administrator", password = "datashield_test&",
                 table = "CNSIM.CNSIM2", driver = "OpalDriver")
  builder$append(server = "study3",
                 url = "http://192.168.56.100:8080/",
                 user = "administrator", password = "datashield_test&",
                 table = "CNSIM.CNSIM3", driver = "OpalDriver")
  logindata <- builder$build()

  connections <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D")

  # Create two data frames with a common column
  ds.dataFrame(x = c("D$LAB_TSC", "D$LAB_TRIG", "D$LAB_HDL", "D$LAB_GLUC_ADJUSTED"),
               completeCases = TRUE,
               newobj = "df.x",
               datasources = connections)

  ds.dataFrame(x = c("D$LAB_TSC", "D$GENDER", "D$PM_BMI_CATEGORICAL", "D$PM_BMI_CONTINUOUS"),
               completeCases = TRUE,
               newobj = "df.y",
               datasources = connections)

  # Merge data frames using the common variable "LAB_TSC"
  ds.merge(x.name = "df.x",
           y.name = "df.y",
           by.x.names = "LAB_TSC",
           by.y.names = "LAB_TSC",
           all.x = TRUE,
           all.y = TRUE,
           sort = TRUE,
           suffixes = c(".x", ".y"),
           no.dups = TRUE,
           newobj = "df.merge",
           datasources = connections)

  # Clear the DataSHIELD R sessions and log out
  datashield.logout(connections)
} # }