ds.cov.Rd
This function calculates the covariance of two variables or the variance-covariance matrix for the variables of an input data frame.
ds.cov(
x = NULL,
y = NULL,
naAction = "pairwise.complete",
type = "split",
datasources = NULL
)
x: a character string providing the name of the input vector, data frame or matrix.
y: a character string providing the name of the input vector, data frame or matrix. Default NULL.
naAction: a character string giving a method for computing covariances in the presence of missing values. This must be set to 'casewise.complete' or 'pairwise.complete'. Default 'pairwise.complete'. For more information see details.
type: a character string that represents the type of analysis to carry out. This must be set to 'split' or 'combine'. Default 'split'. For more information see details.
datasources: a list of DSConnection-class objects obtained after login. If the datasources argument is not specified, the default set of connections will be used: see datashield.connections_default.
ds.cov returns a list containing the number of missing values in each variable, the number of missing values casewise or pairwise (depending on the argument naAction), the covariance matrix, the number of complete cases used, and an error message that indicates whether or not the input variables pass the disclosure controls.
The first disclosure control checks that the number of variables is not bigger than a percentage of the individual-level records (the allowed percentage is pre-specified by 'nfilter.glm').
The second disclosure control checks that no input variable is dichotomous with a level that has fewer counts than the pre-specified 'nfilter.tab' threshold.
If any of the input variables does not pass the disclosure controls, all the output values are replaced with NAs. If all the variables are valid and pass the controls, the output matrices are returned and the error message is set to NA.
In addition to computing covariances, this function produces a table of the number of complete cases and a table of the number of missing values. These tables let the user judge the 'relevance' of each covariance from the number of complete cases included in its calculation.
If the argument y is not NULL, the dimensions of the object must be compatible with those of the argument x.
If naAction is set to 'casewise.complete', the function omits every row of the whole data frame that contains at least one missing value before calculating the covariances.
If naAction is set to 'pairwise.complete' (default), the function splits the input data frame into subset data frames, one for each pair of variables (all combinations are considered), omits the rows with missing values in each pair separately, and then calculates the covariance of each pair.
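The difference between the two naAction settings can be illustrated locally with base R. The sketch below does not call ds.cov or any DataSHIELD server; it assumes that base R's cov() with use = "complete.obs" and use = "pairwise.complete.obs" is a reasonable analogue of 'casewise.complete' and 'pairwise.complete'. The data frame df and its values are made up for illustration.

```r
# Local illustration with base R only; no DataSHIELD connection is involved.
# 'df' is a made-up data frame with one missing value in each column.
df <- data.frame(
  a = c(1, 2, 3, 4, NA),
  b = c(2, 4, 6, NA, 10),
  c = c(5, 3, NA, 1, 0)
)

# Casewise analogue: drop every row that has at least one NA,
# then compute all covariances on the remaining rows.
casewise <- cov(df, use = "complete.obs")
n_casewise <- nrow(na.omit(df))

# Pairwise analogue: for each pair of columns, drop only the rows
# that have an NA in one of those two columns.
pairwise <- cov(df, use = "pairwise.complete.obs")

# Number of complete cases behind each pairwise covariance:
# entry [i, j] counts the rows where columns i and j are both observed.
present <- (!is.na(as.matrix(df))) * 1
n_pairwise <- crossprod(present)

print(casewise)
print(pairwise)
print(n_pairwise)
```

Here only two rows are complete across all three columns, while each pair of columns keeps three rows, so the two settings give different covariances for the same pair. This is why the complete-case counts are worth inspecting alongside the covariance matrix.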
If type is set to 'split' (default), the covariance of two variables or the variance-covariance matrix of an input data frame, together with the number of complete cases and missing values, is returned separately for each study.
If type is set to 'combine', the pooled covariance, the total number of complete cases and the total number of missing values, aggregated across all the studies involved, are returned.
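To give an intuition for 'combine', the sketch below shows one way a pooled covariance can be obtained from per-study aggregates (counts, sums and sums of products) without sharing individual rows. This is an assumption made for illustration: this page does not state the formula used by covDS, and the objects study1, study2 and summarise_study are hypothetical local stand-ins, not part of dsBaseClient.

```r
# Hypothetical local data standing in for two studies.
study1 <- data.frame(x = c(1, 2, 3, 4), y = c(2, 1, 4, 3))
study2 <- data.frame(x = c(5, 6, 7), y = c(6, 8, 7))

# Hypothetical helper: reduce one study to aggregate statistics
# computed on its complete cases only.
summarise_study <- function(d) {
  d <- d[complete.cases(d), ]
  c(n = nrow(d), sx = sum(d$x), sy = sum(d$y), sxy = sum(d$x * d$y))
}

# 'split'-style result: one covariance per study.
split_cov <- c(
  study1 = cov(study1$x, study1$y),
  study2 = cov(study2$x, study2$y)
)

# 'combine'-style result: add the aggregates across studies, then apply
# cov = (sum(xy) - sum(x) * sum(y) / n) / (n - 1).
totals <- summarise_study(study1) + summarise_study(study2)
pooled_cov <- (totals[["sxy"]] - totals[["sx"]] * totals[["sy"]] / totals[["n"]]) /
  (totals[["n"]] - 1)

print(split_cov)
print(pooled_cov)
```

Under this assumption the pooled value equals the covariance that would be computed on the stacked data, and it generally differs from any average of the per-study covariances because the study means differ.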
Server function called: covDS
if (FALSE) { # \dontrun{
## Version 6, for version 5 see the Wiki
# Connecting to the Opal servers
require('DSI')
require('DSOpal')
require('dsBaseClient')
builder <- DSI::newDSLoginBuilder()
builder$append(server = "study1",
               url = "http://192.168.56.100:8080/",
               user = "administrator", password = "datashield_test&",
               table = "CNSIM.CNSIM1", driver = "OpalDriver")
builder$append(server = "study2",
               url = "http://192.168.56.100:8080/",
               user = "administrator", password = "datashield_test&",
               table = "CNSIM.CNSIM2", driver = "OpalDriver")
builder$append(server = "study3",
               url = "http://192.168.56.100:8080/",
               user = "administrator", password = "datashield_test&",
               table = "CNSIM.CNSIM3", driver = "OpalDriver")
logindata <- builder$build()
# Log onto the remote Opal training servers
connections <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D")
# Assign the variables of interest to server-side objects
ds.assign(newobj = 'labhdl', toAssign = 'D$LAB_HDL', datasources = connections)
ds.assign(newobj = 'labtsc', toAssign = 'D$LAB_TSC', datasources = connections)
ds.assign(newobj = 'gender', toAssign = 'D$GENDER', datasources = connections)
# Calculate the pooled covariance between two vectors across all studies
ds.cov(x = 'labhdl',
       y = 'labtsc',
       naAction = 'pairwise.complete',
       type = 'combine',
       datasources = connections)
# Calculate the covariance using only the first Opal server ("study1")
ds.cov(x = 'labhdl',
       y = 'gender',
       naAction = 'pairwise.complete',
       type = 'combine',
       datasources = connections[1])
# Clear the DataSHIELD R sessions and log out
datashield.logout(connections)
} # }