This function identifies missing values and replaces them with a value or values specified by the analyst.

ds.replaceNA(x = NULL, forNA = NULL, newobj = NULL, datasources = NULL)

Arguments

x

a character string specifying the name of the vector.

forNA

a list or a vector that contains the replacement value(s) for each study. The length of the list or vector must be equal to the number of servers (studies).

newobj

a character string that provides the name for the output object that is stored on the data servers. The default is replacena.newobj.

datasources

a list of DSConnection-class objects obtained after login. If the datasources argument is not specified, the default set of connections will be used: see datashield.connections_default.

Value

ds.replaceNA creates on the server side a new vector or table structure with the missing values replaced by the specified values. The class of the new vector is the same as that of the initial vector.

Details

This function is used when the analyst prefers or requires complete vectors. It is possible to specify one replacement value for each missing value by first obtaining the number of missing values with the function ds.numNA. In most cases, however, it might be more sensible to replace all missing values with one specific value, e.g. the mean or median value of the vector. Once the missing values have been replaced, a new vector is created.
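
The two replacement strategies described above can be sketched locally in base R. This is only an illustration on a made-up vector, not the server-side code of replaceNaDS; the object names (x, x.mean, x.each) and the values are hypothetical.

```r
# Local base-R illustration only; the vector and its values are made up.
x <- c(1.2, NA, 0.8, NA, 1.6)

# Count the missing values (the role ds.numNA plays on the servers)
n.missing <- sum(is.na(x))

# Strategy 1: replace every missing value with one value, here the mean
# of the observed values
x.mean <- x
x.mean[is.na(x.mean)] <- mean(x, na.rm = TRUE)

# Strategy 2: supply one replacement per missing value; the number of
# replacements must match n.missing
x.each <- x
x.each[is.na(x.each)] <- c(1.0, 1.5)

# Both results are complete and keep the class of the original vector
anyNA(x.mean)
anyNA(x.each)
identical(class(x.mean), class(x))
```

In a DataSHIELD session the individual values are never visible to the analyst, so the replacement values have to be derived from non-disclosive summaries such as those returned by ds.mean.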

Note: If the vector is within a table structure such as a data frame, the new vector is appended to the table structure so that the table holds both the vector with and the vector without missing values.
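
A minimal local sketch of the behaviour described in this note, assuming a small made-up data frame: the original column is kept and the completed vector is added alongside it. The data frame, the column names and the replacement value 0 are hypothetical, and how the server names the appended column is not shown here.

```r
# Local base-R illustration only; data and names are made up.
df <- data.frame(id = 1:4, hdl = c(1.5, NA, 1.1, NA))

# Build the completed vector from the original column
hdl.noNA <- df$hdl
hdl.noNA[is.na(hdl.noNA)] <- 0

# Append it, so the table holds both the original and the completed vector
df$hdl.noNA <- hdl.noNA

colnames(df)
```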

Server function called: replaceNaDS

Author

DataSHIELD Development Team

Examples

if (FALSE) {

  ## Version 6, for version 5 see the Wiki
  # Connecting to the Opal servers

    require('DSI')
    require('DSOpal')
    require('dsBaseClient')

    builder <- DSI::newDSLoginBuilder()
    builder$append(server = "study1", 
                   url = "http://192.168.56.100:8080/", 
                   user = "administrator", password = "datashield_test&", 
                   table = "CNSIM.CNSIM1", driver = "OpalDriver")
    builder$append(server = "study2", 
                   url = "http://192.168.56.100:8080/", 
                   user = "administrator", password = "datashield_test&", 
                   table = "CNSIM.CNSIM2", driver = "OpalDriver")
    builder$append(server = "study3",
                   url = "http://192.168.56.100:8080/", 
                   user = "administrator", password = "datashield_test&", 
                   table = "CNSIM.CNSIM3", driver = "OpalDriver")
    logindata <- builder$build()

  # Log onto the remote Opal training servers
  connections <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D") 

  # Example 1: Replace missing values in variable 'LAB_HDL' by the mean value 
  # in each study
  
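  # Get the mean value of 'LAB_HDL' for each study
  # (stored as 'hdl.means' to avoid masking the base R function mean())
  hdl.means <- ds.mean(x = "D$LAB_HDL",
                       type = "split",
                       datasources = connections)

  # Replace the missing values using the mean for each study;
  # hdl.means[[1]][1], [2] and [3] are the means for study1, study2 and study3
  ds.replaceNA(x = "D$LAB_HDL",
               forNA = list(hdl.means[[1]][1], hdl.means[[1]][2], hdl.means[[1]][3]),
               newobj = "HDL.noNA",
               datasources = connections)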
               
  # Example 2: Replace missing values in categorical variable 'PM_BMI_CATEGORICAL'
  # with 999s
 
  # First check how many NAs there are in 'PM_BMI_CATEGORICAL' in each study
  ds.table(rvar = "D$PM_BMI_CATEGORICAL", 
          useNA = "always")   
          
  # Replace the missing values with 999s
  ds.replaceNA(x = "D$PM_BMI_CATEGORICAL", 
               forNA = c(999,999,999), 
               newobj = "bmi999")
               
  # Check if the NAs have been replaced correctly
  ds.table(rvar = "bmi999", 
          useNA = "always")   
 
  # Clear the DataSHIELD R sessions and log out
  DSI::datashield.logout(connections)
}