ds.mdPattern.Rd

This function is a client-side wrapper for the server-side mdPatternDS function. It generates a missing data pattern matrix similar to mice::md.pattern, with disclosure control applied so that small cell counts are not revealed.
ds.mdPattern(x = NULL, type = "split", datasources = NULL)

x: a character string specifying the name of a data frame or matrix on the server-side containing the data to analyze.
type: a character string specifying the output type. If 'split' (default), returns separate patterns for each study. If 'combine', attempts to pool patterns across studies.
datasources: a list of DSConnection-class objects obtained after login. If the datasources argument is not specified, the default set of connections will be used: see datashield.connections_default.
For type='split': A list with one element per study, each containing:
- The missing data pattern matrix for that study
- A logical indicating whether all patterns meet disclosure requirements
- A message describing the validity status
For type='combine': A list containing:
- The pooled missing data pattern matrix across all studies
- A logical indicating whether all pooled patterns meet disclosure requirements
- A message describing the validity status
The function calls the server-side mdPatternDS function which uses mice::md.pattern to analyze missing data patterns. Patterns with counts below the disclosure threshold (default: nfilter.tab = 3) are suppressed to maintain privacy.
Output Format:
- Each row represents a missing data pattern
- Pattern counts are shown in row names (e.g., "150", "25")
- Columns show 1 if the variable is observed, 0 if missing
- Last column shows the total number of missing values per pattern
- Last row shows the total number of missing values per variable
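As a rough illustration of the layout described above, the following sketch builds a toy matrix locally in base R and reads the counts back out of it. The matrix values and the variable names (var1 to var3) are made up for illustration and are not output from a real server.

```r
# Toy matrix in the layout described above (illustrative values only).
# Rows: one per pattern, plus a final summary row.
# Columns: one per variable, plus a final "missing per pattern" column.
pattern <- matrix(
  c( 1, 1, 1,  0,
     0, 1, 1,  1,
    25, 0, 0, 25),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("150", "25", ""), c("var1", "var2", "var3", ""))
)

n_rows <- nrow(pattern)
n_cols <- ncol(pattern)

# Pattern counts are stored in the row names of every row except the summary row.
pattern_counts <- as.integer(rownames(pattern)[-n_rows])

# Missing values per variable: last row, without the final total column.
missing_per_variable <- pattern[n_rows, -n_cols]

# Missing values per pattern: last column, without the summary row.
missing_per_pattern <- pattern[-n_rows, n_cols]

print(pattern_counts)        # 150 25
print(missing_per_variable)  # var1 = 25, var2 = 0, var3 = 0
```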
Disclosure Control:
Suppressed patterns (count below threshold) are indicated by:
- Row name: "suppressed(<N>)" where N is the threshold
- All pattern values set to NA
- Summary row also suppressed to prevent back-calculation
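The suppression rules above can be sketched locally in base R. This is only an illustration, not the server-side implementation: suppress_small_patterns is a hypothetical helper, the threshold of 3 mirrors the documented default, and the literal row label "suppressed(<3)" is an assumption about how "suppressed(<N>)" is rendered.

```r
# Hypothetical helper (not part of dsBase or dsBaseClient): applies the
# suppression rules described above to an unsuppressed pattern matrix.
suppress_small_patterns <- function(pattern, threshold = 3) {
  n_rows <- nrow(pattern)
  # Counts are read from the row names of all rows except the summary row.
  counts <- as.integer(rownames(pattern)[-n_rows])
  small <- which(counts < threshold)
  if (length(small) > 0) {
    pattern[small, ] <- NA    # hide the pattern values
    # Assumed label format: "suppressed(<N)" with N the threshold.
    rownames(pattern)[small] <- paste0("suppressed(<", threshold, ")")
    pattern[n_rows, ] <- NA   # hide the summary row to block back-calculation
  }
  pattern
}

# Toy input: 150 complete rows and 2 rows missing var1 (illustrative values).
toy <- matrix(
  c(1, 1, 1, 0,
    0, 1, 1, 1,
    2, 0, 0, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("150", "2", ""), c("var1", "var2", "var3", ""))
)

suppressed <- suppress_small_patterns(toy)
print(suppressed)
```

In this toy case the pattern with count 2 falls below the threshold, so its row and the summary row become NA, while the row for the 150 complete observations is left unchanged.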
Pooling Behavior (type='combine'):
When pooling across studies, the function uses a conservative approach for disclosure control:
1. Identifies identical missing patterns across studies
2. Excludes suppressed patterns from pooling: a pattern suppressed in any study contributes nothing from that study to the pooled count
3. Sums counts only for non-suppressed identical patterns
4. Re-validates pooled counts against the disclosure threshold
Important: This conservative approach means:
- Pooled counts may be underestimates if some studies had suppressed patterns
- This prevents disclosure through subtraction (e.g., if study A shows count=5 and pool shows count=7, one could deduce study B has count=2, violating disclosure)
- Different patterns across studies are preserved separately in the pooled result
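The pooling steps above can be approximated locally with the following base R sketch. It is an illustration under stated assumptions, not the actual client-side code: pool_pattern_counts is a hypothetical helper, suppressed rows are assumed to have row names starting with "suppressed", and the study matrices are invented toy data.

```r
# Hypothetical helper (not part of dsBaseClient): pools pattern counts
# across per-study matrices, skipping any suppressed rows.
pool_pattern_counts <- function(study_patterns, threshold = 3) {
  pooled <- integer(0)
  for (pattern in study_patterns) {
    # Drop the summary row and the "missing per pattern" column.
    body <- pattern[-nrow(pattern), -ncol(pattern), drop = FALSE]
    # Assumption: suppressed rows are labelled "suppressed(...)".
    keep <- !grepl("^suppressed", rownames(body))
    for (i in which(keep)) {
      key <- paste(body[i, ], collapse = "")   # e.g. "011"
      count <- as.integer(rownames(body)[i])
      pooled[key] <- if (key %in% names(pooled)) pooled[[key]] + count else count
    }
  }
  # Re-validate the pooled counts against the threshold.
  list(counts = pooled, valid = all(pooled >= threshold))
}

cols <- c("var1", "var2", "var3", "")

# study1: the "var1 missing" pattern had count 2 and was suppressed.
study1 <- matrix(c(1, 1, 1, 0, rep(NA, 8)), nrow = 3, byrow = TRUE,
                 dimnames = list(c("150", "suppressed(<3)", ""), cols))
# study2: the same pattern has count 5 and is shown.
study2 <- matrix(c(1, 1, 1, 0,  0, 1, 1, 1,  5, 0, 0, 5), nrow = 3, byrow = TRUE,
                 dimnames = list(c("100", "5", ""), cols))

pooled <- pool_pattern_counts(list(study1 = study1, study2 = study2))
print(pooled$counts)  # "111" = 250, "011" = 5
```

Here the pooled count for the "var1 missing" pattern is 5 rather than the true 7, which is the underestimate the conservative approach accepts in exchange for blocking disclosure by subtraction.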
if (FALSE) { # \dontrun{
## Version 6, for version 5 see the Wiki
# Connecting to the Opal servers
require('DSI')
require('DSOpal')
require('dsBaseClient')
builder <- DSI::newDSLoginBuilder()
builder$append(server = "study1",
               url = "http://192.168.56.100:8080/",
               user = "administrator", password = "datashield_test&",
               table = "CNSIM.CNSIM1", driver = "OpalDriver")
builder$append(server = "study2",
               url = "http://192.168.56.100:8080/",
               user = "administrator", password = "datashield_test&",
               table = "CNSIM.CNSIM2", driver = "OpalDriver")
logindata <- builder$build()
connections <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D")
# Get missing data patterns for each study separately
patterns_split <- ds.mdPattern(x = "D", type = "split", datasources = connections)
# View results for study1
print(patterns_split$study1$pattern)
#     var1 var2 var3
# 150    1    1    1   0  <- 150 obs complete
# 25     0    1    1   1  <- 25 obs missing var1
#       25    0    0  25  <- Summary: missing values per variable (25 in var1, 25 in total)
# Get pooled missing data patterns across studies
patterns_pooled <- ds.mdPattern(x = "D", type = "combine", datasources = connections)
print(patterns_pooled$pattern)
# Example with suppressed patterns:
# If study1 has a pattern with count=2 (suppressed) and study2 has same pattern
# with count=5 (valid), the pooled result will show count=5 (conservative approach)
# A warning will indicate: "Pooled counts may underestimate the true total"
# Clear the DataSHIELD R sessions and log out
datashield.logout(connections)
} # }