Generates a valid subset of a table or a vector

The function uses classical R subsetting with square brackets '[]' and also allows subsetting with a logical operator and a threshold. The object to subset must be a vector (factor, numeric or character) or a table (data.frame or matrix).
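
As a local illustration of the two subsetting styles described above, the following base R sketch works on an ordinary in-memory data frame rather than on a DataSHIELD server. The object 'df' and its columns are hypothetical, and the code only mirrors the semantics; it is not necessarily what ds.subset runs on the server side.

```r
# Hypothetical local data; on a real server the table would stay remote.
df <- data.frame(id  = 1:6,
                 bmi = c(21.5, 27.0, 30.2, 19.8, 25.0, 33.1),
                 sex = factor(c("f", "m", "m", "f", "f", "m")))

# Classical square-bracket subsetting: rows by index, columns by name.
by_index <- df[c(1, 3, 5), c("id", "bmi")]

# Subsetting a vector with a logical operator and a threshold.
bmi <- df$bmi
by_threshold <- bmi[bmi >= 25]

print(by_index)
print(by_threshold)
```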

ds.subset(
  x = NULL,
  subset = "subsetObject",
  completeCases = FALSE,
  rows = NULL,
  cols = NULL,
  logicalOperator = NULL,
  threshold = NULL,
  datasources = NULL
)

Arguments

x: a character, the name of the dataframe or the factor vector and the range of the subset.
subset: the name of the output object, a list that holds the subset object. If set to NULL the default name of this list is 'subsetObject'
completeCases: a boolean that tells whether only complete cases should be included.
rows: a vector of integers, the indices of the rows to extract.
cols: a vector of integers or a vector of characters; the indices of the columns to extract or their names.
logicalOperator: a character, the logical operator (for example '>=') to use if the user wishes to subset a vector using a logical operator. This parameter is ignored if the input data is not a vector.
threshold: a numeric, the threshold to use in conjunction with the logical operator. This parameter is ignored if the input data is not a vector.
datasources: a list of DSConnection-class objects obtained after login. If the <datasources> argument is not specified, the default set of connections will be used: see datashield.connections_default.
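
To make the meaning of 'complete cases' concrete, here is a minimal base R sketch on a hypothetical local table 'df'. It assumes that a complete case is a row with no missing value in any column; whether ds.subset applies this filter before or after the row and column selection is not stated here.

```r
# Hypothetical local table with missing values.
df <- data.frame(a = c(1, NA, 3, 4),
                 b = c("x", "y", NA, "z"))

# Keep only the rows that have no missing value in any column.
complete_only <- df[complete.cases(df), ]
print(complete_only)
```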

Value

No data are returned to the user; the generated subset dataframe is stored on the server side.

Details

(1) If the input data is a table, the user specifies the rows and/or columns to include in the subset; the columns can be referred to by their names. Table subsetting can also be done using the name of a variable and a threshold (see example 3).
(2) If the input data is a vector and the parameters 'rows', 'logicalOperator' and 'threshold' are all provided, the last two are ignored (i.e. 'rows' takes precedence over the other two parameters).
IMPORTANT NOTE: If the requested subset is not valid (i.e. it contains fewer than the allowed number of observations), all the values are turned into missing values (NA). Hence an invalid subset is indicated by the fact that all values within it are set to NA.
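
The precedence rule and the NA masking described above can be imitated locally. This is a sketch under stated assumptions, not the server-side implementation: 'subset_vector' is a hypothetical helper, and 'min_obs' stands in for the minimum allowed number of observations, whose real value is configured on the server.

```r
# Hypothetical helper: local imitation of the rules described above.
# 'min_obs' is an assumed stand-in for the server-side minimum count.
subset_vector <- function(v, rows = NULL, op = NULL, threshold = NULL,
                          min_obs = 3) {
  if (!is.null(rows)) {
    # 'rows' takes precedence: operator and threshold are ignored.
    out <- v[rows]
  } else if (!is.null(op) && !is.null(threshold)) {
    # Look up the comparison function by its name, e.g. ">=".
    keep <- match.fun(op)(v, threshold)
    out <- v[which(keep)]
  } else {
    out <- v
  }
  # An invalid (too small) subset has every value replaced by NA.
  if (length(out) < min_obs) {
    out[] <- NA
  }
  out
}

v <- c(12, 18, 25, 31, 40)
subset_vector(v, op = ">=", threshold = 25)              # 25 31 40
subset_vector(v, rows = 1:4, op = ">=", threshold = 25)  # 12 18 25 31
subset_vector(v, op = ">", threshold = 35)               # NA
```

In the last call only one value passes the threshold, so under the assumed minimum of 3 the result is masked as NA rather than returned.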

Author

Gaye, A.

Examples