The DataSHIELD Interface (DSI) defines a set of S4 classes and generic methods that can be implemented for accessing a data repository supporting the DataSHIELD infrastructure: controlled R commands executed on the server side guarantee that only non-disclosive information is returned to the client side.
Learn more about DataSHIELD.
Class Structure
The DSI classes are:
- DSObject: a common base class for all DSI classes,
- DSDriver: drives the creation of a connection object,
- DSConnection: allows interaction with the remote server; DataSHIELD operations such as aggregation and assignment return a result object; DataSHIELD setup status checks can be performed (dataset access, configuration comparison),
- DSResult: wraps access to the result, which can be fetched either synchronously or asynchronously depending on the capabilities of the data repository server,
- DSSession: represents the remote R session and is used to get its state when it is started.
All classes are virtual: they cannot be instantiated directly and instead must be subclassed. See DSOpal for a reference implementation of DSI based on the Opal data warehouse. See also DSLite for a server-less implementation of DSI for local datasets.
These S4 classes and generic methods are meant to be used for implementing connection to a DataSHIELD-aware data repository.
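As a minimal sketch of what "virtual" means here, the following uses only the base methods package. The names DemoDriver, DemoLocalDriver, DemoConnection and demoConnect are hypothetical stand-ins invented for illustration; they are not the DSI class or generic definitions.

```r
library(methods)

# Hypothetical stand-in for a virtual driver class (not the DSI definition).
setClass("DemoDriver", representation("VIRTUAL"))

# Hypothetical generic that a backend would have to implement.
setGeneric("demoConnect", function(drv, name, ...) standardGeneric("demoConnect"))

# A concrete backend subclasses the virtual class and implements the generic.
setClass("DemoLocalDriver", contains = "DemoDriver")
setClass("DemoConnection", representation(name = "character"))
setMethod("demoConnect", "DemoLocalDriver", function(drv, name, ...) {
  new("DemoConnection", name = name)
})

# Instantiating the virtual class directly raises an error; capture its message.
virtual_error <- tryCatch(new("DemoDriver"), error = function(e) conditionMessage(e))

# The concrete subclass can be instantiated and used through the generic.
conn <- demoConnect(new("DemoLocalDriver"), name = "server1")
```

An implementation such as DSOpal or DSLite follows the same pattern against the real DSI classes and generics.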
Higher Level Functions
In addition to these S4 classes, DSI provides functions to handle a list of remote data repository servers:
- datashield.login and datashield.logout make use of the DSDriver paradigm to create DSConnections to the data repositories,
- datashield.sessions ensures that the remote R sessions are up and running before any operation is performed in them,
- datashield.aggregate and datashield.assign perform typical DataSHIELD operations on DSConnections, whose results are fetched through DSResult objects,
- datashield.connections, datashield.connections_default and datashield.connections_find are functions for managing the list of DSConnection objects that will be discovered and used by the client-side analytic functions,
- datashield.errors reports the last R errors that may have occurred after a datashield.assign or datashield.aggregate call.
- Other data management functions are provided by the DSConnection objects:
  - datashield.workspaces, datashield.workspace_save, datashield.workspace_restore and datashield.workspace_rm manage R images of the remote DataSHIELD sessions (to speed up data analysis sessions),
  - datashield.symbols and datashield.symbol_rm offer minimal management of the R symbols living in the remote DataSHIELD sessions,
  - datashield.tables and datashield.table_status list the tables and their accessibility across a set of data repositories,
  - datashield.resources and datashield.resource_status list the resources and their accessibility across a set of data repositories,
  - datashield.pkg_status, datashield.method_status and datashield.methods are utility functions to explore the DataSHIELD setup across a set of data repositories,
  - datashield.profiles lists the DataSHIELD profiles that can be selected at login time.
These datashield.* functions are meant to be used by DataSHIELD package developers and users.
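The functions listed above can be combined into a session along the following lines. This is a sketch under assumptions, not a reference workflow: it assumes the DSI and DSOpal packages are installed, that an Opal server is reachable, and that the server exposes a dimDS aggregate method (as provided by dsBase). The wrapper name run_datashield_demo, the server name "server1" and the symbol "D" are illustrative choices.

```r
# Sketch only: calling this function needs DSI, DSOpal and a reachable server.
run_datashield_demo <- function(url, user, password, table) {
  # DSOpal is attached so that the "OpalDriver" class can be found at login.
  library(DSI)
  library(DSOpal)

  # Describe the server(s) to connect to.
  builder <- DSI::newDSLoginBuilder()
  builder$append(server = "server1", url = url, user = user,
                 password = password, table = table, driver = "OpalDriver")
  logindata <- builder$build()

  # Open the connections and assign the table to symbol D on the server side.
  conns <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D")
  on.exit(DSI::datashield.logout(conns), add = TRUE)

  # List the symbols that now exist in the remote R sessions.
  print(DSI::datashield.symbols(conns))

  # Assumption: dimDS is an allowed aggregate method on the server.
  # On failure, print the details reported by datashield.errors().
  tryCatch(
    DSI::datashield.aggregate(conns, expr = quote(dimDS("D"))),
    error = function(e) {
      print(DSI::datashield.errors())
      NULL
    }
  )
}
```

A call would look like run_datashield_demo("https://opal.example.org", "someuser", "somepassword", "PROJECT.TABLE"), where every argument value is a placeholder to be replaced with real server details.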
Options
Some options can be set to modify the behavior of the DSI:
- datashield.env: the R environment in which the list of DSConnection objects is looked up. Default value is the Global Environment: globalenv().
- datashield.progress: a logical that enables the display of progress bars. Default value is TRUE.
- datashield.progress.clear: a logical that makes the progress bar disappear after it has completed. Default value is FALSE.
- datashield.errors.stop: a logical that alters the error handling behavior: if TRUE, an error is raised when at least one server has failed; otherwise a warning message is issued. Default value is TRUE.
- datashield.errors.print: a logical that controls whether errors are printed in the console: if TRUE, the errors are automatically printed in rich text; otherwise a subsequent call to datashield.errors() is required to get the details of the errors. Default value is FALSE.
- datashield.polling.sleep.0: time in seconds to wait before checking completion of async calls, during the first ~1 second. Default value is 0.05 (50 milliseconds).
- datashield.polling.sleep.1: base time in seconds to wait before checking completion of async calls, after ~1 second. Default value is 1 second.
- datashield.polling.sleep.10: time in seconds to wait before checking completion of async calls, after ~10 seconds. Default value is 2 times the base time (2 seconds).
- datashield.polling.sleep.60: time in seconds to wait before checking completion of async calls, after ~1 minute. Default value is 10 times the base time (10 seconds).
- datashield.polling.sleep.600: time in seconds to wait before checking completion of async calls, after ~10 minutes. Default value is 60 times the base time (1 minute).
- datashield.polling.sleep.3600: time in seconds to wait before checking completion of async calls, after ~1 hour. Default value is 600 times the base time (10 minutes).
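These options are set with base R's options() function. The sketch below sets a few of them for the current session and then illustrates how the polling options could translate into a waiting time. The helper polling_sleep is hypothetical: it encodes one reading of the descriptions above and is not DSI's internal implementation.

```r
# Set a few DSI options for the current R session, remembering the old values.
old <- options(
  datashield.progress = FALSE,      # hide progress bars
  datashield.errors.stop = FALSE,   # warn instead of raising an error
  datashield.errors.print = TRUE    # print error details automatically
)

# Hypothetical helper: pick the polling interval (in seconds) for a call that
# has been running for `elapsed` seconds, falling back to the documented
# defaults when an option is not set.
polling_sleep <- function(elapsed) {
  base <- getOption("datashield.polling.sleep.1", 1)
  if (elapsed < 1) {
    getOption("datashield.polling.sleep.0", 0.05)
  } else if (elapsed < 10) {
    base
  } else if (elapsed < 60) {
    getOption("datashield.polling.sleep.10", 2 * base)
  } else if (elapsed < 600) {
    getOption("datashield.polling.sleep.60", 10 * base)
  } else if (elapsed < 3600) {
    getOption("datashield.polling.sleep.600", 60 * base)
  } else {
    getOption("datashield.polling.sleep.3600", 600 * base)
  }
}

# Intervals after 0.5 s, 5 s, 30 s, 2 min, 20 min and 2 h of waiting.
schedule <- vapply(c(0.5, 5, 30, 120, 1200, 7200), polling_sleep, numeric(1))

# Restore the previous option values.
options(old)
```

Under this reading, raising datashield.polling.sleep.1 scales every later interval that has not been set explicitly, because those defaults are expressed as multiples of the base time.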