Analysis within DataSHIELD is conducted using R, a statistical programming language. DataSHIELD analysis is similar to standard analysis using R, but there are some important differences, which this article explains.
When you conduct analysis in R, you can save objects in an environment (also referred to as a workspace). 'Objects' refers to any type of data structure, e.g. strings, vectors, matrices or data frames. For a given project, the environment will contain all objects you have loaded or created. You can see the contents in the 'Environment' tab in RStudio, or by running ls(). Depending on your settings, your environment is saved when you exit RStudio and reloaded when you restart.
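As a minimal sketch of this in standard R, the following creates a few objects of different types and lists them. The object names and values are made up purely for illustration.

```r
# Create a few objects of different types in the current environment.
# All names and values here are illustrative placeholders.
study_name <- "example"                      # a string
ages <- c(34, 51, 27)                        # a numeric vector
scores <- matrix(1:6, nrow = 2)              # a matrix
people <- data.frame(id = 1:3, age = ages)   # a data frame

# List the names of all objects in the current environment.
ls()
```

The names returned by ls() are the same ones shown in the RStudio 'Environment' tab.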
In DataSHIELD, you have two environments. The first is the server-side environment. This environment contains all of the individual-level data to which you have remote access. Unlike in standard R, you can't directly view the objects in this environment, because doing so would let you see individual-level data and so undermine the purpose of DataSHIELD. Instead, you can view limited, non-disclosive information about the objects in this environment. For example, ds.ls() will list all of the objects present. Other tutorials in this wiki explain which information you can return and how.
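The sketch below shows roughly what inspecting the server-side environment might look like. It assumes the DSI, DSOpal and dsBaseClient packages are installed and that your servers are Opal-based; the helper names connect_to_server and describe_server_side are hypothetical, and the server name, URL, credentials and table must come from your own data access agreement. The code only defines the helpers, so nothing is contacted until you call them.

```r
# Hypothetical helper: log in to a single DataSHIELD server and assign
# the requested table to a server-side object named "D".
# Assumes DSOpal has been loaded so that the "OpalDriver" is available.
connect_to_server <- function(server, url, user, password, table) {
  builder <- DSI::newDSLoginBuilder()
  builder$append(server = server, url = url, user = user,
                 password = password, table = table,
                 driver = "OpalDriver")
  DSI::datashield.login(logins = builder$build(),
                        assign = TRUE, symbol = "D")
}

# Hypothetical helper: return non-disclosive information about the
# server-side environment. Only object names and dimensions come back,
# never individual-level rows.
describe_server_side <- function(conns) {
  list(
    objects = dsBaseClient::ds.ls(datasources = conns),
    dimensions = dsBaseClient::ds.dim(x = "D", datasources = conns)
  )
}
```

With a connection object returned by connect_to_server(), calling describe_server_side(conns) would list the object names on each server and the dimensions of "D".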
The second environment you have access to is the local environment. Unlike in standard analysis with R, this environment won't contain individual-level data. Instead, it will contain summary data, such as descriptive statistics and output from models.
When conducting analysis with DataSHIELD, it is important to regularly save both workspaces. Saving the server-side workspace preserves any manipulations you have made (e.g. recoding variables). Saving the local workspace allows you to reuse statistics returned by DataSHIELD without re-running the analysis. How to save these workspaces is explained in the tutorial on session management.
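As a rough illustration of the two kinds of saving, the sketch below wraps the server-side save in a hypothetical helper, save_server_workspace, and stores a local result with base R. It assumes the DSI package is installed and that conns is an existing connection object. The contents of local_result are a made-up placeholder, not real output. The session management tutorial remains the reference for the recommended workflow.

```r
# Hypothetical helper: save the server-side workspace under a chosen name
# so that manipulations (e.g. recoded variables) can be restored later.
# Requires an open connection object; nothing runs until it is called.
save_server_workspace <- function(conns, name) {
  DSI::datashield.workspace_save(conns, name)
}

# Local side: a placeholder standing in for summary statistics
# returned by DataSHIELD (values are invented for illustration).
local_result <- list(variable = "age", mean = 41.2, n = 120)

# Save the local object to disk and read it back without re-running analysis.
result_file <- file.path(tempdir(), "local_result.rds")
saveRDS(local_result, result_file)
restored <- readRDS(result_file)
```

Saving individual results with saveRDS() is one option. Base R's save.image() would instead store the whole local environment in a single file.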