This page is designed for researchers new to DataSHIELD. It gives a high-level overview of the key steps involved, from the very beginning to a finished analysis. Each section links to more detailed explanations where necessary. The diagrams below depict the key stages in using DataSHIELD:
Most data sources are not fully open-access, so you will need to request access to the data. There may be a fee, and the data owner may require you to complete a Data Access Agreement. Each data owner has their own procedure, so you will need to contact each one individually to enquire.
Once you have permission to access the data, you will need to request DataSHIELD login credentials from each data owner. They should provide you with:
There are two main options for where you run your analysis. Some consortia use a Central Analysis Server (CAS), which provides an RStudio environment with all required packages pre-installed. If you are using a CAS, you will need to request login details for it from the administrator.
If you are not using a CAS, you will need to install RStudio and DataSHIELD packages on your local computer.
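For a local setup, the installation might look like the sketch below. The package names (`DSI`, `DSOpal`, `dsBaseClient`) and the repository URL are assumptions based on a typical Opal-backed DataSHIELD deployment; this page does not specify them, so check with your consortium which packages, versions and repository to use.

```r
# Assumed client packages for an Opal-backed DataSHIELD setup.
# DSI is the DataSHIELD client interface; DSOpal is the driver for Opal servers.
install.packages(c("DSI", "DSOpal"))

# dsBaseClient is typically served from a DataSHIELD package repository
# rather than CRAN. The URL below is an assumption and may differ for you.
install.packages(
  "dsBaseClient",
  repos = c(getOption("repos"), "http://cran.datashield.org"),
  dependencies = TRUE
)

# Load the packages to confirm that the installation succeeded.
library(DSI)
library(DSOpal)
library(dsBaseClient)
```

The client package versions generally need to be compatible with the server-side packages at each data owner's site, so it is worth confirming the required versions before installing.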
Open RStudio (either locally or on the CAS) and follow these steps.
Once you have logged in, you need to load the specific tables or resources that the data owner has given you access to. You do this by 'assigning' the remote data to an R session on that server. This page explains the process in greater detail and gives examples of the R code.
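As a minimal sketch of logging in and assigning a table, assuming an Opal server accessed through the `DSI` and `DSOpal` packages: the server label, URL, credentials and table name below are all placeholders, to be replaced with the details supplied by the data owner.

```r
library(DSI)
library(DSOpal)

# Build the login details for one server. Every value here is a placeholder.
builder <- DSI::newDSLoginBuilder()
builder$append(
  server   = "study1",                    # hypothetical label for this server
  url      = "https://opal.example.org",  # placeholder server URL
  user     = "my_username",               # placeholder credentials
  password = "my_password",
  driver   = "OpalDriver"                 # assumes an Opal-backed server
)
logindata <- builder$build()

# Log in without assigning any data yet.
conns <- DSI::datashield.login(logins = logindata, assign = FALSE)

# Assign a remote table to the symbol "D" in the server-side R session.
# "PROJECT.TABLE" is a placeholder for a table you have been granted.
DSI::datashield.assign.table(conns, symbol = "D", table = "PROJECT.TABLE")

# List the objects that now exist in the server-side session.
DSI::datashield.symbols(conns)

# Log out when the analysis is finished.
DSI::datashield.logout(conns)
```

To connect to several data owners at once, call `builder$append()` once per server before `builder$build()`. Avoid storing real passwords in scripts that you share.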
DataSHIELD offers many options for data analysis. A separate section of the wiki is dedicated to analysis tutorials for researchers using DataSHIELD.
Once summary statistics have been returned by DataSHIELD, they can be manipulated using any function or package within R. They can also be saved in your local RStudio workspace and exported as images or tables. This tutorial explains the difference between server-side and client-side workspaces, and here you can learn about exporting results.
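The client-side handling described above can be sketched in base R. The data frame below is a hypothetical stand-in for summary statistics already returned by DataSHIELD: its values, column names and the output file names are illustrative assumptions, not real results.

```r
# Hypothetical summary statistics standing in for values returned by DataSHIELD.
results <- data.frame(
  study = c("study1", "study2"),
  mean  = c(1.5, 1.7),
  n     = c(200, 300)
)

# Returned statistics are ordinary R objects, so any client-side
# function can be applied to them, e.g. a sample-size-weighted mean.
pooled_mean <- sum(results$mean * results$n) / sum(results$n)

# Export the table as a CSV file (written to a temporary directory here).
out_dir <- tempdir()
write.csv(results, file.path(out_dir, "summary_stats.csv"), row.names = FALSE)

# Export a simple bar chart of the study means as a PNG image.
png(file.path(out_dir, "study_means.png"))
barplot(results$mean, names.arg = results$study, ylab = "Mean")
dev.off()
```

Only non-disclosive summary statistics reach the client, so everything in this sketch runs locally and does not touch the remote data.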