This tutorial demonstrates the basics of session management.
To view the objects that currently exist in your remote session, use:
ds.ls()
After working with one or more data sources (e.g. creating or transforming variables), you can save these changes on the server.
For example, suppose you have created an extra variable called mpg:
ds.assign(toAssign = "data$mpg", newobj = "mpg", datasources = connections)
ds.ls()
We can save the remote workspace on each server so that the next time we log in we do not need to recreate this object:
datashield.workspace_save(connections, "my_remote_workspace")
Now disconnect from DataSHIELD:
datashield.logout(connections)
Now log in again, but this time specify the workspace you want to restore:
connections <- datashield.login(logindata, restore = "my_remote_workspace")
We can now check that the object we created earlier still exists:
ds.ls()
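If you repeat this save, log out and restore cycle often, it could be wrapped in a small function. The sketch below is a hypothetical helper (restart_with_workspace is not part of DSI or dsBaseClient); it assumes the DSI functions datashield.workspace_save(), datashield.logout() and datashield.login(restore = ...) and a logindata object like the one used for the original login.

```r
# Hypothetical helper, not part of DSI: save the remote workspace under `name`,
# end the current sessions, then start new sessions that restore that workspace.
restart_with_workspace <- function(conns, logindata, name) {
  DSI::datashield.workspace_save(conns, name)       # save the workspace on every server
  DSI::datashield.logout(conns)                      # close the current sessions
  DSI::datashield.login(logindata, restore = name)   # new sessions, workspace restored
}
```

Usage (requires live servers): connections <- restart_with_workspace(connections, logindata, "my_remote_workspace"). The function returns the new connections object, so it must be reassigned.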
Saving your local R workspace is not specific to DataSHIELD, but it is helpful to see how it complements the remote workspace. Imagine we have performed a DataSHIELD operation that returns some summary information, e.g. a mean:
my_mean <- ds.mean("data$variable")
We can now save our local workspace using:
save.image("my_local_workspace")
This will save the workspace on your computer in the current working directory. We can save this information locally because it does not disclose individual participant data. If you want to save it somewhere else, simply change the file path. To check what the current working directory is, you can use:
getwd()
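As a sketch of saving somewhere other than the working directory, the example below builds a full path with file.path() and passes it to save.image(). Assumptions: my_mean is a made-up placeholder value standing in for ds.mean() output, and tempdir() stands in for whatever folder you actually want to use.

```r
# Hypothetical placeholder for the result of ds.mean(); the value is made up.
my_mean <- 3.2

# Pick a different folder; tempdir() is used only so the example runs anywhere.
out_dir <- file.path(tempdir(), "datashield_results")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Build the full file path and save the whole global workspace there.
out_file <- file.path(out_dir, "my_local_workspace.RData")
save.image(file = out_file)
```

To restore it later, pass the same path to load().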
To confirm that the workspace was saved, clear it with the following command and then list what remains:
rm(list = ls())
ls()
You'll see that the workspace is now empty. We can restore the workspace we saved earlier:
load("my_local_workspace")
ls()
Notice that the object "my_mean" is now there again. Saving your local workspace alongside your remote workspace is good practice. This saves time because you can reuse DataSHIELD output (e.g. descriptive statistics, model output) rather than having to rerun DataSHIELD analyses every time.
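If you only want to keep the DataSHIELD output, base R's save() can store selected objects instead of the whole workspace, and load() can restore them into a separate environment so that existing objects are not overwritten. This is a minimal sketch; my_mean is again a made-up placeholder for ds.mean() output, and the file name is illustrative.

```r
# Hypothetical placeholder for the result of ds.mean(); the value is made up.
my_mean <- 3.2

# Save only the results you want to reuse, not everything in the workspace.
results_file <- file.path(tempdir(), "datashield_results.RData")
save(my_mean, file = results_file)

# Load into a fresh environment; load() returns the names of the restored objects.
restored <- new.env()
loaded_names <- load(results_file, envir = restored)
print(loaded_names)  # [1] "my_mean"
restored$my_mean
```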
When you are granted access to remote data, that access applies to specific tables. The data owner should tell you the table names, but it can be helpful to check which tables you can access. You can do this with the following command:
datashield.tables(connections)
The installation of remote packages is managed by the data owner. If you are working within a consortium, this should be managed so that all the packages you need are available. However, if you need to check what is installed on each remote server, you can run:
datashield.pkg_status(connections)
Profiles are collections of server-side packages. To check which profiles are installed on the servers, you can run:
datashield.profiles(connections)
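The three checks above could be gathered into one call when you start a session. The sketch below is a hypothetical helper (session_overview is not part of DSI); it assumes conns is the object returned by datashield.login() and simply collects the results of the DSI functions shown above into a named list.

```r
# Hypothetical helper, not part of DSI: collect the session checks above
# into a single named list for inspection.
session_overview <- function(conns) {
  list(
    tables   = DSI::datashield.tables(conns),      # tables you can access
    packages = DSI::datashield.pkg_status(conns),  # server-side packages
    profiles = DSI::datashield.profiles(conns)     # available profiles
  )
}
```

Usage (requires live servers): overview <- session_overview(connections), then str(overview) to inspect each part.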