This tutorial explains how to produce summary statistics (e.g., means, counts) using the core DataSHIELD functionality and additional functions from the dsHelper package.
library(dsBaseClient)
library(dsHelper)
Two important functions allow you to view the dimensions and column names of a data frame.
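The calls for this step are not shown here. A minimal sketch, assuming the functions meant are `ds.dim` and `ds.colnames` from dsBaseClient and that you are already logged in to servers holding a data frame called `iris`, could look like this. The wrapper name `describe_df` is hypothetical and only exists to keep the two calls together.

```r
# Hypothetical helper: returns the dimensions and column names of a
# server-side data frame. Assumes dsBaseClient is installed and that an
# active DataSHIELD login exists (datasources = NULL uses the default
# connections found in the session).
describe_df <- function(df = "iris", datasources = NULL) {
  list(
    dim = dsBaseClient::ds.dim(x = df, datasources = datasources),
    colnames = dsBaseClient::ds.colnames(x = df, datasources = datasources)
  )
}
```

With an active login, `describe_df("iris")` would return one entry for the dimensions and one for the column names.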
We can view the mean and variance of a variable using:
ds.mean(x = "iris$Sepal.Length")
## $Mean.by.Study
## EstimatedMean Nmissing Nvalid Ntotal
## server1 5.843333 0 150 150
## server2 5.843333 0 150 150
##
## $Nstudies
## [1] 2
##
## $ValidityMessage
## ValidityMessage
## server1 "VALID ANALYSIS"
## server2 "VALID ANALYSIS"
ds.var(x = "iris$Sepal.Length")
## $Variance.by.Study
## EstimatedVar Nmissing Nvalid Ntotal
## server1 0.6856935 0 150 150
## server2 0.6856935 0 150 150
##
## $Nstudies
## [1] 2
##
## $ValidityMessage
## ValidityMessage
## server1 "VALID ANALYSIS"
## server2 "VALID ANALYSIS"
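Because each server returns its estimate together with the number of valid observations, a pooled mean can be worked out locally as a weighted average. The sketch below is illustrative only: it re-creates the `$Mean.by.Study` matrix by hand from the output above instead of calling the servers, and the object names are hypothetical.

```r
# Mock of the $Mean.by.Study element, copied from the output above.
mean_by_study <- matrix(
  c(5.843333, 0, 150, 150,
    5.843333, 0, 150, 150),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("server1", "server2"),
    c("EstimatedMean", "Nmissing", "Nvalid", "Ntotal")
  )
)

# Pooled mean: per-study means weighted by the number of valid observations.
pooled_mean <- sum(mean_by_study[, "EstimatedMean"] * mean_by_study[, "Nvalid"]) /
  sum(mean_by_study[, "Nvalid"])

# Per-study standard deviation, derived from the variance reported by ds.var.
sd_by_study <- sqrt(c(server1 = 0.6856935, server2 = 0.6856935))

pooled_mean
sd_by_study
```

Here both servers hold the same data, so the pooled mean equals the per-study mean; with unequal studies the weighting matters.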
Summaries of categorical variables can be retrieved using the ds.table function:
ds.table("iris$Species")
##
## Data in all studies were valid
##
## Study 1 : No errors reported from this study
## Study 2 : No errors reported from this study
## $output.list
## $output.list$TABLE_rvar.by.study_row.props
## study
## iris$Species server1 server2
## setosa 0.5 0.5
## versicolor 0.5 0.5
## virginica 0.5 0.5
## NA NaN NaN
##
## $output.list$TABLE_rvar.by.study_col.props
## study
## iris$Species server1 server2
## setosa 0.3333333 0.3333333
## versicolor 0.3333333 0.3333333
## virginica 0.3333333 0.3333333
## NA 0.0000000 0.0000000
##
## $output.list$TABLE_rvar.by.study_counts
## study
## iris$Species server1 server2
## setosa 50 50
## versicolor 50 50
## virginica 50 50
## NA 0 0
##
## $output.list$TABLES.COMBINED_all.sources_proportions
## iris$Species
## setosa versicolor virginica NA
## 0.333 0.333 0.333 0.000
##
## $output.list$TABLES.COMBINED_all.sources_counts
## iris$Species
## setosa versicolor virginica NA
## 100 100 100 0
##
##
## $validity.message
## [1] "Data in all studies were valid"
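The row and column proportions can look confusing at first: row proportions give each study's share of a category, while column proportions give each category's share within a study. As a rough illustration, the sketch below rebuilds the counts table by hand from the output above (no server call is made, and the object names are hypothetical) and derives the other tables with base R.

```r
# Mock of TABLE_rvar.by.study_counts, copied from the output above.
counts <- matrix(
  c(50, 50, 50, 0,   # server1
    50, 50, 50, 0),  # server2
  ncol = 2,
  dimnames = list(
    c("setosa", "versicolor", "virginica", "NA"),
    c("server1", "server2")
  )
)

# Row proportions: share of each category contributed by each study.
# The NA row is 0 / 0, which R reports as NaN.
row_props <- prop.table(counts, margin = 1)

# Column proportions: distribution of categories within each study.
col_props <- prop.table(counts, margin = 2)

# Combined counts across studies.
combined_counts <- rowSums(counts)

row_props
col_props
combined_counts
```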
The function ds.summary is analogous to base summary() and returns concise summary statistics based on the variable type:
ds.summary("iris$Sepal.Length")
## $server1
## $server1$class
## [1] "numeric"
##
## $server1$length
## [1] 150
##
## $server1$`quantiles & mean`
## 5% 10% 25% 50% 75% 90% 95% Mean
## 4.600000 4.800000 5.100000 5.800000 6.400000 6.900000 7.255000 5.843333
##
##
## $server2
## $server2$class
## [1] "numeric"
##
## $server2$length
## [1] 150
##
## $server2$`quantiles & mean`
## 5% 10% 25% 50% 75% 90% 95% Mean
## 4.600000 4.800000 5.100000 5.800000 6.400000 6.900000 7.255000 5.843333
ds.summary("iris$Species")
## $server1
## $server1$class
## [1] "factor"
##
## $server1$length
## [1] 150
##
## $server1$categories
## [1] "setosa" "versicolor" "virginica"
##
## $server1$`count of 'setosa'`
## [1] 50
##
## $server1$`count of 'versicolor'`
## [1] 50
##
## $server1$`count of 'virginica'`
## [1] 50
##
##
## $server2
## $server2$class
## [1] "factor"
##
## $server2$length
## [1] 150
##
## $server2$categories
## [1] "setosa" "versicolor" "virginica"
##
## $server2$`count of 'setosa'`
## [1] 50
##
## $server2$`count of 'versicolor'`
## [1] 50
##
## $server2$`count of 'virginica'`
## [1] 50
Whilst core DataSHIELD functions return all the information you need, they can require many lines of code, and the output can be messy. To help with this, the dsHelper package lets you perform common operations in a more streamlined way and returns neater results.
For example, using dsHelper you can summarise multiple variables within a data frame:
dh.getStats(
  df = "iris",
  vars = c("Sepal.Length", "Sepal.Width", "Species")
)
## $categorical
## # A tibble: 12 × 10
## variable cohort category value cohort_n valid_n missing_n perc_valid perc_missing perc_total
## <chr> <chr> <fct> <int> <int> <int> <int> <dbl> <dbl> <dbl>
## 1 Species combined setosa 100 300 300 0 33.3 0 33.3
## 2 Species combined versicolor 100 300 300 0 33.3 0 33.3
## 3 Species combined virginica 100 300 300 0 33.3 0 33.3
## 4 Species combined <NA> 0 300 NA NA NA NA 0
## 5 Species server1 setosa 50 150 150 0 33.3 0 33.3
## 6 Species server1 versicolor 50 150 150 0 33.3 0 33.3
## 7 Species server1 virginica 50 150 150 0 33.3 0 33.3
## 8 Species server1 <NA> 0 150 NA NA NA NA 0
## 9 Species server2 setosa 50 150 150 0 33.3 0 33.3
## 10 Species server2 versicolor 50 150 150 0 33.3 0 33.3
## 11 Species server2 virginica 50 150 150 0 33.3 0 33.3
## 12 Species server2 <NA> 0 150 NA NA NA NA 0
##
## $continuous
## # A tibble: 6 × 15
## variable cohort mean std.dev perc_5 perc_10 perc_25 perc_50 perc_75 perc_90 perc_95 valid_n cohort_n missing_n
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Sepal.Length server1 5.84 0.83 4.6 4.8 5.1 5.8 6.4 6.9 7.25 150 150 0
## 2 Sepal.Length server2 5.84 0.83 4.6 4.8 5.1 5.8 6.4 6.9 7.25 150 150 0
## 3 Sepal.Width server1 3.06 0.44 2.34 2.5 2.8 3 3.3 3.61 3.8 150 150 0
## 4 Sepal.Width server2 3.06 0.44 2.34 2.5 2.8 3 3.3 3.61 3.8 150 150 0
## 5 Sepal.Length combin… 5.84 0.83 4.6 4.8 5.1 5.8 6.4 6.9 7.25 300 300 0
## 6 Sepal.Width combin… 3.06 0.44 2.34 2.5 2.8 3 3.3 3.61 3.8 300 300 0
## # ℹ 1 more variable: missing_perc <dbl>
This returns a list of two tibbles, separated into continuous and categorical information.
For categorical variables, information is returned on counts, percentages, and missingness within each category.
For continuous variables, information is returned on mean, standard deviation, quantiles, and missingness.
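To make the percentage columns concrete, the arithmetic below reproduces the values in the first row of the categorical tibble (Species, combined, setosa). The formulas are inferred from the output shown above rather than taken from the dsHelper source, so treat them as an assumption.

```r
# Values copied from row 1 of the $categorical tibble above.
value     <- 100  # observations in this category
valid_n   <- 300  # non-missing observations
cohort_n  <- 300  # all observations in the cohort
missing_n <- 0    # missing observations

# Assumed definitions, consistent with the printed output.
perc_valid   <- round(value / valid_n * 100, 1)
perc_missing <- round(missing_n / cohort_n * 100, 1)
perc_total   <- round(value / cohort_n * 100, 1)

c(perc_valid = perc_valid, perc_missing = perc_missing, perc_total = perc_total)
```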
An important point in DataSHIELD is that these results can be assigned to an object in your local R session, because they are aggregate summaries that do not disclose individual-level data. Saving the results to local R objects lets us reuse them (e.g., to make tables and graphs for publications).
my_stats <- dh.getStats(
  df = "iris",
  vars = c("Sepal.Length", "Sepal.Width", "Species")
)
## [-------------------------------------------------------------------------------------] 0% / 0s Checking server1 (varDS(iris$Sepal.Length)) [------------------------------------------] 0% / 0s Getting aggregate server1 (varDS(iris$Sepal.Length)) [===============>-----------------] 50% / 0s Aggregated (...) [=====================================================================] 100% / 0s
## [-------------------------------------------------------------------------------------] 0% / 0s Checking server2 (varDS(iris$Sepal.Length)) [------------------------------------------] 0% / 0s Getting aggregate server2 (varDS(iris$Sepal.Length)) [===============>-----------------] 50% / 0s Aggregated (...) [=====================================================================] 100% / 0s
## [-------------------------------------------------------------------------------------] 0% / 0s Checking server1 (varDS(iris$Sepal.Width)) [-------------------------------------------] 0% / 0s Getting aggregate server1 (varDS(iris$Sepal.Width)) [================>-----------------] 50% / 0s Aggregated (...) [=====================================================================] 100% / 0s
## [-------------------------------------------------------------------------------------] 0% / 0s Checking server2 (varDS(iris$Sepal.Width)) [-------------------------------------------] 0% / 0s Getting aggregate server2 (varDS(iris$Sepal.Width)) [================>-----------------] 50% / 0s Aggregated (...) [=====================================================================] 100% / 0s
my_stats
## $categorical
## # A tibble: 12 × 10
##    variable cohort   category   value cohort_n valid_n missing_n perc_valid perc_missing perc_total
##    <chr>    <chr>    <fct>      <int>    <int>   <int>     <int>      <dbl>        <dbl>      <dbl>
##  1 Species  combined setosa       100      300     300         0       33.3            0       33.3
##  2 Species  combined versicolor   100      300     300         0       33.3            0       33.3
##  3 Species  combined virginica    100      300     300         0       33.3            0       33.3
##  4 Species  combined <NA>           0      300      NA        NA         NA           NA          0
##  5 Species  server1  setosa        50      150     150         0       33.3            0       33.3
##  6 Species  server1  versicolor    50      150     150         0       33.3            0       33.3
##  7 Species  server1  virginica     50      150     150         0       33.3            0       33.3
##  8 Species  server1  <NA>           0      150      NA        NA         NA           NA          0
##  9 Species  server2  setosa        50      150     150         0       33.3            0       33.3
## 10 Species  server2  versicolor    50      150     150         0       33.3            0       33.3
## 11 Species  server2  virginica     50      150     150         0       33.3            0       33.3
## 12 Species  server2  <NA>           0      150      NA        NA         NA           NA          0
##
## $continuous
## # A tibble: 6 × 15
##   variable     cohort   mean std.dev perc_5 perc_10 perc_25 perc_50 perc_75 perc_90 perc_95 valid_n cohort_n missing_n
##   <chr>        <chr>   <dbl>   <dbl>  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>    <dbl>     <dbl>
## 1 Sepal.Length server1  5.84    0.83    4.6     4.8     5.1     5.8     6.4     6.9    7.25     150      150         0
## 2 Sepal.Length server2  5.84    0.83    4.6     4.8     5.1     5.8     6.4     6.9    7.25     150      150         0
## 3 Sepal.Width  server1  3.06    0.44   2.34     2.5     2.8       3     3.3    3.61     3.8     150      150         0
## 4 Sepal.Width  server2  3.06    0.44   2.34     2.5     2.8       3     3.3    3.61     3.8     150      150         0
## 5 Sepal.Length combin…  5.84    0.83    4.6     4.8     5.1     5.8     6.4     6.9    7.25     300      300         0
## 6 Sepal.Width  combin…  3.06    0.44   2.34     2.5     2.8       3     3.3    3.61     3.8     300      300         0
## # ℹ 1 more variable: missing_perc <dbl>
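The printed result is a list with two tables, `$categorical` and `$continuous`, so individual pieces can be pulled out with ordinary R subsetting. The sketch below is illustrative only: it builds a small stand-in object (`my_stats_demo`, a hypothetical name) from a few of the columns and values printed above so that it runs without a DataSHIELD connection. It also assumes the truncated cohort label `combin…` is `combined`, as shown in the categorical table.

```r
# Stand-in for the `continuous` element of `my_stats`, using values copied
# from the printed output above (only a subset of the columns is included).
# Assumption: the truncated cohort label "combin…" is "combined".
continuous_demo <- data.frame(
  variable = c("Sepal.Length", "Sepal.Length", "Sepal.Width",
               "Sepal.Width", "Sepal.Length", "Sepal.Width"),
  cohort   = c("server1", "server2", "server1",
               "server2", "combined", "combined"),
  mean     = c(5.84, 5.84, 3.06, 3.06, 5.84, 3.06),
  std.dev  = c(0.83, 0.83, 0.44, 0.44, 0.83, 0.44),
  valid_n  = c(150, 150, 150, 150, 300, 300),
  stringsAsFactors = FALSE
)

# Hypothetical list mirroring the structure of the real result
my_stats_demo <- list(continuous = continuous_demo)

# Keep only the pooled (combined) rows and a few columns of interest
is_combined <- my_stats_demo$continuous$cohort == "combined"
pooled <- my_stats_demo$continuous[is_combined,
                                   c("variable", "mean", "std.dev", "valid_n")]
print(pooled)
```

With the real object, the same subsetting would apply to `my_stats$continuous` directly, and `my_stats$categorical` can be filtered in the same way.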