Below we have written an example aggregate-type server-side function. It
receives a call from the corresponding client-side function, performs a disclosure check, and returns the levels of the input factor variable prefixed with a message. Again, this page first shows the entire code and then breaks down each part with an explanation.
#' @title Returns the levels of a factor vector with a fun message
#' @param x a factor vector
#' @param fun_message a fun message
#' @return a list, the factor levels present in the vector
#' @importFrom dsBase checkPermissivePrivacyControlLevel
#' @author Tim Cadman
#' @export
funLevelsDS <- function(x, fun_message){
checkPermissivePrivacyControlLevel(c('permissive', 'banana')) ## Check privacy mode setting
data <- eval(parse(text=x), envir = parent.frame()) ## Load object withing function
levels_out <- levels(data) ## Get levels
.checkLevelsDisclosureRisk(data, levels_out) ## Check disclosure issues
return(
paste(
fun_message, ": ", paste(levels_out, collapse = ", ") ## Paste message to levels
)
)
}
#' Check Disclosure Risk for Levels
#'
#' This function checks whether the levels of a variable can be safely returned
#' based on the disclosure risk settings.
#'
#' @param input A vector of values from which levels are derived.
#' @param levels_out A vector containing the levels to be checked.
#'
#' @return No return value; throws an error if disclosure risk is too high.
#' @keywords internal
#' @importFrom dsBase listDisclosureSettingsDS
.checkLevelsDisclosureRisk <- function(input, levels_out) {
levels_density <- .getDensitySetting()
threshold <- .calculateThreshold(input, levels_density)
.throwErrorIfRisk(input, levels_out, threshold)
}
#' Throw an Error if Disclosure Risk is Too High
#'
#' Internal function that throws an error if the number of levels exceeds the
#' threshold set by disclosure risk settings.
#'
#' @param input A vector of values.
#' @param levels_out A vector of levels being checked.
#' @param threshold A numeric value representing the maximum allowed levels.
#'
#' @return No return value; throws an error if the disclosure risk is exceeded.
#' @keywords internal
#' @importFrom cli cli_abort
.throwErrorIfRisk <- function(input, levels_out, threshold) {
if (threshold < length(levels_out)) {
cli_abort(
c(
"x" = "The levels cannot be returned due to a disclosure risk",
"i" = "The length of the variable is {length(input)} and the number of levels is {length(levels_out)}",
"i" = "Based on current disclosure settings the maximum number of levels that can be returned is {threshold}",
call = NULL
)
)
}
}
#' Calculate Disclosure Risk Threshold
#'
#' This function calculates the threshold for the number of allowed levels
#' based on the input length and density setting.
#'
#' @param input A vector of values.
#' @param levels_density A numeric value representing the density setting.
#'
#' @return A numeric threshold for the maximum number of allowed levels.
#' @keywords internal
.calculateThreshold <- function(input, levels_density) {
input_length <- length(input)
return(input_length * levels_density)
}
#' Get Disclosure Density Setting
#'
#' Retrieves the `nfilter.levels.density` setting from DataSHIELD's disclosure settings.
#'
#' @return A numeric value representing the density setting.
#' @keywords internal
#' @importFrom dsBase listDisclosureSettingsDS
.getDensitySetting <- function() {
thr <- dsBase::listDisclosureSettingsDS()
return(as.numeric(thr$nfilter.levels.density))
}
As described in the client-side information, here we provide the roxygen comments describing function meta-data. Note: The serverside function does not include a datasources
argument, but it should include all other parameters within the client-side function.
#' @title Returns the levels of a factor vector with a fun message
#' @param x a factor vector
#' @param fun_message a fun message
#' @return a list, the factor levels present in the vector
#' @importFrom dsBase checkPermissivePrivacyControlLevel
#' @author Tim Cadman
#' @export
funLevelsDS <- function(x, fun_message){
The DataSHIELD naming convention is for the server-side function to be named myFunctionDS
in camelCase.
checkPermissivePrivacyControlLevel(c('permissive', 'banana'))
DataSHIELD allows the data owner to set a privacy control level. Certain privacy control levels block the use of certain functions which could potentially be used in an attack to disclose individual level data. All DataSHIELD server-side functions should include this line at the start of the function unless they are permitted in all privacy modes. The strings passed to checkPermissivePrivacyControlLevel
describe all the modes for which the current function is permitted. If the server is running in a mode other than these, the function will be blocked.
data <- eval(parse(text=x), envir = parent.frame())
The input to the client-side function describing the factor variable was a string. In order to perform operations on the data object represented by that string, this data is retrieved within the server-side function.
levels_out <- levels(data) ## Get levels
This part contains the key operation of the server-side function, to retrieve the factor levels of the input variable. This should be replaced by whatever functionality you want to implement.
.checkLevelsDisclosureRisk(data, levels_out) ## Check disclosure
Here we call a function defined after the main function, which checks that the number of levels being returned is not be disclosive. Even though the previous step fetched the factor levels, if this check fails these levels will not be returned to the user.
return(
paste(
fun_message, ": ", paste(levels_out, collapse = ", ") ## Paste message to levels
)
)
Finally, we return the value to client-side function. In this trivial example, we paste the 'fun message' passed from the client-side to the factor levels we have retrieved.
#' Check Disclosure Risk for Levels
#'
#' This function checks whether the levels of a variable can be safely returned
#' based on the disclosure risk settings.
#'
#' @param input A vector of values from which levels are derived.
#' @param levels_out A vector containing the levels to be checked.
#'
#' @return No return value; throws an error if disclosure risk is too high.
#' @keywords internal
#' @importFrom dsBase listDisclosureSettingsDS
#' @noRd
.checkLevelsDisclosureRisk <- function(input, levels_out) {
levels_density <- .getDensitySetting()
threshold <- .calculateThreshold(input, levels_density)
.throwErrorIfRisk(input, levels_out, threshold)
}
This function performs three steps:
#' Get Disclosure Density Setting
#'
#' Retrieves the `nfilter.levels.density` setting from DataSHIELD's disclosure settings.
#'
#' @return A numeric value representing the density setting.
#' @keywords internal
#' @importFrom dsBase listDisclosureSettingsDS
.getDensitySetting <- function() {
thr <- dsBase::listDisclosureSettingsDS()
return(as.numeric(thr$nfilter.levels.density))
}
This function calls the dsBase function listDisclosureSettingsDS
and takes the value nfilter.levels.density
.
#' Calculate Disclosure Risk Threshold
#'
#' This function calculates the threshold for the number of allowed levels
#' based on the input length and density setting.
#'
#' @param input A vector of values.
#' @param levels_density A numeric value representing the density setting.
#'
#' @return A numeric threshold for the maximum number of allowed levels.
#' @keywords internal
#' @noRd
.calculateThreshold <- function(input, levels_density) {
input_length <- length(input)
return(input_length * levels_density)
}
This function takes the input factor and the density value obtained in the previous function and calculates the maximum number of levels that can be safely returned.
#' Throw an Error if Disclosure Risk is Too High
#'
#' Internal function that throws an error if the number of levels exceeds the
#' threshold set by disclosure risk settings.
#'
#' @param input A vector of values.
#' @param levels_out A vector of levels being checked.
#' @param threshold A numeric value representing the maximum allowed levels.
#'
#' @return No return value; throws an error if the disclosure risk is exceeded.
#' @keywords internal
#' @importFrom cli cli_abort
#' @noRd
.throwErrorIfRisk <- function(input, levels_out, threshold) {
if (threshold < length(levels_out)) {
cli_abort(
c(
"x" = "The levels cannot be returned due to a disclosure risk",
"i" = "The length of the variable is {length(input)} and the number of levels is {length(levels_out)}",
"i" = "Based on current disclosure settings the maximum number of levels that can be returned is {threshold}",
call = NULL
)
)
}
}
Finally, evaluate whether this threshold is exceeded and if so throw an error using CLI formatting. Again, note that whilst this server-side function could have been written as one code block, refactoring it into smaller chunks helps better following what is happening, and makes it easier to test and debug.
Note that within this function we perform disclosure checks based on the density level values. We have a separate section of the wiki discussing which disclosure checks you should consider adding.