Here we demonstrate how to write tests for the server-side package. We suggest you employ two general principles:
Here is the complete test file for the server-side function dsFunLevels:
library(testthat)

# Disclosure settings that the tests rely on
options(
  nfilter.tab = 3,
  nfilter.subset = 3,
  nfilter.glm = 0.33,
  nfilter.string = 80,
  nfilter.stringShort = 20,
  nfilter.kNN = 3,
  nfilter.levels.density = 0.33,
  nfilter.levels.max = 40,
  nfilter.noise = 0.25,
  nfilter.privacy.old = 5
)

# Test input: a factor with 120 rows and 3 levels
x <- factor(
  c(
    rep("apple", 30),
    rep("banana", 40),
    rep("cherry", 50)
  )
)

# Main function: happy path
test_that("funLevelsDS returns expected levels with message", {
  options(datashield.privacyControlLevel = "banana")
  expect_equal(
    funLevelsDS("x", "Here are the levels"),
    "Here are the levels: apple, banana, cherry"
  )
})

# Main function: sad path
test_that("funLevelsDS blocks the function if privacy level is not banana or permissive", {
  options(datashield.privacyControlLevel = "non-permissive")
  expect_error(
    funLevelsDS("x", "Here are the levels"),
    "BLOCKED: The server is running in 'non-permissive' mode which has caused this method to be blocked"
  )
  options(datashield.privacyControlLevel = "banana")
})

# Helper functions
test_that(".getDensitySetting returns the correct disclosure setting", {
  expect_equal(
    .getDensitySetting(),
    0.33
  )
})

test_that(".calculateThreshold calculates correct threshold", {
  input <- c(1, 2, 3, 4, 5)
  expect_equal(.calculateThreshold(input, 0.2), 1)
})

test_that(".throwErrorIfRisk throws error when disclosure risk is too high", {
  input <- c("A", "B", "C", "D")
  levels_out <- c("A", "B", "C", "D")
  threshold <- 2
  expect_error(
    .throwErrorIfRisk(input, levels_out, threshold),
    "The levels cannot be returned due to a disclosure risk"
  )
})

test_that(".throwErrorIfRisk does not throw error when risk is low", {
  input <- c("A", "A", "B")
  levels_out <- c("A", "B")
  threshold <- 2
  expect_silent(
    .throwErrorIfRisk(input, levels_out, threshold)
  )
})

test_that(".checkLevelsDisclosureRisk throws error when threshold exceeded", {
  input <- c("A", "A", "B", "B", "C", "D")
  levels_out <- c("A", "B", "C", "D")
  expect_error(
    .checkLevelsDisclosureRisk(input, levels_out),
    "The levels cannot be returned due to a disclosure risk"
  )
})

test_that(".checkLevelsDisclosureRisk does not throw error when within threshold", {
  input <- c("A", "A", "A", "A", "B", "B", "B", "B", "B")
  levels_out <- c("A", "B")
  expect_silent(
    .checkLevelsDisclosureRisk(input, levels_out)
  )
})
Now let's break down each part of this file to understand what we are testing and how:
library(testthat)
options(
  nfilter.tab = 3,
  nfilter.subset = 3,
  nfilter.glm = 0.33,
  nfilter.string = 80,
  nfilter.stringShort = 20,
  nfilter.kNN = 3,
  nfilter.levels.density = 0.33,
  nfilter.levels.max = 40,
  nfilter.noise = 0.25,
  nfilter.privacy.old = 5
)
The option nfilter.levels.density will be used in some of these tests.
x <- factor(
  c(
    rep("apple", 30),
    rep("banana", 40),
    rep("cherry", 50)
  )
)
This factor is the input for the happy path. We expect it not to result in an error because the number of levels (3) is far smaller than the number of rows (120).
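As a quick sanity check of that expectation, the snippet below computes the ratio of levels to rows for x and compares it with the nfilter.levels.density value set above. Treating the setting as a cap on that ratio is an assumption made here for illustration; it is consistent with the threshold arithmetic used in the later tests.

```r
# Rebuild the test factor: 30 + 40 + 50 = 120 rows, 3 levels
x <- factor(c(rep("apple", 30), rep("banana", 40), rep("cherry", 50)))

n_rows <- length(x)     # 120
n_levels <- nlevels(x)  # 3

# Assumption: the density setting caps the number of levels per row
density_limit <- 0.33
level_density <- n_levels / n_rows  # 0.025

level_density < density_limit  # TRUE
```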
test_that("funLevelsDS returns expected levels with message", {
  options(datashield.privacyControlLevel = "banana")
  expect_equal(
    funLevelsDS("x", "Here are the levels"),
    "Here are the levels: apple, banana, cherry"
  )
})
With datashield.privacyControlLevel set to 'banana' and nfilter.levels.density set to 0.33, we test that the function runs without error and returns the expected message.
test_that("funLevelsDS blocks the function if privacy level is not banana or permissive", {
  options(datashield.privacyControlLevel = "non-permissive")
  expect_error(
    funLevelsDS("x", "Here are the levels"),
    "BLOCKED: The server is running in 'non-permissive' mode which has caused this method to be blocked"
  )
  options(datashield.privacyControlLevel = "banana")
})
Here we set the privacy control level to 'non-permissive' and check that the function is correctly blocked. We then reset the privacy level to 'banana' so that the remaining tests can run.
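Resetting the option by hand works, but it is easy to forget. One possible alternative is a small helper that restores the previous value automatically, even if the code inside raises an error. The helper name with_privacy_level is hypothetical and is not part of DataSHIELD or testthat; this is only a sketch in base R.

```r
# Hypothetical helper (not part of DataSHIELD or testthat): set the privacy
# level while `code` runs, then restore whatever value was there before.
with_privacy_level <- function(level, code) {
  old <- options(datashield.privacyControlLevel = level)
  on.exit(options(old), add = TRUE)
  force(code)  # `code` is evaluated lazily, so it runs with the new option
}

options(datashield.privacyControlLevel = "banana")

inside <- with_privacy_level("non-permissive", {
  getOption("datashield.privacyControlLevel")
})

inside                                       # "non-permissive"
getOption("datashield.privacyControlLevel")  # "banana"
```

Inside a test, the expect_error call would take the place of the getOption call in the braces.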
We have now tested the happy and sad paths for the main function. We also want to test that the helper functions we wrote, which are called within the main function, work correctly.
test_that(".getDensitySetting returns the correct disclosure setting", {
  expect_equal(
    .getDensitySetting(),
    0.33
  )
})
This is a simple test to check that the value for nfilter.levels.density we specified in options is correctly returned.
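The helper itself is not shown on this page. To make the test concrete, here is one way such a function could look, assuming it simply reads the R option and coerces it to a number. The body is a guess for illustration, not the package's actual code.

```r
# Hypothetical sketch; the real .getDensitySetting is not shown on this page.
# Assumption: the setting is read from the R option and coerced to numeric,
# so it also works if the option was stored as a string.
.getDensitySetting <- function() {
  as.numeric(getOption("nfilter.levels.density"))
}

options(nfilter.levels.density = 0.33)
.getDensitySetting()  # 0.33
```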
test_that(".calculateThreshold calculates correct threshold", {
  input <- c(1, 2, 3, 4, 5)
  expect_equal(.calculateThreshold(input, 0.2), 1)
})
This is another simple test to check that the correct threshold is returned. Given an input vector of length 5 and a density setting of 0.2, we expect the maximum number of levels permitted to be 1 (5 × 0.2 = 1).
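The arithmetic behind this test can be sketched as follows. The function body is an assumption (input length multiplied by the density setting) chosen because it reproduces the expected value in the test; the real implementation is not shown here.

```r
# Hypothetical sketch; assumes threshold = number of rows * density setting
.calculateThreshold <- function(input, density) {
  length(input) * density
}

# The case from the test: 5 rows at a density of 0.2
small <- .calculateThreshold(c(1, 2, 3, 4, 5), 0.2)  # 1

# The happy-path factor: 120 rows at a density of 0.33
x <- factor(c(rep("apple", 30), rep("banana", 40), rep("cherry", 50)))
large <- .calculateThreshold(x, 0.33)  # 39.6
```

Under this assumption the three levels of x sit well below the threshold of 39.6, which is why the happy-path test passes.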
test_that(".throwErrorIfRisk does not throw error when risk is low", {
  input <- c("A", "A", "A", "B", "B", "B", "B", "B", "B")
  levels_out <- c("A", "B")
  threshold <- 3
  expect_silent(
    .throwErrorIfRisk(input, levels_out, threshold)
  )
})
Here we test that .throwErrorIfRisk does not throw an error if the number of levels is lower than the threshold value.
test_that(".throwErrorIfRisk throws error when disclosure risk is too high", {
  input <- c("A", "B", "C", "D")
  levels_out <- c("A", "B", "C", "D")
  threshold <- 2
  expect_error(
    .throwErrorIfRisk(input, levels_out, threshold),
    "The levels cannot be returned due to a disclosure risk"
  )
})
We also test that .throwErrorIfRisk does return the correct error if the number of levels is higher than the threshold value.
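To show what these two tests exercise, here is a minimal sketch of a function with this behaviour. It assumes the check is a plain comparison between the number of levels and the threshold; the real function may differ, for example in how it uses its input argument.

```r
# Hypothetical sketch; the real .throwErrorIfRisk is not shown on this page.
# Assumption: stop when the number of levels exceeds the threshold.
# `input` is kept only to match the signature used in the tests.
.throwErrorIfRisk <- function(input, levels_out, threshold) {
  if (length(levels_out) > threshold) {
    stop("The levels cannot be returned due to a disclosure risk", call. = FALSE)
  }
  invisible(NULL)  # return nothing visibly so expect_silent() passes
}

# Low risk: 2 levels against a threshold of 3, so no error
ok <- .throwErrorIfRisk(c("A", "A", "B"), c("A", "B"), 3)

# High risk: 4 levels against a threshold of 2, so an error is raised
msg <- tryCatch(
  .throwErrorIfRisk(c("A", "B", "C", "D"), c("A", "B", "C", "D"), 2),
  error = function(e) conditionMessage(e)
)
msg  # "The levels cannot be returned due to a disclosure risk"
```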
test_that(".checkLevelsDisclosureRisk does not throw error when within threshold", {
  input <- c("A", "A", "A", "A", "B", "B", "B", "B", "B")
  levels_out <- c("A", "B")
  expect_silent(
    .checkLevelsDisclosureRisk(input, levels_out)
  )
})
Here we check that the overall disclosure-checking function does not throw an error if the ratio of input length to number of levels is sufficiently high.
test_that(".checkLevelsDisclosureRisk throws error when threshold exceeded", {
  input <- c("A", "A", "B", "B", "C", "D")
  levels_out <- c("A", "B", "C", "D")
  expect_error(
    .checkLevelsDisclosureRisk(input, levels_out),
    "The levels cannot be returned due to a disclosure risk"
  )
})
Finally, we check that our disclosure check function does throw the expected error when the number of levels to return is too high.
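The numbers in these last two tests are easier to follow with the whole check written out in one place. The sketch below is an assumed implementation that reads the density setting, multiplies it by the input length, and compares the result with the number of levels. It is written only to illustrate the tests and is not the package's actual code.

```r
# Hypothetical sketch of the overall check; the real function is not shown.
.checkLevelsDisclosureRisk <- function(input, levels_out) {
  density <- as.numeric(getOption("nfilter.levels.density"))  # read the setting
  threshold <- length(input) * density                        # assumed formula
  if (length(levels_out) > threshold) {
    stop("The levels cannot be returned due to a disclosure risk", call. = FALSE)
  }
  invisible(NULL)
}

options(nfilter.levels.density = 0.33)

# 9 rows give a threshold of 2.97; 2 levels are below it, so no error
within <- .checkLevelsDisclosureRisk(
  c("A", "A", "A", "A", "B", "B", "B", "B", "B"),
  c("A", "B")
)

# 6 rows give a threshold of 1.98; 4 levels exceed it, so an error is raised
exceeded <- tryCatch(
  .checkLevelsDisclosureRisk(
    c("A", "A", "B", "B", "C", "D"),
    c("A", "B", "C", "D")
  ),
  error = function(e) conditionMessage(e)
)
```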
This may seem like a lot of tests! It's true that we have tested many eventualities, and you could argue that not all of these tests are necessary. However, writing robust tests like this will probably pay off in the long run. It is especially important within DataSHIELD, where it is vital to ensure that your function is not disclosive. Furthermore, generative AI models such as ChatGPT can be very effective at writing unit tests. All the tests on this page were written in under an hour with the assistance of ChatGPT.