Unit testing with R or Python

Example of unit testing library
Club Bioinfo
Author

Laurent Modolo

Published

November 5, 2020

cc_by_sa

What is unit testing?

Unit testing is a programming method that consists of writing one or more simple tests for each function. Each unit test compares the result of a function, for a given set of parameters, to the expected result.

Ideally, you want to have tests for each of your functions. The percentage of functions that are tested is referred to as code coverage.
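As a rough illustration of this definition, the percentage can be computed by hand. This is only a sketch: the helper `function_coverage` and the function names in the two lists are hypothetical, and real coverage tools usually count covered lines or branches rather than whole functions.

```python
def function_coverage(all_functions, tested_functions):
    """Return the percentage of functions that have at least one test."""
    all_functions = set(all_functions)
    if not all_functions:
        # Nothing to test: treated as fully covered (a convention chosen here).
        return 100.0
    covered = all_functions & set(tested_functions)
    return 100.0 * len(covered) / len(all_functions)


# Hypothetical project with four functions, three of which have tests.
project_functions = ["square", "cube", "load_data", "plot_histogram"]
functions_with_tests = ["square", "cube", "load_data"]

coverage = function_coverage(project_functions, functions_with_tests)
print(f"{coverage:.0f}% of functions are tested")  # prints "75% of functions are tested"
```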

Why unit tests? My code is working!

Unit testing makes your code more robust

Robust code:

  • will not break easily upon changes (e.g., new R version, package updates, bug fixes, new features, etc.)
  • can be refactored simply
  • can be extended without breaking the rest
  • can be tested

With a growing codebase, you will need unit tests, or you are going to spend a lot of time with the debugger afterward. In the long run, unit tests will save you a lot of time. You can even do test-driven development (TDD), which consists of writing your tests first and then stopping writing (improving) your function as soon as it passes them.

Note that 100% code coverage is not necessarily a goal to reach, because some functions don’t need to be covered by unit tests, for example functions with side effects. It is, for instance, difficult to write unit tests that check graphical output.
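One common way to limit the untested part, sketched here under the assumption that the computation can be separated from the drawing, is to keep the side effect in a thin wrapper and to test only the pure function. Both `histogram_counts` and `print_histogram` are hypothetical names.

```python
from collections import Counter


def histogram_counts(values, bin_width):
    """Pure part: map each bin start to the number of values in that bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def print_histogram(values, bin_width):
    """Side-effect part: draws in the terminal, hard to check automatically."""
    for start, count in histogram_counts(values, bin_width).items():
        print(f"{start:>4} | {'#' * count}")


# The pure function is easy to cover with a unit test...
assert histogram_counts([1, 2, 11, 12, 13, 25], 10) == {0: 2, 10: 3, 20: 1}

# ...while the drawing function is left out of the coverage goal.
print_histogram([1, 2, 11, 12, 13, 25], 10)
```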

Writing Unit Tests

You don’t want to have to write unit tests for your unit tests, so it’s best to trust well-written unit testing libraries like testthat in R or doctest in Python. You can find a unit testing library for almost every language.

Unit test with R

library(testthat)
square <- function(x){
  x * x
}

test_that("single number", {
  expect_equal(square(2), 4)
})

Most of the time, you are going to write several tests for your function to check different properties.

library(testthat)
square <- function(x){
  x * x
}

test_that("single number", {
  expect_equal(square(2), 4)
  expect_equal(square(3), 9)
  expect_equal(square(-2), 4)
})

test_that("vectors", {
  expect_equal(square(c(2,4)), c(4,16))
})

test_that("test NA", {
  expect_true(is.na(square(NA)))
})

Of course, in real life you don’t want to run your tests every time you run your code. So you are going to write your tests in separate R files:

A square_code.R file:

square <- function(x){
  x * x
}

A tests/test_square_code.R file:

source("../square_code.R", chdir = TRUE)
library(testthat)
test_that("single number", {
  expect_equal(square(2), 4)
  expect_equal(square(3), 9)
  expect_equal(square(-2), 4)
})

test_that("vectors", {
  expect_equal(square(c(2,4)), c(4,16))
})

test_that("test NA", {
  expect_true(is.na(square(NA)))
})

You can put all of your test files (with names starting with test_) in a tests folder and call:

testthat::test_dir('tests')

Unit test with Python

In Python, the doctest library allows you to write your tests directly in the documentation of a function. The advantage is two-fold: you write both your unit tests and usage examples for the documentation of your function.

def square(x):
  """
  function returning x^2
  :param x: number
  :return: number
  >>> square(2)
  4
  """
  return x * x
  
if __name__ == "__main__":
  import doctest
  doctest.testmod()

We can also run several tests for each function:

def square(x):
  """
  function returning x^2
  :param x: number
  :return: number
  >>> square(2)
  4
  >>> square(3)
  9
  >>> square(-2)
  4
  >>> import numpy as np
  >>> square(
  ...   np.array([2,4]))
  array([ 4, 16])
  >>> bool(np.isnan(square(float("nan"))))
  True
  """
  return x * x
  
if __name__ == "__main__":
  import doctest
  doctest.testmod()

Of course, in real life you don’t want to run your tests every time you run your code, and you may need the __main__ block to do something other than testing. So, as with R, you call your tests from a separate test file:

import doctest
import square_code

doctest.testmod(square_code)
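`doctest.testmod` returns a `TestResults(failed, attempted)` named tuple, which a test script can inspect, for example to report a summary. The sketch below is self-contained, so it builds a small module in memory as a stand-in for `import square_code`; the helpers `load_module` and `run_tests` and the module name `square_code_demo` are hypothetical.

```python
import doctest
import types

# Source of a small module, standing in for the content of square_code.py.
SOURCE = '''
def square(x):
    """
    Return x squared.

    >>> square(2)
    4
    >>> square(-3)
    9
    """
    return x * x
'''


def load_module(name, source):
    """Build a module object from source text (replaces a real import here)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


def run_tests(module):
    """Run the doctests of a module and return (failed, attempted)."""
    results = doctest.testmod(module, verbose=False)
    return results.failed, results.attempted


failed, attempted = run_tests(load_module("square_code_demo", SOURCE))
print(f"{attempted - failed}/{attempted} doctest examples passed")
```

In a real test file, the same counts could be used to exit with a non-zero status when `failed` is greater than zero.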

Property based testing

Unit tests are great, but you may want to cover more cases than a few examples can give you. Instead of checking examples, you may want to check properties. When you do property-based testing, you generate a range of examples and test the property on each of them. If your property fails for a given case, the property-based testing library reports the counterexample.
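The principle can be sketched with the Python standard library alone. This is a hand-rolled illustration, not how a real library is implemented: `find_counterexample`, `random_int_list` and the two properties are hypothetical names, and dedicated libraries additionally shrink the counterexample to a minimal one.

```python
import random


def find_counterexample(prop, generator, tests=100, seed=0):
    """Return the first generated input for which prop is false, else None."""
    rng = random.Random(seed)
    for _ in range(tests):
        example = generator(rng)
        if not prop(example):
            return example
    return None


def random_int_list(rng):
    """Generate a list of 0 to 10 integers between -100 and 100."""
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, 10))]


def reverse_twice_is_identity(xs):
    # True for every list.
    return list(reversed(list(reversed(xs)))) == xs


def sorting_changes_nothing(xs):
    # Deliberately false: only already sorted lists satisfy it.
    return sorted(xs) == xs


print(find_counterexample(reverse_twice_is_identity, random_int_list))  # prints None
print(find_counterexample(sorting_changes_nothing, random_int_list))  # prints an unsorted list
```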

The computational load of property-based testing can be significantly higher than that of unit testing.

Property-based testing with R

In R, you can use the hedgehog library to perform property-based testing. hedgehog builds nicely on the functions of the testthat package.

rev_two_time <- function(x){
  return(rev(rev(x)))
}

library(hedgehog)
test_that( "Reverse of reverse is identity",
  forall(
    gen.c( gen.element(1:100) ),
    function(xs){expect_equal(rev_two_time(xs), xs)}
  )
)

Here we generate 100 vectors and test that reversing a vector twice gives us back the original vector.

See the hedgehog documentation for the list of available generators.

Generate uniformly distributed numbers:

dummy_comp <- function(x){
  return(log(x)/x)
}

library(hedgehog)
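test_that( "dummy_comp never returns NaN",
  forall(
    # x is drawn between -1 and 100; log(x) is NaN for x < 0,
    # so this property fails and a counterexample is reported
    gen.unif(from=-1, to=100),
    function(xs){expect_false(is.nan(dummy_comp(xs)))}
  )
)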

Generate data frames:

gen.df.of <- function(n)
  generate(for (x in
    list( as = gen.c(of = n, gen.element(1:10) )
        , bs = gen.c(of = n, gen.element(10:20) )
        )
    ) as.data.frame(x)
  )

gen.df <-
  generate(for (e in gen.element(1:100)) {
    gen.df.of(e)
  })

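# gen.df produces data frames with 1 to 100 rows, so this property
# is false for most of them and a counterexample is reported
test_that( "All data frames have 1 row",
  forall( gen.df, function(x){expect_equal(nrow(x), 1)})
)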

Property-based testing with Python

In Python, you can use the hypothesis library to perform property-based testing. Hypothesis uses decorators to specify the strategies (generators) that produce the examples to run.

from hypothesis import given
import hypothesis.strategies as st


@given(st.integers(), st.integers())
def test_ints_are_commutative(x, y):
    assert x + y == y + x


@given(x=st.integers(), y=st.integers())
def test_ints_cancel(x, y):
    assert (x + y) - y == x


@given(st.lists(st.integers()))
def test_reversing_twice_gives_same_list(xs):
    # This will generate lists of arbitrary length (usually between 0 and
    # 100 elements) whose elements are integers.
    ys = list(xs)
    ys.reverse()
    ys.reverse()
    assert xs == ys
    
if __name__ == "__main__":
    test_ints_are_commutative()
    test_ints_cancel()
    test_reversing_twice_gives_same_list()