Mutation Testing

How good are your unit tests? Are they actually effective at catching mistakes?

Test quality has long been a sought-after metric. Many tools offer code coverage as a solution, but as most developers know, coverage behaves like the divergence test in mathematics: low coverage tells you the tests are inadequate, while high coverage is inconclusive and could go either way.

This year’s Devconf featured a particularly interesting talk by 17-year-old Felix Wu from Germany. He presented a tooling concept called “Mutation Testing”, which provides an objective measure of the quality of your tests.

You may be wondering how it achieves this objective measure, and the answer is remarkably simple: it makes small changes to your code and then runs your tests to see whether any of them fail as a result. Each change is referred to as a mutant and will typically target things like return values, literals, and operators. When a test fails after a mutation, this is a good result and the mutant is said to have been killed. If every test still passes, this is a bad result and the mutant is said to have survived. The percentage of mutants killed out of all mutants generated, commonly called the mutation score, thus gives us an objective measure of how sensitive our tests are at picking up potential bugs.
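As a rough illustration of the arithmetic, the sketch below computes the percentage of mutants killed from a count of killed and surviving mutants. The `MutationMetrics` class and `MutationScore` method are hypothetical names chosen for this example, not part of any particular tool.

```csharp
using System;

public static class MutationMetrics
{
    // Hypothetical helper: percentage of mutants killed by the test suite.
    public static double MutationScore(int killed, int survived)
    {
        int total = killed + survived;
        if (total == 0)
        {
            throw new ArgumentException("At least one mutant is required.");
        }
        return 100.0 * killed / total;
    }
}
```

With 6 mutants killed and 2 survivors, `MutationMetrics.MutationScore(6, 2)` returns 75.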

For example, let’s imagine we had a relational check method:
public bool GreaterThan(int a, int b)
{
    return a > b;
}
A mutation testing tool might generate some of the following mutants:
1: return true;
2: return false;
3: return a <= b;
4: return a >= b;
If our unit tests consisted of a single check, say:
Assert.True(GreaterThan(2,1));
It’s then easy to see that mutants 1 and 4 will go undetected, since both still return true for the inputs 2 and 1. To kill them as well, the tests need to cover multiple scenarios around the boundary condition:
Assert.True(GreaterThan(2,1));
Assert.False(GreaterThan(1,1));
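
To make the reasoning concrete, here is a minimal hand-rolled sketch of what a mutation testing tool does with this example. It is not how Stryker or any real tool is implemented: the mutants are written by hand as delegates, and `MutationDemo`, `WeakSuite`, `StrongSuite`, and `Survivors` are hypothetical names invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MutationDemo
{
    // The four mutants from the list above, written by hand as delegates.
    public static readonly Dictionary<int, Func<int, int, bool>> Mutants =
        new Dictionary<int, Func<int, int, bool>>
        {
            { 1, (a, b) => true },
            { 2, (a, b) => false },
            { 3, (a, b) => a <= b },
            { 4, (a, b) => a >= b },
        };

    // A "test suite" returns true when every assertion would pass.
    public static bool WeakSuite(Func<int, int, bool> greaterThan) =>
        greaterThan(2, 1);

    public static bool StrongSuite(Func<int, int, bool> greaterThan) =>
        greaterThan(2, 1) && !greaterThan(1, 1);

    // Returns the numbers of the mutants that the suite fails to detect.
    public static List<int> Survivors(Func<Func<int, int, bool>, bool> suite) =>
        Mutants.Where(m => suite(m.Value))
               .Select(m => m.Key)
               .OrderBy(n => n)
               .ToList();

    public static void Main()
    {
        // Prints "Weak suite survivors: 1, 4"
        Console.WriteLine("Weak suite survivors: " + string.Join(", ", Survivors(WeakSuite)));
        // Prints "Strong suite survivors: " (no survivors)
        Console.WriteLine("Strong suite survivors: " + string.Join(", ", Survivors(StrongSuite)));
    }
}
```

A real tool generates the mutants automatically from the source code and runs the actual test project against each one, but the bookkeeping of killed versus surviving mutants follows the same idea.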

Mutation testing is of course not a new concept: a scholarly search for the topic on Google yields a well-cited paper from 1980, and Wikipedia suggests that it was first proposed in 1971. Despite this, such testing is not common knowledge, and an abundance of tooling has not been available. Contributing factors include the number of mutated program copies needed for a useful result and the execution time needed to run the tests against each of them.

So why now? Today’s trends around unit testing and microservices help to mitigate these concerns by keeping the code under test small. Code analysis tools such as Roslyn, and equivalents for other platforms, give tool developers the ability to make changes to the production code programmatically.

If you’re interested in measuring the quality of your unit tests, go check out Stryker for C# and JavaScript, or PITest for Java.

Written by Geoffrey Lydall