Unit Testing Fraud - Why Code Coverage is a Lie
April 19, 2021
(Updated on February 20, 2022)
April 19, 2021
(Updated on February 20, 2022)
Code coverage identifies the percentage of your codebase that was exercised by unit tests. This percentage, which is displayed in a code coverage report, is formed through the branches, statements, functions and lines of code covered by a test suite.
When working on a product delivery team, developers are often required to ensure that the code they write is supported by associated unit/integration/e2e tests. Some teams even go as far as requiring each incoming branch meet a certain code coverage threshold trusting that the high percentage eliminates bugs from the production codebase.
But herein lies the problem; code coverage is a fraud.
The coverage report can be helpful in identifying untested code, but targeting a percentage of coverage does not equal quality-tested code.
Let me say this first: code coverage itself is not bad ( I’ll talk about some of the benefits in a few moments ), but it provides a false sense of security.
From experience in a delivery team where high code coverage is part of the DoD, I’ve seen the pitfalls of such a requirement.
Many developers hate writing unit tests and don’t see the value of spending time writing code to test the code they wrote ( or will write -- TDD for the win ). Testing cannot be done when a developer’s mind is elsewhere -- it takes mental effort. When a coverage threshold is put in place, the mindset of developers shift; they leverage the report to determine if they are testing enough rather than writing tests that evaluate intended functionality and exploit edge cases.
I was on a delivery team with a handful of other developers and found a bug in production. All of the tests passed and the code coverage report said that the component in question had 100% code coverage. When I opened the test file, I saw something like this. Do you see what's wrong?
As a unit test, this doesn't do diddly squat ( time for a git blame ). But from a coverage perspective, this test runs the component code, calls the injected service and, as a result, bumps up the coverage. The 100% code coverage indicator just means that every line of code was run at least once, not that the tests running those lines made valuable test assertions.
The developer gamed the system — they wrote a ‘unit test’, pushed their code to production and ended up with a bug that the product owner got to see on demo day. Nice!
It should be difficult to achieve a very high coverage percentage when tests have as little structure as the one above, but faking it may get some developers past 60%-65% coverage.
Turns out, the developer who wrote the test above said that the end of the iteration was quickly approaching and the feature they committed to delivering couldn’t be merged because the pipelines were failing. It’s not an excuse, but it did shine a light on the inadequacy of the requirement.
Some companies believe that having a high code coverage removes the need for code reviews. I strongly disagree - the faulty test would have been easily spotted if a code review was put in place.
The whole idea of a coverage report is that it can help developers identify what portions of their codebase is covered by unit tests. And visa versa. The first thing I do when rolling onto an existing project is run a coverage report to determine where my focus should be when writing tests. Since coverage reports include the coverage percentage for branches, statements and functions, it's easy to pinpoint which files/components are lacking associated tests.
Some clients feel necessity for high coverage percentage for their product, and that’s okay. As developers and consultants, our job is to help the client understand what they need in an application solution. While a high coverage percentage isn’t the most important when considering all of the ingredients of a successful project, it’s certainly something we can add to the Definition of Done ( DoD ).
If a team is utilizing CI/CD pipelines, a flow can automatically fail if the branch has low code coverage, keeping the production branch more pure.
If a team follows Test Driven Development, coverage percentages should be within 80%-90% at all times automatically. Some TDD enthusiasts believe that 100% coverage should always be attainable — but because of weird quirks of a language or framework, that may not be the case.
I understand my opinion may step on some toes, but let me wrap up this article with a summary:
Share this article
A periodic update about my life, recent blog posts, how-tos, and discoveries.
No spam - unsubscribe at any time!