Mutation Testing
What Is Mutation Testing?
Pitest defines it as:
> Mutation testing is conceptually quite simple. Faults (or mutations) are automatically seeded into your code, then your tests are run. If your tests fail, then the mutation is killed, if your tests pass, then the mutation lived. The quality of your tests can be gauged from the percentage of mutations killed.
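To make "killed" and "lived" concrete, here is a minimal hand-rolled sketch. It does not use Pitest at all: the mutant is written by hand, and the names `MutationDemo`, `isAdult`, `isAdultMutant`, `weakTestPasses`, and `strongTestPasses` are hypothetical. The mutant swaps `>=` for `>`, which resembles the kind of change a conditionals-boundary mutator makes.

```java
import java.util.function.IntPredicate;

public class MutationDemo {
    // Original production logic (hypothetical): adults are 18 or older.
    static boolean isAdult(int age) {
        return age >= 18;
    }

    // Hand-written stand-in for a seeded fault: ">=" has become ">".
    static boolean isAdultMutant(int age) {
        return age > 18;
    }

    // Weak test: only checks values far away from the boundary.
    static boolean weakTestPasses(IntPredicate impl) {
        return impl.test(30) && !impl.test(5);
    }

    // Strong test: additionally pins the behavior at the boundary itself.
    static boolean strongTestPasses(IntPredicate impl) {
        return weakTestPasses(impl) && impl.test(18) && !impl.test(17);
    }

    // A mutant survives when the test still passes against the mutated code.
    static String verdict(boolean testPassed) {
        return testPassed ? "survived" : "killed";
    }

    public static void main(String[] args) {
        System.out.println("weak test, mutant:   " + verdict(weakTestPasses(MutationDemo::isAdultMutant)));
        System.out.println("strong test, mutant: " + verdict(strongTestPasses(MutationDemo::isAdultMutant)));
    }
}
```

Both tests execute every line of `isAdult`, so line coverage is identical. Only the strong test notices the seeded fault, which is the difference mutation testing is designed to expose.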
Mutation testing isn't frequently discussed in Spring Boot circles. Some reasons for its limited popularity are:
- Performance concerns: Mutation testing is computationally expensive, especially for large Spring codebases.
- Complexity vs. value perception: Many teams question whether the additional insights justify the setup complexity and runtime costs.
- CI/CD impact: The long execution time can disrupt fast feedback loops in CI/CD pipelines.
Interest is growing slowly, particularly among teams with mature testing practices, but adopting it means accepting longer build times.
Why Mutation Testing Doesn't Make Sense In Groovy Projects
Groovy’s dynamic typing, runtime method dispatch, operator overloading, and heavy use of AST transformations (@Canonical, @Builder, @Slf4j, Spock internals, etc.) mean that PIT mutates generated bytecode that often does not resemble the source code you wrote. As a result, many mutants are either meaningless, unreachable, or survive for reasons unrelated to test quality.
Mutation testing is a great fit for Java and Kotlin, but in Groovy projects it produces noise instead of insight.
PIT Mutation Testing
Let's set up the Gradle plugin for PIT mutation testing.
Run the pitest task. When it finishes, you will get an HTML report at build/reports/pitest/index.html.
Project Summary (Java)
| Number of Classes | Line Coverage | Mutation Coverage | Test Strength |
|---|---|---|---|
| 6 | 95% 87/91 | 84% 21/25 | 87% 21/24 |
Breakdown by Package
| Name | Number of Classes | Line Coverage | Mutation Coverage | Test Strength |
|---|---|---|---|---|
| dev.pollito.spring_java.config.advice | 1 | 94% 18/19 | 100% 4/4 | 100% 4/4 |
| dev.pollito.spring_java.config.log | 3 | 94% 51/54 | 78% 15/19 | 83% 15/18 |
| dev.pollito.spring_java.sakila.film.adapter.in.rest | 1 | 100% 8/8 | 100% 1/1 | 100% 1/1 |
| dev.pollito.spring_java.sakila.film.domain.port.in | 1 | 100% 10/10 | 100% 1/1 | 100% 1/1 |
Project Summary (Kotlin)
| Number of Classes | Line Coverage | Mutation Coverage | Test Strength |
|---|---|---|---|
| 4 | 96% 80/83 | 92% 26/28 | 96% 26/27 |
Breakdown by Package
| Name | Number of Classes | Line Coverage | Mutation Coverage | Test Strength |
|---|---|---|---|---|
| dev.pollito.spring_kotlin.config.advice | 1 | 95% 19/20 | 100% 1/1 | 100% 1/1 |
| dev.pollito.spring_kotlin.config.log | 3 | 96% 61/63 | 92% 25/27 | 96% 25/26 |
In the Kotlin project, FindByIdPortInImpl and FilmController contain no "logic" in the eyes of the default Pitest mutators:
- No Conditionals: There are no if, when, or loops.
- No Math: There are no +, -, *, /.
- No Void Calls: Calling a constructor (which returns a value), not a void method.
- Constructor Arguments: The default Pitest mutators do not change hardcoded strings or numbers passed as arguments to constructors.
- Due to Kotlin's strict null-safety, Pitest skips generating a null mutant for simple non-nullable return types.
These classes are missing from the Kotlin report because they are too simple: Pitest found nothing in them to mutate.
What Each Metric Means
| Metric | What It Measures | Why It Matters | Good Target |
|---|---|---|---|
| Line Coverage | % of code lines executed during tests | Easy to achieve but misleading - high numbers don't mean good tests | 80%+ (industry standard) |
| Mutation Coverage | % of mutations killed out of all created | The real deal - shows how many bugs your tests would catch | ~70%+ (indicates solid tests) |
| Test Strength | Killed Mutations / Covered Mutations | Effectiveness of tests on code they touch | 80%+ (meaningful assertions) |
Focus on mutation coverage and test strength over line coverage. 70% mutation coverage tells you far more about your tests than 95% line coverage with weak assertions. If test strength is low but line coverage is high, your tests are executing code without verifying behavior.
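As a worked check of the two mutation metrics, the sketch below recomputes them from the raw counts in the two project summaries above. The class and method names are hypothetical, and truncating to a whole percentage is an assumption chosen because it reproduces the displayed figures; it is not taken from Pitest's source.

```java
public class MutationMetrics {
    // Whole-number percentage, truncated. Truncation is an assumption that
    // happens to match the figures shown in the reports.
    static int percent(int part, int whole) {
        return 100 * part / whole;
    }

    // Mutation coverage: killed mutants out of all mutants generated.
    static int mutationCoverage(int killed, int generated) {
        return percent(killed, generated);
    }

    // Test strength: killed mutants out of the mutants that at least one
    // test actually executed (generated minus those with no coverage).
    static int testStrength(int killed, int covered) {
        return percent(killed, covered);
    }

    public static void main(String[] args) {
        // spring_java summary: 25 generated, 24 covered, 21 killed.
        System.out.println("Java mutation coverage:   " + mutationCoverage(21, 25) + "%");
        System.out.println("Java test strength:       " + testStrength(21, 24) + "%");

        // spring_kotlin summary: 28 generated, 27 covered, 26 killed.
        System.out.println("Kotlin mutation coverage: " + mutationCoverage(26, 28) + "%");
        System.out.println("Kotlin test strength:     " + testStrength(26, 27) + "%");
    }
}
```

The only difference between the two metrics is the denominator. In both projects exactly one mutant sits on a line no test executes, so test strength comes out slightly higher than mutation coverage.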