Introduction
Fixing defects is expensive, mostly because of the time it takes: it takes time for testers to uncover the defect and describe it in enough detail for the programmers to be able to re-create it. It takes time for programmers to determine the causes of the defects, looking through code they have not seen for months. It takes time for everyone to argue whether something is really a defect, to wonder how the programmers could be so stupid, and to demand that the testers leave the programmers alone to do their job. Much of this wasted time could be avoided if the programmers simply tested their own code.
A Unit Test is the name generally given to the test that programmers do.
Some programmers test their code by setting breakpoints at specific lines, running the application in debug mode, stepping through code line by line, and examining the values of certain variables. Strictly speaking, this is unit testing, because a programmer is testing her own code. But this kind of testing has several drawbacks: it requires a debugging tool, setting breakpoints and knowing the expected values, and it is entirely manual.
Test Properties
Kent Beck defines the 12 properties of a test in his article Test Desiderata.
There are different kinds of tests (unit testing, acceptance testing, integration testing, …) and not all properties may be present in all tests. Depending on the type of tests some properties will have more importance than others.
These are the 12 test properties:
- Behavioral — tests should be sensitive to changes in the behavior of the code under test. If the behavior changes, the test result should change.
- Structure-insensitive — tests should not change their result if the structure of the code changes.
- Readable — tests should be comprehensible to the reader, invoking the motivation for writing this particular test.
- Writable — tests should be cheap to write relative to the cost of the code being tested.
- Fast — tests should run quickly.
- Deterministic — if nothing changes, the test result shouldn’t change.
- Automated — tests should run without human intervention.
- Isolated — tests should return the same results regardless of the order in which they are run.
- Composable — if tests are isolated, then I can run 1 or 10 or 100 or 1,000,000 and get the same results.
- Specific — if a test fails, the cause of the failure should be obvious.
- Predictive — if the tests all pass, then the code under test should be suitable for production.
- Inspiring — passing the tests should inspire confidence.
There is a YouTube video* for each property: a five-minute conversation between Kent Beck and Kelly Sutton explaining the property.
*The title of the video for property number 10, Specific, contains an erratum.
Introduction to TDD
Clean code that works is the goal of Test-driven Development (TDD).
Test-Driven Development (TDD) is a technique for building software that guides development by writing tests. It was developed by Kent Beck in the late 1990s as part of Extreme Programming.
TDD relies on the repetition of a very short development cycle: requirements are turned into very specific test cases, then the code is improved so that the tests pass.
TDD provides you with a suite of tests that is so comprehensive that virtually no bug can escape it, that can be executed in a matter of minutes, and that any programmer can run. It is a suite of tests that never gets out of sync with the system.
That suite is composed of microtests.
Microtests
A Unit Test is the name generally given to the test that programmers do. The problem with this term is that it is overloaded and overused, and it causes more confusion than clarity. There is no good definition of what a unit is, so there is little chance that we will agree on what unit testing means.
A microtest is a particular kind of unit test, sometimes called a TDD unit test. The microtest concept expresses what unit tests should be in modern times, since Agile appeared in software development.
Some of the properties of a microtest are:
- Short, typically under a dozen lines of code.
- Automated.
- Fast, it runs in an extremely short time, milliseconds per test.
- It does not test the object inside the running app, but instead in a purpose-built testing application.
- It invokes only a tiny portion of the code.
- In combination with all other microtests of an app, it serves as a ‘gateway-to-commit’. A developer is encouraged to commit anytime all microtests run green.
- It provides precise feedback on any errors that it encounters.
- It does not connect to external components (database, webserver…), using a variety of slip-and-fake techniques, also known as test doubles.
- It rarely involves construction of more than a few classes of object, usually under five.
Microtests follow most of the 12 properties described at the beginning of this post. The most important properties in a microtest are: structure-insensitive, writable, fast, deterministic, automated, isolated, composable, specific and predictive.
By default, any mention of a test or unit test in this post refers to a microtest. Other types of test will be named explicitly.
Why TDD
A programmer who is asked to adopt TDD might say that he already spends the whole day coding to add value. If he does TDD, he also has to write automated tests, and that is coding too. So he will have to spend more time coding, which means that TDD is more work. This is the lump of coding fallacy.
The Lump of Coding Fallacy
The problem with the previous analysis is that a programmer is not just coding the whole day.
The work performed by a programmer is not a single behavior; it is made up of three different ones:
- Programming. This behavior has two parts: coding and designing.
- Studying. In order to be able to change the source, code must be understood. Code knowledge is achieved by studying it.
- GAK Activity. GAK stands for Geek At Keyboard. This behavior includes all tasks performed in order to make the code work: inspection, testing (manually) and debugging.
The three behaviors are totally intermingled throughout the day. And that is seen as a big lump. The truth is, they actually take up very different proportions of our programming day.
Programming the computer, the best part of the day, is often the very smallest part. The GAK activity, much of which is just waiting around for things to run, or clicking through screens and typing in data in order to get to the part where you wanted to see something, that is the largest part of the day by quite a bit. And studying, the scanning and the reading, well, it’s somewhere in the middle. Basically, those are the proportions.
The lump of coding fallacy is absolutely right about one thing: automated tests are more code that has to be written.
On the other hand, the lump of coding fallacy is totally wrong about the rest of the picture.
First, study time will go down after TDD. Studying the same amount of code will get faster, because the tests make the study easier (see Benefits of TDD section).
Finally, the GAK time is the big benefit. TDD reduces the amount of time you spend in GAK by 80% or 90% (see Benefits of TDD section).
So, after TDD, the proportions of the three behaviors have changed and the total amount of time needed has decreased. In other words, more code is written (the automated tests) and, far from being lost, productivity is actually gained.
GeePaw Hill gives the whole explanation of the Lump of Coding Fallacy in his video TDD & The Lump Of Coding Fallacy (see References).
Five Underplayed Premises Of TDD
Money
The first premise of TDD is what we call the money premise: We are in this for the money. We make money in software by shipping more value faster. That’s how we do it. That’s where the money is in software.
TDD is the best way we have devised so far to actually do that. TDD is about more value faster.
Judgement
The second premise of TDD is the judgment premise, and it says: We rely every day, all the time, on individual human judgment.
We turn human words into actual running programs. If you’re doing TDD, you’re going to be required to make active, individualized judgments. The judgment premise says we are absolutely, routinely, every day, all the time happily reliant on individual humans using their individual best judgment to guide us through the process.
Correlation
The correlation premise says: Internal quality and productivity are correlated. They go up together and they go down together. They have a direct relationship.
Internal quality covers the things that you can only tell by studying the code. Is the code scannable and readable? Is it well-factored, broken into chunks that we can manage and change independently of each other? Is it well-tested? These are the factors that make code easy to change, and they are the sorts of things that go into making internal quality.
You cannot trade internal quality for productivity. The first two factors in your daily output are the skills you have and how hard the domain is; the third most important factor is «where do I start?». And «where do I start» incorporates all those things we call internal quality.
Chaining
The fourth premise is the chaining premise.
Programs are always built out of smaller pieces and mid-sized pieces and then larger pieces. And there’s a chain. We call it a dependency chain.
The way to test a chain is to test each individual link in that chain, assuming that dependencies (rest of links) work. Chain tests are the cheapest tests.
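As a minimal sketch of testing a single link (every name below is hypothetical and chosen only for illustration), the test exercises a `PriceCalculator` while assuming that the link it depends on, a tax-rate lookup, works. That dependency is replaced by a hand-written fake, so only one link is under test.

```python
class FakeTaxRates:
    """Hypothetical test double standing in for the real tax-rate lookup."""

    def rate_for(self, country):
        # The dependency is assumed to work, so a fixed answer is enough.
        return 0.25


class PriceCalculator:
    """The single link under test; it depends on a tax-rate provider."""

    def __init__(self, tax_rates):
        self._tax_rates = tax_rates

    def gross(self, net, country):
        return net * (1 + self._tax_rates.rate_for(country))


def test_gross_adds_tax_to_net_price():
    calculator = PriceCalculator(FakeTaxRates())
    assert calculator.gross(100.0, "ES") == 125.0
```

The real tax-rate lookup would get its own tests, written the same way against whatever it depends on.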
Steering
The steering premise says: Testability helps steer our designs. When we steer the development of our project, all the way through, tests and testability are first-class participants in that process.
We are constantly considering questions of how am I going to test this, and how have I tested it so far? All the way through, from the very first line of code we write to the very last line of code that we write.
Why Underplayed?
There is a reason for calling these premises underplayed. It is because when you are outside TDD, those premises are arguable, debatable, wranglable at length.
But inside TDD, they are almost invisible to us. They are the air we breathe. So when you go out there on the internet and you start studying TDD, you know, you’re studying people who have already stood inside those five premises. As a result, they hardly see them anymore. That means they don’t pay a lot of attention to explaining them.
Accidental Complication
Essential complication depends on how hard the problem to be solved is. Accidental complication appears because we are not so good at our jobs: under pressure to add value, we take shortcuts, we don’t refactor this time, we cut corners…
The cost of a feature is a function of the cost coming from the essential complication (because the problem is hard) and the cost of accidental complication.
For example, as code rot spreads, accidental complication grows, so the cost of the feature grows too.
Most of the time, the cost of a feature is dominated by the cost of accidental complication. This means that the cost of a feature has almost nothing to do with how hard it is, and almost everything to do with how good or bad the design is.
Refactoring is how you reduce accidental complication.
If you don’t clean the kitchen, then you have to clean the garage.
In his video 7 minutes and 26 seconds (see References), J.B. Rainsberger explains the impact of accidental complication on software development and how TDD can help to mitigate it… and he does it in 7 minutes and 26 seconds!
How to implement TDD
The three Laws of TDD
- First Law: You are not allowed to write any production code until you have written a unit test that fails due to its absence.
- Second Law: You are not allowed to write more of a unit test than is sufficient to fail (and failing to compile is failing).
- Third Law: You are not allowed to write more production code than is sufficient to cause the currently failing test to pass.
The TDD mantra
Following these three laws will lock you into a cycle that is perhaps 30 seconds long. You will first write a test until that test fails, then you will write production code until that test passes, and you will repeat this cycle until you are done.
But this simple cycle misses one important point of TDD: refactoring, the elimination of duplication in order to keep things clean.
In other words, you have to follow the three laws of TDD, but you also have to refactor frequently. This is the TDD mantra: red, green, refactor.
Red
It is hard to focus on more than one thing at a time, so the first thing to focus on is defining the problem. Write a little test that doesn’t work, and perhaps doesn’t even compile at first. By first making the test fail and then making it pass, you demonstrate that the test works.
Green
Solve the problem created in the previous step (test fails). Make the test work quickly, with the minimum steps, committing whatever sins necessary in the process.
«Oh, let’s play golf: Golf is when you try to make a test pass in as few keystrokes as possible»
Uncle Bob
Refactor
Eliminate all of the duplication created in merely getting the test to work. Refactoring is never scheduled; it is the kind of thing you do all the time. Every time all the tests pass, check whether the code can be refactored. Refactoring includes the tests.
Creative work requires iteration and rework. TDD gives you a chance to learn all of the lessons that the code has to teach you. If you only slap together the first thing you think of, then you never have time to think of a second, better thing.
TDD must be adopted by programmers and become an arbitrary discipline, like a surgeon scrubbing in for surgery or a pilot following a checklist.
Following the TDD mantra creates impossibly tiny loops. Each cycle in the loop is clean, is quick, is easy… and is fun. The programmer is always just a few minutes away from a state in which everything worked.
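To make the loop concrete, here is a small sketch of where the code might stand after two passes through red, green and refactor. The `Stack` class and its tests are hypothetical, and the comments describe one plausible history rather than the only possible one.

```python
# RED (cycle 1): written first; it failed because Stack did not exist yet.
def test_new_stack_is_empty():
    assert Stack().is_empty()


# RED (cycle 2): written next; it failed because push() did not exist yet.
def test_stack_is_not_empty_after_push():
    stack = Stack()
    stack.push(42)
    assert not stack.is_empty()


class Stack:
    """Production code as it might look after the second cycle."""

    def __init__(self):
        self._size = 0

    def is_empty(self):
        # GREEN in cycle 1 could have been a bare `return True`;
        # the second test forced this more general version.
        return self._size == 0

    def push(self, item):
        # Only what the failing test demands: nothing is stored yet,
        # because no test has asked for pop().
        self._size += 1
```

Note that `push` does not store the item. That would be the job of a later cycle, driven by a failing test for `pop`.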
Clean Tests
Tests are as important as production code and you must keep your tests clean.
Who tests the tests
TDD works like the accountants’ mechanism of double-entry bookkeeping, where each transaction is entered twice (as an asset and as a liability or equity).
In TDD, every behavior is entered twice: as a test and as production code. There are two streams of code:
- Tests test the production code.
- Production code tests the tests.
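One way to picture the two streams checking each other is the sketch below. It is only an illustration: `is_even`, `always_true` and `check_even_behavior` are hypothetical names. The same test is run against the real production code, where it must pass, and against a deliberately broken stand-in, where it must fail.

```python
def is_even(n):
    """Production code."""
    return n % 2 == 0


def always_true(n):
    """Deliberately broken stand-in, used only to show that the test can fail."""
    return True


def check_even_behavior(implementation):
    """The test, parameterized by the implementation it exercises."""
    assert implementation(2)
    assert not implementation(3)


def test_is_even():
    # The test checks the production code.
    check_even_behavior(is_even)


def test_the_test_can_fail():
    # Broken code checks the test: the assertions must catch it.
    try:
        check_even_behavior(always_true)
    except AssertionError:
        return
    raise RuntimeError("the test passed against broken code")
```

In day-to-day TDD the red step plays the role of `always_true`: seeing the test fail before the production code exists is what shows that the test can fail at all.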
F.I.R.S.T. Principles
- Fast: Tests are fast.
- Isolated and Independent: Tests don’t depend on other tests, can be run in any order.
- Repeatable: Tests don’t depend on the environment (network, memory…).
- Self Verifying: Tests have a binary result: Pass/Fail (no interpretation is required).
- Timely: Tests are written first.
The real meaning of FIRST: tests come first. They are written first, refactored first and have higher priority than production code. Without tests, the code is bound to rot.
Rules
Clean tests should follow these rules:
Single Assert rule
Each unit test should have one and only one assert. This rule refers to having only one logical assert; it may contain several physical asserts.
This structure is sometimes called the Triple-A rule: every unit test should be broken up into these three parts:
- Arrange: create the data and context for the test. This is usually done (or partially done) in the setup function.
- Act: call the function to be tested.
- Assert: verify that the function being tested does what it is supposed to do. It is a logical assertion, not a single physical call to assert.
The goal of this rule is to be sure that, when you write a test, every action is tested independently. The input of one test cannot be the output of the previous test; tests must be independent and isolated.
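The three parts can be sketched as follows. The `ShoppingCart` class is hypothetical and exists only to give the test something to exercise. The test makes one logical assertion ("the cart contains exactly the item that was added") using two physical asserts.

```python
class ShoppingCart:
    """Hypothetical class used only to illustrate the test layout."""

    def __init__(self):
        self._items = []

    def add(self, name, price):
        self._items.append((name, price))

    def item_count(self):
        return len(self._items)

    def total(self):
        return sum(price for _, price in self._items)


def test_adding_an_item_updates_the_cart():
    # Arrange: create the data and context for the test.
    cart = ShoppingCart()

    # Act: call the function to be tested.
    cart.add("book", 12)

    # Assert: one logical assertion, expressed as two physical asserts.
    assert cart.item_count() == 1
    assert cart.total() == 12
```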
Test public methods only
Don’t test protected or private methods. Testing private methods implies a design error: if you can’t test a private method through the public interface, then it does more than the public interface can ask of it.
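A short sketch of the idea, using a hypothetical `Greeter` class: the private helper `_normalize` is never called by the test, yet its behavior is fully covered through the public `greet` method.

```python
class Greeter:
    """Hypothetical class with one public method and one private helper."""

    def greet(self, name):
        return f"Hello, {self._normalize(name)}!"

    def _normalize(self, name):
        # Private helper: trims whitespace and capitalizes each word.
        return name.strip().title()


def test_greet_trims_and_capitalizes_the_name():
    # Only the public interface is used; _normalize is tested indirectly.
    assert Greeter().greet("  ada lovelace ") == "Hello, Ada Lovelace!"
```

Because the test knows nothing about `_normalize`, the helper can be renamed, inlined or split without touching the test.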
Test names
The WHEN_THEN convention. Examples:
- unboundSymbol_notReplaced
- nullSymbol_noAction
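As an illustration of the convention in code, the sketch below assumes a hypothetical `substitute` function that replaces `${name}` placeholders in a template. The test names are adapted to Python naming, with a double underscore separating the WHEN part from the THEN part; that adaptation is an assumption of this example, not part of the convention itself.

```python
import re


def substitute(template, symbols):
    """Hypothetical function: replace ${name} with symbols[name] when bound."""

    def replace(match):
        name = match.group(1)
        # Unbound symbols are left exactly as they appear in the template.
        return symbols.get(name, match.group(0))

    return re.sub(r"\$\{(\w+)\}", replace, template)


def test_bound_symbol__replaced():
    assert substitute("Hello ${name}", {"name": "Bob"}) == "Hello Bob"


def test_unbound_symbol__not_replaced():
    assert substitute("Hello ${name}", {}) == "Hello ${name}"
```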
Coverage
The goal is 100%. Coverage should be measured, plotted and visible. Coverage should not be mandated.
How to start with TDD
TDD doesn’t come in a day. It takes some lessons and some practice. There’s a lot of course material out there. Read a little, and really try the various exercises. Start with some toy code. Then find a small problem in your day job that has few or no dependencies on other classes. Do this two or three times. And again, notice what happens. If you like the result, well, at that point, you’re ready to get serious about TDD.
Write the Test List
Kent Beck describes the technique of writing a test list first in his book Test-Driven Development: By Example. Kent’s technique resembles the more general concept of maintaining a «Task Inbox» as described by David Allen in his seminal work, Getting Things Done.
Always keep a piece of paper at hand where you can write anything that pops into your head that might distract you from the current task:
«Your brain is for having ideas, not storing them»
David Allen
It is important to have a space to write things down and get them out of your head quickly.
- Write down any example that comes to mind
- Write simpler examples
- Locate the simplest case, called the «kernel»
- Write examples for all the variations, edge cases and cool ideas
Start with degenerate tests
In order to write degenerate tests, you have to think about anything «silly» that doesn’t respect the contract between your API and the callers, anything that doesn’t really make sense. Once you have covered those cases, you can move forward with the normal test cases.
The first test to be implemented should be the simplest, most degenerate test. Never try to go for the gold in the first test cases.
Ideally, all degenerate tests should be implemented first. In real life, however, you will discover new degenerate tests while working on some other test. But always try not to go too complex, too fast.
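As a rough sketch of what starting with degenerate tests can look like, consider a hypothetical `word_count` function. The first three tests cover inputs that barely make sense; only then does a normal case appear.

```python
def word_count(text):
    """Hypothetical function under test: number of whitespace-separated words."""
    if text is None:
        return 0
    return len(text.split())


# Degenerate cases first.
def test_none__zero_words():
    assert word_count(None) == 0


def test_empty_string__zero_words():
    assert word_count("") == 0


def test_only_whitespace__zero_words():
    assert word_count("   ") == 0


# Then the simplest normal case.
def test_single_word__one_word():
    assert word_count("hello") == 1
```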
Incremental algorithm
By implementing simple tests, you add functionality gradually.
«As the tests get more specific, the code gets more generic»
Uncle Bob
Through a sequence of incremental generalizations you arrive at a solution to the whole problem.
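The well-known prime factors exercise gives a feel for this. The sketch below shows one possible end state, with the tests listed in the order in which they might have been added; the exact order and the function name are assumptions made for illustration. Each test is more specific than the last, while the production code ends up as a general loop that mentions none of the tested numbers.

```python
def prime_factors(n):
    """General solution reached through small generalizations."""
    factors = []
    divisor = 2
    while n > 1:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    return factors


def test_one__no_factors():
    assert prime_factors(1) == []


def test_two__itself():
    assert prime_factors(2) == [2]


def test_four__two_twice():
    assert prime_factors(4) == [2, 2]


def test_six__two_and_three():
    assert prime_factors(6) == [2, 3]


def test_nine__three_twice():
    assert prime_factors(9) == [3, 3]
```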
The Rhythm of TDD
- Quickly add a test.
- Run all tests and see the new one fail.
- Make a little change.
- Run all tests and see them all succeed.
- Refactor to remove duplication.
After applying the Rhythm of TDD you will notice:
- How each test can cover a small increment of functionality
- How small and ugly the changes can be to make the new tests run
- How often the tests are run
- How many teensy-weensy steps make up the refactorings
Getting Stuck
Getting stuck is a technical term that means there’s nothing incremental you can do to pass the currently failing test.
Getting that test to pass forces you to write a whole big bunch of production code. In extreme cases, you just have to write the whole damned algorithm.
It is a symptom of a problem: maybe you wrote the wrong test or you made production code too specific… or maybe both.
A good test-driven developer always approaches the problem from the outside in, testing the most degenerate things first. All of the error conditions and boundary conditions, all of the simple stuff around the periphery of the problem before he goes for the guts of the algorithm.
Getting Unstuck
The greatest risks must be mitigated first. The greatest risk to test-driven development is getting stuck. That risk is mitigated by assiduously avoiding the behaviors that lead to it, specifically the behaviors that leap into the complexity of the problem as opposed to gradually increasing and incrementally stepping into that complexity.
If you are getting stuck, you probably have to step back and undo the last few cycles, looking for better tests: simpler, more degenerate ones.
Tests drive you to the solution
It would appear that if you write the tests in the correct order and continuously generalize the production code, the algorithm all but writes itself.
Before starting
The big rectangle represents every possible behavior of every possible program. By doing TDD, you write a test, and that test constrains the behavior of the program.
The desired area represents the desired behavior: these are our requirements, and this is what the tests are trying to force us to turn our program into.
Code represents the production code. And of course the production code initially fails, even the very first test. That’s why the production code is outside the desired behavior of the tests.
Making first failing test pass
Bring the production code’s behavior inside the bounds of the constraints of the test. Now that new behavior barely intersects with the desired goal at all.
Adding more passing tests
Then write another test, further constraining the behaviors. Make that test pass by generalizing the production code. This does much more than just make the test pass: it expands the behavior of the production code.
And continue this, and so it goes, step by step tightening the constraints of the tests. Step by step expanding the behavior of the program, by generalizing it, gradually approaching the desired goal. Until finally the goal is met.
Final Goal
When the goal is met, the tests still allow many undesired behaviors. This is the nature of tests. Tests cannot fully constrain a program. They can add constraints but they can’t specify its final behavior.
Tests can only prove a program wrong, they can never prove a program right.
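A small sketch shows how a passing suite can still allow undesired behavior. The hypothetical `is_leap_year` below passes all three of its tests, yet it is wrong for the year 1900, which is not a leap year in the Gregorian calendar.

```python
def is_leap_year(year):
    # Passes every test below, but ignores the century rule,
    # so it wrongly reports 1900 as a leap year.
    return year % 4 == 0


def test_2024__leap():
    assert is_leap_year(2024)


def test_2023__not_leap():
    assert not is_leap_year(2023)


def test_2000__leap():
    assert is_leap_year(2000)
```

The suite can show that this function is wrong only once someone adds a test for 1900; until then, green tests say nothing about that input.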
And if this is so, then how can you meet the desired goal of the program?
The craftsman
The desired behavior is achieved because there is a human being in the loop: you. You’re smart. You know what the desired behavior is, and so you gradually increase the generality of the code until that desired behavior is achieved.
And this takes patience, it takes creativity, it takes endurance, care and drive. In short, it takes a programmer.
But not just any kind of programmer, it takes a craftsman to do this well.
Benefits of TDD
Reduce debug time
Debug time will approach zero.
When you apply TDD, everything always worked a minute or so ago. How much debugging do you think you would do? Don’t spend time on debugging; spend it on writing working code.
Low level documentation
Complete and reliable low level documentation.
The tests are a low level design document:
- They are written in a language that you understand
- They are utterly unambiguous
- They are so formal that they execute
- They can’t get out of sync with the application code
Improved design
Writing tests first makes production code testable.
The only way to test lines of code is to access them from the tests, and the only way to access them from the tests is to decouple the functions that contain them. The act of writing your tests first causes you to have a system that is far less coupled than otherwise.
In short, you get a better design simply by writing your tests first.
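As a small sketch of this decoupling effect, consider a hypothetical `is_expired` function. Writing the test first makes it awkward to read the system clock inside the function, so the current date becomes a parameter, and the function ends up decoupled from the clock.

```python
from datetime import date


def is_expired(expiry, today):
    """Hypothetical function: `today` is passed in rather than read from the clock."""
    return today > expiry


def test_day_after_expiry__expired():
    assert is_expired(date(2024, 1, 1), today=date(2024, 1, 2))


def test_expiry_day_itself__not_expired():
    assert not is_expired(date(2024, 1, 1), today=date(2024, 1, 1))
```

Production callers can pass `date.today()`, while the tests stay fast and deterministic.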
Courage to change and clean code
Code rots, because you are afraid to clean it. So, to keep the system clean, you have to eliminate the fear of change. If you are not afraid to make changes in the code, then you are not afraid to clean it. Only a suite of tests that you trust can eliminate that fear.
This suite of tests is provided by TDD. The tests stop the code from rotting.
Conclusions
It’s hard to do more than one thing at a time. It’s hard to focus on making the code work and keeping it clean at once. With TDD you do these things at separate times: first make the code work and then clean it up. That’s why TDD does not slow you down; it speeds you up. Because the only way to go fast is to go well. And you need to keep your code clean in order to go well.
TDD is a predictable way to develop, where tests drive you to the solution.
TDD is about you, it is a personal decision.
What’s next: TCR
Test and Commit or Revert (aka TCR) is a new programming workflow. Every time the tests pass, the code is committed. If any test fails, the code is reverted (and you lose your last changes).
The commit or the revert is done automatically.
In this way you always have all tests passing.
This workflow guides you in adding functionality in small increments.
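The decision at the heart of the workflow can be sketched in a few lines. This is only an illustration of the logic: the `tcr` function and its three callables are hypothetical, and in practice they would wrap the project’s test runner and its version-control commands.

```python
def tcr(run_tests, commit, revert):
    """Sketch of test && commit || revert, with each step passed in as a callable."""
    if run_tests():
        commit()
        return "committed"
    revert()
    return "reverted"


# Usage example with stand-in callables that only record what happened.
log = []
tcr(lambda: True, lambda: log.append("commit"), lambda: log.append("revert"))
tcr(lambda: False, lambda: log.append("commit"), lambda: log.append("revert"))
# log now holds one "commit" followed by one "revert".
```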
References
- TDD & The Lump Of Coding Fallacy by GeePaw Hill
- 7 minutes and 26 seconds by J.B. Rainsberger
- Test Desiderata by Kent Beck
- One page intro to micro tests by GeePaw Hill
- The World’s Best Intro to TDD, Level 1: TDD Done Right by J.B. Rainsberger
- Five Underplayed Premises Of TDD by GeePaw Hill
- TCR by Kent Beck
- test && commit || revert. What?! by Kari Eline Strandjord