COBRA Testing
What type of test should I write? I should always aim to have COBRA tests, where COBRA stands for:
C: Clean
All of your tests should be clean. Follow the principles of Clean Code (from Uncle Bob Martin) and all the other guidelines you follow in your production code. All test methods should communicate their intent cleanly. Organize your test methods, and break down your test classes if they are too big. (Isn't your class under test too big if its test is too big?) Follow the general advice of good coding even if it is not production code.
O: Optimized
Your test code should be optimized, in both its effectiveness and its scope. Effectiveness is easy: it should be fast. Very fast. Scope is a bit trickier. As David Scott Bernstein puts it: your tests should test only one thing, and that thing should be tested by only one test. So if you change something and multiple tests can fail, you have a scope problem.
B: Believing
Integration tests and end-to-end tests are great: you can see your whole application moving. But the downside is that you must wait for your whole application to move. Most of your tests should avoid that. Use mocks to inject expected behavior instead of trying to recreate the exact scenario in code that belongs to things you don't want to test. You can only believe in your dependencies if they have tests you can rely on and they stay faithful to their promises, which are guarded by those tests. So write your tests the same way: let others believe in your code.
R: Reliable
If your tests sometimes fail without indicating any problem, they are not reliable. If they don't exercise your code correctly, they are not reliable. If you change them whenever something breaks, they are not reliable. If they are not reliable, you will not care. If you don't care, you just waste time and energy by running them. Write reliable tests, so that when they pass, you know you are good to go.
A: Accelerating
Your tests should make your work faster. If you find yourself trying to avoid test execution so you can work faster, then they are not accelerating you. Your tests could be slow or unoptimized (maybe because they are not believing in others), so you don't feel the extra boost they can give you in your everyday work. Fix them. Make them a COBRA, so that running them every time you save your code gives you the fastest and most reliable feedback that you are doing things right. It will set your wild developer dreams free!
Who needs to read any further? Hopefully: nobody. If this wannabe manifesto is clear enough, then all the following words only make this thing unnecessarily longer. Anyway, I will try to add more explanation to my ideas; if you are interested, please go on and find the details.
What makes a test clean? There are simple rules (tips, rather) and not so simple ones. Consider the following:
Name what you test, and by this I don't mean the class or method under test but the case. Think about this example: you have a class called Cup. If you have a test called CupTest, you know clearly that it belongs to Cup, but what has happened to your software when CupTest says failed?
Let's go on. You might have a method called fill on Cup, and you might have a test method called testFill in CupTest. If this fails, at least we know that something has happened around filling the cup.
Now make the failing test testFillThrowsExceptionOnTooMuchLiquid. This is much better. If this fails, we know something has happened with the case where we expect an exception to be thrown when too much liquid is poured into the cup.
Can it be better? Well, here comes the debatable part where you must put your engineering hat on and make your decisions based on your knowledge, customs and taste. (Taste? Did I write taste here? Sure I did! But it is not possible to formalize taste! Yes, that is true. There are a lot of gray areas in making "perfect" software, so we can't automate them and still have high paying jobs. So be happy that I cannot (and I believe no one else could) give you a silver bullet.)
Imagine your test is called onOverfillCupReportsError. Now we are basically at the previous state. This method name might be a bit less of a mouthful. It is shorter for sure, and it has got rid of the word test (also a matter of taste), which is just clutter: if I have a failing test in the class CupTest, I don't need to see the word test in its name to recognise it is a test. (In Java I usually have a @Test annotation over it, or it has a test structure, or it ends with expectations, etc.) It also got rid of the word liquid, which is irrelevant. (If I could overfill the cup only with solid objects, like sugar, then it would have to be expressed somehow in the description that the case is about liquids.) We have removed the mention of the exception, which is also debatable. What is much better is that too much is now defined as overfilling the cup. (Think about it: 2 dl of vodka can be too much but can fit nicely in a teacup.)
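To make this concrete, here is how that final name could look in a JUnit 5 test. This is a minimal sketch, assuming a hypothetical Cup class with a capacity constructor and an OverfillException:

import static org.junit.jupiter.api.Assertions.assertThrows;

import org.junit.jupiter.api.Test;

class CupTest {

    @Test
    void onOverfillCupReportsError() {
        Cup cup = new Cup(2); // a cup that holds 2 dl

        // pouring more than the capacity must report an error
        assertThrows(OverfillException.class, () -> cup.fill(5));
    }
}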
Beside the name of the test method, modern tools give you a way to express your case much better, with descriptions that can contain spaces and real sentences. Use these options, especially if you have a complicated case to explain. But still try to be firm and expressive. Describe your case with few words, which makes it easier to understand. This is especially true if your case is complex: a short summary can help a lot. (Why is your case complicated? Isn't it trying to test something which is too much for a unit test? Shouldn't it be an integration case? Shouldn't your classes be more specific and better structured? Doesn't your class have too many dependencies and/or responsibilities? Sometimes your test problems are symptoms of bad design in the code under test.)
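In JUnit 5, for example, the @DisplayName annotation carries such a real-sentence description next to the method name. A minimal sketch:

import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Test;

class CupTest {

    @Test
    @DisplayName("overfilling the cup reports an error, no matter what is poured")
    void onOverfillCupReportsError() {
        // body as in the previous sketch
    }
}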
Look at your test code. It should have three stages: setup, action, and validation.
First, the setup. If you are not developing truly pure functional software, you will have state. The setup is everything that has to be true before the action happens. Take this example case: "logged in user can not register". (This test description does not tell how the user is prevented from registering; whether that is good or bad, the details would decide.) This description tells us to have a setup where we have a logged in user. So in the setup we will ensure we have a user (the actor who will be prevented from registering) and that this user is logged in.
There are many tools to set up the initial state of a test. (Hence I had a lot of keywords to write in the first place.) You might have shared states or default states (default test configurations) that let you leave this part of your case empty. This does not mean that you don't have a state, it only means it is defined elsewhere.
Even though you have many tools to deal with this, you still want to keep it small. Every piece of state you need to set up raises the questions: Why do you need that? Why do you do it this way? What are you setting up in the first place? If you have to create a session factory, a password encrypter and a lifecycle scheduler just to get your logged in user, your test will be overcomplicated and fragile. (This also shows you might need to refactor your code.)
Try mocking your dependencies. This can save you a lot of lines of code. You can tell your mock in one line to perform the action you crave and not bother with other cases. If you have to set up many mocks because your method under test uses the first mock to get the second, to get the third, and execute a method on it, that can be a sign that the API of your code is not the best, or that the intention of your method is too far from its state/representation. (If it needs a method three objects deep, why doesn't it depend on that third object directly?) Sometimes you have to make these bad choices (are you using a 3rd party API?), but your test can show you if something could be better in your design. (Like wrapping that 3rd party in a better, cleaner API of your own, depending on your good wrapper code, and only having these problems in a centralized test of your wrapper.) See more at the "tell, don't ask" object oriented design principle and the "train wrecks" code smell in refactoring.
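Such a train wreck looks something like this in a Mockito-style test; the names are hypothetical, the point is the three mocks wired together just so the method under test can reach one value:

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.junit.jupiter.api.Test;

class TrainWreckTest {

    @Test
    void threeMocksJustToReachOneValue() {
        // Three mocks, wired together, only so the code under test
        // can call order.getCustomer().getAccount().getBalance().
        Order order = mock(Order.class);
        Customer customer = mock(Customer.class);
        Account account = mock(Account.class);

        when(order.getCustomer()).thenReturn(customer);
        when(customer.getAccount()).thenReturn(account);
        when(account.getBalance()).thenReturn(100L);

        // If the code depended on Account directly, one mock would do.
    }
}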
Next, the action. This is the one thing you want to test. This is the only thing you are testing. If your test has two actions in it, and one of them must happen before the other (and you always expect it to happen correctly), then your first action is really part of your setup phase.
Having multiple statements in your action phase can show several things. If you don't expect all the statements to pass, they cry out for their own test cases. If you cannot tell which action is needed for the case to happen, you don't understand your code or your case well enough. If one action means multiple calls, then your API is not good enough (you have sequential coupling).
Think about it: even in cases where you must have sequential coupling, like handling a file, you can have cleaner tests. It is easy to see that opening the file, doing something, and then closing the file can, most of the time, be tested with acceptably clean code.
You can write tests around all the issues of open (like a missing file, missing privileges, etc.) in the tests of open, and later on accept that all of your other test cases can open a file, and put this step in the setup phase.
If your code is responsible for closing the resource, have tests about close in their own separate cases and deal with the issues there. After that, your action will always have two calls: the tested call, then the closing call; but you don't expect the second call to fail, it has its tests elsewhere.
You can even build an API over this coupling that implements the strategy pattern, where the file is always opened before the action and closed after. In this case your tests (and the users of your API) can focus on what they do instead of how they do it.
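A minimal sketch of such an API, assuming a hypothetical FileTemplate class; the open and close are guaranteed by the wrapper, so tests of the actions never deal with them:

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

public final class FileTemplate {

    private FileTemplate() {
    }

    // The file is always opened before the action and closed after it,
    // even if the action throws.
    public static <T> T withFile(Path path, Function<BufferedReader, T> action) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(path)) {
            return action.apply(reader);
        }
    }
}

Tests of withFile itself cover the missing-file and privilege problems once; every other test passes in its action and focuses on what it does.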
Finally, the validation. This is the part where you check that everything has happened as you expected. If you miss this part, why do you have a test in the first place?
When I was learning about testing, I was taught to have only one check in a test case: the one thing this scenario tries to validate. If you have to validate multiple things, write more tests. But this is not what I have seen in real life. The problem with multiple checks is that you don't automatically know why your test has failed, and you don't know whether your failure is the only failing case or the upcoming checks would have failed too. Even so, we tend to write multiple checks in our tests. This is because we are either too lazy to write multiple tests with the same setup (here I must mention keeping your tests DRY, a.k.a. Don't Repeat Yourself) or we find it too slow (in execution time) to repeat the test case.
For the first case, everybody has a solution up their sleeve: extract the common code to a well named class or method and use it. The second case can show that we have strayed into the territory of integration tests instead of unit tests, and we could do wonders with some mocks. Anyhow, modern test frameworks have solutions to execute multiple checks in one test case and report all the failures and passes, which makes this problem smaller; but still try to reduce your validation phase to a single assert.
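JUnit 5's assertAll is one such framework solution: it executes every check and reports all failures together instead of stopping at the first one. A sketch of a validation phase, reusing the hypothetical cup:

import static org.junit.jupiter.api.Assertions.assertAll;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;

// Both checks run even if the first one fails, and both failures are reported.
assertAll("cup state after filling",
        () -> assertEquals(2, cup.currentVolume()),
        () -> assertFalse(cup.isEmpty()));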
What is also important is that in this part of the code you can provide an extra description of your test case; more exactly: why the test has failed, or what you were expecting. Here you can also find two schools. One writes positive messages, like "after doubling, content volume should be 4", followed by the comparison that we were expecting 4 but got X. The other writes negative sentences: "after doubling, volume is not right", followed by the comparison of expectation and reality.
Where the debate is strong, one side says the reader is interested in what the desired state is; the other side says they want the test to log why it failed, so the test report should read like the list of crimes of the code. I am happy with both of them; I even like to mix them, whatever feels more natural.
Your test code can be optimized in several ways. First of all, there are the lines of code, which you want to keep at a sane number. Then there is the execution time. But there is a third thing: what does it test?
A lot of this was already mentioned in the Clean section of this longer explanation. Most of it you don't even need: you are a software engineer, you know how to write good code, don't you? If you need more input, you can find much better materials in bookstores.
Still, there is an aspect of test length that is worth talking about. Imagine a class which does very few things. Let's say it has three public methods, all of them only 2 lines long; the whole class is less than thirty lines. That does not sound like too much, does it? I think it is not; at least, not for me. Yet when we write our tests cautiously and thoroughly, we end up with 300 lines of code in our test class. How can that be?
Even a small class can have a wide aspect. There can be multiple reasons why a particular (small, public) method fails. If we pragmatically write separate test cases for all the issues, we can end up with a very long test class. The problem with long test classes is the same as with long classes: we don't want to scroll up and down to find what the problem can be when we are investigating something.
First of all: if covering all the aspects of a class requires more tests than is convenient to handle, this might be a symptom of a class under test with too wide a scope. It might be a good idea to refactor the above mentioned class and separate its responsibilities.
But of course there can be an exception: what if the small class is a facade whose sole purpose is to bring a wide range of responsibilities under one object? Then you can still separate your tests. You can have a test class for one aspect, and other test classes for other aspects. If that would mean too many lines of code duplicated between the test classes, then clean it up and introduce a shareable object. (If you don't know how, read Robert C. Martin's excellent books about it.)
When is a test fast enough? Everybody has their number: a test should finish under X (milli)seconds. I don't know what the best value for X is (I like to keep my tests under one second); my rule of thumb is: your tests should be faster than how fast you lose focus. Executing the tests during development should be an automatic side track of your brain. If you have to stop and wait for the tests, you will lose interest and they will lose their purpose (helping you during development).
What should you do if you find that your tests are not fast enough? Of course, you should sit down and optimize them. Why do they run slow? What do they do that takes so long? Am I saying that you should profile your tests? Yes! If you cannot tell by looking at your (test) code why it takes so long, then sit down and profile them. You can say that you do not have time for this, but that is just an excuse.
What I have seen is that all code is written for eternity. It should be that good. And if a piece of code will stick with you for the rest of your work, its tests should be fast enough for you to be able to forget about the code and its tests in the first place. (Ultimately you can get rid of the tests, but then you will face the issue of fearing to change the code, because you don't know when it will break.) Not stopping to speed up a test is like not stopping to fix the gearshift in your car. You can do it, but in the long run you will get further with a fully working car.
So how fast? If you keep losing focus during test runs, then you need faster tests. That fast!
So a test should only test one thing, and that thing should only be tested by one test… In other words: any line changed or removed should break one test and one test only. Do you feel that this is impossible? You are right, but bear with me.
Just think about a constructor: if all of your tests create an instance, you can break all the tests by breaking the constructor. So we can never reach this heavenly state. That does not mean we should not aim for it.
If you have one small part of your code that is hit by all the tests, that's OK. If 90 percent of your code is hit by 90 percent of your tests, that is not OK. It means you are writing integration tests, and you should not have 90 percent integration tests. (Because your test execution will be slow, and you will lose focus.)
If it keeps happening to you that you change one thing and another class's tests fail, and you wonder how that is possible, then your tests have too wide a focus. (Except for the test that belongs to the class you were changing. That test should have broken, informing you why what you have done is bad.)
If you are working on class A and a change breaks test A and test B, that is not necessarily bad, if your tests run fast enough and the failure message of test A explains how to fix your code to make both pass again. You have problems when this phenomenon makes you go slower: either by the tests taking too long to execute, or by not having a failing test among A's own tests, so you have to go to test B to understand the situation.
No matter what you do, your test will exercise many things beside its scope. (The test framework, the execution environment, etc.) But you should be able to consider all of these rock solid and lightning fast. If that is not true, then try to get around the problem, so at least your test will be better, faster, more reliable.
We have arrived at a hard topic. First of all, it is hard because this letter was changed along the way to an adjective, so you can ask questions about your tests the same way. (Is it clean? Is it optimized? Is it believing? Is it reliable? Is it accelerating? All yes: we have a COBRA test.) The other problem with believing is that it is strongly coupled with reliable. If you believe in others, it can help you a lot, but you can only believe in reliable people. Two sides of the same coin.
It was mentioned earlier that we want to write fast, focused tests. In other words: we want to write unit tests. Every time I have seen a slow test, it was because of its scope. Even in cases where the scope touched only a single method and the method itself was too slow, the problem was with the scope of the method. Every time, the problem could have been solved by mocking; sometimes it was so much work that we agreed to leave it that way. But every time we were too lazy to fix the problem, it was because the design of the underlying code was too bad, and we were too lazy to fix that. (And not to mention: we were too afraid to change it, since it was in production and it was working, only the tests were bad; and it would have been hard, since it had bad and slow tests, etc. These are only excuses. Bad excuses.)
Unfortunately, I have also faced the issue of a mocked solution being rejected. It was rejected because it was not the real system, just a patchwork that imitates the real system, and who knows if it still works with the real system. (It also happened that the real system was broken later and these tests did pass, but more on that in the reliable section.) All in all, mocks were found fragile and untrustworthy. Even when ten lines of setup were replaced with one line of mock setup and test execution time fell to a fraction, I faced doubt.
We had a system where class A, under test, used class B. We had a test case where class A tried to do something with class B that resulted in an exception. The test was about class A's method handling the exception properly. Such a scenario in a well designed mock framework can be set up in one line:
when(b.method()).thenThrow(exception);
But my colleague insisted on setting up B correctly to throw the exception. We also had to understand the inner workings of B to force it to fail. After running a code coverage tool on the tests that were written to test A, we found worse test coverage on class A than on class B. I was told this is good, because if anything goes wrong in class B, even the tests of class A can catch it. Sounds good, doesn't it?
Well, we had a problem that our tests were running slow, and when we checked code coverage we found that basically any test class gave around 55% coverage on all the classes in the code. Even so, we did not catch more bugs in general. All these test classes were executing the code without exercising it. You must understand the very important difference between these two things.
Take this example, which comes from real life. (I have seen the guy who heard about it.) Imagine that we have a method that gets a list and finds its maximum value. (First of all, if you are not developing in an ancient language, you have a built-in solution for this in your standard library. Use it. Trust it, and don't write tests for it. If, on the other hand, you happen to work with an ancient language, then you have a solution for it in a library. Use it. Trust it, and don't exercise it in your tests.) We had a class under test that was using this findMax method. The tests were slow. Most of the execution time was spent in the maximum search. After we introduced a mocked version of the method (one that always returned the first element of the list), tests started to fail.
It turned out the method had a side effect: it also sorted the list before it returned the (last) element (after the sort). This is still easy to mock: your mock can replace the incoming list with a predefined sorted list, or even better: use very short lists. Sorting three or five elements (or just two?) is basically free and exercises the same thing as sorting a list with a million. After a little digging it turned out that there were multiple places in the code where findMax was executed just for the side effect, to get the list sorted. We could even find execution paths where the same list was re-sorted several times by different methods. A fast mock helped, but not as much as using small lists instead of really long ones.
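The guilty method looked roughly like this (a reconstruction with illustrative names, not the original code):

import java.util.Collections;
import java.util.List;

// "Maximum search" with a hidden side effect: it sorts the list in place,
// and callers all over the codebase silently depended on that.
static int findMax(List<Integer> values) {
    Collections.sort(values);
    return values.get(values.size() - 1);
}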
The next day, my colleague came to me to revert the code from the day before. He was uneasy about not having the same level of connection in the system. The previous solution was using the real algorithm, with real data: if that works, the production code works. That was true, but we were talking about a sort and a maximum search (in a freshly sorted list), which is nearly as basic as incrementing a variable. But what if someone replaces the algorithm with something that is slower or requires more resources? How will we know?
Not believing basically boils down to this fear: if I am not exercising my code as it will be used in production, then something else will go wrong and I will not know about it. Let it go. Trust your fellow engineers.
If everything goes wrong and there is nobody you can trust (then look for a psychiatrist, they can help), trust yourself. Cut the untrustworthy code out of your daily work as if it were cancer. Separate it with a good wrapper (or facade, or whatever you need), write good tests for your own code that exercise it thoroughly, then use your code everywhere and mock it in your tests.
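A minimal sketch of that separation, with hypothetical names (PaymentGateway, ThirdPartyClient): the rest of the codebase depends on, and mocks, the small interface you own, while the untrusted code is exercised thoroughly in one place.

// The interface you own; everything else depends on this and mocks it in tests.
public interface PaymentGateway {
    void transfer(String accountNumber, long amountInCents);
}

// The only class touching the untrusted code, covered by its own thorough tests.
public final class ThirdPartyPaymentGateway implements PaymentGateway {

    private final ThirdPartyClient client; // the code you cannot trust

    public ThirdPartyPaymentGateway(ThirdPartyClient client) {
        this.client = client;
    }

    @Override
    public void transfer(String accountNumber, long amountInCents) {
        client.send(accountNumber, amountInCents);
    }
}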
Let me tell a story. I read and instantly fell in love with Robert C. Martin's Clean Code. Later, I read Working Effectively with Legacy Code by Michael C. Feathers, and it felt like blasphemy. Some years later, life gave me the chance to chat with Michael Feathers in person. I could not resist challenging him.
We came down to a situation where he advises how to deal with (untestable) code that is tied together by a singleton. His advice is fairly simple: make the singleton settable. When you hear it for the first time, it is very hard to swallow. It goes against every idea you have learned before: a singleton (besides very likely being an antipattern) must be a unique instance; there is no need to set anything. And of course there is the problem of responsibilities and so on. And he said that yes, all of that is true, but it helps. I insisted that we had made the code less clean, and he answered that this is true, but the code is better, because now it is tested and we can work with it.
I did not give up. He has lots of ideas in his book where he simply lifts barriers that are in the way, barriers nobody lifted before just because everybody agreed they are a bad thing. These things never have business value, only engineering ideas (like thinking that a singleton must be the same in the tests as in the production code); they were invented to help communicate intentions between software developers, but they have gotten in the way over time. (Don't forget, he is talking about legacy code, which we try to change as little as possible.) So, as I said, I did not give up, and threw in a new question: what if somebody changes my singleton in production at runtime? He was like a zen master: Tell them not to. But what if they forget? Create a setter method that clearly communicates your intentions, like setInstanceForTesting. But what if somebody sets it just because he can? And he looked into my eyes and told me: there is always the possibility to wait for him in the parking lot and beat him up.
Of course, he was joking, and he explained to me that the value of making the code testable and then fixing the issue is much greater than the cost of the change. (And again: this is true for legacy code, not your everyday working set.) I have started to believe that none of my colleagues are evil, and the techniques I read in the book have worked marvelously for me.
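For the record, the settable singleton from the story looks roughly like this; a sketch with a hypothetical Configuration class, following the book's static setter idea:

public class Configuration {

    private static Configuration instance = new Configuration();

    public static Configuration getInstance() {
        return instance;
    }

    // The name alone communicates the intention of this hook.
    public static void setInstanceForTesting(Configuration testInstance) {
        instance = testInstance;
    }
}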
By the way: have you ever implemented a Hamcrest Matcher? That interface has one of my favourite method names:
_dont_implement_Matcher___instead_extend_BaseMatcher_()
When I first showed this draft to one of my friends, I got the critique that R stands for Repeatable in FIRST (Fast, Isolated, Repeatable, Self-validating, Timely; write your tests FIRST!), so I should use the same. If you are as old as I am, you have had countless job interviews (on either side) where somebody popped the question: what do the letters stand for in test FIRST? So I know what he meant. I just feel there is so much more to this issue than what repeatable means.
I must admit FIRST is great. If you choose to follow only one practice in your life and you choose test FIRST, you will be OK. And repeatable is very important in COBRA as well. Your test will never be reliable if it is not repeatable. What makes it repeatable? First of all, it does not leave trash behind, so if you could run it once, you can rerun it and will get the same results. If it ever becomes green, it should always be green (until something is changed in the code). If it becomes red, it should stay red until something is changed in the code to make it green.
The important part in the above sentences is the "something is changed in the code" part. If the something you change is resetting a database, then you are not doing it right. Your unit tests should exercise your code, not your environment; therefore, changes in the environment should not affect your results. How do you separate your class from your external dependency in a test case? With some test double, of course. The next question is usually this: but how will I know if my test double is imitating the real thing acceptably? The answer is: by reading the documentation and being very pedantic.
Take this scenario: you have introduced a layer between your application and the database. In your tests you mock this layer and specify what to answer to different select statements. You make a spelling mistake in your SQL in one of your classes, like SLECT * FROM mytable. How will your test double spot it? First of all, when you set up your mock you should not copy the SQL statement, but retype it. This way it is less likely that you make the same mistake twice. On the other hand, you have the opportunity to use an in-memory database in your tests, which is another test double, written by someone else. (Basically you have cut the problematic part out of your code and separated it in another well tested codebase which rarely changes, only when you decide to update.) All in all, it is better to have a repeatable test which makes some mistake reliably than to have an unreliable test. Why? Because sooner or later your mistake will turn out, and then you will find the root cause, fix the test, then the code (optimally you will fix the test first), and you will have a reliable test (that you don't have to check again). Not to mention that if your test is not reliable, the cases when it is red will become noise, and you will end up in a situation where you don't even know if green means good or not.
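The retyped expectation would look something like this in a Mockito-style setup (the data access layer and its query method are hypothetical):

// Typed by hand, not copied: if the production code says "SLECT", the stub
// does not match, the code gets no rows, and the test fails, exposing the typo.
when(dataAccess.query("SELECT * FROM mytable")).thenReturn(expectedRows);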
So what else do I mean by reliable, beside repeatability? I also mean that the tests do a fairly good job of exercising your functionality. What does this mean? Let's state it: code coverage has nothing to do with being well tested. You can have high coverage without being well tested, and in some rare cases you can have very poor coverage and still protect the biggest value in your code with well defined test cases.
Let's discuss the second case first, because it is rather a design smell. Imagine a class that has a bunch of getters and setters, a very detailed toString, hashCode and equals, and beside them one crucial algorithm in one method. We can write tests for this method, cover only its code, and have very valuable insight into whether our code is good or not. Why is this possible? First of all, the uncovered areas I have mentioned can be considered clutter / boilerplate / garbage. All of them can be generated by a language feature, a library, a framework or your IDE. They are so basic, they have no value to test. The second thing that pops to our eyes: if testing only that one method is enough, why do we have all this code in the first place? Why do we have an equals, if it is not important whether it works or not? Why do we have a hashCode if we don't care about it? (Maybe we are working in Java, where if we need an equals we have to have a hashCode as well, but in that case we must make sure that the hashCode is working too.)
And the toString? Well, it is trickier. If it has business value, then test that business value. (Just an example: you might have a User object that holds a username and a password field. It is a very valid requirement that the password is kept hidden when somebody prints out the object. In this case you should have a test that checks for the password in the String representation of your User object, and breaks if it finds it.) Sometimes it only has some added value for development. In this case you have an idea what values you want to see in the String, so your tests can look for those. Or your toString behaves differently based on object state; then test those cases.
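The password example, written down as a test; a minimal sketch assuming a hypothetical User class:

import static org.junit.jupiter.api.Assertions.assertFalse;

import org.junit.jupiter.api.Test;

class UserTest {

    @Test
    void toStringKeepsThePasswordHidden() {
        User user = new User("alice", "s3cr3t");

        // Breaks as soon as anybody leaks the password into toString.
        assertFalse(user.toString().contains("s3cr3t"));
    }
}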
One last thing: if we can test an object thoroughly through just one method, ignoring all the getters and setters, then why do we have them in the same class? If the states have no impact on the method, they cry out for their own class. If the states of the object do affect the method, then for good test coverage of the functionality we have to alter the states of the object. By this, even if only as a side effect, the setters are tested as well. (Their value is tested: they are such basic code that they don't deserve their own test case, but their impact on the software is guarded by tests.) In the end, we cannot make a good test case for the getters (except if the effects of the method can only be tested through getting values; then we have the side effect testing, just like in the setters' case). Remember, our thesis was that the value of the code is well tested by testing only one method of the class. So everything is well tested and the code of the getters is never executed? Then delete them. They are probably only leaking information out of your class. You don't need them.
Now go back and look at the case where we have a high code coverage value with poor test coverage. Take the easiest example: you have ten lines of code implementing an algorithm that has a trace log after every step. Your tests will (most probably) exercise the algorithm only, and merely execute the logging. Of course you will have code coverage over all of your code lines, but your tests will pass even if you delete all the logging lines. This is the difference between executed and exercised lines.
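A minimal sketch of the difference (the names are illustrative, and log stands for an SLF4J-style logger): every line below is executed by any test of the method, but only the middle line is exercised, because no assertion depends on the trace lines.

int discount(int price) {
    log.trace("calculating discount for {}", price); // executed, never exercised
    int result = price > 100 ? price / 10 : 0;       // exercised by the assertions
    log.trace("discount is {}", result);             // executed, never exercised
    return result;
}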
Let's get away from logging for now. Imagine a method that has five lines, which are all private method calls, and in the end it returns the result of the last call: a single boolean. We have two tests: one for the true case and one for the false case. It is obvious that all five lines have been executed (twice), since we got to the last call, but it is possible that we could delete some of the calls and the tests would still pass. So, just like with logging, we are sitting in a case where something is not exercised properly. But of course we can have lines that are implementation details: those do not have to be tested. (They are desired to be executed during testing, but it is OK if you don't have specific cases that force those lines to be executed exactly that way.) Imagine a class that propagates events to a list of objects. It is not important whether you use a for loop, a while loop, a for-each loop or you write gotos to achieve the goal: every object gets the event. Similarly, you can handle exceptions one by one, or with one try-catch block around the whole method. The only important thing is: does your code survive an error case or not?
Take this example: you have a class called Accountant. It has a public paySalary method that takes an Employee, calculates their wages and transfers the money to the employee's bank account. This is the primary function of the method; this is what is obvious from the name. But paySalary can do so much more, like calculating the taxes and transferring those to the authorities as well. Now we have described a case where we have a legit method that doesn't even have a return value; all that it does is private. Of course, we could make the important methods public (or just package private in the case of Java) so we could cover them with tests, but that feels as bad as it sounds. Or we could introduce some inner state where we store the last transferred amounts, which we could access in the tests; but we should design our code the other way around: if something has a good reason to introduce an accessible intermediate state, then we can use it in our tests.
What we could do is cut Accountant's responsibilities into smaller chunks. We could introduce a dependency, Bank, which will be responsible for money transfers. Now, in our tests, we can use a test double for this class and verify that it is used correctly: all the right amounts are sent to the right accounts. With this we have also made the design better (so I think).
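A sketch of that test with a Mockito-style double; the Bank API and the amounts are illustrative assumptions:

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

import org.junit.jupiter.api.Test;

class AccountantTest {

    @Test
    void paysWagesAndTaxesToTheRightAccounts() {
        Bank bank = mock(Bank.class);
        Accountant accountant = new Accountant(bank);
        Employee employee = new Employee("alice", 100_000L);

        accountant.paySalary(employee);

        // No real banking involved, yet the transfers are fully verified.
        verify(bank).transfer(employee.getAccountNumber(), 67_000L);
        verify(bank).transfer("TAX-AUTHORITY-ACCOUNT", 33_000L);
    }
}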
But having all the banking details buried inside the Accountant could have been a design choice as well. We might want to hide how money can be sent from the company unless it is done through an accountant. If this is the case, only publish the interface of Bank, and still keep the implementation hidden in Accountant. You only need a way to link the two together, which you can solve with factories, different constructors, etc. The important thing is, you can make private details public and testable in meaningful ways without destroying your design. Not to mention that if these private details are exercised correctly, your tests will become more reliable.
Now we have got to the last bit of reliability. This will be shocking: your tests should not change with the implementation of your class under test. As I imagine it, I have two kinds of readers now: one who does not understand what I am talking about, and one who does not understand why I am talking about it in the first place. When I was learning about tests, I learned the concept of regression testing, which is basically a fancy name for executing the already existing tests. However, it is quite easy to find commit pairs in repositories where some change is followed by a "fix tests" commit. I know this can be debated endlessly, but the thing that should have been fixed was the first commit in the first place.
First things first: if a "fix tests" commit is really valid, it can only mean two things: the tests were unreliable and failed for no reason, or you knowingly and deliberately introduced a breaking change. Why am I saying that? Because if your tests were reliable, then they were enforcing some values of your class or method. Somebody might have built some belief on that.
When I was learning about microservices, I came across the concept of contract testing. A contract test is nothing more than a test where another team writes down how they use your API, and you are free to make any changes until you break these tests. Whenever that happens, you cannot fix these tests yourself; you can only ask the other team to fix them for you. This way they are informed about the change, and you can only go on when they have a method for working with your new API. Sounds great, doesn't it? We should always do that! Well, let's discuss it.
The thing that validates this in the case of microservices is that the other team is not in control of what you publish. They are using your API directly and (if you have done your homework well) you can replace your system without them ever noticing a thing. When we go back from super-modern microservices-based isolated architectures to good old binary-linked software, we have one more thing to keep us safe: semantic versioning. Beside all other possible solutions, this is the best known and most widespread way to automatically select between acceptable and not acceptable releases. Basically, every modern build tool nowadays provides a way to say: give me all the compatible enhancements. This means: I am happy to get bug fixes and new features, but I want to decide when to upgrade to a breaking change. In other words, contract tests are needed in microservices development because there the depending developers are not in charge of controlling our updates.
So if I have to fix a test, it means I have broken one of my earlier promises. Every test is basically a promise that my code behaves in a certain way in certain circumstances. If I have done my homework well, every test promises an important value of my code. This value can be as basic as input validation.
Imagine that you are creating a class with a method that has some input parameters, one of them being a filename. In your implementation you check whether the file exists and throw a NoSuchFileException if it does not. (For the sake of Java developers, think of this as an unchecked exception.) You also write a test where you state: my class works well if it throws a NoSuchFileException when this method is called with a filename that does not exist. Later (much later, several releases later) you decide that in the case of a missing file you can create one with some default values. This is a breaking change. You have removed one of the values of your code (maybe for a greater but different value); you have broken one of your promises.
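The promise, written down as a test; a sketch with a hypothetical FileLoader class and load method:

import static org.junit.jupiter.api.Assertions.assertThrows;

import java.nio.file.NoSuchFileException;

import org.junit.jupiter.api.Test;

class FileLoaderTest {

    @Test
    void throwsWhenTheFileDoesNotExist() {
        FileLoader loader = new FileLoader();

        // The promise: a missing file is rejected, execution does not continue.
        assertThrows(NoSuchFileException.class,
                () -> loader.load("no-such-file.txt"));
    }
}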
Why is it so dangerous? After all, you have removed a case from your class that could lead to an error (handling) case. First of all, someone might have implemented error handling around your earlier promise that you only accept existing files. Someone might have expected their code to stop executing when it calls your method with an inappropriate filename. (This also has a smell, but let's move on.) Everybody has to check every invocation to see whether they can work with the new behaviour or not.
Take this example further: what if you have changed the above mentioned behaviour and now some other class's test has failed? (First of all, that test is not optimized; fix it.) Now we have some possibilities. One dimension is whether the filename was coming from the private methods of the class (like generated by some logic) or was an input parameter. The other dimension is whether the throwing of the exception is important or not. Handle the last case first. If the exception throwing is not important, no matter where the filename is coming from, the right question is why you have a test for it in the first place. So go back to the case where we explicitly don't want to continue the execution if the file does not exist. If this is an incoming filename, then why do we let our dependency check it instead of checking it ourselves? Fix this. (Maybe because our dependency extends the filename with some path information that we cannot provide to check; then we get to the following case.) If the filename is the product of our inner methods, and we believe that it is right (or we cannot validate it), and it is important to stop execution for wrong filenames, then we have perfectly described a broken case. This is an example of the broken API we mentioned above.
So we can never fix a test? Of course not! We are not perfect, we are but men, we make mistakes. It is possible that the test is testing something important in a bad way. (We have a bug in our test.) Then fix the test first and change the code after. If we cannot fix our test (make it behave like it should) before changing our code, then we are about to introduce an API breaking change. So let's say it again: when we are changing something that requires fixing a bug in a test, we should fix the test first, which means rewording our test to assert the same value in a new way that passes with the current code as well, and only then change the code under test. The keywords here are that the new test passes with the old code (as well) and asserts the same value. Then it is a valid test fix.
If we handle our tests with care, if we assert all the values of our code with them, if our test results are exact, and if we only change them when we can assert the same value or when we knowingly and deliberately introduce breaking changes (in which case we also inform our users about it), then we have reliable tests.
This is it. We have come full circle. This is why we are doing all of the above: to have accelerating tests. We are doing extra work (writing tests) to help our own work: so we can sleep well at night, so we can trust our commits, etc. Tests can only help us if we execute them and we trust their results. The best is to execute them regularly, maybe after typing any valid line.
Usually I don't write more than five or ten lines without executing them. Basically, I don't write two methods before executing the first one. (Refactoring with a valid and automated refactoring tool and generating code does not count.) The last time I was not doing this was when I was learning programming in Basic on a Commodore 64. Back then I did not have a text editor; I could only execute my code when I had written the last END at the end of my code, and whatever line number stood before it was the maximum line count of my application. That was the first moment I could execute my program. If something turned out to be bad, I had a problem. If I wanted to insert a line between two existing lines, I just had to write it with a line number that fell between the line numbers of the two. If there were no more integers between them? Bad luck, start from the beginning. I still remember how much easier it became when I could edit a file. I started to execute nearly empty programs which grew fatter and fatter with time and executions.
The same thing is happening with automated tests. The important difference is that the checking is done by a computer, which is much more rigorous and is not bothered by checking even the parts that I (think I) have not touched at all. Basically, I write tests to have an angel sitting next to me, telling me whenever I have made a mistake or whether everything is as I wanted it to be. (To tell the truth, I have a Batman figure on my desk who judges with his looks whether I am breaking my own rules or not.) This removes the doubt from my work and helps me focus on what is important.
This is how tests can accelerate my work. My problem here is that it is much harder to write down how to write accelerating tests than to explain how to spot non-accelerating ones. The number one sign is that you try to avoid test execution so you don't have to stop while you are productive. In this sentence, pay attention to the productive part. Because if you always execute your tests before you go and fetch coffee, then you do execute them regularly (especially if you refill your cup on a regular basis), but I don't think they are accelerating you.
When a test suite really helps during development, you don't mind executing it during the flow. Maybe you are not executing all your tests, only the tests that are connected to the part of the code you are working on, but test execution should be as natural and frequent as hitting the save button (or keyboard shortcut). This is especially true when you are refactoring. If you have a test suite that is reliable, you can know after every change whether your changes have introduced any regression or not.
If you try to execute your tests and you are always stopped by unrelated tests breaking, and by going there to understand how they use your object or how their class under test uses your object, then you can know that those tests are neither optimized nor believing; but at least they have shown you a broken promise in your code. So are those tests accelerating? No, they are holding you back. We also know that they have stopped you from doing something wrong, so if you can understand the problem quickly, you can come up with a different solution and save much more debugging time. In this manner, yes, they are accelerating. The key question here is whether the class's own tests have shown the error or not. If they have shown and explained the problem, then you are probably good to go. (Except if you find it painfully slow to run all the tests and you also don't have the courage to execute only the tests that are for this code segment. Then you have to do something with your test suite.)
We are not in the eighties anymore. We are not writing software that runs on a single machine, saves to a file, and is done. Typically, we are writing software that deploys itself onto many devices, communicates over the internet with third party servers, mines data from a database, and must be able to be put to sleep at any time because the user has to get off the bus now. We have also evolved into an unbelievable community. No other industry has managers and workers who go to events to share their best advice with their competitors, or who keep help hotlines to answer each other's questions. Not to mention the endless supply of open source libraries to work with. We depend on the solutions of others to go fast. This also means we have a lot of third party dependencies in our code.
All these third party dependencies can make our tests less accelerating. (No, not because they only seem to be nice but are evil to the bone.) Why is that? Because using a database is slow, communicating over the network is slow, emulating a device event is slow, firing up a browser to execute JavaScript in it is slow. No matter what. Not to mention that if something is complex, it is more likely to have some glitches. You don't want to find those glitches in someone else's code, especially because you cannot do anything about them. (Not to mention you should have separate cases for when the third party is working correctly and when it is not. And you want to decide when you get which.) So, when you are testing, you want a rock solid environment, so you don't have to care about the problems of others.
If all of this is true, if your tests are running fast, telling you what has gone wrong and why, always getting to the same result, and it is generally good to work with them and you are glad to have them, then you have accelerating tests. You are good to go; your COBRA is ready.
Let’s make some things clear:
Please, let me talk about how the name came together. First I was mulling over the most important aspects of a test that you must face; then I had some pretty good words to summarize them; then I started to play around with the starting letters, and after minor changes I arrived at COBRA, which is good, because: