Preventing Brittle Tests (and Production Code)

Have you written unit tests only later to find they slow you down because your changes cause a lot of test breakage? You think, these tests are not living up to the promises I heard. So you toss the tests and go back to business as usual.

Hey, not so fast. Maybe its not test in general, but your tests or your code. The first project I used TDD on in 1999/2000 we ran into this problem. Some were ready to give up. But I wanted the tests to work and looked for what was wrong. In our excitement we kept copying, pasteing and tweaking the test cases. They were an ugly (in hindsite) mass of duplication. Small changes made for ripple effects through the tests. But I could see, it was our fault.

Someone on Quora asked me “How do you make unit tests less brittle”. Here is a more complete answer based on having written my own bad tests and seen many learners of TDD and unit testing go the wrong direction with their designs and tests.

The short answer has two parts.
1) Refactor your tests, removing duplication and expressing intent.
2) Test through its public interface and its interactions with other code. Test the behavior. Don’t rely on internal access to the thing being tested.

Now, the long answer.

Removing duplication

Removing duplication, does not mean code duplication, but rather idea duplication. There is a subtle difference. Test should provide an example of the intended behavior. The 4 steps of the test should be clearly expressed using the four-phase test-pattern (setup, exercise, check, cleanup).

Take these two tests for example. (I am using C example code, but much the same discussion could be made around using friend or package scope access to the internals. I am not saying never do that, just realize you are making fragile tests when you do.)

TEST(LightScheduler, no_lights_controlled_at_wrong_time)
{
    LightScheduler_AddTurnOn(3, EVERYDAY, 1200);
    FakeTimeService_SetDay(SUNDAY);
    FakeTimeService_SetMinute(1199);
    LightScheduler_Wakeup();
    LONGS_EQUAL(NO_LIGHT_ID, LightControllerSpy_GetLastId());
    LONGS_EQUAL(NO_LIGHT_STATE, LightControllerSpy_GetLastState());
}
 
TEST(LightScheduler, light_turns_on_at_the_scheduled_time)
{
    LightScheduler_AddTurnOn(3, EVERYDAY, 1200);
    FakeTimeService_SetDay(SUNDAY);
    FakeTimeService_SetMinute(1200);
    LightScheduler_Wakeup();
    LONGS_EQUAL(3, LightControllerSpy_GetLastId());
    LONGS_EQUAL(LIGHT_ON, LightControllerSpy_GetLastState());
}

They are not long, but they are tedious and have a lot of duplication. Here is a refactoring that removes the code duplication but left the tests unintelligible.

TEST(LightScheduler, no_lights_controlled_at_the_wrong_time)
{
    WhatTheHeckDoICallThisFunction(3, EVERYDAY, 1200, 
                                  SUNDAY, 1199, 
                                  NO_LIGHT_ID,NO_LIGHT_STATE);
}

TEST(LightScheduler, light_turns_on_at_the_scheduled_time)
{
    WhatTheHeckDoICallThisFunction(3, EVERYDAY, 1200, 
                                  SUNDAY, 1199, 
                                  3,LIGHT_ON);
}

I was not serious about the name, but what would you call it? When you can’t name a test case well (or a function, module, file, class…), there is something wrong. With a focus on showing the 4 phase test pattern and removing duplicate ideas we might get this.

TEST(LightScheduler, no_lights_controlled_at_the_wrong_time)
{
    LightScheduler_AddTurnOn(3, EVERYDAY, 1200);
    TransitionClockTo(SUNDAY, 1199);
    LIGHTS_ARE_UNCHANGED;
}

TEST(LightScheduler, light_turns_on_at_the_scheduled_time)
{
    LightScheduler_AddTurnOn(3, EVERYDAY, 1200);
    TransitionClockTo(SUNDAY, 1200);
    THEN_LIGHT_IS_ON(3);
}

These tests tell their story. A light is scheduled, some time later the clock ticks to an interesting time, then verify that the system reacts correctly. Differences stand out between tests. They are easy to name. They focus on behavior. They will be able to avoid being brittle as implementation and requirements change. The only vulnerability of this test case I see, is if the LightScheduler_AddTurnOn API changes. Then you’ll have to touch each test case.

Avoiding internal knowledge

Another root cause of brittle tests is tests that know too much about the implementation details. Test generally should avoid accessing an object’s internal state. Your unit tests should be gray box while being created via TDD. You know the test that is needed to drive the next line of code you want to write. But you write the test by exercising behaviors. They are created with knowledge of the implementation. But you don’t let the internal details sneak into the tests. If you do not let internal implementation details into the test, you can later change the implementation (for refactoring or more suitable algorithm) and the tests do not care.

Take a CircularBuffer for example. If your tests have access to the internal counters and indexes, and some need comes that makes you change the meaning and usage of those counters and indexes, your tests will all be broken.

In C, if you let the data structure for the CircularBuffer be known outside of the CircularBuffer itself, tests can be written exploiting that knowledge. For example, this header file can be abused by both test code and application code.

#ifndef CIRCULAR_BUFFER_INCLUDED
#define CIRCULAR_BUFFER_INCLUDED

#include 

typedef struct CircularBuffer;
{
    unsigned int count;
    unsigned int index;
    unsigned int outdex;
    unsigned int capacity;
    int values[];
};

struct CircularBuffer * CircularBuffer_Create(unsigned int capacity);
void CircularBuffer_Destroy(struct CircularBuffer *);
bool CircularBuffer_IsEmpty(struct CircularBuffer *);
bool CircularBuffer_IsFull(struct CircularBuffer *);
bool CircularBuffer_Put(struct CircularBuffer *, int);
int CircularBuffer_Get(struct CircularBuffer *);
unsigned int CircularBuffer_Capacity(struct CircularBuffer *);
#endif

Code that uses CircularBiffer.h has full visibility of the structure. A user could decide to not use CircularBuffer_IsEmpty, and look at the count member and see if it is zero. Let’s say that later you discover a concurrency problem with this implementation, you might change the implementation to not need the count member. Now all the users of count won’t compile. If you had tests that looked at count, they’d be broken too. Fragile code and fragile tests are not fun.

A better approach hides the internal details. Using a forward declaration in the header file and putting the full struct declaration in the source file helps protect the internals. Your tests only depend on the public interface.

#ifndef CIRCULAR_BUFFER_INCLUDED
#define CIRCULAR_BUFFER_INCLUDED

#include 

typedef struct CircularBuffer CircularBuffer;

CircularBuffer * CircularBuffer_Create(unsigned int capacity);
void CircularBuffer_Destroy(CircularBuffer *);
// snip

These tests define the needed behavior. Behavior defining tests will continue to be valid for different implementation choices.

TEST(CircularBuffer, is_empty_after_creation)
{
    CHECK_TRUE(CircularBuffer_IsEmpty(buffer));
}

TEST(CircularBuffer, is_not_empty_after_put)
{
    CircularBuffer_Put(buffer, 10046);
    CHECK_FALSE(CircularBuffer_IsEmpty(buffer));
}

TEST(CircularBuffer, is_empty_after_put_then_get)
{
    CircularBuffer_Put(buffer, 4567);
    CircularBuffer_Get(buffer);
    CHECK_TRUE(CircularBuffer_IsEmpty(buffer));
}

The LightScheduler test above has similar desirable attributes. The code under test is modular, it is exercised through its public interface. Given it has dependencies and we are testing behavior, those tests also controlled and monitored the interactions with its dependencies.

You can prevent fragile tests.

One thought on “Preventing Brittle Tests (and Production Code)

Leave a Reply

Your email address will not be published. Required fields are marked *

Be gone spammers * Time limit is exhausted. Please reload the CAPTCHA.