Preventing Brittle Tests (and Production Code)

Have you written unit tests only later to find they slow you down as implementation changes cause a lot of test breakage? You think tests are not living up to the promises you’ve heard. So you toss the tests and go back to business as usual (Debug Later Programming).

Hey, not so fast. Maybe it’s not test in general, but your tests and your code. The first project I used Test-Driven Development on, in 1999/2000, we ran into this problem. We were ready to give up. But I wanted the tests to work and looked for what was wrong. In our excitement we kept copying, pasting and tweaking the test cases. They were an ugly (in hind-site) mass of duplication. Small changes made for ripple effects through the tests. But I could see, it was our fault.

Someone on Quora asked me “How do you make unit tests less brittle”. Here is a more complete answer based on having written my own bad tests and seen many learners of TDD and unit testing go the wrong direction with their designs and tests.

The short answer has two parts.
1) Refactor your tests, removing duplication and expressing intent.
2) Test through its public interface and its interactions with other code. Test the behavior. Don’t rely on internal access to the thing being tested.

Now, the long answer.

Removing duplication

We’ve all heard of the Pragmatic Programmers’ DRY principle, Don’t Repeat Yourself. Many misinterpret this to mean code duplication. Code duplication is a symptom of code and tests that are not DRY, but removing all duplication is not the solution.

Take these two tests for example. Imagine you are working on the LightSchdeuler in a home automation system. The scheduler is woken up once every minute to check the schedule. Our design accommodates substitutes for time and light controlling hardware interactions with FakeTimeService and LightControllerSpy. The tests schedules light 3 to be turned on everyday at minute 1200, triggers the scheduler to wake-up at a specific day and minute, then the tests checks the scheduler’s reaction to the time.

TEST(LightScheduler, no_lights_controlled_at_wrong_minute)
{
    LightScheduler_AddTurnOn(3, EVERYDAY, 1200);
    FakeTimeService_SetDay(SUNDAY);
    FakeTimeService_SetMinute(1199);
    LightScheduler_Wakeup();
    LONGS_EQUAL(NO_LIGHT_ID, LightControllerSpy_GetLastId());
    LONGS_EQUAL(NO_LIGHT_STATE, LightControllerSpy_GetLastState());
}
 
TEST(LightScheduler, light_turns_on_at_the_scheduled_minute)
{
    LightScheduler_AddTurnOn(3, EVERYDAY, 1200);
    FakeTimeService_SetDay(SUNDAY);
    FakeTimeService_SetMinute(1200);
    LightScheduler_Wakeup();
    LONGS_EQUAL(3, LightControllerSpy_GetLastId());
    LONGS_EQUAL(LIGHT_ON, LightControllerSpy_GetLastState());
}

A good thing about these tests is that they are testing behavior. They do not reach inside the scheduler inspect its internal data. Tests that look at internal details will cause a lot of test breakage when you change the design. Another good thing, the tests don’t seem to be too long, at least by a line counting standard.

For understand-ability, the tests are too long. They are tedious and have a lot of duplication. It takes close inspection to see how they differ. There are probably many other tests that look the same. We could try getting rid of all duplication.

TEST(LightScheduler, no_lights_controlled_at_the_wrong_minute)
{
    ScheduleLightOnThenSetTimeAndCheckLightState(3, EVERYDAY, 1200, 
                                  SUNDAY, 1199, 
                                  NO_LIGHT_ID,NO_LIGHT_STATE);
}

TEST(LightScheduler, light_turns_on_at_the_scheduled_minute)
{
    ScheduleLightOnThenSetTimeAndCheckLightState(3, EVERYDAY, 1200, 
                                  SUNDAY, 1199, 
                                  3,LIGHT_ON);
}

Removing all duplication leaves the tests unintelligible, they do not tell their story. They are Sahara Desert DRY. That function name is ridiculous. There is no reasonable name that tells the story. When you can’t name a test case well (or a function, module, file, class, a member…), something is wrong. Let’s structure the steps of the test using the 4 phase test pattern (setup, trigger action, check, cleanup), or BDD’s Given-When-Then.

As a first step, simply add some white space to separate the parts of the test. I could add comments (// given // when // then), but those provide nothing for the skilled test reader.

TEST(LightScheduler, no_lights_controlled_at_wrong_minute)
{
    LightScheduler_AddTurnOn(3, EVERYDAY, 1200);

    FakeTimeService_SetDay(SUNDAY);
    FakeTimeService_SetMinute(1199);
    LightScheduler_Wakeup();

    LONGS_EQUAL(NO_LIGHT_ID, LightControllerSpy_GetLastId());
    LONGS_EQUAL(NO_LIGHT_STATE, LightControllerSpy_GetLastState());
}
 
TEST(LightScheduler, light_turns_on_at_the_scheduled_minute)
{
    LightScheduler_AddTurnOn(3, EVERYDAY, 1200);

    FakeTimeService_SetDay(SUNDAY);
    FakeTimeService_SetMinute(1200);
    LightScheduler_Wakeup();

    LONGS_EQUAL(3, LightControllerSpy_GetLastId());
    LONGS_EQUAL(LIGHT_ON, LightControllerSpy_GetLastState());
}

Once we are happy with the separation, extract the needed helpers.

TEST(LightScheduler, no_lights_controlled_at_the_wrong_minute)
{
    LightScheduler_AddTurnOn(3, EVERYDAY, 1200);
    SetDayMinuteAndWakeUp(SUNDAY, 1199);
    CHECK_LIGHT_ID_STATE(NO_LIGHT_ID, NO_LIGHT_STATE);
}

TEST(LightScheduler, light_turns_on_at_the_scueduled_minute)
{
    LightScheduler_AddTurnOn(3, EVERYDAY, 1200);
    SetDayMinuteAndWakeUp(SUNDAY, 1200);
    CHECK_LIGHT_ID_STATE(3, LIGHT_ON);
}

That is better, as each part of the test is separate. The extracted helper functions are not lying but they are not as helpful as they could be. Test’s should express more of the problem being solved than how you are solving it. It is true that SetDayMinuteAndWakeUp(SUNDAY, 1199) sets the day the minute and then wakes up the scheduler, it’s just not intention revealing. Tests need a balance of DRY and DAMP (Descriptive And Meaningful Phrases). Test should strive to tell the story of the thing being tested using domain terms.

TEST(LightScheduler, no_lights_controlled_at_the_wrong_minute)
{
    LightScheduler_AddTurnOn(3, EVERYDAY, 1200);
    TheTimeBecomes(SUNDAY, 1199);
    LIGHTS_ARE_UNCHANGED;
}

TEST(LightScheduler, light_turns_on_at_the_scheduled_minute)
{
    LightScheduler_AddTurnOn(3, EVERYDAY, 1200);
    TheTimeBecomes(SUNDAY, 1200);
    THEN_LIGHT_IS_ON(3);
}

Notice that the tests tell their story. A light is scheduled, some time later the scheduler is notified about a new minute, then the test verifies that the system reacts correctly. Differences stand out between tests. The test name is supported by the steps in the test. Each test focuses on a single behavior. These tests state truths about the scheduling of lights and do not reveal internal details.

The mechanism of faking time and spying on the hardware are also only in the helper functions. They will be able to avoid being brittle as implementation and requirements change. The only vulnerability of this test is if the LightScheduler_AddTurnOn interface changes. Then you’ll have to touch each test case. So be it.

Avoiding internal knowledge

Another root cause of brittle tests is tests that know too much about the implementation details. Test generally should avoid accessing an object’s internal state. Your unit tests should be gray box while being created via TDD. You know the test that is needed to drive the next line of code you want to write. But you write the test by exercising behaviors. They are created with knowledge of the implementation. But you don’t let the internal details sneak into the tests. If you do not let internal implementation details into the test, you can later change the implementation (for refactoring or more suitable algorithm) and the tests do not care.

The schedule could be in a linear array or a key-value map. That implementation detail could change. The hardware and OS details are also hidden behind application-specific and intention-revealing interfaces.

Circular Buffer example

Take a CircularBuffer for example. If your tests have access to the internal counters and indexes, and some need comes that makes you change the meaning and usage of those counters and indexes, your tests will all be broken.

In C, don’t let the data structure for the CircularBuffer be known outside of the CircularBuffer itself. In C++, don’t access hidden parts with friend. In Java or C# generally do not use package scope to access hidden parts.

Here is a common C approach that reveals details that should be hidden. This header file can be abused by both test code and application code.

#ifndef CIRCULAR_BUFFER_INCLUDED
#define CIRCULAR_BUFFER_INCLUDED

#include 

struct CircularBuffer;
{
    unsigned int count;
    unsigned int index;
    unsigned int outdex;
    unsigned int capacity;
    int values[];
};

struct CircularBuffer * CircularBuffer_Create(unsigned int capacity);
void CircularBuffer_Destroy(struct CircularBuffer *);
bool CircularBuffer_IsEmpty(struct CircularBuffer *);
bool CircularBuffer_IsFull(struct CircularBuffer *);
bool CircularBuffer_Put(struct CircularBuffer *, int);
int CircularBuffer_Get(struct CircularBuffer *);
unsigned int CircularBuffer_Capacity(struct CircularBuffer *);
#endif

Code that uses CircularBiffer.h has full visibility of the structure. A user could decide to not use CircularBuffer_IsEmpty, and look at the count member and see if it is zero. Let’s say that later you discover a concurrency problem with this implementation, you might change the implementation to not need the count member. Now all the users of count won’t compile. If you had tests that looked at count, they’d be broken too. Fragile code and fragile tests are not fun.

A better approach hides the internal details. Using a forward declaration in the header file and putting the full struct declaration in the source file helps protect the internals. Your tests only depend on the public interface.

#ifndef CIRCULAR_BUFFER_INCLUDED
#define CIRCULAR_BUFFER_INCLUDED

#include 

struct CircularBuffer;

struct CircularBuffer * CircularBuffer_Create(unsigned int capacity);
void CircularBuffer_Destroy(struct CircularBuffer *);
// snip

These tests define the needed behavior. Behavior defining tests will continue to be valid for different implementation choices.

TEST(CircularBuffer, is_empty_after_creation)
{
    CHECK_TRUE(CircularBuffer_IsEmpty(buffer));
}

TEST(CircularBuffer, is_not_empty_after_put)
{
    CircularBuffer_Put(buffer, 10046);
    CHECK_FALSE(CircularBuffer_IsEmpty(buffer));
}

TEST(CircularBuffer, is_empty_after_put_then_get)
{
    CircularBuffer_Put(buffer, 4567);
    CircularBuffer_Get(buffer);
    CHECK_TRUE(CircularBuffer_IsEmpty(buffer));
}

The code under test is modular, it is exercised through its public interface. We are testing behavior.

You can prevent fragile tests!

2 thoughts on “Preventing Brittle Tests (and Production Code)

  1. Just wanted to say thank you very much. I’m very new to writing tests so I’m really grateful you took the time to share your wisdom.

Leave a Reply

Your email address will not be published. Required fields are marked *

Be gone spammers * Time limit is exhausted. Please reload the CAPTCHA.