My son is studying computer science at U of I in Chicago. He is taking an operating systems class and has to do some projects in C. (Some things change, some things stay the same.) He had an internship over the last 8-9 months and was doing TDD in Java. So he might be in the early stages of a test infection.
He has not done much C so he’s got some learning to do. I thought I’d better show him CppUTest. He could use it as a playground to learn some of the subtleties of C, as well as to practice TDD.
A week ago we met at Panera Bread for a coffee and to set up CppUTest to work with his assignment. His assignment involved parsing a string (char* that is). His professor suggested they use strtok(). We googled strtok to read about it. (Odd that ‘googled’ spell checks OK. Yahooed does not. I wonder who put the dictionary together.)
Here is the signature of strtok().
char *strtok( char *str1, const char *str2 );
To see if we understood strtok() we wrote this test.
TEST(Parser, ParseOneElement)
{
    char* input = "abc";
    char* token = strtok(input, "., ");
    STRCMP_EQUAL(input, token);
}
The test passed. Feeling confident, we tried a slightly more interesting test.
TEST(Parser, ParseTwoElement)
{
    char* input = "abc,def";
    char* token1 = strtok(input, "., ");
    char* token2 = strtok(0, "., ");
    STRCMP_EQUAL("abc", token1);
    STRCMP_EQUAL("def", token2);
}
Much to our surprise, this one crashed. After a little digging, we discovered the error of our ways. strtok() actually changes the string. It makes sense, but was a surprise. The subtlety is right there in its signature. The first parameter of strtok() is a char*, not a const char*. A careful reading of two different strtok() references explains the behavior. It makes sense that giving strtok() a literal string causes a segmentation fault when strtok() starts inserting its nul characters. (The first test presumably survived because "abc" contains no delimiter, so strtok() had nothing to overwrite.)
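To make the side effect visible, here is a minimal sketch of another learning test helper in plain C. The name count_nuls_written() is made up for illustration. It tokenizes a writable buffer completely and then counts how many bytes strtok() replaced with nul.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical learning-test helper (not part of any library).
   Tokenizes `buffer` completely and returns how many bytes strtok()
   overwrote with '\0'. `buffer` must be writable, never a string literal. */
size_t count_nuls_written(char *buffer, const char *delims)
{
    assert(buffer != NULL && delims != NULL);

    size_t length = strlen(buffer);   /* measure before strtok() touches it */
    size_t written = 0;

    /* Walk every token; only the side effect on `buffer` matters here. */
    for (char *token = strtok(buffer, delims); token != NULL;
         token = strtok(NULL, delims)) {
        /* intentionally empty */
    }

    /* Any nul inside the original length was put there by strtok(). */
    for (size_t i = 0; i < length; i++) {
        if (buffer[i] == '\0') {
            written++;
        }
    }
    return written;
}
```

With char one[] = "abc" the helper should report zero writes, and with char two[] = "abc,def" it should report one, the comma at index 3. That lines up with the one-element test passing and the two-element test crashing.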
Aside: I’m no language lawyer but… The tests are written in C++. Assigning a literal string to a char* seems like it should generate at least a warning. Nary a warning there be.
This fixes the problem.
TEST(Parser, ParseTwoElement)
{
    char input[] = "abc,def";
    char* token1 = strtok(input, "., ");
    char* token2 = strtok(0, "., ");
    STRCMP_EQUAL("abc", token1);
    STRCMP_EQUAL("def", token2);
}
Fast forward one week. Paul is working on his parser. He could not get the unit test harness going for one reason or another during the week, but got his parser working. His main() would grab a line of text and parse it, then print the pieces. It works fine. The parser takes an input string and fills in a vector of pointers to the tokens.
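I am not reproducing Paul’s code here, but a parser of that shape might look roughly like this sketch. The name parse_tokens(), the max_tokens parameter, and the whitespace delimiter set are all my assumptions, not his implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the parser described above.
   Splits `input` on whitespace (an assumed delimiter set) and stores a
   pointer to each token in `tokens`. Returns the number of tokens stored.
   `input` is modified in place by strtok(), so it must be writable. */
size_t parse_tokens(char *input, char *tokens[], size_t max_tokens)
{
    assert(input != NULL && tokens != NULL);

    size_t count = 0;
    char *token = strtok(input, " \t\n");

    /* Stop at the end of the input or when the caller's array is full. */
    while (token != NULL && count < max_tokens) {
        tokens[count++] = token;
        token = strtok(NULL, " \t\n");
    }
    return count;
}
```

The token pointers point into the caller’s buffer, so that buffer has to be writable and has to outlive the tokens.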
He wants the test harness, good boy Paul. I guess my threat that if he does not write his tests, he’ll have to pay his rent is working. Well, here is the test, and it crashes the test runner.
TEST(Parser, parse)
{
    char* input = "hey there";
    char* token[10];
    parser(input, token);
    STRCMP_EQUAL("hey", token[0]);
    STRCMP_EQUAL("there", token[1]);
}
A week had gone by, and this test has the same mistake. strtok() is inserting nuls into a literal string again. When we made this mistake a week earlier in the strtok() learning tests, we found the problem quickly. This time, with only a little code on top of strtok(), it took a half hour to find the problem. It was insight from the previous week’s tests that helped find this instance of the mistake. Without the tests from the prior week, I think it would have taken quite a bit longer to find this problem.
strtok() is a library function. We can have an expectation of correctness, although statistically some library functions will have bugs too. We don’t need to write tests for library functions to verify the functions. We write the tests for us, not them. We write them so we can learn. What did they cost? Not much, but I think these learning tests have already had a positive ROI. Learning tests are free! Or maybe better than free!
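One way to make this mistake harder to repeat is to have the parsing code take a const char* and tokenize a private copy. This is only a sketch of that idea, not what Paul did. The name split_copy() and the caller-supplied scratch buffer are assumptions of mine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `text` on spaces without modifying it.
   The text is first copied into the caller's writable `scratch` buffer,
   and strtok() works on that copy. Returns the number of tokens stored,
   or 0 if `scratch` is too small to hold the copy. */
size_t split_copy(const char *text, char *scratch, size_t scratch_size,
                  char *tokens[], size_t max_tokens)
{
    assert(text != NULL && scratch != NULL && tokens != NULL);

    size_t needed = strlen(text) + 1;      /* include the terminating nul */
    if (needed > scratch_size) {
        return 0;
    }
    memcpy(scratch, text, needed);

    size_t count = 0;
    for (char *token = strtok(scratch, " ");
         token != NULL && count < max_tokens;
         token = strtok(NULL, " ")) {
        tokens[count++] = token;           /* points into scratch, not text */
    }
    return count;
}
```

With this shape, passing the literal "hey there" is safe because strtok() only ever writes into scratch. The cost is an extra copy and a buffer the caller has to size.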
As for string literals being convertible to char*, I think I’ve read that’s still there for C compatibility.
So, technically, “hello” is a const char*, but treating it as a char* works, unless you attempt to write to the buffer, and you’re on an OS that cares about memory protection.
For what it’s worth…
Yeah, I figured that was there to keep C programs happy. I think gcc -Wall (warnings all) should make a little noise though.