Mon, 24 Mar 2008

The (Lack of) Testing Death Spiral


At PyCon '08, I gave a talk on testing and the OLPC project where I referred to the "Testing Death Spiral". My accompanying slide, which aimed to be simple rather than comprehensive, had this scenario:

  1. Write a bunch of code & manually test it.

    (Good so far.)

  2. Start adding features over here.

  3. Watch code break over there.

  4. Rinse, lather, repeat

    (Where do you think this ends?)


OK, so that format doesn't really work in a blog post, but hopefully you get the gist of the scenario. This is a scenario I see a lot: a project gets hacked together & works well enough that people start using it; then the project starts to expand. Many new features are added. However, as these new (and presumably solid) features are being added, the old code becomes increasingly ignored, uncovered by manual testing, and fragile.

This is a simple consequence of an inescapable fact: the amount of testing needed to detect regressions scales with the number of features. Forget about finding new bugs in the code you just wrote -- I'm talking about breaking existing code.

I have seen people attempt to escape this scenario in a number of ways: improve the architecture and reduce internal linkages; open source it; release early, release often; alpha- and beta-test it; stop adding new features; and probably many more. These are all good thoughts, but they are all doomed to failure [1]. Nonetheless, I wish you well.

The only solution I have found is this: write automated tests.

Before I continue, let me say: automated tests are not a panacea. Writing good code is hard, getting your project "out there" is important, exploratory testing is mandatory, and writing appropriate automated tests is hard; there's a lot more to building software than writing good, automated tests. I stress that every time I talk about test automation. I just think automated tests are necessary [2].

Let us suppose, for the sake of argument, that you have some software that is actively evolving. Furthermore, this software has no automated tests. Every time you add a feature, you test the bejeezus out of that feature in order to satisfy yourself that it works. You do this for every new feature that is added, and thus consider your software to be solid.

I now have two questions to ask:

  • are you adding features in isolation from each other? that is, is your architecture such that each new feature only uses non-state-changing code from elsewhere in your project?

    (if the answer is yes -- are you sure?)

  • do you completely control the packages, libraries, compiler, operating system, and hardware that your software runs on?

    (if the answer is yes, do you plan to never, ever, change any of those components? and have you discussed these plans with anyone outside your development team? and do you believe your managers?)

I like to summarize these questions this way: are you feeling lucky, punk?

If the answer is "yes" to all of the above, then congratulations -- you are Apple, stuck at one point in time, and never planning to release a new piece of software or hardware :). Hopefully you'll do better than Apple did before they decided to change and adapt...

If the answer is "no" to any of the above, I encourage you to read on.

I assert that even with a perfectly decoupled architecture, brilliant software engineers, and nigh-complete control over the software and hardware that you use -- in itself a dream software development situation -- you will eventually need to add features that crosscut that architecture, and you will also need to upgrade the compiler, libraries, language version, operating system, and hardware. In order to make sure that your software still works, each time you add a feature or change a component, you will have to retest every feature and every piece of code. And, if you have no automated tests, you will have to do this manually. Every time.
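To put rough numbers on "every time", here is a back-of-the-envelope sketch in Python. The figure of 30 minutes of manual checking per feature is purely an assumption for illustration, and the function names are made up for this example.

```python
# Back-of-the-envelope model of manual regression testing.
# ASSUMPTION: each feature takes 30 minutes to re-check by hand.
MINUTES_PER_FEATURE = 30


def retest_minutes(num_features, minutes_per_feature=MINUTES_PER_FEATURE):
    """Cost of one full manual pass over every existing feature."""
    return num_features * minutes_per_feature


def cumulative_minutes(num_features, minutes_per_feature=MINUTES_PER_FEATURE):
    """Total cost if a full manual pass is done after adding each feature."""
    return sum(retest_minutes(n, minutes_per_feature)
               for n in range(1, num_features + 1))


if __name__ == "__main__":
    for n in (5, 20, 50):
        print(n, "features:", retest_minutes(n), "min per pass,",
              cumulative_minutes(n), "min in total")
```

Under this model the cost of a single pass grows linearly with the number of features, and the total effort grows roughly with its square, which is about the point where manual retesting quietly stops happening.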

If you have automated tests, however, your development process could look something like this:

  1. change code
  2. run tests
  3. commit
  4. test manually, do exploratory testing
  5. find bugs, write automated tests to reduce bugs
  6. goto 1

Even if you don't add any new features, this process applies to library, compiler, platform, and hardware changes. At the least, you will be able to quickly determine if you've broken something that you're testing for; at the best, you will be able to quickly and confidently release new versions of your software.
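As a minimal sketch of step 5 in the list above, here is what pinning down a bug with an automated test might look like, using Python's standard unittest module. The parse_price function and the bug report it refers to are hypothetical, invented for illustration.

```python
import unittest


def parse_price(text):
    """Hypothetical function under test: turn '$1,234.50' into a float."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    return float(cleaned)


class ParsePriceRegressionTests(unittest.TestCase):
    def test_plain_number(self):
        self.assertEqual(parse_price("12.50"), 12.5)

    def test_thousands_separator(self):
        # Imagined bug report: '$1,234.50' used to raise ValueError.
        # This test keeps that bug from quietly coming back.
        self.assertEqual(parse_price("$1,234.50"), 1234.5)

    def test_surrounding_whitespace(self):
        self.assertEqual(parse_price("  $3  "), 3.0)
```

Saved in a file, this can be run with "python -m unittest" before every commit (step 2 above). Each bug found by hand in step 4 becomes one more small test method.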

Fundamentally, then, automated testing is important for software maintenance. And since the cost of software maintenance is a significant portion of the cost of developing the software in the first place [3], it behooves you to pay attention to anything that will reduce the cost of software maintenance. This is without even considering other aspects of test utility, like increased developer velocity, ease of refactoring, increased confidence in your software, etc.

This maintenance situation is the scenario that led me into testing (or, if you prefer, "illuminated me as to the importance of automated tests by whacking me over the head with a clue bat".)

Let me assure you that this maintenance situation doesn't just apply to large bodies of code, either. I maintain a number of small projects and having automated tests means that I simply don't release code with regressions. Moreover, when my small projects "grow up" into bigger ones -- or, more frequently, are used in larger projects -- I'm not stuck in a situation where I suddenly have to write a bunch of tests to achieve stability. I always try to grow my test framework organically with the project, because I will never have the time to put into writing tests from scratch for my bigger projects.


So, automated tests are important for maintenance, and they are critical for making sure that your old code still works while you focus on new code. Without automated tests, you will be doomed to releasing increasingly buggy software as your body of code increases and the average level of testing decreases.

Does this actually happen?


This is precisely the scenario that led to our consulting work with ARINC, which went well. (As in, they're adding new features with great confidence after we helped them adopt automation tools and practices.)


This is also the scenario that leads to what Jamie Zawinski named the Cascade of Attention Deficit Teenagers. Open Source projects, facing a continually increasing number of bugs, often opt to completely rewrite their components in the expectation that this time, they'll get it right. This completely ignores our experience with software rewrites, which suggests that (barring brilliance and luck) any rewrite will contain as many bugs as the original software -- they'll just be different bugs. (As JWZ points out, though, it's more fun to write new code than to fix the crud someone else wrote before...)


And, finally, it is also the scenario faced by the One Laptop Per Child project, which has built a tower of cards on open source software. Their build system pulls in about fifty distinct packages live from the Internet, compiles them all, and then layers the Sugar user interface on top of them.

There is no automated testing in place.


OK, back to the Software Testing Death Spiral. What happens to projects that lack both automated tests and an exponentially increasing team of testers? Starting somewhere in the middle of the process:

  1. They manually test the new features and bug fixes that they've just added.
  2. They release their software.
  3. Their software breaks in unexpected locations, bug reports are filed, and (optimistically) those bugs are fixed. Go to #1.

The inevitable consequence is a death spiral, barring only a complete rewrite (which will possibly fail, or likely lead to a product that's just as buggy, but with unknown bugs), trashing of the project, OR -- and this is an optimistic scenario -- the adoption of automated testing.


Here are a few straw men, with moderately snarky replies:

"We don't test, and we don't use version control. Which is more important?" Version control. But you're doomed, anyway.

"We don't have time to test." Why do you have time to write software, but not time to make sure it works?

"We don't have the expertise to build good tests, and/or we can't afford the tools, and/or we don't know how to use them." This is a pretty realistic scenario, actually. May I suggest: hire consultants, or read some good books, or dedicate your young new hire to learning the tools?

"We don't like to test." Well, at least you're honest ;). I would summarize your choices like this: either you can write crappy software, or you can learn to like testing. The former will most likely doom you to the rubbish bin of history. The latter gives you a better chance of "making it".

"We really do plan to rewrite our software in two years." Points for honesty, again! I think you're rolling the dice -- many software projects fail, but maybe you'll do better. Might I suggest an incremental rewrite rather than a complete rewrite? (For that you'll need testing, though...)

"We wrote a bunch of automated tests. They didn't help us." Ahh, a problem based in actual experience! I would like to suggest -- with no background in your particular problem -- that you try out several different kinds of tests, like functional tests or regression tests, and see what does help you.

"How do I test, if I don't know what the right answer is, anyway?" How do you know you got the right answer, then? If your customers don't care if you're right, then you've stumbled into a gold mine, but I daresay it will end badly. (This straw man was actually sighted at PyCon -- sorry, MC.) I hear this a lot in research, actually, but it's still nonsense. Perhaps another blog post in there...

"I can't convince my boss/team leader/PI that it's important to spend the time to write tests. (I even sent him/her your blog post.)" You could go one of three ways: try harder, integrate testing into your personal development strategy and view this situation as an opportunity to "manage up", or quit. The middle option is the interesting one: you can quietly start writing automated tests to "fence in" your own code, and explain to your boss that this is just how you code -- it's like using emacs instead of vi -- and you're not insisting that anyone else follow suit. Hopefully your productivity will not decrease much, while your reliability will increase. Good fellow programmers may follow suit and at some point your manager might realize that you've all evaded his dictat. Or not. But it beats working on untested code!

"I am but one lone programmer, and I can't convince my team to write/use tests. (I even sent them your blog post.)" See previous question/answer: you will find that most worthwhile programmers are in favor of anything that increases their productivity and reliability.

"There's so many other things to straighten out on my project before I can even think about what tests to write." I sympathize, I really do, but if your project is so undirected that you can't even figure out what it's supposed to do (and write tests for it) then you have far bigger problems than bad code to worry about.

"I took your advice and wrote tests. Then we changed a bunch of stuff, and now all the tests break, and I don't have time to fix them. What do I do now?" Hmm, this is a common complaint. First, try to separate out a subset of the tests that are of immediate use to you (as in, they pass and/or they exercise a lot of your code). Keep that subset working. Second, don't be afraid to simply delete your old tests. Tests should not be a maintenance headache; if you like and use tests, but don't see the point of maintaining a bunch of your broken tests, get rid of them! Then put new ones back in as necessary.


There really are a bunch of other reasons to write automated tests, too. For example, consider:

  • cross-platform development is dramatically simplified when you have a moderately thorough test suite. In particular, you can develop on your favorite machine, in your favorite programming environment, and let the continuous integration boxes run and test your code on all the other machines.
  • setting up new development environments and development machines is much easier when you can simply ... run the tests to figure out if it's all working.
  • integrating new people into the development team is much easier when they can run tests to figure out if they just broke something.
  • releasing "a quick bugfix" is a lot easier when you can be fairly confident that your quick new release is no more broken than your last release.

If these aren't enough to make you think seriously about testing, then I give up!


There's no real conclusion to this :). I'll talk more about the OLPC stuff later.

Don't get me wrong: testing is hard. Testing effectively is even harder. There are ways around this, but the best way to start may be to simply power through: write a bunch of tests, and ruthlessly discard those that don't help. Then refine your method over time. I have some advice to offer here, too, but that's for another blog post...

And remember... Darth Vader recommends testing!

--titus

p.s. Thanks to Tracy Teal, Lisa Crispin, Alex Gouaillard, Kumar McMillan, Shannon -jj Behrens, and Doug Hellmann for comments!

[1]E-mail me if you think I should write about why :)
[2]I can blog about "necessity" vs "sufficiency", too. Let me know.
[3]I've heard estimates of 80-90% of the total cost of development for a successful software project, i.e. initial feature development is 10-20%, maintenance is 80-90%, but I have no good references for this.

posted at: 21:35 | path: /mar-08 | 19 comments



Mon, 17 Mar 2008

PyCon '08: The Brain Dump


Just left PyCon yesterday; now I'm up in Michigan looking at some more houses, arranging lab stuff, talking with people, and getting ready to proselytize the Google Summer of Code to a bunch of Michigan State CSE students as well as a few professors.

Some freeflow thoughts. Feel free to comment my ideas into oblivion :)

PyCon was a blast, even though I didn't attend many talks. I feel that for many -- if not most -- of the talks, I can more easily digest the material by simply going and reading about the project. The worthwhile talks are the ones that present new ideas or info and are presented well; sadly, the vast majority of geeks do not give good talk. For this and other reasons, I simply hung out for lunch and dinner and met with old friends from previous PyCons.

I wholeheartedly support the adoption of an advanced-technical-only track. As it was this year the talks I was interested in (mostly very technical) were embedded in the middle of a bunch of other talks that were not technical. I wasn't up to picking them out of the mix.

Speaking of "good talks", I think the whole review system is effed up. What's with the anonymous authorship of proposals? In 2007 my proposal for a twill/etc. talk nearly got rejected because of "lack of detail"; I think this was partly because the reviewers couldn't take into account that I wrote all of the damned tools I was talking about. Anyway, it ended up being the third or fourth most popular talk of the conference, and the highest ranked non-plenary talk. Why not actively solicit or help those people that have a history of giving interesting or entertaining talks (as measured by audience response)? I don't think this is unjustified elitism: half the point of a conference is to have interesting talks, right? (The other half is the social aspect, and then there are various "thirds" running about too.) Maybe I'm just bitching, but I think that if a good & highly technical speaker submits a proposal that sounds boring to you as a reviewer, your estimate of the proposal is more likely to be wrong than right. (For example, when was the last time Brett Cannon gave a boring talk??)

Yes, I'd be happy to help review, but I want to know who the author is. Then my reviews could look like this: "This topic is really interesting, but your one paragraph summary doesn't reassure me that you actually know what you're talking about. Please justify your ability to give this talk." Or: "This could be an interesting talk, but your presentation last year was on a similar topic and was really boring (see poll HERE). I vote to accept based on the hope that you will improve." Or: "Awesome presenter, boring topic. He will make it work."

During dinner with Leapfrog people, a talk scheduling proposal emerged: rather than trying to group talks in some logical coherent way, why not try to minimize scheduling conflicts and auditorium changes by asking people what talks they want to go to? It's actually a fun constraint solving/ expectation maximization problem...

Our testing tutorial went OK, although I think we've got to find a new tutorial format if Grig and I are going to stay interested :). Our audience members had widely varying skillsets and backgrounds, too, which meant that some of them were bored through most of the tutorial, while others were confronted with a huge volume of new information.

I'm thinking about how to improve for next year; we may try to do a whole day of tutorials, and bring computers and Ethernet cables, and help attendees solve their actual problems.

The conference support for tutorials was kind of minimal: normally we don't need anything more than a projector and a mike, but (for whatever reason) the conference organizers alternated between treating us really impersonally (sending mass mailings that ignored previous information we'd sent them) and really curtly ("No. That's your problem.") I understand getting overwhelmed -- I've run several conferences the size of PyCon '06 myself -- but if you let it change the nature of your interaction with people, you're doing no one any favors, least of all the conference or yourself.

Next year I may also ask for the tutorials to cover up to my own expenses (registration, hotel room and flights) from student fees, rather than having them simply give me $500 & free reg. I feel like I'm paying out for the privilege of giving each tutorial, and that's a bit frustrating. Probably they'll say "no", which will then leave me/us with the option of cancelling the tutorial or just sticking with it... we could also move the tutorial to a "sprint day" and encourage people to stick around for real "free consulting with grig and titus". I think we'd have more fun that way, and I'm damn sure we'd be more useful!

The few lightning talks I saw were great fun. Apparently I hit the few technical ones :); Bruce Eckel and others have complained. I didn't see any of that and I think the organizers, by and large, did a great job.

I got a whole boatload of T-shirts without even stopping by the booths.

I finally met Leslie Hawthorn (Google Open Source Programs Goddess, or some such) in person, which was a huge mistake to make. She's like a freakin' woman-shaped ball of energy (albeit low-key; even I know that sounds like a contradiction) and she pushes pushes pushes people to do Good Stuff for open source. I appear to be susceptible. More on that later.

Leslie buys a mean glass or three of Lagavulin. (Yum.) And it was great to meet her. But I suspect that every time I meet her, I will get talked into doing more stuff. Sigh.

I'd like to thank O'Reilly (represented by Julie Steele) for buying me dinner on Friday, and Leapfrog (represented by the entire testing team there :) for buying me dinner on Saturday.

My OLPC interactions were interesting:

On Sunday, I gave a talk on automated testing and the OLPC GUI, Sugar. I'll post slides and a screencast later, but a brief summary goes like this: Sugar development is a bit of a disaster, with very little in the way of any software engineering principles being applied. In particular, there's my particular bugaboo: they have no automated tests, at all. My talk discussed the situation and talked a bit about using technology to remedy the situation; ultimately, though, the choice the OLPC people have to make is whether or not their software is going to suck. (This version of my argument is intentionally provocative, but I strongly believe that this is indeed the choice they face. See "jwz CADT" and also my future posts on this topic.) In particular, their testing plans consist of this: "really hope that other people step up and test our shit." In stark contrast to some of their other detractors, I'm trying to become one of those people that does test their shit, but it also seems to me that without a sea change in the focus of the software management layer at OLPC, I will be wasting my time.

Anyway, so that's a mildly obnoxious talk to give and I did my best to leaven it with humor and some rilly rilly cool testing tech. What was interesting to me, though, was the private advice from a number of people -- there appears to be a large undercurrent of dissatisfaction with the OLPC project in the Python community. In particular, one group of people basically said "burn the f$$!ckers to the ground". (I largely ignored this advice and tried to focus on the positive.) These are not normally mean-spirited people, so from this, if nothing else, I conclude that the OLPC has mismanaged its interactions with the Python community. I'm not sure exactly where things have gone awry, but I hope it's not too late to get back some community luuuuuurve: for all their software failings, the OLPC is an awesome awesome project that has changed, and hopefully will continue to change, this world we live in. Advice and thoughts on this issue welcome; I will post (or re-post) those that I think are especially worthy of attention.

One interesting idea: one person suggested that after having done so many impossible things already, the OLPC folk think that software is going to be one more example where they have to break the mold. Well, guys, if you think you can break out of the Software Death Spiral without building in any automated testing, I think you're batshit crazy...

I do feel good about using whatever "testing" community capital I may have in putting forth a critique of the OLPC. I'm still nervous about having done it, frankly, because (as I said in my talk) it's like kicking the family dog. In this case neither the dog nor the rest of the family bit, but perhaps I've just missed the negative comments?

I did finally see Ivan Krstic talk about the OLPC effort. Due partly to a laptop failure (ironic!), his presentation was largely photos from his recent Peruvian and ???ian deployment of OLPC, which I'd already seen through his feed. Fantastic stuff, but a bit disappointing to see a blog summary as a talk :(. He's an engaging speaker.

Ivan did not come to my talk. I heard someone say, sotto voce, that it was partly because he was afraid that I was going to say what I did say. This is only a rumor, though, and regardless I would encourage him to engage me in a constructive conversation at some point...

I met Zed Shaw, too. He's a hoot (I think that's the technical term :). Clearly very smart and equally opinionated. He encouraged me in some of my technical geekdom for the OLPC talk, and then of course failed to come see the talk. Ehh, I'll send him my screencast when I finish it. There's no avoiding me, Zed!

Oh, I almost forgot -- I'm now a member of the Python Software Foundation (unless they retract it for criticizing both PyCon and OLPC in a single post)! Hurrah! I guess this means I'll have to run PSF/GHOP again, yeargh.

Hanging out with everyone was awesome, and I will probably pay the fee for next year's conference just for that. I will make an extra effort to attend the sprints next year, though, because they must be an absolute blast.

I'm sure I'm forgetting stuff, but this is all my brain can stand for now. More anon, esp about the OLPC and testing stuff.

--titus

posted at: 14:27 | path: /mar-08 | 4 comments



Fri, 14 Mar 2008

Testing Tutorial -- It's Over, Man!


Steve Holden and Doug Napoleone both attended our testing tutorial (as did AMK, which was a bit of a surprise!) and had fairly positive things to say about it. This was a relief, because Grig and I always wonder whether or not this stuff is useful to anyone.

Our hope for this year was to stir things up by making the tutorial more interactive. We succeeded to the extent that there were a few more questions and a bit more interaction, but we would have liked to get more in the way of pre-tutorial feedback. In retrospect, the tutorial might have worked better if two things had happened:

  1. People signing up for the tutorial had actually sent us a bunch of things to test. As it was, we got about 5 e-mails, which didn't give us a lot of problem choices; some of the suggested problems were out of scope or too big to tackle in the time we had.

    This is not really anyone's fault, although probably Grig and I could have done more in advance.

  2. Grig and I had had more time to prepare.

    Both of us are crazy busy (he has a job that is a big timesuck, and I am in the middle of a transition from postdoc to prof, which means that I'm trying to finish some research while setting up a new lab 2000 miles away, PLUS I have a wonderful new daughter who is also quite the timesuck).

Now, don't get me wrong -- we did prepare adequately, but I would have liked a week or so to actually do some Really Cool New Stuff.

I'm not sure what to do next year, if anything. This was our third tutorial, and people seem to attend them and like them, but I feel like we should try something different. Unfortunately, the lack of reliable network access really limits our ability to push forward into doing live tutorials and code writing; Web testing depends on the Web!

Regardless of our thoughts, I am interested in any suggestions that people may have. If you've attended our tutorial -- what would you have liked to see, OR now that you've seen it once, what would you want to see the second time around, if anything? If you haven't attended our tutorial, what would you imagine to be the most interesting "testing tutorial" possible?

thanks, --titus

posted at: 10:03 | path: /mar-08 | 4 comments



Thu, 10 Jan 2008

Inexpensive Consulting with Grig and Titus


As many people have doubtless read, PyCon '08 has announced the tutorial sessions. This year, Grig and I are doing a workshop-tutorial on testing rather than a teaching-tutorial; what this means is that our tutorial will focus on actually applying testing tools effectively to your source code.

We're billing this tutorial as "Inexpensive Consulting with Grig and Titus". As we say in the tutorial announcement:

"Bring your tired (code), huddled (unit tests),
and cranky AJAX to us; we'll help you come up with tactics,
techniques, and infrastructure to help solve your problems."

Both Grig and I have developed a number of approaches and thought patterns that we'd like to share with people who are having a tough time with testing -- for whatever reason. And we'd like to share them in what we hope to be the most effective way, by working with you on them.

However, we need to hear from participants. You can't just slap us with 5k LoC and expect us to grok it instantly; we'd like at least a few days' notice ;). So, if you're planning to attend the tutorial, please think about what you want to hear about, and plan to send us some source code to work on a week or two before PyCon. (Code that should work on Linux or Mac OS X is preferred...) If you're having specific problems, let us know; otherwise, we'll try to figure out how we would start testing your project.

We got an evening slot for the tutorial, which is all to the good; that means we can go directly to the bar afterwards. We'd also like to invite attendees to have dinner with us before the tutorial, so that we can develop a bond (and get you drunk) before the tutorial. Speaking solely for myself, I'm also happy to consult inexpensively over a glass of beer or scotch at pretty much any time. So please don't be a stranger!

Incidentally, Noah and Shannon both have good things to say about our tutorial (although I suppose if you're James Bennett, Noah's opinion may not be persuasive ;). I'm looking forward to the steel cage match, myself; my money's on Noah, who is quite the brute.)

--titus

posted at: 00:03 | path: /jan-08 | 1 comments



Fri, 16 Mar 2007

PyCon '07 Talk -- source code


I've put together a brief discussion, with links to the source code, surrounding the demos I did at PyCon '07 during my testing tools talk. Here's a brief TOC:

# Demo 1: Testing CherryPy
# Demo 2: Testing CherryPy without exec'ing a process
# Demo 3: Basic code coverage analysis with figleaf
# Demo 4: More interesting code coverage analysis with nose and figleafsections
# Demo 5: Writing a simple twill extension to do form "fuzz" testing
# Demo 6: Django fixtures for twill/wsgi_intercept
# Demo 7: Recording and examining a Django session with scotch
# Demo 8: Convert the Django session into a twill script
# Demo 9: Replaying the Django session from the recording

You can read the discussion at http://darcs.idyll.org/~t/projects/pycon-07-talk-source/README.html. This document also contains links to download the source code.

Enjoy!

--titus

posted at: 20:53 | path: /mar-07 | 0 comments
