Sun, 13 Mar 2011
Trying out 'cram'
I desperately need something to run and test things at the command line, both for course documentation (think "doctest" but with shell prompts) and for script testing (as part of scientific pipelines). At the 2011 testing-in-python BoF, Augie showed us cram, which is the mercurial project's internal test code ripped out for the hoi polloi to use.
Step zero: wonder-twin-powers activate a new virtualenv!
% virtualenv e
% . e/bin/activate
Step one: install!
% pip install cram
... that just works -- always a good sign!
OK, let's test the bejeezus out of 'ls'.
% mkdir cramtest
% cd cramtest
Next, I put
  $ ls
into a file. Be careful -- you apparently need exactly two spaces before the $ or it doesn't recognize it as a test.
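To make the indentation rule concrete, here's a rough Python sketch of how a cram-style file could be split into commands and expected output. It is a simplified reading of the format, not cram's actual parser: `parse_cram_text` is a made-up name, and it ignores things like multi-line commands.

```python
def parse_cram_text(text):
    """Split cram-style text into (command, expected_output_lines) pairs.

    Simplified illustration only; this is not cram's real parser.
    """
    tests = []
    for line in text.splitlines():
        if line.startswith("  $ "):
            # Exactly two spaces, a dollar sign, a space: a command to run.
            tests.append((line[4:], []))
        elif line.startswith("  ") and tests:
            # Any other two-space-indented line is expected output
            # for the most recent command.
            tests[-1][1].append(line[2:])
        # Everything else (including a one-space " $ ls") is commentary.
    return tests


example = (
    "Two spaces, so this is a test:\n"
    "\n"
    "  $ ls\n"
    "\n"
    "One space, so this is just commentary:\n"
    "\n"
    " $ ls\n"
)
print(parse_cram_text(example))
```

Only the first `ls` is picked up as a command; the one-space version falls through as plain text, which is why a mis-indented test silently does nothing.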
Now, I run:
% cram ls.t
and I get
.
# Ran 1 tests, 0 skipped, 0 failed.
Awesome! A dot!
The only problem with this is that when I run 'ls' myself, I see:
ls.t ls.t~
Hmm.
As a test of the cram test software, let's modify the file 'ls.t' to contain a clearly broken test, rather than an empty one:
  $ ls
  there is nothing here to see
and I get
!
--- /Users/t/dev/cramtest/ls.t
+++ /Users/t/dev/cramtest/ls.t.err
@@ -1,2 +1,1 @@
   $ ls
-  there is nothing here to see
# Ran 1 tests, 0 skipped, 1 failed.
OK, so I can make it break -- excellent! Cram comes advertised with the ability to fix its own tests by replacing broken output with actual output; let's see what happens, shall we?
% cram -i ls.t
!
--- /Users/t/dev/cramtest/ls.t
+++ /Users/t/dev/cramtest/ls.t.err
@@ -1,2 +1,1 @@
   $ ls
-  there is nothing here to see
Accept this change? [yN] y
patching file /Users/t/dev/cramtest/ls.t
Reversed (or previously applied) patch detected! Assume -R? [n] y
Hunk #1 succeeded at 1 with fuzz 1.
# Ran 1 tests, 0 skipped, 1 failed.
% more ls.t
  $ ls
  there is nothing here to see
  there is nothing here to see
OK, so, first, wtf is the whole reversed patch detected nonsense? Sigh. And second, where's the output from 'ls' going!?
Hmm, maybe cram is setting up a temp directory? That would explain a lot, and would also be a very sensible approach. It's not mentioned explicitly on the front page, but if you read into it a bit, it looks likely. OK.
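The temp-directory guess is easy to illustrate. Here's a minimal Python sketch, assuming a POSIX shell with `ls` available; `run_in_scratch_dir` is a name I made up, and this shows the general idea rather than how cram itself is implemented.

```python
import subprocess
import tempfile


def run_in_scratch_dir(command):
    """Run a shell command inside a fresh, empty temporary directory.

    Illustrative only: it demonstrates why a command like 'ls' would
    not see the files sitting next to the test file.
    """
    with tempfile.TemporaryDirectory() as scratch:
        completed = subprocess.run(
            command,
            shell=True,
            cwd=scratch,          # the command starts in the empty directory
            capture_output=True,
            text=True,
        )
    return completed.stdout


# The directory holding the test file is never consulted,
# so a bare 'ls' has nothing to list.
print(repr(run_in_scratch_dir("ls")))
```

A runner built like this would never show 'ls.t' or 'ls.t~' in the output, which matches the empty result seen earlier.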
Let's modify 'ls.t' to create a file:
  $ touch testme
  $ ls
and run it...
% cram ls.t
!
--- /Users/t/dev/cramtest/ls.t
+++ /Users/t/dev/cramtest/ls.t.err
@@ -1,3 +1,4 @@
   $ touch testme
   $ ls
+  testme
# Ran 1 tests, 0 skipped, 1 failed.
Ah-hah! Now we're getting somewhere! Fix the test by making 'ls.t' read like so:
  $ touch testme
  $ ls
  testme
and run:
% cram ls.t
.
# Ran 1 tests, 0 skipped, 0 failed.
Awesome! Dot-victory ho!
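The failure reports above are ordinary unified diffs between the test file and what actually came out. As a rough illustration, Python's standard `difflib` can produce the same shape of output; `render_failure` is a hypothetical helper, and I'm not claiming this is the code path cram uses.

```python
import difflib


def render_failure(test_path, expected_lines, actual_lines):
    """Return unified-diff lines comparing a test file to its actual output.

    Hypothetical helper for illustration; the '.err' suffix mirrors the
    file names shown in the reports above.
    """
    return list(
        difflib.unified_diff(
            expected_lines,
            actual_lines,
            fromfile=test_path,
            tofile=test_path + ".err",
            lineterm="",
        )
    )


expected = ["  $ touch testme", "  $ ls"]
actual = ["  $ touch testme", "  $ ls", "  testme"]

for line in render_failure("ls.t", expected, actual):
    print(line)
```

Lines starting with `+` are output the command produced that the test file didn't expect, and lines starting with `-` are expected output that never showed up.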
Now let's do something a bit more interesting: check out and run my PyCon 2011 talk code for ngram graphs. Starting with this in 'khmer-ngram.t',
  $ git clone git://github.com/ctb/khmer-ngram.git
  $ cd khmer-ngram
  $ ls
  $ python run-doctests.py basic.txt
I run 'cram khmer-ngram.t' and get
!
--- /Users/t/dev/cramtest/khmer-ngram.t
+++ /Users/t/dev/cramtest/khmer-ngram.t.err
@@ -1,4 +1,15 @@
   $ git clone git://github.com/ctb/khmer-ngram.git
+  Initialized empty Git repository in /private/(yada, yada)
   $ cd khmer-ngram
   $ ls
+  basic.html
+  basic.txt
+  data
+  graphsize-book.py
+  hash.py
+  load-book.py
+  run-doctests.py
+  shred-book.py
   $ python run-doctests.py basic.txt
+  ... running doctests on basic.txt
+  *** SUCCESS ***
# Ran 1 tests, 0 skipped, 1 failed.
After getting cram to fix the file (using -i), and re-running cram, it now chokes at exactly one place; betcha you can guess where...:
!
--- /Users/t/dev/cramtest/khmer-ngram.t
+++ /Users/t/dev/cramtest/khmer-ngram.t.err
@@ -1,5 +1,5 @@
   $ git clone git://github.com/ctb/khmer-ngram.git
-  Initialized empty Git repository in /private/(yada, yada)
+  Initialized empty Git repository in /private/(different yada)
   $ cd khmer-ngram
   $ ls
   basic.html
# Ran 1 tests, 0 skipped, 1 failed.
Right. How do you deal with output that changes unpredictably? Easy! Throw in a wildcard regexp, like so:
  Initialized empty Git repository in .* (re)
My whole khmer-ngram.t file now looks like this:
  $ git clone git://github.com/ctb/khmer-ngram.git
  Initialized empty Git repository in .* (re)
  $ cd khmer-ngram
  $ ls
  basic.html
  basic.txt
  data
  graphsize-book.py
  hash.py
  load-book.py
  run-doctests.py
  shred-book.py
  $ python run-doctests.py basic.txt
  ... running doctests on basic.txt
  *** SUCCESS ***
And I can run cram on it without a problem:
.
# Ran 1 tests, 0 skipped, 0 failed.
Great!
I love the regexp fix, too; none of this BS that doctest forces upon you.
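For the curious, the `(re)` trick boils down to something like the following Python sketch. This is my own approximation of the matching rule, with a made-up function name, and it may differ in detail from what cram actually does (escaping, anchoring, and so on).

```python
import re

RE_SUFFIX = " (re)"


def line_matches(expected, actual):
    """Compare one expected output line against one actual output line.

    Approximation for illustration: a line ending in ' (re)' is treated
    as a regular expression that must match the whole actual line;
    anything else must match literally.
    """
    if expected.endswith(RE_SUFFIX):
        pattern = expected[: -len(RE_SUFFIX)]
        return re.fullmatch(pattern, actual) is not None
    return expected == actual


expected = "Initialized empty Git repository in .* (re)"

# Two runs with different (hypothetical) temp paths both pass.
print(line_matches(expected, "Initialized empty Git repository in /private/tmp/aaa/.git/"))
print(line_matches(expected, "Initialized empty Git repository in /private/tmp/bbb/.git/"))
```

The nice part is that the wildcard is opt-in per line: every line without the `(re)` marker is still compared exactly.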
So, the next question: how do multiple tests work? If you look above, you can see that it's running all the commands as one test. Logically you should be able to just separate out the block of text and make it into multiple tests... let's try adding
I'll add in another test:
  $ ls
to the khmer-ngram.t file; does that work? It looks promising:
!
--- /Users/t/dev/cramtest/khmer-ngram.t
+++ /Users/t/dev/cramtest/khmer-ngram.t.err
@@ -17,3 +17,12 @@
 I'll add in another test:
   $ ls
+  basic.html
+  basic.txt
+  data
+  graphsize-book.py
+  hash.py
+  hash.pyc
+  load-book.py
+  run-doctests.py
+  shred-book.py
# Ran 1 tests, 0 skipped, 1 failed.
and it sees two tests... but, after fixing the expected output using 'cram -i', I only get one test:
.
# Ran 1 tests, 0 skipped, 0 failed.
So it seems like a little internal inconsistency in cram here. Two tests when something's failing, one test when both are running. No big deal in the end.
And... I have to admit, that's about all I need for testing/checking course materials! The cram test format is perfectly compatible with reStructuredText, so I can go in and write real documents in it, and then test them. Command line testing FTW?
And (I just checked) I can even put in Python commands and run doctest on the same file that cram runs on. Awesome.
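Here's a small sketch of the doctest half of that, using only the standard library. The document text and the helper name are invented for illustration, and the sketch only runs doctest over the text; it doesn't invoke cram at all.

```python
import doctest

# A made-up document mixing a cram-style shell example with a doctest.
MIXED_DOCUMENT = """\
A shell example, written the way cram expects it:

  $ echo hello
  hello

A Python example, written the way doctest expects it:

>>> 1 + 1
2
"""


def run_doctests_in_text(text, name="mixed.t"):
    """Run any '>>>' examples found in text and return doctest's results."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, globs={}, name=name, filename=name, lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)


results = run_doctests_in_text(MIXED_DOCUMENT)
print(results.attempted, results.failed)
```

doctest only looks at the `>>>` prompts, so the indented `$` lines are just more prose as far as it's concerned.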
Critique:
The requirement for two spaces exactly before the $ was not obvious to me, nor was the implicit (and silent, even in verbose mode) use of a temp directory. I wiped out my test file a few times by answering "yes" to patching, too. What was up with the 'reversed patch' foo?? And of course it'd be nice if the number of dots reflected something more granular than the number of files run. But heck, it mostly just works! I didn't even look at the source code at all!
Verdict: a tentative 8/10 on the "Can titus use your testing tool?" scale.
I'll try using it in anger on a real project next time I need it, and report back from there.
--titus
p.s. To try out my full cram test from above, grab the file from the khmer-ngram repo at github; see:
https://github.com/ctb/khmer-ngram/blob/master/cram-test.t .
posted at: 18:46 | path: /mar-11 | 5 comments
Sat, 04 Apr 2009
What should Titus talk about next year at PyCon?
My talk this year kinda sucked -- more on that later -- and I am trying to come up with good and perhaps even non-testing talk ideas for next year.
One intriguing idea contributed by Brian Dorsey is that of giving 5 lightning talks in a 30 minute session. Since I like live demos, I could give five completely different demos, e.g.
- building and testing CPython across the Snakebite network, 30+ machines
- Django's test framework, from an outsider/testing-fiend perspective
- time-series analysis of code coverage data
- N packages in 5 minutes - trying to easy_install audience-suggested packages.
and whatever else comes to mind. Ideas welcome ;).
I may submit a separate talk proposal on my Web development course, too, but I have to see how it goes this fall.
--titus
posted at: 10:30 | path: /apr-09 | 7 comments
Wed, 25 Mar 2009
Twitter Ho!
OK, I'm going to try out twitter for the first time, in order to see if it works out at PyCon for keeping track of what's going on and letting people know what I'm up to. I guess you have to e-mail me to get in touch with me, though, as I'm not following anyone (and I don't expect to keep using twitter after PyCon).
Anyway, my twitterifick name (twitter handle?) is ctitusbrown. Boring as you'd expect.
--titus
posted at: 19:07 | path: /mar-09 | 4 comments
Tue, 02 Dec 2008
PyCon review process
We're going through the PyCon '09 review process, and participating in the process has been pretty interesting. (I joined the Program Committee in large part because I was told to put up or shut up after I critiqued PyCon '08. Ahh, the open source world... where you're encouraged to go fix things when you complain :). In particular, this is the first review process I've seen where regular communication between the reviewers and authors is expected, and proposals are modified in response to reviewer comments.
There are a couple of drawbacks to this process. One is that there's no clear boundary between reviewer opinion and expectation. It's one thing to say "I don't understand X, Y, or Z in your proposal; could you clarify, please?" and another to say "I don't agree with X, Y, or Z, and I won't push your proposal unless you change your views." The former seems pretty legit, but the latter strikes me as being counter to the conference ethos of encouraging diversity in views. While I don't think anyone has been that explicit, there have been extensive conversations between reviewers and authors that have had much the same effect...
Another drawback of this system is that authors can express their frustration at reviewer comments pretty directly. Sometimes this frustration is legit, but other times it's hysterically off-base; you don't make any friends when you tell your reviewers that they're idiots (just as one purely hypothetical example...)
On the flip side, I think several proposals have been dramatically improved through reviewer feedback. I don't know how well this kind of insta-review process might work for academic journals -- I believe PLoS One is trying it out? -- and I'll be watching some of the early experiments with interest.
Parenthetically, let me add that we have a bunch of great proposals, and Ivan Krstic is doing a fantastic job of running things! So I expect PyCon '09 to be a very good conference.
--titus
posted at: 18:01 | path: /dec-08 | 1 comments
Mon, 24 Mar 2008
The (Lack of) Testing Death Spiral
At PyCon '08, I gave a talk on testing and the OLPC project where I referred to the "Testing Death Spiral". My accompanying slide, which aimed to be simple rather than comprehensive, had this scenario:
Write a bunch of code & manually test it.
(Good so far.)
Start adding features over here.
Watch code break over there.
Rinse, lather, repeat
(Where do you think this ends?)
OK, so that format doesn't really work in a blog post, but hopefully you get the gist of the scenario. This is a scenario I see a lot: a project gets hacked together & works well enough that people start using it; then the project starts to expand. Many new features are added. However, as these new (and presumably solid) features are being added, the old code becomes increasingly ignored, uncovered by manual testing, and fragile.
This is a simple consequence of an inescapable fact: the amount of testing needed to detect regressions scales with the number of features. Forget about finding new bugs in the code you just wrote -- I'm talking about breaking existing code.
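A back-of-the-envelope sketch shows how quickly this adds up. The numbers here are purely illustrative assumptions (one new feature per release, one manual check per existing feature per release), not measurements from any real project.

```python
def cumulative_manual_checks(releases, features_per_release=1):
    """Total manual checks needed if every release rechecks every feature.

    Toy model with assumed numbers: each release adds
    'features_per_release' features, and each existing feature costs
    one manual check per release.
    """
    total_checks = 0
    feature_count = 0
    for _ in range(releases):
        feature_count += features_per_release
        total_checks += feature_count  # recheck everything, old and new
    return total_checks


for releases in (5, 10, 20):
    print(releases, cumulative_manual_checks(releases))
```

Under those assumptions the per-release cost grows linearly with the number of features, so the cumulative manual effort grows roughly with the square of the number of releases.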
I have seen people attempt to escape this scenario in a number of ways: improve the architecture and reduce internal linkages; open source it; release early, release often; alpha- and beta-test it; stop adding new features; and probably many more. These are all good thoughts, but they are all doomed to failure [1]. Nonetheless, I wish you well.
The only solution I have found is this: write automated tests.
Before I continue, let me say: automated tests are not a panacea. Writing good code is hard, getting your project "out there" is important, exploratory testing is mandatory, and writing appropriate automated tests is hard; there's a lot more to building software than writing good, automated tests. I stress that every time I talk about test automation. I just think automated tests are necessary [2].
Let us suppose, for the sake of argument, that you have some software that is actively evolving. Furthermore, this software has no automated tests. Every time you add a feature, you test the bejeezus out of that feature in order to satisfy yourself that it works. You do this for every new feature that is added, and thus consider your software to be solid.
I now have two questions to ask:
are you adding features in isolation from each other? that is, is your architecture such that each new feature only uses non-state-changing code from elsewhere in your project?
(if the answer is yes -- are you sure?)
do you completely control the packages, libraries, compiler, operating system, and hardware that your software runs on?
(if the answer is yes, do you plan to never, ever, change any of those components? and have you discussed these plans with anyone outside your development team? and do you believe your managers?)
I like to summarize these questions this way: are you feeling lucky, punk?
If the answer is "yes" to all of the above, then congratulations -- you are Apple, stuck at one point in time, and never planning to release a new piece of software or hardware :). Hopefully you'll do better than Apple did before they decided to change and adapt...
If the answer is "no" to any of the above, I encourage you to read on.
I assert that even with a perfectly decoupled architecture, brilliant software engineers, and nigh-complete control over the software and hardware that you use -- in itself a dream software development situation -- you will eventually need to add features that crosscut that architecture, and you will also need to upgrade the compiler, libraries, language version, operating system, and hardware. In order to make sure that your software still works, each time you add a feature or change a component, you will have to retest every feature and every piece of code. And, if you have no automated tests, you will have to do this manually. Every time.
If you have automated tests, however, your development process could look something like this:
1. change code
2. run tests
3. commit
4. test manually, do exploratory testing
5. find bugs, write automated tests to reduce bugs
6. goto 1
Even if you don't add any new features, this process applies to library, compiler, platform, and hardware changes. At the least, you will be able to quickly determine if you've broken something that you're testing for; at the best, you will be able to quickly and confidently release new versions of your software.
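As a concrete (and entirely hypothetical) example of the "find bugs, write automated tests" step: suppose a bug report shows that a version string with a trailing newline breaks a small parsing function. Once the fix is in, a regression test pins it down so the bug can't quietly return. `parse_version` and the bug itself are invented for illustration.

```python
import unittest


def parse_version(text):
    """Hypothetical function under test: turn '1.2.3' into (1, 2, 3)."""
    # The strip() call is the "fix" for the imagined trailing-newline bug.
    return tuple(int(part) for part in text.strip().split("."))


class ParseVersionRegressionTests(unittest.TestCase):
    def test_plain_version(self):
        self.assertEqual(parse_version("1.2.3"), (1, 2, 3))

    def test_trailing_newline_regression(self):
        # Written after the imagined bug report; guards against a repeat.
        self.assertEqual(parse_version("1.2.3\n"), (1, 2, 3))


def run_suite():
    """Run the regression tests and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(ParseVersionRegressionTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_suite()
print(result.testsRun, result.wasSuccessful())
```

From then on the check costs nothing to repeat, whether the change is a new feature or a new compiler.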
Fundamentally, then, automated testing is important for software maintenance. And since the cost of software maintenance is a significant portion of the cost of developing the software in the first place [3], it behooves you to pay attention to anything that will reduce the cost of software maintenance. This is without even considering other aspects of test utility, like increased developer velocity, ease of refactoring, increased confidence in your software, etc.
This maintenance situation is the scenario that led me into testing (or, if you prefer, "illuminated me as to the importance of automated tests by whacking me over the head with a clue bat".)
Let me assure you that this maintenance situation doesn't just apply to large bodies of code, either. I maintain a number of small projects and having automated tests means that I simply don't release code with regressions. Moreover, when my small projects "grow up" into bigger ones -- or, more frequently, are used in larger projects -- I'm not stuck in a situation where I suddenly have to write a bunch of tests to achieve stability. I always try to grow my test framework organically with the project, because I will never have the time to put into writing tests from scratch for my bigger projects.
So, automated tests are important for maintenance, and they are critical for making sure that your old code still works while you focus on new code. Without automated tests, you will be doomed to releasing increasingly buggy software as your body of code increases and the average level of testing decreases.
Does this actually happen?
This is precisely the scenario that led to our consulting work with ARINC, which went well. (As in, they're adding new features with great confidence after we helped them adopt automation tools and practices.)
This is also the scenario that leads to what Jamie Zawinski named the Cascade of Attention Deficit Teenagers. Open Source projects, facing a continually increasing number of bugs, often opt to completely rewrite their components in the expectation that this time, they'll get it right. This completely ignores our experience with software rewrites, which suggests that (barring brilliance and luck) any rewrite will contain as many bugs as the original software -- they'll just be different bugs. (As JWZ points out, though, it's more fun to write new code than to fix the crud someone else wrote before...)
And, finally, it is also the scenario faced by the One Laptop Per Child project, which has built a tower of cards on open source software. Their build system pulls in about fifty distinct packages live from the Internet, compiles them all, and then layers the Sugar user interface on top of them.
There is no automated testing in place.
OK, back to the Software Testing Death Spiral. What happens to projects that lack both automated tests and an exponentially increasing team of testers? Starting somewhere in the middle of the process:
1. They manually test the new features and bug fixes that they've just added.
2. They release their software.
3. Their software breaks in unexpected locations, bug reports are filed, and (optimistically) those bugs are fixed. Go to #1.
The inevitable consequence is a death spiral, barring only a complete rewrite (which will possibly fail, or likely lead to a product that's just as buggy, but with unknown bugs), trashing of the project, OR -- and this is an optimistic scenario -- the adoption of automated testing.
Here are a few straw men, with moderately snarky replies:
"We don't test, and we don't use version control. Which is more important?" Version control. But you're doomed, anyway.
"We don't have time to test." Why do you have time to write software, but not time to make sure it works?
"We don't have the expertise to build good tests, and/or we can't afford the tools, and/or we don't know how to use them." This is a pretty realistic scenario, actually. May I suggest: hire consultants, or read some good books, or dedicate your young new hire to learning the tools?
"We don't like to test." Well, at least you're honest ;). I would summarize your choices like this: either you can write crappy software, or you can learn to like testing. The former will most likely doom you to the rubbish bin of history. The latter gives you a better chance of "making it".
"We really do plan to rewrite our software in two years." Points for honesty, again! I think you're rolling the dice -- many software projects fail, but maybe you'll do better. Might I suggest an incremental rewrite rather than a complete rewrite? (For that you'll need testing, though...)
"We wrote a bunch of automated tests. They didn't help us." Ahh, a problem based in actual experience! I would like to suggest -- with no background in your particular problem -- that you try out several different kinds of tests, like functional tests or regression tests, and see what does help you.
"How do I test, if I don't know what the right answer is, anyway?" How do you know you got the right answer, then? If your customers don't care if you're right, then you've stumbled into a gold mine, but I daresay it will end badly. (This straw man was actually sighted at PyCon -- sorry, MC.) I hear this a lot in research, actually, but it's still nonsense. Perhaps another blog post in there...
"I can't convince my boss/team leader/PI that it's important to spend the time to write tests. (I even sent him/her your blog post.)" You could go one of three ways: try harder, integrate testing into your personal development strategy and view this situation as an opportunity to "manage up", or quit. The middle option is the interesting one: you can quietly start writing automated tests to "fence in" your own code, and explain to your boss that this is just how you code -- it's like using emacs instead of vi -- and you're not insisting that anyone else follow suit. Hopefully your productivity will not decrease much, while your reliability will increase. Good fellow programmers may follow suit and at some point your manager might realize that you've all evaded his dictat. Or not. But it beats working on untested code!
"I am but one lone programmer, and I can't convince my team to write/use tests. (I even sent them your blog post.)" See previous question/answer: you will find that most worthwhile programmers are in favor of anything that increases their productivity and reliability.
"There's so many other things to straighten out on my project before I can even think about what tests to write." I sympathize, I really do, but if your project is so undirected that you can't even figure out what it's supposed to do (and write tests for it) then you have far bigger problems than bad code to worry about.
"I took your advice and wrote tests. Then we changed a bunch of stuff, and now all the tests break, and I don't have time to fix them. What do I do now?" Hmm, this is a common complaint. First, try to separate out a subset of the tests that are of immediate use to you (as in, they pass and/or they exercise a lot of your code). Keep that subset working. Second, don't be afraid to simply delete your old tests. Tests should not be a maintenance headache; if you like and use tests, but don't see the point of maintaining a bunch of your broken tests, get rid of them! Then put new ones back in as necessary.
There really are a bunch of other reasons to write automated tests, too. For example, consider:
- cross-platform development is dramatically simplified when you have a moderately thorough test suite. In particular, you can develop on your favorite machine, in your favorite programming environment, and let the continuous integration boxes run and test your code on all the other machines.
- setting up new development environments and development machines is much easier when you can simply ... run the tests to figure out if it's all working.
- integrating new people into the development team is much easier when they can run tests to figure out if they just broke something.
- releasing "a quick bugfix" is a lot easier when you can be fairly confident that your quick new release is no more broken than your last release.
If these aren't enough to make you think seriously about testing, then I give up!
There's no real conclusion to this :). I'll talk more about the OLPC stuff later.
Don't get me wrong: testing is hard. Testing effectively is even harder. There are ways around this, but the best way to start may be to simply power through: write a bunch of tests, and ruthlessly discard those that don't help. Then refine your method over time. I have some advice to offer here, too, but that's for another blog post...
And remember... Darth Vader recommends testing!
--titus
p.s. Thanks to Tracy Teal, Lisa Crispin, Alex Gouaillard, Kumar McMillan, Shannon -jj Behrens, and Doug Hellmann for comments!
[1] E-mail me if you think I should write about why :)
[2] I can blog about "necessity" vs "sufficiency", too. Let me know.
[3] I've heard estimates of 80-90% of the total cost of development for a successful software project, i.e. initial feature development is 10-20%, maintenance is 80-90%, but I have no good references for this.
posted at: 21:35 | path: /mar-08 | 19 comments