Sun, 13 Mar 2011
Trying out 'cram'
I desperately need something to run and test things at the command line, both for course documentation (think "doctest" but with shell prompts) and for script testing (as part of scientific pipelines). At the 2011 testing-in-python BoF, Augie showed us cram, which is the Mercurial project's internal test code ripped out for the hoi polloi to use.
Step zero: wonder-twin-powers activate a new virtualenv!
% virtualenv e
% . e/bin/activate
Step one: install!
% pip install cram
... that just works -- always a good sign!
OK, let's test the bejeezus out of 'ls'.
% mkdir cramtest
% cd cramtest
Next, I put
  $ ls
into a file. Be careful -- you apparently need exactly two spaces before the $ or it doesn't recognize it as a test.
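To make the two-space rule concrete, here is a rough sketch of how a cram-style file could be split into commands and expected output. This is not cram's actual parser; the function name `parse_cram` is made up for illustration, and it assumes the format as the cram README describes it (two spaces, a dollar sign, and a space start a command; other two-space-indented lines are expected output; everything else is commentary). It ignores multi-line commands.

```python
def parse_cram(text):
    """Split cram-style text into (command, expected_output_lines) pairs.

    Simplified illustration only; not cram's real implementation.
    """
    tests = []
    for line in text.splitlines():
        if line.startswith("  $ "):
            # Two spaces, a dollar sign, a space: a new command.
            tests.append((line[4:], []))
        elif line.startswith("  ") and tests:
            # Any other two-space-indented line: expected output
            # belonging to the most recent command.
            tests[-1][1].append(line[2:])
        # Everything else is treated as commentary and skipped.
    return tests


# The last line has only ONE leading space, so it is not seen as a command.
sample = "Listing files:\n\n  $ ls\n  testme\n\n $ ls\n"
print(parse_cram(sample))
```

Run on the sample, this finds a single command (`ls`, expecting `testme`); the one-space `$ ls` at the end is silently treated as a comment, which is exactly the trap described above.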
Now, I run:
% cram ls.t
and I get
.
# Ran 1 tests, 0 skipped, 0 failed.
Awesome! A dot!
The only problem with this is that when I run 'ls' myself, I see:
ls.t ls.t~
Hmm.
As a test of the cram test software, let's modify the file 'ls.t' to contain a clearly broken test, rather than an empty one:
  $ ls
  there is nothing here to see
and I get
!
--- /Users/t/dev/cramtest/ls.t
+++ /Users/t/dev/cramtest/ls.t.err
@@ -1,2 +1,1 @@
   $ ls
-  there is nothing here to see
# Ran 1 tests, 0 skipped, 1 failed.
OK, so I can make it break -- excellent! Cram comes advertised with the ability to fix its own tests by replacing broken output with actual output; let's see what happens, shall we?
% cram -i ls.t
!
--- /Users/t/dev/cramtest/ls.t
+++ /Users/t/dev/cramtest/ls.t.err
@@ -1,2 +1,1 @@
   $ ls
-  there is nothing here to see
Accept this change? [yN] y
patching file /Users/t/dev/cramtest/ls.t
Reversed (or previously applied) patch detected! Assume -R? [n] y
Hunk #1 succeeded at 1 with fuzz 1.
# Ran 1 tests, 0 skipped, 1 failed.
% more ls.t
  $ ls
  there is nothing here to see
  there is nothing here to see
OK, so, first, wtf is the whole reversed patch detected nonsense? Sigh. And second, where's the output from 'ls' going!?
Hmm, maybe cram is setting up a temp directory? That would explain a lot, and would also be a very sensible approach. It's not mentioned explicitly on the front page, but if you read into it a bit, it looks likely. OK.
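If the temp-directory guess is right, the behavior is easy to mimic. The sketch below is an assumption about the general approach, not cram's code: the helper name `run_in_scratch_dir` is hypothetical, and it needs a POSIX shell with `ls` and `touch` available.

```python
import subprocess
import tempfile


def run_in_scratch_dir(command):
    """Run a shell command inside a fresh, empty temporary directory."""
    with tempfile.TemporaryDirectory() as scratch:
        result = subprocess.run(
            command,
            shell=True,
            cwd=scratch,  # the command never sees the caller's directory
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
    return result.stdout


# 'ls' sees an empty directory, no matter where this script is run from.
print(repr(run_in_scratch_dir("ls")))
# Files created by the command show up, then vanish with the directory.
print(repr(run_in_scratch_dir("touch testme && ls")))
```

Under that assumption, a bare `ls` printing nothing makes sense: the test never runs in the directory that holds 'ls.t'.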
Let's modify 'ls.t' to create a file:
  $ touch testme
  $ ls
and run it...
% cram ls.t
!
--- /Users/t/dev/cramtest/ls.t
+++ /Users/t/dev/cramtest/ls.t.err
@@ -1,3 +1,4 @@
   $ touch testme
   $ ls
+  testme
# Ran 1 tests, 0 skipped, 1 failed.
Ah-hah! Now we're getting somewhere! Fix the test by making 'ls.t' read like so:
  $ touch testme
  $ ls
  testme
and run:
% cram ls.t
.
# Ran 1 tests, 0 skipped, 0 failed.
Awesome! Dot-victory ho!
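The failure reports above look like ordinary unified diffs between the test file and what actually came out. As a sketch of that idea (an illustration using Python's standard `difflib`, not a claim about how cram builds its reports; `failure_report` is a made-up name):

```python
import difflib


def failure_report(expected_lines, actual_lines, name="ls.t"):
    """Return unified-diff lines; an empty list means the test passed."""
    return list(
        difflib.unified_diff(
            expected_lines,
            actual_lines,
            fromfile=name,
            tofile=name + ".err",
            lineterm="",
        )
    )


expected = ["  $ touch testme", "  $ ls"]
actual = ["  $ touch testme", "  $ ls", "  testme"]

for line in failure_report(expected, actual):
    print(line)
```

The unexpected `testme` line shows up with a leading `+`, just as in the cram output, and identical inputs produce no report at all.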
Now let's do something a bit more interesting: check out and run my PyCon 2011 talk code for ngram graphs. Starting with this in 'khmer-ngram.t',
  $ git clone git://github.com/ctb/khmer-ngram.git
  $ cd khmer-ngram
  $ ls
  $ python run-doctests.py basic.txt
I run 'cram khmer-ngram.t' and get
!
--- /Users/t/dev/cramtest/khmer-ngram.t
+++ /Users/t/dev/cramtest/khmer-ngram.t.err
@@ -1,4 +1,15 @@
   $ git clone git://github.com/ctb/khmer-ngram.git
+  Initialized empty Git repository in /private/(yada, yada)
   $ cd khmer-ngram
   $ ls
+  basic.html
+  basic.txt
+  data
+  graphsize-book.py
+  hash.py
+  load-book.py
+  run-doctests.py
+  shred-book.py
   $ python run-doctests.py basic.txt
+  ... running doctests on basic.txt
+  *** SUCCESS ***
# Ran 1 tests, 0 skipped, 1 failed.
After getting cram to fix the file (using -i), and re-running cram, it now chokes at exactly one place; betcha you can guess where...:
!
--- /Users/t/dev/cramtest/khmer-ngram.t
+++ /Users/t/dev/cramtest/khmer-ngram.t.err
@@ -1,5 +1,5 @@
   $ git clone git://github.com/ctb/khmer-ngram.git
-  Initialized empty Git repository in /private/(yada, yada)
+  Initialized empty Git repository in /private/(different yada)
   $ cd khmer-ngram
   $ ls
   basic.html
# Ran 1 tests, 0 skipped, 1 failed.
Right. How do you deal with output that does change unpredictably? Easy! Throw in a wildcard regexp like so
  Initialized empty Git repository in .* (re)
My whole khmer-ngram.t file now looks like this:
  $ git clone git://github.com/ctb/khmer-ngram.git
  Initialized empty Git repository in .* (re)
  $ cd khmer-ngram
  $ ls
  basic.html
  basic.txt
  data
  graphsize-book.py
  hash.py
  load-book.py
  run-doctests.py
  shred-book.py
  $ python run-doctests.py basic.txt
  ... running doctests on basic.txt
  *** SUCCESS ***
And I can run cram on it without a problem:
.
# Ran 1 tests, 0 skipped, 0 failed.
Great!
I love the regexp fix, too; none of this BS that doctest forces upon you.
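For the curious, the '(re)' trick is simple to imitate. This is a minimal sketch of the matching rule, assuming the pattern has to account for the whole output line; the helper `line_matches` is hypothetical and is not taken from cram's source.

```python
import re

RE_SUFFIX = " (re)"


def line_matches(expected, actual):
    """Compare one expected line against one line of real output."""
    if expected.endswith(RE_SUFFIX):
        pattern = expected[: -len(RE_SUFFIX)]
        # fullmatch: the regexp must cover the entire output line.
        return re.fullmatch(pattern, actual) is not None
    # No suffix: plain, literal comparison.
    return expected == actual


print(line_matches(
    "Initialized empty Git repository in .* (re)",
    "Initialized empty Git repository in /private/tmp/example/.git/",
))
print(line_matches("basic.txt", "basicXtxt"))
```

Only lines that opt in with the suffix are treated as patterns, so the dot in a line like `basic.txt` stays a literal dot unless you ask otherwise.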
So, the next question: how do multiple tests work? If you look above, you can see that it's running all the commands as one test. Logically you should be able to just separate out the block of text and make it into multiple tests... let's try adding
I'll add in another test:
  $ ls
to the khmer-ngram.t file; does that work? It looks promising:
!
--- /Users/t/dev/cramtest/khmer-ngram.t
+++ /Users/t/dev/cramtest/khmer-ngram.t.err
@@ -17,3 +17,12 @@
 I'll add in another test:
   $ ls
+  basic.html
+  basic.txt
+  data
+  graphsize-book.py
+  hash.py
+  hash.pyc
+  load-book.py
+  run-doctests.py
+  shred-book.py
# Ran 1 tests, 0 skipped, 1 failed.
and it sees two tests... but, after fixing the expected output using 'cram -i', I only get one test:
.
# Ran 1 tests, 0 skipped, 0 failed.
So it seems like a little internal inconsistency in cram here. Two tests when something's failing, one test when both are running. No big deal in the end.
And... I have to admit, that's about all I need for testing/checking course materials! The cram test format is perfectly compatible with reStructuredText, so I can go in and write real documents in it, and then test them. Command line testing FTW?
And (I just checked) I can even put in Python commands and run doctest on the same file that cram runs on. Awesome.
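Here is a small sketch of the doctest half of that. It writes a mixed file to a scratch directory and hands it to `doctest.testfile`; the file name and contents are invented for the example. The Python prompt is left unindented on the assumption that cram would then treat it as commentary rather than as command output.

```python
import doctest
import os
import tempfile

# A made-up document mixing a shell example with a Python example.
MIXED = """\
A shell example, which doctest ignores:

  $ echo hello
  hello

A Python example, which doctest runs:

>>> 1 + 1
2
"""

with tempfile.TemporaryDirectory() as scratch:
    path = os.path.join(scratch, "mixed.t")
    with open(path, "w") as handle:
        handle.write(MIXED)
    # module_relative=False: treat 'path' as an ordinary filesystem path.
    results = doctest.testfile(path, module_relative=False)

print(results)
```

doctest only looks at the `>>>` prompt, so the shell lines are just prose as far as it is concerned.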
Critique:
The requirement for two spaces exactly before the $ was not obvious to me, nor was the implicit (and silent, even in verbose mode) use of a temp directory. I wiped out my test file a few times by answering "yes" to patching, too. What was up with the 'reversed patch' foo?? And of course it'd be nice if the number of dots reflected something more granular than the number of files run. But heck, it mostly just works! I didn't even look at the source code at all!
Verdict: a tentative 8/10 on the "Can titus use your testing tool?" scale.
I'll try using it in anger on a real project next time I need it, and report back from there.
--titus
p.s. To try out my full cram test from above, grab the file from the khmer-ngram repo at github; see:
https://github.com/ctb/khmer-ngram/blob/master/cram-test.t .
posted at: 18:46 | path: /mar-11 | 5 comments
Sun, 21 Feb 2010
What's with the goat?
A new meme was born at PyCon 2010: The Testing Goat.
Or, "Be Stubborn. Obey the Goat."
The goat actually emerged from the Testing In Python Birds of a Feather session at PyCon, where Terry Peppers used slides full of goat in his introduction. This was apparently an overreaction to lolcat, but the testing goat is now being held up in opposition to the Django pony.
sigh.
--titus
posted at: 12:35 | path: /feb-10 | 0 comments
Tue, 22 Dec 2009
Why use buildbots?
I've recently turned my basilisk eye from Web testing and code coverage analysis to continuous integration, as you can see from my PyCon '10 talk and my UCOSP proposal, not to mention everyone wants a pony.
There's some confusion about what "continuous integration" means (see Martin Fowler on CI) so for simplicity's sake I'm just going to talk about "buildbots" that take your code, compile it (if necessary), run all the tests across multiple platforms, and provide some record of the results. (This choice of terms is also confusing because "buildbot" is a widely used Python software package for CI. Sigh.)
why use buildbots?
So, uhh, why use buildbots, anyway?
1. They build your code and run your tests without your conscious involvement.
Obvious, yes -- that is, after all, ostensibly the point of buildbots. But it has more benefits than you might immediately expect.
For this to work, you must have a systematized and automated build process.
You must also have some automated tests.
And your build process and tests are being run on a regular basis, whether or not any particular developer feels like it. And if the build or tests fail, then more likely than not, something changed to make them fail -- and now you'll know.
These are all good and necessary things.
2. They can build your code and run your tests in multiple environments.
Buildbots can build and run your project on whatever operating systems you or your colleagues can access, and report the results to you, with a minimum of setup.
This is the main reason I use buildbots myself: to run tests on other versions of Python, and other operating systems. I'm a UNIX guy, and I develop on Linux; therefore my software usually works on Linux. My pure Python code generally works on Mac OS X, too, although I sometimes run into trouble with compiled code. But I don't ever run my software on Windows systems, because I don't have Windows handy; so my code often doesn't work on Windows. This is where a Windows buildbot comes in really handy, by catching the errors that I otherwise wouldn't even notice.
There's a more subtle point here that many people miss, which is the ability of buildbots to test dependence on a specific full stack of hardware and software. Most developers work with at most one or two build environments, including compiler or interpreter versions, operating system patchlevels, etc. The more different versions you have being tested, the more you can detect sensitivities to specific operating system or compiler or language features; whether or not cross-compiler or cross-version compatibility is important to you is a different question, of course, but it's nice to know.
The most entertaining aspect of this is when buildbots detect when developers -- especially inexperienced ones -- introduce unintended or unauthorized new dependencies. "Hey, Joe, since when does our software depend on FizBuzz!?"
These latter points feed particularly into #3 and #4:
3. They provide a de facto set of docs on your build & test environment.
Buildbots require explicit build instructions, so if you've got one running, at least your project has some form of build documentation. Not a good one, maybe not an explicit one, but something.
This is not a concern for most big open source projects, because they usually have fairly straightforward and well-documented build environments (although not all -- OLPC/Sugar was horrific!). Where I think this really helps is for small private projects and especially for academic projects, where the level of software engineering expertise can be, ahem, poor. Having explicit build instructions that graduate student B can use to build & run the code now that graduate student A has left the project is quite helpful.
4. They are evidence that it is possible to build your code and run your tests on at least some platform.
You might be surprised how much some projects really need this kind of evidence :). As with #3, small private projects and academic projects benefit the most from this.
5. They can run all the tests, even the slow ones, regularly.
This is the third reason that software professionals like continuous integration and buildbots: many tests (in particular, integration and acceptance tests) may take a loooong time to run, and developers may end up simply not running all of them. With buildbots, you can run them on a daily basis and detect problems, without distracting or defocusing your developers.
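To make the list above a little more concrete, here is a toy sketch of the core of what such a bot does: run explicit build and test steps, and keep a record of where they ran and whether they passed. It is not the buildbot package or pony-build; the function name, the record layout, and the placeholder steps are all assumptions made up for this example.

```python
import platform
import subprocess
import sys


def run_steps(steps):
    """Run each (name, argv) step in order, stopping at the first failure."""
    record = {
        # Which environment produced this result (point 2 above).
        "platform": platform.platform(),
        "python": platform.python_version(),
        "steps": [],
    }
    for name, argv in steps:
        status = subprocess.run(argv).returncode
        record["steps"].append((name, status == 0))
        if status != 0:
            break
    record["success"] = all(ok for _, ok in record["steps"])
    return record


# Hypothetical steps; a real project would list its own build and test
# commands here, which doubles as crude build documentation (point 3).
steps = [
    ("build", [sys.executable, "-c", "print('pretend build')"]),
    ("test", [sys.executable, "-c", "raise SystemExit(0)"]),
]
print(run_steps(steps))
```

Scheduling it regularly and collecting the records from several machines is where the real tools earn their keep; this only shows the shape of a single run.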
Are buildbots overkill for your project?
Buildbots require setup and maintenance effort, which (in our zero-sum world) takes that effort away from developing new features, exploratory testing, etc. When does the benefit outweigh the cost?
Almost always, I believe.
For small side projects that you may not be constantly focused on, having the tests alert you when something breaks is really helpful. But even if you're in a mature software engineering setting and you have a good build process, a good set of documentation on how to build your software, and a commitment to running the tests regularly, many of the advantages above still apply. In particular, #1 (building w/o conscious effort), #2 (building across multiple environments), and #5 (running all of the tests, especially the slow ones), are advantageous for all projects.
I think buildbots aren't that useful for projects that are mostly UI (which is hard to develop automated tests for) or that are at a very early stage (where you're accumulating technical debt on a daily basis) or that depend on lots of specialized hardware. What else?
What's next?
I personally think that the technology that's out there in the Python world isn't that simple and hackable, so that's what I'm working on. I'd also like to minimize configuration and maintenance. I have a simple implementation "thought project", pony-build, that I'm hoping will address these issues. The goal is to make buildbots "out of sight, out of mind."
A secondary goal (one of many - watch this space) is to enable simple integration into a pipeline where patches can be tested, and/or automatically accepted or rejected, based on whether or not they pass tests on multiple platforms.
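As a sketch of what that decision might look like (purely illustrative; the function name and the shape of the results are assumptions, not part of pony-build):

```python
def accept_patch(results):
    """Decide on a patch from per-platform test results.

    'results' maps a platform name to True (tests passed) or False.
    Returns (accepted, failed_platforms). A patch with no results at
    all is rejected rather than waved through.
    """
    failed = sorted(name for name, passed in results.items() if not passed)
    accepted = len(results) > 0 and not failed
    return accepted, failed


print(accept_patch({"linux": True, "macosx": True, "windows": False}))
```

Returning the list of failing platforms alongside the verdict keeps the rejection useful to whoever submitted the patch.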
--titus
posted at: 12:34 | path: /dec-09 | 0 comments
Tue, 18 Aug 2009
A TiP tip - Pandokia
Victoria Laidler just announced a Pandokia release. She gave a great lightning talk on Pandokia at the PyCon '09 testing BoF and I've been looking forward to this release.
Pandokia seems like a nice way to manage test running and results analysis & reporting, and it fits a fairly unique niche. Definitely worth taking a look if you're test-obsessed and have a few minutes.
--titus
posted at: 22:27 | path: /aug-09 | 0 comments
Wed, 10 Jun 2009
A great list of testing anti-patterns
This TDD anti-pattern catalogue is truly excellent!
--titus
posted at: 17:07 | path: /jun-09 | 2 comments