Sun, 01 Feb 2009
Some project changes
I'm switching several projects from darcs to either git on github, or svn on Google Code.
twill, a simple Web testing tool/HTTP driver in Python, was switched over to Google Code several months ago: see http://code.google.com/p/twill. I'll post more on twill development soon, I hope.
I just put scotch in the twill svn as well. scotch is an HTTP recording and playback tool written in Python with WSGI interfaces.
I also moved figleaf to github, at http://github.com/ctb/figleaf/tree/master. figleaf is a flexible code coverage recording and analysis tool for Python.
This move will hopefully open up development a bit more; I've been inactive for too long, and I feel that one obstacle to participation in these projects has been my self-hosting of the DVCS archives in darcs.
Note, I will probably move pinocchio, pygr-draw, zounds, and blastparser over to github as well, whenever I get around to it. I'm planning to decommission my darcs repository as I swizzle machines around while moving my virtual home over to MSU; the URLs will still work for pulling, but I'm going to stop pushing things to them.
--titus
posted at: 20:31 | path: /feb-09 | 1 comments
Mon, 29 Oct 2007
Testing is fun
I spent a fairly satisfying two days setting up unit and functional tests for figleaf, my code coverage package. I implemented the tests using a technique that I gather is called vertical slicing -- I just called it "starting a new phase of a project" before ;). The idea is that you implement a single test, with necessary fixtures, from end to end; then you widen it. This way you don't get overwhelmed with planning ahead for different features or other distractions: you just get one thing working, with the attendant psychological reward of satisfaction.
Why all the testing? Well, after twill, figleaf is the most popular of my packages, but it is also much uglier than it should be. A number of bug reports have been sent to me, and I need to deal with them. Since I am also hoping to shoehorn it into a large Python-based project, this is an apropos time.
Some of the fun new features I am adding for figleaf 0.7 include a much enhanced and more configurable reporting/annotating interface, and (something I'm particularly proud of) tracing of statements only within specific files.
Anyway, I ran into a few entertaining problems with the tests.
First, python2.4 and python2.5 differ in what they consider an executed statement. Most notably, python2.4 will call the trace function when 'pass' is executed, while python2.5 will not. This means my regression tests need to compare against version-specific "known good" data.
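One way to organize version-specific known-good data is to key the file name on the interpreter version. This is only a minimal sketch, not figleaf's actual test layout: the helper known_good_path and the "<file>.cover.<major>.<minor>" naming scheme are assumptions made up for illustration.

```python
import sys


def known_good_path(test_file, version=None):
    """Return the name of the known-good data file for one interpreter version.

    The '<test_file>.cover.<major>.<minor>' scheme is hypothetical.
    """
    if version is None:
        # Default to the interpreter that is running the tests.
        version = sys.version_info[:2]
    major, minor = version
    return "%s.cover.%d.%d" % (test_file, major, minor)


# Data recorded under python2.4 and python2.5 would live side by side.
path_24 = known_good_path("tst_pass.py", (2, 4))
path_25 = known_good_path("tst_pass.py", (2, 5))
```

A regression test would then load whichever file matches the running interpreter and compare against that.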
Second, easy_install was mucking around with my imported packages and I couldn't get the figleaf-under-test to be imported ahead of the installed version. It turns out that this code re-scans sys.path and thus imports the correct package:
import sys
import figleaf # probably unnecessary
sys.path.insert(0, '/path/to/correct/figleaf')
reload(sys.modules['figleaf'])
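On Python 3 the builtin reload is gone, but the same trick can be sketched with importlib.reload, which looks the module up again along the current sys.path. The module name shadow_demo and the two temporary directories are invented for the demonstration; they stand in for the installed copy and the copy under test.

```python
import importlib
import os
import sys
import tempfile


def write_module(directory, body):
    # Create a one-line module named shadow_demo (hypothetical name).
    with open(os.path.join(directory, "shadow_demo.py"), "w") as handle:
        handle.write(body)


installed_dir = tempfile.mkdtemp()  # stands in for the installed version
checkout_dir = tempfile.mkdtemp()   # stands in for the version under test
write_module(installed_dir, "WHERE = 'installed'\n")
write_module(checkout_dir, "WHERE = 'checkout'\n")

# The "wrong" copy gets imported first.
sys.path.insert(0, installed_dir)
importlib.invalidate_caches()
shadow_demo = importlib.import_module("shadow_demo")
first = shadow_demo.WHERE

# Put the desired copy ahead of it and reload: the module is found again
# along sys.path, so the copy in checkout_dir wins this time.
sys.path.insert(0, checkout_dir)
importlib.reload(shadow_demo)
second = shadow_demo.WHERE
```

Note that reload re-executes the module in place, so names already bound elsewhere with "from module import name" still point at the old objects.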
Third, testing command-line executable programs is a huge pain in the butt. There's room for someone to come up with a framework for this purpose. Right now I think texttest is probably the best framework, but I have to say that it seems overcomplicated for what it does; it would be nice to have something simpler. I think that the main features that a test framework for command-line programs needs are a way of compactly comparing test output to known-good output, and a nice way of specifying configurations (directory locations, config files, etc) so that a variety of features can be tested.
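The first of those features, comparing output against known-good text, can be sketched with the standard library alone. This is not texttest and not a proposal for a framework: run_and_diff is a hypothetical helper, and the command under test is just a Python one-liner so that the example is self-contained.

```python
import difflib
import subprocess
import sys


def run_and_diff(argv, expected):
    """Run argv and return unified-diff lines against the expected output.

    An empty list means the output matched the known-good text.
    """
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    diff = difflib.unified_diff(
        expected.splitlines(),
        result.stdout.splitlines(),
        fromfile="expected",
        tofile="actual",
        lineterm="",
    )
    return list(diff)


command = [sys.executable, "-c", "print('howdy'); print('bye!')"]
matching = run_and_diff(command, "howdy\nbye!\n")
mismatch = run_and_diff(command, "howdy\n")
```

Returning the diff rather than a bare True/False keeps failure reports compact: only the lines that changed need to be shown.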
Fourth and finally, achieving true test isolation is a necessary art. Because I was fiddling with sys.settrace and some module-global-singletons in the code under test, I had to be very careful to include appropriate tests in the setup code and appropriate teardown code to remove/reset everything. I kept on running into new areas where this was a problem, but it taught me a lot about where my own code kept state. Very useful.
cheers, --titus
posted at: 00:26 | path: /oct-07 | 2 comments
Fri, 16 Feb 2007
Some figleaf goodness
Yesterday for our SoCal Piggies meeting I whipped up something I'd been thinking about doing for a while: sectioning in figleaf recording. (figleaf is my package for Python code-coverage analysis.)
It's easier to show than to explain, but briefly, I added two new functions to figleaf:
figleaf.start_section(name)
figleaf.stop_section()
What these functions let me do is define a name with which code coverage can be associated. The goal is to determine what part of your code is calling some other part of your code; the explicit example I had in mind is unit testing, where it can be helpful to know which lines of code are being executed by which unit test.
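To make the idea concrete, here is a toy recorder built on sys.settrace, written for Python 3. It is emphatically not figleaf's implementation: the class SectionRecorder and its attributes are assumptions for illustration, and only the method names start_section and stop_section mirror the functions above.

```python
import sys


class SectionRecorder:
    """Toy line recorder with named sections (illustration only)."""

    def __init__(self):
        self.current = None      # name of the active section, if any
        self.sections = {}       # section name -> set of (filename, lineno)
        self.all_lines = set()   # every line seen, in or out of a section
        self._previous = None
        # Do not record the recorder's own bookkeeping methods.
        cls = SectionRecorder
        self._own_code = {cls.start_section.__code__,
                          cls.stop_section.__code__,
                          cls.stop.__code__}

    def _trace(self, frame, event, arg):
        if frame.f_code in self._own_code:
            return None
        if event == "line":
            key = (frame.f_code.co_filename, frame.f_lineno)
            self.all_lines.add(key)
            if self.current is not None:
                self.sections.setdefault(self.current, set()).add(key)
        return self._trace

    def start(self):
        self._previous = sys.gettrace()
        sys.settrace(self._trace)

    def stop(self):
        sys.settrace(self._previous)

    def start_section(self, name):
        self.current = name

    def stop_section(self):
        self.current = None


def double(x):
    return 2 * x


def triple(x):
    return 3 * x


recorder = SectionRecorder()
recorder.start()
recorder.start_section("test_double")
double(2)
recorder.stop_section()
recorder.start_section("test_triple")
triple(2)
recorder.stop_section()
recorder.stop()
```

After this runs, recorder.sections maps each section name to the lines executed while it was active, which is the raw material for a per-test column in an annotated listing.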
It took me but a moment to add a 'figleaf-sections' plugin to pinocchio, my extensions for nose. This plugin wraps each function and method test with section coverage reporting. Again, it's easier to show than to explain, so here is some sample output:
-- all coverage --
| test_sections.test_one
| | test_sections.TestTwo.test_three
| | | test_sections.TestTwo.test_four
| | | |
+       | def setup():
+       |     print 'howdy'
        |
+       | def teardown():
+       |     print 'bye!'
        |
+       | def test_one():
+ +     |     assert 1 == 1
        |
+       | class TestTwo:
+       |     def setup(self):
+   + + |         assert "setup" == "setup"
        |
+       |     def test_three(self):
+   +   |         assert 2 == 2
        |
+       |     def test_four(self):
+     + |         assert 3 == 3
        |
+       |     def teardown(self):
+   + + |         assert "teardown" == "teardown"
The '+' marks in the first column represent combined code coverage; this includes all coverage sections (as well as stuff that's executed outside of section coverage). The marks in the second, third, and fourth columns represent the lines of code executed by the individual nose tests, indicated at the beginning of the output.
Here you see that (as expected) the setup() and teardown() functions are executed outside the context of any test; test_one() is executed individually; and the class fixtures, TestTwo.setup(self) and TestTwo.teardown(self), are executed for both test_three(self) and test_four(self). (The function definitions are executed on module import, of course, and hence lie outside the coverage sections defined by my nose plugin.) Neat, eh?
It's even more fun to run this on real code. Here's part of twill's commands.py file, which is touched by many (most!) of the twill tests. You can see a sort of barcode of tests for each function; the go(url) function is obviously pretty important, and it's nice to see that even in the case of the code(n) function, at least one of my tests checks that the assertion is raised properly.
|
+ | def exit(code="0"):
| """
| exit [<code>]
|
| Exits twill, with the given exit code (defaults to 0, "no error").
| """
+ + + | raise SystemExit(int(code))
|
+ | def go(url):
| """
| >> go <url>
|
| Visit the URL given.
| """
+ + + + + + + + + + + + + + + + + + + + + + + + + | browser.go(url)
+ + + + + + + + + + + + + + + + + + + + + + + + + | return browser.get_url()
|
+ | def reload():
| """
| >> reload
|
| Reload the current URL.
| """
+ + + + + | browser.reload()
+ + + + + | return browser.get_url()
|
+ | def code(should_be):
| """
| >> code <int>
|
| Check to make sure the response code for the last page is as given.
| """
+ + + + + + + + + + + | should_be = int(should_be)
+ + + + + + + + + + + | if browser.get_code() != int(should_be):
+ + | raise TwillAssertionError("code is %s != %s" % (browser.get_code(),
| should_be))
Note that you get a kind of barcode of code execution, which is nifty.
Anyway, I think this functionality is incredibly neat, but then again I'm a sucker for my own code ;). It seems like it is more useful for what I would call "forensic code analysis," i.e. trying to understand what other people's code is doing, than it is for direct testing and analysis of your own code. Forensic code analysis is very useful, but it's difficult to sell because it's removed from what most programmers seem to think about. Or am I wrong?
I have some more code to write before I decide on its ultimate usefulness -- I'd like to be able to dissect exactly what code is run by precisely one test, and that's the next feature I'll add. I'm probably going to turn this into a lightning talk for PyCon, too; more on that at PyCon.
If you think you have a use for this, please let me know in the comments. I'm actively looking for use cases! And if you're interested in trying it out, you should be able to do something like this:
easy_install http://darcs.idyll.org/~t/projects/figleaf-latest.tar.gz
# go to your test directory, and then:
wget http://darcs.idyll.org/~t/projects/pinocchio-latest.tar.gz
tar xzf pinocchio-latest.tar.gz
easy_install pinocchio-latest
nosetests --with-figleafsections
figleaf-latest/annotate-sections.py .figleaf <pyfile1> <pyfile2> ...
(You need to have nose installed already, of course.)
--titus
posted at: 09:57 | path: /feb-07 | 1 comments
Tue, 06 Feb 2007
Nifty Nose Functionality
I just put the finishing touches on some automated regression tests for figleaf, my simple code coverage analysis program. In the process I found a nice use for nose's yield test constructor.
Briefly, nose lets you write code like this:
def test():
    for i in range(0, 10):
        yield check_num, i, 2*i

def check_num(i, j):
    assert 2*i == j
This defines a set of 10 tests, each of which is executed and counted independently.
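How a runner can count each yielded tuple separately is easy to sketch. The loop below is a stand-in for what a test runner does with a generator test, not nose's actual code; run_generated and test_with_bug are hypothetical names.

```python
def check_num(i, j):
    assert 2 * i == j


def test():
    for i in range(0, 10):
        yield check_num, i, 2 * i


def test_with_bug():
    yield check_num, 1, 2  # passes
    yield check_num, 2, 5  # fails: 2*2 != 5


def run_generated(test_generator):
    """Call each yielded (function, args...) tuple as its own test."""
    passed, failed = 0, 0
    for case in test_generator():
        func, args = case[0], case[1:]
        try:
            func(*args)
        except AssertionError:
            failed += 1
        else:
            passed += 1
    return passed, failed


all_good = run_generated(test)
one_bad = run_generated(test_with_bug)
```

Because every tuple is called on its own, one failing case does not stop the remaining cases from running or being counted.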
(I think this behavior is based on py.test but I could be wrong.)
I hadn't had any use for this kind of test before, but when considering how to write the figleaf tests, I realized that this would be a really neat way of basing regression tests on individual files.
Suppose I have a directory full of .py files together with coverage information, and I want to execute each .py file and check the coverage results against the previously recorded coverage information. Easy! Here's the code:
def test():
    for filename in os.listdir(testdir):
        if filename.startswith('tst') and filename.endswith('.py'):
            yield compare_coverage, filename, filename + '.cover'
where compare_coverage(pyfile, cover_file) is a function that executes the given file and compares the current coverage data with the pre-recorded data.
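For a sense of what such a function might involve, here is a self-contained sketch for Python 3. Everything in it is an assumption rather than figleaf's real test code: the .cover format (one executed line number per line), the helpers executed_lines and write_cover, and the generator name make_tests are all invented for illustration.

```python
import os
import sys
import tempfile


def executed_lines(path):
    """Run the file at path and return the line numbers executed in it."""
    lines = set()

    def tracer(frame, event, arg):
        if frame.f_code.co_filename != path:
            return None  # ignore code from other files
        if event == "line":
            lines.add(frame.f_lineno)
        return tracer

    with open(path) as handle:
        code = compile(handle.read(), path, "exec")
    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        exec(code, {"__name__": "__main__"})
    finally:
        sys.settrace(previous)
    return lines


def write_cover(pyfile, cover_file):
    # Record known-good data in the hypothetical one-number-per-line format.
    with open(cover_file, "w") as handle:
        for lineno in sorted(executed_lines(pyfile)):
            handle.write("%d\n" % lineno)


def compare_coverage(pyfile, cover_file):
    with open(cover_file) as handle:
        recorded = set(int(line) for line in handle if line.strip())
    assert executed_lines(pyfile) == recorded


def make_tests(testdir):
    for filename in sorted(os.listdir(testdir)):
        if filename.startswith("tst") and filename.endswith(".py"):
            path = os.path.join(testdir, filename)
            yield compare_coverage, path, path + ".cover"


# Build a throwaway test directory: two test files plus one file to skip.
testdir = tempfile.mkdtemp()
source = "x = 1\nif x:\n    y = 2\nelse:\n    y = 3\n"
for name in ("tst_a.py", "tst_b.py", "helper.py"):
    target = os.path.join(testdir, name)
    with open(target, "w") as handle:
        handle.write(source)
    if name.startswith("tst"):
        write_cover(target, target + ".cover")

cases = list(make_tests(testdir))
for func, pyfile, cover_file in cases:
    func(pyfile, cover_file)  # raises AssertionError on a regression
```

Recording the known-good data with the same interpreter that later checks it sidesteps differences between Python versions in what counts as an executed line.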
This saves me having to do something silly like write an individual test loader for each .py file, which would be cumbersome and perhaps brittle.
--titus
posted at: 23:42 | path: /feb-07 | 3 comments