Sun, 27 Jan 2008
Building test fixtures for PostgreSQL
I'm having trouble with some tests of a PostgreSQL-based system. Briefly, I have a set of functional tests that
- create a new database
- populate it with a data model
- run a Web server (in-process)
- test the integrated Web server - database functionality
The tests are now slow enough that I'm averse to writing new ones, so it's becoming important for me to figure out how to run them faster.
The main time sink appears to be in the fixtures, where I create a new database. Creating an empty PostgreSQL database is slow: it takes 18 seconds on my server, which is normally a pretty fast computer.
So, how can I get a known-good database in place quickly?
The most obvious route is for me to do dev tests with something small and fast (sqlite?), but I can't switch to another database system because I'm using PostgreSQL-specific features.
I poked around the PostgreSQL documentation and tried using template databases but the problem persists: createdb is just slow to run.
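For reference, the template route boils down to a CREATE DATABASE ... TEMPLATE statement. Here is a minimal Python sketch of a fixture helper that builds those statements; the function names and the database names in the usage note are hypothetical, and this only illustrates the approach I tried, not a fix for the slowness.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_from_template_sql(new_db, template_db):
    """Build the statement that clones template_db into new_db."""
    return "CREATE DATABASE {} TEMPLATE {}".format(
        quote_ident(new_db), quote_ident(template_db)
    )


def drop_sql(db):
    """Build the statement that removes a test database if it exists."""
    return "DROP DATABASE IF EXISTS {}".format(quote_ident(db))
```

A fixture would run something like create_from_template_sql("test_run", "model_template") in setup and drop_sql("test_run") in teardown. CREATE DATABASE cannot run inside a transaction block, so the connection has to be in autocommit mode, and the role running the tests needs the CREATEDB privilege.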
I can't figure out how to build user-accessible snapshots (to which I could revert after tests...), and Point-in-Time Recovery is only for superusers; I don't want users to have to be PostgreSQL superusers to run my tests.
The parameters of the problem:
- At a minimum, my test fixtures need to (quickly!) construct a test-only database with a pre-loaded SQL data model, containing no data.
- Ideally, I would be able to specify a single snapshot and then revert to that snapshot at any time.
- No sysadmin access should be required, and certainly no raw filesystem manipulations should be required.
Any ideas? As usual, either comment or drop me a line.
--titus
posted at: 17:03 | path: /jan-08 | 13 comments
Motif searching with Cartwheel: a screencast
I spent some time over the last week adding fairly simple motif searching to Cartwheel, my bioinformatics site for biologists doing cis-regulatory analysis of genomic sequence. The new features include the ability to define and search with IUPAC and position-weight matrix (PWM) motifs, as well as visualization of motif search results on actual sequence.
I made a 5 minute screencast demo; take a look!
--titus
posted at: 00:03 | path: /jan-08 | 0 comments
Wed, 23 Jan 2008
Testing for sysadmins -- monitoring your infrastructure
Noah and Grig have been CCing me on a conversation about JoelOnChecklists and Grig's post. Noah's writing a book chapter on this stuff, and asked for some tips.
Here are mine.
First, I have a bunch of individual twill scripts in a directory that are run every hour. These scripts are mostly of the form,
% cat neuro-is-alive
go http://neuro.caltech.edu/
code 200
find "Shimojo"
That is, they verify that the host is alive, successfully serving content, and serving content with the right keyword.
I run them from cron:
10 * * * * /usr/bin/twill-sh /u/t/.tests/* > /dev/null
and they have been invaluable for telling me when machines are broken. Obviously they don't replace "proper" monitoring software, but they do detect downtime, misconfigurations, etc. -- and they're really cheap to write/update/disable. (Remember, folk, KISS...)
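For anyone without twill handy, the same three-step check (fetch, status code, keyword) can be roughed out with the Python standard library. This is only a sketch under my own assumptions: the function names are made up, and it does a plain substring match where twill's find takes a regular expression.

```python
import urllib.error
import urllib.request


def check_response(status, body, expected_code, keyword):
    """Return a list of failure messages; an empty list means the check passed."""
    failures = []
    if status != expected_code:
        failures.append("expected HTTP %d, got %d" % (expected_code, status))
    if keyword not in body:
        failures.append("keyword %r not found" % keyword)
    return failures


def fetch(url, timeout=30):
    """Fetch url and return (status, body text).

    HTTP error statuses are returned rather than raised; connection
    failures (urllib.error.URLError) still propagate to the caller.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")


def run_check(url, expected_code, keyword):
    """Fetch url and apply the status and keyword checks."""
    status, body = fetch(url)
    return check_response(status, body, expected_code, keyword)
```

Calling run_check("http://neuro.caltech.edu/", 200, "Shimojo") mirrors the twill script above; printing the returned messages from a cron job gets you mail only when something is wrong.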
Second, I use twill to test my DNS setup. I have a few scripts that are run hourly (see mechanism above) against both my master name server and the public caching name servers provided by my ISP:
extend_with dns_check
dns_a alife.org. 134.10.15.75 $dns_server
This gives me security that my entire DNS system is working, and also lets me do "test-driven name service", where I can write the test first ("I want the A record for alife.org to point to X.Y.Z"), then write the bind config & verify that it works.
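The "write the test first" idea can be sketched in plain Python, too. Note the big assumption here: this version goes through the system resolver via the standard socket module, so unlike the twill check it cannot be pointed at a specific name server (that would need a third-party DNS library). The function names are hypothetical.

```python
import socket


def resolve_a(hostname, resolver=socket.gethostbyname_ex):
    """Return the sorted IPv4 addresses for hostname.

    resolver is injectable so the check can be exercised without a network.
    """
    _name, _aliases, addresses = resolver(hostname)
    return sorted(addresses)


def check_a_record(hostname, expected_ip, resolver=socket.gethostbyname_ex):
    """Return True if expected_ip is among the A records for hostname."""
    return expected_ip in resolve_a(hostname, resolver)
```

The test comes first: check_a_record("alife.org", "134.10.15.75") should fail until the zone file is right, and pass afterwards.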
I think I have managed to mildly annoy my ISP a few times by asking why their name servers were returning bad or outdated or inconsistent information ;)
Third, to test mailman installs and the queue runner (which has a habit of dying on my machine ;( ), I set up the following: one of my machines sends a message to a mailman list, which forwards to a single alias file, which in turn saves the message to a Web-accessible location. The saved message is wiped each time a new message is sent.
Another machine checks that Web-accessible location for correct content using the twill tests (above).
This scheme breaks down if the saved message is ever not wiped, since a stale copy would still pass the check -- I didn't bother putting a time stamp in ;) -- but this gives me some security that I will detect system-wide mailman failures in the future.
This test setup also serves as a fairly simple test of e-mail configuration and delivery.
I think I can simplify this third test by adding some sendmail commands into twill that allow sending of an e-mail containing a unique identifier, followed by a check of the list archive mbox. I'd have to write new code for that, though, and the above fits my needs.
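The unique-identifier idea might look roughly like this in plain Python, outside of twill. Everything here is a sketch: the function names, the subject format, and the assumption that the list archive is a readable mbox file are all mine.

```python
import mailbox
import uuid
from email.message import EmailMessage


def build_probe(sender, list_address):
    """Build a probe message carrying a fresh unique token in its subject."""
    token = uuid.uuid4().hex
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = list_address
    msg["Subject"] = "mailman probe " + token
    msg.set_content("probe token: " + token + "\n")
    return token, msg


def mbox_contains(mbox_path, token):
    """Return True if any message in the mbox has token in its subject."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return any(token in str(m["Subject"] or "") for m in box)
    finally:
        box.close()
```

The probe would be handed to something like smtplib.SMTP("localhost").send_message(msg), and a later run would call mbox_contains on the archive with the saved token. Because each token is unique, a stale message can never satisfy the check.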
These simple tests really keep my machines on the straight-and-narrow. Since I run most of this stuff for fun and not for profit, this simple and easily maintainable system test infrastructure is all I really need.
--titus
posted at: 13:34 | path: /jan-08 | 0 comments
"Computational Approaches to Finding and Analyzing cis-Regulatory Elements"
I just finished a chapter for a book, Methods in Avian Embryology, being edited by my boss, Marianne Bronner-Fraser. This chapter is intended for developmental biologists who are interested in locating regulatory modules and analyzing them for binding sites. It ended up being my outlet for a compilation of problems and drawbacks to computational search, and so it might be useful to non-biologists, too. I'd be happy to pass a copy on to people who are interested; drop me a line.
--titus
posted at: 13:34 | path: /jan-08 | 0 comments
Sun, 20 Jan 2008
Dear Lazyweb: Config file guidelines?
Tracy recently asked me if there were any good guidelines about how to write configuration files -- not coding-level guidelines, but guidelines on structure and content.
I was unable to come up with anything: my Google-fu failed me, and my DevonThink database was silent (although it did have some nice testing articles, which I of course forwarded on to her).
I must admit to a general mental block on the subject of how to write config files. How do you choose the right balance between configurability and complexity? What about language -- should they be native Perl/Python/whatnot (Tracy's package is for programmers) or is there a benefit to embracing the mental overhead of config file parsing?
Thoughts and pointers appreciated, either in comments or via email.
--titus
posted at: 15:03 | path: /jan-08 | 7 comments