Wed, 07 Jan 2009

What's a good Python code base?


A friend asks,

i'm going to be recoding <x> from scratch starting next week, in python.
what codebase would you recommend as good to model after?

Any thoughts on a well-formed, reasonably sized (yet not huge), and simple Python code base?

There have to be some examples somewhere! I'd suggest something in the stdlib, but nothing is coming to mind right now -- and there are some real stinkers in there, too.

For now, I've pointed my friend towards PEP 8 and the Python Cookbook. Very unsatisfying.

--titus

posted at: 12:11 | path: /jan-08 | 13 comments



Sun, 27 Jan 2008

Building test fixtures for PostgreSQL


I'm having trouble with some tests of a PostgreSQL-based system. Briefly, I have a set of functional tests that

  • create a new database
  • populate it with a data model
  • run a Web server (in-process)
  • test the integrated Web server - database functionality

The tests are now slow enough that I'm averse to writing new ones, so it's becoming important for me to figure out how to run them faster.

The main time sink appears to be in the fixtures, where I create a new database. Creating an empty postgres database is slow: it takes 18 seconds on my server, which is normally a pretty fast computer.
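One way to confirm where the time goes is to time each fixture step separately. Here is a minimal sketch, assuming the fixture can be split into named callables; `time_steps`, `slowest`, and the step names are made up for illustration and are not taken from my actual test suite.

```python
import time


def time_steps(steps):
    """Run each (name, callable) pair in order and return {name: seconds}."""
    timings = {}
    for name, func in steps:
        start = time.perf_counter()
        func()
        timings[name] = time.perf_counter() - start
    return timings


def slowest(timings):
    """Return the name of the step that took the longest."""
    return max(timings, key=timings.get)


if __name__ == "__main__":
    # Hypothetical stand-ins for the real fixture steps; in practice these
    # would create the database and load the data model.
    steps = [
        ("create database", lambda: time.sleep(0.05)),
        ("load data model", lambda: time.sleep(0.01)),
    ]
    timings = time_steps(steps)
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.2f}s")
    print("slowest:", slowest(timings))
```

Keeping the measurement separate from the steps means the same helper can wrap whatever replaces database creation later, so any speedup shows up in the same numbers.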

So, how can I get a known-good database in place quickly?

The most obvious route is for me to do dev tests with something small and fast (sqlite?), but I can't switch to another database system because I'm using PostgreSQL-specific features.

I poked around the PostgreSQL documentation and tried using template databases but the problem persists: createdb is just slow to run.

I can't figure out how to build user-accessible snapshots (to which I could revert after tests...), and Point-in-Time Recovery is only for superusers; I don't want users to have to be PostgreSQL superusers to run my tests.

The parameters of the problem:

  • At a minimum, my test fixtures need to (quickly!) construct a test-only database with a pre-loaded SQL data model, containing no data.
  • Ideally, I would be able to specify a single snapshot and then revert to that snapshot at any time.
  • No sysadmin access should be required, and certainly no raw filesystem manipulations should be required.

Any ideas? As usual, either comment or drop me a line.

--titus

posted at: 16:40 | path: /jan-08 | 13 comments



Useless 'net arguments


The blognet is full of people posting their own opinions, and that's a good thing. What is a little less supportable is flawed argumentation.

I recently spent some time discussing a post about software engineering; I was trying to figure out why the author thought what he did. The annoying bit was that his arguments boiled down to "I'm right because I say I am, and you can't disagree because it's a matter of logic: I'm starting from the axiom that A is true, and since my conclusion (A is true!) follows from that axiom, there is no argument possible." That is true as long as you acknowledge his starting axiom, but of course interesting discussions don't ever follow from this style of argument.

I see three categories of argumentation.

First, math-logic, in which the axioms and ground truths are given, and the rules of logic are applied to reach some conclusion. Conclusions are interesting only when a non-trivial result is reached: Euclidean geometry is an excellent example, where 5 postulates lead to a plethora of interesting results. The drawback is that math-logic is very rarely directly applicable to reality: why (for example) Hilbert spaces apply directly to physical processes, and how far we can take this kind of correspondence between formal math and messy physics, is a fascinating and open topic of discussion.

The second type of argument is scientific, in which falsifiable statements are made about the real world. The distinction here is that a scientific theory must make predictions, and if those predictions are found to be false, then the theory itself must be false as it stands. Again, conclusions here are only interesting when the predictions are non-trivial with respect to the starting point: Quantum Mechanics and the Standard Model are both good examples, because (even though they must be wrong in certain regimes) they both make a large number of testable predictions. Evolution is a bit more controversial, scientifically speaking, because the predictions it makes have more wiggle room: however, specific predictions about how protein sequences should behave have been made and verified.

Third, rhetorical (using rhetoric in the sense of "verbal communication; discourse"). In rhetorical arguments, people start from positions that may or may not be well defined, but in any case are open to disagreement; participants are seeking to convince other participants as well as to explore their own arguments. When Paul Graham writes about how essays help him reach "surprises", he is talking about the most interesting kind of rhetorical argument: one that helps you realize truths that flow naturally from your internal precepts, modified by opinions and facts from the real world.

Rhetorical arguments are by far the most common on the Internet, and people often try to masquerade them as math-logic or scientific arguments. This is because math-logic and scientific arguments are both much stronger kinds of arguments; they lead to some notion of "truth", in either logic-space or reality-space. By contrast, rhetorical arguments are much less well-defined in both their starting and ending points, and (when two stubborn people conduct them) usually end in one or both people simply leaving the discussion. This is especially true when either of the participants refuses to discuss the precepts leading to the discussion: after all, if you're not starting from the same assumed ground truth, then the discussion can only rarely conclude with agreement; the main point of the discussion should then become to figure out where you differ in your assumptions.

Getting back to the frustrating blog post interaction that led to my post, then, the author started by asserting a moderately strong statement, and then defended himself against all comers by stating that "it was simply a matter of logic". This was undoubtedly true: asserting fact A and then concluding fact A is, in fact, a matter of logic. However, it is not interesting.

After probing a bit, I figured out that:

  • the author's argument, stripped of language, really did boil down to an assertion followed by a conclusion based immediately on that assertion.
  • the author explicitly rejected the notion that their conclusion could be falsified in any way.

Thus the argument was not an interesting math-logic argument (because the conclusions followed immediately from the precepts) and it was not a scientific argument (because the conclusions were not falsifiable). The only potentially interesting part of the argument was rhetorical -- and the author also explicitly rejected discussion of his initial axiom, rendering that uninteresting.

I probably wouldn't have gotten irritated, either, if the author hadn't decided that my comments, still polite and on-topic, weren't worth posting any more. Since they continued posting the sycophantic "what a good idea! <gush>" comments, I concluded that the whole point of the post was simply to garner attention. Disappointing ;(.

A last note on math-logic: when interesting conclusions follow immediately from the assumptions, that tells you the assumptions already contained everything of interest, and the discussion turns into a rhetorical argument about the assumptions.

posted at: 12:43 | path: /jan-08 | 0 comments



Sat, 26 Jan 2008

Motif searching with Cartwheel: a screencast


I spent some time over the last week adding fairly simple motif searching to Cartwheel, my bioinformatics site for biologists doing cis-regulatory analysis of genomic sequence. The new features include the ability to define and search with IUPAC and position-weight matrix (PWM) motifs, as well as visualization of motif search results on actual sequence.

I made a 5 minute screencast demo; take a look!

--titus

posted at: 22:51 | path: /jan-08 | 0 comments



Wed, 23 Jan 2008

Testing for sysadmins -- monitoring your infrastructure


Noah and Grig have been CCing me on a conversation about JoelOnChecklists and Grig's post. Noah's writing a book chapter on this stuff, and asked for some tips.

Here are mine.


First, I have a bunch of individual twill scripts in a directory that are run every hour. These scripts are mostly of the form,

% cat neuro-is-alive
go http://neuro.caltech.edu/
code 200

find "Shimojo"

That is, they verify that the host is alive, successfully serving content, and serving content with the right keyword.
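For readers who don't use twill, the same three checks can be sketched in plain Python. This is an illustration under assumptions, not how twill works internally: `check_page`, `fetch`, and `run_check` are hypothetical names, and the keyword check is a plain substring match.

```python
import urllib.error
import urllib.request


def check_page(status, body, keyword, expected_status=200):
    """Return a list of failure messages; an empty list means the check passed."""
    failures = []
    if status != expected_status:
        failures.append(f"expected HTTP {expected_status}, got {status}")
    if keyword not in body:
        failures.append(f"keyword {keyword!r} not found")
    return failures


def fetch(url, timeout=10):
    """Fetch url and return (status, body_text); raises on network errors."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body


def run_check(url, keyword):
    """Fetch url and check it; an unreachable host counts as a failure."""
    try:
        status, body = fetch(url)
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses, so report the code here.
        return [f"expected HTTP 200, got {err.code}"]
    except OSError as err:
        # URLError and socket timeouts are both OSError subclasses.
        return [f"could not reach {url}: {err}"]
    return check_page(status, body, keyword)
```

Calling `run_check("http://neuro.caltech.edu/", "Shimojo")` would mirror the script above; the checks live in `check_page` so they can be exercised without touching the network.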

I run them from cron:

10 * * * * /usr/bin/twill-sh /u/t/.tests/* > /dev/null

and they have been invaluable for telling me when machines are broken. Obviously they don't replace "proper" monitoring software, but they do detect downtime, misconfigurations, etc. -- and they're really cheap to write/update/disable. (Remember, folks, KISS...)


Second, I use twill to test my DNS setup. I have a few scripts that are run hourly (see mechanism above) against both my master name server and the public caching name servers provided by my ISP:

extend_with dns_check
dns_a alife.org. 134.10.15.75 $dns_server

This gives me security that my entire DNS system is working, and also lets me do "test-driven name service", where I can write the test first ("I want the A record for alife.org to point to X.Y.Z"), then write the bind config & verify that it works.
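The "write the test first" idea can be sketched in Python as well. This is a rough illustration, not the dns_check extension itself: `system_lookup` and `check_a_record` are hypothetical names, and the default lookup goes through the system resolver, so unlike the twill line above it cannot target a specific name server (that would need a DNS library).

```python
import socket


def system_lookup(hostname):
    """Resolve hostname to a set of IPv4 addresses via the system resolver."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return set(addresses)


def check_a_record(hostname, expected_ip, lookup=system_lookup):
    """Return True if expected_ip is among the addresses found for hostname."""
    return expected_ip in lookup(hostname)
```

The `lookup` parameter is there so the expectation can be written down and tried against a stand-in resolver before the real record exists, and swapped for a server-specific lookup later.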

I think I have managed to mildly annoy my ISP a few times by asking why their name servers were returning bad or outdated or inconsistent information ;)


Third, I test mailman installs and the queue runner, which has a habit of dying on my machine ;(

I set up the following: one of my machines sends a message to a mailman list, which forwards to a single alias file, which in turn saves the message to a Web-accessible location. The saved message is wiped each time a new message is sent.

Another machine checks that Web-accessible location for correct content using the twill tests (above).

The check will pass falsely if the saved message is ever not wiped -- I didn't bother putting a time stamp in ;) -- but this gives me some security that I will detect system-wide mailman failures in the future.

This test setup also serves as a fairly simple test of e-mail configuration and delivery.

I think I can simplify this third test by adding some sendmail commands into twill that allow sending of an e-mail containing a unique identifier, followed by a check of the list archive mbox. I'd have to write new code for that, though, and the above fits my needs.
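A rough sketch of the unique-identifier idea, written in plain Python rather than as twill commands (which don't exist yet). The names `build_probe` and `probe_delivered` and the addresses in the usage note are hypothetical.

```python
import uuid
from email.message import EmailMessage


def build_probe(sender, recipient):
    """Return (token, message) for a probe e-mail carrying a unique token."""
    token = uuid.uuid4().hex
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"mailman probe {token}"
    message.set_content(f"probe token: {token}\n")
    return token, message


def probe_delivered(token, saved_text):
    """Return True if the saved message (or archive text) contains the token."""
    return token in saved_text
```

Sending would be something like `smtplib.SMTP("localhost").send_message(message)`, followed later by `probe_delivered(token, archive_text)` against whatever text was fetched. Because each probe carries a fresh token, a stale message left over from an earlier run can no longer look like a success.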


These simple tests really keep my machines on the straight-and-narrow. Since I run most of this stuff for fun and not for profit, this simple and easily maintainable system test infrastructure is all I really need.

--titus

posted at: 12:21 | path: /jan-08 | 0 comments
