Sun, 20 Apr 2008

Eating your own dogfood (but only eating half the bowl)


So I'm pretty bullish on testing for maintenance reasons. It was nice to see how well it worked out for me when a user recently reported a problem with Cartwheel.

This is what happened: a third-party package (LAGAN) that the user was running through the Web interface depended on certain command-line behavior from 'sort'. Now, I wasn't aware that the command-line arguments to sort were still evolving, but apparently they are -- my latest Debian upgrade removed some options (the '+1' behavior) in favor of '-k 1'. In any case, I did this big upgrade of many packages, and didn't realize that this third-party program was now broken. (More on that later.)

The user reported weird results, so I went and verified that he'd set everything up properly and that this was in fact a real problem. Then I ran the Cartwheel automated test suite. Voila! Problem was instantly pinpointed in a reproducible manner.

I fixed the program (editing Perl, ick), re-ran the tests, and then re-ran the user's analyses. Tada, done.

OK, so, great, the tests pinpointed the error for me after the user had found it.

Why did I have to wait for a user to report it?

Because I wasn't running the tests under continuous integration on my compute server.

Why not?

Can't think of why.

What would you have done differently?

I would have made sure all my tests were passing on my compute server after I upgraded the thing, i.e. not been a schmuck.

What have we learned?

Tests are only useful if (first) you write them -- that's half the battle -- and (second) you run them. Oops.

More generally, it was fun to note that by putting a fairly high-level functional test on the batch-processing backend, I discovered a bug several levels down in my software stack -- a problem lying between a third-party package and a system utility. Unit tests wouldn't have found this bug, unless the third-party package had them (don't think so) and I was running the third-party package unit tests (good grief...)
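
A minimal sketch of the kind of functional test described above, assuming a hypothetical helper `sort_by_second_field` that shells out to the system 'sort' the way a third-party package might. This is not Cartwheel's or LAGAN's actual code. Because the test exercises the real utility, a change in the command-line behavior of 'sort' after an upgrade would show up as a test failure:

```python
import os
import subprocess


def sort_by_second_field(lines):
    """Sort whitespace-separated records by their second field.

    Hypothetical helper: it calls the system 'sort' with the '-k' syntax,
    standing in for whatever a third-party package does internally.
    """
    env = dict(os.environ, LC_ALL="C")  # fixed locale for a stable ordering
    result = subprocess.run(
        ["sort", "-k", "2"],
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        env=env,
        check=True,  # raise if 'sort' rejects the options
    )
    return result.stdout.splitlines()


def test_sort_by_second_field():
    records = ["geneC 30", "geneA 10", "geneB 20"]
    expected = ["geneA 10", "geneB 20", "geneC 30"]
    assert sort_by_second_field(records) == expected
```

The test knows nothing about which layer might break; it only checks the end result, which is why it can catch a problem sitting between a package and a system utility. It only helps if it is actually run after each upgrade.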

OK, back to work.

--titus

posted at: 13:02 | path: /apr-08 | 0 comments



Sat, 19 Apr 2008

John Ringo is a caricature of a wingnut


I read a lot of total crap, and one of my recurring crap authors has been John Ringo. He's a total nutjob politically, but he writes good battle scenes and is an enjoyable read once you cut through the nonsense. Still, I'm having a tough time getting through the opening chapters of The Last Centurion. In this book, Ringo constructs a near-future world where Hillary Clinton is president, global cooling is the problem, and the chemicals from processed food and big farming are life-saving.

Let's take those one at a time.

One of Ringo's favorite tropes is that the left, and the Clintons especially, are what's wrong with America. It's hard to convey the dripping scorn with which he discusses these topics, but it involves a lot of naughty words. In this book, Hillary Clinton (or a straw woman facsimile thereof) is president through the Big Chill and the simultaneous deadly bird flu outbreak, and she makes every mistake possible. While Hillary Clinton is not my favorite politician, it's worth noting that our current president (who can do little wrong in Ringo's eyes) has actually made almost every mistake possible, and this makes Ringo's text unbearably difficult to read. If Ringo is hoping to even tell a good story, much less sway anyone's opinion, he'd be better off with less in the way of textual histrionics.

Another one of Ringo's tropes is that the global warming hypothesis is nonsense. Not only does he mention this frequently, but he literally pauses in the middle of his books to deliver four-page diatribes on the subject. In this latest book, Ringo makes the next big climate change event a major solar COOLING, which has predictable effects on the food supply. Now, I'm a scientist and a lefty, and I've even worked on science relevant to climate change, so presumably (by Ringo's criteria) I am unfit to comment, being moderately knowledgeable. But when your social commentary depends entirely on fiction, it loses any relevance and becomes a distraction.

The most interesting novelty in this book (which presumably will become another abortive series, to join the ranks of his other five unfinished series?) is the device where American lives are saved by having eaten so many processed foods. As far as I can tell, the idea is that eating processed foods conveys resistance to chicken flu, and this leads to a dramatically greater survival rate in America. I'm not sure why this device is in the book, unless it's another imaginary nail in Ringo's imaginary coffin of liberalism. Whyever it's there, it's entertainingly stupid -- there's plenty of evidence that weird, random chemicals do weird, random things to your DNA, and that's one reason why cancer is so prevalent. There's no reason at all to believe that these chemicals would somehow "cancel out" bird flu. But what do I know? I'm just a molecular freakin' biologist...

Combine all that with Ringo's inimitable writing style in which no breasts are too big, no hero goes unfucked by multiple (large-breasted) women, and no terrorist goes unpunished, and these books are truly a piece of work. I do not, however, mean "of art". In fact, this last book is so outlandish that I'm actually becoming a bit suspicious of Ringo's sincerity. It's hard to read such complete and utter crap without thinking that perhaps the author is secretly making fun of the very viewpoints he is espousing. But it's been a consistent trend towards lunacy thus far, so I'm inclined to believe that he's actually somewhat sincere.

Anyway, here's my judgement: Ringo's latest book is masturbatory fodder for hard right wingers, and it's becoming increasingly difficult to enjoy his books if you're not actually lobotomized. Luckily that ensures him an 18% market.

--titus

posted at: 03:02 | path: /apr-08 | 6 comments



Threading and subprocess


I'm having a long-running discussion with some people about threading and why using threads with simple subprocess calls is almost certainly an overcomplicated (== BAD) use of threads. Everyone seems to think I'm wrong (at least, there's either deafening silence or straight out argument ;) and I think I finally figured out why.

The task at hand: use subprocess to run some command (say, 'ping') a bunch of times. Because the command is I/O bound, you want to run the commands in parallel. Should you use threads to do this? Is it necessary in order to achieve good performance?

Well, consider these two examples ('common.py' is down at the bottom; it just contains the list of IP addresses to ping, and a function to call subprocess.Popen).

nothread.py:

from common import IP_LIST, do_ping

# start all of the pings; do_ping returns as soon as the process is launched
z = []
for addr in IP_LIST:
   p = do_ping(addr)
   z.append(p)

# wait for all of the pings to finish
for p in z:
   p.wait()

thread.py:

import threading
from common import IP_LIST, do_ping

def run_do_ping(addr):
   p = do_ping(addr)
   p.wait()

###

# start all threads
z = []
for addr in IP_LIST:
   t = threading.Thread(target=run_do_ping, args=(addr,))
   t.start()
   z.append(t)

# wait for all threads to finish
for t in z:
   t.join()

Both of these work fine, and both are easily modifiable to retrieve the output, exit status, etc. of the ping command. (In the threaded example you have to keep track of 'p' in 'run_do_ping' to retrieve this kind of info, and I wanted to keep things as simple as possible.)

They also run in about the same amount of time, although the non-threaded one is quicker by a few milliseconds for me. I think this is because thread starts & joins are extra overhead.
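
As a sketch of the modification mentioned above, here is one way the non-threaded version might collect output and exit status. To keep it self-contained, the hypothetical `COMMANDS` list runs tiny Python one-liners as a stand-in for 'ping'; `start` and `run_all` are illustrative names, not part of common.py:

```python
import subprocess
import sys

# Stand-in commands (an assumption for this sketch): each child prints a number.
COMMANDS = [[sys.executable, "-c", "print(%d)" % n] for n in range(3)]


def start(cmd_args):
    """Launch one command without waiting for it; stdout is captured as text."""
    return subprocess.Popen(cmd_args, stdout=subprocess.PIPE, text=True)


def run_all(commands):
    """Start every command first, then gather (exit status, output) pairs."""
    procs = [start(c) for c in commands]  # all children are running now
    results = []
    for p in procs:
        out, _ = p.communicate()  # read stdout to EOF, then reap the child
        results.append((p.returncode, out))
    return results
```

Calling `run_all(COMMANDS)` returns one `(returncode, output)` tuple per command, in the order the commands were started. The sketch uses `communicate()` rather than `wait()` because it drains the pipe; a child that writes more than the OS pipe buffer holds can otherwise block forever while the parent sits in `wait()`.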

The key misunderstanding in the discussion seems to have been that the examples at hand were using subprocess.call, which blocks waiting for the subprocess to exit, i.e. they were equivalent to using this code in nothread.py:

for addr in IP_LIST:
   p = do_ping(addr)
   p.wait()

Here the pings would execute serially rather than in parallel, with the obvious performance problem :). However, you can bypass this effect of subprocess.call by using subprocess.Popen, which starts a new process and returns immediately, so the subprocess executes in parallel with the calling process.
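
A rough way to see the difference, sketched under the assumption that a short sleep is a fair stand-in for an I/O-bound command like 'ping' (`SLEEP_CMD`, `N`, `run_serial` and `run_parallel` are made-up names for this illustration):

```python
import subprocess
import sys
import time

# Stand-in for an I/O-bound command: a child process that sleeps briefly.
SLEEP_CMD = [sys.executable, "-c", "import time; time.sleep(0.5)"]
N = 4


def run_serial():
    """subprocess.call blocks until each child exits, so the runs queue up."""
    start = time.time()
    for _ in range(N):
        subprocess.call(SLEEP_CMD)
    return time.time() - start


def run_parallel():
    """Popen returns at once, so all children run while we wait on them."""
    start = time.time()
    procs = [subprocess.Popen(SLEEP_CMD) for _ in range(N)]
    for p in procs:
        p.wait()
    return time.time() - start
```

Both functions return the elapsed wall-clock time in seconds. The serial version should take at least N times the sleep, while the parallel version should take roughly one sleep plus process startup, with no threads involved.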

So, for this simple use of subprocess -- running a shell command and gathering the output -- which is "better"? I think 'nothread.py' is better because it is simpler, shorter, clearer, and less complicated. Of course, as soon as you start doing more complicated stuff like reading the streams of information coming out of the subprocess commands, the threaded version may well have its advantages. But that's not the case here, I think.

Comments welcome.

--titus

common.py:

import subprocess

IP_LIST = [ '131.215.17.3',
            '131.215.17.4',
            '131.215.17.5',
            '131.215.17.16',
            '131.215.17.17',
            '131.215.17.18',
            '131.215.17.19',
            '131.215.17.24',
            '131.215.17.25',
            '131.215.17.31']

cmd_stub = 'ping -c 5 %s'

def do_ping(addr):
    cmd = cmd_stub % (addr,)
    return subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)

posted at: 01:13 | path: /apr-08 | 20 comments



Thu, 17 Apr 2008

http://www.advogato.org/person/nconway/diary.html?start=55


posted at: 00:33 | path: /apr-08 | 0 comments


Tue, 15 Apr 2008

Some new terminology?


In some discussions with a moderately new Python programmer who seems to value complexity over simplicity, I may have coined a new term:

"Penis size" style of programming -- the (mistaken) belief that the
more advanced programming language features you use, the more
impressive your code will look.

I think it's a fair generalization to say that experienced programmers value simplicity over complexity, all other things being equal.

A search for "penis size programming" came up with this link, which is entertainingly apropos.

--titus

p.s. I originally used "dick size", but now that I'm a professor, I have to be decorous, right?

posted at: 10:35 | path: /apr-08 | 3 comments
