Wed, 23 Jan 2008
Testing for sysadmins -- monitoring your infrastructure
Noah and Grig have been CCing me on a conversation about JoelOnChecklists and Grig's post. Noah's writing a book chapter on this stuff, and asked for some tips.
Here are mine.
First, I have a bunch of individual twill scripts in a directory that are run every hour. These scripts are mostly of the form,
% cat neuro-is-alive
go http://neuro.caltech.edu/
code 200
find "Shimojo"
That is, they verify that the host is alive, successfully serving content, and serving content with the right keyword.
I run them from cron:
10 * * * * /usr/bin/twill-sh /u/t/.tests/* > /dev/null
and they have been invaluable for telling me when machines are broken. Obviously they don't replace "proper" monitoring software, but they do detect downtime, misconfigurations, etc. -- and they're really cheap to write/update/disable. (Remember, folks, KISS...)
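For anyone who would rather do the same check without twill, here is a minimal sketch in plain Python. The function names (response_ok, check_site) and the 30-second timeout are my own choices for illustration, not part of twill, and the keyword test is a plain substring match rather than the regular expression that twill's find accepts.

```python
import urllib.request


def response_ok(status, body, keyword):
    """True if the response has status 200 and the body mentions keyword."""
    return status == 200 and keyword in body


def check_site(url, keyword, timeout=30):
    """Fetch url and apply response_ok; any network or HTTP error counts as a failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return response_ok(resp.status, body, keyword)
    except OSError:
        # URLError and HTTPError (raised for 4xx/5xx) are both OSError subclasses.
        return False
```

Calling check_site("http://neuro.caltech.edu/", "Shimojo") from an hourly cron job and printing only on failure would give roughly the same behaviour: cron mails you the output when something breaks.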
Second, I use twill to test my DNS setup. I have a few scripts that are run hourly (see mechanism above) against both my master name server and the public caching name servers provided by my ISP:
extend_with dns_check
dns_a alife.org. 134.10.15.75 $dns_server
This gives me some assurance that my entire DNS system is working, and also lets me do "test-driven name service", where I can write the test first ("I want the A record for alife.org to point to X.Y.Z"), then write the BIND config & verify that it works.
I think I have managed to mildly annoy my ISP a few times by asking why their name servers were returning bad or outdated or inconsistent information ;)
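The "write the test first" idea can be sketched in plain Python as well. Note the big difference from the dns_check extension above: this version asks whatever resolver the machine is configured to use, so it cannot target a specific name server. The names a_records and check_a_record are hypothetical, and the lookup argument exists only so the check can be exercised without touching the network.

```python
import socket


def a_records(name):
    """Return the IPv4 addresses for name, using the system resolver."""
    _host, _aliases, addresses = socket.gethostbyname_ex(name)
    return addresses


def check_a_record(name, expected_ip, lookup=a_records):
    """True if expected_ip is among the A records that lookup returns for name."""
    try:
        return expected_ip in lookup(name)
    except OSError:
        # socket.gaierror (lookup failure) is an OSError subclass.
        return False
```

Something like check_a_record("alife.org.", "134.10.15.75") then plays the role of the dns_a line. Checking a particular master or caching server would need a real DNS library instead of the socket module.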
Third, to test mailman installs and the queue runner (which has a habit of dying on my machine ;( ), I set up the following: one of my machines sends a message to a mailman list, which forwards to a single alias file, which in turn saves the message to a Web-accessible location. The saved message is wiped each time a new message is sent.
Another machine checks that Web-accessible location for correct content using the twill tests (above).
The check will pass falsely if the saved message is ever not wiped -- I didn't bother putting a time stamp in ;) -- but this gives me some assurance that I will detect system-wide mailman failures in the future.
This test setup also serves as a fairly simple test of e-mail configuration and delivery.
I think I can simplify this third test by adding some sendmail commands into twill that allow sending of an e-mail containing a unique identifier, followed by a check of the list archive mbox. I'd have to write new code for that, though, and the above fits my needs.
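Here is a rough sketch of what that unique-identifier version might look like, using only the Python standard library rather than new twill commands. Everything named here (build_probe, archive_has_token, the subject format) is made up for illustration, and it assumes the list archive is readable as a local mbox file.

```python
import mailbox
import uuid
from email.message import EmailMessage


def build_probe(sender, list_address):
    """Return (token, message): a probe e-mail whose subject carries a unique token."""
    token = uuid.uuid4().hex
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = list_address
    msg["Subject"] = "mailman probe " + token
    msg.set_content("probe token: " + token + "\n")
    return token, msg


def archive_has_token(mbox_path, token):
    """True if any message in the mbox at mbox_path has token in its subject."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return any(token in (m["Subject"] or "") for m in box)
    finally:
        box.close()
```

The probe would be handed to the local mail server (for example with smtplib's SMTP.send_message), and a later run would call archive_has_token with the saved token. Because each probe is unique, a stale message can no longer masquerade as a fresh one, which removes the need for the wipe step.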
These simple tests really keep my machines on the straight-and-narrow. Since I run most of this stuff for fun and not for profit, this simple and easily maintainable system test infrastructure is all I really need.
--titus
posted at: 13:34 | path: /jan-08 | 0 comments
Fri, 30 Nov 2007
wsgi_intercept has a new home & maintainer
Hi folks,
just a quick note -- Kumar McMillan has offered to take over wsgi_intercept. You can see the new project over at code.google.com, http://code.google.com/p/wsgi-intercept/.
While I will miss the income from the project, I think that Kumar will treat it well.
--titus
posted at: 22:03 | path: /nov-07 | 0 comments
Mon, 12 Feb 2007
This is simple with twill
I don't feel like I need to "defend" twill -- it's successful beyond both my expectations and my cognizance (I have no idea who's actually using it, but it's apparently a lot of people!), but I may need to promote it better. I ran across this post earlier today. It shows how you can use mechanize to do some simple "screen scraping," and it spurred me to check the following scripts into a new "advocacy" section of the twill archive.
Here's how you can use a twill script to do what Greg did with mechanize:
add_extra_header User-Agent "Mozilla/5.0 (compatible; MyProgram/0.1)"
go http://www.python.org/
show
You can also use straight Python, if that's your poison:
from twill.commands import *
import twill
add_extra_header('User-Agent', 'Mozilla/5.0 (compatible; MyProgram/0.1)')
go("http://python.org/")
html = twill.get_browser().get_html()
print(html)
Note that twill is based on mechanize, and so promoting twill doesn't mean pushing mechanize down. mechanize is amazingly powerful -- but sometimes you want to just go grab some HTML, and twill makes (or tries to make) that easy.
One unexpected result of all this -- I discovered that the function get_browser() wasn't actually exported from twill.commands by default, which is silly. It's the first function I wanted to call in order to do something minimally complex. So now the API is one call bigger ;). One more corner rubbed off...
--titus
posted at: 00:03 | path: /feb-07 | 2 comments