Fri, 04 Jul 2008

zounds, for running lots of BLASTs


I finally got sick of manually schlepping BLAST files around, so I wrote something to do it for me. 'zounds' is a very simple server/client system for coordinating a bunch of 'worker' nodes through a central server; it does everything in Python with objects and pickling, so it's easy to do extra Python-based processing on the worker nodes. See 'filters' for more info.
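
To give a flavor of the "objects and pickling" idea, here is a minimal in-process sketch. It is not zounds' actual API: the Coordinator class, run_worker, length_filter, and the job dictionaries are all hypothetical names invented for illustration, and the real system talks over a network rather than calling methods directly.

```python
import pickle
from collections import deque


def length_filter(job, result):
    """Hypothetical worker-side filter: record the query length in the result."""
    result["query_length"] = len(job["sequence"])
    return result


class Coordinator:
    """Stand-in for a central server: hands out pickled jobs, collects pickled results."""

    def __init__(self, jobs):
        self.pending = deque(pickle.dumps(job) for job in jobs)
        self.results = []

    def next_job(self):
        # Return the next pickled job, or None when the queue is empty.
        return self.pending.popleft() if self.pending else None

    def submit(self, payload):
        self.results.append(pickle.loads(payload))


def run_worker(coordinator, filters=()):
    """Pull jobs until none remain, applying each filter to the result."""
    while True:
        payload = coordinator.next_job()
        if payload is None:
            break
        job = pickle.loads(payload)
        result = {"name": job["name"]}  # placeholder; a real worker would run BLAST here
        for apply_filter in filters:
            result = apply_filter(job, result)
        coordinator.submit(pickle.dumps(result))


coordinator = Coordinator([
    {"name": "seq1", "sequence": "ACGT"},
    {"name": "seq2", "sequence": "ACGTACGT"},
])
run_worker(coordinator, filters=[length_filter])
```

Because jobs and results travel as pickled Python objects, extra processing on the worker is just another function in the filters list.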

You can read a bit more about zounds here:

http://iorich.caltech.edu/~t/zounds/README.html

It's freely available, open-source, etc. etc.

Comments and thoughts welcome; send them to the bip (biology-in-python) list.

--titus

posted at: 12:32 | path: /jul-08 | 0 comments



Tue, 17 Jun 2008

Serving XML-RPC over HTTPS with Python


We've been talking about how to manage pygr resources remotely via the existing XML-RPC interface, and for that HTTPS is a requirement. I offered to track down the code necessary for running an XML-RPC server over HTTPS. Here's what I found:

It turns out that while the Python stdlib supports HTTPS client
connections (connecting to https:// URLs), it does not directly support
HTTPS serving.  To do that, you need to use pyOpenSSL.  However, once
that's installed, it's a breeze; it's as simple as this:

  server = SecureXMLRPCServer(server_address, KEYFILE, CERTFILE)

You can download the SecureXMLRPCServer code and an example here:

      http://iorich.caltech.edu/~t/transfer/xmlrpc-https.tar.gz

To run it, just install pyOpenSSL ('python-openssl' under Debian),
and then execute 'python serve-ssl.py' in one shell and 'python
test-conn-ssl.py' in another.
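
For readers on later Python versions, where the standard library ships an 'ssl' module, a roughly similar server can be sketched without pyOpenSSL. This is not the SecureXMLRPCServer code from the tarball; the function names make_tls_context and make_xmlrpc_server, the 'add' method, and the port number are assumptions made up for this sketch, and it assumes Python 3.6 or newer.

```python
import ssl
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    """Trivial example method exposed over XML-RPC."""
    return a + b


def make_tls_context(keyfile, certfile):
    """Build a server-side TLS context from a PEM private key and certificate."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context


def make_xmlrpc_server(server_address, context=None):
    """Create an XML-RPC server; wrap its listening socket in TLS if a context is given."""
    server = SimpleXMLRPCServer(server_address, logRequests=False)
    if context is not None:
        # Connections accepted on the wrapped socket are TLS connections.
        server.socket = context.wrap_socket(server.socket, server_side=True)
    server.register_function(add, "add")
    return server


# Usage (not executed here; KEYFILE and CERTFILE are placeholder paths,
# and port 8443 is an arbitrary choice):
#   context = make_tls_context(KEYFILE, CERTFILE)
#   server = make_xmlrpc_server(("localhost", 8443), context)
#   server.serve_forever()
```

A client would then connect with xmlrpc.client.ServerProxy and an https:// URL; with a self-signed certificate, the client also needs to be told to trust it.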

Thanks to Laszlo Nagy for his Python Cookbook recipe, which only needed a bit of fixing (for Python 2.5) and refactoring (for reusability).

The example .tar.gz above contains a private key and certificate so that the code Just Works.

--titus

p.s. Ping me at titus@idyll.org if the .tar.gz file isn't accessible and I'll repost it.

posted at: 11:03 | path: /jun-08 | 0 comments



Fri, 09 May 2008

pygr gets some summer love


(pygr is a neat bioinformatics framework in Python.)

After some commenters on my last post seemed happy to hear that pygr was the focus of some summer work, I realized I had only discussed the pygr summer work in a post to the biology-in-python list.

Whoops.

So, here's the scoop: not only is pygr the focus of Rachel McCreary's Google Summer of Code project, but Jenny Qian will be using pygr to build an ENSEMBL interface, also as part of the Google Summer of Code.

That's not all!

In addition to Rachel and Jenny (under the sterling mentorship of Chris Lee, Robert Kirkpatrick, Namshin Kim, and myself), I have two MSU students working with me over the summer, Alex Nolley and Marie Buckner. They'll both be working on pygr-related things, although, like Jenny, their efforts may end up being more on ways to use pygr than on pygr's code itself.

I also have a grad student or two who may drop in on pygr, if only to use it for something research-y.

So all in all, pygr will get a lot of love this summer. Hopefully we can polish the code and documentation and tutorials to the point where the learning curve is as minimal as it can get, and this fabulous package will become readily available to many others...

Why am I personally putting so much effort into pygr? Well, I've been using it more and more over the last few months, and (somewhat like scipy) it's transformed my work by turning annoyingly difficult data organization problems into trivial Python transformations. I can literally throw together a custom genome browser in a matter of hours -- I've implemented two or three already, for different projects -- and it has enabled several new research programs. pygr seems to be one of those rare packages (kind of like Python itself) that is not only functional and effective but presents a unified and coherent intellectual interface. pygr is the only good middleware layer I've seen for sequence intertwingling in bioinformatics. It's not that mature yet, but it has serious promise, and I'm hoping to get in on the ground floor, so to speak :).

cheers,

--titus

posted at: 11:03 | path: /may-08 | 2 comments



Wed, 07 May 2008

Dear Lazyweb: JavaScript "imagemaps" and/or image subselection?


Dear Lazyweb, help!

I'm embarking on a number of summer projects in my new lab at MSU, and several of them focus on using pygr to do cool genomic stuff. In particular, I'm planning to build a personal genome annotation system that will let people run their own full genome Web sites and annotate the genomes with private information such as Solexa data, cDNA/EST projects, ChIP-seq, cis-regulatory reporter constructs, ncRNA predictions, etc. etc. (If you're interested in this sort of thing, get in touch -- it will, of course, be open source and open development, albeit in Python :)

As I've been thinking more about how to do the display side of things, I've been running headfirst into a serious lack of knowledge. I would like to make an interface that looks somewhat like your standard genome browser/GMOD/UCSC interface, such as this UCSC view of the chicken genome. I already have the basics of that view working; for example, see this simple example and a group-feature example. But I'd like to add more -- a LOT more -- interactivity.

Ideally I'd like to be able to draw simple objects (squares, rectangles, lines) on some sort of canvas and then use JavaScript and AJAX to pop up windows and display bits of information. But I don't really know this space of functionality very well.

So I'm turning to the lazyweb.

Are JavaScript+image maps the right way to go (for example, this, this, and this)? Do they work well with multiple browsers? Or are there good JS libraries for drawing images on the fly in the browser? Is SVG a good thing to look at? Were you stuck with this task, what would you use?

The most important things for this project are, in order of importance:

  • basic functionality (JS image maps seem fine for this)
  • cross-browser functionality
  • selection (e.g. GMOD RubberBandSelection)
  • flexibility: reordering and redrawing of images.

Your thoughts are much appreciated! Please drop me a line or comment, whichever is most convenient. I'll summarize the options.

thanks,

--titus

p.s. I'm perfectly fine with "Google this, dummy!" I just don't have much in the way of Google keyword knowledge in this area...

posted at: 15:03 | path: /may-08 | 8 comments



Sun, 20 Apr 2008

Eating your own dogfood (but only eating half the bowl)


So I'm pretty bullish on testing for maintenance reasons. It was nice to see how well it worked out for me when a user recently reported a problem with Cartwheel.

This is what happened: a third-party package (LAGAN) that the user was running through the Web interface depended on certain command-line behavior from 'sort'. Now, I wasn't aware that the command-line arguments to sort were still evolving, but apparently they are -- my latest Debian upgrade removed some options (the old '+1' key syntax) in favor of the '-k' syntax. In any case, I did this big upgrade of many packages, and didn't realize that this third-party program was now broken. (More on that later.)

The user reported weird results, so I went and verified that he'd set everything up properly and that this was in fact a real problem. Then I ran the Cartwheel automated test suite. Voila! Problem was instantly pinpointed in a reproducible manner.

I fixed the program (editing Perl, ick), re-ran the tests, and then re-ran the user's analyses. Tada, done.

OK, so, great, the tests pinpointed the error for me after the user had found it.

Why did I have to wait for a user to report it?

Because I wasn't running the tests under continuous integration on my compute server.

Why not?

Can't think of why.

What would you have done differently?

I would have made sure all my tests were passing on my compute server after I upgraded the thing, i.e. not been a schmuck.

What have we learned?

Tests are only useful if (first) you write them -- that's half the battle -- and (second) you run them. Oops.

More generally, it was fun to note that by putting a fairly high-level functional test on the batch-processing backend, I discovered a bug several levels down in my software stack -- a problem lying between a third-party package and a system utility. Unit tests wouldn't have found this bug, unless the third-party package had them (don't think so) and I was running the third-party package's unit tests (good grief...)
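
As an illustration of that kind of check, here is a small sketch of a functional test that exercises the system 'sort' the way a pipeline might depend on it. It is not taken from the Cartwheel test suite, and the post does not show what LAGAN actually ran; the function names and the sample data are hypothetical. It assumes a POSIX-style 'sort' on the PATH and sorts on the second field, which is what the obsolete '+1' form did (it skipped one field) and what '-k 2' does in the current syntax.

```python
import os
import shutil
import subprocess


def sort_by_second_field(lines):
    """Sort lines on their second whitespace-separated field using the system 'sort'."""
    env = dict(os.environ, LC_ALL="C")  # fixed locale so the ordering is reproducible
    completed = subprocess.run(
        ["sort", "-k", "2,2"],
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        env=env,
        check=True,  # raise CalledProcessError if 'sort' rejects the arguments
    )
    return completed.stdout.splitlines()


def check_sort_assumptions():
    """Fail loudly if 'sort' no longer behaves the way the pipeline expects."""
    result = sort_by_second_field(["b 2", "a 3", "c 1"])
    assert result == ["c 1", "b 2", "a 3"], result
    return True


# Only run the check where a 'sort' executable is actually available.
if shutil.which("sort"):
    check_sort_assumptions()
```

Run after every system upgrade (or, better, under continuous integration), a check like this turns a silent change in a utility's command-line behavior into an immediate, reproducible failure.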

OK, back to work.

--titus

posted at: 13:03 | path: /apr-08 | 0 comments
