Mon, 30 Apr 2007
A new BLAST parser
I spent the weekend hacking out a BLAST parsing package with pyparsing.
BLAST is a really common bioinformatics tool used to search large-ish sequence databases, and the NCBI BLAST program is probably the single most heavily used program in bioinformatics by a long shot. Unfortunately, the NCBI folk have a habit of making tools with idiosyncratic output formats, and AFAIK the only way to obtain all of the information calculated by BLAST is to parse the (human-readable) text format.
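Parsing the text format mostly means matching individual report lines. The sketch below pulls the bit score, raw score and E-value out of the per-alignment score line. It uses the standard-library `re` module instead of pyparsing. `parse_score_line` is a hypothetical helper, not part of any package. The line layout is modeled on legacy NCBI BLAST text output and may not match other versions. That version-to-version fragility is what makes these parsers hard to maintain.

```python
import re

# Assumed layout of the legacy BLAST score line, e.g.
#   " Score =  200 bits (100), Expect = 1e-50"
# The optional "(2)" after Expect covers sum-statistics lines like "Expect(2) = 0.001".
SCORE_LINE = re.compile(
    r"Score\s*=\s*(?P<bits>[\d.]+)\s*bits\s*\((?P<raw>\d+)\),\s*"
    r"Expect(?:\(\d+\))?\s*=\s*(?P<expect>[\d.e+-]+)"
)

def parse_score_line(line):
    """Return (bits, raw_score, expect) from a score line, or None if it doesn't match."""
    m = SCORE_LINE.search(line)
    if m is None:
        return None
    expect = m.group("expect")
    # Very small E-values can be printed without a mantissa, e.g. "e-104"
    if expect.startswith("e"):
        expect = "1" + expect
    return float(m.group("bits")), int(m.group("raw")), float(expect)
```

For example, `parse_score_line(" Score =  379 bits (191), Expect = e-104")` returns `(379.0, 191, 1e-104)`.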
This text format is not only human-readable (and not very machine-readable) but it changes fairly regularly, breaking parsers in packages like BioPython. Since I'm already using pyparsing in twill, and I appreciate its very nice syntax, I decided to try writing a maintainable BLAST parser with pyparsing. (The other primary goals were to build a nice Pythonic API and to simplify the use of introspection.)
It took me a long time (all weekend!) to do so, but I've finally got a nice, simple API and what seems to be a largely functioning parser:
# parse_file() comes from the BLAST parsing package described in this post
for record in parse_file('blast_output.txt'):
    print('-', record.query_name)
    for hit in record.hits:
        print('--', hit.subject_name, hit.subject_length)
        for submatch in hit.matches:
            print(submatch.expect, submatch.bits)
            # each submatch carries its pairwise alignment
            alignment = submatch.alignment
            print(alignment.query_sequence)
            print(alignment.alignment)
            print(alignment.subject_sequence)
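The nested loop above implies a simple object model. Records contain hits, hits contain submatches, and each submatch carries an alignment. The following is a minimal, hypothetical sketch of that model using dataclasses and the standard `re` module. It is not the pyparsing-based parser described in this post. `parse_text`, the regular expressions and the sample report are all assumptions. The sample is a made-up, simplified imitation of legacy NCBI BLAST text output.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List

# Hypothetical classes whose attribute names mirror the loop above.
@dataclass
class Alignment:
    query_sequence: str = ""
    alignment: str = ""  # match markers such as '|' between the two sequences
    subject_sequence: str = ""

@dataclass
class Submatch:
    expect: float
    bits: float
    alignment: Alignment = field(default_factory=Alignment)

@dataclass
class Hit:
    subject_name: str
    subject_length: int = 0
    matches: List[Submatch] = field(default_factory=list)

@dataclass
class Record:
    query_name: str
    hits: List[Hit] = field(default_factory=list)

# One pattern per line type (assumed legacy layout).
LENGTH_RE = re.compile(r"^\s*Length\s*=\s*(\d+)")
SCORE_RE = re.compile(r"Score\s*=\s*([\d.]+)\s*bits.*?Expect(?:\(\d+\))?\s*=\s*([\d.e+-]+)")
ALIGN_RE = re.compile(r"^(Query|Sbjct):\s*\d+\s+(\S+)\s+\d+")

def parse_text(text: str) -> Iterator[Record]:
    """Yield one Record per 'Query=' section of a simplified text report."""
    record = hit = sub = None
    seq_span = None  # (start, end) columns of the sequence on the last 'Query:' line
    for line in text.splitlines():
        if seq_span is not None:
            # The line right after 'Query:' holds the match markers, column-aligned.
            start, end = seq_span
            sub.alignment.alignment += line[start:end].ljust(end - start)
            seq_span = None
            continue
        if line.startswith("Query="):
            if record is not None:
                yield record
            record = Record(query_name=line[len("Query="):].strip())
            hit = sub = None
            continue
        if record is None:
            continue  # skip the report preamble
        if line.startswith(">"):
            hit = Hit(subject_name=line[1:].strip())
            record.hits.append(hit)
            continue
        m = LENGTH_RE.match(line)
        if m and hit is not None:
            hit.subject_length = int(m.group(1))
            continue
        m = SCORE_RE.search(line)
        if m and hit is not None:
            expect = m.group(2)
            if expect.startswith("e"):  # e.g. 'e-104' printed without a mantissa
                expect = "1" + expect
            sub = Submatch(expect=float(expect), bits=float(m.group(1)))
            hit.matches.append(sub)
            continue
        m = ALIGN_RE.match(line)
        if m and sub is not None:
            if m.group(1) == "Query":
                sub.alignment.query_sequence += m.group(2)
                seq_span = m.span(2)
            else:
                sub.alignment.subject_sequence += m.group(2)
    if record is not None:
        yield record

SAMPLE = """\
Query= seq1

>gi|12345| made-up subject
          Length = 120

 Score = 16.4 bits (8), Expect = 0.002
 Identities = 8/8 (100%)

Query: 1  ACGTACGT 8
          ||||||||
Sbjct: 41 ACGTACGT 48
"""

for record in parse_text(SAMPLE):
    print('-', record.query_name)
    for hit in record.hits:
        print('--', hit.subject_name, hit.subject_length)
        for submatch in hit.matches:
            print(submatch.expect, submatch.bits, submatch.alignment.alignment)
```

A file-based entry point like `parse_file` would just read the file and hand its text to `parse_text`. Each line type's pattern lives in one named regular expression. A change to the report format should therefore usually mean editing a single pattern rather than the control flow.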
It's not really ready for unsupervised use yet, but if anyone out there is jonesin' for a BLAST parser and wants to try this one out, please let me know via e-mail and I'll send it your way. I'd appreciate comments.
--titus
posted at: 07:57 | path: /apr-07 | 4 comments
Sun, 29 Apr 2007
Next: the moooovie
Just saw Next. Highly recommended, believe it or not -- it was a very intelligently done sci-fi movie.
Go! See it! You will enjoy it, if you're in the mood for a bit of silliness and some good ol' fashioned paranormal powers!
--titus
posted at: 12:04 | path: /apr-07 | 0 comments
Fri, 27 Apr 2007
PSF Summer Of Code planet up
I haven't seen anyone announce this, so I guess I should: there's now a Planet Python/Summer of Code site, http://soc.python.org/, hosted by yours truly.
Enjoy!
--titus
p.s. Regular blogging may resume shortly.
posted at: 12:02 | path: /apr-07 | 0 comments
Mon, 02 Apr 2007
Intermediate and Advanced Software Carpentry with Python
(Here's the blurb that I came up with for my Advanced SWC class. This particular class instance isn't open to the public, but I'm not averse to giving it again.
--titus)
What you will learn:
- how to use and extend Python's advanced builtin types;
- how to lay out code for ease of maintenance, reusability, and testability;
- how to profile for performance bottlenecks and improve performance with extensions and threading;
- how to start using the wide variety of external packages that are useful for scientists, including the plotting and data-analysis tools matplotlib, SciPy, and Rpy, the MPI interface for parallel computing, and the IDLE development environment;
- how to make your data more accessible to yourself and others with databases and Web presentation tools.
Course benefits:
The Python programming language contains an immense number of features that are extraordinarily useful to scientific programmers and readily accessible to intermediate-level developers. This course will provide an introduction to many of these features, focusing on those that will make your Python programs easier to maintain and test, more accurate, and faster. This course will also introduce a number of third-party packages for development, plotting, and data analysis that are particularly useful to scientists.
Who should attend:
Scientists who use Python for data processing, data analysis, data presentation, data management, or for working with external code and libraries. An introductory knowledge of Python is assumed, as is familiarity with basic object-oriented programming concepts.
Hands-on training:
Exercises throughout this course offer immediate, hands-on reinforcement of the ideas you are learning. Exercises include:
- recipes for interacting with advanced Python builtin types;
- refactoring example programs for better code reuse and testing;
- writing unit tests, doctests, and functional tests for existing code;
- enhancing data-processing performance with Psyco, Pyrex, and C extensions;
- refactoring C extension code to support multithreading;
- graphing data in matplotlib;
- working with MPI in Python;
- practical work with the IDLE IDE;
- interacting with a large database via the Web;
- building a simple graphical interface for data analysis.
posted at: 09:30 | path: /apr-07 | 3 comments