Wed, 25 Mar 2009

Twitter Ho!


OK, I'm going to try out twitter for the first time, in order to see if it works out at PyCon for keeping track of what's going on and letting people know what I'm up to. I guess you have to e-mail me to get in touch with me, though, as I'm not following anyone (and I don't expect to keep using twitter after PyCon).

Anyway, my twitterifick name (twitter handle?) is ctitusbrown. Boring as you'd expect.

--titus

posted at: 19:07 | path: /mar-09 | 4 comments



Sat, 21 Mar 2009

The Use and Abuse of Keyword Arguments in Python


I'm 3/4 of the way through my first ground-up code review for pygr, and I want to gripe about something that pygr does a fair bit of: use Python's **kwargs. I don't want to escalate it into a policy argument & policy decision for pygr, so I'm posting it here; hopefully I can sway minds with, well, suasion, rather than diktat.

What am I griping about, exactly? Check out this code:

class SomethingExtensible(object):
   def __init__(self, foo, bar, **kwargs):
       ...
       baz(x, y, **kwargs)

def baz(a, b, **kwargs):
   ...

Now think about how you'd read code like this in order to answer the following four questions:

  1. What keyword arguments does __init__ take?
  2. What does __init__ do with these arguments?
  3. What keyword arguments does baz take?
  4. What does baz do with these arguments?

(These would be among the first questions I'd ask of the code; you too, right?)

Well, one thing you can immediately tell is that there's a big black box of arguments passed into 'baz', so there's no point in trying to completely understand the arguments to __init__ without also completely understanding the arguments to baz.

Another thing you can immediately tell is that without a pretty good docstring or a detailed examination of both __init__ and baz, you're not going to be able to begin to understand either function. And since docstrings, documentation, and comments are always wrong or incomplete anyway, you're going to have to grok the entire function.

So, basically, you're going to be lost in this code without a lot of work. Also, because there's no concise way to validate that we only received the set of kwargs we were expecting -- __init__ may not even know what those are, and people rarely check anyway because it's a bit of ugly code to do so -- you're subject to errors from misspellings: arguments that look almost right, but lack an 's' at the end, for example, when the code expects that 's' right there.
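To make the misspelling problem concrete, here is a minimal sketch. The 'labels' keyword and what baz does with it are made up for illustration; they are not taken from pygr.

```python
def baz(a, b, **kwargs):
    # Hypothetical helper: it only knows about an optional 'labels' keyword
    # and silently ignores anything else it is handed.
    return kwargs.get("labels", "default-labels")


class SomethingExtensible(object):
    def __init__(self, foo, bar, **kwargs):
        # The whole "black box" of keyword arguments is forwarded unchecked.
        self.result = baz(foo, bar, **kwargs)


ok = SomethingExtensible(1, 2, labels="custom")

# Note the missing 's': nothing complains, and the default is used instead.
typo = SomethingExtensible(1, 2, label="custom")
```

Neither call raises an error, so the typo only shows up later as unexpected behavior.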

On the flip side, you do have an opaque "box" of arguments that you're passing around, and this can come in very handy if you're doing something like this:

class SomethingExtensible(object):
   def __init__(self, foo, bar, **kwargs):
      ...
      self.baz(x, y, **kwargs)

   def baz(self, a, b, **kwargs):
      ...

(note indentation of baz -- it's now a method in the class, rather than an independent function). Why is this handy? Because now you can subclass SomethingExtensible and potentially redefine 'baz' to take new arguments without having to change the constructor at all -- you just change what keyword arguments you pass into it.
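Here is a minimal sketch of that subclassing trick. The Scaled subclass and its 'scale' keyword are hypothetical names chosen for the example, and the arithmetic inside baz is just a stand-in for real work.

```python
class SomethingExtensible(object):
    def __init__(self, foo, bar, **kwargs):
        # The constructor does not know or care what baz accepts.
        self.value = self.baz(foo, bar, **kwargs)

    def baz(self, a, b, **kwargs):
        return a + b


class Scaled(SomethingExtensible):
    # Hypothetical subclass: baz grows a new 'scale' keyword,
    # and the inherited constructor is left untouched.
    def baz(self, a, b, scale=1, **kwargs):
        return (a + b) * scale


base = SomethingExtensible(1, 2)
scaled = Scaled(1, 2, scale=10)
```

The new keyword reaches the overridden method purely through the forwarded **kwargs, which is the extensibility being traded against readability.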

So it's readability vs extensibility. I tend to argue for readability over extensibility, and that's what I'll do here: for the love of whatever deity/ies you believe in, use **kwargs sparingly and document their use when you do.

--titus

p.s. It may be that I'm missing some syntax tricks here, where I can unpack **kwargs and demand that it nicely conforms to my expectations without doing serial gets and removals:

x = kwargs.get('x', some_default)
if 'x' in kwargs: del kwargs['x']

y = kwargs.get('y', some_default)
if 'y' in kwargs: del kwargs['y']

...

assert len(kwargs) == 0, "error, unexpected kwargs"

or some such. Any thoughts?
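Two options that might shorten this, offered as a sketch rather than a recommendation. The first uses dict.pop with a default, which does the get and the removal in one step. The second uses keyword-only arguments, which exist only in Python 3. The function names and the x/y defaults are made up for the example.

```python
def configure(**kwargs):
    # pop() returns the value (or the default) and removes the key if present.
    x = kwargs.pop("x", 0)
    y = kwargs.pop("y", 0)

    # Anything left over was not expected; fail loudly and name the culprits.
    if kwargs:
        raise TypeError(
            "unexpected keyword arguments: %s" % ", ".join(sorted(kwargs))
        )
    return x, y


def configure_strict(*, x=0, y=0):
    # Python 3 only: the bare '*' makes x and y keyword-only, and Python
    # itself raises TypeError for any keyword it does not recognize.
    return x, y
```

With either version, a call such as configure(xs=1) raises TypeError instead of silently ignoring the misspelled name. Raising an exception rather than using assert also keeps the check active when Python runs with -O.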

posted at: 19:22 | path: /mar-09 | 19 comments



Thu, 19 Mar 2009

Some Google Summer of Code Project Ideas


'tis the season, and so it's time for me to post my list of accumulated project ideas. I'll transfer these over to the wiki tomorrow, after I track down some references. I'm willing to mentor any or many of these but I'd prefer to find someone to be the primary mentor for most of 'em. The ideas are not isolated to a single project; there are some potential overlaps in the testing stuff, especially.

  1. Improve subprocess

I and others (mainly others) have talked about cleaning up & improving the subprocess module and associated documentation. This project would involve gathering a bunch of features, integrating them, testing them out on multiple platforms (most especially Windows), documenting them, submitting them to the Python core, and working through at least the first round of critiques.

  2. Core Python testing infrastructure/nose compatibility

Work on the Python 2.x/3.x test running infrastructure to build a nose compatibility layer, so that developers can run the Python tests with nose. This would then enable tag-based execution, code coverage analysis, and all sorts of other nice features. The goal would be to produce a nose plugin that was core-Python-specific, so no changes would need to be made to the core Python code.

  3. Analyze code coverage and improve test coverage, for Python core.

The CPython and stdlib code coverage is not terribly great, ranging from 99% to ~50% for some stdlib modules. Measure the C code coverage, integrate it with the Python code coverage, and provide a convenient integrated report. Make it easy to run & generate code coverage. Do so for Mac OS X, Windows, and Linux. Improve code coverage by adding tests.

  4. Integrate Pyrex and C code coverage into figleaf reporting

Right now, figleaf (my code coverage analysis tool) measures and reports on Python code coverage. Add integration hooks to allow it to import and generate reports on C/C++ and Pyrex/Cython code coverage.

  5. Port figleaf to Python 3.0.

Code coverage analysis is an important component of testing, especially when refactoring legacy projects. Make figleaf 3.0 compatible, probably in a separate branch.

  6. Implement branch coverage measurement and reporting for CPython.

'nuff said. (This is a tough project that would be more research than implementation, I think.)

  7. Extend a simple continuous build system for Python projects.

Work on a simple buildbot replacement that allows simple, flexible reporting and remote push of results. (This is also part of the snakebite project.) More in a bit.

  8. GridRepublic/BOINC Python

Distributed Python: Borrow or invent a notation for master/slave execution in Python. Develop a system that implements this on BOINC, i.e., creates WUs and applications, and harvests the results. See the BOINC dev projects generally and the Python app design document specifically.

--titus

p.s. Oh, yeah, and a pygr-related project would be good, too.

posted at: 23:26 | path: /mar-09 | 6 comments



Tue, 03 Mar 2009

titus!


i asked two other friends of mine, [ ... ], for recommendations about model code and about their work environment. their feedback was extremely helpful and i thought you'd be interested to hear the opinions of other good programmers. i would also love to hear your thoughts on these (extracted and scattered) comments:

Re: model code:

satra:

the nipy folks have actually done some nice things in their package

  • rest/sphinx for documentation
  • unit testing
  • traits

You should make your code based on pynifti. Also look at the code for pymvpa, pyepl and rpy2.

In terms of structuring code, nipy for code organization and pymvpa for architecture. Especially in your case, where you might want the ability for users to compare multiple algorithms easily.

tyler:

I definitely agree with Satra. I think in general, you'll want to:

  1. Follow Python coding conventions ( http://www.python.org/doc/essays/styleguide.html)
  2. Use some sort of documentation generation system (epydoc is also very popular: http://epydoc.sourceforge.net/)
  3. Have some unit testing, and an easy way to run regression tests whenever you make changes (at least on the critical code)

Unfortunately, Python 3 may take a while to mature, and get support from scipy, numpy, matplotlib, etc., so you'll probably want to target Python 2.x, but you should probably test the 2to3 tool once in a while to make sure that your code can be easily converted (http://docs.python.org/library/2to3.html). The projects that Satra listed are all really good, and I would add that the python standard modules are also a good place to dig around (somewhere in python -c "import sys;print sys.path").

Re: programming environment:

satra:

I would go with 2.6 if the other libraries have updated. I'm still on 2.5 for now and keep a keen eye towards minimizing the usage of the print statement (probably the most important difference that could break code)!!

@: it might not be time yet to do work in python 3 as tyler mentions, but i spent the evening reading cython documentation. if i understand what i've read so far, since it is compatible with python 3 as well as Python 2.3 or later, and works with numpy, this might be a cool way of coding and compiling an application that works in both python worlds!

...the beauty of the cython route is that you can code in either 2.x or 3.x (for the cython statements), compile, and i believe you can run it with either 2.x or 3.x! if i've got it right, you code your 3.x in cython and you've got a cool transition. what do you think?

otherwise, the recommended is as you say. code in 2.6, use 2to3, and only modify the 2.6 code until you stop maintaining 2.x, which seems like a pain to me.

satra:

I'm going to rewrite in cython only when I need speed as opposed to writing everything in cython. Unfortunately, that's the current problem with python. Python 3 as you know broke several things to improve Python. I wouldn't worry about doing 2to3 for now. Most establishments haven't even moved from 2.4. When distributing code, one has to partly care about the largest userbase and at these large hospitals, things take a while to change. For me my target is to disseminate to the entire FSL/SPM userbase. Can't be too cutting edge for that.

I've looked at sagemath. Although neat, too symbolic for my current needs.

tyler:

I agree with everything Satra said.

Cython doesn't seem like the right way to go.

Regarding 2to3, all I'm suggesting there is that you periodically try running it on your code to see what warnings it gives. Basically, in theory 2to3 can automatically convert most of your code, but will sometimes find something quirky that can't be automatically converted, so it might be nice to know about that. However, like Satra says, it will be a long time before 3 has a significant userbase, so it should be the least of your worries right now.

Sagemath is very cool, and I think that underneath it's running fairly stock python + libraries, but I think you're probably better off developing and testing with the most vanilla possible python 2.x distribution. The enthought distro should be fine though. Arthur has some experience with Sage, so you may want to ask him about it.

@:

so you're going to write in code that can be run by 2.4 or 2.5? and you'll 2to3 later when more people use 3?

dependencies

satra and tyler -- what dependencies would you consider reasonable: numpy, matplotlib, pynifti, traits, cython, and sphinx?

satra: Yes. Currently I'm writing it for 2.5. I would add scipy and pyR to the list, but you can add modules easily later on. I would definitely recommend ipython as your python console.

Re: ide vs. ipython:

satra:

ipython has parallel computing options and I can use it interactively with the -pylab switch. That was sufficiently motivating for me. But ide's are a personal choice, so go with whatever you find most convenient. I've also used SPE and komodo and I switch between all of these according to various irrational whims. ...Emacs of course!! I haven't required an extensive debugger yet. So ipython's minimal interface has been sufficient.

tyler:

I use vim + ipython. My typical work cycle is to open a file that I'm working on in vim, and open ipython in the same directory. I'll make edits, and then type "%run <filename> [args]" in ipython whenever I'm ready to test the code. That brings the whole file into ipython's namespace, and I can generally debug sufficiently that way. Sometimes I'll force a single breakpoint in my code by putting "raise" on a line by itself, and doing %run again. I can then inspect the state by doing a %whos, and printing variables. If you need more in depth debugging, you can do "%pdb on" in ipython to cause pdb to be invoked automatically whenever your code raises an exception (then you can step through the code, etc.). Ipython can also more tightly interact with vim and emacs by allowing you to pipe between (for example) an emacs buffer and ipython (and vice versa), but I've never felt the need to do that, and I couldn't get it to work exactly right when I tried (google for emacs ipython integration maybe). I played for a few minutes with WingIDE, and it seemed nice, but I never put in the time that would be required to be proficient with it, so I'm happy hacking away with the ubiquitous vi. Some things that you might get with a full fledged IDE that aren't as easy to get with emacs + ipython are: code completion, code refactoring support, multi-file search and replace, etc. As Satra said though, choice of development environment is a personal decision, and I can't say what's right for you.

posted at: 18:39 | path: /mar-09 | 0 comments


Sun, 01 Mar 2009

Beginning Python Visualization, by Shai Vaingast


I recently had the pleasure of being the technical reviewer for a new Apress offering, Beginning Python Visualization, by Shai Vaingast.

To quote from the apress page,

What you'll learn:

  • Write ten lines of code and present visual information instead of data soup.
  • Set up an open source environment ready for data visualization.
  • Forget Excel: use Python.
  • Learn numerical and textual processing.
  • Draw graphs and plots based on textual and numerical data.
  • Learn how to deal with images.

I can't recommend this book highly enough for people who need to do basic data processing, manipulation, and viewing in Python. My role as TR was largely limited to occasional style issues and some serious cheerleading: "more figures! more examples! great stuff!"

So, in sum: buy this book.

Note that apress didn't ask me to review this, and they only paid me for the TR role. This blog post is gratis and I was slow enough on the reviews that I suspect they won't ask me to TR again no matter what I say ;).

--titus

posted at: 21:49 | path: /mar-09 | 3 comments
