I'm 3/4 of the way through my first ground-up code review for pygr, and I want to gripe about something that pygr does a fair bit of: use Python's **kwargs. I don't want to escale it into a policy argument & policy decision for pygr, so I'm posting it here; hopefully I can sway minds with, well, suasion, rather than dictat.
What am I griping about, exactly? Check out this code:
class SomethingExtensible(object): def __init__(self, foo, bar, **kwargs): ... baz(x, y, **kwargs) def baz(a, b, **kwargs): ...
Now think about how you'd read code like this in order to answer the following four questions:
- What keyword arguments does __init__ take?
- What does __init__ do with these arguments?
- What keyword arguments does baz take?
- What does baz do with these arguments?
(These would among the first questions I'd ask of the code; you too, right?)
Well, one thing you can immediately tell is that there's a big black box of arguments passed into 'baz', so there's no point in trying to completely understand the arguments to __init__ without also completely understanding the arguments to baz.
Another thing you can immediately tell is that without a pretty good docstring or a detailed examination of both __init__ and baz, you're not going to be able to begin to understand either function. And since docstrings, documentation, and comments are always wrong or incomplete anyway, you're going to have to grok the entire function.
So, basically, you're going to be lost in this code without a lot of work. Also, because there's no concise way to validate that we only received the set of kwargs we were expecting -- __init__ may not even know what those are, and people rarely check anyway because it's a bit of ugly code to do so -- you're subject to errors from misspellings: arguments that look almost right, but lack an 's' at the end, for example, when the code expects that 's' right there.
On the flip side, you do have an opaque "box" of arguments that you're passing around, and this can come in very handy if you're doing something like this:
class SomethingExtensible(object): def __init__(self, foo, bar, **kwargs): ... baz(x, y, **kwargs) def baz(a, b, **kwargs): ...
(note indentation of baz -- it's now a method in the class, rather than an independent function). Why is this handy? Because now you can subclass SomethingExtensible and potentially redefine 'baz' to take new arguments without having to change the constructor at all -- you just change what keyword arguments you pass into it.
So it's readability vs extensibility. I tend to argue for readability over extensibility, and that's what I'll do here: for the love of whatever deity/ies you believe in, use **kwargs sparingly and document their use when you do.
p.s. It may be that I'm missing some syntax tricks here, where I can unpack **kwargs and demand that it nicely conforms to my expectations without doing serial gets and removals:
x = kwargs.get('x', some_default) if 'x' in kwargs: del kwargs['x'] y = kwargs.get('y', some_default) if 'y' in kwargs: del kwargs['y'] ... assert len(kwargs) == 0, "error, unexpected kwargs"
or some such. Any thoughts?
Posted by Mike Hansen on 2009-03-22 at 00:17.
Using pop is a little bit better: x = kwargs.pop('x', some_default) y = kwargs.pop('y', some_default)
Posted by Chris Lasher on 2009-03-22 at 00:18.
I agree wholeheartedly. One of the things that the previous FriendFeed API Python library made heavy use of was **kwargs. I got rid of those very quickly with well documented parameters in my fork of it in FriendFeed PyAPI. On the note of the unpacking of the kwargs dictionary, why would you want to assert that you've deleted all the entries from it? My understanding of the keyword arguments capture is that you just won't know what you will get in it. If you expect to receive something in it, then you must know something about the structure, and it should be instead be a named parameter.
Posted by Titus Brown on 2009-03-22 at 00:44.
Mike, thanks! For some reason didn't realize 'pop' existed on dicts. Chris, that's to deal with the issue of misspelling. I've run into situations where the interface specifies kwargs, but you're actually fairly sure of what you expect to get in a function call; then making sure you only get that is a good way to proceed, if only in a test situation. It's not something I like but I don't always have control over the interface. Not sure what would happen if I explicitly unpacked them in the function def.. hmm. --t
Posted by Huw Giddens on 2009-03-22 at 03:58.
One of the more persuasive reasons to use **kwargs that I've seen is so that calls to your method via super will work in a coherent way - http://.ws/ has the gory details, but the part that I think applies here is that if your method could be called via super, you don't know what arguments your method will be actually called with. As such, if some method is going to call/be called by super, it may not be a good idea to assert that ***kwargs contains only expected arguments, or to avoid using ***kwargs at all.
Posted by Noah Gift on 2009-03-22 at 06:39.
I think is a clear case of why doctest are a good thing. If you write a doctest, then the API is understandable and testable.
Posted by anon on 2009-03-22 at 07:51.
Your second baz is missing the self argument.
Posted by Marius Gedminas on 2009-03-22 at 08:08.
I agree: **kwargs is a large hammer. Use it sparingly. There are use cases where it is indispensable (wrapping arbitrary functions, e.g. when you're writing a decorator).
Posted by Gael Varoquaux on 2009-03-22 at 11:11.
Actually, I have the opposite opinion, at least for the examples you show: in <em>_init_</em> **kwargs should be prefered. The reason being that it is the only way to have classes enabling changes in the parents (with regards to inheritance), without forcing a rewrite of the subclasses. This is important, because the subclasses may not be in the same codebase as the parent classes. While this is not a golden rule, I would say that a golden rule is: "use positional arguments in your <em>_init_</em> sparsingly". The reason being that positional arguments are terribly fragile to inheritance.
Posted by Titus Brown on 2009-03-22 at 11:22.
Hi Gael, could you give an example? I suspect I'll still try to come down on the side of readability in this case, but I also suspect you have hidden levels of nastiness lurking for me if I say that ;)
Posted by Joseph Lisee on 2009-03-22 at 12:56.
Here is a way I have used **kwargs to good effect (I think) in my own project (see website): class SpecificPanel(wx.Panel): def <em>_init_</em>(self, customArg1, customArg2, **args, ***kwargs): wx.Panel.<em>_init_</em>(self, **args, ***kwargs) # Use customArg1 and customAgr2 This lets users of SpecificPanel just pass the extra args it needs at the front of the list then give all the normal arguments they would give to a wx.Panel. I have found it makes the API cleaner, rather then having to duplicate the argument list of wx.Panel.
Posted by Doug Hellmann on 2009-03-22 at 18:32.
We ended up with a similar use of keyword args in some of our libraries. We came up with two solutions. In some cases, we just went back and listed all of the arguments explicitly. That makes it much easier to track down what arguments are needed/supported by a class. Another solution we came up with for cases where there were a lot of arguments was to define "options" classes. The long list of args is only defined in the constructor to that class, and a single option argument can be passed to the other functions/methods instead of the keyword args. That makes it easy to add new options without updating every caller (it also makes it easy to set up fixtures in tests).
Posted by Titus Brown on 2009-03-22 at 21:42.
Huw Giddens' comment vanished, but he wanted to post this link: <a href="http://mail.python.org/pipermail/python- dev/2005-January/050656.html">http://mail.python.org/pipermail/python- dev/2005-January/050656.html</a> and this: <a href="http://fuhm.net/super-harmful/">http://fuhm.net/super- harmful/</a> His conclusion: **kwargs is a fairly good idea when you want your method to play nice with super()
Posted by Ronny Pfannschmidt on 2009-03-23 at 06:02.
starting with python3 the syntax does support declaring keyword-only arguments see <a href="http://www.python.org/dev/peps/pep-3102/"> http://www.python.org/dev/peps/pep-3102/</a> for details
Posted by Brandon Corfman on 2009-03-23 at 08:10.
Hm, which is more readable: foo = MyObj("bar", (210, 2, 4)) or foo = MyObj(title="bar", rgb=(210, 2, 4)) and this is just a small example. To me, readability from the caller is more important than for the callee (which as you noted is easy to take care of with a docstring).
Posted by <em>Mark</em> on 2009-03-23 at 15:44.
Since calling it "kwargs", though idiomatic, is redundant with the "**" itself, perhaps saying "**extra_args_from_derived_classes" (or something more elegant) that self-documents <em>why</em> you're asking for them?
Posted by Titus Brown on 2009-03-24 at 07:25.
Brandon, sure... use keyword arguments. Just don't collect them all in a '**' variable and pass them around!
Posted by Chris Lasher on 2009-03-24 at 16:35.
Gael, perhaps I've misunderstood you, so like Titus, I'm interested in an example. Going off of my possible misunderstanding, however, I'll disagree and say that **kwargs should be entirely disfavored in place of proper arguments in subclasses. My impression is that in "proper" OOP, all subclasses must implement all methods of their parent classes, though they may implement more. Likewise, each method that is inherited from the parent class, when overridden, must support the same arguments as the original, though it may support more. , **kwargs shouldn't be used in overriding methods in place of arguments explicitly specified by the parent class. This may be practically compatible, but it is of poor practice. I acknowledge that in the short term, it's convenient given the underlying parent classes' parameters can change--and perhaps they are even in an external library you have no control over. When this happens, if you went the **kwargs route, your subclass interface can "stay the same". It's very unhelpful shortcut to take, however. If the parent class changes, take due diligence and update the subclasses. Chances are, you'll have to mess with your own code inside the subclass, anyway, since parameters have likely changed, been added, or been removed.
Posted by Yaroslav Halchenko on 2009-03-24 at 23:00.
Very interesting post. I too have mixed feelings about using **kwargs, especially in the <em>_init_</em>'s (where I find myself using them more and more). Since the main problem later on, as you mentioned, difficulty to understand what keyword arguments any given constructor is needing, for our project (www.pymvpa.org) we worked out following solution: use a helper function for <em>_doc_</em> which helps us out -- goes through super classes <em>_init_</em> docstrings, parses them (we try to be consistent and use ReST for docstrings), and composes :Parameters: section for a current class. Now, whenever a developer needs to discover available parameters -- there is no difficulty at all. Also a developer can easily subclass, provide only documentation for the relevant parameters in a derived class <em>_init_</em> docstring and obtain a complete list of parameters whenever doc for that class is asked. Another part of the solution is that for some classes we define class level Parameters, which are transformed into per-instance attributes later on by the <em>_metaclass_</em> which not only takes care about creating per instance collection of those 'parameters', but extends the docstring of the class with listing of those, and description of added arguments to the <em>_init_</em>. Altogether now we obtain more or less complete description of all arguments which are hidden behind **kwargs ;) so, now example... Constructor information: Definition: SMLR(self, **kwargs) Docstring: Initialize an SMLR classifier. Parameters lm The penalty term lambda. Larger values will give rise to more sparsification. (Default: 0.1) convergence_tol When the weight change for each cycle drops below this value the regression is considered converged. Smaller values lead to tighter convergence. (Default: 0.001) resamp_decay Decay rate in the probability of resampling a zero weight. 1.0 will immediately decrease to the min_resamp from 1.0, 0.0 will never decrease from 1.0. (Default: 0.5) .... whenever actual <em>_init_</em> has only: def <em>_init_</em>(self, **kwargs): """Initialize an SMLR classifier. """
Posted by Gael Varoquaux on 2009-03-25 at 01:09.
Hi all, Here is an example, sorry for the delay: I am traveling and sprinting. Suppose you have a plotting library, and the base class of the objects you draw on the canvas is an 'Artist' (FWIW, I could be talking about a toolkit, and a 'Widget'). This artist is going to be taking init arguments such as a 'linewidth', and a 'color'. I am going subclass it in a crazy amount of ways, because everything I draw is a subclass of 'Artist'. My subclasses are going to add arguments to the init. Now suppose that later down the line, I enhance my 'Artist' to add an 'alpha' argument to it, that controls transparency. I am going to have to add a input argument to every single one of the init method of the subclasses. Not only is this tedious, but if somebody has implemented custom 'Artists' in a different module, they won't be getting this 'alpha' argument. Is this a good example?