Steve Yegge recently wrote a long article, "Code's Worst Enemy", about how "many lines of code" causes problems in projects.
That's obviously pretty silly. To see why, let's examine a little project I've recently started; conservatively, I estimate that it incorporates well over a million lines of code:
print('hello, world')
Well, that's one line.
But what's needed to run it? The Python interpreter; the C compiler (to build the Python interpreter); the libraries necessary to run Python and actually make that statement appear on the screen; and the Linux (or Mac O$ X, or Window$) operating system and drivers needed to bind them all together.
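One rough way to glimpse just the Python-level slice of that stack is to count the modules the interpreter has already imported by the time the one-liner runs. This is a minimal sketch, assuming CPython 3; the helper name `loaded_module_count` is made up for illustration, and the number it reports varies by version and platform.

```python
import sys


def loaded_module_count():
    """Return how many modules the interpreter currently has imported."""
    return len(sys.modules)


if __name__ == "__main__":
    print('hello, world')
    # Report the Python-visible modules that were already loaded.
    print(loaded_module_count(), "modules loaded to put that line on the screen")
```

That count covers only Python-visible modules; the C runtime, the system libraries, and the kernel underneath are not counted at all.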
There's easily a million lines of C code in there, if not ten million. So have I just started coding on the most bloated, worst project of all time?
Nope. It's not the lines of code that matter. It's the lines of code you need to think about that matter.
When I write Python code, I rarely need to worry about anything other than the code I'm writing. I don't need to think about the std lib all that much, I certainly don't worry about the CPython core code, and I touch on the UNIX kernel very infrequently. Why?
Because all of that other code is nicely encapsulated, behaves in expected ways, and rarely breaks.
And this is why a full language, good libraries, and a reliable OS are all ways to decrease the "brain load" of your software.
I also think it points to a deep truth of software engineering, which is that a good library API is one that you don't need to think about much. A good library should be compact, include the core features, work reliably, and contain functionality orthogonal to your code. All of these things help you worry about the core functionality of your own code and not about any other code.
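As a small illustration (a sketch of my own, not anything from Yegge's post), here is a word counter that leans on the standard library's `collections.Counter`. The function name `top_words` and its parameters are hypothetical; the point is only that the counting and sorting logic is code I never have to read or think about.

```python
from collections import Counter


def top_words(text, n=3):
    """Return the n most common lowercase words in text as (word, count) pairs."""
    # Counter handles the tallying and the ordering by frequency.
    return Counter(text.lower().split()).most_common(n)


if __name__ == "__main__":
    print(top_words("the cat and the hat and the bat", 2))
```

The only code left to worry about here is one line of my own; everything else sits behind an API that behaves in expected ways.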
By almost any measure (excepting that of life itself) our software is unimaginably (and unmanageably) complex already. We manage that complexity as best we can by encapsulating functionality in libraries, APIs, protocols, and "expectations" that are fulfilled, more often than not. And that's the real lesson you should take away from Yegge's post: that writing a 500k Ball of Mud is a bad idea indeed, but more because his design process failed well before he got to 500k LoJC.
Posted by Vinny on 2007-12-28 at 13:05.
Posted by Titus Brown on 2007-12-28 at 13:49.
But... Yegge's whole article (and my perspective too) is focused on maintenance of large, complicated codebases. None of your comments on JS address this. So JS may "win out" but it will be a Pyrrhic victory unless it somehow overcomes its intrinsically poor constructs for library and code organization -- an area where Python excels.
Posted by David Avraamides on 2007-12-28 at 13:51.
I had a similar reaction when I read Yegge's post. Isn't a large portion of that 500k code in reusable libraries or components? Doesn't he have things like sprites, a game engine, MUD tools, etc? It would seem that most of that code should have been well-baked by now and he wouldn't really need to revisit and maintain it. Or as you put it, he wouldn't need to think about it much.
Posted by Paul Moore on 2007-12-29 at 18:41.
This made me think about an element of David Allen's "Getting Things Done" time management approach. He advocates getting things out of your mind by putting them into a filing system you **trust** (so you are comfortable you'll get prompted about things when you need to be). And it's similar with software - you don't need to concern yourself with those lines of code that you trust. So, generally, I can ignore the Python interpreter, the stdlib, the OS, etc, because I assume they work. My own project's code, I can't ignore. That's why reimplementing standard library code is bad - you lose the trust element. It's also why bug-ridden libraries are bad - you don't trust the library, so all of that library code written by someone else comes into the area of "stuff you need to think about". That's really what API encapsulation is about - putting a line around stuff you're prepared to trust to "just do its job".
Posted by Steve on 2007-12-30 at 00:25.
Posted by bi on 2007-12-30 at 00:37.
Huh? I thought that, among other things, Yegge made a very reasonable point: people keep talking about using the best software maintenance techniques to deal with 500k lines of code, but do you really **need** to have 500kloc in the first place? I'm not sure what the analogue is here to Titus's example. Well, maybe it's this: Suppose you want to write a simple "Hello world" program, and your development and/or production machine isn't even powerful enough to run a Python interpreter. Should my system "Requirements" include having upgraded machines that can run a Python interpreter which can then run "Hello world"?