Thursday, October 12, 2006

5 Principles For Programming

Here are a few things I have learned about programming computers, in no particular order. I didn't invent any of them, and I don't always follow them. But since nobody seems to know very much about making good software, it makes sense to try to distill a little wisdom when possible.

Fail Fast

Check for programming errors early and often, and report them in a suitably dramatic way. Errors get more expensive to fix as the development process progresses: an error that the programmer catches in her own testing is far cheaper than one the QA tester finds, which is in turn far cheaper than the one your largest customer calls to complain about. The reason this matters is that the cost of software comes almost entirely from the errors. To understand why, consider writing code in the following manner: you are assigned some feature, you type up a complete implementation all in one go, then you hit compile for the first time. (I TA'd a beginning programming class in grad school and this is not very different from how beginning programmers insist on working.) If you have any experience writing software, you know that if getting to the first compile required n man-hours, then the time required to get to shippable code is probably between 2n and 100n man-hours, depending on the domain. That time will be divided between the programmer's own bootstrap testing, QA time (and the associated bug fixing), and perhaps some kind of beta.

The classic examples of this principle are type-checking, unit testing, and the assert statement. When I first learned about the assert statement I couldn't accept that it was useful. After all, the worst thing that can happen is that your code crashes, right? And that is exactly what the assert statement causes. For all the hoopla about unit testing, you would think that it was something deeper than just a convention for where to put your assert statements. But software development is in such an infantile stage that we shouldn't poke fun: unit testing, for all the child-like glee of its proponents, may well be the software engineering innovation of the decade.

You see the violation of the fail fast principle all the time in Java programming where beginning programmers will catch (and perhaps log) exceptions and return some totally made up value like -1 or null to indicate failure. They think they are preventing failures (after all whenever an exception shows up in the logs it is a problem, right?) when really they are removing simple failures and inserting subtle time-sucking bugs.
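To make the contrast concrete, here is a minimal sketch of both styles. The class and method names (FailFastSketch, parseQuantityOrMinusOne, parseQuantity, totalCents) are made up for this illustration and do not come from any real code base.

```java
final class FailFastSketch {
    // Anti-pattern: swallow the exception and return a made-up sentinel value.
    static int parseQuantityOrMinusOne(String text) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            return -1; // every caller now has to remember that -1 means "broken"
        }
    }

    // Fail fast: let bad input blow up at the point where it is detected.
    static int parseQuantity(String text) {
        int quantity = Integer.parseInt(text.trim()); // throws NumberFormatException on garbage
        if (quantity < 0) {
            throw new IllegalArgumentException("Quantity must not be negative: " + quantity);
        }
        return quantity;
    }

    // Hypothetical downstream arithmetic that trusts its inputs.
    static int totalCents(int quantity, int unitPriceCents) {
        return quantity * unitPriceCents;
    }

    public static void main(String[] args) {
        // The sentinel flows silently into later arithmetic: this prints -250.
        System.out.println(totalCents(parseQuantityOrMinusOne("three"), 250));
        // The fail-fast version stops here with a NumberFormatException
        // whose stack trace points straight at the bad input.
        System.out.println(totalCents(parseQuantity("three"), 250));
    }
}
```

The first call produces a negative invoice total that someone will have to trace back through the system later; the second produces a stack trace on the spot.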

Unfortunately the idea of failing fast is counter-intuitive. People hate the immediate cause of pain, not the underlying cause. Maybe this is why you hear so many people say they hate dentists, and so few say they hate, I don't know, plaque. This is a lot of what irritates people about statically typed languages: when the compiler complains we hate the compiler; when the program does what we say and crashes we hate ourselves for screwing up, even when it is an error that could have been discovered by a more vigilant compiler.

This is why I can't work myself into the same first-kiss level of ecstasy others manage over languages like Ruby[1]. Dynamic code feels great to program in. After the first day you have half the system built. I did a huge portion of my thesis work in Python and it was a life saver. Thesis work doesn't need to be bug free; it is the quintessential proof-of-concept (and yet so many CS students, when faced with a problem, break out the C++). But I have also worked on a large, multi-programmer, multi-year project, and this was not so pleasant. A large dynamically typed code base exhibits all the problems you would expect: interfaces are poorly documented and ever changing, uncommon code paths produce errors that would be caught by type checking, and IDE support is weak. The saving grace is that one person can do so much more in Python or Ruby that maybe you can turn your 10-programmer program into three one-programmer programs and win out big, but this isn't possible in a lot of domains. It is odd that evangelists for dynamic languages (many of whom have never worked on a large, dynamically-typed project) seem to want to deny that static type-checking finds errors, rather than just saying that type-checking isn't worth the trouble when you are writing code trapped between a dynamically typed database interface and a string-only web interface.

Syntax highlighting (and auto-compilation) in IDEs is another example of this principle, but on a much shorter timescale. Once you have become accustomed to having your errors revealed instantaneously it is painful to switch back to waiting for a compiler to print them out in bulk.

Write Less Code (and Don't Repeat Yourself)

This is perhaps the most important and deep principle in software engineering, and many lesser principles can be derived from it. Somehow simple statements/programs/explanations/models are more likely to be correct. No one knows why this is; perhaps it is some deep fact about the universe, but it seems to be true.

In software this comes into play as bugs: longer programs have a lot more bugs so longer programs cost more.

Worse, difficulty seems to scale super-linearly as a function of lines of code. In the transition from Windows XP to Vista the codebase went from 40 million to 50 million lines of code. To do this took 2,000 of the world's best software engineers 5 years of work.

The reason for this is that the only way to get real decreases in program size (decreases of more than a few characters or lines) is to exploit symmetry in the problem you are solving. Inheritance is a way to exploit symmetry by creating type hierarchies; design patterns are an attempt to exploit symmetry of solution type. Functional languages are still the king of symmetry (all of Lisp is built out of a few primitive functions). But rather than categorize these by the mechanism of the solution, it is better to think of them as what they are: ways to write less code.

The best way of all to avoid writing code is to use high quality libraries. The next time you find yourself writing a web application, reflect on how little of the code executing is really yours, and how much belongs to the Linux kernel, Internet Explorer, Windows XP, Oracle, Java, and the vast arrays of libraries you rely on ("ahh the old hibernate/spring/JSF/MySQL solution, so lightweight…").

Some of the difficulties of large programs are technical but many are sociopolitical. Microsoft is the size of a medium sized country by income and the size of at least a small city by head-count. Yet it is run in much the same way as any 200 person company, namely some guy tells some other guy what to do, and he tells you. Unfortunately they have found that none of these things work at that scale, and I don't think anyone has a really good idea of how to fix them.

Your problem doesn't require a small city to produce, but the principle is the same. If your solution requires double the man-power then all the organizational overhead will have to be developed to handle this. Furthermore the organization will be composed of computer programmers who are often at roughly the same level of interpersonal sophistication as that sling blade guy.

What is remarkable, though, is that making the solution small also means making it clear. I think that this has mostly to do with human brains. We can only think one or maybe two sentences worth of thought at a time, so finding the concepts that make your solution one sentence is essential. The famous Haskell quicksort is a perfect example of this. I can't help but feel jealous of the computer science students ten years from now who will see algorithms presented in that way. (If you don't read Haskell the program just says: "An empty list is quicksorted. The quicksort of a non-empty list is the concatenation of (1) the quicksort of the remaining elements less than the first element, (2) the first element itself, and (3) the quicksort of the remaining elements greater than or equal to the first element." Though, of course, the Haskell version is much briefer.)
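For readers who think in Java, the same sentence can be transcribed almost clause by clause. This is only a rough sketch for comparison: the class name QuicksortSketch is made up, the list is restricted to Integer for brevity, and elements equal to the first element are sent to the right-hand partition so that duplicates are not dropped. Like the Haskell version, it builds new lists rather than sorting in place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class QuicksortSketch {
    static List<Integer> quicksort(List<Integer> list) {
        // "An empty list is quicksorted."
        if (list.isEmpty()) {
            return new ArrayList<>();
        }
        Integer first = list.get(0);
        List<Integer> rest = list.subList(1, list.size());

        // (1) the remaining elements less than the first element
        List<Integer> smaller = rest.stream()
                .filter(x -> x < first)
                .collect(Collectors.toList());
        // (3) the remaining elements greater than or equal to the first element
        List<Integer> others = rest.stream()
                .filter(x -> x >= first)
                .collect(Collectors.toList());

        // Concatenate: quicksort(smaller) ++ [first] ++ quicksort(others)
        List<Integer> sorted = new ArrayList<>(quicksort(smaller));
        sorted.add(first); // (2) the first element itself
        sorted.addAll(quicksort(others));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(quicksort(Arrays.asList(3, 1, 2, 3, 0))); // prints [0, 1, 2, 3, 3]
    }
}
```

Even with the ceremony Java adds, the three numbered clauses of the sentence are still visible in the code, which is the point.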

Computer Programs Are For People

"We want to establish the idea that a computer language is not just a way of getting a computer to perform operations but rather that it is a novel formal medium for expressing ideas about methodology. Thus, programs must be written for people to read, and only incidentally for machines to execute."
The Structure and Interpretation of Computer Programs

The wonderful thing about the above quote is that it gets less brave and more obvious every year. We know that C/Java/Lisp/Haskell have not one bit of power that isn't in simple assembly language; they only allow us to express the ideas more clearly and prevent certain kinds of stupid mistakes. There is no program that can be written in one that can't be written in another, and all end up as machine instructions sooner or later (some at compile time, some at run time, but no matter). Given this fact it should be obvious that the only reason to have a programming language is to communicate to a person. Don Knuth wrote about this idea, calling it Literate Programming, and created a system called WEB which was a great idea mired in a terrible implementation[2]. The idea was to embed the program in an essay about the program that explained how it worked. The Sun programmers simplified this in some ways with Javadoc, but still something was lost, since it is very hard to get any of the big ideas out of Javadoc, or even to know where to start reading. Projects always have two links: one for Javadoc and one for a higher level documentation which is written in some other system. WEB created a linear narrative to describe a program that might not be quite so straightforward; Javadoc creates a flat list of documentation with no beginning, end, or summary. Neither is quite what I want.

It is a small tragedy that programmers who spend so much time trying to understand obscure code, and so much time creating new languages to write more obscure code in, spend so little time coming up with the WYSIWYG version of WEB that makes program source input look like the beautiful WEB/TEX output.

I think the best ideas in object-oriented programming also fall under this category. The methodology of solving a problem by creating a domain model to express your problem in is the best example. The point of such a model is to create a level of abstraction which exists wholly for people to think in. Not all problems are easily broken by such a method but many are. The concept of encapsulation (you know, the reason you type private in front of all those Java variables) is another example. Both exist to make things simpler, more usable, or, to put it simply, more human.

A sort of corollary of writing computer programs for people, writing less code, and solving the general problem is the following: write short functions. If I could transmit only a single sentence to the programmers of tomorrow which summed up everything I knew it would be that: write short functions[3]. When you write short functions you are forced to break the code into logical divisions and you create a natural vocabulary built out of function names. This makes the code an easily readable, easily testable, set of operations. When you have done this it becomes possible to see the duplication in your code and you start solving the more general problems. It is sad that the best piece of software engineering advice that I know of is to write short functions, but, well, there it is. Don't spend it all in one place.
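As a small illustration of the vocabulary that short functions create, here is a hedged sketch. Everything in it (the class InvoiceSketch, the 10% discount over 10,000 cents, the 8% tax rate) is invented for the example; only the shape matters.

```java
final class InvoiceSketch {
    // Sum of all line items, in cents.
    static int subtotalCents(int[] lineItemCents) {
        int sum = 0;
        for (int cents : lineItemCents) {
            sum += cents;
        }
        return sum;
    }

    // Hypothetical rule: 10% off orders of 10,000 cents or more.
    static int discountCents(int subtotalCents) {
        return subtotalCents >= 10_000 ? subtotalCents / 10 : 0;
    }

    // Hypothetical flat 8% tax, rounded down by integer division.
    static int taxCents(int taxableCents) {
        return taxableCents * 8 / 100;
    }

    // The top-level function now reads like a sentence about invoices.
    static int totalCents(int[] lineItemCents) {
        int subtotal = subtotalCents(lineItemCents);
        int taxable = subtotal - discountCents(subtotal);
        return taxable + taxCents(taxable);
    }

    public static void main(String[] args) {
        System.out.println(totalCents(new int[] {6_000, 5_000})); // prints 10692
    }
}
```

Each piece can be tested on its own, and if a second kind of discount shows up, the duplication will be sitting in one obvious place.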

Do The Right Thing

This principle sounds funny when stated directly; after all, who is advocating doing the wrong thing? And what is the right thing to do, anyway?

The point is that in the process of developing software I am always facing the following situation: I can cut corners now, hack my way around a bug, add another special case, etc. OR I can try to do the right thing. Often I don't know what the right thing is, and then I don't have a choice but to guess. But more often, I know the best solution but it requires changing things. In my experience every single factor in the software development process will argue for doing the wrong thing: schedules, managers, coworkers, and even, when they get involved, customers. All of these groups want things working as soon as possible, and they don't care what is done to accomplish that. But no one can see the trade-off being made except for the programmers working on the code. And each of these hacks seems to come back like the ghost of Christmas past in the form of P0 bugs, and I end up doing the right thing then, under great pressure and at higher cost than if I had done it before.

A lot of times doing the right thing means solving the more general problem. And it is an odd experience in computer programming that often solving the more general problem is no harder than solving the special cases once you can see the general problem.

There is a lot of advice that argues the opposite. This line of thought says just throw something out there, then see how it is conceptually broken, then fix it. This argument is perfectly summarized in the "worse-is-better" discussion, and I don't have much to add to it. Except to say this: I think that worse-is-better is a payment scheme. With worse-is-better you get 85% of a solution for dirt cheap, and the remaining 15% you will pay in full every month for the rest of your software system's life. If you are writing software that you know will be of no use tomorrow, then worse-is-better is a steal (but you might want to consider quitting your job). If you are writing software that will last a while you should do the right thing.

Reduce State

You may have heard that Amdahl's law is the new Moore's law and that by the time Microsoft finishes the next version of Windows computers will have like holy shit 80 fucking cores. This means that in five years when your single threaded program is going full tilt boogie it will be using all of 1/80th of the processor on the machine. As a semi-recent computer science grad student my opinion of concurrency is "neato." But I notice the old timers at work have more of a "we are so totally fucked" look about them when they talk about it. I think the reason is this: if x is a mutable object then the following doesn't necessarily hold in a multithreaded program:

x.equals(x)

Like everyone else, I am guessing that the end game for all this will be a massive reduction in mutable state. Those functional language people are definitely on to something. But those of us who still have to program in dysfunctional languages during the day need a more gradual path. The question is, if mutable state is bad, is less mutable state less bad? Or do I have to get rid of it all?

I don't know the answer to this. But for a while now, I have been trying the following: whenever possible, avoid class variables that aren't declared final. See, functional programming gets rid of side-effects altogether, but I don't know if this is necessary. Inside a function the occasional i++ really isn't that confusing and I am not sure I want to give it up just yet. The reason is that method-local variables have no publicly accessible state, so as long as I am writing short functions this temporary state shouldn't be a problem. By declaring all of x's fields final (and making sure they refer only to immutable objects) you ensure that x.equals(x). This also makes it very easy to prevent invalid states: just ensure that either the user provides valid inputs to the constructor or the constructor throws an exception. If you do this, and don't have mutable state, then you are guaranteed no bad states.
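Here is a minimal sketch of what that looks like in practice. The EmailAddress class is hypothetical and its validity check is deliberately crude (it only looks for an @ with something on each side); the point is the combination of a final field holding an immutable value and a constructor that refuses bad input.

```java
final class EmailAddress {
    // final field referring to an immutable String: nothing can change after construction
    private final String value;

    EmailAddress(String value) {
        // Crude, illustrative validation only; real address validation is more involved.
        if (value == null || value.indexOf('@') <= 0 || value.endsWith("@")) {
            throw new IllegalArgumentException("Not a plausible email address: " + value);
        }
        this.value = value;
    }

    String value() {
        return value;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof EmailAddress
                && value.equals(((EmailAddress) other).value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }
}
```

Because no other thread can alter value between the two reads inside equals, x.equals(x) holds no matter how many threads share x, and an EmailAddress that exists at all is known to have passed the constructor's check.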

I haven't figured out how to make all my members final just yet (or I would probably be using Haskell). It seems to me that if I want to change a User's email address then I need to be able to call user.setEmail(). That is because the state of the email address is real state out there in the world that I have to model in my program. So the domain model retains its state. But as ye of the Java world know, the domain model is not all the code, oh no. We still have business objects, and persistence objects, and gee-golly all kinds of other objects. And guess what? 99% of the state in these objects can go. And when it does everything gets better.

But I am only starting with this concurrency thing. I am reading this book, which is awesome. In it you can learn about all kinds of disturbing things like how the JVM has secretly been scrambling the order of execution of your code in subtle ways and things like that.

Know Your Shit

Just as the workable solution is always the last thing you try, the impossible-to-diagnose bug is always in the software layer you don't understand. You have to understand all the layers that directly surround your code; for most programmers this begins with the OS. If you do low level programming you had better know about computer architecture too. But this idea is bigger than just catching obscure bugs; it has to do with finding the solution to hard problems. Someone who is familiar with the internals of an OS has enough of the big ideas under their belt to attack most large software problems.

Web programmers know that all performance problems come from the database. Naturally when you first see a performance problem in a database backed application, you want to do some profiling and see where in the code the time is going. After all, isn't this what everyone says to do? You can do this, but you might as well save yourself the time: just log the queries that are issued on that code path, then get the database execution plan for each, and you'll find the performance problem pretty quickly. The reason is simple: data lives on disks and disks are (say) 100,000 times slower than memory. So to cause a problem in Java you have to do about 100,000 times more stupid shit than to cause a problem in SQL. But notice how the gorgeous relational database abstraction layer has broken down: in order to solve the problem one has to think about how much data is being pulled off the disk to satisfy the query. The point is that you can't stop at understanding the relational part; you also need to understand the database part.

The larger problem is what we should be learning to get better at this. I know that the following things will help because they have helped me:

  1. Learn a functional programming language
  2. Learn how operating systems work
  3. Learn how databases work
  4. Learn how to read a computer science paper
  5. Learn as much math as you can (but which math…)

Unfortunately it is virtually impossible to say what will not help you solve a problem. Will knowledge of good old-fashioned AI help you write enterprise software? It certainly might when you implement their 400,000 lines of Java business rules templates as 2,000 lines in a Prolog-like business rules system. Likewise, it isn't every day that I need to integrate at work, but when I have the payoff has usually been big. If you asked me if studying something very useless and different from computers, literature, say, would help you to solve problems, I couldn't tell you that it wouldn't. It might be less likely than studying operating systems or databases to pay off, so there might be some opportunity cost, but I couldn't tell you that the next big advance wouldn't come from someone who had divided their time between programming and literature. I think that that is pretty much the state of our art: we don't even know the framework in which the new ideas will come, let alone what they might be. I'm not sure if that is exciting or pathetic.

[1] "Ruby is a butterfly". Wow dude, go outside. Look at any of the beautiful creatures on the earth. Now go back in and look at your computer. See much resemblance? Me either.

[2] Knuth says: "A user of WEB needs to be good enough at computer science that he or she is comfortable dealing with several languages simultaneously. Since WEB combines TEX and PASCAL with a few rules of its own, WEB programs can contain WEB syntax errors, TEX syntax errors, PASCAL syntax errors, and algorithmic errors; in practice all four types of errors occur, and a bit of sophistication is needed to sort out which is which. Computer scientists tend to be better at such things than other people."

Just because we are better at it doesn't mean we are good at it, and even if we are good at it that doesn't make it a good idea. Anyone who has looked at a JSP that contained equal parts HTML, CSS, SQL, Java, and Javascript has a fair idea what the source for WEB programs looks like. But the produced output, like all TEX output, is absolutely stunning.

[3] Wearing sunscreen is good advice too, but computer programmers are probably the least at-risk sub-population for skin cancer since most of them aren't white and none of them seem to get outside enough. If in doubt write short functions while wearing sunscreen, but if you have to give up one, lose the sunscreen.