As part of the analytics back end I’m writing for Paupt, I just finished putting together a MongoDB ODA extension for BIRT, and the entire process confirmed my hypothesis that the next great frontier of Software Engineering research is... documentation.
It’s not that the documentation I found was “bad.” It wasn’t; some was actually pretty good and extremely helpful. But in the end it was hard to find, and I was still left perplexed way too often, and for longer than I liked (I’ve got other stuff to do to get a start-up off the ground). Now, having completed most of the implementation, I’m left wondering why it felt so “hard” when in reality it wasn’t: there’s a wizard that generates most of the source code for a generic implementation, and there are complete sample implementations for things like accessing a flat file. If I had a time machine and could jump back to last week, I could explain to myself what needed to be done in about an hour. I would start with the high-level concepts of the implementation, then point out the details that need attention and those that can safely be ignored, especially when just starting an implementation. I’d also point out some subtle things I discovered along the way, like a set of static utility methods that convert between property representations. Someone took the time to create them, so I knew they were important and useful, but I never found them mentioned in any documentation, and their Javadoc mostly stated the obvious, so when I first tripped across them by accident, it wasn’t clear when and where I would use them. Now I know. It probably took me a few hours of looking for code examples, cloning git repositories, waiting for git repositories to clone, remembering that I was cloning a git repository, reading code, stepping through examples with the debugger, googling for answers, reading answers, googling again, lather, rinse, repeat, before I developed a “mental model” that I could manipulate. The frustration is that with my time machine, I could probably tell myself what I needed to know (i.e., transfer my mental model) in a few minutes. This is a serious disconnect that I’m sure resonates with more than a few developers.
So why do we have this disconnect? I think there are two reasons. The first is that the economics of producing software, particularly open source, favor the production of executable “code” rather than “documentation” (I like to think of documentation as “code” that executes in the brain). There are economic rewards for “shipping code” and making “your dates,” but there are fewer rewards, and little status, for producing and maintaining documentation. The second is that the discipline of Software Engineering lacks formal models for documentation. While there are standards like UML, and even conventions like Javadoc, the structure and semantics of virtually all documentation are ad hoc, with little tractable accountability. As an experienced developer, I know that in general I cannot completely trust the documentation I read; I need to verify it against the code.
So what should we do about it? The economic forces that favor code production over documentation need to be completely re-examined. What is missing is an accounting for “opportunity loss.” By this I mean the lost or reduced adoption of a software framework by developers because the framework is too difficult to learn. For instance, the BIRT framework benefits greatly from the development of ODA extensions, yet there doesn’t seem to be a continuing focus on making that process trivial. Why? Because people get paid to add new (undocumented) features, fix bugs, and produce a new release; nobody gets paid for making the framework accessible; nobody gets paid to keep track of the developers who struggled and gave up. I’ve read other people complain that Eclipse itself is too hard to learn. I think they’re correct; in fact, I think that it is WAY too hard to learn. Yes, there are books; yes, there are wikis; yes, there are forums, but are they actually solving the problem of making Eclipse easy to learn? That hasn’t been my experience, so I don’t think so. I have most of the books published on Eclipse; they’re all literally years out-of-date, with few new ones on the horizon. I find the wikis hard to navigate, sparse, poorly edited, and always out-of-date. The forums are more timely, but extremely noisy and, again, hard to navigate. The counter-response is to point out that Eclipse is open source, so any complainers should just stop complaining and contribute a fix; done; “Next!” I love this response because it effortlessly cuts off discussion while offering no resolution. The premise behind it is that all problems are solved by “open” access to content; they aren’t. The problem with poor open source documentation isn’t one of access, it’s one of economics. Imagine if the Eclipse Foundation recognized this problem and hired a core set of experienced Eclipse developers, paid them VERY well, and gave them lots of status, to write documentation for Eclipse.
You can argue about the number, but say there were ten people continually churning out, updating, and polishing a consistent set of prose, tutorials, and sample code for most of the Eclipse projects. Would life for the tens of thousands of Eclipse developers be better? Would the quality of the projects improve? Would the adoption of Eclipse expand? The answer is obvious.
The other direction to follow, which is more academic, is to focus on researching the development of documentation frameworks with the same energy and innovation that we’ve exercised to produce programming languages and frameworks. The idea isn’t to develop yet another way to mark up prose, create a project web site, host a wiki, extract a class summary, or navigate a code base (snore); that’s been done. Instead, I think we need to start creating experimental documentation frameworks that are capable of producing and maintaining a spectrum of documentation about a particular program, code base, or programming framework: everything from full books that address conceptual aspects of the design and best practices, to step-by-step installation instructions and descriptions of how to perform a build, all “integrated” together. If anything is missing, it would be obvious. We should look at developing documentation with the same precision that we develop code, with the same automated checks and balances, and continuous integration. For instance, could we come up with the conceptual equivalent of a “unit test” for documentation? What about documentation coverage tools? A developer should be able to come to a code base, recognize that it is documented with a particular framework (just like it is programmed in a particular language), and instantly “jump in” and start navigating the documentation. They would know what (if anything) was out-of-date, questionable, or missing, and they could easily identify and skip over parts that they already understand or that aren’t relevant. I’m expecting someone to point out some system that already does this, or that they think does, and maybe it does, but there’s nothing so good that it has widespread, “everybody has heard about it,” adoption. What is needed is a shift in “developer culture,” one that expects such systems in the same way that it expects code to be tested; that is the next challenge for Software Engineering.
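To make the “documentation coverage” question a little more concrete, here is a minimal sketch in Python (picked only for brevity; nothing in it is specific to BIRT or Eclipse). It assumes that “documented” simply means “has a docstring,” which is a much weaker notion than the kind of framework imagined above, and the names `doc_coverage` and `SAMPLE` are made up for this illustration.

```python
import ast


def doc_coverage(source):
    """Return (documented, total, missing) for classes and functions in source.

    'documented' counts definitions that carry a non-empty docstring,
    'total' counts all class/function definitions, and 'missing' lists
    the names of the definitions without a docstring.
    """
    tree = ast.parse(source)
    documented, total, missing = 0, 0, []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            total += 1
            if ast.get_docstring(node):
                documented += 1
            else:
                missing.append(node.name)
    return documented, total, missing


# Hypothetical code under inspection: one method has no docstring.
SAMPLE = '''
class Converter:
    """Converts between property representations."""

    def to_properties(self, value):
        """Return value as a properties mapping."""
        return dict(value)

    def from_properties(self, props):
        return list(props.items())
'''

if __name__ == "__main__":
    documented, total, missing = doc_coverage(SAMPLE)
    print(f"{documented}/{total} documented; missing: {missing}")
```

A check like this could run in continuous integration and fail the build when coverage drops, much as a test coverage gate would. It says nothing about whether the documentation is correct or useful, which is exactly the harder research problem.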