2009-03-20 / 12:45 /

Price & quality aside, an advantage of open source software is the network effect: the more people use a piece of software, the better the chance someone else has the same problem as you (and hopefully the sooner you’ll have the same solution as them). Not that open source itself causes the network effect; rather, it’s the arms control treaty that allows competitors to contribute to shared code. So this New York Times article on Hadoop made me wonder how Google’s closed source infrastructure will play out.

Using closed source makes sense if the code would help competitors more than they would contribute back. Given that Hadoop was inspired by Google’s MapReduce paper [1], it seems like that was originally the case. But, to the article’s main point, Hadoop is now used by Yahoo, Microsoft, Facebook, Autodesk, numerous startups, etc., which is to say at least a few smart people. If you believe in the power of open source, this eats away at Google’s proprietary advantage.

Google could, of course, switch to Hadoop, either fully or via an in-house version (whose changes don’t have to be re-released since they’re not selling the software). Both carry a cost, either up front or in ongoing merging. Alternatively, they could simply adopt any ideas that seem worthwhile; at scale the coding cost is probably small compared to 1) developing the idea (necessary for any change) and 2) maintenance. According to the article, Google and IBM are financing college-level Hadoop classes, so Google at least wants students to graduate understanding MapReduce theory and/or parallelism.

On an unrelated note, check out that photo at the top of the story. Was the guy on the right badly photoshopped in, or does he just not cast a shadow?


[1]: And, yes, I’m sure the idea of MapReduce has been around forever. It’s at least as old as functional programming; it’s probably somewhere in Backus’ classic [PDF] though I don’t feel like checking.
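
For the curious, here is roughly what the functional-programming version of the idea looks like. This is a minimal sketch of my own, not Google's or Hadoop's API: the names `map_phase` and `reduce_phase` and the sample documents are made up for illustration, and everything runs in one process with no distribution at all.

```python
from collections import Counter
from functools import reduce

def map_phase(doc):
    # "Map": turn one document into per-word counts, independently of the others.
    return Counter(doc.lower().split())

def reduce_phase(left, right):
    # "Reduce": merge two partial results; addition is associative,
    # so partial counts can be combined in any grouping.
    return left + right

# Hypothetical input; in a real system each document could live on a different machine.
docs = ["the quick brown fox", "the lazy dog", "the fox"]

totals = reduce(reduce_phase, map(map_phase, docs), Counter())
print(totals.most_common(2))
```

The point is only that each `map_phase` call is independent and the merge is associative, which is what lets the work be spread across machines.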