Thomas Piketty and Spreadsheets
Like Carmen Reinhart and Kenneth Rogoff before him, Thomas Piketty has had questions raised about his analysis; in his case, his work on wealth inequality. Though I can't knowledgeably comment on the questions or the analysis, I can comment on the technology that Ms. Reinhart, Mr. Rogoff and Mr. Piketty chose to do their work: the spreadsheet. This choice can increase the chances of error in complex analysis, but it also can make finding errors by nonexperts easier.
Roughly speaking, one can think of economics analysis as taking one of two forms: It's either descriptive or multivariate. Descriptive work is simple, which is not a criticism, because it can also be correct and powerful. It's basically what can be easily illustrated and understood with a chart. Multivariate work can be very complex, though no less powerful to those who can understand it. As the name suggests, it's an analysis that involves many variables simultaneously. And just because it's complex, that doesn't make it right. But there is a right way to do it, and it's not with a spreadsheet.
The process of going from original data to the conclusions of a multivariate analysis is not easily conveyed graphically. It is, instead, essentially algorithmic. That is, conclusions are reached by starting with data and then applying a sequence of steps to arrive at answers to questions of interest. These steps can be and should be written down clearly and unambiguously and, for a computer to follow them, they must be. If this sounds like computer programming, it is. Modern, applied social science relies heavily on programming. It should.
But it canít with a spreadsheet (like Excel), because a spreadsheet isnít primarily designed to be used that way. Its strength is that it makes visualization and manipulation of numbers easy to do with little training. Itís sort of a glorified standard calculator ó the kind you undoubtedly have at home and use to balance your checkbook. This is also its weakness, because its simplicity has a cost: spreadsheets hide the details. They donít make the sequence of steps in any analysis as transparent as they could be. Theyíre there, but theyíre not front-and-center. This makes discerning what they are difficult and invites error.
Try this puzzle: With a standard calculator, I started with the number 6, did some analysis to answer a specific question, and ended up with the number 28 as my result. What sequence of steps did I take to get there? If you think you know what they are, you're almost certainly wrong. There are infinitely many ways, and a standard calculator doesn't reveal which one I used. To be sure, the steps exist. But they're in my head, and you'd have to do more work (like interview me) to discover them. I'm also likely to forget them. This might seem unimportant, because I have the answer: 28. But how do you or I know it is the correct one? The best way to convince ourselves of that is to look at the sequence of steps and check that they make sense. But we can't do that easily. They're hidden from view.
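To make the puzzle concrete, here is a small sketch in Python. The three step sequences are invented for illustration (they are not the ones the puzzle has in mind, which it never reveals), and the function names path_a, path_b and path_c are hypothetical. Each one turns 6 into 28, so the answer alone can't tell you which was used.

```python
# Three hypothetical step sequences, invented for illustration.
# Comments show intermediate values for a starting number of 6.

def path_a(x):
    x = x * 5   # 30
    x = x - 2   # 28
    return x

def path_b(x):
    x = x * x   # 36
    x = x - 8   # 28
    return x

def path_c(x):
    x = x + 1   # 7
    x = x * 4   # 28
    return x

# All three reach 28 from 6; the result by itself hides which steps were taken.
for path in (path_a, path_b, path_c):
    print(path.__name__, path(6))
```

Once the steps are written out, they can be checked, and it becomes clear they are not interchangeable: starting from 10 instead of 6, the three sequences give 48, 92 and 44.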
A spreadsheet is only slightly better than this at revealing the process of analysis. You can make it out, but barely. You have to really work at it. That not only makes it hard for others to assess what one does to data, it makes it hard for even the creator of that spreadsheet to keep track of what he or she has done and to see and fix errors.
For complex analysis, what social scientists usually do instead is write analysis steps in a statistical programming language, of which there are many. Such a program is like a recipe, one anyone familiar with the language can read. It says precisely how you go from raw ingredients (the data) to final product (the answer). Moreover, one can annotate such programs with plain-language descriptions of each step, making them even easier to understand. Analysis written out this way makes plain what has been done and why, so errors are far easier to find and fix than they would be in a spreadsheet.
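What such an annotated recipe looks like can be sketched in a few lines. The example below uses Python purely for illustration (the article doesn't name a language). The household wealth figures are made up, and the function top_share is a hypothetical helper; this is not Mr. Piketty's data or method, only a demonstration of steps written down in order, each with a plain-language comment.

```python
# Hypothetical household wealth figures (invented for illustration, not real data).
wealth = [5, 12, 20, 35, 50, 80, 120, 200, 400, 1078]

def top_share(values, top_fraction):
    """Share of total wealth held by the richest `top_fraction` of households."""
    # Step 1: sort households from richest to poorest so the top group comes first.
    ranked = sorted(values, reverse=True)
    # Step 2: count how many households are in the top group (at least one).
    n_top = max(1, round(len(ranked) * top_fraction))
    # Step 3: divide the top group's wealth by everyone's wealth.
    return sum(ranked[:n_top]) / sum(ranked)

print(f"Top 10% share: {top_share(wealth, 0.10):.1%}")  # prints: Top 10% share: 53.9%
```

Anyone who reads the three commented steps can check each one, or change one (say, the definition of the top group) and see exactly how the answer moves.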
But Mr. Piketty's work is not complex and multivariate. It's fairly simple. And for that, a spreadsheet is a reasonable choice. Moreover, because advanced training is not required to examine a spreadsheet, by working in one, and sharing it, Mr. Piketty made it possible for more people to check his work. That's praiseworthy.
If the allegations hold up, Mr. Piketty may have made some errors in his spreadsheet. But the choice of that tool is not to blame for them. Were his work more complex, he'd likely have been better off using a statistical programming language. But it isn't, and a spreadsheet is just fine.