5/11/2012

How dearly we love the fake data of the ACW

One of the remarkable features of today's culture is the desire to model truth and in fact to substitute models for truth. (The postmodern thinker who explored this strange development was Jean Baudrillard.)

In the run-up to the last census, there was a drumbeat sounded for abandoning the count in favor of modeling. This would have substituted assumptions, projections, extrapolations, and inferences for their "equivalent," hard data. It would have asked an agency that has great trouble forecasting population trends to estimate these post facto.

The climategate emails, by the way, are well worth reading to witness "data equivalents" becoming data in the new sense of non-data data.

No matter how many economies are destroyed by financial modeling, no matter how discredited climatology becomes through publicizing its modeling, no matter how many armies are defeated by their own flawed threat modeling, our lust for and belief in models proceeds and even takes on strange new shapes.

People used to believe in counting (as well as touching, hearing, seeing, and tasting). This is how the census came about - out of just such a cultural bias. I was young and in the Army at the tail end of this preference. We had to have body counts. They were important. We measured unit readiness by counting operative vs. inoperative systems, by counting men present for duty. And given how corrupt my generation was and is, we faked the numbers we needed: the body counts, the readiness data, and more, down to the present day.

The ethics of counting became an exercise in fudging and faking. With this widely known, a new attitude developed: "why bother?" Enter the estimates and models. We have a Defense Department that has not been audited in living memory - what would be the point? We have Civil War authors who make love to zeros, tacking on a minimum of five to any number they need, and rounding counts off to make them "easier" for the reader to absorb.

There was a splash in Civil War news recently when Professor J. David Hacker (shown above) proposed to revise the number of war dead upward from 618,222 to 750,000-850,000.

Look at the form of these numbers and you see my point. There is a painstaking precision in 618,222 which many today would be content to exchange for a range of goose eggs which in their minds would be more "accurate." That is a remarkable thing, think of it: an educated person with a basketful of eight goose eggs imagines himself closer to the truth than when he had 618,222 in hand. Astonishing, really.

In approaching the truth, from the inside out, we have:
counting > estimating > modeling.

In Civil War analysis this would equate to
Fox > Livermore > Hacker.

Hacker's model does not seem complete, at least as presented here, witness the zeroes and the use of a big range. The centerpiece of his analysis is the disparity between the 1860 census and 1870 census. Is death in war the only possible reason for the gap? No, and this may be why Hacker uses a range of numbers.

Hacker also mentions that one can find errors in the 1860 data through careful review at the micro level but this discovery has apparently not led him to project an adjusted, modeled total number in lieu of a range. This is why his model seems incomplete.

When Fox tabulated his losses, he counted and his counts included adjustments. An adjustment might involve enhancing a number with additional data from a second source or reconciling two sources, say a muster roll and a hospital discharge list.

Livermore took Fox's sources (and more) and made estimates. He might take the hard data of Confederate soldiers killed and apply to it the Union's proven 1.5 casualty ratio, dead to wounded, to arrive at a CSA wounded total. Here we are getting on thin ice and we must remain conscious that we are handling estimates and we must understand how the estimates are derived. Time and again, authors (like James McPherson) will use Livermore's numbers not knowing (or caring?) that they are handling estimates.

Hacker represents the modern way. Subtract one census result from the other, add assumptions, publish a range of estimated figures. Voila, data point!

I'm exaggerating for effect. Hacker speaks of Fox and Livermore with a precision I have not seen in Civil War literature. He knows what they were doing and how they were doing it. What seems so astonishing to me is the cultural component, that he could look at their work and substitute an incomplete model for their approaches to the truth.

The way to have handled Hacker's discovery would have been to say that there is X difference in the cohort count and that this may have a lot to do with uncounted ACW mortality, although we don't know. An author would then revisit the sources to understand how an error of great magnitude could have occurred unnoticed. This research would either open up or close off the line of inquiry, "Proposed: the 1870 census discloses a higher Civil War death toll."

The historical problem is a cohort discrepancy; that is all it is. The problem then went forth in search of a solution and came up with a satisficing one: ACW casualty totals.

Feynman has the last word here:
I have the advantage of having found out how hard it is to get to really know something, how careful you have to be about checking the experiment, how easy it is to make mistakes and fool yourself. I know what it means to know something, and therefore I see how they get their information and I can’t believe they know it, they haven’t done the work necessary, haven’t done the checks necessary, haven’t done the care necessary.

(Take a look at Fox's work here, Livermore's here, and Hacker's here.)