Monday, June 9, 2008

"Objective" numerical formula evaluation is not necessarily objective, and is often grossly inaccurate

With regard to Paul Krugman's June 8th, 2008 post, "Column inches":

This is just another nasty result of the "objective" fever that has swept so much of society, including academia: you evaluate quality and results with "objective" measures, like column inches, or, in academia, the number of publications, or a point scheme such as 5 points for a publication in a top-tier journal, 2 points for a publication in the next tier, and 1 for the third tier.

So an Einstein-like publication gets the same 5 points as a merely very good publication in a top-tier journal. Under such an "objective" system, a professor who works for five years and produces one Einstein-like publication would be denied tenure (essentially getting fired), while a colleague who produced one merely very good publication and two fairly minor ones would get tenure, even though he had contributed far less to the field and to society.
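The problem with this kind of point scheme is easy to make concrete. Here is a minimal Python sketch of the tier-point formula described above; the 5/2/1 weights come from the text, while the two candidate records are hypothetical examples:

```python
# Journal tier -> points per publication (weights from the text above).
TIER_POINTS = {1: 5, 2: 2, 3: 1}

def tenure_score(publications):
    """Score a record as a flat sum of tier points.

    Note what never enters the calculation: the quality or importance
    of any individual paper within its tier.
    """
    return sum(TIER_POINTS[tier] for tier in publications)

# Hypothetical records:
# one Einstein-like paper in a top-tier journal...
einstein_like = [1]
# ...versus one very good top-tier paper plus two fairly minor ones.
solid_but_minor = [1, 3, 3]

print(tenure_score(einstein_like))    # 5
print(tenure_score(solid_but_minor))  # 7 -- the weaker record "wins"
```

The formula can only count; it cannot weigh, so the record with more entries beats the record with the greater contribution.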

Moreover, the formula doesn't even consider how much skill and competency the professor has, or her expected production over the next 30+ years. A professor who spent years developing deep intuition for advanced continuous-time modeling might have had little time left over to publish in those five years, but that understanding is extremely valuable to future contributions to the field and society. It's not in the simple "objective" publications formula, though, and it's not even possible to put it accurately into any simple "objective" formula. And even a formula that's a page long, or a computer program of many pages (try entering important, but not simple, information into a computer), is simple relative to the advanced, ultra-high-dimensional, flexible thinking of logical human brains, which use formulae and data only as an aid.

This "objective" fever's simple-mindedness is often very inefficient and harmful. For example, it has caused great harm to the effectiveness of the CIA. Former CIA spy Lindsay Moran wrote in a 2005 New York Times op-ed:

Simply put, the directorate of operations needs to clean up its own act before it can recruit and, more important, retain quality employees.

Part of the problem is that the agency's culture rewards quantity over quality. Career advancement depends on the number of foreigners an officer is able to recruit, rather than the quality of information derived from them.

What if an operations officer made only one recruitment during the course of his career - but that foreign agent were, say, part of Osama bin Laden's inner circle? That would be an enormous benefit to the nation. But the years required to make such a plum recruitment would render this officer's career stagnant. Conversely, a clandestine officer who recruits a dozen potentially useless foreign assets like truck drivers and falafel-stand owners with no real information but a lot of opinions is likely to have a successful career.

It's hard to say exactly how this "objective" fever developed. Part of it is that society has become so rushed, and simple-minded formulas, although they can be highly inaccurate gauges of what we truly want to measure, are quick and take no thinking. Part is an increase in litigation: you are usually okay if you say you didn't discriminate because you applied the same simple "objective" formula to everyone. And another part is the explosion of data in recent years and advances in computing, which have allowed easy and quick access to lots of numbers that can be plugged into simple "objective" formulae and criteria.

It's important to note that just because you use some simple-minded "objective" numerical formula for evaluation doesn't mean you're being objective. There's always the possibility of a great deal of subjectivity in your choice of which numbers and which formula to use. If I base my grades on the "objective" formula that the tallest student gets the highest grade, the next tallest student gets the next highest grade, and so on, is that really objective? Is that really fair? Is that really smart? Just because it's based on "objective" numerical measures and applied equally to all doesn't mean it's an accurate reflection of merit, fair, efficient, or truly objective.


In my first term at the University of Michigan MBA program, I turned in what I thought was an outstanding paper, but I received only 85%. When I asked the professor why, he showed me the simple formulaic grading technique he had a teaching assistant follow. It was something like this: 40 points for mentioning core competency, 20 for mentioning alliances, 10 for mentioning diversification, etc. I had what the professor agreed was one of the best discussions of alliances he had ever seen from a student, but I didn't mention a few of the minor points on the laundry list and so lost the 15 points assigned to them. It's not that I didn't understand those points; they were just fairly obvious and less important, so I left them out and focused on the more important issues in depth.

I further found out that his teaching assistant wasn't discriminating based on quality of understanding at all. If a student said one simple sentence on core competency, he got the same 40 points as a student who gave Einstein-like ideas and insights on it. It was just another check on the list. So the Einstein-like student who didn't mention a few minor points could get an 82, while a student who understood everything poorly but wrote on a long laundry list of things would get 100 -- looking at the grades, employers would think he understood business better!

I, in fact, just gamed the system after that, using what I called the "Kitchen Sink" approach -- throw everything into the essay but the kitchen sink, and you're sure to get every check on the list. And the simple formula the student graders were told to follow didn't take off any points for mentioning something not on the list -- even something like, "I think they should spend $1 billion on alchemy research" -- so I almost always ended up with 100% (it would have been always, but sometimes I just put in very little time for these things).
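The checklist scheme and the "Kitchen Sink" exploit can be sketched in a few lines of Python. The topic names and point values here are illustrative, loosely following the text (the points shown don't sum to 100; the rest of the professor's list is elided):

```python
# Illustrative checklist, loosely following the text: full points for
# merely mentioning a topic, regardless of depth of insight.
CHECKLIST = {
    "core competency": 40,
    "alliances": 20,
    "diversification": 10,
    "outsourcing": 7,
    "antitrust": 5,
}

def grade(essay_topics):
    """Sum points for every checklist topic mentioned.

    Depth of understanding is never measured, and off-list content
    (even alchemy research) costs nothing.
    """
    return sum(points for topic, points in CHECKLIST.items()
               if topic in essay_topics)

# A deep essay on the major topics that skips two minor points:
deep_essay = {"core competency", "alliances", "diversification"}
# A shallow "Kitchen Sink" essay that name-drops everything:
kitchen_sink = set(CHECKLIST)

print(grade(deep_essay))    # 70 of the 82 points listed here
print(grade(kitchen_sink))  # 82 -- every box checked
```

A one-sentence mention and an Einstein-like discussion pass through `grade` identically, which is exactly why breadth of mention beats depth of insight under such a formula.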

When I first protested this horribly inaccurate grading system (if you want the grades to reflect competency), the professor said it was "objective": it was applied equally to all students. And you could see why almost all professors used it. They were super busy at a top university, where they were evaluated almost exclusively on publications, and this system was simple enough that an undergraduate could do all of the grading for them. Furthermore, they could easily handle complaints by saying it was an "objective" formula applied equally to everyone. They didn't have to spend a lot of time and effort explaining problems with the quality of your arguments; they could just say, "Didn't mention outsourcing, negative 7 points; didn't mention antitrust, negative 5 points; that's why you got 88."

So, even at a top university, they predominantly used highly inaccurate simple "objective" evaluation. But again, is it really objective? Is that really fair? Is it really accurate for what we actually want to measure? Just because it's based on "objective" numerical measures and applied equally to all doesn't mean it's an accurate reflection of merit, fair, efficient, or truly objective.
