The Tyranny of Metrics Flashcards

Jerry Muller

1
Q

The series, Bodies, written by Jed Mercurio, a former hospital physician, takes place in the obstetrics and gynecology ward of a metropolitan hospital. In the first episode, a newly arrived senior surgeon performs an operation on a patient with complex comorbidities, after which she dies. His rival then provides him with this advice: “The superior surgeon uses his superior judgment to steer clear of any situation that might test his superior ability.”

A

Bodies is a medical drama, but the phenomena it depicts exist in the real world. Numerous studies have shown that when surgeons, for example, are rated or remunerated according to their success rates, some respond by refusing to operate on patients with more complex or critical conditions. Excluding the more difficult cases—those that involve the likelihood of poorer outcomes—improves the surgeons’ success rates, and hence their metrics, their reputation, and their remuneration. That of course comes at the expense of the excluded patients, who pay with their lives. But those deaths do not show up in the metrics.

2
Q

Gaming the metrics occurs in every realm: in policing; in primary, secondary, and higher education; in medicine; in nonprofit organizations; and, of course, in business. And gaming is only one class of problems that inevitably arise when using performance metrics as the basis of reward or sanction. There are things that can be measured. There are things that are worth measuring. But what can be measured is not always what is worth measuring; what gets measured may have no relationship to what we really want to know. The costs of measuring may be greater than the benefits. The things that get measured may draw effort away from the things we really care about. And measurement may provide us with distorted knowledge—knowledge that seems solid but is actually deceptive.

A
3
Q

Starting in the late 1990s, Gallup conducted a multiyear examination of high-performing teams that eventually involved more than 1.4 million employees, 50,000 teams, and 192 organizations. Gallup asked both high- and lower-performing teams questions on numerous subjects, from mission and purpose to pay and career opportunities, and isolated the questions on which the high-performing teams strongly agreed and the rest did not. It found at the beginning of the study that almost all the variation between high- and lower-performing teams was explained by a very small group of items. The most powerful one proved to be “At work, I have the opportunity to do what I do best every day.” Business units whose employees chose “strongly agree” for this item were 44% more likely to earn high customer satisfaction scores, 50% more likely to have low employee turnover, and 38% more likely to be productive.

A

Taken from an HBR Article on performance management
We set out to see whether those results held at Deloitte. First we identified 60 high-performing teams, which involved 1,287 employees and represented all parts of the organization. For the control group, we chose a representative sample of 1,954 employees. To measure the conditions within a team, we employed a six-item survey. When the results were in and tallied, three items correlated best with high performance for a team: “My coworkers are committed to doing quality work,” “The mission of our company inspires me,” and “I have the chance to use my strengths every day.” Of these, the third was the most powerful across the organization.

4
Q

At the end of every project (or once every quarter for long-term projects) we will ask team leaders to respond to four future-focused statements about each team member. We’ve refined the wording of these statements through successive tests, and we know that at Deloitte they clearly highlight differences among individuals and reliably measure performance. Here are the four:

  1. Given what I know of this person’s performance, and if it were my money, I would award this person the highest possible compensation increase and bonus [measures overall performance and unique value to the organization on a five-point scale from “strongly agree” to “strongly disagree”].
  2. Given what I know of this person’s performance, I would always want him or her on my team [measures ability to work well with others on the same five-point scale].
  3. This person is at risk for low performance [identifies problems that might harm the customer or the team on a yes-or-no basis].
  4. This person is ready for promotion today [measures potential on a yes-or-no basis].
A

Taken from an HBR Article on performance management

We ask leaders what they’d do with their team members, not what they think of them.

5
Q

Research into the practices of the best team leaders reveals that they conduct regular check-ins with each team member about near-term work. These brief conversations allow leaders to set expectations for the upcoming week, review priorities, comment on recent work, and provide course correction, coaching, or important new information. The conversations provide clarity regarding what is expected of each team member and why, what great work looks like, and how each can do his or her best work in the upcoming days—in other words, exactly the trinity of purpose, expectations, and strengths that characterizes our best teams.

A

Taken from an HBR Article on performance management
Our design calls for every team leader to check in with each team member once a week. For us, these check-ins are not in addition to the work of a team leader; they are the work of a team leader. If a leader checks in less often than once a week, the team member’s priorities may become vague and aspirational, and the leader can’t be as helpful—and the conversation will shift from coaching for near-term work to giving feedback about past performance. In other words, the content of these conversations will be a direct outcome of their frequency: If you want people to talk about how to do their best work in the near future, they need to talk often. And so far we have found in our testing a direct and measurable correlation between the frequency of these conversations and the engagement of team members. Very frequent check-ins (we might say radically frequent check-ins) are a team leader’s killer app.

6
Q

A key premise of metric fixation concerns the relationship between measurement and improvement. There is a dictum (wrongly) attributed to the great nineteenth-century physicist Lord Kelvin: “If you cannot measure it, you cannot improve it.” In 1986 the American management guru Tom Peters embraced the motto, “What gets measured gets done,” which became a cornerstone belief of metrics. In time, some drew the conclusion that “anything that can be measured can be improved.”

A
7
Q

The key components of metric fixation are:

  1. The belief that it is possible and desirable to replace judgment, acquired by personal experience and talent, with numerical indicators of comparative performance based upon standardized data (metrics);
  2. The belief that making such metrics public (transparent) assures that institutions are actually carrying out their purposes (accountability);
  3. The belief that the best way to motivate people within these organizations is by attaching rewards and penalties to their measured performance, rewards that are either monetary (pay-for-performance) or reputational (rankings).

A
8
Q

Metric fixation is the persistence of these beliefs despite their unintended negative consequences when they are put into practice. It occurs because not everything that is important is measurable, and much that is measurable is unimportant. (Or, in the words of a familiar dictum, “Not everything that can be counted counts, and not everything that counts can be counted.”) Most organizations have multiple purposes, and that which is measured and rewarded tends to become the focus of attention, at the expense of other essential goals. Similarly, many jobs have multiple facets, and measuring only a few aspects creates incentives to neglect the rest. When organizations committed to metrics wake up to this fact, they typically add more performance measures—which creates a cascade of data, data that becomes ever less useful, while gathering it sucks up more and more time and resources.

A
9
Q

Because the theory of motivation behind pay for measured performance is stunted, results are often at odds with expectations. The typical pattern of dysfunction was formulated in 1975 by two social scientists operating on opposite sides of the Atlantic, in what appears to have been a case of independent discovery. What has come to be called “Campbell’s Law,” named for the American social psychologist Donald T. Campbell, holds that “[t]he more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”

A
10
Q

Trying to force people to conform their work to preestablished numerical goals tends to stifle innovation and creativity—valuable qualities in most settings. And it almost inevitably leads to a valuation of short-term goals over long-term purposes.

A
11
Q

Some Flaws:

Measuring the most easily measurable. There is a natural human tendency to try to simplify problems by focusing on the most easily measurable elements. But what is most easily measured is rarely what is most important, indeed sometimes not important at all. That is the first source of metric dysfunction.

Measuring inputs rather than outcomes. It is often easier to measure the amount spent or the resources injected into a project than the results of the efforts. So organizations measure what they’ve spent, rather than what they produce, or they measure process rather than product.
An example: Matthew Arnold frequently found himself inspecting schools where students ingested mountains of facts and arithmetic, but were bereft of analytic ability and utterly incapable of understanding sophisticated prose or poetry. They were taught not to reason but to cram. Both before and especially after the adoption of “payment for performance,” he criticized such education for being “far too little formative and humanizing . . . much in it, which its administrators point to as valuable results, is in truth mere machinery.”

A
12
Q

Degrading information quality through standardization. Quantification is seductive, because it organizes and simplifies knowledge. It offers numerical information that allows for easy comparison among people and institutions. But that simplification may lead to distortion, since making things comparable often means that they are stripped of their context, history, and meaning. The result is that the information appears more certain and authoritative than is actually the case: the caveats, the ambiguities, and uncertainties are peeled away, and nothing does more to create the appearance of certain knowledge than expressing it in numerical form.

Gaming the metrics takes a variety of forms.

Gaming through creaming. This takes place when practitioners find simpler targets or prefer clients with less challenging circumstances, making it easier to reach the metric goal, but excluding cases where success is more difficult to achieve.

A

Improving numbers by lowering standards. One way of improving metric scores is by lowering the criteria for scoring. Thus, for example, graduation rates of high schools and colleges can be increased by lowering the standards for passing. Or airlines improve their on-time performance by increasing the scheduled flying time of their flights.

Improving numbers through omission or distortion of data. This strategy involves leaving out inconvenient instances, or classifying cases in a way that makes them disappear from the metrics. Police forces can “reduce” crime rates by booking felonies as misdemeanors, or by deciding not to book reported crimes at all.

Cheating. Outright cheating can take many forms. The higher the rewards and stakes, the higher the motivation to cheat.

13
Q

McNamara’s Pentagon was characterized by what the military strategist Edward Luttwak called “the wholesale substitution of civilian mathematical analysis for military expertise. The new breed of the ‘systems analysts’ introduced new standards of intellectual discipline and greatly improved bookkeeping methods, but also a trained incapacity to understand the most important aspects of military power, which happen to be non-measurable.” The various armed forces sought to maximize measurable “production”: the air force through the number of bombing sorties; artillery through the number of shells fired; infantry through body counts, reflecting statistical indices devised by McNamara and his associates in the Pentagon. But, as Luttwak writes, “In frontless war where there are no clear lines on the map to show victory and defeat, the only true measure of progress must be political and non-quantifiable: the impact on the enemy’s will to continue to fight.”
What could be precisely measured tended to overshadow what was really important.

A
14
Q

What about the things that cannot be measured?
Primary schools, for example, have their tasks of teaching reading, writing, and numeracy, and these perhaps could be monitored through standardized tests. But what about goals that are less measurable but no less important, such as instilling good behavior, inspiring a curiosity about the world, and fostering creative thought?

A
15
Q

As a number of contemporary critics have observed, the fixation on quantifiable goals so central to metric fixation—though often implemented by politicians and policymakers who proclaim their devotion to capitalism—replicates many of the intrinsic faults of the Soviet system. Just as Soviet bloc planners set output targets for each factory to produce, so do bureaucrats set measurable performance targets for schools, hospitals, police forces, and corporations. And just as Soviet managers responded by producing shoddy goods that met the numerical targets set by their overlords, so do schools, police forces, and businesses find ways of fulfilling quotas with shoddy goods of their own: by graduating pupils with minimal skills, or downgrading grand theft to misdemeanor-level petty larceny, or opening dummy accounts for bank clients.

A
16
Q

In the formulation of Alfie Kohn, a long-time critic of pay-for-performance, metrics “inhibits risk taking, an inevitable concomitant of exploration and creativity. We are less likely to take chances, to play with possibilities, and to follow hunches, which may, after all, not pay off.”

A
17
Q

“To demand or preach mechanical precision, even in principle, in a field incapable of it is to be blind and to mislead others,” as the British liberal philosopher Isaiah Berlin noted in an essay on political judgment. Indeed what Berlin says of political judgment applies more broadly: judgment is a sort of skill at grasping the unique particularities of a situation, and it entails a talent for synthesis rather than analysis, “a capacity for taking in the total pattern of a human situation, of the way in which things hang together.” A feel for the whole and a sense for the unique are precisely what numerical metrics cannot supply.

A
18
Q

Reward for measured performance in higher education is touted by its boosters as making universities “more like a business.” But businesses have a built-in restraint on devoting too much time and money to measurement—at some point, it cuts into profits. Ironically, since universities and other nonprofit institutions have no such bottom line, government or accrediting agencies or the university’s administrative leadership can extend metrics endlessly. The effect is to increase costs or to divert spending from the doers to the administrators—which usually suits the latter just fine. It is hard to find a university where the ratio of administrators to professors and of administrators to students has not risen astronomically in recent decades.

A
19
Q

A capitalist society depends for its flourishing on a variety of institutions that provide a counterweight to the market, with its focus on monetary gain. To prepare pupils and university students for their roles as citizens, as friends, as spouses, and above all to equip them for a life of intellectual richness—those are among the proper roles of college. Conveying marketable skills is a proper role as well. But to subordinate higher education entirely to the capacity for future earnings is to measure with a very crooked yardstick.

A
20
Q

Unintended Consequences
Instruction in math and English is narrowly focused on the sorts of skills required by the test, rather than broader cognitive processes: that is, students too often learn test-taking strategies rather than substantive knowledge. As depicted in the HBO series The Wire, a great deal of class time is devoted to practicing for tests—hardly a source of stimulation for pupils. Because students in English are taught to answer multiple-choice and short-answer questions based on brief passages, the students are worse at reading extended texts and writing extended essays—much as Matthew Arnold had predicted a century and a half earlier.

A

Tests of performance are designed to evaluate the knowledge and ability that students have acquired in their general education. When that education becomes focused instead on developing the students’ performance on the tests, the test no longer measures what it was created to evaluate.

21
Q

Of course, the scores on English and math achievement tests cannot measure the full benefits of K-12 education. That is not because the NAEP scores are distorted or insignificant. They do provide a useful measure of student knowledge of the subjects tested. But there is much more to school than the learning of English and mathematics: not only other academic subjects but also the stimulation of interest in the world, and the cultivation of habits of behavior (self-control, perseverance, ability to cooperate with others) that increase the likelihood of success in the adult world. Development of these noncognitive qualities may well be going on in classrooms and schools without being reflected in performance metrics based on test scores.

A
22
Q

Nowhere are metrics in greater vogue than in the field of medicine. Nowhere, perhaps, are they more promising. And the stakes are high. But here too, metrics play a variety of roles—some genuinely useful, some of more dubious worth. One role is informational and diagnostic: the process of keeping track of various methods and procedures, and then comparing the outcomes, makes it possible to determine which are most successful. The successful methods and procedures can then be followed by others. Another is publicly reported metrics, intended to provide transparency to consumers, and a basis for comparison and competition among providers. Yet another is pay-for-performance, in which accountability is backed up with monetary rewards or penalties. Advocates of the use of metrics in medicine often discuss these very different roles in the same breath. The great push in recent decades has been for metrics to be used not only to improve safety and effectiveness but also to contain costs.

A
23
Q

The Cleveland Clinic, Geisinger, and the Keystone project are frequently cited as proof of the efficacy of measuring performance, and with reason. Yet when we dig more deeply, we find that the metrics matter because of the way they are embedded into a larger institutional culture. Is the success of the Cleveland Clinic a function of the fact that the Clinic publishes its outcomes? Or is the Clinic eager to publicize its outcomes precisely because they are so impressive? In fact, the Cleveland Clinic was one of the world’s great medical institutions before the rise of performance metrics, and it maintains that standing in the age of performance metrics. But to conclude that there is a causal relationship between the Clinic’s quality and the publication of its performance metrics is to fall prey to the fallacy of post hoc ergo propter hoc. The success may have far more to do with local conditions—the ways in which the organizational culture of the Cleveland Clinic makes use of metrics—than with quality measurement per se.

A
24
Q

Metrics at Geisinger are effective because of the way in which they are embedded in a larger system. Crucially, the establishment of measurement criteria and the evaluation of performance are done by teams that include physicians as well as administrators. The metrics of performance, therefore, are neither imposed nor evaluated from above by administrators devoid of firsthand knowledge. They are based on collaboration and peer review. Geisinger also uses its metrics to continuously improve its performance in outpatient care for a variety of conditions. Here is how Glenn D. Steele, a physician who presided over the transformation of the Geisinger system as CEO, accounts for its successes: “Our new care pathways were effective because they were led by physicians, enabled by real-time data-based feedback, and primarily focused on improving the quality of patient care,” which “fundamentally motivated our physicians to change their behavior.” Crucial too was the fact that “the men and women who actually work in the service lines themselves chose which care processes to change. Involving them directly in decision making secured their buy-in and made success more likely.” What we can learn from the Geisinger example is the importance of having providers develop and monitor performance measures. The fact that the measures were in keeping with their own professional sense of mission was crucial.

A
25
Q

Peter Pronovost, who spearheaded the reduction of central line infections, believes that “The Keystone ICU project demonstrated the potential of voluntary efforts that rely on intrinsic motivation through peer norms and professionalism.” He’s not opposed to supplementing these appeals with public reporting and monetary incentives. But his own interpretation is that the improvement in medical outcomes was brought about primarily by “a shift in clinicians’ belief—by showing them that the rate of infection was not inevitable and could be controlled, in a way that appealed to their professional ethos as doctors and nurses.” However, the conclusion drawn by the U.S. government’s Centers for Medicare and Medicaid Services was to initiate public reporting of the infection rates in 2011, and a year later, to begin penalizing hospitals with higher infection rates by withholding reimbursements. That created a structure of incentives very different from the institutional successes we’ve examined so far, which relied more on intrinsic than extrinsic motivations.

A