Generalizability by Representativeness

TL;DR: Many psychological studies rely on reasoning by representativeness to argue that their studies capture the causes of important phenomena in the real world. This is fallacious, and psychologists should stop doing it.

In this post I’ll explain what the representativeness heuristic is, provide an example of a recent paper that reasons using it, and try and explain why this is bad.

This idea has been kicking around in the back of my mind for years, but only recently did a salient enough example pop up that I felt compelled to write this.

What is reasoning by representativeness?

(Note: You can skip this if you are already a Judgment and Decision-Making expert)

Maya Bar-Hillel, in her chapter “Studies of Representativeness” in the classic edited volume Judgment Under Uncertainty: Heuristics and Biases, describes this reasoning as follows:

Daniel Kahneman and Amos Tversky have proposed that when judging the probability of some uncertain event, people often resort to heuristics, or rules of thumb, which are less than perfectly correlated (if, indeed, at all) with the variables that actually determine the event’s probability. One such heuristic is representativeness, defined as a subjective judgment of the extent to which an event in question is “similar in essential properties to its parent population” or “reflects the salient features of the process by which it is generated” (Kahneman and Tversky, 1972b, p. 431, 3).

Ok, let’s make this a little more concrete. The idea here is that when you ask someone to assess the probability of an item belonging to a group, they think about the features of the item and the prototypical features of the group, and then compare them. To the extent that the item seems “representative” of the group – i.e., the more the item shares features with the group – the more likely someone will judge the item’s membership in the group to be.
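To make the comparison process vivid, here is a toy sketch of the heuristic in code. Everything in it – the overlap-score function, the feature sets, the prototypes – is made up for illustration (it echoes Kahneman and Tversky’s librarian/farmer example); it is not a model anyone has proposed in this exact form.

```python
def representativeness(item_features, prototype_features):
    # Toy similarity score: the fraction of the prototype's features
    # that the described item shares. The heuristic says judged
    # probability tracks this score -- note that base rates (how many
    # librarians vs. farmers exist) never enter the computation.
    return len(item_features & prototype_features) / len(prototype_features)

# Made-up feature sets for illustration
description = {"quiet", "loves_books", "wears_glasses"}
librarian_prototype = {"quiet", "loves_books", "organized"}
farmer_prototype = {"outdoorsy", "practical", "early_riser"}

print(representativeness(description, librarian_prototype))  # high overlap
print(representativeness(description, farmer_prototype))     # no overlap
```

The description overlaps heavily with the librarian prototype and not at all with the farmer prototype, so the heuristic says “probably a librarian” – even though farmers vastly outnumber librarians in the population.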

The classic illustration of how this reasoning can go astray is in the so-called ‘Linda problem’. This is the set-up:

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

Which is more probable?

  1. Linda is a bank teller.
  2. Linda is a bank teller and is active in the feminist movement.

The fallacy is that most people will say that Linda being a bank teller and active in the feminist movement is more likely than Linda being a bank teller. This violates the laws of probability – the probability of an event conjoined with another event cannot be greater than the probability of the event alone. That’s why this is called the “conjunction fallacy.” And yet, the description of Linda intuitively fits our concept of a feminist bank teller much better than our concept of a bank teller alone.
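The conjunction rule is easy to see by counting. The sketch below simulates a population with made-up trait probabilities (the 2% and 30% figures are arbitrary, chosen only for illustration): every feminist bank teller is, by definition, also a bank teller, so the conjunction can never be the larger – hence more probable – group.

```python
import random

random.seed(0)

# Simulate a population; the trait probabilities are made-up numbers.
population = [
    {"bank_teller": random.random() < 0.02,
     "feminist": random.random() < 0.30}
    for _ in range(100_000)
]

n_teller = sum(p["bank_teller"] for p in population)
n_both = sum(p["bank_teller"] and p["feminist"] for p in population)

# The feminist bank tellers are a subset of the bank tellers,
# so the count (and probability) of the conjunction cannot exceed
# the count of the single event.
print(n_both <= n_teller)  # always True, whatever the probabilities
```

No matter what probabilities you plug in, the inequality holds – which is exactly why the intuitive Linda judgment is a violation of probability theory rather than a matter of taste.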

To be clear, there’s a lot of debate about what the Linda problem shows – whether it is in fact a mistake that people make, or a function of wording, framing, conversational pragmatics, etc. Whether or not it’s a fallacy I think is not that interesting (others may disagree!). What’s more interesting is how well it illustrates the pull of intuition in cases like this – when I think about this problem it feels like I can perceive the intuitiveness of this heuristic. It feels to me like it’s more likely that Linda is a feminist bank teller!

Now what’s the problem with reasoning in this way? Bar-Hillel again, with the harmful consequences:

Although in some cases more probable events also appear more representative, and vice versa, reliance on the representativeness of an event as an indicator of its probability may introduce two kinds of systematic error into the judgment. First, it may give undue influence to variables that affect the representativeness of an event but not its probability. Second, it may reduce the importance of variables that are crucial to determining the event’s probability but are unrelated to the event’s representativeness.

In other words, reasoning this way, one might (1) be misled into thinking that the probability is higher than it actually is because some irrelevant features are shared between the item and the category, or (2) underweight how important other features of the item are to the probability that the item belongs to the category.

How do psychologists rely on representativeness in their reasoning?

Here’s how psychologists argue this way. Sometimes when experimental psychologists demonstrate an effect in their lab, they want to make a claim that this effect actually matters for real-world behaviors (i.e., it generalizes). Think about something like stereotype threat – why is it important? It’s important because psychologists believe that stereotype threat is a causal mechanism that underpins outcomes we see happening in the real world (e.g., a racial gap in test scores).

How do they make that argument? Well, the situation of the experiment is designed in such a way that it shares salient features with the real-world phenomenon – i.e., the lab situation is representative of the real world. The study explicitly tries to capture all the most important aspects of the situation in testing the hypothesis. In the case of stereotype threat, the situation of the experiment is nearly identical to that of the real world (test-taking), so as long as the researchers captured enough of the most important aspects of the real world, one could say the effect generalizes. Of course, in arguments about generalizability we often argue about whether some important detail truly was captured – e.g., with stereotype threat, people sometimes argue that the thing missing from the lab experiment is incentives/stakes: people won’t suffer these effects when the stakes are high, and thus lab studies can’t underpin whatever test gap we see in the real world.

Now, sometimes this kind of representativeness reasoning is probably ok — the situation in the lab can certainly be exactly the same as the real world. And oftentimes psychologists will run field studies to show that their mechanism is driving some real-world outcome. Stereotype threat does seem like a pretty good analog of real test-taking. The problem arises when the situation in the lab isn’t the same as the real world and yet psychologists rely on the public’s representativeness intuitions to convince people of the phenomenon’s importance.

Which brings me to a recent paper published in Science, maybe the most visible and prestigious scientific journal: “Prevalence-induced concept change in human judgment,” written by a group of perhaps the most prominent social psychologists currently working in our field (the group includes Dan Gilbert and Tim Wilson – this is like seeing Lennon/McCartney in album liner notes). The paper provides a number of beautiful demonstrations of an effect:

In a series of experiments, we show that people often respond to decreases in the prevalence of a stimulus by expanding their concept of it. When blue dots became rare, participants began to see purple dots as blue; when threatening faces became rare, participants began to see neutral faces as threatening; and when unethical requests became rare, participants began to see innocuous requests as unethical. This “prevalence-induced concept change” occurred even when participants were forewarned about it and even when they were instructed and paid to resist it.
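One way to get a feel for how such an effect could arise is a relative-judgment rule: classify each stimulus against the recent context rather than against a fixed standard. The sketch below is my own illustrative model with made-up blueness values, not the paper’s method or analysis; it just shows how a context-relative criterion expands the concept “blue” when blue dots become rare.

```python
import statistics

def classify(block, probe):
    # Relative-judgment rule (an assumption of this sketch): call the
    # probe "blue" if it is bluer than the average blueness of the
    # stimuli seen in the current block.
    criterion = statistics.mean(block)
    return probe > criterion

# Blueness on a 0-1 scale; values are made up for illustration.
blues_common = [0.1, 0.3, 0.5, 0.7, 0.9]  # blue dots (high values) common
blues_rare   = [0.1, 0.2, 0.3, 0.4, 0.9]  # blue dots now rare

borderline_purple = 0.45
print(classify(blues_common, borderline_purple))  # the dot is "not blue"
print(classify(blues_rare, borderline_purple))    # the same dot is now "blue"
```

When blue dots grow scarce, the block average drops, the criterion drifts down, and the very same borderline-purple dot crosses the line into “blue” – the concept has expanded without any change in the dot itself.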

The studies are well run, seem well-powered, and don’t in general seem to suffer from any internal validity issues. The problem is that the generalizability of this effect is argued for on the basis of representativeness:

These results may have sobering implications. Many organizations and institutions are dedicated to identifying and reducing the prevalence of social problems, from unethical research to unwarranted aggressions. But our studies suggest that even well-meaning agents may sometimes fail to recognize the success of their own efforts, simply because they view each new instance in the decreasingly problematic context that they themselves have brought about. Although modern societies have made extraordinary progress in solving a wide range of social problems, from poverty and illiteracy to violence and infant mortality (22, 23), the majority of people believe that the world is getting worse (24). The fact that concepts grow larger when their instances grow smaller may be one source of that pessimism.

This is clearly an instance of reasoning by representativeness. The studies do show that people’s judgments can shift within a study when the prevalence of a target decreases. But how do we know that this phenomenon has anything to do with these real-world cases? Certainly there are cases in the wider world where problems are gradually solved, and in those cases it can seem like people often become harsher judges as time goes on 1.

Per Bar-Hillel, reasoning this way about psychology studies can steer us wrong in two ways.

“First, it may give undue influence to variables that affect the representativeness of an event but not its probability.” On the basis of this study we might believe that the only relevant fact about our judgments of a social phenomenon’s prevalence is our own personal history or memory of that phenomenon. I might think that because I perceive that racism has become less prevalent, other people’s continued focus on it stems from a change in their standards, and really it’s not so bad. Is that actually true?

“Second, it may reduce the importance of variables that are crucial to determining the event’s probability but are unrelated to the event’s representativeness.” On the basis of this study we might ignore the fact that these judgments are part of social and historical processes as they play out. Whether I judge something to be racist stems from more than just repeated observations – it’s not just ‘in my head.’ There could be plenty of reasons why people continue to be sensitized to instances of injustice that are not simply a shifting standard – they were taught, they have personal experiences that make such behavior salient, a different bias (what if it’s availability? or an availability cascade, per Cass Sunstein?), or maybe the injustice is simply invisible to people of privilege who don’t perceive all the ways in which that injustice is manifest. There’s a lot going on here, and flattening all this nuance into a basic property of statistical reasoning might miss more than it captures.

I will say this – I appreciate their use of “may” to give themselves an out (they aren’t really making this strong claim!), but I worry that the potential implications of their work are what got this paper into Science in the first place. I would love to live in a future where, instead of telling stories about generalizability, we actually test them. And instead of the most prominent journal in science publishing such claims uncritically, it should demand more from researchers. Or maybe we can just eliminate “Discussion” sections from papers. That would suit me as well.

Notes:

  1. It’s striking to me that they don’t seem willing to name any actual social problems that might be a case of this – my guess is that if they did, they would have to confront all the ways in which their lab studies are not representative of those situations.
