There has been a lot of recent work on p-hacking (making things statistically significant by exploiting analysis degrees of freedom), which I think is good (it’s starting to make people aware of the scope of the problem facing social psychology and related fields); however, I think people are missing something fundamental.
As Tal Yarkoni recently pointed out (and as I pointed out in a previous blog post), the incentives in the academy are messed up. Success in funding, in getting a job, etc., all hinges on your ability to produce positive results. When your livelihood literally depends on getting a positive result, it’s very hard to avoid putting your thumb on that scale.
So the solutions proffered thus far involve things like “publishing your data” and other controls that purport to “solve” this problem. The deep problem with these can be illustrated with a hypothetical computer program called “the Fake-ulator” (I thought about actually writing this program–but I think the thought experiment is enough for now). Version 1 is just a beta, so it only works for Likert scales. The idea is simple enough: if we scour the literature for Likert scale data and effects, we quickly realize that simple random draws from a response distribution would be easy to spot. Humans have lots of idiosyncratic biases that lead to systematic patterns in response data like Likert scale data. So the authors of the Fake-ulator have built a random data generator whose output is statistically indistinguishable from real human response data! Better yet, you can input an effect size and generate beautiful (but not too beautiful) data that is statistically significant. You can even generate a fake file drawer, since many of these fake experiments will be “failures”–but since your fake effect is positive, the random fake experiments will find your effect on average. With a program like this, you could easily imagine someone faking all of their data in a way that no one would ever notice.
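To make the thought experiment concrete, here is a minimal sketch of what a Fake-ulator core might look like. Everything here is invented for illustration: the 7-point scale, the acquiescence shift, and the extreme-responding rate are made-up parameters standing in for the "biases mined from the literature" that the hypothetical authors would actually calibrate.

```python
import numpy as np

def fake_likert(n, effect_size=0.0, scale_points=7, seed=None):
    """Generate fake Likert responses that mimic common human response
    styles rather than drawing uniformly at random.

    effect_size shifts the latent mean (in SD units) for a "treatment"
    group. The bias parameters below are illustrative assumptions,
    not calibrated to any real dataset.
    """
    rng = np.random.default_rng(seed)
    # Latent attitude: a normal draw, shifted by the desired effect size.
    latent = rng.normal(loc=effect_size, scale=1.0, size=n)
    # Acquiescence bias: a mild systematic shift toward agreement.
    latent += 0.3
    # Extreme-response style: a minority of respondents stretch
    # toward the scale endpoints.
    extreme = rng.random(n) < 0.15
    latent[extreme] *= 1.8
    # Discretize onto the 1..scale_points response scale.
    midpoint = (scale_points + 1) / 2
    responses = np.rint(midpoint + latent * (scale_points - 1) / 4)
    return np.clip(responses, 1, scale_points).astype(int)

# "Run" one fake experiment: control vs. treatment with a modest effect.
control = fake_likert(100, effect_size=0.0, seed=1)
treatment = fake_likert(100, effect_size=0.5, seed=2)
print("control mean:", control.mean(), "treatment mean:", treatment.mean())
```

A real version would, as described above, fit these bias parameters to published response distributions; the point is just that nothing about this is hard, which is exactly the worry.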
Now what keeps me up at night is this: does this computer program already exist? Did we only catch the really dumb fakers who didn’t take the time to do it the right way? One objection might be that anyone smart enough to do this would just run the studies–I think this is wrong. Actually running the studies leaves things up to chance. If you really want a 6-figure tenure-track job at Harvard or Princeton, real data just won’t do!
The point of this is just to say that we need more than just clever statistics and safeguards–until we fundamentally change the incentives of science to reward process instead of outcome, we aren’t going to solve this problem. We are only going to make it much harder to determine if something is real or not. The adaptations are already upon us!
This link, which of course touches on many of the same themes as Chris Hayes’ Twilight of the Elites, points out that an increasingly metrics-focused way of weeding out potential candidates for some elite group leads to a narrowing of the backgrounds and viewpoints of that elite. This happens as applicants increasingly narrow their focus of study to optimize their chances of success (the gaokao in China is another modern-day example of this–there are many).
This connects up to a comment that Cosma Shalizi made regarding my previous post on SimGradSchool, objecting that “I like this a lot, but suspect the assumption of a unidimensional ability score misses a lot of why shit is fucked up and bullshit in the current academic job market.” I think I understand Cosma’s objection more broadly, and it connects directly to the notion of cognitive diversity.
If you read Scott Page’s terrific book on diversity, The Difference, he uses simulation to compellingly argue that the key to solving difficult problems is having a diversity of viewpoints drawn from a large pool of possible ways of thinking. Cosma and Henry Farrell have made a similar argument for the benefits of democracy–that the voting mechanism of democracy is the best way to aggregate preferences and solve complex coordination problems among agents.
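The flavor of Page's simulation argument can be sketched in a few lines. This is a toy version of the Hong–Page style setup, with every parameter (landscape size, heuristic step sizes, team size) invented for illustration: agents are greedy searchers on a random circular landscape, each defined by the step sizes it knows how to try, and a team searches by relay, each agent improving the best solution found so far.

```python
import random

def solo_search(landscape, heuristic, start):
    """Greedy search on a circular landscape: repeatedly try each step
    size in the agent's heuristic, moving whenever a step improves value."""
    n = len(landscape)
    pos = start
    improved = True
    while improved:
        improved = False
        for step in heuristic:
            nxt = (pos + step) % n
            if landscape[nxt] > landscape[pos]:
                pos = nxt
                improved = True
    return pos

def team_search(landscape, heuristics, start):
    """Relay search: agents take turns improving the shared solution
    until no agent can improve it further."""
    pos = start
    improved = True
    while improved:
        improved = False
        for h in heuristics:
            new = solo_search(landscape, h, pos)
            if landscape[new] > landscape[pos]:
                pos = new
                improved = True
    return pos

random.seed(0)
N = 200
landscape = [random.random() for _ in range(N)]

# A pool of agents, each knowing three distinct step sizes.
pool = [random.sample(range(1, 13), 3) for _ in range(50)]

def ability(h):
    """Average solo performance over all starting points."""
    return sum(landscape[solo_search(landscape, h, s)] for s in range(N)) / N

ranked = sorted(pool, key=ability, reverse=True)
best_team = ranked[:10]                  # the ten "best" individual agents
diverse_team = random.sample(pool, 10)   # a random, more varied team

start = 0
print("best-agents team:", landscape[team_search(landscape, best_team, start)])
print("diverse team:    ", landscape[team_search(landscape, diverse_team, start)])
```

In Page's actual model, run over many landscapes and starting points, the random diverse team tends to beat the team of top individual performers, because the top performers' heuristics overlap and they get stuck at the same local optima; a single run of this toy won't always show that, but the mechanism is the same.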
So, I think these arguments point to another, deeper problem for a unidimensional perspective on research ability. Discovery in science requires a diversity of viewpoints to make progress. If we require all the undergrads to come from the same background (e.g., research assistant at a top lab from the beginning of undergrad, poster presentations at relevant conferences, etc.), or all new faculty to fit one mold (come from these 10 schools, have 2 JPSP / Psych Science articles), we are going to end up with too narrow a pool of potential researchers. One of the unique strengths of my graduate program at CMU was that it took students from many different backgrounds (I basically did a psych/econ grad degree with 0 econ classes, 2 psych classes, and a philosophy/cs major). I think it definitely gave us a unique perspective. More broadly, I worry about whether a society focused on grades and test scores is going to quash the very creativity that has been so central to innovation. Imagine Steve Jobs trying to get a job in tech today as a dropout from Reed with some calligraphy coursework and no technical major–not happening.
Of course, the problem remains–what do you do with the flood of applicants? You still have a sorting problem. How do you select for cognitive diversity in the right way? This has become an increasingly large problem at tech companies, which are leaning on referrals even more than before. I have a few thoughts about this that I will share in an upcoming blog post.
Now the only problem is I probably took the wind out of Cosma’s sails and he won’t blog about me anymore 🙁