Psychology’s Reliability & Replication Issue Is a Science-wide Issue

There was a time when psychology was a coveted, though often jeered, academic major. Psychology professionals maintained the scientific integrity of the field even as professionals in biology, physics, and chemistry (often called the ‘harder sciences’) dubbed psychology a pseudoscience. Still, psychology persisted. The scientific method was taught as the means of collecting data. Undergraduate and graduate students alike enrolled in descriptive and inferential statistics courses to learn how to gather and interpret the kinds of data relevant to a given research study.

It’s been heavily reported by now that psychology is in a bit of a crisis. In The Reproducibility Project, a series of replication studies conducted from 2011 to 2014 and initiated by UVA professor Brian Nosek, only about one-third of 100 published psychological experiments could be successfully replicated. This is alarming for the whole psychology community: students, professors, researchers, and practitioners alike.


The Reproducibility Project: Psychology was launched in November 2011 by the group that would become The Center for Open Science (COS), which Nosek co-founded in 2013. COS describes itself as dedicated to “enabling open and reproducible research practices worldwide.” This brand of research integrity is great news for professionals and students in the psychology field. The project set out to replicate the results of 100 studies published in prestigious journals, including Psychological Science, the Journal of Personality & Social Psychology, and the Journal of Experimental Psychology: Learning, Memory, and Cognition. Ninety-seven of the original studies reported statistically significant results. Of the 100 studies the project attempted to replicate, only 35 could be replicated.
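The headline figures are simple arithmetic, sketched below in Python using only the counts quoted above. The function name `replication_rate` is made up for illustration. Dividing by all 100 studies, or by only the 97 whose original results were significant, gives roughly the same answer, which is where the "one-third" figure comes from.

```python
def replication_rate(replicated, attempted):
    """Fraction of attempted studies whose results were replicated."""
    return replicated / attempted


if __name__ == "__main__":
    replicated = 35            # successful replications, as quoted in the text
    all_studies = 100          # studies the project attempted to replicate
    significant_originals = 97 # originals that reported significant results

    # Rate against every study in the project.
    print(f"Of all studies: {replication_rate(replicated, all_studies):.0%}")
    # Rate against only the originals that claimed a significant effect.
    print(f"Of significant originals: "
          f"{replication_rate(replicated, significant_originals):.0%}")
```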

This matters because science is nothing without replication. Replication is how reliability is demonstrated. If someone else cannot produce the same results using the methods you outlined in your research, then your research is not viewed as reliable. For example, in a research study popularly known as The Marshmallow Study (1972), children of varying genders, races and socioeconomic statuses (SES) were presented with a marshmallow and told that if they waited long enough without eating it, they would receive two. The study and its follow-ups concluded that there was a strong correlation between children’s ability to wait and their later academic performance and development. Today, researchers struggle to replicate the study and arrive at the same conclusion.


What happened:

A 2018 study by Watts et al. found that the marshmallow study had several limitations, starting with sampling. Fewer than 100 children participated in the initial study, and the authors of the initial follow-up study admitted the sample was so small that different experimental conditions had to be combined. In some trials, for instance, children could not see the marshmallow, while in others the marshmallow was visible. Ultimately, Watts et al. concluded that delayed-gratification performance during the marshmallow trials correlated more strongly with environment and/or socioeconomic status (SES) than the original work had suggested. Further, future performance and success were also correlated with SES. These results differ from the original study, which heavily implied that the patience to wait for two marshmallows was an innate quality.
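To see why a sample of fewer than 100 children is a problem, here is a minimal Python sketch. It is not the analysis used in either marshmallow study. It assumes a made-up "true" correlation of 0.3 and normally distributed scores, and the function names (`pearson_r`, `simulate_study`, `spread_of_estimates`) are hypothetical. It runs many imaginary studies at several sample sizes and reports how much the estimated correlation swings from one study to the next.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_study(n, true_r, rng):
    """Run one imaginary study: draw n (x, y) pairs, return the sample r."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        # Mix x with independent noise so the population correlation is true_r.
        y = true_r * x + (1 - true_r ** 2) ** 0.5 * noise
        xs.append(x)
        ys.append(y)
    return pearson_r(xs, ys)


def spread_of_estimates(n, true_r=0.3, studies=1000, seed=0):
    """Return (lowest r, highest r, standard deviation of r) across studies."""
    rng = random.Random(seed)  # fixed seed so reruns give the same numbers
    estimates = [simulate_study(n, true_r, rng) for _ in range(studies)]
    return min(estimates), max(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    # Assumed sample sizes, chosen only to contrast small and large studies.
    for n in (30, 90, 900):
        low, high, sd = spread_of_estimates(n)
        print(f"n={n:4d}: r ranged from {low:+.2f} to {high:+.2f} (sd {sd:.3f})")
```

Under these assumptions, the smaller the sample, the wider the range of correlations that individual studies report, so two honest teams running the same small experiment can reach different conclusions.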

This isn’t the first time psychology’s reliability has been called into question. Publishing rates are also under scrutiny, given how quickly journals accept and publish new studies. Nosek himself joined the debate, admitting that some of his own submissions are published at an alarming rate, and that the rich people funding and benefiting from the publications have a vested interest in staying rich. So-called publication biases are wreaking havoc on other sciences as well, not only psychology.

brian nosek tweet about quick submission/acceptance rates
brian nosek tweet about rich people getting richer in academic publishing

What this means:

As for the original study, good old-fashioned racism and ethnic stereotypes contributed to biased hypotheses and conclusions. Walter Mischel, leader of the Stanford marshmallow experiment, originally ran the experiment in Trinidad, comparing students of African descent (described as negro) with students of Indian descent. Mischel concluded that delayed gratification was an intrinsic quality that many Indian children possessed and many African children did not. Emboldened by his findings, Mischel conducted the experiment again in the U.S. and reported that children of African descent lacked this intrinsic quality when compared to white children. This may sound silly, and it is. However, racism has been a driving force behind many academic publications still touted today as good science.

The good, or concerning, news is that psychology isn’t alone. A 2005 study of 49 medical research studies concluded that only 44% were replicated. Sixteen percent were contradicted by further research, and another 16% reported stronger results than subsequent studies found. The remaining 24% “remained largely unchallenged.” Then there are the biases of medical professionals and the racial disparities that persist. Studies show that doctors and medical residents literally think black people have thicker skin. These and other biases and racist ideologies are associated with black people and other people of color being prescribed less pain medication than white people. This is dangerous, as the very physicians we trust with our lives may hold implicit biases that affect the level of care they provide based on skin tone.


Looking ahead

Perhaps this is the kind of fire psychology and all fields of science need lit underneath them. There is real, reproducible science in these fields, with concrete, reliable methods and schools of thought. But dedicated professionals must look beyond capitalist gain and wade through research for genuine validity, reliability, and reproducibility. It’s also time for experimental and research scientists in all fields to stop being lazy. Studies with sample sizes too small for generalization, or with high attrition rates, are rampant. Studies whose samples include only one race, gender, or ability level also hurt generalization, because physical and mental symptoms, mood changes, and diagnoses may present differently in different people. This hurts credibility. Racism hurts credibility. Researchers need to check their own biases at the door and stop publishing articles in which biases and stereotypes are upheld above all. Poorly defined terms and poorly designed experiments also yield unreliable data. All of this makes replication difficult and, as The Reproducibility Project concluded, impossible more often than not.
