When a study can’t be replicated

Many factors can prevent scientists from repeating research and confirming results


Sometimes the findings of research that was done well can’t be replicated — confirmed by other scientists. The reasons may vary or never be fully understood, new studies find. 


In the world of science, the gold standard for accepting a finding is seeing it “replicated.” To achieve this, researchers must repeat a study and reach the same conclusion. Doing so helps confirm that the original finding wasn’t a fluke, a result due to chance.

Yet try as they might, many research teams cannot replicate, or match, an original study’s results. Sometimes that occurs because the original scientists faked the study. Indeed, a 2012 study looked at more than 2,000 published papers that had to be retracted — eventually labeled by the publisher as too untrustworthy to believe. Of these, more than 65 percent involved cases of misconduct, including fraud.

But even when research teams act honorably, their studies may still prove hard to replicate, a new study finds. Yet a second new analysis shows how important it is to try to replicate studies. It also shows what researchers can learn from the mistakes of others.

The first study focused on 100 human studies in the field of psychology. That field examines how animals or people respond to certain conditions and why. The second study looked at 38 research papers reporting possible explanations for global warming. The papers presented explanations for global warming that run contrary to those of the vast majority of the world’s climate scientists.

Both new studies set out to replicate the earlier ones. Both had great trouble doing so. Yet neither found evidence of fraud. These studies point to how challenging it can be to replicate research. Yet without that replication, the research community may find it hard to trust a study’s data or know how to interpret what those data mean.

Trying to make sense of the numbers

Brian Nosek led the first new study. He is a psychologist at the University of Virginia in Charlottesville. His research team recruited 270 scientists. Their mission: to reproduce the findings of 100 previously published studies. All of the studies had appeared in one of three major psychology journals in 2008. In the end, this group could replicate only 35 of the studies. The researchers described their efforts in the August 28 issue of Science.

Two types of findings proved hardest to confirm. The first were those that originally had been described as unexpected. The second were ones that had barely achieved statistical significance. That raises concerns, Nosek told Science News, about the common practice of publishing attention-grabbing results. Many of those types of findings appear to have come from data that had not been statistically strong. Such studies may have included too few individuals. Or they may have turned up only weak signs of an effect. There is a greater likelihood that such findings are the result of random chance.
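A small simulation can illustrate why findings from small or weak studies are harder to confirm. The sketch below is only an illustration, not the method Nosek’s team used. Every number in it is an assumption chosen for the example: a modest true effect of 0.3 standard deviations, groups of 20 or 200 people, and a simple one-sided test in the expected direction with a cutoff of 1.96. The function names are made up for this example.

```python
import random
import statistics


def study_is_significant(n, true_effect, rng):
    """Simulate one two-group study with n people per group.

    Returns True if the treated group scores "significantly" higher
    than the control group (z above 1.96, in the expected direction).
    """
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    diff = statistics.fmean(treated) - statistics.fmean(control)
    # Standard error of the difference when both groups have a known SD of 1
    std_err = (2.0 / n) ** 0.5
    return diff / std_err > 1.96


def replication_rate(n, true_effect, trials=2000, seed=1):
    """Among simulated original studies that found a significant effect,
    return the share whose same-sized repeat study also found one."""
    rng = random.Random(seed)
    originals = 0
    replicated = 0
    for _ in range(trials):
        if study_is_significant(n, true_effect, rng):
            originals += 1
            # Repeat the study with fresh participants
            if study_is_significant(n, true_effect, rng):
                replicated += 1
    return replicated / originals if originals else 0.0


if __name__ == "__main__":
    for group_size in (20, 200):
        rate = replication_rate(group_size, true_effect=0.3)
        print(f"{group_size} people per group: {rate:.0%} of findings replicated")
```

In this toy setup the effect is real in every simulated study. Even so, the small studies that happen to reach significance are confirmed far less often than the large ones. The only thing that differs is how many people each study includes.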

No one can say why the tests by Nosek’s team failed to confirm findings in 65 percent of their tries. It’s possible the initial studies were not done well. But even if they had been done well, conflicting conclusions raise doubts about the original findings. For instance, they may not be applicable to groups other than the ones initially tested.

Rasmus Benestad works at the Norwegian Meteorological Institute in Oslo. He led the second new study. It focused on climate research.

In climate science, some 97 percent of reports and scientists have come to a similar conclusion: that human activities, mostly the burning of fossil fuels, are a major driver of recent global warming. The 97 percent figure came from the United Nations’ Intergovernmental Panel on Climate Change. This is a group of researchers active in climate science. The group reviewed nearly 12,000 abstracts of published research findings. It also received some 1,200 ratings by climate scientists of what the published data and analyses had concluded about climate change. Nearly all came up with the same source: us.

But what about the other 3 percent? Was there something different about those studies? Or could there be something different about the scientists who felt that humans did not play a big role in global warming? That’s what this new study sought to probe. It took a close look at 38 of these “contrarian” papers.

Benestad’s team attempted to replicate the original analyses in these papers. In doing so, the team pored over the details of each study. Along the way, they identified several common problems. Many papers started with false assumptions, the new analysis says. Some used a faulty analysis. Others set up an improper hypothesis for testing. Still others used “incorrect statistics” for making their analyses, Benestad’s group reports. Several papers also set up a false either/or situation. They argued that if one thing influenced global warming, then another must not have. In fact, Benestad’s group noted, that logic was sometimes faulty. In many cases, both explanations for global warming might work together.

Mistakes or an incomplete understanding of previous work by others could lead to faulty assessments, Benestad’s team concluded. Its new analysis appeared August 20 in Theoretical and Applied Climatology.

What to make of this?

It might seem like it should be easy to copy a study and come up with similar findings. As the two new studies show, it’s not. And there can be a host of reasons why.

Some investigators have concluded that it may be next to impossible to redo a study exactly. This can be especially true when a study works with subjects or materials that vary greatly. Cells, animals and people all vary a great deal. Due to genetic or developmental differences, one cell or individual may respond differently to stimuli than another will. Stimuli might include foods, drugs, infectious germs or some other aspect of the environment.

Similarly, some studies involve conditions that are quite complicated. Examples can include the weather or how crowds of people behave. Consider climate studies. Computers are not yet big enough and fast enough to account for everything that affects climate, scientists note. Many of these factors will vary broadly over time and distance. So climate scientists choose to analyze the conditions that seem the most important. They may concentrate on those for which they have the best or the most data. If the next group of researchers uses a different set of data, their findings may not match the earlier ones.

Eventually, time and more data may show why the findings of an original study and a repeated one differ. One of the studies may be found weak or somewhat flawed. Perhaps both will be.

This points to what can make advancing science so challenging. “Science is never settled, and both the scientific consensus and alternative hypotheses should be subject to ongoing questioning,” Benestad’s group argues.

Researchers should try to prove or disprove even those things that have been considered common knowledge, they add. Resolving differences in an understanding of science and data is essential, they argue. That is true in climate science, psychology and every other field. After all, without a good understanding of science, they say, society won’t be able to make sound decisions on how to create a safer, healthier and more sustainable world.


Janet Raloff is the Editor, Digital of Science News Explores. Prior to this, she was an environmental reporter for Science News, specializing in toxicology. To her never-ending surprise, her daughter became a toxicologist.
