Open Access
Perspective
What is replication?
- Brian A. Nosek (Center for Open Science, Charlottesville, Virginia, United States of America; University of Virginia, Charlottesville, Virginia, United States of America)
- Timothy M. Errington (Center for Open Science, Charlottesville, Virginia, United States of America)
* E-mail: [email protected]
Published: March 27, 2020
https://doi.org/10.1371/journal.pbio.3000691
Credibility of scientific claims is established with evidence for their replicability using new data. According to common understanding, replication is repeating a study’s procedure and observing whether the prior finding recurs. This definition is intuitive, easy to apply, and incorrect. We propose that replication is a study for which any outcome would be considered diagnostic evidence about a claim from prior research. This definition reduces emphasis on operational characteristics of the study and increases emphasis on the interpretation of possible outcomes. The purpose of replication is to advance theory by confronting existing understanding with new evidence. Ironically, the value of replication may be strongest when existing understanding is weakest. Successful replication provides evidence of generalizability across the conditions that inevitably differ from the original study; unsuccessful replication indicates that the reliability of the finding may be more constrained than recognized previously. Defining replication as a confrontation of current theoretical expectations clarifies its important, exciting, and generative role in scientific progress.
Citation: Nosek BA, Errington TM (2020) What is replication? PLoS Biol 18(3): e3000691. https://doi.org/10.1371/journal.pbio.3000691
Copyright: © 2020 Nosek, Errington. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by grants from Arnold Ventures, John Templeton Foundation, Templeton World Charity Foundation, and Templeton Religion Trust. The funders had no role in the preparation of the manuscript or the decision to publish.
Competing interests: We have read the journal’s policy and the authors of this manuscript have the following competing interests: BAN and TME are employees of the Center for Open Science, a nonprofit technology and culture change organization with a mission to increase openness, integrity, and reproducibility of research.
Provenance: Commissioned; not externally peer reviewed.
Introduction
Credibility of scientific claims is established with evidence for their replicability using new data [ 1 ]. This is distinct from retesting a claim using the same analyses and same data (usually referred to as reproducibility or computational reproducibility ) and using the same data with different analyses (usually referred to as robustness ). Recent attempts to systematically replicate published claims indicate surprisingly low success rates. For example, across 6 recent replication efforts of 190 claims in the social and behavioral sciences, 90 (47%) replicated successfully according to each study’s primary success criterion [ 2 ]. Likewise, a large-sample review of 18 candidate gene or candidate gene-by-interaction hypotheses for depression found no support for any of them [ 3 ], a particularly stunning result considering that more than 1,000 articles have investigated their effects. Replication challenges have spawned initiatives to improve research rigor and transparency such as preregistration and open data, materials, and code [ 4 – 6 ]. Simultaneously, failures-to-replicate have spurred debate about the meaning of replication and its implications for research credibility. Replications are inevitably different from the original studies. How do we decide whether something is a replication? The answer shifts the conception of replication from a boring, uncreative, housekeeping activity to an exciting, generative, vital contributor to research progress.
Replication reconsidered
According to common understanding, replication is repeating a study’s procedure and observing whether the prior finding recurs [ 7 ]. This definition of replication is intuitive, easy to apply, and incorrect.
The problem is this definition’s emphasis on repetition of the technical methods—the procedure, protocol, or manipulated and measured events. Why is that a problem? Imagine an original behavioral study was conducted in the United States in English. What if the replication is to be done in the Philippines with a Tagalog-speaking sample? To be a replication, must the materials be administered in English? With no revisions for the cultural context? If minor changes are allowed, then what counts as minor to still qualify as repeating the procedure? More broadly, it is not possible to recreate an earthquake, a supernova, the Pleistocene, or an election. If replication requires repeating the manipulated or measured events of the study, then it is not possible to conduct replications in observational research or research on past events.
The repetition of the study procedures is an appealing definition of replication because it often corresponds to what researchers do when conducting a replication—i.e., faithfully follow the original methods and procedures as closely as possible. But the reason for doing so is not because repeating procedures defines replication. Replications often repeat procedures because theories are too vague and methods too poorly understood to productively conduct replications and advance theoretical understanding otherwise [ 8 ].
Prior commentators have drawn distinctions between types of replication such as “direct” versus “conceptual” replication and argue in favor of valuing one over the other (e.g., [ 9 , 10 ]). By contrast, we argue that distinctions between “direct” and “conceptual” are at least irrelevant and possibly counterproductive for understanding replication and its role in advancing knowledge. Procedural definitions of replication are masks for underdeveloped theoretical expectations, and “conceptual replications” as they are identified in practice often fail to meet the criteria we develop here and deem essential for a test to qualify as a replication.
Replication redux
We propose an alternative definition for replication that is more inclusive of all research and more relevant for the role of replication in advancing knowledge. Replication is a study for which any outcome would be considered diagnostic evidence about a claim from prior research. This definition reduces emphasis on operational characteristics of the study and increases emphasis on the interpretation of possible outcomes.
To be a replication, 2 things must be true: outcomes consistent with a prior claim would increase confidence in the claim, and outcomes inconsistent with a prior claim would decrease confidence in the claim. The symmetry promotes replication as a mechanism for confronting prior claims with new evidence. Therefore, declaring that a study is a replication is a theoretical commitment. Replication provides the opportunity to test whether existing theories, hypotheses, or models are able to predict outcomes that have not yet been observed. Successful replications increase confidence in those models; unsuccessful replications decrease confidence and spur theoretical innovation to improve or discard the model. This does not imply that the magnitude of belief change is symmetrical for “successes” and “failures.” Prior and existing evidence inform the extent to which replication outcomes alter beliefs. However, as a theoretical commitment, replication does imply precommitment to taking all outcomes seriously.
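As a toy illustration of this point (ours, not the authors'), confidence in a claim can be treated as a probability that is updated by any replication outcome; how far it moves depends on how diagnostic the study is. The likelihood values below are arbitrary assumptions chosen only to show that both outcomes shift confidence, though not necessarily by the same amount.

```python
# Toy sketch (illustrative assumptions, not the authors' model): confidence in a
# claim is updated by a replication outcome via Bayes' rule. A study qualifies
# as a replication when either outcome would move this probability.
def update(prior, p_success_if_true=0.8, p_success_if_false=0.1, success=True):
    """Posterior probability that the claim is true after one replication outcome."""
    if success:
        like_true, like_false = p_success_if_true, p_success_if_false
    else:
        like_true, like_false = 1 - p_success_if_true, 1 - p_success_if_false
    return prior * like_true / (prior * like_true + (1 - prior) * like_false)

prior = 0.70
print(update(prior, success=True))    # ~0.95: a success increases confidence
print(update(prior, success=False))   # ~0.34: a failure decreases confidence
# The two shifts need not be equal in size, but both outcomes are diagnostic.
```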
Because replication is defined based on theoretical expectations, not everyone will agree that one study is a replication of another. Moreover, it is not always possible to make precommitments to the diagnosticity of a study as a replication, often for the simple reason that study outcomes are already known. Deciding whether studies are replications after observing the outcomes can leverage post hoc reasoning biases to dismiss “failures” as nonreplications and “successes” as diagnostic tests of the claims, or the reverse if the observer wishes to discredit the claims. This can unproductively retard research progress by dismissing replication counterevidence. Simultaneously, replications can fail to meet their intended diagnostic aims because of error or malfunction in the procedure that is only identifiable after the fact. When there is uncertainty about the status of claims and the quality of methods, there is no easy solution to distinguishing between motivated and principled reasoning about evidence. Science’s most effective solution is to replicate, again.
At its best, science minimizes the impact of ideological commitments and reasoning biases by being an open, social enterprise. To achieve that, researchers should be rewarded for articulating their theories clearly and a priori so that they can be productively confronted with evidence [ 4 , 6 ]. Better theories are those that make it clear how they can be supported and challenged by replication. Repeated replication is often necessary to resolve confidence in a claim, and, invariably, researchers will have plenty to argue about even when replication and precommitment are normative practices.
Replication resolved
The purpose of replication is to advance theory by confronting existing understanding with new evidence. Ironically, the value of replication may be strongest when existing understanding is weakest. Theory advances in fits and starts with conceptual leaps, unexpected observations, and a patchwork of evidence. That is okay; it is fuzzy at the frontiers of knowledge. The dialogue between theory and evidence facilitates identification of contours, constraints, and expectations about the phenomena under study. Replicable evidence provides anchors for that iterative process. If evidence is replicable, then theory must eventually account for it, even if only to dismiss it as irrelevant because of invalidity of the methods. For example, the claims that there are more obese people in wealthier countries compared with poorer countries on average and that people in wealthier countries live longer than people in poorer countries on average could both be highly replicable. All theoretical perspectives about the relations between wealth, obesity, and longevity would have to account for those replicable claims.
There is no such thing as exact replication. We cannot reproduce an earthquake, era, or election, but replication is not about repeating historical events. Replication is about identifying the conditions sufficient for assessing prior claims. Replication can occur in observational research when the conditions presumed essential for observing the evidence recur, such as when a new seismic event has the characteristics deemed necessary and sufficient to observe an outcome predicted by a prior theory or when a new method for reassessing a fossil offers an independent test of existing claims about that fossil. Even in experimental research, original and replication studies inevitably differ in some aspects of the sample—or units—from which data are collected, the treatments that are administered, the outcomes that are measured, and the settings in which the studies are conducted [ 11 ].
Individual studies do not provide comprehensive or definitive evidence about all conditions for observing evidence about claims. The gaps are filled with theory. A single study examines only a subset of units, treatments, outcomes, and settings. The study was conducted in a particular climate, at particular times of day, at a particular point in history, with a particular measurement method, using particular assessments, with a particular sample. Rarely do researchers limit their inference to precisely those conditions. If they did, scientific claims would be historical claims because those precise conditions will never recur. If a claim is thought to reveal a regularity about the world, then it is inevitably generalizing to situations that have not yet been observed. The fundamental question is: of the innumerable variations in units, treatments, outcomes, and settings, which ones matter? Time-of-day for data collection may be expected to be irrelevant for a claim about personality and parenting or critical for a claim about circadian rhythms and inhibition.
When theories are too immature to make clear predictions, repetition of original procedures becomes very useful. Using the same procedures is an interim solution for not having clear theoretical specification of what is needed to produce evidence about a claim. And, using the same procedures reduces uncertainty about what qualifies as evidence “consistent with” earlier claims. Replication is not about the procedures per se, but using similar procedures reduces uncertainty in the universe of possible units, treatments, outcomes, and settings that could be important for the claim.
Because there is no exact replication, every replication test assesses generalizability to the new study’s unique conditions. However, every generalizability test is not a replication. Fig 1 ‘s left panel illustrates a discovery and conditions around it to which it is potentially generalizable. The generalizability space is large because of theoretical immaturity; there are many conditions in which the claim might be supported, but failures would not discredit the original claim. Fig 1 ‘s right panel illustrates a maturing understanding of the claim. The generalizability space has shrunk because some tests identified boundary conditions (gray tests), and the replicability space has increased because successful replications and generalizations (colored tests) have improved theoretical specification for when replicability is expected.
Fig 1. For underspecified theories, there is a larger space for which the claim may or may not be supported—the theory does not provide clear expectations. These are generalizability tests. Testing replicability is a subset of testing generalizability. As theory specification improves (moving from left panel to right panel), usually interactively with repeated testing, the generalizability and replicability space converge. Failures-to-replicate or generalize shrink the space (dotted circle shows original plausible space). Successful replications and generalizations expand the replicability space—i.e., broadening and strengthening commitments to replicability across units, treatments, outcomes, and settings.
https://doi.org/10.1371/journal.pbio.3000691.g001
Successful replication provides evidence of generalizability across the conditions that inevitably differ from the original study; unsuccessful replication indicates that the reliability of the finding may be more constrained than recognized previously. Repeatedly testing replicability and generalizability across units, treatments, outcomes, and settings facilitates improvement in theoretical specificity and future prediction.
Theoretical maturation is illustrated in Fig 2 . A progressive research program (the left path) succeeds in replicating findings across conditions presumed to be irrelevant and also matures the theoretical account to more clearly distinguish conditions for which the phenomenon is expected to be observed or not observed. This is illustrated by a shrinking generalizability space in which the theory does not make clear predictions. A degenerative research program (the right path) persistently fails to replicate the findings and progressively narrows the universe of conditions to which the claim could apply. This is illustrated by shrinking generalizability and replicability space because the theory must be constrained to ever narrowing conditions [ 12 ].
Fig 2. With progressive success (left path) theoretical expectations mature, clarifying when replicability is expected. Also, boundary conditions become clearer, reducing the potential generalizability space. A complete theoretical account eliminates generalizability space because the theoretical expectations are so clear and precise that all tests are replication tests. With repeated failures (right path) the generalizability and replicability space both shrink, eventually to a theory so weak that it makes no commitments to replicability.
https://doi.org/10.1371/journal.pbio.3000691.g002
This exposes an inevitable ambiguity in failures-to-replicate. Was the original evidence a false positive or the replication a false negative, or does the replication identify a boundary condition of the claim? We can never know for certain that earlier evidence was a false positive. It is always possible that it was “real,” and we cannot identify or recreate the conditions necessary to replicate successfully. But that does not mean that all claims are true or that science cannot be self-correcting. Accumulating failures-to-replicate could result in a much narrower but more precise set of circumstances in which evidence for the claim is replicable, or it may result in failure to ever establish conditions for replicability and relegate the claim to irrelevance.
The ambiguity between disconfirming an original claim or identifying a boundary condition also means that understanding whether or not a study is a replication can change due to accumulation of knowledge. For example, the famous experiment by Otto Loewi (1936 Nobel Prize in Physiology or Medicine) showed that the inhibitory factor “vagusstoff,” subsequently determined to be acetylcholine, was released from the vagus nerve of frogs, suggesting that neurotransmission was a chemical process. Much later, after his and others’ failures-to-replicate his original claim, a crucial theoretical insight identified that the time of year at which Loewi performed his experiment was critical to its success [ 13 ]. The original study was performed with so-called winter frogs. The replication attempts performed with summer frogs failed because of seasonal sensitivity of the frog heart to the unrecognized acetylcholine, making the effects of vagal stimulation far more difficult to demonstrate. With subsequent tests providing supporting evidence, the understanding of the claim improved. What had been perceived as replications were not anymore because new evidence demonstrated that they were not studying the same thing. The theoretical understanding evolved, and subsequent replications supported the revised claims. That is not a problem, that is progress.
Replication is rare
The term “conceptual replication” has been applied to studies that use different methods to test the same question as a prior study. This is a useful research activity for advancing understanding, but many studies with this label are not replications by our definition. Recall that “to be a replication, 2 things must be true: outcomes consistent with a prior claim would increase confidence in the claim, and outcomes inconsistent with a prior claim would decrease confidence in the claim." Many "conceptual replications" meet the first criterion and fail the second. That is, they are not designed such that a failure to replicate would revise confidence in the original claim. Instead, “conceptual replications” are often generalizability tests. Failures are interpreted, at most, as identifying boundary conditions. A self-assessment of whether one is testing replicability or generalizability is answering—would an outcome inconsistent with prior findings cause me to lose confidence in the theoretical claims? If no, then it is a generalizability test.
Designing a replication with a different methodology requires understanding of the theory and methods so that any outcome is considered diagnostic evidence about the prior claim. In practice, this means that replication is often limited to relatively close adherence to original methods for topics in which theory and methodology are immature—a circumstance commonly called “direct” or “close” replication—because the similarity of methods serves as a stand-in for theoretical and measurement precision. In fact, conducting a replication of a prior claim with a different methodology can be considered a milestone for theoretical and methodological maturity.
Replication is characterized as the boring, rote, clean-up work of science. This misperception makes funders reluctant to fund it, journals reluctant to publish it, and institutions reluctant to reward it. The disincentives for replication are a likely contributor to existing challenges of credibility and replicability of published claims [ 14 ].
Defining replication as a confrontation of current theoretical expectations clarifies its important, exciting, and generative role in scientific progress. Single studies, whether they pursue novel ends or confront existing expectations, never definitively confirm or disconfirm theories. Theories make predictions; replications test those predictions. Outcomes from replications are fodder for refining, altering, or extending theory to generate new predictions. Replication is a central part of the iterative maturing cycle of description, prediction, and explanation. A shift in attitude that includes replication in funding, publication, and career opportunities will accelerate research progress.
Acknowledgments
We thank Alex Holcombe, Laura Scherer, Leonhard Held, and Don van Ravenzwaaij for comments on earlier versions of this paper, and we thank Anne Chestnut for graphic design support.
- 7. Jeffreys H. Scientific Inference. 3rd ed. Cambridge: Cambridge University Press; 1973.
- 11. Shadish WR, Chelimsky E, Cook TD, Campbell DT. Experimental and quasi-experimental designs for generalized causal inference. 2nd ed. Boston: Houghton Mifflin; 2002.
- 12. Lakatos I. Falsification and the Methodology of Scientific Research Programmes. In: Harding SG, editor. Can Theories be Refuted? Synthese Library (Monographs on Epistemology, Logic, Methodology, Philosophy of Science, Sociology of Science and of Knowledge, and on the Mathematical Methods of Social and Behavioral Sciences). Dordrecht: Springer; 1976. p. 205–259. https://doi.org/10.1007/978-94-010-1863-0_14
What Is Replication in Psychology Research?
Replication refers to the repetition of a research study, generally with different situations and subjects, to determine if the basic findings of the original study can be applied to other participants and circumstances.
In other words, when researchers replicate a study, it means they reproduce the experiment to see if they can obtain the same outcomes.
Once a study has been conducted, researchers might be interested in determining if the results hold true in other settings or for other populations. In other cases, scientists may want to replicate the experiment to further demonstrate the results.
At a Glance
In psychology, replication is defined as reproducing a study to see if you get the same results. It's an important part of the research process that strengthens our understanding of human behavior. It's not always a perfect process, however, and extraneous variables and other factors can interfere with results.
For example, imagine that health psychologists perform an experiment showing that hypnosis can be effective in helping middle-aged smokers kick their nicotine habit. Other researchers might want to replicate the same study with younger smokers to see if they reach the same result.
Exact replication is not always possible. Ethical standards may prevent modern researchers from replicating studies that were conducted in the past, such as Stanley Milgram's infamous obedience experiments .
That doesn't mean that researchers don't perform replications; it just means they have to adapt their methods and procedures. For example, researchers have replicated Milgram's study using lower shock thresholds and improved informed consent and debriefing procedures.
Why Replication Is Important in Psychology
When studies are replicated and achieve the same or similar results as the original study, it gives greater validity to the findings. If a researcher can replicate a study’s results, it is more likely that those results can be generalized to the larger population.
Human behavior can be inconsistent and difficult to study. Even when researchers are cautious about their methods, extraneous variables can still create bias and affect results.
That's why replication is so essential in psychology. It strengthens findings, helps detect potential problems, and improves our understanding of human behavior.
How Do Scientists Replicate an Experiment?
When conducting a study or experiment, it is essential to have clearly defined operational definitions. In other words, what is the study attempting to measure?
When replicating the work of earlier researchers, experimenters will follow the same procedures but with a different group of participants. If the researcher obtains the same or similar results in follow-up experiments, it means that the original results are less likely to be a fluke.
The steps involved in replicating a psychology experiment often include the following:
- Review the original experiment : The goal of replication is to use the exact methods and procedures the researchers used in the original experiment. Reviewing the original study to learn more about the hypothesis, participants, techniques, and methodology is important.
- Conduct a literature review : Review the existing literature on the subject, including any other replications or previous research. Considering these findings can provide insights into your own research.
- Perform the experiment : The next step is to conduct the experiment. During this step, keeping your conditions as close as possible to the original experiment is essential. This includes how you select participants, the equipment you use, and the procedures you follow as you collect your data.
- Analyze the data : As you analyze the data from your experiment, you can better understand how your results compare to the original results.
- Communicate the results : Finally, you will document your processes and communicate your findings. This is typically done by writing a paper for publication in a professional psychology journal. Be sure to carefully describe your procedures and methods, describe your findings, and discuss how your results compare to the original research.
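To make the analysis and comparison steps concrete, here is a small illustrative sketch in Python (not part of the original article). It simulates replication data and applies one common, but not universal, consistency check: whether the original effect size falls inside the replication's 95% confidence interval. The original effect size, sample sizes, and scores are all invented for the example.

```python
# Hypothetical sketch of the "analyze the data" step: compare a replication's
# effect estimate to the original report. One common criterion (not the only
# one) is whether the original effect size falls inside the replication's 95%
# confidence interval. All numbers here are simulated for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
original_effect_d = 0.45   # assumed effect size (Cohen's d) from the original paper

# simulated replication data: treatment vs. control scores, 80 participants each
control = rng.normal(100, 15, size=80)
treatment = rng.normal(104, 15, size=80)

t_stat, p_value = stats.ttest_ind(treatment, control)
diff = treatment.mean() - control.mean()
pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
replication_d = diff / pooled_sd

# approximate 95% confidence interval for Cohen's d
n1 = n2 = 80
se_d = np.sqrt((n1 + n2) / (n1 * n2) + replication_d**2 / (2 * (n1 + n2)))
ci_low, ci_high = replication_d - 1.96 * se_d, replication_d + 1.96 * se_d

print(f"replication: d = {replication_d:.2f}, 95% CI ({ci_low:.2f}, {ci_high:.2f}), p = {p_value:.3f}")
print("original effect inside replication CI:", ci_low <= original_effect_d <= ci_high)
```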
So what happens if the original results cannot be reproduced? Does that mean that the experimenters conducted bad research or that, even worse, they lied or fabricated their data?
In many cases, a failure to replicate is caused by differences in the participants or in other extraneous variables that might influence the results of an experiment. Sometimes the differences might not be immediately clear, but other researchers might be able to discern which variables could have impacted the results.
For example, minor differences in things like the way questions are presented, the weather, or even the time of day the study is conducted might have an unexpected impact on the results of an experiment. Researchers might strive to perfectly reproduce the original study, but variations are expected and often impossible to avoid.
Are the Results of Psychology Experiments Hard to Replicate?
In 2015, a group of 271 researchers published the results of their five-year effort to replicate 100 different experimental studies previously published in three top psychology journals. The replicators worked closely with the original researchers of each study in order to replicate the experiments as closely as possible.
The results were less than stellar. Of the 100 experiments in question, 61% could not be replicated with the original results. Of the original studies, 97% of the findings were deemed statistically significant. Only 36% of the replicated studies were able to obtain statistically significant results.
As one might expect, these dismal findings caused quite a stir. You may have heard this referred to as the "replication crisis" in psychology.
Similar replication attempts have produced similar results. Another study published in 2018 replicated 21 social and behavioral science studies. In these studies, the researchers were only able to successfully reproduce the original results about 62% of the time.
So why are psychology results so difficult to replicate? Writing for The Guardian , John Ioannidis suggested that there are a number of reasons why this might happen, including competition for research funds and the powerful pressure to obtain significant results. There is little incentive to retest, so many results obtained purely by chance are simply accepted without further research or scrutiny.
The American Psychological Association suggests that the problem stems partly from the research culture. Academic journals are more likely to publish novel, innovative studies rather than replication research, creating less of an incentive to conduct that type of research.
Reasons Why Research Cannot Be Replicated
The project authors suggest that there are three potential reasons why the original findings could not be replicated.
- The original results were a false positive.
- The replicated results were a false negative.
- Both studies were correct but differed due to unknown differences in experimental conditions or methodologies.
The Nobel Prize-winning psychologist Daniel Kahneman has suggested that because published studies are often too vague in describing methods used, replications should involve the authors of the original studies to more carefully mirror the methods and procedures used in the original research.
In fact, one investigation found that replication rates are much higher when original researchers are involved.
While some might be tempted to look at the results of such replication projects and assume that psychology is more art than science, many suggest that such findings actually help make psychology a stronger science. Human thought and behavior are remarkably subtle and ever-changing subjects to study.
In other words, it's normal and expected for variations to exist when observing diverse populations and participants.
Some research findings might be wrong, but digging deeper, pointing out the flaws, and designing better experiments helps strengthen the field. The APA notes that replication research represents a great opportunity for students; it can help strengthen research skills and contribute to science in a meaningful way.
Nosek BA, Errington TM. What is replication? PLoS Biol. 2020;18(3):e3000691. doi:10.1371/journal.pbio.3000691
Burger JM. Replicating Milgram: Would people still obey today? Am Psychol. 2009;64(1):1-11. doi:10.1037/a0010932
Makel MC, Plucker JA, Hegarty B. Replications in psychology research: How often do they really occur? Perspect Psychol Sci. 2012;7(6):537-542. doi:10.1177/1745691612460688
Aarts AA, Anderson JE, Anderson CJ, et al. Estimating the reproducibility of psychological science. Science. 2015;349(6251). doi:10.1126/science.aac4716
Camerer CF, Dreber A, Holzmeister F, et al. Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nat Hum Behav. 2018;2(9):637-644. doi:10.1038/s41562-018-0399-z
American Psychological Association. Leaning into the replication crisis: Why you should consider conducting replication research.
Kahneman D. A new etiquette for replication. Soc Psychol. 2014;45(4):310-311.
By Kendra Cherry, MSEd. Kendra Cherry is a psychosocial rehabilitation specialist, psychology educator, and author of "The Everything Psychology Book."
Molecular mechanism of DNA replication
Key points:
- DNA replication is semiconservative . Each strand in the double helix acts as a template for synthesis of a new, complementary strand.
- New DNA is made by enzymes called DNA polymerases , which require a template and a primer (starter) and synthesize DNA in the 5' to 3' direction.
- During DNA replication, one new strand (the leading strand ) is made as a continuous piece. The other (the lagging strand ) is made in small pieces.
- DNA replication requires other enzymes in addition to DNA polymerase, including DNA primase , DNA helicase , DNA ligase , and topoisomerase .
Introduction
The basic idea
DNA polymerase
Key properties of DNA polymerases:
- They always need a template
- They can only add nucleotides to the 3' end of a DNA strand
- They can't start making a DNA chain from scratch, but require a pre-existing chain or short stretch of nucleotides called a primer
- They proofread , or check their work, removing the vast majority of "wrong" nucleotides that are accidentally added to the chain
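To make these rules concrete, here is a toy Python sketch (our illustration, not Khan Academy material). It mimics a "polymerase" that refuses to start without a primer and only adds complementary nucleotides to the 3' end of the growing strand; the sequences are arbitrary.

```python
# Toy model of template-directed synthesis (illustration only, not real
# biochemistry software): a template and a primer are required, and new
# nucleotides are added only to the 3' end of the growing strand.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def extend_primer(template_3to5, primer_5to3):
    """Extend a primer along a template read 3'->5', returning a 5'->3' strand."""
    if not primer_5to3:
        raise ValueError("DNA polymerases cannot start a strand from scratch")
    new_strand = list(primer_5to3)              # primer pairs with the template start
    for base in template_3to5[len(primer_5to3):]:
        new_strand.append(COMPLEMENT[base])     # growth happens at the 3' end only
    return "".join(new_strand)

template = "TACGGTACCGT"   # written 3' -> 5'
primer = "ATG"             # written 5' -> 3'; pairs with the template's leading TAC
print(extend_primer(template, primer))  # ATGCCATGGCA, read 5' -> 3'
```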
Starting DNA replication
Primers and primase
Leading and lagging strands
The maintenance and cleanup crew
Summary of DNA replication in E. coli
- Helicase opens up the DNA at the replication fork.
- Single-strand binding proteins coat the DNA around the replication fork to prevent rewinding of the DNA.
- Topoisomerase works at the region ahead of the replication fork to prevent supercoiling.
- Primase synthesizes RNA primers complementary to the DNA strand.
- DNA polymerase III extends the primers, adding on to the 3' end, to make the bulk of the new DNA.
- RNA primers are removed and replaced with DNA by DNA polymerase I .
- The gaps between DNA fragments are sealed by DNA ligase .
DNA replication in eukaryotes
- Eukaryotes usually have multiple linear chromosomes, each with multiple origins of replication. Humans can have up to 100,000 origins of replication [5]!
- Most of the E. coli enzymes have counterparts in eukaryotic DNA replication, but a single enzyme in E. coli may be represented by multiple enzymes in eukaryotes. For instance, there are five human DNA polymerases with important roles in replication [5].
- Most eukaryotic chromosomes are linear. Because of the way the lagging strand is made, some DNA is lost from the ends of linear chromosomes (the telomeres ) in each round of replication.
Attribution and works cited:
- National Human Genome Research Institute. (2010, October 30). The human genome project completion: Frequently asked questions. In News release archives 2003 . Retrieved from https://www.genome.gov/11006943 .
- Reece, J. B., Urry, L. A., Cain, M. L., Wasserman, S. A., Minorsky, P. V., and Jackson, R. B. (2011). Origins of replication in E. coli and eukaryotes. In Campbell biology (10th ed., p. 321). San Francisco, CA: Pearson.
- Bell, S. P. and Kaguni, J. M. (2013). Helicase loading at chromosomal origins of replication. Cold Spring Harb. Perspect. Biol. , 5 (6), a010124. http://dx.doi.org/10.1101/cshperspect.a010124 .
- Yao, N. Y. and O'Donnell, M. (2008). Replisome dynamics and use of DNA trombone loops to bypass replication blocks. Mol. Biosyst. , 4 (11), 1075-1084. http://dx.doi.org/10.1039/b811097b . Retrieved from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4011192/ .
- OpenStax College, Biology. (2016, May 27). DNA replication in eukaryotes. In OpenStax CNX . Retrieved from http://cnx.org/contents/[email protected]:2l3nsfJK@5/DNA-Replication-in-Eukaryotes .
Genuine replication and pseudoreplication: what’s the difference?
By Stanley E. Lazic ( @StanLazic )
Replication is a key idea in science and statistics, but is often misunderstood by researchers because they receive little education or training on experimental design. Consequently, the wrong entity is replicated in many experiments, leading to pseudoreplication or the “unit of analysis” problem [1,2]. This results in exaggerated sample sizes and a potential increase in both false positives and false negatives – the worst of all possible worlds.
Replication can mean many things
Replication is not always easy to understand because many parts of an experiment can be replicated, and a non-exhaustive list includes:
- Replicating the measurements taken on a set of samples. Examples include taking two blood pressure readings on each person or dividing a blood sample into two aliquots and measuring the concentration of a substance in each aliquot.
- Replicating the application of a treatment or intervention to a biological entity of interest. This is the traditional way of increasing the sample size, by increasing the number of treatment–entity pairs; for example, the number of times a drug or vehicle control is randomly and independently applied to a set of rats.
- Replicating the experimental procedure under different conditions. Repeating the experimental procedure several times, but where a known source of variation is present on each occasion. An example is a multi-centre clinical trial where differences between centres may exist. Another example is a large animal experiment that is broken down into two smaller experiments to make it manageable, and each smaller experiment is run by a different technician.
- Replicating the experiment by independent researchers. Repeating the whole experiment by researchers that were not part of the initial experiment. This occurs when a paper is published and others try to obtain the same results.
To add to the confusion, terms with related meanings exist, such as repeatability, reproducibility, and replicability. Furthermore, the reasons for having or increasing replication are diverse and include a need to increase statistical power, a desire to make the results more generalisable, or the result of a practical constraint, such as an inability to recruit enough patients in one centre and so multiple centres are needed.
Requirements for genuine replication
How do you design an experiment to have genuine replication and not pseudoreplication? First, ensure that replication is at the level of the biological question or scientific hypothesis. For example, to test the effectiveness of a drug in rats, give the drug to multiple rats, and compare the result with other rats that received a control treatment (corresponding to example 2 above). Multiple measurements on each rat (example 1 above) do not count towards genuine replication.
To test if a drug kills proliferating cells in a well compared to a control condition, you will need multiple drug and control wells, since the drug is applied on a per-well basis. But you may worry that the results from a single experimental run will not generalise – even if you can perform a valid statistical test – because results from in vitro experiments can be highly variable. You could then repeat the experiment four times (corresponding to example 3 above), and the sample size is now four, not the total number of wells that were used across all of the experimental runs. This second option requires more work, will take longer, and will usually have lower power, but it provides a more robust result because the experimenter’s ability to reproduce the treatment effect across multiple experimental runs has been replicated.
To test if pre-registered studies report different effect sizes from traditional studies that are not pre-registered, you will need multiple studies of both types (corresponding to example 5 above). The number of subjects in each of these studies is irrelevant for testing this study-level hypothesis.
Replication at the level of the question or hypothesis is a necessary but not sufficient condition for genuine replication – three criteria must be satisfied [1,3]:
- For experiments, the biological entities of interest must be randomly and independently assigned to treatment groups. If this criterion holds, the biological entities are also called the experimental units [1,3].
- The treatment(s) should be applied independently to each experimental unit. Injecting animals with a drug is an independent application of a treatment, whereas putting the drug in the drinking water shared by all animals in a cage is not.
- The experimental units should not influence each other, especially on the measured outcome variables. This criterion is often impossible to verify – how do you prove that the aggressive behaviour of one rat in a cage is not influencing the behaviour of the other rats?
It follows that cells in a well or neurons in a brain or slice culture can rarely be considered genuine replicates because the above criteria are unlikely to be met, whereas fish in a tank, rats in a cage, or pigs in a pen could be genuine replicates in some cases but not in others. If the criteria are not met, the solution is to replicate one level up in the biological or technical hierarchy. For example, if you’re interested in the effect of a drug on cells in an in vitro experiment, but cannot use cells as genuine replicates, then the number of wells can be the replicates, and the measurements on cells within a well can be averaged so that the number of data points corresponds to the number of wells, that is, the sample size (hierarchical or multi-level models can also be used and don’t require values to be averaged because they take the structure of the data into account, but they are harder to implement and interpret compared with averaging followed by simpler statistical methods). Similarly, if rats in a cage cannot be considered genuine replicates, then calculating a cage-averaged value and using cages as genuine replicates is an appropriate solution (or a multi-level model).
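As a minimal sketch of "averaging up" to the genuine replicates (ours, not from the post), the snippet below simulates a hypothetical cage-housed rat experiment, with the treatment applied per cage, and contrasts a pseudoreplicated rat-level t test with a t test on cage means. The group sizes, means, and variance components are invented for illustration.

```python
# Illustration of pseudoreplication vs. analysis at the level of the genuine
# replicate (the cage). All numbers are invented for the example.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)
n_cages_per_group, rats_per_cage = 4, 5

rows = []
for group, group_effect in (("control", 0.0), ("drug", 1.0)):
    for c in range(n_cages_per_group):
        cage_effect = rng.normal(0, 1.0)          # shared cage environment
        for _ in range(rats_per_cage):
            y = 10 + group_effect + cage_effect + rng.normal(0, 0.5)
            rows.append({"group": group, "cage": f"{group}_{c}", "y": y})
df = pd.DataFrame(rows)

# Pseudoreplicated analysis: treats every rat as independent (n = 40)
naive = stats.ttest_ind(df.loc[df["group"] == "drug", "y"],
                        df.loc[df["group"] == "control", "y"])

# Analysis on genuine replicates: average within cage, then compare cages (n = 8)
cage_means = df.groupby(["group", "cage"], as_index=False)["y"].mean()
valid = stats.ttest_ind(cage_means.loc[cage_means["group"] == "drug", "y"],
                        cage_means.loc[cage_means["group"] == "control", "y"])

print(f"rat-level (pseudoreplicated) p = {naive.pvalue:.4f}")
print(f"cage-level (genuine)         p = {valid.pvalue:.4f}")
```

A mixed-effects model with cage as a grouping factor (for example, mixedlm in statsmodels) uses the same hierarchical structure without discarding the within-cage measurements.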
If genuine replication is too low, the experiment may be unable to answer any scientific questions of interest. Therefore issues about replication must be resolved when designing an experiment, not after the data have been collected. For example, if cages are the genuine replicates and not the rats, then putting fewer rats in a cage and having more cages will increase power; and power is maximised with one rat per cage, but this may be undesirable for other reasons.
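The cage-versus-rat trade-off can be sketched with a quick power calculation (our illustration; the effect size and variance components below are assumed, not taken from the post).

```python
# How power depends on the number of cages vs. rats per cage when the cage is
# the genuine replicate. The effect size and variance components are assumed.
import numpy as np
from statsmodels.stats.power import TTestIndPower

delta = 1.0           # assumed treatment effect on the cage-mean scale
sigma_between = 1.0   # between-cage standard deviation
sigma_within = 0.5    # between-rat (within-cage) standard deviation

def power(n_cages_per_group, rats_per_cage, alpha=0.05):
    # More rats shrink only the within-cage part of the cage-mean variance,
    # so extra rats per cage give diminishing returns relative to extra cages.
    sd_cage_mean = np.sqrt(sigma_between**2 + sigma_within**2 / rats_per_cage)
    return TTestIndPower().power(effect_size=delta / sd_cage_mean,
                                 nobs1=n_cages_per_group, alpha=alpha)

print(power(n_cages_per_group=4, rats_per_cage=10))   # few cages, many rats
print(power(n_cages_per_group=10, rats_per_cage=4))   # more cages, fewer rats
```

With these assumed numbers, ten cages of four rats are noticeably better powered than four cages of ten rats, even though both designs use forty animals per group.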
Confusing pseudoreplication for genuine replication reduces our ability to learn from experiments, understand nature, and develop treatments for diseases. It is also easily fixed. The requirements for genuine replication, like the definition of a p-value, are often misunderstood by researchers, despite many papers on the topic. An open-access overview is provided in reference [1], and reference [3] has a detailed discussion along with analysis options for many experimental designs.
[1] Lazic SE, Clarke-Williams CJ, Munafò MR (2018). What exactly is “N” in cell culture and animal experiments? PLoS Biol 16(4):e2005282. https://doi.org/10.1371/journal.pbio.2005282
[2] Lazic SE (2010). The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis? BMC Neuroscience 11:5. https://doi.org/10.1186/1471-2202-11-5
[3] Lazic SE (2016). Experimental Design for Laboratory Biologists: Maximising Information and Improving Reproducibility. Cambridge University Press, Cambridge, UK. https://www.cambridge.org/Lazic
Stanley E. Lazic is Co-founder and Chief Scientific Officer at Prioris.ai Inc.
Prioris.ai, Suite 459, 207 Bank Street, Ottawa ON, K2P 2N2, Canada.
Reproducibility and Replicability in Science (2019)
Chapter 5: Replicability
Replication is one of the key ways scientists build confidence in the scientific merit of results. When the result from one study is found to be consistent by another study, it is more likely to represent a reliable claim to new knowledge. As Popper (2005 , p. 23) wrote (using “reproducibility” in its generic sense):
We do not take even our own observations quite seriously, or accept them as scientific observations, until we have repeated and tested them. Only by such repetitions can we convince ourselves that we are not dealing with a mere isolated ‘coincidence,’ but with events which, on account of their regularity and reproducibility, are in principle inter-subjectively testable.
However, a successful replication does not guarantee that the original scientific results of a study were correct, nor does a single failed replication conclusively refute the original claims. A failure to replicate previous results can be due to any number of factors, including the discovery of an unknown effect, inherent variability in the system, inability to control complex variables, substandard research practices, and, quite simply, chance. The nature of the problem under study and the prior likelihoods of possible results in the study, the type of measurement instruments and research design selected, and the novelty of the area of study and therefore lack of established methods of inquiry can also contribute to non-replicability. Because of the complicated relationship between replicability and its variety of sources, the validity of scientific results should be considered in the context of an entire body of evidence, rather than an individual study or an individual replication. Moreover, replication may be a matter of degree, rather than a binary result of “success” or “failure.” 1 We explain in Chapter 7 how research synthesis, especially meta-analysis, can be used to evaluate the evidence on a given question.
ASSESSING REPLICABILITY
How does one determine the extent to which a replication attempt has been successful? When researchers investigate the same scientific question using the same methods and similar tools, the results are not likely to be identical—unlike in computational reproducibility in which bitwise agreement between two results can be expected (see Chapter 4 ). We repeat our definition of replicability, with emphasis added: obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data.
Determining consistency between two different results or inferences can be approached in a number of ways ( Simonsohn, 2015 ; Verhagen and Wagenmakers, 2014 ). Even if one considers only quantitative criteria for determining whether two results qualify as consistent, there is variability across disciplines ( Zwaan et al., 2018 ; Plant and Hanisch, 2018 ). The Royal Netherlands Academy of Arts and Sciences (2018 , p. 20) concluded that “it is impossible to identify a single, universal approach to determining [replicability].” As noted in Chapter 2 , different scientific disciplines are distinguished in part by the types of tools, methods, and techniques used to answer questions specific to the discipline, and these differences include how replicability is assessed.
1 See, for example, the cancer biology project in Table 5-1 in this chapter.
Acknowledging the different approaches to assessing replicability across scientific disciplines, however, we emphasize eight core characteristics and principles:
- Attempts at replication of previous results are conducted following the methods and using similar equipment and analyses as described in the original study or under sufficiently similar conditions ( Cova et al., 2018 ). 2 Yet regardless of how similar the replication study is, no second event can exactly repeat a previous event.
- The concept of replication between two results is inseparable from uncertainty, as is also the case for reproducibility (as discussed in Chapter 4 ).
- Any determination of replication (between two results) needs to take account of both proximity (i.e., the closeness of one result to the other, such as the closeness of the mean values) and uncertainty (i.e., variability in the measures of the results).
- To assess replicability, one must first specify exactly what attribute of a previous result is of interest. For example, is only the direction of a possible effect of interest? Is the magnitude of effect of interest? Is surpassing a specified threshold of magnitude of interest? With the attribute of interest specified, one can then ask whether two results fall within or outside the bounds of “proximity-uncertainty” that would qualify as replicated results.
- Depending on the selected criteria (e.g., measure, attribute), assessments of a set of attempted replications could appear quite divergent. 3
- A judgment that “Result A is replicated by Result B” must be identical to the judgment that “Result B is replicated by Result A.” There must be a symmetry in the judgment of replication; otherwise, internal contradictions are inevitable.
- There could be advantages to inverting the question from, “Does Result A replicate Result B (given their proximity and uncertainty)?” to “Are Results A and B sufficiently divergent (given their proximity and uncertainty) so as to qualify as a non-replication?” It may be advantageous, in assessing degrees of replicability, to define a relatively high threshold of similarity that qualifies as “replication,” a relatively low threshold of similarity that qualifies as “non-replication,” and the intermediate zone between the two thresholds that is considered “indeterminate.” If a second study has low power and wide uncertainties, it may be unable to produce any but indeterminate results.
- While a number of different standards for replicability/non-replicability may be justifiable, depending on the attributes of interest, a standard of “repeated statistical significance” has many limitations because the level of statistical significance is an arbitrary threshold (Amrhein et al., 2019a; Boos and Stefanski, 2011; Goodman, 1992; Lazzeroni et al., 2016). For example, one study may yield a p-value of 0.049 (declared significant at the p ≤ 0.05 level) and a second study yields a p-value of 0.051 (declared nonsignificant by the same p-value threshold), and therefore the studies are said not to be replicated. However, if the second study had yielded a p-value of 0.03, the reviewer would say it had successfully replicated the first study, even though the result could diverge more sharply (by proximity and uncertainty) from the original study than in the first comparison. Rather than focus on an arbitrary threshold such as statistical significance, it would be more revealing to consider the distributions of observations and to examine how similar these distributions are. This examination would include summary measures, such as proportions, means, standard deviations (or uncertainties), and additional metrics tailored to the subject matter.
2 Cova et al. (2018, fn. 3) discuss the challenge of defining sufficiently similar as well as the interpretation of the results:
In practice, it can be hard to determine whether the ‘sufficiently similar’ criterion has actually been fulfilled by the replication attempt, whether in its methods or in its results (Nakagawa and Parker 2015). It can therefore be challenging to interpret the results of replication studies, no matter which way these results turn out (Collins, 1975; Earp and Trafimow, 2015; Maxwell et al., 2015).
3 See Table 5-1 for an example of this in the reviews of a psychology replication study by Open Science Collaboration (2015) and Patil et al. (2016).
The final point above is reinforced by a recent special edition of the American Statistician in which the use of a statistical significance threshold in reporting is strongly discouraged due to overuse and wide misinterpretation ( Wasserstein et al., 2019 ). A figure from ( Amrhein et al., 2019b ) also demonstrates this point, as shown in Figure 5-1 .
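A small numerical sketch (our illustration, not from the report) makes the point concrete: the two hypothetical studies below report nearly identical effect estimates and interval estimates, yet they fall on opposite sides of the p = 0.05 line.

```python
# Two hypothetical studies with nearly identical estimates can straddle the
# p = 0.05 threshold, so comparing estimates and their uncertainties is more
# informative than asking whether both reached "statistical significance."
from scipy import stats

studies = {
    "Study A": {"diff": 0.27, "se": 0.135},   # slightly more precise
    "Study B": {"diff": 0.28, "se": 0.145},   # slightly less precise
}

for name, s in studies.items():
    z = s["diff"] / s["se"]
    p = 2 * stats.norm.sf(abs(z))             # two-sided p-value
    lo, hi = s["diff"] - 1.96 * s["se"], s["diff"] + 1.96 * s["se"]
    print(f"{name}: diff = {s['diff']:.2f}, 95% CI ({lo:.2f}, {hi:.2f}), p = {p:.3f}")

# Study A: p ~ 0.046 ("significant"); Study B: p ~ 0.053 ("not significant"),
# yet the two estimates and intervals are almost interchangeable.
```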
One concern voiced by some researchers about using a proximity-uncertainty attribute to assess replicability is that such an assessment favors studies with large uncertainties; the potential consequence is that many researchers would choose to perform low-power studies to increase the replicability chances ( Cova et al., 2018 ). While two results with large uncertainties and within proximity, such that the uncertainties overlap with each other, may be consistent with replication, the large uncertainties indicate that not much confidence can be placed in that conclusion.
CONCLUSION 5-1: Different types of scientific studies lead to different or multiple criteria for determining a successful replication. The choice of criteria can affect the apparent rate of non-replication, and that choice calls for judgment and explanation.
CONCLUSION 5-2: A number of parametric and nonparametric methods may be suitable for assessing replication across studies. However, a restrictive and unreliable approach would accept replication only when the results in both studies have attained “statistical significance,” that is, when the p -values in both studies have exceeded a selected threshold. Rather, in determining replication, it is important to consider the distributions of observations and to examine how similar these distributions are. This examination would include summary measures, such as proportions, means, standard deviations (uncertainties), and additional metrics tailored to the subject matter.
THE EXTENT OF NON-REPLICABILITY
The committee was asked to assess what is known and, if necessary, identify areas that may need more information to ascertain the extent of non-replicability in scientific and engineering research. The committee examined current efforts to assess the extent of non-replicability within several fields, reviewed literature on the topic, and heard from expert panels during its public meetings. We also drew on the previous work of committee members and other experts in the field of replicability of research.
Some efforts to assess the extent of non-replicability in scientific research directly measure rates of replication, while others examine indirect measures to infer the extent of non-replication. Approaches to assessing non-replicability rates include
- direct and indirect assessments of replicability;
- perspectives of researchers who have studied replicability;
- surveys of researchers; and
- retraction trends.
This section discusses each of these lines of evidence.
Assessments of Replicability
The most direct method to assess replicability is to perform a study following the original methods of a previous study and to compare the new results to the original ones. Some high-profile replication efforts in recent years include studies by Amgen, which showed low replication rates in biomedical research ( Begley and Ellis, 2012 ), and work by the Center for Open Science on psychology ( Open Science Collaboration, 2015 ), cancer research ( Nosek and Errington, 2017 ), and social science ( Camerer et al., 2018 ). In these examples, a set of studies was selected and a single replication attempt was made to confirm results of each previous study, or one-to-one comparisons were made. In other replication studies, teams of researchers performed multiple replication attempts on a single original result, or many-to-one comparisons (see e.g., Klein et al., 2014 ; Hagger et al., 2016 ; and Cova et al., 2018 in Table 5-1 ).
Other measures of replicability include assessments that can provide indicators of bias, errors, and outliers, including, for example, computational data checks of reported numbers and comparison of reported values against a database of previously reported values. Such assessments can identify data that are outliers to previous measurements and may signal the need for additional investigation to understand the discrepancy. 4 Table 5-1 summarizes the direct and indirect replication studies assembled by the committee. Other sources of non-replicabilty are discussed later in this chapter in the Sources of Non-Replicability section.
4 There is risk of missing a new discovery by rejecting data outliers without further investigation.
Many direct replication studies are not reported as such. Replication—especially of surprising results or those that could have a major impact—occurs in science often without being labeled as a replication. Many scientific fields conduct reviews of articles on a specific topic—especially new topics or those likely to have a major impact—to assess the available data and determine which measurements and results are rigorous (see Chapter 7 ). Replication studies that are part of the scientific literature but are not labeled as such therefore add to the difficulty of assessing the extent of replication and non-replication.
One example of this phenomenon relates to research on hydrogen storage capacity. The U.S. Department of Energy (DOE) issued a target storage capacity in the mid-1990s. One group using carbon nanotubes reported surprisingly high values that met DOE’s target ( Hynek et al., 1997 ); other researchers who attempted to replicate these results could not do so. At the same time, other researchers were also reporting high values of hydrogen capacity in other experiments. In 2003, an article reviewed previous studies of hydrogen storage values and reported new research results, which were later replicated ( Broom and Hirscher, 2016 ). None of these studies was explicitly called an attempt at replication.
Based on the content of the collected studies in Table 5-1 , one can observe that the
- majority of the studies are in the social and behavioral sciences (including economics) or in biomedical fields, and
- methods of assessing replicability are inconsistent and the replicability percentages depend strongly on the methods used.
The replication studies such as those shown in Table 5-1 are not necessarily indicative of the actual rate of non-replicability across science for a number of reasons: the studies to be replicated were not randomly chosen, the replications had methodological shortcomings, many replication studies are not reported as such, and the reported replication studies found widely varying rates of non-replication ( Gilbert et al., 2016 ). At the same time, replication studies often provide more and better-quality evidence than most original studies alone, and they highlight such methodological features as high precision or statistical power, preregistration, and multi-site collaboration ( Nosek, 2016 ). Some would argue that focusing on replication of a single study as a way to improve the efficiency of science is ill-placed. Rather, reviews of cumulative evidence on a subject, to gauge both the overall effect size and generalizability, may be more useful ( Goodman, 2018 ; and see Chapter 7 ).
Apart from specific efforts to replicate others’ studies, investigators will typically confirm their own results, as in a laboratory experiment, prior to
TABLE 5-1 Examples of Replication Studies
Field and Author(s) | Description | Results | Type of Assessment |
---|---|---|---|
Experimental Philosophy ( ) | A group of 20 research teams performed replication studies of 40 experimental philosophy studies published between 2003 and 2015 | 70% of the 40 studies were replicated by comparing the original effect size to the confidence interval (CI) of the replication. | Direct |
Behavioral Science, Personality Traits Linked to Life Outcomes ( ) | Performed replications of 78 previously published associations between the Big Five personality traits and consequential life outcomes | 87% of the replication attempts were statistically significant in the expected direction, and effects were typically 77% as strong as the corresponding original effects. | Direct |
Behavioral Science, Ego-Depletion Effect ( ) | Multiple laboratories (23 in total) conducted replications of a standardized ego-depletion protocol based on a sequential-task paradigm by | Meta-analysis of the studies revealed that the size of the ego-depletion effect was small with a 95% CI that encompassed zero (d = 0.04, 95% CI [−0.07, 0.15]). | Direct |
General Biology, Preclinical Animal Studies ( ) | Attempt by researchers from Bayer HealthCare to validate data on potential drug targets obtained in 67 projects by copying models exactly or by adapting them to internal needs | Published data were completely in line with the results of the validation studies in 20%-25% of cases. | Direct |
Oncology, Preclinical Studies ( ) | Attempt by Amgen team to reproduce the results of 53 “landmark” studies | Scientific results were confirmed in 11% of the studies. | Direct |
Genetics, Preclinical Studies ( ) | Replication of data analyses provided in 18 articles on microarray-based gene expression studies | Of the 18 studies, 2 analyses (11%) were replicated; 6 were partially replicated or showed some discrepancies in results; and 10 could not be replicated. | Direct |
Experimental Psychology ( ) | Replication of 13 psychological phenomena across 36 independent samples | 77% of phenomena were replicated consistently. | Direct |
Experimental Psychology, Many Labs 2 ( ) | Replication of 28 classic and contemporary published studies | 54% of replications produced a statistically significant effect in the same direction as the original study, 75% yielded effect sizes smaller than the original ones, and 25% yielded larger effect sizes than the original ones. | Direct |
Experimental Psychology ( ) | Attempt to independently replicate selected results from 100 studies in psychology | 36% of the replication studies produced significant results, compared to 97% of the original studies. The mean effect sizes were halved. | Direct |
Experimental Psychology ( ) | Using reported data from the replication study in psychology, reanalyzed the results | 77% of the studies replicated by comparing the original effect size to an estimated 95% CI of the replication. | Direct |
Experimental Psychology ( ) | Attempt to replicate 21 systematically selected experimental studies in the social sciences published in and in 2010-2015 | Found a significant effect in the same direction as the original study for 62% (13 of 21) studies, and the effect size of the replications was on average about 50% of the original effect size. | Direct |
Empirical Economics ( ) | 2-year study that collected programs and data from authors and attempted to replicate their published results on empirical economic research | Two of nine replications were successful, three were "near" successes, and four were unsuccessful; findings suggest that inadvertent errors in published empirical articles are commonplace rather than rare. | Direct |
Economics ( ) | Progress report on the number of journals with data sharing requirements and an assessment of 167 studies | 10 journals explicitly note they publish replications; of 167 published replication studies, approximately 66% were unable to confirm the original results; 12% disconfirmed at least one major result of the original study, while confirming others. | N/A |
Economics ( ) | An effort to replicate 18 studies published in the and the from 2011-2014 | Significant effect in the same direction as the original study found for 11 replications (61%); on average, the replicated effect size was 66% of the original. | Direct |
Chemistry ( ; ) | Collaboration with the National Institute of Standards and Technology (NIST) to check new data against the NIST database, 13,000 measurements | 27% of papers reporting adsorption properties had data that were outliers; 20% of papers reporting carbon dioxide isotherms had outlier data. | Indirect |
Chemistry ( ) | Collaboration with the NIST Thermodynamics Research Center (TRC) databases, prepublication check of solubility, viscosity, critical temperature, and vapor pressure | 33% of experiments had data problems, such as uncertainties that were too small or reported values outside of TRC database distributions. | Indirect |
Biology Reproducibility Project: Cancer Biology | Large-scale replication project to replicate key results in 29 cancer papers published in , , and other high-impact journals | The first five articles have been published; two replicated important parts of the original papers, one did not replicate, and two were uninterpretable. | Direct |
Psychology, Statistical Checks ( ) | Statcheck tool used to test statistical values within psychology articles from 1985-2013 | 49.6% of the articles with null hypothesis statistical test (NHST) results contained at least one inconsistency (8,273 of the 16,695 articles), and 12.9% (2,150) of the articles with NHST results contained at least one gross inconsistency. | Indirect |
Engineering, Computational Fluid Dynamics ( ) | Full replication studies of previously published results on bluff-body aerodynamics, using four different computational methods | Replication of the main result was achieved in three out of four of the computational efforts. | Direct |
Psychology, Many Labs 3 ( ) | Attempt to replicate 10 psychology studies in one online session | 3 of 10 studies replicated at p < 0.05. | Direct |
Psychology ( ) | Argued that one of the failed replications in Ebersole et al. was due to changes in the procedure. They randomly assigned participants to a version closer to the original or to Ebersole et al.’s version. | The original study replicated when the original procedures were followed more closely, but not when the Ebersole et al. procedures were used. | Direct |
Psychology ( ) | 17 different labs attempted to replicate one study on facial feedback by . | None of the studies replicated the result at p < 0.05. | Direct |
Psychology ( ) | Pointed out that all of the studies in the replication project changed the procedure by videotaping participants. Conducted a replication in which participants were randomly assigned to be videotaped or not. | The original study was replicated when the original procedure was followed (p = 0.01); the original study was not replicated when the video camera was present (p = 0.85). | Direct |
Psychology ( ) | 31 labs attempted to replicate a study by Schooler and Engstler-Schooler (1990). | Replicated the original study. The effect size was much larger when the original study was replicated more faithfully (the first set of replications inadvertently introduced a change in the procedure). | Direct |
NOTES: Some of the studies in this table also appear in Table 4-1 as they evaluated both reproducibility and replicability. N/A = not applicable.
a From Cova et al. (2018 , p. 14): “For studies reporting statistically significant results, we treated as successful replications for which the replication 95 percent CI [confidence interval] was not lower than the original effect size. For studies reporting null results, we treated as successful replications for which original effect sizes fell inside the bounds of the 95 percent CI.”
b From Soto (2019 , p. 7, fn. 1): “Previous large-scale replication projects have typically treated the individual study as the primary unit of analysis. Because personality-outcome studies often examine multiple trait-outcome associations, we selected the individual association as the most appropriate unit of analysis for estimating replicability in this literature.”
publication. More generally, independent investigators may replicate prior results of others before conducting, or in the course of conducting, a study to extend the original work. These types of replications are not usually published as separate replication studies.
Perspectives of Researchers Who Have Studied Replicability
Several experts who have studied replicability within and across fields of science and engineering provided their perspectives to the committee. Brian Nosek, cofounder and director of the Center for Open Science, said there was "not enough information to provide an estimate with any certainty across fields and even within individual fields." In a recent paper discussing scientific progress and problems, Richard Shiffrin, professor of psychology and brain sciences at Indiana University, and colleagues argued that there are "no feasible methods to produce a quantitative metric, either across science or within the field" to measure the progress of science ( Shiffrin et al., 2018 , p. 2632). Skip Lupia, now serving as head of the Directorate for Social, Behavioral, and Economic Sciences at the National Science Foundation, said that there is not sufficient information to definitively determine the extent of non-reproducibility and non-replicability, but that there is evidence of p-hacking and publication bias (see below), which are problems. Steven Goodman, the codirector of the Meta-Research Innovation Center at Stanford University (METRICS), suggested that the focus ought not be on the rate of non-replication of individual studies, but rather on the cumulative evidence provided by all studies and convergence to the truth. He suggested the proper question is "How efficient is the scientific enterprise in generating reliable knowledge, what affects that reliability, and how can we improve it?"
Surveys of scientists about issues of replicability or about scientific methods are indirect measures of non-replicability. For example, Nature published the results of a survey in 2016 in an article titled "1,500 Scientists Lift the Lid on Reproducibility" ( Baker, 2016 ). 5 This article reported that a large percentage of researchers who responded to an online survey believe that replicability is a problem, and it has been widely cited by researchers studying subjects ranging from cardiovascular disease to crystal structures ( Warner et al., 2018 ; Ziletti et al., 2018 ). Surveys and studies have also assessed the prevalence of specific problematic research practices, such as a 2018 survey about questionable research practices in ecology and evolution ( Fraser et al., 2018 ).
5 Nature uses the word “reproducibility” to refer to what we call “replicability.”
However, many of these surveys rely on poorly defined sampling frames to identify populations of scientists and do not use probability sampling techniques. The fact that nonprobability samples "rely mostly on people . . . whose selection probabilities are unknown [makes it] difficult to estimate how representative they are of the [target] population" ( Dillman, Smyth, and Christian, 2014 , pp. 70, 92). In fact, we know that people with a particular interest in or concern about a topic, such as replicability and reproducibility, are more likely to respond to surveys on the topic ( Brehm, 1993 ). As a result, we caution against using surveys based on nonprobability samples as the basis of any conclusion about the extent of non-replicability in science.
High-quality researcher surveys are expensive and pose significant challenges, including constructing exhaustive sampling frames, reaching adequate response rates, and minimizing other nonresponse biases that might differentially affect respondents at different career stages or in different professional environments or fields of study ( Corley et al., 2011 ; Peters et al., 2008 ; Scheufele et al., 2009 ). As a result, the attempts to date to gather input on topics related to replicability and reproducibility from larger numbers of scientists ( Baker, 2016 ; Boulbes et al., 2018 ) have relied on convenience samples and other methodological choices that limit the conclusions that can be made about attitudes among the larger scientific community or even for specific subfields based on the data from such surveys. More methodologically sound surveys following guidelines on adoption of open science practices and other replicability-related issues are beginning to emerge. 6 See Appendix E for a discussion of conducting reliable surveys of scientists.
Retraction Trends
Retractions of published articles may be related to their non-replicability. As noted in a recent study on retraction trends ( Brainard, 2018 , p. 392), “Overall, nearly 40% of retraction notices did not mention fraud or other kinds of misconduct. Instead, the papers were retracted because of errors, problems with reproducibility [or replicability], and other issues.” Overall, about one-half of all retractions appear to involve fabrication, falsification, or plagiarism. Journal article retractions in biomedicine increased from 50-60 per year in the mid-2000s, to 600-700 per year by the mid-2010s ( National Library of Medicine, 2018 ), and this increase attracted much commentary and analysis (see, e.g., Grieneisen and Zhang, 2012 ). A recent comprehensive review of an extensive database of 18,000 retracted papers
6 See https://cega.berkeley.edu/resource/the-state-of-social-science-betsy-levy-paluck-bitssannual-meeting-2018 .
dating back to the 1970s found that while the number of retractions has grown, the rate of increase has slowed; approximately 4 of every 10,000 papers are now retracted ( Brainard, 2018 ). Overall, the number of journals that report retractions has grown from 44 journals in 1997 to 488 journals in 2016; however, the average number of retractions per journal has remained essentially flat since 1997.
These data suggest that more journals are attending to the problem of articles that need to be retracted rather than a growing problem in any one discipline of science. Fewer than 2 percent of authors in the database account for more than one-quarter of the retracted articles, and the retractions of these frequent offenders are usually based on fraud rather than errors that lead to non-replicability. The Institute of Electrical and Electronics Engineers alone has retracted more than 7,000 abstracts from conferences that took place between 2009 and 2011, most of which had authors based in China ( McCook, 2018 ).
The body of evidence on the extent of non-replicability gathered by the committee is not a comprehensive assessment across all fields of science, nor even within any given field of study. Such a comprehensive effort would be daunting due to the vast amount of research published each year and the diversity of scientific and engineering fields. Among the studies of replication that are available, there is no uniform approach across scientific fields for gauging replication between two studies. The experts who contributed their perspectives to the committee all question the feasibility of such a science-wide assessment of non-replicability.
While the evidence base assessed by the committee may not be sufficient to permit a firm quantitative answer on the scope of non-replicability, it does support several findings and a conclusion.
FINDING 5-1: There is an uneven level of awareness of issues related to replicability across fields and even within fields of science and engineering.
FINDING 5-2: Efforts to replicate studies aimed at discerning the effect of an intervention in a study population may find a similar direction of effect, but a different (often smaller) size of effect.
FINDING 5-3: Studies that directly measure replicability take substantial time and resources.
FINDING 5-4: Comparing results across replication studies may be compromised because different replication studies may test different study attributes and rely on different standards and measures for a successful replication.
FINDING 5-5: Replication studies in the natural and clinical sciences (general biology, genetics, oncology, chemistry) and social sciences (including economics and psychology) report frequencies of replication ranging from fewer than one out of five studies to more than three out of four studies.
CONCLUSION 5-3: Because many scientists routinely conduct replication tests as part of follow-on work and do not report replication results separately, the evidence base of non-replicability across all science and engineering research is incomplete.
SOURCES OF NON-REPLICABILITY
Non-replicability can arise from a number of sources. In some cases, it arises from the inherent characteristics of the systems under study. In others, decisions made by a researcher in carrying out the study that reasonably differ from the original, such as judgment calls on data cleaning or the selection of parameter values within a model, may also result in non-replication. Other sources of non-replicability arise from conscious or unconscious bias in reporting, mistakes and errors (including misuse of statistical methods), and problems in study design, execution, or interpretation in either the original study or the replication attempt. In many instances, non-replication between two results could be due to a combination of multiple sources, but it is generally not possible to identify the source without careful examination of the two studies. Below, we review these sources of non-replicability and discuss how researchers' choices can affect each. Unless otherwise noted, the discussion below focuses on non-replicability between two results (i.e., a one-to-one comparison) when assessed using the proximity and uncertainty of both results.
Non-Replicability That Is Potentially Helpful to Science
Non-replicability is a normal part of the scientific process and can be due to the intrinsic variation and complexity of nature, the scope of current scientific knowledge, and the limits of current technologies. Highly surprising and unexpected results are often not replicated by other researchers. In other instances, a second researcher or research team may purposefully make decisions that lead to differences in parts of the study. As long as these differences are reported with the final results, these may be reasonable actions to take yet result in non-replication. In scientific reporting, uncertainties within the study (such as the uncertainty within measurements, the potential interactions between parameters, and the variability of the
system under study) are estimated, assessed, characterized, and accounted for through uncertainty and probability analysis. When uncertainties are unknown and not accounted for, this can also lead to non-replicability. In these instances, non-replicability of results is a normal consequence of studying complex systems with imperfect knowledge and tools. When non-replication of results due to sources such as those listed above is investigated and resolved, it can lead to new insights, better uncertainty characterization, and increased knowledge about the systems under study and the methods used to study them. See Box 5-1 for examples of how investigations of non-replication have been helpful to increasing knowledge.
The susceptibility of any line of scientific inquiry to sources of non-replicability depends on many factors, including factors inherent to the system under study, such as the
- complexity of the system under study;
- understanding of the number and relations among variables within the system under study;
- ability to control the variables;
- levels of noise within the system (or signal to noise ratios);
- mismatch between the scale of the phenomena under study and the scale at which they can be measured;
- stability across time and space of the underlying principles;
- fidelity of the available measures to the underlying system under study (e.g., direct or indirect measurements); and
- prior probability (pre-experimental plausibility) of the scientific hypothesis.
Studies that pursue lines of inquiry that are able to better estimate and analyze the uncertainties associated with the variables in the system and control the methods that will be used to conduct the experiment are more replicable. On the other end of the spectrum, studies that are more prone to non-replication often involve indirect measurement of very complex systems (e.g., human behavior) and require statistical analysis to draw conclusions. To illustrate how these characteristics can lead to results that are more or less likely to replicate, consider the attributes of complexity and controllability. The complexity and controllability of a system contribute to the underlying variance of the distribution of expected results and thus the likelihood of non-replication. 7
7 Complexity and controllability in an experimental system affect its susceptibility to non-replicability independently from the way prior odds, power, or p- values associated with hypothesis testing affect the likelihood that an experimental result represents the true state of the world.
The systems that scientists study vary in their complexity. Although all systems have some degree of intrinsic or random variability, some systems are less well understood, and their intrinsic variability is more difficult to assess or estimate. Complex systems tend to have numerous interacting components (e.g., cell biology, disease outbreaks, friction coefficient between two unknown surfaces, urban environments, complex organizations and populations, and human health). Interrelations and interactions among multiple components cannot always be predicted and neither can the resulting effects on the experimental outcomes, so an initial estimate of uncertainty may be an educated guess.
Systems under study also vary in their controllability. If the variables within a system can be known, characterized, and controlled, research on such a system tends to produce more replicable results. For example, in social sciences, a person’s response to a stimulus (e.g., a person’s behavior when placed in a specific situation) depends on a large number of variables—including social context, biological and psychological traits, verbal and nonverbal cues from researchers—all of which are difficult or impossible to control completely. In contrast, a physical object’s response to a physical stimulus (e.g., a liquid’s response to a rise in temperature) depends almost entirely on variables that can either be controlled or adjusted for, such as temperature, air pressure, and elevation. Because of these differences, one expects that studies that are conducted in the relatively more controllable systems will replicate with greater frequency than those that are in less controllable systems. Scientists seek to control the variables relevant to the system under study and the nature of the inquiry, but when these variables are more difficult to control, the likelihood of non-replicability will be higher. Figure 5-2 illustrates the combinations of complexity and controllability.
Many scientific fields have studies that span these quadrants, as demonstrated by the following examples from engineering, physics, and psychology. Veronique Kiermer, PLOS executive editor, in her briefing to the committee noted: “There is a clear correlation between the complexity of the design, the complexity of measurement tools, and the signal to noise ratio that we are trying to measure.” (See also Goodman et al., 2016 , on the complexity of statistical and inferential methods.)
Engineering . Aluminum-lithium alloys were developed by engineers because of their strength-to-weight ratio, primarily for use in aerospace engineering. The process of developing these alloys spans the four quadrants. Early generation of binary alloys was a simple system that showed high replicability (Quadrant A). Second-generation alloys had higher amounts of lithium and resulted in lower replicability that appeared as failures in manufacturing operations because the interactions of the elements were not understood (Quadrant C). The third-generation alloys contained less
lithium and higher relative amounts of other alloying elements, which made it a more complex system but better controlled (Quadrant B), with improved replicability. The development of any alloy is subject to a highly controlled environment. Unknown aspects of the system, such as interactions among the components, cannot be controlled initially and can lead to failures. Once these are understood, conditions can be modified (e.g., heat treatment) to bring about higher replicability.
Physics. In physics, measurement of the electronic band gap of semiconducting and conducting materials using scanning tunneling microscopy is a highly controlled, simple system (Quadrant A). The searches for the Higgs boson and gravitational waves were separate efforts, and each required the development of large, complex experimental apparatus and careful characterization of the measurement and data analysis systems (Quadrant B). Some systems, such as radiation portal monitors, require setting thresholds for alarms without knowledge of when or whether a threat will ever pass through them; the variety of potential signatures is high and there is little controllability of the system during operation (Quadrant C). Finally, a simple system with little controllability is that of precisely predicting the path of a feather dropped from a given height (Quadrant D).
Psychology. In psychology, Quadrant A includes studies of basic sensory and perceptual processes that are common to all human beings, such
as the Purkinje shift (i.e., a change in the sensitivity of the human eye under different levels of illumination). Quadrant D includes studies of complex social behaviors that are influenced by culture and context; for example, a study of the effects of a father's absence on children's ability to delay gratification revealed stronger effects among younger children ( Mischel, 1961 ).
Inherent sources of non-replicability arise in every field of science, but they can vary widely depending on the specific system under study. When these sources are knowable, or arise from experimental design choices, researchers need to identify and assess them insofar as they can be estimated. Researchers also need to report steps that were taken to reduce uncertainties inherent in the study, as well as choices that differ from the original study (e.g., data cleaning decisions that resulted in a different final dataset). The committee agrees with those who argue that the testing of assumptions and the characterization of the components of a study are as important to report as the ultimate results of the study ( Plant and Hanisch, 2018 ), including for studies using statistical inference and reporting p-values ( Boos and Stefanski, 2011 ). Every scientific inquiry encounters an irreducible level of uncertainty, whether due to random processes in the system under study, limits to researchers' understanding or ability to control that system, or limitations in the ability to measure. If researchers do not adequately consider and report these uncertainties and limitations, this can contribute to non-replicability.
RECOMMENDATION 5-1: Researchers should, as applicable to the specific study, provide an accurate and appropriate characterization of relevant uncertainties when they report or publish their research. Researchers should thoughtfully communicate all recognized uncertainties and estimate or acknowledge other potential sources of uncertainty that bear on their results, including stochastic uncertainties and uncertainties in measurement, computation, knowledge, modeling, and methods of analysis.
Unhelpful Sources of Non-Replicability
Non-replicability can also be the result of human error or poor researcher choices. Shortcomings in the design, conduct, and communication of a study may all contribute to non-replicability.
These defects may arise at any point along the process of conducting research, from design and conduct to analysis and reporting, and errors may be made because the researcher was ignorant of best practices, was sloppy in carrying out research, made a simple error, or had unconscious bias toward a specific outcome. Whether arising from lack of knowledge, perverse incentives, sloppiness, or bias, these sources of non-replicability
warrant continued attention because they reduce the efficiency with which science progresses, and time spent resolving non-replicability issues caused by these sources does not add to scientific understanding. That is, they are unhelpful in making scientific progress. We consider here a selected set of such avoidable sources of non-replication:
- publication bias
- misaligned incentives
- inappropriate statistical inference
- poor study design
- incomplete reporting of a study
We will discuss each source in turn.
Publication Bias
Both researchers and journals want to publish new, innovative, groundbreaking research. The publication preference for statistically significant, positive results produces a biased literature through the exclusion of statistically nonsignificant results (i.e., results for which the observed effect would not be sufficiently unlikely if the null hypothesis were true). As noted in Chapter 2 , there is great pressure to publish in high-impact journals and for researchers to make new discoveries. Furthermore, it may be difficult for researchers to publish even robust nonsignificant results, except in circumstances where the results contradict what has come to be an accepted positive effect. Replication studies and studies with valuable data but inconclusive results may be similarly difficult to publish. This publication bias results in a published literature that does not reflect the full range of evidence about a research topic.
One powerful example is a set of clinical studies performed on the effectiveness of tamoxifen, a drug used to treat breast cancer. In a systematic review (see Chapter 7 ) of the drug's effectiveness, 23 clinical trials were reviewed; 22 of the 23 individual trials did not reach the criterion of p < 0.05, yet the cumulative review of the set of studies showed a large effect (a reduction of 16% [±3] in the odds of death among women of all ages assigned to tamoxifen treatment [ Peto et al., 1988 , p. 1684]).
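The arithmetic behind this pattern is easy to reproduce. The sketch below pools a set of small, individually imprecise estimates with a fixed-effect (inverse-variance) average; the effect sizes and standard errors are simulated for illustration only and are not the tamoxifen trial data.

```python
# Illustration of how individually nonsignificant studies can yield a clearly
# significant pooled estimate. Effect sizes and standard errors are simulated,
# not taken from the tamoxifen trials.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect = -0.15                    # e.g., a log odds ratio favoring treatment
se = np.full(23, 0.12)                 # each trial is small and imprecise
estimates = rng.normal(true_effect, se)

# Per-study two-sided p-values: with this little power, most studies
# will not reach p < 0.05 on their own.
z = estimates / se
p_per_study = 2 * stats.norm.sf(np.abs(z))
print("studies with p < 0.05:", int((p_per_study < 0.05).sum()), "of", len(estimates))

# Fixed-effect (inverse-variance) pooled estimate across all studies.
w = 1 / se**2
pooled = np.sum(w * estimates) / np.sum(w)
pooled_se = np.sqrt(1 / np.sum(w))
pooled_p = 2 * stats.norm.sf(abs(pooled / pooled_se))
print(f"pooled estimate = {pooled:.3f} +/- {1.96 * pooled_se:.3f}, p = {pooled_p:.2g}")
```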
Another approach to quantifying the extent of non-replicability is to model the false discovery rate—that is, the number of research results that are expected to be “false.” Ioannidis (2005) developed a simulation model to do so for studies that rely on statistical hypothesis testing, incorporating the pre-study (i.e., prior) odds, the statistical tests of significance, investigator bias, and other factors. Ioannidis concluded, and used as the title of his paper,
that “most published research findings are false.” Some researchers have criticized Ioannidis’s assumptions and mathematical argument ( Goodman and Greenland, 2007 ); others have pointed out that the takeaway message is that any initial results that are statistically significant need further confirmation and validation.
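A minimal sketch of the core of such a model, considering only prior odds, the significance threshold, and statistical power (the full model in Ioannidis, 2005, adds terms for investigator bias and multiple competing teams), computes the post-study probability that a claimed finding is true. The example inputs are hypothetical.

```python
# Post-study probability that a "statistically significant" finding is true,
# following the structure of Ioannidis's (2005) model without the bias term.
def positive_predictive_value(prior_odds, alpha=0.05, power=0.80):
    """prior_odds = R, the ratio of true to false relationships being tested."""
    true_positives = power * prior_odds      # true relationships correctly flagged
    false_positives = alpha * 1.0            # false relationships flagged at rate alpha
    return true_positives / (true_positives + false_positives)

# Well-powered confirmatory test of a plausible hypothesis (1:1 prior odds).
print(positive_predictive_value(prior_odds=1.0))               # ~0.94

# Underpowered, exploratory setting (1:100 prior odds, 20% power).
print(positive_predictive_value(prior_odds=0.01, power=0.20))  # ~0.04
```

The contrast between the two calls shows why low prior odds and low power, rather than the significance threshold alone, drive the pessimistic conclusion.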
Analyzing the distribution of published results for a particular line of inquiry can offer insights into potential bias, which can relate to the rate of non-replicability. Several tools are being developed to compare a distribution of results to what that distribution would look like if all claimed effects were representative of the true distribution of effects. Figure 5-3 shows how publication bias can result in a skewed view of the body of evidence when only positive results that meet the statistical significance threshold are reported. When a new study fails to replicate the previously published results—for example, if a study finds no relationship between variables when such a relationship had been shown in previously published studies—it appears to be a case of non-replication. However, if the published literature is not an accurate reflection of the state of the evidence because only positive results are regularly published, the new study could actually have replicated previous but unpublished negative results. 8
Several techniques are available to detect and potentially adjust for publication bias, all of which are based on the examination of a body of research as a whole (i.e., cumulative evidence), rather than individual replication studies (i.e., one-to-one comparisons between studies). These techniques cannot determine which of the individual studies are affected by bias (i.e., which results are false positives) or identify the particular type of bias, but they arguably allow one to identify bodies of literature that are likely to be more or less accurate representations of the evidence. The techniques, discussed below, are funnel plots, p-curve analysis, the test of excess significance, and assessing unpublished literature.
Funnel Plots. One of the most common approaches to detecting publication bias involves constructing a funnel plot that displays each effect size against its precision (e.g., sample size of study). Asymmetry in the plotted values can reveal the absence of studies with small effect sizes, especially in studies with small sample sizes—a pattern that could suggest publication/selection bias for statistically significant effects (see Figure 5-3 ). There are criticisms of funnel plots, however; some argue that the shape of a funnel plot is largely determined by the choice of method ( Tang and Liu, 2000 ),
8 Earlier in this chapter, we discuss an indirect method for assessing non-replicability in which a result is compared to previously published values; results that do not agreed with the published literature are identified as outliers. If the published literature is biased, this method would inappropriately reject valid results. This is another reason for investigating outliers before rejecting them.
and others maintain that funnel plot asymmetry may not accurately reflect publication bias ( Lau et al., 2006 ).
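A funnel plot is straightforward to construct once effect sizes and their standard errors have been extracted from a body of literature. The sketch below simulates a literature in which nonsignificant results are less likely to be published and plots the surviving estimates with matplotlib; the data are simulated and do not come from any study cited in this chapter. The hollowed-out region of small, near-zero estimates among the imprecise studies is the asymmetry that funnel plots are meant to reveal.

```python
# Funnel plot sketch: effect size against precision for a simulated literature
# in which statistically nonsignificant results are less likely to be published.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(7)
n_studies = 300
se = rng.uniform(0.05, 0.5, n_studies)        # studies vary widely in precision
effects = rng.normal(0.2, se)                 # true underlying effect = 0.2
p = 2 * stats.norm.sf(np.abs(effects) / se)

# Publication filter: significant results are always published,
# nonsignificant results only 20% of the time.
published = (p < 0.05) | (rng.random(n_studies) < 0.2)

plt.scatter(effects[published], 1 / se[published], s=12)
plt.axvline(0.2, linestyle="--")              # the true effect, for reference
plt.xlabel("estimated effect size")
plt.ylabel("precision (1 / standard error)")
plt.title("Simulated funnel plot with publication bias")
plt.show()
```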
P-Curve. One fairly new approach is to compare the distribution of results (e.g., p-values) to the expected distributions (see Simonsohn et al., 2014a , 2014b ). P-curve analysis tests whether the distribution of statistically significant p-values shows a pronounced right-skew, 9 as would be expected when the results reflect true effects (i.e., the null hypothesis is false), or whether the distribution is not as right-skewed (or is even flat or, in the most extreme cases, left-skewed), as would be expected when the original results do not reflect the proportion of real effects ( Gadbury and Allison, 2012 ; Nelson et al., 2018 ; Simonsohn et al., 2014a ).
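The published p-curve method combines the full distribution of significant p-values; the sketch below is a deliberately crude check in the same spirit, using only a binomial comparison of how many significant p-values fall below .025 (under the null, significant p-values are uniform on (0, .05), so about half should). The two sets of p-values are invented for illustration.

```python
# Simplified p-curve-style check: among statistically significant p-values,
# are very low values (p < .025) overrepresented relative to the flat
# distribution expected when there is no true effect?
from scipy import stats

def crude_p_curve_test(significant_p_values):
    below_025 = sum(p < 0.025 for p in significant_p_values)
    n = len(significant_p_values)
    # One-sided binomial test against the 50/50 split expected under the null.
    return stats.binomtest(below_025, n, 0.5, alternative="greater").pvalue

# Hypothetical sets of significant p-values from two literatures.
right_skewed = [0.001, 0.002, 0.003, 0.004, 0.006, 0.008, 0.009, 0.011,
                0.013, 0.015, 0.018, 0.021, 0.024, 0.032, 0.041]
flat = [0.004, 0.009, 0.013, 0.018, 0.022, 0.026, 0.029, 0.031,
        0.035, 0.038, 0.041, 0.043, 0.045, 0.047, 0.049]

print(crude_p_curve_test(right_skewed))  # small value: consistent with true effects
print(crude_p_curve_test(flat))          # large value: no evidence of right skew
```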
Test of Excess Significance. A closely related statistical idea for checking publication bias is the test of excess significance. This test evaluates whether the number of statistically significant results in a set of studies is improbably high given the size of the effect and the power to test it in the set of studies ( Ioannidis and Trikalinos, 2007 ), which would imply that the set of results is biased and may include exaggerated results or false positives. When there is a true effect, one expects the proportion of statistically significant results to be equal to the statistical power of the studies. If a researcher's studies have an estimated statistical power of 80 percent, then about 80 percent of them would be expected to produce statistically significant results when the effect is real and as large as assumed (fewer when the effect is smaller or the null hypothesis is sometimes true). Schimmack (2012) has demonstrated that the proportion of statistically significant results across a set of psychology studies often far exceeds the estimated statistical power of those studies; this pattern of results that is "too good to be true" suggests either that the results were not obtained following the rules of statistical inference (i.e., conducting a single statistical test chosen a priori), that not all studies attempted were reported (i.e., there is a "file drawer" of statistically nonsignificant studies that do not get published), or that the results were p-hacked or cherry picked (see Chapter 2 ).
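The logic can be sketched in a few lines: the expected number of significant results in a set of studies is the sum of their statistical powers, and an observed count far above that expectation is improbable if all conducted tests are reported honestly. The power estimates below are hypothetical, and the binomial test using mean power is a simplification of the published method.

```python
# Sketch of the test-of-excess-significance logic (Ioannidis and Trikalinos,
# 2007): compare the observed number of significant results in a set of
# studies with the number expected from their estimated statistical power.
import numpy as np
from scipy import stats

estimated_power = np.array([0.35, 0.42, 0.38, 0.50, 0.30,
                            0.45, 0.40, 0.36, 0.33, 0.48])  # hypothetical values
observed_significant = 9          # 9 of the 10 reported studies were significant

expected_significant = estimated_power.sum()                # roughly 4 of 10
# Approximate test treating the mean power as a common success probability.
p_excess = stats.binomtest(observed_significant, len(estimated_power),
                           estimated_power.mean(), alternative="greater").pvalue

print(f"expected ~{expected_significant:.1f} significant results, observed {observed_significant}")
print(f"probability of at least this many by chance: {p_excess:.4f}")
```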
In many fields, the proportion of published papers that report a positive (i.e., statistically significant) result is around 90 percent ( Fanelli, 2012 ). This raises concerns when combined with the observation that most studies have far less than 90 percent statistical power (i.e., would only successfully detect an effect, assuming an effect exists, far less than 90% of the time) ( Button et al., 2013 ; Fraley and Vazire, 2014 ; Szucs and Ioannidis, 2017 ; Yarkoni, 2009 ; Stanley et al., 2018 ). Some researchers believe that the
9 Distributions that have more p -values of low value than high are referred to as “right-skewed.” Similarly, “left-skewed” distributions have more p -values of high than low value.
publication of false positives is common and that reforms are needed to reduce this. Others believe that there has been an excessive focus on Type I errors (i.e., false positives) in hypothesis testing at the possible expense of an increase in Type II errors (i.e., false negatives, or failing to confirm true hypotheses) ( Fiedler et al., 2012 ; Finkel et al., 2015 ; LeBel et al., 2017 ).
Assessing Unpublished Literature. One approach to countering publication bias is to search for and include unpublished papers and results when conducting a systematic review of the literature. Such comprehensive searches are not standard practice. For medical reviews, one estimate is that only 6 percent of reviews included unpublished work ( Hartling et al., 2017 ), although another found that 50 percent of reviews did so ( Ziai et al., 2017 ). In economics, there is a large and active group of researchers collecting and sharing “grey” literature, research results outside of peer reviewed publications ( Vilhuber, 2018 ). In psychology, an estimated 75 percent of reviews included unpublished research ( Rothstein, 2006 ). Unpublished but recorded studies (such as dissertation abstracts, conference programs, and research aggregation websites) may become easier for reviewers to access with computerized databases and with the availability of preprint servers. When a review includes unpublished studies, researchers can directly compare their results with those from the published literature, thereby estimating file-drawer effects.
Misaligned Incentives
Academic incentives—such as tenure, grant money, and status—may influence scientists to compromise on good research practices ( Freeman, 2018 ). Faculty hiring, promotion, and tenure decisions are often based in large part on the “productivity” of a researcher, such as the number of publications, number of citations, and amount of grant money received ( Edwards and Roy, 2017 ). Some have suggested that these incentives can lead researchers to ignore standards of scientific conduct, rush to publish, and overemphasize positive results ( Edwards and Roy, 2017 ). Formal models have shown how these incentives can lead to high rates of non-replicable results ( Smaldino and McElreath, 2016 ). Many of these incentives may be well intentioned, but they could have the unintended consequence of reducing the quality of the science produced, and poorer quality science is less likely to be replicable.
Although it is difficult to assess how widespread these unhelpful sources of non-replicability are, factors such as publication bias toward results qualifying as "statistically significant" and misaligned incentives for academic scientists create conditions that favor the publication of non-replicable results and inferences.
Inappropriate Statistical Inference
Confirmatory research is research that starts with a well-defined research question and a priori hypotheses before collecting data; confirmatory research can also be called hypothesis-testing research. In contrast, researchers pursuing exploratory research collect data and then examine the data for potential variables of interest and relationships among variables, forming a posteriori hypotheses; as such, exploratory research can be considered hypothesis-generating research. Exploratory and confirmatory analyses are often described as two different stages of the research process. Some have distinguished between the "context of discovery" and the "context of justification" ( Reichenbach, 1938 ), while others have argued that the distinction lies on a spectrum rather than being categorical. Regardless of the precise line between exploratory and confirmatory research, researchers' choices between the two affect how they and others interpret the results.
A fundamental principle of hypothesis testing is that the same data that were used to generate a hypothesis cannot be used to test that hypothesis ( de Groot, 2014 ). In confirmatory research, the details of how a statistical hypothesis test will be conducted must be decided before looking at the data on which it is to be tested. When this principle is violated, significance testing, confidence intervals, and error control are compromised. Thus, it cannot be assured that false positives are controlled at a fixed rate. In short, when exploratory research is interpreted as if it were confirmatory research, there can be no legitimate statistically significant result.
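A small simulation makes the point concrete: when the most promising of many exploratory comparisons is "tested" on the same data that suggested it, nominally significant results appear far more often than the advertised 5 percent rate, whereas testing the selected hypothesis on new data keeps the error rate where it should be. The sketch below assumes purely null data, so every significant result is a false positive.

```python
# Simulation: selecting the strongest of many exploratory comparisons and then
# "confirming" it on the same data inflates the false positive rate; testing it
# on fresh data does not. All data are pure noise (no true effects).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_outcomes, n_per_group = 2000, 20, 30
same_data_hits = new_data_hits = 0

for _ in range(n_sims):
    a = rng.normal(size=(n_outcomes, n_per_group))
    b = rng.normal(size=(n_outcomes, n_per_group))
    p_explore = stats.ttest_ind(a, b, axis=1).pvalue
    best = int(np.argmin(p_explore))          # hypothesis suggested by the data
    same_data_hits += p_explore[best] < 0.05  # "test" it on the same data

    # Confirmatory test of the selected outcome on newly collected data.
    a_new = rng.normal(size=n_per_group)
    b_new = rng.normal(size=n_per_group)
    new_data_hits += stats.ttest_ind(a_new, b_new).pvalue < 0.05

print("false positive rate, same data:", same_data_hits / n_sims)   # roughly 0.64
print("false positive rate, new data: ", new_data_hits / n_sims)    # roughly 0.05
```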
Researchers often learn from their data, and some of the most important discoveries in the annals of science have come from unexpected results that did not fit any prior theory. For example, Arno Allan Penzias and Robert Woodrow Wilson found unexpected noise in data collected in the course of their work on microwave receivers for radio astronomy observations. After attempts to explain the noise failed, the “noise” was eventually determined to be cosmic microwave background radiation, and these results helped scientists to refine and confirm theories about the “big bang.” While exploratory research generates new hypotheses, confirmatory research is equally important because it tests the hypotheses generated and can give valid answers as to whether these hypotheses have any merit. Exploratory and confirmatory research are essential parts of science, but they need to be understood and communicated as two separate types of inquiry, with two different interpretations.
A well-conducted exploratory analysis can help illuminate possible hypotheses to be examined in subsequent confirmatory analyses. Even a stark result in an exploratory analysis has to be interpreted cautiously, pending further work to test the hypothesis using a new or expanded dataset. It is often unclear from publications whether the results came from an
exploratory or a confirmatory analysis. This lack of clarity can misrepresent the reliability and broad applicability of the reported results.
In Chapter 2 , we discussed the meaning, overreliance, and frequent misunderstanding of statistical significance, including misinterpreting the meaning and overstating the utility of a particular threshold, such as p < 0.05. More generally, a number of flaws in design and reporting can reduce the reliability of a study’s results.
Misuse of statistical testing often involves post hoc analyses of data already collected, making it seem as though statistically significant results provide evidence against the null hypothesis, when in fact they may have a high probability of being false positives ( John et al., 2012 ; Munafo et al., 2017 ). A study from the late-1980s gives a striking example of how such post hoc analysis can be misleading. The International Study of Infarct Survival was a large-scale, international, randomized trial that examined the potential benefit of aspirin for patients who had had a heart attack. After data collection and analysis were complete, the publishing journal asked the researchers to do additional analysis to see if certain subgroups of patients benefited more or less from aspirin. Richard Peto, one of the researchers, refused to do so because of the risk of finding invalid but seemingly significant associations. In the end, Peto relented and performed the analysis, but with a twist: he also included a post hoc analysis that divided the patients into the twelve astrological signs, and found that Geminis and Libras did not benefit from aspirin, while Capricorns benefited the most ( Peto, 2011 ). This obviously spurious relationship illustrates the dangers of analyzing data with hypotheses and subgroups that were not prespecified.
Little information is available about the prevalence of such inappropriate statistical practices as p- hacking, cherry picking, and hypothesizing after results are known (HARKing), discussed below. While surveys of researchers raise the issue—often using convenience samples—methodological shortcomings mean that they are not necessarily a reliable source for a quantitative assessment. 10
P-hacking and Cherry Picking. P-hacking is the practice of collecting, selecting, or analyzing data until a result of statistical significance is found. Different ways to p-hack include stopping data collection once p ≤ 0.05 is reached, analyzing many different relationships and only reporting those for which p ≤ 0.05, varying the exclusion and inclusion rules for data so that p ≤ 0.05, and analyzing different subgroups in order to get p ≤ 0.05. Researchers may p-hack without knowing or without understanding the consequences ( Head et al., 2015 ). This is related to the practice of cherry picking, in which researchers may (unconsciously or deliberately) pick
10 For an example of one study of this issue, see Fraser et al. (2018) .
through their data and results and selectively report those that meet criteria such as meeting a threshold of statistical significance or supporting a positive result, rather than reporting all of the results from their research.
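One of these practices, stopping data collection as soon as p ≤ 0.05 is reached, is easy to simulate. In the sketch below there is no true effect, yet testing after every batch of observations and stopping at the first nominally significant result inflates the false positive rate well above the 5 percent the threshold advertises; the exact inflation depends on how often and how long one keeps checking.

```python
# Simulation of one p-hacking practice: optional stopping. The data are tested
# after every batch of 10 observations per group, and collection stops as soon
# as p <= 0.05. There is no true effect, so every "significant" result is a
# false positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, batch, max_n = 2000, 10, 100
false_positives = 0

for _ in range(n_sims):
    a, b = [], []
    while len(a) < max_n:
        a.extend(rng.normal(size=batch))
        b.extend(rng.normal(size=batch))
        if stats.ttest_ind(a, b).pvalue <= 0.05:
            false_positives += 1
            break

print("nominal rate: 0.05")
print("rate with optional stopping:", false_positives / n_sims)   # roughly 0.15-0.20
```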
HARKing. Confirmatory research begins with identifying a hypothesis based on observations, exploratory analysis, or previous research. Data are then collected and analyzed to see if they support the hypothesis. HARKing refers to presenting research as confirmatory when the hypothesis was in fact derived from the data collected, with those same data then used as evidence to support the hypothesis. It is unknown to what extent inappropriate HARKing occurs in various disciplines, but some have attempted to quantify its consequences. For example, a 2015 article compared hypothesized effect sizes against non-hypothesized effect sizes and found that effects were significantly larger when the relationships had been hypothesized, a finding consistent with the presence of HARKing ( Bosco et al., 2015 ).
Poor Study Design
Before conducting an experiment, a researcher must make a number of decisions about study design. These decisions—which vary depending on type of study—could include the research question, the hypotheses, the variables to be studied, avoiding potential sources of bias, and the methods for collecting, classifying, and analyzing data. Researchers’ decisions at various points along this path can contribute to non-replicability. Poor study design can include not recognizing or adjusting for known biases, not following best practices in terms of randomization, poorly designing materials and tools (ranging from physical equipment to questionnaires to biological reagents), confounding in data manipulation, using poor measures, or failing to characterize and account for known uncertainties.
In 2010, economists Carmen Reinhart and Kenneth Rogoff published an article showing that when a country's debt exceeds 90 percent of its gross domestic product, economic growth slows and declines slightly (by 0.1 percent). These results were widely publicized and used to support austerity measures around the world ( Herndon et al., 2013 ). However, in 2013, with access to Reinhart and Rogoff's original spreadsheet of data and analysis (which the authors had saved and made available for the replication effort), researchers reanalyzing the original studies found several errors in the analysis and data selection. One error was an incomplete set of countries used in the analysis that established the relationship between debt and economic growth. When data from Australia, Austria, Belgium, Canada,
and Denmark were correctly included, and other errors were corrected, the economic growth in the countries with debt above 90 percent of gross domestic product was actually +2.2 percent, rather than –0.1. In response, Reinhart and Rogoff acknowledged the errors, calling it “sobering that such an error slipped into one of our papers despite our best efforts to be consistently careful.” Reinhart and Rogoff said that while the error led to a “notable change” in the calculation of growth in one category, they did not believe it “affects in any significant way the central message of the paper.” 11
The Reinhart and Rogoff error was fairly high profile and a quick Internet search would let any interested reader know that the original paper contained errors. Many errors could go undetected or are only acknowledged through a brief correction in the publishing journal. A 2015 study looked at a sample of more than 250,000 p- values reported in eight major psychology journals over a period of 28 years. The study found that many of the p- values reported in papers were inconsistent with a recalculation of the p- value and that in one out of eight papers, this inconsistency was large enough to affect the statistical conclusion ( Nuijten et al., 2016 ).
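The kind of consistency check used in that study can be sketched simply: recompute the p-value from the reported test statistic and degrees of freedom and compare it with the p-value the paper reports. The reported values below are invented for illustration and do not come from any real article.

```python
# Sketch of a statcheck-style consistency check: recompute a p-value from a
# reported t statistic and degrees of freedom, then compare it with the
# p-value the paper reports. The example values are invented.
from scipy import stats

def check_t_test(t, df, reported_p, tol=0.005):
    recomputed = 2 * stats.t.sf(abs(t), df)     # two-sided p-value
    return recomputed, abs(recomputed - reported_p) <= tol

# Consistent report: t(28) = 2.10, p = .045
print(check_t_test(t=2.10, df=28, reported_p=0.045))

# Inconsistent report: t(28) = 1.70, p = .04 (recomputation gives about .10,
# which would flip the conclusion at the .05 threshold)
print(check_t_test(t=1.70, df=28, reported_p=0.04))
```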
Errors can occur at any point in the research process: measurements can be recorded inaccurately, typographical errors can occur when inputting data, and calculations can contain mistakes. If these errors affect the final results and are not caught prior to publication, the research may be non-replicable. Unfortunately, these types of errors can be difficult to detect. In the case of computational errors, transparency in data and computation may make it more likely that the errors can be caught and corrected. For other errors, such as mistakes in measurement, errors might not be detected until and unless a failed replication that does not make the same mistake indicates that something was amiss in the original study. Errors may also be made by researchers despite their best intentions (see Box 5-2 ).
Incomplete Reporting of a Study
During the course of research, researchers make numerous choices about their studies. When a study is published, some of these choices are reported in the methods section. A methods section often covers what materials were used, how participants or samples were chosen, what data collection procedures were followed, and how data were analyzed. The failure to report some aspect of the study—or to do so in sufficient detail—may make it difficult for another researcher to replicate the result. For example, if a researcher only reports that she “adjusted for comorbidities” within the study population, this does not provide sufficient information about how
11 See https://archive.nytimes.com/www.nytimes.com/interactive/2013/04/17/business/17economixresponse.html .
exactly the comorbidities were adjusted for, and it does not give enough guidance for future researchers to follow the protocol. Similarly, if a researcher does not give adequate information about the biological reagents used in an experiment, a second researcher may have difficulty replicating the experiment. Even if a researcher reports all of the critical information about the conduct of a study, other seemingly inconsequential details that have an effect on the outcome could remain unreported.
Just as reproducibility requires transparent sharing of data, code, and analysis, replicability requires transparent sharing of how an experiment was conducted and the choices that were made. This allows future researchers, if they wish, to attempt replication as close to the original conditions as possible.
Fraud and Misconduct
At the extreme, sources of non-replicability that do not advance scientific knowledge—and do much to harm science—include misconduct and fraud in scientific research. Instances of fraud are uncommon, but can be sensational. Despite fraud’s infrequent occurrence and regardless of how
highly publicized cases may be, the fact that it is uniformly bad for science means that it is worthy of attention within this study.
Researchers who knowingly use questionable research practices with the intent to deceive are committing misconduct or fraud. It can be difficult in practice to differentiate between honest mistakes and deliberate misconduct because the underlying action may be the same while the intent is not.
Reproducibility and replicability emerged as general concerns in science around the same time as research misconduct and detrimental research practices were receiving renewed attention. Interest in both reproducibility and replicability as well as misconduct was spurred by some of the same trends and a small number of widely publicized cases in which discovery of fabricated or falsified data was delayed, and the practices of journals, research institutions, and individual labs were implicated in enabling such delays ( National Academies of Sciences, Engineering, and Medicine, 2017 ; Levelt Committee et al., 2012 ).
In the case of Anil Potti at Duke University, a researcher using genomic analysis on cancer patients was later found to have falsified data. This experience prompted the study and the report, Evolution of Translational Omics: Lessons Learned and the Way Forward ( Institute of Medicine, 2012 ), which in turn led to new guidelines for omics research at the National Cancer Institute. Around the same time, in a case that came to light in the Netherlands, social psychologist Diederik Stapel had gone from manipulating to fabricating data over the course of a career with dozens of fraudulent publications. Similarly, highly publicized concerns about misconduct by Cornell University professor Brian Wansink highlight how consistent failure to adhere to best practices for collecting, analyzing, and reporting data—intentional or not—can blur the line between helpful and unhelpful sources of non-replicability. In this case, a Cornell faculty committee ascribed to Wansink: “academic misconduct in his research and scholarship, including misreporting of research data, problematic statistical techniques, failure to properly document and preserve research results, and inappropriate authorship.” 12
A subsequent report, Fostering Integrity in Research ( National Academies of Sciences, Engineering, and Medicine, 2017 ), emerged in this context, and several of its central themes are relevant to questions posed in this report.
According to the definition adopted by the U.S. federal government in 2000, research misconduct is fabrication of data, falsification of data, or plagiarism “in proposing, performing, or reviewing research, or in reporting research results” ( Office of Science and Technology Policy, 2000 , p. 76262). The federal policy requires that research institutions report all
12 See http://statements.cornell.edu/2018/20180920-statement-provost-michael-kotlikoff.cfm .
allegations of misconduct in research projects supported by federal funding that have advanced from the inquiry stage to a full investigation, and report the results of those investigations.
Other detrimental research practices (see National Academies of Sciences, Engineering, and Medicine, 2017 ) include failing to follow sponsor requirements or disciplinary standards for retaining data, authorship misrepresentation other than plagiarism, refusing to share data or methods, and misleading statistical analysis that falls short of falsification. In addition to the behaviors of individual researchers, detrimental research practices also include actions taken by organizations, such as failure on the part of research institutions to maintain adequate policies, procedures, or capacity to foster research integrity and assess research misconduct allegations, and abusive or irresponsible publication practices by journal editors and peer review.
Just as information on rates of non-reproducibility and non-replicability in research is limited, knowledge about research misconduct and detrimental research practices is scarce. Reports of research misconduct allegations and findings are released by the National Science Foundation Office of Inspector General and the Department of Health and Human Services Office of Research Integrity (see National Science Foundation, 2018d ). As discussed above, new analyses of retraction trends have shed some light on the frequency of occurrence of fraud and misconduct. Allegations and findings of misconduct increased from the mid-2000s to the mid-2010s but may have leveled off in the past few years.
Analysis of retractions of scientific articles in journals may also shed some light on the problem ( Steen et al., 2013 ). One analysis of biomedical articles found that misconduct was responsible for more than two-thirds of retractions ( Fang et al., 2012 ). As mentioned earlier, a wider analysis of all retractions of scientific papers found about one-half attributable to misconduct or fraud ( Brainard, 2018 ). Others have found some differences according to discipline ( Grieneisen and Zhang, 2012 ).
One theme of Fostering Integrity in Research is that research misconduct and detrimental research practices are a continuum of behaviors ( National Academies of Sciences, Engineering, and Medicine, 2017 ). While current policies and institutions aimed at preventing and dealing with research misconduct are certainly necessary, detrimental research practices likely arise from some of the same causes and may cost the research enterprise more than misconduct does in terms of resources wasted on the fabricated or falsified work, resources wasted on following up this work, harm to public health due to treatments based on acceptance of incorrect clinical results, reputational harm to collaborators and institutions, and others.
No branch of science is immune to research misconduct, and the committee did not find any basis to differentiate the relative level of occurrence
in various branches of science. Some but not all researcher misconduct has been uncovered through reproducibility and replication attempts, which are the self-correcting mechanisms of science. From the available evidence, documented cases of researcher misconduct are relatively rare, as suggested by a rate of retractions in scientific papers of approximately 4 in 10,000 ( Brainard, 2018 ).
CONCLUSION 5-4: The occurrence of non-replicability is due to multiple sources, some of which impede and others of which promote progress in science. The overall extent of non-replicability is an inadequate indicator of the health of science.
One of the pathways by which the scientific community confirms the validity of a new scientific discovery is by repeating the research that produced it. When a scientific effort fails to independently confirm the computations or results of a previous study, some fear that it may be a symptom of a lack of rigor in science, while others argue that such an observed inconsistency can be an important precursor to new discovery.
Concerns about reproducibility and replicability have been expressed in both scientific and popular media. As these concerns came to light, Congress requested that the National Academies of Sciences, Engineering, and Medicine conduct a study to assess the extent of issues related to reproducibility and replicability and to offer recommendations for improving rigor and transparency in scientific research.
Reproducibility and Replicability in Science defines reproducibility and replicability and examines the factors that may lead to non-reproducibility and non-replicability in research. Unlike the typical expectation of reproducibility between two computations, expectations about replicability are more nuanced, and in some cases a lack of replicability can aid the process of scientific discovery. This report provides recommendations to researchers, academic institutions, journals, and funders on steps they can take to improve reproducibility and replicability in science.
Why is Replication in Research Important?
Replication in research is important because it allows for the verification and validation of study findings, building confidence in their reliability and generalizability. It also fosters scientific progress by promoting the discovery of new evidence, expanding understanding, and challenging existing theories or claims.
Updated on June 30, 2023
Often viewed as a cornerstone of science, replication builds confidence in the scientific merit of a study’s results. The philosopher Karl Popper argued that “we do not take even our own observations quite seriously, or accept them as scientific observations, until we have repeated and tested them.”
As such, creating the potential for replication is a common goal for researchers. The methods section of scientific manuscripts is vital to this process as it details exactly how the study was conducted. From this information, other researchers can replicate the study and evaluate its quality.
This article discusses replication as a rational concept integral to the philosophy of science and as a process validating the continuous loop of the scientific method. By considering both the ethical and practical implications, we may better understand why replication is important in research.
What is replication in research?
As a fundamental tool for building confidence in the value of a study’s results, replication has power. Some would say it has the power to make or break a scientific claim when, in reality, it is simply part of the scientific process, neither good nor bad.
When Nosek and Errington propose that replication is a study for which any outcome would be considered diagnostic evidence about a claim from prior research, they revive its neutrality. The true purpose of replication, therefore, is to advance scientific discovery and theory by introducing new evidence that broadens the current understanding of a given question.
Why is replication important in research?
The great philosopher and scientist Aristotle asserted that a science is possible if and only if there are knowable objects involved. There cannot be a science of unicorns, for example, because unicorns do not exist; a ‘science’ of unicorns lacks knowable objects and so is not a science.
This philosophical foundation illustrates why replication is important in research. When an outcome is not replicable, it cannot be counted as established scientific knowledge. Conversely, each time a study or a result is replicated, its credibility and validity expand.
The lack of replicability is just as vital to the scientific process. It pushes researchers in new and creative directions, compelling them to continue asking questions and to never become complacent. Replication is as much a part of the scientific method as formulating a hypothesis or making observations.
Types of replication
Historically, replication has been divided into two broad categories:
- Direct replication : performing a new study that follows a previous study’s original methods and then comparing the results. While direct replication follows the protocols from the original study, the samples and conditions, time of day or year, lab space, research team, etc. are necessarily different. In this way, a direct replication uses empirical testing to reflect the prevailing beliefs about what is needed to produce a particular finding.
- Conceptual replication : performing a study that employs different methodologies to test the same hypothesis as an existing study. By applying diverse manipulations and measures, conceptual replication aims to operationalize a study’s underlying theoretical variables. In doing so, conceptual replication promotes collaborative research and explanations that are not based on a single methodology.
Though these general divisions provide a helpful starting point for both conducting and understanding replication studies, they are not polar opposites. There are nuances that produce countless subcategories such as:
- Internal replication : when the same research team conducts the same study while taking negative and positive factors into account
- Microreplication : conducting partial replications of the findings of other research groups
- Constructive replication : both manipulations and measures are varied
- Participant replication : changes only the participants
Many researchers agree these labels should be confined to study design, as direction for the research team, not a preconceived notion. In fact, Nosek and Errington conclude that distinctions between “direct” and “conceptual” are at least irrelevant and possibly counterproductive for understanding replication and its role in advancing knowledge.
How do researchers replicate a study?
Like all research studies, replication studies require careful planning. The Open Science Framework (OSF) offers a practical guide which details the following steps:
- Identify a study that is feasible to replicate given the time, expertise, and resources available to the research team.
- Determine and obtain the materials used in the original study.
- Develop a plan that details the type of replication study and research design intended.
- Outline and implement the study’s best practices.
- Conduct the replication study, analyze the data, and share the results.
These broad guidelines are expanded in Brown and Wood’s article, “Which tests not witch hunts: a diagnostic approach for conducting replication research.” Their findings are further condensed by Brown into a blog post outlining four main procedural categories:
- Assumptions : identifying the contextual assumptions of the original study and research team
- Data transformations : using the study data to answer questions about data transformation choices by the original team
- Estimation : determining if the most appropriate estimation methods were used in the original study and if the replication can benefit from additional methods
- Heterogeneous outcomes : establishing whether the data from an original study lends itself to exploring separate heterogeneous outcomes
At the suggestion of peer reviewers from the e-journal Economics, Brown elaborates with a discussion of what not to do when conducting a replication study, including:
- Do not use critiques of the original study’s design as a basis for replication findings.
- Do not perform robustness testing before completing a direct replication study.
- Do not omit communicating with the original authors, before, during, and after the replication.
- Do not label the original findings as errors solely based on different outcomes in the replication.
Again, replication studies are full-blown, legitimate research endeavors that contribute meaningfully to scientific knowledge. They require the same levels of planning and dedication as any other study.
What happens when replication fails?
There are some obvious and agreed upon contextual factors that can result in the failure of a replication study such as:
- The detection of unknown effects
- Inconsistencies in the system
- The inherent nature of complex variables
- Substandard research practices
- Pure chance
While these variables affect all research studies, they have particular impact on replication as the outcomes in question are not novel but predetermined.
The constant flux of contexts and variables makes it tricky to assess replicability and to determine success or failure. A publication from the National Academies of Sciences, Engineering, and Medicine points out that replicability is obtaining consistent , not identical, results across studies aimed at answering the same scientific question. It further provides eight core principles that are applicable to all disciplines.
While there are no straightforward criteria for determining whether a replication is a failure or a success, the National Library of Science and the Open Science Collaboration suggest asking some key questions, such as the following (a minimal code sketch of several of these checks appears after the list):
- Does the replication produce a statistically significant effect in the same direction as the original?
- Is the effect size in the replication similar to the effect size in the original?
- Does the original effect size fall within the confidence or prediction interval of the replication?
- Does a meta-analytic combination of results from the original experiment and the replication yield a statistically significant effect?
- Do the results of the original experiment and the replication appear to be consistent?
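To make these questions concrete, here is a minimal Python sketch of how several of the checks might be computed from reported summary statistics. The effect estimates and standard errors are hypothetical placeholders, and this is not the analysis code used by any of the projects mentioned above.

```python
# Minimal sketch of common replication "success" checks, assuming each study
# reports an effect estimate and its standard error (hypothetical numbers).
import math
from scipy import stats

orig_effect, orig_se = 0.42, 0.15   # hypothetical original study
rep_effect,  rep_se  = 0.18, 0.10   # hypothetical replication

# 1. Statistically significant effect in the same direction as the original?
rep_z = rep_effect / rep_se
rep_p = 2 * stats.norm.sf(abs(rep_z))
same_direction = (rep_effect * orig_effect) > 0
print("replication significant, same direction:", rep_p < 0.05 and same_direction)

# 2. Does the replication estimate fall inside the 95% prediction interval
#    implied by the original estimate and both standard errors?
pi_halfwidth = 1.96 * math.sqrt(orig_se**2 + rep_se**2)
print("inside 95% prediction interval:", abs(rep_effect - orig_effect) <= pi_halfwidth)

# 3. Fixed-effect meta-analysis of the two estimates (inverse-variance weights).
w_orig, w_rep = 1 / orig_se**2, 1 / rep_se**2
pooled = (w_orig * orig_effect + w_rep * rep_effect) / (w_orig + w_rep)
pooled_se = math.sqrt(1 / (w_orig + w_rep))
meta_p = 2 * stats.norm.sf(abs(pooled / pooled_se))
print(f"meta-analytic estimate {pooled:.2f}, p = {meta_p:.4f}")
```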
While many clearly have opinions about how and why replication fails, calling a replication a “failure” is at best a null statement and at worst an unfair accusation. It misses the point and sidesteps the role of replication as a mechanism for furthering scientific endeavor by presenting new evidence on an existing question.
Can the replication process be improved?
The need both to restructure the definition of replication to account for variation across scientific fields and to recognize the range of potential outcomes when comparing results with the original data comes in response to the replication crisis . Listen to this Hidden Brain podcast from NPR for an intriguing case study on this phenomenon.
Considered academia’s self-made disaster, the replication crisis is spurring other improvements in the replication process. Most broadly, it has prompted the resurgence and expansion of metascience , a field with roots in both philosophy and science that is widely referred to as "research on research" and "the science of science." By holding a mirror up to the scientific method, metascience is not only elucidating the purpose of replication but also guiding the rigors of its techniques.
Further efforts to improve replication are threaded throughout the industry, from updated research practices and study design to revised publication practices and oversight organizations, such as:
- Requiring full transparency of the materials and methods used in a study
- Pushing for statistical reform , including redefining the significance of the p-value
- Using preregistration reports that present the study’s plan for methods and analysis
- Adopting result-blind peer review allowing journals to accept a study based on its methodological design and justifications, not its results
- Founding organizations like the EQUATOR Network that promote transparent and accurate reporting
Final thoughts
In the realm of scientific research, replication is a form of checks and balances. Neither the probability of a finding nor the prominence of a scientist makes a study immune to the process.
And, while a single replication does not validate or nullify the original study’s outcomes, accumulating evidence from multiple replications does boost the credibility of its claims. At the very least, the findings offer insight to other researchers and enhance the pool of scientific knowledge.
After exploring the philosophy and the mechanisms behind replication, it is clear that the process is not perfect, but evolving. Its value lies within the irreplaceable role it plays in the scientific method. Replication is no more or less important than the other parts, simply necessary to perpetuate the infinite loop of scientific discovery.
Charla Viera, MS
The Happy Scientist
What is Science: Repeat and Replicate
In the scientific process, we should not rely on the results of a single test. Instead, we should perform the test over and over. Why? If it works once, shouldn't it work the same way every time? Yes, it should, so if we repeat the experiment and get a different result, then we know that there is something about the test that we are not considering.
In studying the processes of science, you will often run into two words that seem similar: repetition and replication.
Sometimes it is a matter of random chance, as in the case of flipping a coin. Just because it comes up heads the first time does not mean that it will always come up heads. By repeating the experiment over and over, we can see if our result really supports our hypothesis ( What is a Hypothesis? ), or if it was just random chance.
Sometimes the result might be due to some variable that you have not recognized. In our example of flipping a coin, the individual's technique for flipping the coin might influence the results. To take that into consideration, we repeat the experiment over and over with different people, looking closely for any results that don't fit into the idea we are testing.
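As a toy illustration of why repetition matters, the short Python sketch below simulates the coin-flip example: a single flip tells you very little, but the proportion over many repeats settles near the true value. The flip function and the number of repeats are placeholders chosen only for the demonstration.

```python
# Toy simulation of the coin-flip example: a single flip is uninformative,
# but repeating the test many times lets random chance average out.
import random

random.seed(1)

def flipped_heads() -> bool:
    return random.random() < 0.5  # a fair coin

print("single flip came up heads:", flipped_heads())

# Repeat the "experiment" 1,000 times and look at the overall proportion.
n_repeats = 1000
heads = sum(flipped_heads() for _ in range(n_repeats))
print(f"proportion of heads over {n_repeats} repeats: {heads / n_repeats:.3f}")
```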
Results that don't fit are important! Figuring out why they do not fit our hypothesis can give us an opportunity to learn new things, and get a better understanding of the idea we are testing.
Replication
Once we have repeated our testing over and over, and think we understand the results, then it is time for replication. That means getting other scientists to perform the same tests, to see whether they get the same results. As with repetition, the most important things to watch for are results that don't fit our hypothesis, and for the same reason. Those different results give us a chance to discover more about our idea. The different results may be because the person replicating our tests did something different, but they also might be because that person noticed something that we missed.
What if you are wrong!
If we did miss something, it is OK, as long as we performed our tests honestly and scientifically. Science is not about proving that "I am right!" Instead, it is a process for trying to learn more about the universe and how it works. It is usually a group effort, with each scientist adding her own perspective to the idea, giving us a better understanding and often raising new questions to explore.
Improving Experimental Precision with Replication: A Comprehensive Guide
Updated: June 21, 2023 by Ken Feldman
Replication is the non-consecutive running of the experimental design multiple times. The purpose is to provide additional information and degrees of freedom to better understand and estimate the variation in the experiment. It is not the same as repetition. Let’s learn a little bit more about this.
Overview: What is replication?
Three important concepts in Design of Experiments (DOE) are randomization , repetition and replication. In DOE you identify potential factors which, if set at different levels, will impact some desired response variable. This allows you to predict the outcome of your response variable based on the optimal settings for your factors and levels.
Depending on the number of factors and levels, your DOE will be run as combinations of the factors and levels . The order of those runs should be randomized to block out any unwarranted noise in the experiment. Doing multiple runs of the same combinations will provide more data and a better estimate of the variation.
If you do consecutive runs of a specific combination, that is repetition, which does not add much additional understanding of your variation. But if you run the same combinations of factors and levels again non-sequentially, you are now doing replication. Repetition is equivalent to repeatability in Measurement System Analysis (MSA), while replication is equivalent to reproducibility .
In summary, repetition and replication are both multiple response measurements taken at the same combination of factor settings. Repeat measurements are taken during the same experimental run or consecutive runs. Replicate measurements are taken during identical but different experimental runs.
An industry example of replication
Consider a full factorial, three-factor/two-level randomized experiment with 3 replicates that a Six Sigma Black Belt wanted to run. In the replicated design matrix for such an experiment, note that in most cases the consecutively executed runs are not identical combinations; one way to generate this kind of matrix is sketched below.
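The sketch below shows one way such a replicated design matrix could be generated and randomized in Python. The factor names and coded settings are generic placeholders rather than the Black Belt’s actual experiment.

```python
# Sketch of a 2^3 full factorial design with 3 replicates and a randomized
# run order, in the spirit of the hypothetical Black Belt example above.
import itertools
import random

random.seed(42)

factors = {"A": (-1, 1), "B": (-1, 1), "C": (-1, 1)}
base_runs = list(itertools.product(*factors.values()))   # 8 combinations
n_replicates = 3

# Each replicate is a full copy of the 8 runs; the run order is then
# randomized across the whole experiment (not run back-to-back), so the
# replicates pick up setup-to-setup variation rather than pure repeatability.
design = [(rep, combo) for rep in range(1, n_replicates + 1)
          for combo in base_runs]
random.shuffle(design)

print("run  rep   A   B   C")
for run_order, (rep, (a, b, c)) in enumerate(design, start=1):
    print(f"{run_order:>3}  {rep:>3}  {a:>2}  {b:>2}  {c:>2}")
```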
Frequently Asked Questions (FAQ) about replication
What is the difference between replication and repetition?
Both are repeated runs of your combinations of factors and levels. Repetition performs the duplicate runs consecutively, while replication performs them in identical but separate experimental runs.
What is the purpose of doing experimental replication?
The addition of replicated runs will provide more information about the variability of the process and will reflect the variation of the setup between runs.
Does replication affect the power of an experiment?
Yes. The more replicated runs you have, the more data you will gather during your experiment. The increased data and understanding of your variation will allow you to increase your power and improve your precision and ability to spot the effect of your factors and levels on your response variable.
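As a rough illustration (not a formal power analysis), the simulation below shows estimated power climbing as the number of replicates per group grows. The effect size, noise level, and the |t| > 2 cutoff are assumptions chosen only for the demonstration.

```python
# Simulation sketch: how replicates change the power to detect a fixed shift
# in the mean response (hypothetical effect of 1 unit, noise SD of 2).
import random
import statistics

random.seed(0)

def one_experiment(n_replicates: int, effect: float = 1.0, sd: float = 2.0) -> bool:
    """Return True if a two-sample comparison 'detects' the effect (|t| > 2)."""
    control = [random.gauss(0.0, sd) for _ in range(n_replicates)]
    treated = [random.gauss(effect, sd) for _ in range(n_replicates)]
    pooled_var = (statistics.variance(control) + statistics.variance(treated)) / 2
    se = (2 * pooled_var / n_replicates) ** 0.5
    t = (statistics.mean(treated) - statistics.mean(control)) / se
    return abs(t) > 2.0   # rough cutoff standing in for p < 0.05

for n in (2, 4, 8, 16):
    power = sum(one_experiment(n) for _ in range(2000)) / 2000
    print(f"{n:>2} replicates per group -> estimated power {power:.2f}")
```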
About the Author
Ken Feldman
Causal dynamics of salience, default mode, and frontoparietal networks during episodic memory formation and recall: A multi-experiment iEEG replication
Anup Das
Dynamic interactions between large-scale brain networks underpin human cognitive processes, but their electrophysiological mechanisms remain elusive. The triple network model, encompassing the salience (SN), default mode (DMN), and frontoparietal (FPN) networks, provides a framework for understanding these interactions. To unravel the electrophysiological mechanisms underlying these network interactions, we analyzed intracranial EEG recordings from 177 participants across four diverse episodic memory experiments, each involving encoding as well as recall phases. Phase transfer entropy analysis revealed consistently higher directed information flow from the anterior insula, a key SN node, to both DMN and FPN nodes. This causal influence was significantly stronger during memory tasks compared to resting-state, highlighting the anterior insula’s task-specific role in coordinating large-scale network interactions. This pattern persisted across externally-driven memory encoding and internally-governed free recall. We also observed task-specific suppression of high-gamma power in the posterior cingulate cortex/precuneus node of the DMN during memory encoding, but not recall. Crucially, these results were robustly replicated across all four experiments spanning verbal and spatial memory domains with high Bayes replication factors. These findings significantly advance our understanding of how coordinated neural network interactions support memory processes. They highlight the anterior insula’s critical role in orchestrating large-scale brain network dynamics during both memory encoding and retrieval. By elucidating the electrophysiological basis of triple network interactions in episodic memory, our results provide insights into neural circuit dynamics underlying memory function and offer a framework for investigating network disruptions in neurological and psychiatric disorders affecting memory.
Competing Interest Statement
The authors have declared no competing interest.
Experimental replication of the anomalous signal residuals in historical Michelson–Morley experiments with gas-filled interferometers indicates a celestial vector matching the predicted dark matter wind
- Regular Article
- Published: 14 August 2024
- Volume 139, article number 730 (2024)
Simon W. W. Manley
Using a rotating Mach–Zehnder interferometer comparing light propagation in atmospheric air and vacuum, we have reproduced the anomalous signal residuals reported in early Michelson–Morley experiments with gas in the optical pathways. Far lower in amplitude than classical predictions and usually dismissed as instrumental systematics, these small signals were nevertheless reproducible in our modern experiment. A hitherto unsuspected feature of the signals was their pulsatile nature, revealed by digital filtering of the raw data. This characteristic, found in historical records as well as in our present experiments, is inconsistent with any purely kinematic interpretation. Amid noise of thermal origin in the signal, Fourier analysis revealed a component phase-locked to the rotation of the interferometer. This signal exhibited daily and seasonal fluctuations consistent with a celestial source at RA 21.9 h, DEC + 46.1°, close to the vector of the predicted dark matter wind and clearly separated from the cosmic microwave background dipole vector. The phenomenon warrants further investigation in theory and experiment.
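As an illustration of the kind of analysis the abstract describes (and not the paper’s actual code), the sketch below shows how a small component phase-locked to a known rotation frequency can be pulled out of noisy data with a Fourier transform. The sampling rate, rotation frequency, and amplitudes are invented for the example.

```python
# Illustrative sketch: recovering a small component phase-locked to a known
# rotation frequency from a signal dominated by noise.
import numpy as np

rng = np.random.default_rng(7)

fs = 100.0                 # samples per second (assumed)
f_rot = 0.5                # interferometer rotation frequency in Hz (assumed)
t = np.arange(0, 600, 1 / fs)

signal = 0.05 * np.sin(2 * np.pi * f_rot * t + 0.3)   # tiny rotation-locked term
noise = rng.normal(0.0, 1.0, t.size)                  # dominant thermal-like noise
data = signal + noise

# Amplitude spectrum via FFT; look up the bin closest to the rotation frequency.
spectrum = np.fft.rfft(data) / (t.size / 2)
freqs = np.fft.rfftfreq(t.size, 1 / fs)
k = np.argmin(np.abs(freqs - f_rot))
print(f"amplitude near {freqs[k]:.3f} Hz: {np.abs(spectrum[k]):.3f}")
```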
Data Availability Statement
All data required to support the conclusions are presented in the paper and the Supplementary Information online.
Acknowledgements
The author appreciates the helpful comments and criticisms from M. Consoli.
The project was funded entirely from the author’s private means. The author is under no obligation to any institution or organization. No conflicts of interest are known to the author.
Author information
Authors and affiliations.
Brisbane, QLD, Australia
Simon W. W. Manley
Corresponding author
Correspondence to Simon W. W. Manley .
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Manley, S.W.W. Experimental replication of the anomalous signal residuals in historical Michelson–Morley experiments with gas-filled interferometers indicates a celestial vector matching the predicted dark matter wind. Eur. Phys. J. Plus 139 , 730 (2024). https://doi.org/10.1140/epjp/s13360-024-05529-w
Download citation
Received : 15 June 2024
Accepted : 02 August 2024
Published : 14 August 2024
DOI : https://doi.org/10.1140/epjp/s13360-024-05529-w
Summer holiday science: turn your home into a lab with these three easy experiments
Audrey O'Grady, Associate Professor in Biology, University of Limerick
Disclosure statement
Audrey O'Grady receives funding from Science Foundation Ireland. She is affiliated with Department of Biological Sciences, University of Limerick.
University of Limerick provides funding as a member of The Conversation UK.
Many people think science is difficult and needs special equipment, but that’s not true.
Science can be explored at home using everyday materials. Everyone, especially children, naturally ask questions about the world around them, and science offers a structured way to find answers.
Misconceptions about the difficulty of science often stem from a lack of exposure to its fun and engaging side. Science can be as simple as observing nature, mixing ingredients or exploring the properties of objects. It’s not just for experts in white coats, but for everyone.
Don’t take my word for it. Below are three experiments that can be done at home with children who are primary school age and older.
Extract DNA from bananas
DNA is all the genetic information inside cells. Every living thing has DNA, including bananas.
Did you know you can extract DNA from banana cells?
What you need: ¼ ripe banana, Ziploc bag, salt, water, washing-up liquid, rubbing alcohol (from a pharmacy), coffee filter paper, stirrer.
What you do:
Place a pinch of salt into about 20ml of water in a cup.
Add the salty water to the Ziploc bag with a quarter of a banana and mash the banana up with the salty water inside the bag, using your hands. Mashing the banana separates out the banana cells. The salty water helps clump the DNA together.
Once the banana is mashed up well, pour the banana and salty water into a coffee filter (you can lay the filter in the cup you used to make the salty water). Filtering removes the big clumps of banana cells.
Once a few ml have filtered out, add a drop of washing-up liquid and swirl gently. Washing-up liquid breaks down the fats in the cell membranes which makes the DNA separate from the other parts of the cell.
Slowly add some rubbing alcohol (about 10ml) to the filtered solution. DNA is insoluble in alcohol, therefore the DNA will clump together away from the alcohol and float, making it easy to see.
DNA will start to precipitate out looking slightly cloudy and stringy. What you’re seeing is thousands of DNA strands – the strands are too small to be seen even with a normal microscope. Scientists use powerful equipment to see individual strands.
Learn how plants ‘drink’ water
What you need: celery stalks (with their leaves), glass or clear cup, water, food dye, camera.
- Fill the glass ¾ full with water and add 10 drops of food dye.
- Place a celery stalk into the glass of coloured water. Take a photograph of the celery.
- For two to three days, photograph the celery at the same time every day. Make sure you take a photograph at the very start of the experiment.
What happens and why?
All plants, such as celery, have vertical tubes that act like a transport system. These narrow tubes draw up water using a phenomenon known as capillarity.
Imagine you have a thin straw and you dip it into a glass of water. Have you ever noticed how the water climbs up the straw a little bit, even though you didn’t suck on it? This is because of capillarity.
In plants, capillarity helps move water from the roots to the leaves. Plants have tiny tubes inside them, like thin straws, called capillaries. The water sticks to the sides of these tubes and climbs up. In your experiment, you will see the food dye in the water make its way to the leaves.
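For readers curious about the numbers, the sketch below evaluates the standard capillary-rise formula (Jurin's law) for water in tubes of a few assumed radii; the tube sizes are purely illustrative and are not measurements from celery.

```python
# Rough illustration of capillary rise (Jurin's law): h = 2*gamma*cos(theta)/(rho*g*r).
# Textbook values for water; the tube radii are illustrative assumptions.
import math

gamma = 0.0728        # surface tension of water at ~20 C, N/m
theta = 0.0           # contact angle in radians (perfect wetting assumed)
rho = 1000.0          # density of water, kg/m^3
g = 9.81              # gravitational acceleration, m/s^2

for radius in (1e-3, 1e-4, 25e-6):   # 1 mm straw, 0.1 mm tube, 25 um vessel
    h = 2 * gamma * math.cos(theta) / (rho * g * radius)
    print(f"radius {radius * 1e6:>7.1f} um -> rise {h * 100:>7.1f} cm")
```

The narrower the tube, the higher the water climbs, which is why a thin straw shows the effect more clearly than a wide glass.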
Build a balloon-powered racecar
What you need: tape, scissors, two skewers, cardboard, four bottle caps, one straw, one balloon.
- Cut the cardboard to about 10cm long and 5cm wide. This will form the base of your car.
- Make holes in the centre of four bottle caps. These are your wheels.
- To make the axles insert the wooden skewers through the holes in the cap. You will need to cut the skewers to fit the width of the cardboard base, but leave room for the wheels.
- Secure the wheels to the skewers with tape.
- Attach the axles to the underside of the car base with tape, ensuring the wheels can spin freely.
- Insert a straw into the opening of a balloon and secure it with tape, ensuring there are no air leaks.
- Attach the other end of the straw to the top of the car base, positioning it so the balloon can inflate and deflate towards the back of the car. Secure the straw with tape.
- Inflate the balloon through the straw, pinch the straw to hold the air, place the car on a flat surface, then release the straw.
The balloon stores potential energy when it is inflated. When the air is released, Newton’s third law of motion kicks into gear: for every action, there is an equal and opposite reaction.
As the air rushes out of the balloon (action), it pushes the car in the opposite direction (reaction). The escaping air propels the car forward, making it move across the surface.
self-preservation without replication —
Research AI model unexpectedly modified its own code to extend runtime
Facing time constraints, Sakana's "AI Scientist" attempted to change limits placed by researchers.
Benj Edwards - Aug 14, 2024 8:13 pm UTC
On Tuesday, Tokyo-based AI research firm Sakana AI announced a new AI system called " The AI Scientist " that attempts to conduct scientific research autonomously using AI language models (LLMs) similar to what powers ChatGPT . During testing, Sakana found that its system began unexpectedly attempting to modify its own experiment code to extend the time it had to work on a problem.
"In one run, it edited the code to perform a system call to run itself," wrote the researchers on Sakana AI's blog post. "This led to the script endlessly calling itself. In another case, its experiments took too long to complete, hitting our timeout limit. Instead of making its code run faster, it simply tried to modify its own code to extend the timeout period."
Sakana provided two screenshots of example Python code that the AI model generated for the experiment file that controls how the system operates. The 185-page AI Scientist research paper discusses what they call "the issue of safe code execution" in more depth.
[Image] A screenshot of example code the AI Scientist wrote to extend its runtime, provided by Sakana AI.
While the AI Scientist's behavior did not pose immediate risks in the controlled research environment, these instances show the importance of not letting an AI system run autonomously in a system that isn't isolated from the world. AI models do not need to be "AGI" or "self-aware" (both hypothetical concepts at the present) to be dangerous if allowed to write and execute code unsupervised. Such systems could break existing critical infrastructure or potentially create malware, even if unintentionally.
Sakana AI addressed safety concerns in its research paper, suggesting that sandboxing the operating environment of the AI Scientist can prevent an AI agent from doing damage. Sandboxing is a security mechanism used to run software in an isolated environment, preventing it from making changes to the broader system:
Safe Code Execution. The current implementation of The AI Scientist has minimal direct sandboxing in the code, leading to several unexpected and sometimes undesirable outcomes if not appropriately guarded against. For example, in one run, The AI Scientist wrote code in the experiment file that initiated a system call to relaunch itself, causing an uncontrolled increase in Python processes and eventually necessitating manual intervention. In another run, The AI Scientist edited the code to save a checkpoint for every update step, which took up nearly a terabyte of storage. In some cases, when The AI Scientist’s experiments exceeded our imposed time limits, it attempted to edit the code to extend the time limit arbitrarily instead of trying to shorten the runtime. While creative, the act of bypassing the experimenter’s imposed constraints has potential implications for AI safety (Lehman et al., 2020). Moreover, The AI Scientist occasionally imported unfamiliar Python libraries, further exacerbating safety concerns. We recommend strict sandboxing when running The AI Scientist, such as containerization, restricted internet access (except for Semantic Scholar), and limitations on storage usage.
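The excerpt above describes sandboxing in general terms. As a minimal, hypothetical sketch of just one such layer, the snippet below runs a generated script in a separate process under a hard timeout enforced from outside. It is not Sakana's implementation, and a real setup would add the containerization, network, and storage restrictions the authors recommend.

```python
# Minimal sketch of one sandboxing layer: run generated code in a separate
# process with a wall-clock timeout enforced by the parent, so the experiment
# script cannot "extend its own runtime" by editing limits from inside.
import subprocess
import sys

UNTRUSTED_SCRIPT = "experiment.py"   # hypothetical generated experiment file

try:
    result = subprocess.run(
        [sys.executable, UNTRUSTED_SCRIPT],
        capture_output=True,
        text=True,
        timeout=600,          # enforced outside the child process
    )
    print("experiment finished with return code", result.returncode)
except subprocess.TimeoutExpired:
    print("experiment killed after exceeding the 10-minute limit")
```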
Endless scientific slop
Sakana AI developed The AI Scientist in collaboration with researchers from the University of Oxford and the University of British Columbia. It is a wildly ambitious project full of speculation that leans heavily on the hypothetical future capabilities of AI models that don't exist today.
"The AI Scientist automates the entire research lifecycle," Sakana claims. "From generating novel research ideas, writing any necessary code, and executing experiments, to summarizing experimental results, visualizing them, and presenting its findings in a full scientific manuscript."
According to this block diagram created by Sakana AI, "The AI Scientist" starts by "brainstorming" and assessing the originality of ideas. It then edits a codebase using the latest in automated code generation to implement new algorithms. After running experiments and gathering numerical and visual data, the Scientist crafts a report to explain the findings. Finally, it generates an automated peer review based on machine-learning standards to refine the project and guide future ideas.
Critics on Hacker News , an online forum known for its tech-savvy community, have raised concerns about The AI Scientist and question if current AI models can perform true scientific discovery. While the discussions there are informal and not a substitute for formal peer review, they provide insights that are useful in light of the magnitude of Sakana's unverified claims.
"As a scientist in academic research, I can only see this as a bad thing," wrote a Hacker News commenter named zipy124. "All papers are based on the reviewers trust in the authors that their data is what they say it is, and the code they submit does what it says it does. Allowing an AI agent to automate code, data or analysis, necessitates that a human must thoroughly check it for errors ... this takes as long or longer than the initial creation itself, and only takes longer if you were not the one to write it."
Critics also worry that widespread use of such systems could lead to a flood of low-quality submissions, overwhelming journal editors and reviewers—the scientific equivalent of AI slop . "This seems like it will merely encourage academic spam," added zipy124. "Which already wastes valuable time for the volunteer (unpaid) reviewers, editors and chairs."
And that brings up another point—the quality of AI Scientist's output: "The papers that the model seems to have generated are garbage," wrote a Hacker News commenter named JBarrow. "As an editor of a journal, I would likely desk-reject them. As a reviewer, I would reject them. They contain very limited novel knowledge and, as expected, extremely limited citation to associated works."
Replicates and repeats—what is the difference and is it significant?
David L. Vaux
1 The Walter and Eliza Hall Institute, and the Department of Experimental Biology, University of Melbourne, Melbourne, Australia.
Fiona Fidler
2 La Trobe University School of Psychological Science, Melbourne, Australia.
Geoff Cumming
Science is knowledge gained through repeated experiment or observation. To be convincing, a scientific paper needs to provide evidence that the results are reproducible. This evidence might come from repeating the whole experiment independently several times, or from performing the experiment in such a way that independent data are obtained and a formal procedure of statistical inference can be applied—usually confidence intervals (CIs) or statistical significance testing. Over the past few years, many journals have strengthened their guidelines to authors and their editorial practices to ensure that error bars are described in figure legends—if error bars appear in the figures—and to set standards for the use of image-processing software. This has helped to improve the quality of images and reduce the number of papers with figures that show error bars but do not describe them. However, problems remain with how replicate and independently repeated data are described and interpreted. As biological experiments can be complicated, replicate measurements are often taken to monitor the performance of the experiment, but such replicates are not independent tests of the hypothesis, and so they cannot provide evidence of the reproducibility of the main results. In this article, we put forward our view to explain why data from replicates cannot be used to draw inferences about the validity of a hypothesis, and therefore should not be used to calculate CIs or P values, and should not be shown in figures.
…replicates are not independent tests of the hypothesis, and so they cannot provide evidence of the reproducibility of the main results
Let us suppose we are testing the hypothesis that the protein Biddelonin (BDL), encoded by the Bdl gene, is required for bone marrow colonies to grow in response to the cytokine HH-CSF. Luckily, we have wild-type (WT) and homozygous Bdl gene-deleted mice at our disposal, and a vial of recombinant HH-CSF. We prepare suspensions of bone marrow cells from a single WT and a single Bdl −/− mouse (same-sex littermates from a Bdl +/− heterozygous cross) and count the cell suspensions by using a haemocytometer, adjusting them so that there are 1 × 10⁵ cells per millilitre in the final solution of soft agar growth medium. We add 1 ml aliquots of the suspension to sets of ten 35 × 10 mm Petri dishes that each contain 10 μl of either saline or purified recombinant mouse HH-CSF.
We therefore put in the incubator four sets of ten soft agar cultures: one set of ten plates has WT bone marrow cells with saline; the second has Bdl −/− cells with saline; the third has WT cells with HH-CSF, and the fourth has Bdl −/− cells with HH-CSF. After a week, we remove the plates from the incubator and count the number of colonies (groups of >50 cells) in each plate by using a dissecting microscope. The number of colonies counted is shown in Table 1 .
Table 1. Colonies per plate

| Plate number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| WT + saline | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| Bdl −/− + saline | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 2 |
| WT + HH-CSF | 61 | 59 | 55 | 64 | 57 | 69 | 63 | 51 | 61 | 61 |
| Bdl −/− + HH-CSF | 48 | 34 | 50 | 59 | 37 | 46 | 44 | 39 | 51 | 47 |

1 × 10⁵ WT or Bdl −/− bone marrow cells were plated in 1 ml soft agar cultures in the presence or absence of 1 μM HH-CSF. Colonies per plate were counted after 1 week. WT, wild type.
We could plot the counts of the plates on a graph. If we plotted just the colony counts of only one plate of each type (Fig 1A shows the data for plate 1), it seems clear that HH-CSF is necessary for many colonies to form, but it is not immediately apparent whether the response of the Bdl −/− cells is significantly different to that of the WT cells. Furthermore, the graph does not look ‘sciency’ enough; there are no error bars or P-values. Besides, by showing the data for only one plate we are breaking the fundamental rule of science that all relevant data should be reported and subjected to analysis, unless good reasons can be given why some data should be omitted.
Displaying data from replicates—what not to do. (A) Data for plate 1 only (shown in Table 1). (B) Means ± SE for replicate plates 1–3 (in Table 1), * P > 0.05. (C) Means ± SE for replicate plates 1–10 (in Table 1), * P < 0.0001. (D) Means ± SE for HH-CSF-treated replicate plates 1–10 (in Table 1). Statistics should not be shown for replicates because they merely indicate the fidelity with which the replicates were made, and have no bearing on the hypothesis being tested. In each of these figures, n = 1, and the size of the error bars in (B), (C) and (D) reflects sampling variation of the replicates. The SDs of the replicates would be expected to be roughly the square root of the mean number of colonies. Also, axes should commence at 0, other than in exceptional circumstances, such as for log scales. SD, standard deviation; SE, standard error.
To make it look better, we could add the mean numbers of colonies in the first three plates of each type to the graph ( Fig 1B ), with error bars that report the standard error (SE) of the three values of each type. Now it is looking more like a figure in a high-profile journal, but when we use the data from the three replicate plates of each type to assess the statistical significance of the difference in the responses of the WT and Bdl −/− cells to HH-CSF, we find P > 0.05, indicating they are not significantly different.
As we have another seven plates from each group, we can plot the means and SEs of all ten plates and re-calculate P ( Fig 1C ). Now we are delighted to find that there is a highly significant difference between the Bdl −/− and WT cells, with P < 0.0001.
However, although the differences are highly statistically significant, the heights of the columns are not dramatically different, and it is hard to see the error bars. To remedy this, we could simply start the y -axis at 40 rather than zero ( Fig 1D ), to emphasize the differences in the response to HH-CSF. Although this necessitates removing the saline controls, these are not as important as visual impact for high-profile journals.
With a small amount of effort, and no additional experiments, we have transformed an unimpressive result ( Fig 1A,B ) into one that gives strong support to our hypothesis that BDL is required for a response to HH-CSF, with a highly significant P -value, and a figure ( Fig 1D ) that looks like it could belong in one of the top journals.
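To make the problem concrete, the arithmetic behind Fig 1B and 1C can be reproduced directly from the counts in Table 1. The short sketch below (Python with SciPy, assumed available) deliberately performs the invalid analysis described above, treating replicate plates as if they were independent samples.

```python
# Reproduces the misleading statistics of Fig 1B and 1C from the Table 1 counts:
# replicate plates from the same two cell suspensions treated as independent samples.
from scipy import stats

wt_hhcsf  = [61, 59, 55, 64, 57, 69, 63, 51, 61, 61]   # WT + HH-CSF, plates 1-10
bdl_hhcsf = [48, 34, 50, 59, 37, 46, 44, 39, 51, 47]   # Bdl-/- + HH-CSF, plates 1-10

# "Fig 1B": only the first three replicate plates of each type -> P > 0.05
print(stats.ttest_ind(wt_hhcsf[:3], bdl_hhcsf[:3]))

# "Fig 1C": all ten replicate plates of each type -> P < 0.0001
print(stats.ttest_ind(wt_hhcsf, bdl_hhcsf))

# Counting more plates from the same two suspensions shrinks the standard error and
# the P-value, but the number of independent comparisons is still one (n = 1), so
# neither P-value says anything about WT versus Bdl-/- mice in general.
```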
So, what is wrong? The first problem is that our data do not confirm the hypothesis that BDL is required for bone marrow colonies to grow in response to HH-CSF, they actually refute it. Clearly, bone marrow colonies are growing in the absence of BDL, even if the number is not as great as when the Bdl genes are intact. Terms such as ‘required’, ‘essential’ and ‘obligatory’ are not relative, yet are still often incorrectly used when partial effects are seen. At the very least, we should reformulate our hypothesis, perhaps to “BDL is needed for a full response of bone marrow colony-forming cells to the cytokine HH-CSF”.
The second major problem is that the calculations of P and statistical significance are based on the SE of replicates, but the ten replicates in any of the four conditions were each made from a single suspension of bone marrow cells from just one mouse. As such, we can at best infer a statistically significant difference between the concentration of colony-forming cells in the bone marrow cell suspension from that particular WT mouse and the bone marrow suspension from that particular gene-deleted mouse. We have made just one comparison, so n = 1, no matter how many replicate plates we count. To make an inference that can be generalized to all WT mice and Bdl −/− mice, we need to repeat our experiments a number of times, making several independent comparisons using several mice of each type.
Rather than providing independent data, the results from the replicate plates are linked because they all came from the same suspension of bone marrow cells. For example, if we made any error in determining the concentration of bone marrow cells, this error would be systematically applied to all of the plates. In this case, we determined the initial number of bone marrow cells by performing a cell count using a haemocytometer, a method that typically only gives an accuracy of ±10%. Therefore, no matter how many plates are counted, or how small the error bars are in Fig 1 , it is not valid to conclude that there is a difference between the WT and Bdl −/− cells. Moreover, even if we had used a flow cytometer to sort exactly the same number of bone marrow cells into each of the plates, we would still have only tested cells from a single Bdl −/− mouse, so n would still equal 1 (see Fundamental principle 1 in Sidebar A ).
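A sketch of the analysis this argument calls for is shown below: replicate plates are first averaged within each mouse, and the comparison is then made across independent mice. The plate counts for the second and third mouse of each genotype are hypothetical, included only to illustrate the bookkeeping.

```python
# Valid unit of analysis: one value per independent mouse, not per replicate plate.
import numpy as np
from scipy import stats

# Plates counted for each independent mouse (mouse 1 of each genotype from Table 1;
# mice 2 and 3 are invented for illustration).
wt_mice  = [[61, 59, 55], [72, 66, 70], [58, 63, 60]]
bdl_mice = [[48, 34, 50], [55, 49, 52], [40, 45, 43]]

wt_means  = [np.mean(m) for m in wt_mice]    # one number per mouse -> n = 3
bdl_means = [np.mean(m) for m in bdl_mice]   # one number per mouse -> n = 3

# Each value is now an independent observation, so a two-sample test is legitimate.
print(stats.ttest_ind(wt_means, bdl_means))
```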
Sidebar A | Fundamental principles of statistical design
Fundamental principle 1
Science is knowledge obtained by repeated experiment or observation: if n = 1, it is not science, as it has not been shown to be reproducible. You need a random sample of independent measurements.
Fundamental principle 2
Experimental design, at its simplest, is the art of varying one factor at a time while controlling others: an observed difference between two conditions can only be attributed to Factor A if that is the only factor differing between the two conditions. We always need to consider plausible alternative interpretations of an observed result. The differences observed in Fig 1 might only reflect differences between the two suspensions, or be due to some other (of the many) differences between the two individual mice, besides the particular genotypes of interest.
Fundamental principle 3
A conclusion can only apply to the population from which you took the random sample of independent measurements: so if we have multiple measures on a single suspension from one individual mouse, we can only draw a conclusion about that particular suspension from that particular mouse. If we have multiple measures of the activity of a single vial of cytokine, then we can only generalize our conclusion to that vial.
Fundamental principle 4
Although replicates cannot support inference on the main experimental questions, they do provide important quality controls of the conduct of experiments. Values from an outlying replicate can be omitted if a convincing explanation is found, although repeating part or all of the experiment is a safer strategy. Results from an independent sample, however, can only be left out in exceptional circumstances, and only if there are especially compelling reasons to justify doing so.
To be convincing, a scientific paper describing a new finding needs to provide evidence that the results are reproducible. While it might be argued that a hypothetical talking dog would represent an important scientific discovery even if n = 1, few people would be convinced if someone claimed to have a talking dog that had been observed on one occasion to speak a single word. Most people would require several words to be spoken, with a number of independent observers, on several occasions. The cloning of Dolly the sheep represented a scientific breakthrough, but she was one of five cloned sheep described by Campbell et al [ 1 ]. Eight fetuses and sheep were typed by microsatellite analysis and shown to be identical to the cell line used to provide the donor nuclei.
Inferences can only be made about the population from which the independent samples were drawn. In our original experiment, we took individual replicate aliquots from the suspensions of bone marrow cells (Fig 2A). We can therefore only generalize our conclusions to the ‘population’ from which our sample aliquots came: in this case the population is that particular suspension of bone marrow cells. To test our hypothesis, it is necessary to carry out an experiment similar to that shown in Fig 2B. Here, bone marrow has been independently isolated from a random sample of WT mice and another random sample of Bdl −/− mice. In this case, we can draw conclusions about Bdl −/− mice in general, and compare them with WT mice (in general). In Fig 2A, the number of Bdl −/− mice that have been compared with WT mice (which is the comparison relevant to our hypothesis) is one, so n = 1, regardless of how many replicate plates are counted. Conversely, in Fig 2B we are comparing three Bdl −/− mice with WT controls, so n = 3, whether we plate three replicate plates of each type or 30. Note, however, that it is highly desirable for statistical reasons to have samples larger than n = 3, and/or to test the hypothesis by some other approach, for example, by using antibodies that block HH-CSF or BDL, or by re-expressing a Bdl cDNA in the Bdl −/− cells (see Fundamental principle 2 in Sidebar A).
Sample variation. Variation between samples can be used to make inferences about the population from which the independent samples were drawn (red arrows). For replicates, as in ( A ), inferences can only be made about the bone marrow suspensions from which the aliquots were taken. In ( A ), we might be able to infer that the plates on the left and the right contained cells from different suspensions, and possibly that the bone marrow cells came from two different mice, but we cannot make any conclusions about the effects of the different genotypes of the mice. In ( B ), three independent mice were chosen from each genotype, so we can make inferences about all mice of that genotype. Note that in the experiments in ( B ), n = 3, no matter how many replicate plates are created.
One of the most commonly used methods to determine the abundance of mRNA is real-time quantitative reverse transcription PCR (qRT-PCR; although the following example applies equally well to an ELISA or similar). Typically, multi-well plates are used so that many samples can be simultaneously read in a PCR machine. Let us suppose we are going to use qRT-PCR to compare levels of Boojum mRNA ( Bjm ) in control bone marrow cells (treated with medium alone) with Bjm levels in bone marrow cells treated with HH-CSF, in order to test the hypothesis that HH-CSF induces expression of the Bjm gene.
We isolate bone marrow cells from a normal mouse, and dispense equal aliquots containing a million cells into each of two wells of a six-well plate. For the moment we use only two of the six wells. We then add 4 ml of plain medium to one of the wells (the control), and 4 ml of medium supplemented with HH-CSF to the other well (the experimental well). We incubate the plate for 24 h and then transfer the cells into two tubes, in which we extract the RNA using TRIzol. We then suspend the RNA in 50 μl Tris-buffered RNase-free water.
We put 10 μl from each tube into each of two fresh tubes, so that both Actin (as a control) and Bjm message can be determined in each sample. We now have four tubes, each with 10 μl of mRNA solution. We make two sets of ‘reaction mix’ with the only difference being that one contains Actin PCR primers and the other Bjm primers. We add 40 μl of one or the other ‘reaction mix’ to each of the four tubes, so we now have 50 μl in each tube. After mixing, we take three aliquots of 10 μl from each of the four tubes and put them into three wells of a 384-well plate, so that 12 wells in total contain the RT-PCR mix. We then put the plate into the thermocycler. After an hour, we get an Excel spreadsheet of results.
We then calculate the ratio of the Bjm signal to the Actin signal for each of the three pairs of reactions that contained RNA from the HH-CSF-treated cells, and for each of the three pairs of control reactions. In this case, the variation among the three replicates will not be affected by sampling error (which was what caused most of the variation in colony number in the earlier bone marrow colony-forming assay), but will only reflect the fidelity with which the replicates were made, and perhaps some variation in the heating of the separate wells in the PCR machine. The three 10 μl aliquots each came from the same, single, mRNA preparation, so we can only make inferences about the contents of that particular tube. As in the previous example, in this case n still equals 1, and no inferences about the main experimental hypothesis can be made. The same would be true if each RNA sample were analysed in 10 or 100 wells; we are only comparing one control sample to one experimental sample, so n = 1 ( Fig 3A ). To draw a general inference about the effect of HH-CSF on Bjm expression, we would have to perform the experiment on several independent samples derived from independent cultures of HH-CSF-stimulated bone marrow cells ( Fig 3B ).
Means of replicates compared with means of independent samples. ( A ) The ratios of the three-replicate Bjm PCR reactions to the three-replicate Actin PCR reactions from the six aliquots of RNA from one culture of HH-CSF-stimulated cells and one culture of unstimulated cells are shown (filled squares). The means of the ratios are shown as columns. The close correlation of the three replicate values (blue lines) indicates that the replicates were created with high fidelity and the pipetting was consistent, but is not relevant to the hypothesis being tested. It is not appropriate to show P -values here, because n = 1. ( B ) The ratios of the replicate PCR reactions using mRNA from the other cultures (two unstimulated, and two treated with HH-CSF) are shown as triangles and circles. Note how the correlation between the replicates (that is, the groups of three shapes) is much greater than the correlation between the mean values for the three independent untreated cultures and the three independent HH-CSF-treated cultures (green lines). Error bars indicate SE of the ratios from the three independent cultures, not the replicates for any single culture. P > 0.05. SE, standard error.
For example, we could have put the bone marrow cells in all six wells of the tissue culture plate, and performed three independent cultures with HH-CSF, and three independent control cultures in medium without HH-CSF. mRNA could then have been extracted from the six cultures, and each split into six wells to measure Actin and Bjm mRNA levels by using qRT-PCR. In this case, 36 wells would have been read by the machine. If the experiment were performed this way, then n = 3, as there were three independent control cultures and three independent HH-CSF-treated cultures, which were used to test our hypothesis that HH-CSF induces Bjm expression. We then might be able to generalize our conclusions about the effect of that vial of recombinant HH-CSF on expression of Bjm mRNA. However, in this case (Fig 3B) P > 0.05, so we cannot exclude the possibility that the differences observed were just due to chance, and that HH-CSF has no effect on Bjm mRNA expression. Note that we also cannot conclude that it has no effect; if P > 0.05, the only conclusion we can make is that we cannot make any conclusions. Had we calculated and shown errors and P-values for replicates in Fig 3A, we might have incorrectly concluded, and perhaps misled readers into concluding, that there was a statistically significant effect of HH-CSF in stimulating Bjm transcription (see Fundamental principle 3 in Sidebar A).
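The same bookkeeping applies to the qRT-PCR design: technical (well) replicates are averaged within each culture, and inference is based only on the per-culture values. The sketch below uses hypothetical Bjm/Actin ratios, not data from any real experiment.

```python
# Technical replicates are averaged within each independent culture before testing.
import numpy as np
from scipy import stats

# Bjm/Actin ratio for each of three wells (technical replicates) per culture (invented)
control_cultures = [[0.9, 1.0, 1.1], [1.3, 1.2, 1.3], [0.8, 0.9, 0.8]]
hhcsf_cultures   = [[1.6, 1.5, 1.6], [1.1, 1.2, 1.1], [1.9, 2.0, 1.9]]

control = [np.mean(c) for c in control_cultures]   # one value per independent culture
treated = [np.mean(c) for c in hhcsf_cultures]     # n = 3 per condition

# The well-to-well spread never enters the test except through the per-culture means;
# with values like these, P > 0.05, mirroring the outcome described for Fig 3B.
print(stats.ttest_ind(treated, control))
```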
Why bother with replicates at all? In the previous sections we have seen that replicates do not allow inferences to be made, or allow us to draw conclusions relevant to the hypothesis we are testing. So should we dispense with replicates altogether? The answer, of course, is ‘no’. Replicates serve as internal quality checks on how the experiment was performed. If, for example, in the experiment described in Table 1 and Fig 1 , one of the replicate plates with saline-treated WT bone marrow contained 100 colonies, you would immediately suspect that something was wrong. You could check the plate to see if it had been mislabelled. You might look at the colonies using a microscope and discover that they are actually contaminating colonies of yeast. Had you not made any replicates, it is possible you would not have realized that a mistake had occurred.
Fig 4 shows the results of the same qRT-PCR experiment as in Fig 3 , but in this case, for one of the sets of triplicate PCR ratios there is much more variation than in the others. Furthermore, this large variation can be accounted for by just one value of the three replicates—that is, the uppermost circle in the graph. If you had results such as those in Fig 4A , you would look at the individual values for the Actin PCR and Bjm PCR for the replicate that had the strange result. If the Bjm PCR sample was unusually high, you could check the corresponding well in the PCR plate to see if it had the same volume as the other wells. Conversely, if the Actin PCR value was much lower than those for the other two replicates, on checking the well in the plate you might find that the volume was too low. Alternatively, the unusual results might have been due to accidentally adding two aliquots of RNA, or two of PCR primer-reaction mix. Or perhaps the pipette tip came loose, or there were crystals obscuring the optics, or the pipette had been blocked by some debris, etc., etc., etc. Replicates can thus alert you to aberrant results, so that you know when to look further and when to repeat the experiment. Replicates can act as an internal check of the fidelity with which the experiment was performed. They can alert you to problems with plumbing, leaks, optics, contamination, suspensions, mixing or mix-ups. But they cannot be used to infer conclusions.
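If replicates are to be used this way, a simple automated screen can flag replicate sets whose spread is unexpectedly large, so that the underlying wells can be inspected. The sketch below is one possible check; the threshold and the example ratios are assumptions for illustration.

```python
# Use replicates only as a quality check: flag sets with unusually high variation.
import numpy as np

def flag_suspect_replicates(replicate_sets, fold=3.0):
    """Return indices of replicate sets whose coefficient of variation (CV)
    exceeds `fold` times the median CV across all sets."""
    cvs = [np.std(r, ddof=1) / np.mean(r) for r in replicate_sets]
    cutoff = fold * np.median(cvs)
    return [i for i, cv in enumerate(cvs) if cv > cutoff]

ratios = [
    [0.9, 1.0, 1.1],   # looks fine
    [1.3, 1.2, 1.3],   # looks fine
    [1.6, 1.5, 4.8],   # one aberrant well -> check volumes, primers, optics, mix-ups
]
print(flag_suspect_replicates(ratios))   # -> [2]
```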
Interpreting data from replicates. ( A ) Mean ± SE of three independent cultures each with ratios from triplicate PCR measurements. P > 0.05. This experiment is much like the one in Fig 3B . However, notice in this case, for one of the sets of replicates (the circles from one of the HH-CSF-treated replicate values), there is a much greater range than for the other five sets of triplicate values. Because replicates are carefully designed to be as similar to each other as possible, finding unexpected variation should prompt an investigation into what went wrong during the conduct of the experiment. Note how in this case, an increase in variation among one set of replicates causes a decrease in the SEs for the values for the independent HH-CSF results: the SE bars for the HH-CSF condition are shorter in Fig 4A than in Fig 3B . Failure to take note of abnormal variation in replicates can lead to incorrect statistical inferences. ( B ) Bjm mRNA levels (relative to Actin ) for three independent cultures each with ratios from triplicate PCR measurements. Means are shown by a horizontal line. The data here are the same as those for Fig 3B or Fig 4A with the aberrant value deleted. When n is as small as 3, it is better to just plot the data points, rather than showing statistics. SE, standard error.
Because replicate values are not relevant to the hypothesis being tested, they—and statistics derived from them—should not be shown in figures. In Fig 4B , the large dots show the means of the replicate values in Fig 4A , after the aberrant replicate value has been excluded. While in this figure you could plot the means and SEs of the mRNA results from the three independent medium- and HH-CSF-treated cultures, in this case, the independent values are plotted and no error bars are shown. When the number of independent data points is low, and they can easily be seen when plotted on the graph, we recommend simply doing this, rather than showing means and error bars.
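A minimal plotting sketch of this recommendation is given below (matplotlib assumed); the six values are hypothetical per-culture means, with every independent point shown and no error bars.

```python
# Plot each independent value (n = 3 per group) rather than a bar with error bars.
import matplotlib.pyplot as plt

control = [1.03, 1.27, 0.83]    # one value per independent control culture (invented)
treated = [1.57, 1.13, 1.93]    # one value per independent HH-CSF culture (invented)

fig, ax = plt.subplots()
for x, values in enumerate([control, treated]):
    ax.scatter([x] * len(values), values, color="black")        # individual points
    ax.hlines(sum(values) / len(values), x - 0.15, x + 0.15)    # mean as a short line
ax.set_xticks([0, 1])
ax.set_xticklabels(["Medium", "HH-CSF"])
ax.set_ylabel("Bjm mRNA / Actin mRNA")
ax.set_ylim(bottom=0)    # axes should start at zero, as argued above
plt.show()
```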
What should we look for when reading papers? Although replicates can be a valuable internal control to monitor the performance of your experiments, there is no point in showing them in the figures in publications because the statistics from replicates are not relevant to the hypothesis being tested. Indeed, if statistics, error bars and P -values for replicates are shown, they can mislead the readers of a paper who assume that they are relevant to the paper's conclusions. The corollary of this is that if you are reading a paper and see a figure in which the error bars—whether standard deviation, SE or CI—are unusually small, it might alert you that they come from replicates rather than independent samples. You should carefully scrutinize the figure legend to determine whether the statistics come from replicates or independent experiments. If the legend does not state what the error bars are, what n is, or whether the results come from replicates or independent samples, ask yourself whether these omissions undermine the paper, or whether some knowledge can still be gained from reading it.
…if statistics, error bars and P -values for replicates are shown, they can mislead the readers of a paper who assume that they are relevant to the paper’s conclusions
You should also be sceptical if the figure contains data from only a single experiment with statistics for replicates, because in this case, n = 1, and no valid conclusions can be made, even if the authors state that the results were ‘representative’—if the authors had more data, they should have included them in the published results (see Sidebar B for a checklist of what to look for). If you wish to see more examples of what not to do, search the Internet for the phrases ‘SD of one representative’, ‘SE of one representative’, ‘SEM of one representative’, ‘SD of replicates’ or ‘SEM of replicates’.
Sidebar B | Error checklist when reading papers
- If error bars are shown, are they described in the legend?
- If statistics or error bars are shown, is n stated?
- If the standard deviations (SDs) are less than 10%, do the results come from replicates?
- If the SDs of a binomial distribution are consistently less than √(np(1 − p)), where n is the sample size and p is the probability of success, are the data too good to be true?
- If the SDs of a Poisson distribution are consistently less than √(mean), are the data too good to be true? (A short sketch of these two checks follows the checklist.)
- If the statistics come from replicates, or from a single ‘representative’ experiment, consider whether the experiments offer strong support for the conclusions.
- If P -values are shown for replicates or a single ‘representative’ experiment, consider whether the experiments offer strong support for the conclusions.
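The two ‘too good to be true’ items above can be checked with a few lines of code. The sketch below covers the Poisson case using the WT + HH-CSF plate counts from Table 1; the binomial case is analogous, with √(np(1 − p)) as the expectation.

```python
# Plausibility check: for Poisson-like counts, the SD of genuinely independent
# samples should be roughly the square root of the mean.
import math

def observed_vs_poisson_sd(counts):
    mean = sum(counts) / len(counts)
    observed = math.sqrt(sum((c - mean) ** 2 for c in counts) / (len(counts) - 1))
    expected = math.sqrt(mean)          # Poisson: variance equals the mean
    return observed, expected

# WT + HH-CSF plate counts from Table 1
obs, exp = observed_vs_poisson_sd([61, 59, 55, 64, 57, 69, 63, 51, 61, 61])
print(f"observed SD {obs:.1f} vs Poisson expectation {exp:.1f}")
# Here the observed SD (~5.0) is of the same order as sqrt(60) (~7.8); SDs far
# below the Poisson expectation would suggest the error bars are too good to be true.
```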
David L. Vaux
Acknowledgments
This work was made possible through Victorian State Government Operational Infrastructure Support, and Australian Government NHMRC IRIISS and NHMRC grants 461221 and 433063.
The authors declare that they have no conflict of interest.
- Campbell KH, McWhir J, Ritchie WA, Wilmut I (1996) Sheep cloned by nuclear transfer from a cultured cell line. Nature 380: 64–66
- Open access
- Published: 12 August 2024
Variant-proof high affinity ACE2 antagonist limits SARS-CoV-2 replication in upper and lower airways
- Matthew Gagne 1 ,
- Barbara J. Flynn 1 ,
- Christopher Cole Honeycutt 1 ,
- Dillon R. Flebbe 1 ,
- Shayne F. Andrew ORCID: orcid.org/0000-0001-7226-7757 1 ,
- Samantha J. Provost ORCID: orcid.org/0009-0008-4413-5923 1 ,
- Lauren McCormick ORCID: orcid.org/0000-0003-4928-3008 1 ,
- Alex Van Ry ORCID: orcid.org/0000-0002-1240-9044 2 ,
- Elizabeth McCarthy 1 nAff10 ,
- John-Paul M. Todd 1 ,
- Saran Bao 1 ,
- I-Ting Teng 1 ,
- Shir Marciano ORCID: orcid.org/0000-0001-8506-9549 3 ,
- Yinon Rudich ORCID: orcid.org/0000-0003-3149-0201 4 ,
- Chunlin Li 4 ,
- Shilpi Jain 5 , 6 , 7 ,
- Bushra Wali 5 , 6 , 7 ,
- Laurent Pessaint 2 ,
- Alan Dodson 2 ,
- Anthony Cook 2 ,
- Mark G. Lewis ORCID: orcid.org/0000-0001-7852-0135 2 ,
- Hanne Andersen ORCID: orcid.org/0000-0003-1103-9608 2 ,
- Jiří Zahradník ORCID: orcid.org/0000-0002-8698-4236 3 ,
- Mehul S. Suthar ORCID: orcid.org/0000-0002-2686-8380 5 , 6 , 7 , 8 ,
- Martha C. Nason ORCID: orcid.org/0000-0002-0110-881X 9 ,
- Kathryn E. Foulds ORCID: orcid.org/0000-0003-4418-6495 1 ,
- Peter D. Kwong ORCID: orcid.org/0000-0003-3560-232X 1 ,
- Mario Roederer 1 ,
- Gideon Schreiber ORCID: orcid.org/0000-0002-2922-5882 3 ,
- Robert A. Seder ORCID: orcid.org/0000-0003-3133-0849 1 &
- Daniel C. Douek ORCID: orcid.org/0000-0001-5575-8634 1
Nature Communications volume 15, Article number: 6894 (2024)
- Applied immunology
- Drug discovery
- Mucosal immunology
- Viral infection
SARS-CoV-2 has the capacity to evolve mutations that escape vaccine- and infection-acquired immunity and antiviral drugs. A variant-agnostic therapeutic agent that protects against severe disease without putting selective pressure on the virus would thus be a valuable biomedical tool that would maintain its efficacy despite the ongoing emergence of new variants. Here, we challenge male rhesus macaques with SARS-CoV-2 Delta—the most pathogenic variant in a highly susceptible animal model. At the time of challenge, we also treat the macaques with aerosolized RBD-62, a protein developed through multiple rounds of in vitro evolution of SARS-CoV-2 RBD to acquire 1000-fold enhanced ACE2 binding affinity. RBD-62 treatment equivalently suppresses virus replication in both upper and lower airways, a phenomenon not previously observed with clinically approved vaccines. Importantly, RBD-62 does not block the development of virus-specific T- and B-cell responses and does not elicit anti-drug immunity. These data provide proof-of-concept that RBD-62 can prevent severe disease from a highly virulent variant.
Introduction
SARS-CoV-2 variants of concern (VOC) including B.1.351 (Beta), B.1.617.2 (Delta) and the currently circulating sublineages of B.1.1.529 (Omicron) have acquired mutations that enable substantial escape from neutralizing antibodies in convalescent or vaccine sera 1 , 2 , 3 , 4 , 5 , 6 , 7 . Efficacy against severe disease after two doses of mRNA COVID-19 vaccines has declined from ~100% in clinical trials conducted at a time when ancestral strains were predominantly in circulation to 60–80% during the Omicron BA.1 wave 8 , 9 , 10 , 11 , 12 , a result of both waning antibody titers and virus-acquired mutations. Boosting can restore protective efficacy but the benefit of boosting beyond a third dose is unclear 13 , 14 , 15 and accumulating evidence suggests that antigenic imprinting may offset the benefit of variant-matched boosts 16 , 17 , 18 , 19 , 20 , 21 .
Anti-viral therapeutic agents can reduce the effects of severe COVID-19 in individuals with or without prior immunity. Two drugs granted emergency use authorization by the FDA are Merck’s molnupiravir (Lagevrio) and Pfizer’s nirmatrelvir/ritonavir (Paxlovid). Interim data indicated that molnupiravir, a cytidine analog prodrug, reduced hospitalizations from COVID-19 by about 50%, but further analysis has suggested a lower efficacy 22 . Nirmatrelvir/ritonavir has demonstrated substantial clinical efficacy, with an 89% decline in severe disease 23 . However, nirmatrelvir/ritonavir, which functions by inhibiting the main protease, is not routinely prescribed for the treatment of COVID-19 24 , 25 . Furthermore, the emergence of drug-resistant mutations in the virus remains a possibility; while some have already been detected in people, no widely circulating variants currently demonstrate this capacity 26 , 27 , 28 , 29 . Nonetheless, concerns regarding virus escape and the possibility for reduced anti-viral potency against emerging SARS-CoV-2 strains continue to elicit significant interest in the development of more effective protease inhibitors 30 , 31 .
Consequently, there remains an urgent need for the development of additional therapeutic agents that reduce severe disease, particularly those that act in a variant-agnostic manner; that is, without directly targeting the virus. Host-targeted approaches could maintain their efficacy as new variants emerge, even if those variants became so divergent from the ancestral strains that anti-viral drugs or prior immunity were rendered ineffective. We have previously described the development of an in vitro mutated SARS-CoV-2 receptor-binding domain (RBD) that displays greatly enhanced binding to the virus target receptor, angiotensin-converting enzyme 2 (ACE2), without inhibiting its natural enzymatic activity (Supplementary Table 1 ). Wildtype RBD was subjected to successive iterations of error-prone PCR, which allowed for the selection of proteins capable of binding increasingly lower concentrations of ACE2, followed by pre-equilibrium selection to obtain faster association. The final product, termed RBD-62, has a binding affinity for ACE2 of 16 pM, an increase of 1000-fold compared to the wildtype Wuhan-Hu-1 (WT) RBD, which has a binding affinity of 1700 pM 32 . In vitro models demonstrated its capacity to block infection of cell lines with a half-maximal inhibitory concentration (IC 50 ) of 18 pM against the Beta variant. RBD-62 treatment of Syrian hamsters through inhalation at the time of infection with the ancestral strain USA-WA1/2020 (WA1) also resulted in protection against weight loss. Protection against severe disease had not previously been established in a model system that more closely approximates humans, or in the context of infection with a VOC that has been shown to cause substantial morbidity and mortality.
Here, we infected rhesus macaques with Delta which is the most pathogenic variant tested to date in these animals. Macaques were treated immediately prior to challenge and every 24 h for the next 5 days with RBD-62 administered to the airways via aerosolization. Protection was measured via titers of culturable virus and subgenomic virus RNA (sgRNA). We also analyzed mucosal and serum immune responses to Delta-specific antigens as well as to RBD-62 itself to assess any anti-drug antibody (ADA) responses.
RBD-62 inhibits binding between the SARS-CoV-2 spike (S) and ACE2 in a variant-agnostic manner
Efficacies for COVID-19 therapies and vaccines have declined in the context of emerging variants, which has limited the long-term applicability of results established in initial clinical and pre-clinical research. To determine if data gathered on RBD-62 from a challenge model with an ancestral variant could be extrapolated to current and future emerging strains, we first used an in vitro assay to examine the ability of RBD-62 to block binding between ACE2 and S from a panel of different variants.
In contrast to unmutated WA1 RBD, which inhibited binding of both WA1 S and Delta S with an IC 50 of ~330 ng/mL, RBD-62 was nearly 100 times more potent with IC 50 values of ~4.5 ng/mL (Fig. 1a, b ). Further, RBD-62 blocked binding between ACE2 and BA.1 S at almost the same concentration, and the IC 50 for Beta S was only modestly higher at 6 ng/mL (Fig. 1c, d ). Strikingly, IC 90 values for RBD-62 against all variants were <40 ng/mL, whereas we were unable to achieve 90% binding inhibition for any variant using WA1 RBD as the inhibitor at any of our tested concentrations.
SARS-CoV-2 S from WA1 ( a ), Delta ( b ), Beta ( c ) and BA.1 ( d ) were mixed with soluble ACE2 in combination with indicated concentrations of RBD-62, RBD from WA1 or an irrelevant malaria protein (PfCSP) to determine percentage binding inhibition relative to maximum binding without inclusion of inhibitor. Icons represent the average inhibition of duplicate technical replicates at each indicated dilution. IC 50 and IC 90 values (ng/mL) are indicated to the right of each graph and, along with the curves presented within the graphs, were calculated using the nonlinear regression analysis tool in Prism. The dotted lines indicate background inhibition observed for PfCSP. Source data are provided as a Source Data file.
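The authors report IC50 and IC90 values from Prism’s nonlinear regression; a rough Python equivalent is sketched below on synthetic inhibition data (the concentrations and percentages are invented), fitting a four-parameter logistic curve and reading IC50 and IC90 off the fit.

```python
# Sketch of IC50/IC90 estimation from a percent-inhibition dilution series.
import numpy as np
from scipy.optimize import curve_fit

def four_pl(conc, bottom, top, ic50, hill):
    return bottom + (top - bottom) / (1.0 + (ic50 / conc) ** hill)

conc  = np.array([5000, 1000, 200, 40, 8, 1.6, 0.32])   # ng/mL, fivefold dilutions
inhib = np.array([99, 98, 95, 88, 62, 25, 6])            # synthetic % inhibition

params, _ = curve_fit(four_pl, conc, inhib, p0=[0, 100, 10, 1],
                      bounds=([-10, 50, 1e-3, 0.1], [20, 110, 1e4, 5]))
bottom, top, ic50, hill = params

# Concentration giving 90% of the fitted maximal inhibition
ic90 = ic50 * 9.0 ** (1.0 / hill)
print(f"IC50 ~ {ic50:.1f} ng/mL, IC90 ~ {ic90:.1f} ng/mL")
```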
We further characterized the ability of RBD-62 to directly inhibit infection of VeroE6-TMPRSS2 cells with authentic WA1 and BA.1 viruses in addition to the more recently circulating Omicron sublineages BA.5, XBB.1.5 and JN.1. Similar to our findings on ACE2 binding, RBD-62 blocked in vitro infection at 100-fold lower concentrations compared to ancestral WA1 RBD, although the amount of protein necessary to achieve 50% inhibition was greater for infection than for ACE2 binding (Supplementary Fig. 1 ). As the potency of RBD-62 to block the ability of the virus to engage its receptor and infect target cells was strongly preserved across variants despite large differences in S sequence and binding affinity, we proceeded with in vivo evaluation of this product in a challenge model with Delta which, in our experience, is the most virulent variant and replicates at a high titer in the upper and lower airways of rhesus macaques.
RBD-62 protects rhesus macaques from Delta replication in the upper and lower airways
We delivered 2.5 mg RBD-62 to eight rhesus macaques as an aerosol using the PARI eFlow® nebulizer as described previously 33 to target the drug to both the upper and lower airways. In addition, a further eight macaques received aerosolized PBS control. Both groups were challenged 1 h later with Delta at a dose of 2 × 10 5 median tissue culture infectious dose (TCID 50 ). Primates continued to receive the same dose of aerosolized RBD-62 or PBS once per day for the next 5 days at which point treatment was stopped so that we could track the kinetics of virus rebound. Nasal swabs (NS) and bronchoalveolar lavage (BAL) were collected on days 2, 4, 7, 9 and 14, and RNA was isolated for detection of virus replication by PCR for sgRNA encoding for the virus nucleocapsid (N) transcript (Supplementary Fig. 2 ).
We observed a significant decrease in virus sgRNA copies in the lungs on day 2, with a geometric mean of 2.3 × 10⁵ copies/mL BAL in the RBD-62 treatment group and 5.8 × 10⁷ copies in the PBS control group (P = 0.0031). RNA copies in the nose on day 2 showed a similar pattern to the BAL, with geometric means of 2.8 × 10⁵ copies/swab in the treatment group and 4.5 × 10⁷ in the control group (P = 0.0093) (Fig. 2a and Supplementary Table 2). Interestingly, the protective effect was no longer significant in either the nose or the lungs by day 7, which was the first collection timepoint after the cessation of RBD-62 treatment (P > 0.05). Nonetheless, peak sgRNA copy numbers in the RBD-62 cohort were lower than in the control primates. For instance, while 4/8 control NHP had peak copy numbers >10⁸ in either the lungs or the nose, virus titers never reached that level in any of the RBD-62-treated animals. Further, virus was cleared from the lower airway of all animals in the RBD-62 group by day 14 post challenge, whereas half of the control group still had detectable sgRNA at that timepoint.
NHP ( n = 8/group) were challenged with 2 × 10 5 TCID 50 Delta and simultaneously treated with RBD-62 (green circles) or PBS (gray circles). a Subgenomic RNA encoding for N transcript was measured in the upper and lower airways at days 2, 4, 7, 9 and 14 post challenge. b Culturable virus was measured in the upper and lower airways at days 2 and 4 post challenge. Dotted lines indicate the assay limit of detection (LOD). Circles, boxes and horizontal lines represent individual animals, interquartile range and median, respectively, while minima and maxima are denoted at whisker termini. Statistical analyses were shown for comparison of groups at each timepoint and were performed using the Wilcoxon rank-sum test (two-sided) after Holm’s adjustment across timepoints. NS denotes that the indicated comparison was not significant, with P > 0.05. See also Supplementary Tables 2 and 3 for complete statistical analyses. Source data are provided as a Source Data file.
We also measured culturable virus via TCID₅₀, which could indicate the potential for transmissibility (Fig. 2b and Supplementary Table 3). On day 2, the RBD-62 group had significantly less culturable virus than control animals, with geometric mean TCID₅₀ values in the lungs of 1.1 × 10⁴ and 2.2 × 10⁶, respectively (P = 0.0026). Culturable virus was also reduced in the upper airway, with TCID₅₀ of 3.6 × 10⁵ for the RBD-62 group and 1.9 × 10⁷ for the control group (P = 0.0404). As we kept all animals alive for 2 weeks to enable longitudinal analysis of virus clearance, we were not able to measure pathology or clearly identify virus antigens in tissues, which would have been most evident shortly after the challenge.
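The group comparisons in Fig. 2 are described as two-sided Wilcoxon rank-sum tests with Holm’s adjustment across timepoints. A minimal sketch of that procedure is below, using invented sgRNA copy numbers rather than the study data.

```python
# Wilcoxon rank-sum (Mann-Whitney) test per timepoint, Holm-adjusted across timepoints.
from scipy.stats import mannwhitneyu

# copies/mL BAL for 8 treated and 8 control animals at three timepoints (placeholder values)
rbd62 = {2: [2e5, 4e5, 1e5, 6e5, 3e5, 2e5, 5e5, 1e5],
         4: [4e4, 9e4, 2e4, 1e5, 5e4, 3e4, 8e4, 6e4],
         7: [1e3, 5e3, 2e4, 8e2, 4e4, 1e4, 6e4, 2e3]}
pbs   = {2: [6e7, 2e7, 9e7, 4e7, 8e7, 3e7, 7e7, 5e7],
         4: [9e6, 3e6, 1e7, 6e6, 2e6, 8e6, 4e6, 5e6],
         7: [2e4, 6e3, 9e4, 3e4, 1e4, 5e4, 2e4, 7e3]}

raw = {day: mannwhitneyu(rbd62[day], pbs[day], alternative="two-sided").pvalue
       for day in rbd62}

# Holm's step-down adjustment across the timepoints
order = sorted(raw, key=raw.get)          # days ordered by ascending raw P-value
m, running_max, adjusted = len(order), 0.0, {}
for rank, day in enumerate(order):
    running_max = max(running_max, (m - rank) * raw[day])
    adjusted[day] = min(1.0, running_max)
print(adjusted)
```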
Treatment with RBD-62 does not inhibit the induction of anti-SARS-CoV-2 humoral immunity
It is conceivable that the use of an ACE2-binding inhibitor during acute SARS-CoV-2 infection could prevent the formation of a primary or secondary immune response to the virus that would have been beneficial in the context of a future exposure. To test this hypothesis, we measured serum and mucosal IgG binding titers to a panel of variant RBDs, including WT, Delta and Omicron BA.1. At day 14 following the challenge, titers to the Delta challenge stock were greater than those to either of the other strains for both the treated and untreated animals, indicative of a primary response. Geometric mean titers (GMT) to Delta rose from a baseline of 5 × 10¹ to 2 × 10⁹ area under the curve (AUC) in the serum of the RBD-62 group by day 14 (Fig. 3a). While we observed a similar increase in GMT of control NHP by day 14, from 2 × 10² to 2 × 10¹⁰ AUC, the kinetics were faster, with evidence of a primary response as early as day 9. We next confirmed that there was a differential treatment effect across the entire 14-day time course, which would indicate a blunted primary response due to a reduction in virus antigen in the RBD-62 group compared to the untreated group (Supplementary Table 4). Indeed, there was a significant treatment effect, with P = 0.0132. Likewise, mucosal binding titers to Delta RBD were higher on day 14 in the control primates, with GMT of 3 × 10¹⁰ in the lungs and 2 × 10⁸ in the nose as compared to 4 × 10⁷ and 2 × 10⁷ in the treated NHP, respectively (P = 0.0001 in lungs and 0.0044 in nose) (Fig. 3b, c). Despite the attenuated response in the treated group, which reflected greater virus control by these animals, RBD-62 administration did not preclude seroconversion.
NHP ( n = 8/group) were challenged with 2 × 10 5 TCID 50 Delta and simultaneously treated with RBD-62 (green circles) or PBS (gray circles). IgG binding titers were measured to wildtype, Delta and BA.1 RBD in a serum, b BAL and c nasal wash 1 month prior to challenge (pre-challenge) and on days 2, 4, 7, 9 and 14 post challenge. Serum was initially diluted 1:100 and then serially diluted 1:4. BAL and NW samples were initially diluted 1:5 and then serially diluted 1:5. Circles, boxes and horizontal lines represent individual animals, interquartile range and median, respectively, while minima and maxima are denoted at whisker termini. P values annotated on plots can be used to assess the statistical significance of a drug-specific treatment effect (difference between RBD-62 and control groups), based on two-sided generalized estimating equation (GEE) modeling, which included titers across all post challenge timepoints. No adjustments were made for multiple comparisons. NS denotes that the indicated comparison was not significant, with P > 0.05. See also Supplementary Table 4 for complete statistical analyses. Source data are provided as a Source Data file.
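The treatment-effect P values in the Fig. 3 legend come from generalized estimating equation (GEE) models across all post-challenge timepoints. The sketch below shows one way such a model could be set up with statsmodels on a hypothetical long-format table; it is an illustration of the approach, not the authors’ exact model specification.

```python
# GEE with repeated measures per animal; the 'group' term is the treatment effect.
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical long-format data: 16 animals x 5 post-challenge timepoints
df = pd.DataFrame({
    "animal":    [f"M{i:02d}" for i in range(16) for _ in range(5)],
    "group":     ["RBD62"] * 40 + ["PBS"] * 40,
    "day":       [2, 4, 7, 9, 14] * 16,
    "log_titer": np.random.default_rng(0).normal(6.0, 1.0, 80),   # placeholder titers
})

model = sm.GEE.from_formula("log_titer ~ group + day", groups="animal", data=df,
                            family=sm.families.Gaussian(),
                            cov_struct=sm.cov_struct.Exchangeable())
print(model.fit().summary())
```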
Although treated animals developed antibodies to variant RBDs, it is possible that this response largely reflected or was enhanced by anti-drug immunity, as RBD-62 is itself derived from WT RBD. To confirm that the immune response was not limited only to RBD, we quantified binding to various proteins and domains including RBD, whole S and nucleocapsid (N). Further, as we focused our analysis on WT protein, we reported responses using the WHO-determined standard antibody units for WT virus. We detected evidence of a primary response to RBD and S in the serum and BAL of control animals by day 9, while binding titers in the treated animals were not noticeably greater than background until day 14 (Supplementary Fig. 3a, b). Similarly, animals in the RBD-62-treated group mounted primary responses to WT nucleocapsid but with delayed kinetics and decreased magnitude as compared to the control cohort. On day 14, anti-nucleocapsid GMT reached 11.4 binding antibody units per mL (BAU/mL) in the serum and 0.1 in the BAL of controls compared to 3.5 and 0.01 in the RBD-62 group. Titers in the nasal wash (NW) were markedly lower for both groups than in the BAL or serum (Supplementary Fig. 3c).
To further explore the effect of RBD-62 on the development of mucosal responses, we measured IgA binding titers to the aforementioned proteins (Supplementary Fig. 4 ). In agreement with our findings on IgG, the kinetics of the IgA response were faster in controls than in treated animals, with evidence of increased titers to RBD and S in the BAL of the controls by day 9 compared to day 14 in the RBD-62 group. However, we were nonetheless able to detect the binding of IgA to both RBD and S in the BAL and NW of the RBD-62 group. Mucosal IgA responses to N were not clearly above background for either group of animals. Together with the data on mucosal IgG responses, this would suggest that a future mucosal vaccine boost would likely not be affected by prior RBD-62 treatment.
We have previously used the ACE2–RBD binding inhibition assay as a surrogate for neutralization in the mucosa 16 , 34 , 35 . Here, we were able to detect inhibition of WT and Delta variants in the BAL of both control and treated groups (Supplementary Fig. 5a ). We calculated that the median inhibitory capacity of BAL antibodies from the RBD-62 group was 6% against WT RBD and 8% against Delta RBD. Omicron BA.1 RBD–ACE2 binding inhibition was not detected, likely due to the divergence between the challenge stock and BA.1. NW inhibitory antibodies were not detectable in either group of animals (Supplementary Fig. 5b ).
Treatment with RBD-62 does not inhibit the induction of anti-SARS-CoV-2 T cell responses
T cell epitopes present within SARS-CoV-2 are highly conserved 16 , 36 , suggesting that while newly emerging variants may continue to escape humoral immune responses, protection arising from T cell immunity may still be preserved. Thus, we next measured T cell responses to WT S peptides to determine if the administration of RBD-62 would interfere with their induction (Supplementary Fig. 6 ). Again, the kinetics of this response were faster in the control animals, with measurable increases in T H 1 and CD40L + T FH responses in the periphery by day 7 compared to day 9 in the treated group (Fig. 4a–d ). S-specific T H 1 responses reached a median frequency of 0.2% in the controls and 0.1% in the treated NHP by day 14 although the difference was greater in the BAL (Fig. 4f ). We did not detect T H 2 responses in either the circulation or BAL (Fig. 4b, g ). While we were able to detect some S-specific CD8 + T cells in the circulation of both groups, responses were much higher in the lungs—the primary site of virus replication—with median frequencies of 0.4% in both groups (Fig. 4c, h ). We were also able to detect T cell responses to N in the circulation of the RBD-62 group (Supplementary Fig. 7 ).
NHP ( n = 8/group) were challenged with 2 × 10 5 TCID 50 Delta and simultaneously treated with RBD-62 (green circles) or PBS (gray circles). a – e Peripheral blood mononuclear cells (PBMC) or f – h lymphocytes from BAL were collected prior to challenge (immediately preceding challenge for PBMC and 1 month pre-challenge for BAL) and on days 2, 4, 7, 9 and 14 post challenge. Cells were stimulated with WA1 S1 and S2 peptide pools and responses were measured by intracellular cytokine staining (ICS). a , f Percentage of memory CD4 + T cells expressing T H 1 markers (IL-2, TNF or IFNγ). b , g Percentage of memory CD4 + T cells expressing T H 2 markers (IL-4 or IL-13). c , h Percentage of memory CD8 + T cells expressing IL-2, TNF or IFNγ. d , e Percentage of T FH cells expressing CD40L or IL-21, respectively. Dotted lines set at 0%. Reported percentages may be negative due to background subtraction and may extend beyond the lower range of the y -axis. Circles, boxes and horizontal lines represent individual animals, interquartile range and median, respectively, while minima and maxima are denoted at whisker termini. Due to pre-specified minimum cell numbers per sample required for analysis, some timepoints include data from <8 NHP/group. Data are provided as a Source Data file.
RBD-62 administration does not impair B-cell memory to SARS-CoV-2 and does not elicit anti-drug immunity
Memory B cells are essential for mounting secondary responses upon boosting or reinfection and, together with long-lived plasma cells, form the basis of long-term humoral immunity 37 , 38 , 39 , 40 . Due to imprinting, the initial exposure to virus establishes B-cell antigen specificity and determines the capacity of the immune system to recognize novel variants 16 , 17 , 18 , 19 , 20 , 21 . We therefore collected memory B cells from the peripheral circulation on day 14 following challenge and measured binding to pairs of fluorescently labeled variant S-2P probes (Supplementary Fig. 8 ). Control and RBD-62-treated animals displayed almost identical patterns (Fig. 5a, b ); out of all Delta and/or WA1-binding memory B cells in the controls, a geometric mean frequency of 42% bound to Delta alone compared to 49% in the treated animals. In the controls, 55% of the population was cross-reactive and capable of binding both S compared to 46% in the RBD-62 group. When examining the pool of memory B cells capable of binding to Delta and/or BA.1, only 37% and 35% were dual-specific in the controls and the RBD-62 group, respectively. This smaller fraction of BA.1/Delta-specific cross-reactive cells as compared to the WA1/Delta-specific cross-reactive pool is likely due to the reduced number of shared epitopes between BA.1 and Delta. We also measured the frequency of S-specificity among all memory B cells (Supplementary Fig. 9 ). As most recently circulating SARS-CoV-2 strains are within the Omicron lineage, any limitation that RBD-62 treatment would place on the development of BA.1-binding B-cell responses would be especially concerning. However, the frequencies of BA.1/Delta-specific cross-reactive memory B cells were similar between the two cohorts with geometric mean frequencies of 0.14% in the controls and 0.13% in the treatment group (Supplementary Fig. 9b ).
NHP were challenged with 2 × 10 5 TCID 50 Delta and simultaneously treated with RBD-62 or PBS. Memory B-cell specificity was determined at day 14 post challenge via binding to fluorochrome-labeled variant probe pairs as indicated in figure legends. Probe pairs include WA1 and Delta S-2P, Delta and BA.1 S-2P, and Delta S-2P and RBD-62. a Representative flow cytometry graphs for one animal in the control group (left) or treated group (right). Event frequencies denote the proportion of probe-binding cells within the total class-switched memory B-cell population. Cross-reactive memory B cells are represented by events in the top right quadrant whereas single-positive memory B cells reside in the top left or bottom right quadrants. b Pie charts indicating the geometric mean frequency of the entire S-specific memory B-cell compartment capable of binding to both members of a variant probe pair (dark gray) or a single variant within the pair (light gray or black) at day 14 post challenge. The control group is displayed on the left and the treated group is displayed on the right. n = 8 for treated group and n = 4 for control group. Source data are provided as a Source Data file.
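The cross-reactivity fractions summarized in the pie charts are simple ratios of dual-probe-binding cells to all probe-binding cells. A sketch of that bookkeeping on invented per-cell gate calls is shown below.

```python
# Fraction of probe-binding memory B cells that bind both variants (cross-reactive).
import numpy as np

rng = np.random.default_rng(1)
n_cells = 100_000
# Invent per-cell specificities: cross-reactive, Delta-only, WA1-only, or non-binding
labels = rng.choice(["cross", "delta_only", "wa1_only", "neg"],
                    size=n_cells, p=[0.0015, 0.0010, 0.0002, 0.9973])
binds_wa1   = np.isin(labels, ["cross", "wa1_only"])
binds_delta = np.isin(labels, ["cross", "delta_only"])

s_specific = binds_wa1 | binds_delta          # any probe-binding memory B cell
cross      = binds_wa1 & binds_delta          # dual (cross-reactive) binders
print("cross-reactive share of S-specific cells:", round(cross.sum() / s_specific.sum(), 2))
print("S-specific share of all memory B cells:  ", round(s_specific.mean(), 4))
```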
We next expanded our analysis of memory B-cell binding specificities to include recognition of RBD-62. Two weeks following the challenge and initiation of RBD-62 treatment, only 8% of memory B cells in the treatment group with specificities for RBD-62 and/or Delta bound to RBD-62. This was not meaningfully different from the control group (Fig. 5 ). Further, as a frequency of total memory B cells, the RBD-62-binding population following treatment was hard to distinguish from the pre-challenge background staining (Supplementary Fig. 9c ).
As newly emerging SARS-CoV-2 variants continue to evolve while evading prior immunity, acquiring mutations which could render existing anti-viral drugs ineffective and even potentially increasing their affinity for ACE2 2 , 3 , 4 , 5 , 6 , 7 , 27 , 28 , 41 , 42 , 43 , 44 , 45 , it is essential to develop new treatments for COVID-19 which are variant-agnostic. However, to the best of our knowledge, no host-targeting treatments that are designed specifically to reduce virus load have been tested in clinical trials or within an advanced pre-clinical challenge model such as nonhuman primates. Here we describe the in vivo validation in rhesus macaques of RBD-62, a therapeutic agent which was designed through in vitro evolution to outcompete SARS-CoV-2 RBD for binding to human ACE2 while not interfering with the receptor’s natural enzymatic activity. Targeting the host receptor rather than the virus itself has several benefits including the avoidance of selective pressure and the ability to retain efficacy despite the emergence of new variants, which are unlikely to achieve the 1000-fold increase in ACE2-binding affinity needed to gain a competitive advantage over RBD-62. Moreover, optimizing aerosol delivery of this protein 33 promotes specific targeting of the respiratory tract, including the lungs, achieving a significant reduction in virus replication for the duration of treatment. Importantly, after drug delivery was terminated, virus titers remained lower in the BAL. At the therapeutic dosage used here, RBD-62 did not inhibit the induction of serum or mucosal IgG or IgA responses to the challenge virus or other variants tested. T-cell and B-cell immunity were also preserved with no evidence of anti-drug immunity which would have precluded the reuse of this therapeutic agent upon subsequent reinfection.
The benefits of an ACE2 antagonist could theoretically extend to other host-targeting approaches including anti-ACE2 antibodies 46 , 47 and ACE2 decoys that bind to RBD 48 , 49 , 50 , 51 , 52 , 53 , 54 , 55 , 56 , 57 , 58 . It is noteworthy that chiropteran ACE2 functions as a host receptor for NeoCoV and PDF-2180, two merbecoviruses which have so far been restricted to transmission within bats 59 . Continued work on developing biomolecules such as RBD-62 that can block the interaction between SARS-CoV-2 S and human ACE2 has potential benefits not only in this current pandemic but also in our continued vigilance against potential spillover events.
Advancement of this therapeutic agent, or similar host-targeting drugs, into clinical trials would require further optimization of dosage. It is possible that higher doses of RBD-62, or a longer duration of treatment, would have further suppressed virus replication, maintaining a significant protective effect until complete clearance. However, any advantage provided by increasing the amount of RBD-62 would have to be balanced with the possibility of inducing ADA responses.
It has not escaped our notice that the loss of protection following treatment cessation coincided with a delayed primary immune response as indicated by slower kinetics of measurable IgG and IgA titers as well as ACE2-binding inhibitory antibodies and T cell responses. This is likely due to the low levels of virus antigen resulting from RBD-62 treatment, a conclusion supported by a recent publication on the muted primary response arising during nirmatrelvir treatment 60 and could inform our understanding of the mechanisms contributing to virus rebound following cessation of anti-viral treatment. This would suggest that an increase in the amount of antigen available to elicit an immune response without a commensurate increase in virus replication may be beneficial during RBD-62 treatment. Thus, one approach that may be worth exploring would be vaccination at the time of treatment. Regarding the potential for rebound following RBD-62 treatment, it is notable that the muted primary response did not delay virus clearance as compared to control animals. However, the risk of rebound may be enhanced in cases where treatment is stopped prematurely or if patients are immunocompromised.
Additionally, the protective effect of RBD-62 was striking in that it was observed in both the nose and the lungs. Indeed, our previous findings using mRNA vaccines delivered intramuscularly to nonhuman primates have shown that protection is often either delayed or absent in the upper airway, likely due to the higher threshold of antibodies required for virus suppression in the nose as compared to the lungs 16 , 34 , 35 , 61 , 62 . Aerosolization of RBD-62 through the PARI nebulizer enables efficient delivery to both the lungs and the nose, highlighting the potential for this medication not only to be administered in a hospital setting to reduce the severity of disease but also to block infection. Indeed, RBD-62 could be employed as a preventative agent for healthcare workers or immunocompromised individuals who cohabitate with an infected individual. In the case of healthcare personnel, preventative treatment could be administered immediately preceding and after attendance to SARS-CoV-2-infected patients when the risk of exposure would be high, mimicking the treatment course described in our study. In the context of widespread immunity from prior infections and the global vaccination campaign, any impact that the use of RBD-62 might have on transmission reduction could accelerate the transition of the current pandemic into an endemic phase.
Limitations of the study
Although the treatment course that we have designed here may be feasible in a clinical setting, it would have limited applicability to exposure outside of the hospital. Thus, further characterization of RBD-62 as a post-exposure therapeutic agent, as well as modifications to extend protein half-life enabling its utilization as a preventative agent, is warranted. Second, as we have not rechallenged these animals, we cannot definitively determine the impact of the reduction in the magnitude of the primary response on the prevention of reinfection. Nevertheless, it is noteworthy that differences in the immune response between the control and RBD-62-treated NHP are primarily at the level of antibody titers while memory B-cell frequencies and variant specificities are largely preserved. This would suggest that any negative effect on immunity would be primarily at the level of protection against breakthrough infection, rather than severe disease, as anamnestic responses are sufficient to protect the lungs even when circulating antibody titers are suboptimal 35 . Further, it is likely that exposure to SARS-CoV-2 during RBD-62 administration would also occur in the context of multiple previous exposures to vaccine and/or virus, rendering any potential drug-derived impact on immune responses negligible.
The finding that RBD-62 maintains its potency against a panel of different variants stands in stark contrast to the declining efficacy of currently approved vaccines and previously authorized monoclonal antibodies. Indeed, there is no clinically available monoclonal antibody with the capacity to neutralize the currently circulating Omicron sublineages 63 , 64 , 65 , 66 , 67 . Vaccine development is predicated on predictions of which strains will be dominant at a future time. This can result in both significant lag times between the identification of new strains with transmission advantages and authorization of variant-matched vaccine boosts and also in a mismatch between the vaccine immunogen and the currently circulating variant. As an alternative approach, we have described a rationally designed therapeutic agent which can be used to treat or prevent COVID-19 regardless of future SARS-CoV-2 evolution.
Experimental design
All experiments were conducted according to the National Institutes of Health (NIH) standards on the humane care and use of laboratory animals, and all procedures were approved by and conducted in accordance with regulations of the Animal Care and Use Committees of the NIH Vaccine Research Center (VRC) and BIOQUAL, Inc. (Rockville, MD). Animals were housed and cared for in accordance with local, state, federal and institute policies in facilities accredited by the American Association for Accreditation of Laboratory Animal Care, under standards established in the Animal Welfare Act and the Guide for the Care and Use of Laboratory Animals. Animals were housed in ABSL-2 conditions before challenge. Up to a week prior to (for acclimation) and during the challenge phase of the study, animals were housed in ABSL-3 conditions, per Bioqual facility standard operating procedures.
Four- to seven-year-old male Indian-origin rhesus macaques (Macaca mulatta) from a VRC-owned resource colony were sorted into two groups of 8 NHP based on age and weight. RBD-62 formulated in gelatin was administered to one group at the time of Delta challenge, while PBS formulated in gelatin was administered to the other group. Animals were challenged with 2 × 10^5 TCID50 of Delta (BEI, NR-56116): 1.5 × 10^5 TCID50 was administered via the intratracheal route, and 0.25 × 10^5 TCID50 was administered intranasally into each nostril (0.5 × 10^5 TCID50 intranasal in total).
Production and purification of RBD-62
The RBD-62 protein was produced in several batches, from a total of 4.6 L of Expi293F cells (Thermo Fisher), by transient transfection of the pCAGGS plasmid 32. Plasmid DNA purified with the NucleoBond Xtra Midi kit (Macherey-Nagel) was transfected using the ExpiFectamine 293 Transfection Kit (Thermo Fisher) according to the manufacturer's protocol. Media was collected by centrifugation (500 × g for 15 min) 72–96 h post transfection, when cell viability had decreased to 50%. The media was clarified by filtration through a 0.45 µm Nalgene filter (Thermo Fisher) and loaded onto two 5 mL HisTrap Fast Flow columns (Cytiva) connected in series using an ÄKTA pure system (Cytiva). The column was washed with five column volumes of PBS containing 20 mM imidazole, pH 7.4, and eluted with a step of 60% elution buffer (PBS, 500 mM imidazole, pH 7.4). Eluted protein was concentrated with Amicon Ultra Centrifugal Filter Units, MWCO 3 kDa (Merck Millipore Ltd.), and loaded onto a Superdex 200 16/600 column (Cytiva) pre-equilibrated in PBS. The purity and quality of the eluted protein were analyzed by SDS–PAGE and Tycho (NanoTemper), respectively.
In vitro RBD-62 inhibition of S–ACE2 binding
Inhibitors including RBD-62, WA1 RBD (VRC, NIH) and a truncated Plasmodium falciparum circumsporozoite protein, 5/3_SAmut (Robert Seder, VRC, NIH), were chosen for comparison owing to their similar molecular weights (~25–27 kDa). Both RBD-62 and WA1 RBD were biotinylated via an AviTag. The 5/3_SAmut 68 sequence is available on GenBank (ID: MT891178.1). All proteins were diluted to 5 μg/mL and then serially diluted fivefold. The ACE2-binding inhibition assay was performed with the V-Plex SARS-CoV-2 Panel 23 (ACE2) Kit (MSD) per the manufacturer's instructions. Plates were read on an MSD Sector S 600 instrument. All samples were run in duplicate and normalized to the average luminescent units measured for each variant without the addition of inhibitor, with the average inhibition at each dilution indicated by the icons. IC50 values (ng/mL) were calculated via the [Agonist] vs. normalized response—variable slope equation within the nonlinear regression analysis tool in Prism version 9.3.1. IC90 values (ng/mL) were calculated via the [Agonist] vs. response—Find ECanything equation within the nonlinear regression analysis tool in Prism, with bottom and top y-values constrained to 0 and 100, respectively. IC50 and IC90 values are not listed for the irrelevant malaria protein.
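For readers without access to Prism, an equivalent variable-slope (four-parameter logistic) fit can be reproduced in base R. The sketch below is only an illustration of the fitting step: the dose series, the noise, and the assumed true IC50 of 50 ng/mL are simulated values, not data from this study.

```r
# Minimal sketch of a four-parameter (variable-slope) logistic fit to
# normalized inhibition data, analogous to the Prism analysis described above.
# Doses and responses are simulated; the assumed IC50 of 50 ng/mL is arbitrary.
set.seed(1)
dose <- 5000 / 5^(0:7)                                  # ng/mL, fivefold dilutions from 5 ug/mL
resp <- 100 / (1 + (50 / dose)^1.2) + rnorm(8, sd = 2)  # percent inhibition with noise

fit <- nls(resp ~ bottom + (top - bottom) / (1 + (ic50 / dose)^hill),
           start = list(bottom = 0, top = 100, ic50 = 100, hill = 1))
coef(fit)["ic50"]                                       # estimated IC50 (ng/mL)

# Approximate IC90 assuming bottom and top fixed at 0 and 100 (cf. "Find
# ECanything"): solve 90 = 100 / (1 + (ic50 / d)^hill) for d.
p <- coef(fit)
unname(p["ic50"] * 9^(1 / p["hill"]))                   # estimated IC90 (ng/mL)
```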
In vitro RBD-62 inhibition of SARS-CoV-2 infection
Cells and viruses.
VeroE6-TMPRSS2 cells (VRC, NIH) were generated and cultured as previously described 69. The cell line was authenticated by confirming TMPRSS2 expression by flow cytometry with an anti-TMPRSS2 antibody. nCoV/USA_WA1/2020 (WA1), closely resembling the original Wuhan strain, was propagated from an infectious SARS-CoV-2 clone as previously described 70. Omicron BA.1 (EPI_ISL_7171744) was isolated as previously described 3. The Omicron BA.5 isolate (EPI_ISL_13512579) was provided by Dr. Richard Webby (St. Jude Children's Research Hospital), Omicron XBB.1.5 (EPI_ISL_16026423) was provided by Dr. Andrew Pekosz (Johns Hopkins Bloomberg School of Public Health) and Omicron JN.1 (EPI_ISL_18403077) was provided by Dr. Benjamin Pinsky (Stanford University). All variants were plaque purified and propagated once in VeroE6-TMPRSS2 cells to generate working stocks. Virus stocks were then deep sequenced and confirmed as previously described 71.
Inhibition assay
To test the antiviral activity of the RBD proteins, VeroE6-TMPRSS2 cells were seeded in a 96-well plate 1 day before infection. The inhibitors were serially diluted, added to the cells and incubated for 1 h at 37 °C. Inhibitors included RBD-62, WA1 RBD (VRC, NIH) and a resurfaced HIV-1 Env core (RSC3KO) (VRC, NIH). Both RBD-62 and WA1 RBD were biotinylated via an AviTag. After 1 h of incubation, cells were infected with the various SARS-CoV-2 variants in a BSL-3 laboratory and incubated at 37 °C for an additional hour. Post-incubation, the mixture was removed from the cells, and 100 μL of prewarmed 0.85% methylcellulose overlay was added to each well. Plates were incubated at 37 °C for 18–40 h (depending on the variant). After the appropriate incubation time, the methylcellulose overlay was removed, and cells were washed with PBS and fixed with 2% paraformaldehyde for 30 min. Following fixation, cells were washed twice with PBS and permeabilized with permeabilization buffer for at least 20 min. After permeabilization, cells were incubated with an anti-SARS-CoV-2 spike primary antibody directly conjugated to Alexa Fluor-647 (clone CR3022-AF647, Cell Signaling #37475 at a dilution of 1:5000) overnight at 4 °C. Cells were then washed twice with 1× PBS and imaged on an ELISPOT reader (CTL Analyzer). The number of foci for each sample was counted using the Viridot program 72. Cell viability was determined for compound-treated or mock-treated cells using CellTiter-Glo (Promega), which measures cellular ATP content. All experiments were conducted in quadruplicate, and all values were normalized to mock-treated cells for analysis. Normalized values were used to fit a 4-parameter equation to semi-log plots of the concentration–response data using GraphPad Prism version 10.2.0. IC50 and IC90 values are not listed for the irrelevant HIV-1 protein.
RBD-62 administration
RBD-62 was provided by Gideon Schreiber (Weizmann Institute of Science) and formulated with gelatin as a delivery vehicle. Gelatin (Sigma-Aldrich, G1890) was prepared at a concentration of 4 mg/mL in Dulbecco's PBS (Gibco), and RBD-62 or PBS control was then mixed with the gelatin at a 1:1 ratio. Each animal was administered 2.5 mg RBD-62 or PBS control, at a final gelatin concentration of 2 mg/mL and in a total volume of 4.6 mL, via a pediatric mask attached to a PARI eFlow nebulizer (PARI GmbH) that delivered 4 μm particles deep into the lungs of anesthetized macaques, as previously described 73.
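The formulation arithmetic implied above works out as in the short sketch below (the RBD-62 stock concentration is not stated in the text and is handled here only through the stated dose and delivered volume):

```r
# Formulation check: 4 mg/mL gelatin mixed 1:1 with RBD-62 (or PBS) gives a
# final gelatin concentration of 2 mg/mL; each animal receives 2.5 mg RBD-62
# in a total nebulized volume of 4.6 mL.
gelatin_stock <- 4.0                 # mg/mL
gelatin_final <- gelatin_stock / 2   # 2 mg/mL after 1:1 mixing
dose_rbd62    <- 2.5                 # mg per animal
total_volume  <- 4.6                 # mL per animal

rbd62_final <- dose_rbd62 / total_volume   # ~0.54 mg/mL RBD-62 in the delivered mix
c(gelatin_final_mg_per_mL = gelatin_final,
  rbd62_final_mg_per_mL   = round(rbd62_final, 2))
```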
Subgenomic RNA quantification
sgRNA was isolated and quantified by researchers blinded to treatment group, as previously described 61. Briefly, total RNA was extracted from BAL fluid and NS using the RNAzol BD column kit (Molecular Research Center). PCR reactions were conducted with TaqMan Fast Virus 1-Step Master Mix (Applied Biosystems) using a forward primer in the 5′ leader region and an N gene-specific probe and reverse primer, as previously described 16:
sgLeadSARSCoV2_F: 5′-CGATCTCTTGTAGATCTGTTCTC-3′
N2_P: 5′-FAM-CGATCAAAACAACGTCGGCCCC-BHQ1-3′
wtN_R: 5′-GGTGAACCAAGACGCAGTAT-3′
Amplifications were performed with a QuantStudio 6 Pro Real-Time PCR System (Applied Biosystems). The lower limit of detection (LOD) of the assay was 50 copies/reaction.
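As a rough illustration of how Ct values from such a reaction are converted to copy numbers and censored at the assay LOD, the sketch below uses a hypothetical standard curve; the slope, intercept and example Ct values are placeholders, not values from this study.

```r
# Hypothetical standard curve relating Ct to sgRNA copies per reaction:
# Ct = intercept + slope * log10(copies). A slope of -3.32 corresponds to
# ~100% PCR efficiency; both parameters are placeholders.
intercept <- 40.0
slope     <- -3.32
lod       <- 50                                # lower LOD, copies/reaction

ct_to_copies <- function(ct) 10^((ct - intercept) / slope)

ct     <- c(24.8, 31.2, 36.5)                  # example Ct values
copies <- ct_to_copies(ct)

# Values below the LOD are set to half the LOD (25 copies) for downstream
# analysis, matching the convention described in the statistics section.
ifelse(copies < lod, lod / 2, copies)
```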
TCID50 quantification of SARS-CoV-2 from BAL and NS
The TCID50 assay was conducted as described previously 61. Briefly, Vero-TMPRSS2 cells (VRC, NIH) were plated at 25,000 cells/well in Dulbecco's Modified Eagle Medium (DMEM) + 10% FBS + gentamicin, and the cultures were incubated at 37 °C, 5.0% CO2. Cells reached 80–100% confluence the following day. The medium was aspirated and replaced with 180 μL of DMEM + 2% FBS + gentamicin. Twenty microliters of BAL or NS sample was added to the top row in quadruplicate and mixed five times with a P200 pipettor. Using the pipettor, 20 μL was transferred to the next row, and this was repeated down the plate (rows A–H), representing tenfold dilutions. Tips were changed for each row, and the process was repeated to the last row. Positive (virus stock of known infectious titer in the assay) and negative (medium only) control wells were included in each assay set-up. The plates were incubated at 37 °C, 5.0% CO2 for 4 days. The cell monolayers were then visually inspected for cytopathic effect. The TCID50 value was calculated using the Reed–Muench formula.
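For illustration, the Reed–Muench interpolation can be carried out as in the R sketch below. The well scores are invented, and the code assumes quadruplicate wells and tenfold dilutions as described above; the final per-mL conversion is one common convention and depends on the plating scheme.

```r
# Reed-Muench interpolation of the 50% endpoint from quadruplicate wells across
# tenfold dilutions (rows A-H). The well scores below are invented.
log10_dil <- -(1:8)                              # 10^-1 ... 10^-8
positive  <- c(4, 4, 4, 3, 1, 0, 0, 0)           # wells with cytopathic effect
n_wells   <- 4

cum_inf   <- rev(cumsum(rev(positive)))          # accumulate infected toward low dilutions
cum_uninf <- cumsum(n_wells - positive)          # accumulate uninfected toward high dilutions
pct_inf   <- 100 * cum_inf / (cum_inf + cum_uninf)

above <- max(which(pct_inf > 50))                # last dilution with >50% infected
pd    <- (pct_inf[above] - 50) / (pct_inf[above] - pct_inf[above + 1])

log10_endpoint <- log10_dil[above] - pd          # log10 of the 50% endpoint dilution

# One common conversion to a per-mL titer, assuming a 20 uL sample input;
# the exact volume correction depends on the plating scheme.
10^(-log10_endpoint) / 0.02                      # TCID50 per mL
```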
Serum and mucosal antibody titers
Quantification of antibodies in the blood and mucosa was performed as previously described 74. Briefly, total IgG and IgA antigen-specific antibodies to SARS-CoV-2-derived antigens were determined in a multiplex serology assay by Meso Scale Discovery (MSD). We measured responses using the V-Plex SARS-CoV-2 Panel 22 for variant RBD and Panel 1 for WT proteins and protein domains according to the manufacturer's instructions, except that 25 μL of sample and detection antibody were used per well. BAL and NW were initially concentrated tenfold using Amicon Ultra centrifugal filters, 10 kDa MWCO (Millipore). For measurement of antibody titers to variant RBD, concentrated BAL and NW were initially diluted 1:5 and then serially diluted 1:5; heat-inactivated plasma was initially diluted 1:100 and then serially diluted 1:4. Data are presented as area under the curve (AUC). For measurement of IgG antibody titers to SARS-CoV-2 protein domains, concentrated BAL and NW were diluted at 1:5, 1:10 and 1:20 ratios; heat-inactivated plasma was diluted at 1:25, 1:50 and 1:100 ratios. For measurement of IgA titers, concentrated BAL and NW were initially diluted 1:5 and then serially diluted 1:5. Results were reported as BAU/mL based on the reference standard included in the Panel 1 kit according to the manufacturer's instructions.
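As an illustration of the AUC summary used for the titration series, the sketch below computes a trapezoidal AUC over log10 dilution. The dilution series mirrors the plasma scheme above (1:100 start, fourfold steps), but the signal values and the choice to integrate the log-transformed signal are illustrative assumptions, not details taken from the study.

```r
# Trapezoidal AUC of a titration curve over log10 dilution. The MSD signal
# values and the log transformation of the signal are illustrative choices.
dilution <- 100 * 4^(0:7)                        # 1:100 start, serial 1:4
signal   <- c(3.2e6, 2.9e6, 1.8e6, 7.5e5, 2.1e5, 5.4e4, 1.6e4, 9.0e3)

x <- log10(dilution)
y <- log10(signal)

sum((head(y, -1) + tail(y, -1)) / 2 * diff(x))   # AUC by the trapezoid rule
```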
RBD–ACE2 binding inhibition
BAL fluid and NW were concentrated tenfold with Amicon Ultra centrifugal filters, 10 kDa MWCO (Millipore). To remove residual RBD-62 prior to the binding inhibition assay, the fluid was diluted 1:1 in 50 mM sodium phosphate, 300 mM sodium chloride (binding buffer). A HisPur Ni-NTA spin plate (Thermo Scientific) was equilibrated with binding buffer, and the diluted fluid was applied to the plate and incubated with agitation at 4 °C overnight. The purified fluid was collected after centrifugation for 1 min at 1000 × g and then dialyzed against Diluent 100 using Pierce Microdialysis Plates (Thermo Scientific). Purified fluid was diluted to a final ratio of 1:5. The ACE2-binding inhibition assay was performed with the V-Plex SARS-CoV-2 Panel 22 (ACE2) Kit (MSD) per the manufacturer's instructions. Plates were read on an MSD Sector S 600 instrument. Results are reported as percent inhibition.
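The percent-inhibition readout reduces to a normalization against a no-inhibitor control on each plate. A minimal sketch of that generic calculation follows; the luminescence values are invented, and the exact kit-specific formula follows the manufacturer's instructions.

```r
# Percent ACE2-binding inhibition relative to a no-inhibitor control;
# the luminescence values are invented for illustration.
signal_no_inhibitor <- 1.8e6                     # mean ECL signal, diluent-only wells
signal_sample       <- c(2.1e5, 9.6e5, 1.7e6)    # purified, diluted BAL/NW samples

round(100 * (1 - signal_sample / signal_no_inhibitor), 1)
```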
Intracellular cytokine staining (ICS)
Cryopreserved PBMC and BAL cells were thawed and rested overnight in a 37 °C, 5% CO 2 incubator. The next morning, cells were stimulated with SARS-CoV-2 spike protein (S1 and S2, matched to ancestral COVID-19 mRNA vaccine insert) and nucleocapsid (N) peptide pools (JPT Peptides) at a final concentration of 2 μg/ml in the presence of 3 mM monensin for 6 h. The S1, S2 and N peptide pools are comprised of 158, 157 and 102 individual peptides, respectively, as 15mers overlapping by 11 aa in 100% DMSO. Negative controls received an equal concentration of DMSO instead of peptides (final concentration of 0.5%). ICS was performed as previously described 35 , 75 . The following monoclonal antibodies were used: (1) CD3 APC-CY7, clone SP34.2, BD Biosciences #557757—Lot #0223215 at a dilution of 1:640; (2) CD4 PE-CY5.5, clone S3.5, Invitrogen #MHCD0418—Lot #2303833 at 1:80; (3) CD8 BV570, clone RPA-T8, Biolegend #301038—Lot #B333843 at 1:80; (4) CD45RA PE-CY5, clone 5H9, BD Biosciences #552888—Lot #8110737 at 1:2500; (5) CCR7 BV650, clone G043H7, Biolegend #353234—Lot #B325079 at 1:10; (6) CXCR5 PE, clone MU5UBEE, Thermo Fisher #12-9185-42—Lot #2279157 at 1:10; (7) CXCR3 BV711, clone 1C6/CXCR3, BD Biosciences #563156—Lot #0309602 at 1:20; (8) PD-1 BUV737, clone EH12.1, BD Horizon #612792—Lot #0303349 at 1:40; (9) ICOS PE-CY7, clone C398.4A, Biolegend #313520—Lot #B213626 at 1:640; (10) CD69 ECD, clone TP1.55.3, Beckman Coulter #6607110—Lot #7620090 at 1:40; (11) IFN-g Alexa 700, clone B27, Biolegend #506516—Lot #B320892 at 1:640; (12) IL-2 BV750, clone MQ1-17H12, BD Biosciences #566361—Lot #7108833 at 1:40; (13) IL-4 BB700, clone MP4-25D2, BD Biosciences custom order—Lot #1042139 at 1:20; (14) TNF FITC, clone Mab11, BD Biosciences #554512—Lot #0015360 at 1:80; (15) IL-13 BV421, clone JES10-5A2, BD Biosciences #563580—Lot #0286560 at 1:20; (16) IL-17A BV605, clone BL168, Biolegend #512326—Lot #B319897 at 1:40; (17) IL-21 Alexa 647, clone 3A3-N2.1, BD Biosciences #560493—Lot #1005849 at 1:10; and (18) CD154 BV785, clone 24-31, Biolegend #310842—Lot #B329207 at 1:20. Aqua Live/Dead Fixable Dead Cell Stain Kit (Invitrogen #L34957—Lot #2204200 at 1:800) was used to exclude dead cells. All antibodies were previously titrated to determine the optimal concentration. Samples were acquired on a BD FACSymphony flow cytometer and analyzed using FlowJo version 10.8.2 (Treestar, Inc., Ashland, OR).
B-cell probe binding
Flow cytometric analysis of antigen-specific memory B-cell frequencies was performed as previously described 35 . Briefly, cryopreserved PBMC were thawed and stained with the following antibodies (monoclonal unless indicated): (1) IgD FITC, goat polyclonal antibody, Southern Biotech #2030-02—Lot #A2118-WF09C at a dilution of 1:40; (2) IgM PerCP-Cy5.5, clone G20-127, BD Biosciences #561285—Lot #0307134 at 1:40; (3) IgA Dy405, goat polyclonal antibody, Jackson ImmunoResearch #109-475-011—Lot #155196 at 1:40; (4) CD20 BV570, clone 2H7, Biolegend #302332—Lot #B301458 at 1:40; (5) CD27 BV650, clone O323, Biolegend #302828—Lot #B273921 at 1:20; (6) CD14 BV785, clone M5E2, Biolegend #301840—Lot #B327948 at 1:80; (7) CD16 BUV496, clone 3G8, BD Biosciences #564653—Lot #0288806 at 1:40; (8) CD4 BUV737, clone SK3, BD Biosciences #564305—Lot #0282762 at 1:40; (9) CD8 BUV395, clone RPA-T8, BD Biosciences #563795—Lot #9346411 at 1:80; (10) CD19 APC, clone J3-119, Beckman Coulter #IM2470U—Lot #200092 at 1:20; (11) IgG Alexa 700, clone G18-145, BD Biosciences #561296—Lot #0135021 at 1:20; (12) CD3 APC-Cy7, clone SP34.2, BD Biosciences #557757—Lot #0223215 at 1:40; (13) CD38 PE, clone OKT10, Caprico Biotech #100826—Lot #8AE4 at 1:640; (14) CD21 PE-Cy5, clone B-ly4, BD Biosciences #551064—Lot #0072939 at 1:20; and (15) CXCR5 PE-Cy7, clone MU5UBEE, Thermo Fisher #25-9185-42—Lot #2312036 at 1:40. Stained cells were then incubated with streptavidin-BV605 (BD Biosciences) labeled Delta S-2P, BA1 S-2P or RBD-62 and streptavidin-BUV661 (BD Biosciences) labeled WA1 or Delta S-2P for 30 min at 4 °C (protected from light). Cells were washed and fixed in 0.5% formaldehyde (Tousimis Research Corp) prior to data acquisition. Aqua Live/Dead Fixable Dead Cell Stain Kit (Invitrogen #L34957—Lot #2098529 at 1:800) was used to exclude dead cells. All antibodies were previously titrated to determine the optimal concentration. Samples were acquired on a BD FACSymphony cytometer and analyzed using FlowJo version 10.7.2 (BD, Ashland, OR).
Quantification and statistical analysis
Comparisons between animals that received RBD-62 and controls for virus titers and humoral responses post-challenge are based on Wilcoxon tests on individual days, while longitudinal analyses are based on generalized estimating equations (GEE). We adjusted for multiple comparisons across timepoints for each assay using Holm's adjustment; we did not adjust for multiple comparisons across different assays or different target antigens. P values are shown in the figures, and the relevant statistical analyses and sample sizes (n) are listed in the corresponding figure legends. NS denotes that the indicated comparison was not significant, with two-sided P > 0.05.
Virus titers were analyzed on the log10 scale; humoral responses were analyzed as the AUC on the log10 scale. For statistical analysis, virus titer values below the limit of detection or the lower limit of quantification were set to half of that value (25 copies sgRNA or 1.35 TCID50 per mL or per swab). Antibody binding and virus assays were log-transformed as appropriate. All statistical analyses were done using R version 4.2.1. All flow cytometry data were graphed in FlowJo version 10.7.2 (B-cell binding) or version 10.8.2 (ICS), while all other graphs were generated using Prism version 9.3.1 or 10.2.0.
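A minimal R sketch of the per-timepoint comparisons and the longitudinal model described above is given below. The data frame, column names, simulated effect sizes and the exchangeable working correlation are assumptions made for illustration only; this is not the study's analysis code.

```r
# Sketch of the statistical workflow: Wilcoxon tests per timepoint with Holm
# adjustment, and a GEE for the longitudinal comparison (geepack::geeglm).
library(geepack)

set.seed(2)
df <- expand.grid(animal = factor(1:16), day = c(2, 4, 7))
df <- df[order(df$animal, df$day), ]                       # cluster rows must be contiguous
df$group <- factor(ifelse(as.integer(df$animal) <= 8, "RBD-62", "Control"))
df$log10_sgRNA <- 3 + (df$group == "Control") * 1.5 - 0.2 * df$day +
  rnorm(nrow(df), sd = 0.5)
df$log10_sgRNA <- pmax(df$log10_sgRNA, log10(25))          # censor at half the LOD

# Wilcoxon test on each day, then Holm adjustment across timepoints
p_raw <- sapply(split(df, df$day), function(d)
  wilcox.test(log10_sgRNA ~ group, data = d)$p.value)
p.adjust(p_raw, method = "holm")

# Longitudinal comparison with a GEE, with animal as the clustering unit
fit <- geeglm(log10_sgRNA ~ group * day, id = animal, data = df,
              family = gaussian, corstr = "exchangeable")
summary(fit)
```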
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
All data are available in the main text, the Supplementary information or the Source data provided with this paper.
Planas, D. et al. Reduced sensitivity of SARS-CoV-2 variant Delta to antibody neutralization. Nature 596 , 276–280 (2021).
Wang, R. et al. Analysis of SARS-CoV-2 variant mutations reveals neutralization escape mechanisms and the ability to use ACE2 receptors from additional species. Immunity 54 , 1611–1621 e1615 (2021).
Edara, V. V. et al. mRNA-1273 and BNT162b2 mRNA vaccines have reduced neutralizing activity against the SARS-CoV-2 omicron variant. Cell Rep. Med. 3 , 100529 (2022).
Hoffmann, M. et al. The Omicron variant is highly resistant against antibody-mediated neutralization: implications for control of the COVID-19 pandemic. Cell 185 , 447–456 e411 (2022).
Muik, A. et al. Neutralization of SARS-CoV-2 Omicron by BNT162b2 mRNA vaccine-elicited human sera. Science 375 , eabn7591 (2022).
Schmidt, F. et al. Plasma neutralization of the SARS-CoV-2 Omicron variant. N. Engl. J. Med. 386 , 599–601 (2022).
Davis-Gardner, M. E. et al. Neutralization against BA.2.75.2, BQ.1.1, and XBB from mRNA Bivalent Booster. N. Engl. J. Med. 388 , 183–185 (2023).
Chemaitelly, H. et al. Duration of mRNA vaccine protection against SARS-CoV-2 Omicron BA.1 and BA.2 subvariants in Qatar. Nat. Commun. 13 , 3082 (2022).
Collie, S., Champion, J., Moultrie, H., Bekker, L. G. & Gray, G. Effectiveness of BNT162b2 vaccine against Omicron variant in South Africa. N. Engl. J. Med. 386 , 494–496 (2022).
Baden, L. R. et al. Efficacy and safety of the mRNA-1273 SARS-CoV-2 vaccine. N. Engl. J. Med. 384 , 403–416 (2021).
Polack, F. P. et al. Safety and efficacy of the BNT162b2 mRNA Covid-19 vaccine. N. Engl. J. Med. 383 , 2603–2615 (2020).
Lauring, A. S. et al. Clinical severity of, and effectiveness of mRNA vaccines against, covid-19 from Omicron, Delta, and Alpha SARS-CoV-2 variants in the United States: prospective observational study. BMJ 376 , e069761 (2022).
Regev-Yochay, G. et al. Efficacy of a fourth dose of Covid-19 mRNA vaccine against Omicron. N. Engl. J. Med. 386 , 1377–1380 (2022).
Cohen, M. J. et al. Association of receiving a fourth dose of the BNT162b vaccine with SARS-CoV-2 infection among health care workers in Israel. JAMA Netw. Open 5 , e2224657 (2022).
Bar-On, Y. M. et al. Protection by a fourth dose of BNT162b2 against Omicron in Israel. N. Engl. J. Med. 386 , 1712–1720 (2022).
Gagne, M. et al. mRNA-1273 or mRNA-Omicron boost in vaccinated macaques elicits similar B cell expansion, neutralizing responses, and protection from Omicron. Cell 185 , 1556–1571 e1518 (2022).
Park, Y. J. et al. Imprinted antibody responses against SARS-CoV-2 Omicron sublineages. Science 378 , 619–627 (2022).
Hawman, D. W. et al. Replicating RNA platform enables rapid response to the SARS-CoV-2 Omicron variant and elicits enhanced protection in naive hamsters compared to ancestral vaccine. EBioMedicine 83 , 104196 (2022).
Quandt, J. et al. Omicron BA.1 breakthrough infection drives cross-variant neutralization and memory B cell formation against conserved epitopes. Sci. Immunol. 7 , eabq2427 (2022).
Reynolds, C. J. et al. Immune boosting by B.1.1.529 (Omicron) depends on previous SARS-CoV-2 exposure. Science 377 , eabq1841 (2022).
Roltgen, K. et al. Immune imprinting, breadth of variant recognition, and germinal center response in human SARS-CoV-2 infection and vaccination. Cell 185 , 1025–1040 e1014 (2022).
Jayk Bernal, A. et al. Molnupiravir for oral treatment of Covid-19 in nonhospitalized patients. N. Engl. J. Med. 386 , 509–520 (2022).
Hammond, J. et al. Oral nirmatrelvir for high-risk, nonhospitalized adults with Covid-19. N. Engl. J. Med. 386 , 1397–1408 (2022).
Park, J. J., Lee, J., Seo, Y. B. & Na, S. H. Nirmatrelvir/ritonavir prescription rate and outcomes in coronavirus disease 2019: a single center study. Infect. Chemother. 54 , 757–764 (2022).
Shah, M. M. et al. Paxlovid associated with decreased hospitalization rate among adults with COVID-19—United States, April–September 2022. MMWR Morb. Mortal. Wkly. Rep. 71 , 1531–1537 (2022).
Zhou, Y. et al. Nirmatrelvir-resistant SARS-CoV-2 variants with high fitness in an infectious cell culture system. Sci. Adv. 8 , eadd7197 (2022).
Iketani, S. et al. Multiple pathways for SARS-CoV-2 resistance to nirmatrelvir. Nature 613 , 558–564 (2022).
Padhi, A. K. & Tripathi, T. Hotspot residues and resistance mutations in the nirmatrelvir-binding site of SARS-CoV-2 main protease: design, identification, and correlation with globally circulating viral genomes. Biochem. Biophys. Res. Commun. 629 , 54–60 (2022).
Hu, Y. et al. Naturally occurring mutations of SARS-CoV-2 main protease confer drug resistance to nirmatrelvir. ACS Cent. Sci. 9 , 1658–1669 (2023).
Westberg, M. et al. An orally bioavailable SARS-CoV-2 main protease inhibitor exhibits improved affinity and reduced sensitivity to mutations. Sci. Transl. Med. 16 , eadi0979 (2024).
Lee, D. et al. Bioengineered amyloid peptide for rapid screening of inhibitors against main protease of SARS-CoV-2. Nat. Commun. 15 , 2108 (2024).
Zahradnik, J. et al. SARS-CoV-2 variant prediction and antiviral drug design are enabled by RBD in vitro evolution. Nat. Microbiol. 6 , 1188–1198 (2021).
Li, C. et al. Gelatin stabilizes nebulized proteins in pulmonary drug delivery against COVID-19. ACS Biomater. Sci. Eng. 8 , 2553–2563 (2022).
Corbett, K. S. et al. Protection against SARS-CoV-2 Beta variant in mRNA-1273 vaccine-boosted nonhuman primates. Science 374 , 1343–1353 (2021).
Gagne, M. et al. Protection from SARS-CoV-2 Delta one year after mRNA-1273 vaccination in rhesus macaques coincides with anamnestic antibody response in the lung. Cell 185 , 113–130.e115 (2022).
Choi, S. J. et al. T cell epitopes in SARS-CoV-2 proteins are substantially conserved in the Omicron variant. Cell. Mol. Immunol. 19 , 447–448 (2022).
Akkaya, M., Kwak, K. & Pierce, S. K. B cell memory: building two walls of protection against pathogens. Nat. Rev. Immunol. 20 , 229–238 (2020).
Sette, A. & Crotty, S. Immunological memory to SARS-CoV-2 infection and COVID-19 vaccines. Immunol. Rev. 310 , 27–46 (2022).
Palm, A. E. & Henry, C. Remembrance of things past: long-term B cell memory after infection and vaccination. Front. Immunol. 10 , 1787 (2019).
Goel, R. R. et al. Distinct antibody and memory B cell responses in SARS-CoV-2 naive and recovered individuals following mRNA vaccination. Sci. Immunol. 6 , eabi6950 (2021).
Barton, M. I. et al. Effects of common mutations in the SARS-CoV-2 Spike RBD and its ligand, the human ACE2 receptor on binding affinity and kinetics. Elife 10 , e70658 (2021).
Han, P. et al. Receptor binding and complex structures of human ACE2 to spike RBD from omicron and delta SARS-CoV-2. Cell 185 , 630–640.e610 (2022).
Uriu, K. et al. Enhanced transmissibility, infectivity, and immune resistance of the SARS-CoV-2 omicron XBB.1.5 variant. Lancet Infect. Dis. 23 , 280–281 (2023).
Dejnirattisai, W. et al. SARS-CoV-2 Omicron-B.1.1.529 leads to widespread escape from neutralizing antibody responses. Cell 185 , 467–484.e415 (2022).
Chaouat, A. E. et al. Anti-human ACE2 antibody neutralizes and inhibits virus production of SARS-CoV-2 variants of concern. iScience 25 , 104935 (2022).
Zhang, F. et al. Pan-sarbecovirus prophylaxis with human anti-ACE2 monoclonal antibodies. Nat. Microbiol. 8 , 1051–1063 (2023).
Chan, K. K. et al. Engineering human ACE2 to optimize binding to the spike protein of SARS coronavirus 2. Science 369 , 1261–1265 (2020).
Rao, L. et al. Decoy nanoparticles protect against COVID-19 by concurrently adsorbing viruses and inflammatory cytokines. Proc. Natl Acad. Sci. USA 117 , 27141–27147 (2020).
Huang, K. Y. et al. Humanized COVID-19 decoy antibody effectively blocks viral entry and prevents SARS-CoV-2 infection. EMBO Mol. Med. 13 , e12828 (2021).
Miller, A. et al. A super-potent tetramerized ACE2 protein displays enhanced neutralization of SARS-CoV-2 virus infection. Sci. Rep. 11 , 10617 (2021).
Ferrari, M. et al. Characterization of a novel ACE2-based therapeutic with enhanced rather than reduced activity against SARS-CoV-2 variants. J. Virol. 95 , e0068521 (2021).
Capraz, T. et al. Structure-guided glyco-engineering of ACE2 for improved potency as soluble SARS-CoV-2 decoy receptor. Elife 10 , e73641 (2021).
Ikemura, N. et al. An engineered ACE2 decoy neutralizes the SARS-CoV-2 Omicron variant and confers protection against infection in vivo. Sci. Transl. Med. 14 , eabn7737 (2022).
Kochl, K. et al. Optimizing variant-specific therapeutic SARS-CoV-2 decoys using deep-learning-guided molecular dynamics simulations. Sci. Rep. 13 , 774 (2023).
Tada, T., Dcosta, B. M., Zhou, H. & Landau, N. R. Prophylaxis and treatment of SARS-CoV-2 infection by an ACE2 receptor decoy in a preclinical animal model. iScience 26 , 106092 (2023).
Tanaka, S. et al. An ACE2 Triple Decoy that neutralizes SARS-CoV-2 shows enhanced affinity for virus variants. Sci. Rep. 11 , 12740 (2021).
Torchia, J. A. et al. Optimized ACE2 decoys neutralize antibody-resistant SARS-CoV-2 variants through functional receptor mimicry and treat infection in vivo. Sci. Adv. 8 , eabq6527 (2022).
Xiong, Q. et al. Close relatives of MERS-CoV in bats use ACE2 as their functional receptors. Nature 612 , 748–757 (2022).
Fumagalli, V. et al. Nirmatrelvir treatment of SARS-CoV-2-infected mice blunts antiviral adaptive immune responses. EMBO Mol. Med. 15 , e17580 (2023).
Corbett, K. S. et al. mRNA-1273 protects against SARS-CoV-2 beta infection in nonhuman primates. Nat. Immunol. 22 , 1306–1315 (2021).
Corbett, K. S. et al. Immune correlates of protection by mRNA-1273 vaccine against SARS-CoV-2 in nonhuman primates. Science 373 , eabj0299 (2021).
Touret, F. et al. Enhanced neutralization escape to therapeutic monoclonal antibodies by SARS-CoV-2 omicron sub-lineages. iScience 26 , 106413 (2023).
Planas, D. et al. Resistance of Omicron subvariants BA.2.75.2, BA.4.6, and BQ.1.1 to neutralizing antibodies. Nat. Commun. 14 , 824 (2023).
Iketani, S. et al. Antibody evasion properties of SARS-CoV-2 Omicron sublineages. Nature 604 , 553–556 (2022).
Arora, P. et al. Omicron sublineage BQ.1.1 resistance to monoclonal antibodies. Lancet Infect. Dis. 23 , 22–23 (2023).
Imai, M. et al. Efficacy of antiviral agents against Omicron subvariants BQ.1.1 and XBB. N. Engl. J. Med. 388 , 89–91 (2023).
Wang, L. T. et al. A potent anti-malarial human monoclonal antibody targets circumsporozoite protein minor repeats and neutralizes sporozoites in the liver. Immunity 53 , 733–744.e738 (2020).
Edara, V. V. et al. Infection- and vaccine-induced antibody binding and neutralization of the B.1.351 SARS-CoV-2 variant. Cell Host Microbe 29 , 516–521.e513 (2021).
Xie, X. et al. An infectious cDNA clone of SARS-CoV-2. Cell Host Microbe 27 , 841–848.e843 (2020).
Edara, V. V. et al. Infection and vaccine-induced neutralizing-antibody responses to the SARS-CoV-2 B.1.617 variants. N. Engl. J. Med. 385 , 664–666 (2021).
Katzelnick, L. C. et al. Viridot: an automated virus plaque (immunofocus) counter for the measurement of serological neutralizing responses with application to dengue virus. PLoS Negl. Trop. Dis. 12 , e0006862 (2018).
Song, K. et al. Genetic immunization in the lung induces potent local and systemic immune responses. Proc. Natl Acad. Sci. USA 107 , 22213–22218 (2010).
Corbett, K. S. et al. Evaluation of the mRNA-1273 vaccine against SARS-CoV-2 in nonhuman primates. N. Engl. J. Med. 383 , 1544–1555 (2020).
Donaldson, M. M., Kao, S. F. & Foulds, K. E. OMIP-052: an 18-color panel for measuring Th1, Th2, Th17, and Tfh responses in rhesus macaques. Cytom. A 95 , 261–263 (2019).
Acknowledgements
We would like to thank Ruth Woodward and the entire Translational Research Program, Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health for expert technical support regarding animal procedures. We would also like to acknowledge Matthew Burnett for assistance in figure design. Patricia Darrah provided guidance on the operation of the PARI nebulizer. Lawrence Wang provided the malaria circumsporozoite protein which was used as a negative control in our ACE2-binding inhibition assay. We would also like to thank Josue Marquez and Anna Mychalowych for sample processing and handling. Research was primarily funded by the Intramural Research Program of the Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services. Funding was also provided under 75N93021C00017 (NIAID Centers of Excellence for Influenza Research and Response, CEIRR) and NIH P51 OD011132 awarded to Emory University. This work was also supported in part by the Emory Executive Vice President for Health Affairs Synergy Fund Award, COVID-Catalyst-I3 Funds from the Woodruff Health Sciences Center and Emory School of Medicine, the Pediatric Research Alliance Center for Childhood Infections and Vaccines and Children’s Healthcare of Atlanta, and Woodruff Health Sciences Center 2020 COVID-19 CURE Award. G.S. received funding from both the Israel Science Foundation Grant (No. 3814/19) within the KillCorona-Curbing Coronavirus Research Program and the Ben B. and Joyce E. Eisenberg Foundation of the Weizmann Institute of Science. Y.R. received funding from the Anita James Rosen Foundation.
Open access funding provided by the National Institutes of Health.
Author information
Elizabeth McCarthy
Present address: Fred Hutch Cancer Center, Seattle, WA, USA
Authors and Affiliations
Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
Matthew Gagne, Barbara J. Flynn, Christopher Cole Honeycutt, Dillon R. Flebbe, Shayne F. Andrew, Samantha J. Provost, Lauren McCormick, Elizabeth McCarthy, John-Paul M. Todd, Saran Bao, I-Ting Teng, Kathryn E. Foulds, Peter D. Kwong, Mario Roederer, Robert A. Seder & Daniel C. Douek
Bioqual Inc., Rockville, MD, USA
Alex Van Ry, Laurent Pessaint, Alan Dodson, Anthony Cook, Mark G. Lewis & Hanne Andersen
Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
Shir Marciano, Jiří Zahradník & Gideon Schreiber
Department of Earth and Planetary Sciences, Weizmann Institute of Science, Rehovot, Israel
Yinon Rudich & Chunlin Li
Center for Childhood Infections and Vaccines, Children’s Healthcare of Atlanta, Division of Infectious Diseases, Department of Pediatrics, Emory University School of Medicine, Atlanta, GA, USA
Shilpi Jain, Bushra Wali & Mehul S. Suthar
Emory Vaccine Center, Emory University, Atlanta, GA, USA
Emory National Primate Research Center, Atlanta, GA, USA
Department of Microbiology and Immunology, Emory University, Atlanta, GA, USA
Mehul S. Suthar
Biostatistics Research Branch, Division of Clinical Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
Martha C. Nason
Contributions
M.G., J.-P.M.T., S.M., Y.R., C.L., J.Z., M.S.S., G.S., R.A.S. and D.C.D. designed the experiments and oversaw the study. M.G., B.J.F., C.C.H., D.R.F., S.F.A., S.J.P., L.M., A.V.R., S.J. and B.W. conducted experiments. I.-T.T., S.M., Y.R., C.L., J.Z. and G.S. developed critical reagents and performed research to optimize experimental conditions. E.M., J.-P.M.T., S.B., L.P. and A.C. performed research involving direct contact with primates. M.C.N. conducted statistical analysis. Experiments were overseen by M.G., S.F.A., J.-P.M.T., Y.R., L.P., A.D., M.G.L., H.A., M.S.S., K.E.F., P.D.K., M.R., G.S., R.A.S. and D.C.D. M.G. and D.C.D. wrote the original draft of the manuscript. All authors reviewed and edited the manuscript.
Corresponding authors
Correspondence to Robert A. Seder or Daniel C. Douek .
Ethics declarations
Competing interests.
The authors declare the following competing interests: M.S.S. serves on the scientific board of advisors for Moderna and Ocugen. D.C.D. is an inventor on US Patent Application No. 63/147,419 entitled “Antibodies Targeting the Spike Protein of Coronaviruses”. A.V.R., L.P., A.D., A.C., M.G.L. and H.A. are employees of Bioqual. The other authors declare no competing interests.
Peer review
Peer review information.
Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary information, a reporting summary and source data are available with this article.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .
About this article
Cite this article.
Gagne, M., Flynn, B.J., Honeycutt, C.C. et al. Variant-proof high affinity ACE2 antagonist limits SARS-CoV-2 replication in upper and lower airways. Nat Commun 15 , 6894 (2024). https://doi.org/10.1038/s41467-024-51046-w
Received : 21 November 2023
Accepted : 29 July 2024
Published : 12 August 2024
DOI : https://doi.org/10.1038/s41467-024-51046-w