- v.13(4); 2012 Apr
Replicates and repeats—what is the difference and is it significant?
David L. Vaux
1 The Walter and Eliza Hall Institute, and the Department of Experimental Biology, University of Melbourne, Melbourne, Australia.
Fiona Fidler
2 La Trobe University School of Psychological Science, Melbourne, Australia.
Geoff Cumming
Science is knowledge gained through repeated experiment or observation. To be convincing, a scientific paper needs to provide evidence that the results are reproducible. This evidence might come from repeating the whole experiment independently several times, or from performing the experiment in such a way that independent data are obtained and a formal procedure of statistical inference can be applied—usually confidence intervals (CIs) or statistical significance testing. Over the past few years, many journals have strengthened their guidelines to authors and their editorial practices to ensure that error bars are described in figure legends—if error bars appear in the figures—and to set standards for the use of image-processing software. This has helped to improve the quality of images and reduce the number of papers with figures that show error bars but do not describe them. However, problems remain with how replicate and independently repeated data are described and interpreted. As biological experiments can be complicated, replicate measurements are often taken to monitor the performance of the experiment, but such replicates are not independent tests of the hypothesis, and so they cannot provide evidence of the reproducibility of the main results. In this article, we put forward our view to explain why data from replicates cannot be used to draw inferences about the validity of a hypothesis, and therefore should not be used to calculate CIs or P values, and should not be shown in figures.
Let us suppose we are testing the hypothesis that the protein Biddelonin (BDL), encoded by the Bdl gene, is required for bone marrow colonies to grow in response to the cytokine HH-CSF. Luckily, we have wild-type (WT) and homozygous Bdl gene-deleted mice at our disposal, and a vial of recombinant HH-CSF. We prepare suspensions of bone marrow cells from a single WT and a single Bdl −/− mouse (same-sex littermates from a Bdl +/− heterozygous cross) and count the cell suspensions by using a haemocytometer, adjusting them so that there are 1 × 10⁵ cells per millilitre in the final solution of soft agar growth medium. We add 1 ml aliquots of the suspension to sets of ten 35 × 10 mm Petri dishes that each contain 10 μl of either saline or purified recombinant mouse HH-CSF.
We therefore put in the incubator four sets of ten soft agar cultures: one set of ten plates has WT bone marrow cells with saline; the second has Bdl −/− cells with saline; the third has WT cells with HH-CSF, and the fourth has Bdl −/− cells with HH-CSF. After a week, we remove the plates from the incubator and count the number of colonies (groups of >50 cells) in each plate by using a dissecting microscope. The number of colonies counted is shown in Table 1 .
Condition | Plate 1 | Plate 2 | Plate 3 | Plate 4 | Plate 5 | Plate 6 | Plate 7 | Plate 8 | Plate 9 | Plate 10
---|---|---|---|---|---|---|---|---|---|---
WT + saline | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0
Bdl −/− + saline | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 2
WT + HH-CSF | 61 | 59 | 55 | 64 | 57 | 69 | 63 | 51 | 61 | 61
Bdl −/− + HH-CSF | 48 | 34 | 50 | 59 | 37 | 46 | 44 | 39 | 51 | 47
1 × 10⁵ WT or Bdl −/− bone marrow cells were plated in 1 ml soft agar cultures in the presence or absence of 1 μM HH-CSF. Colonies per plate were counted after 1 week. WT, wild type.
We could plot the counts of the plates on a graph. If we plotted just the colony counts of only one plate of each type ( Fig 1A shows the data for plate 1), it seems clear that HH-CSF is necessary for many colonies to form, but it is not immediately apparent whether the response of the Bdl −/− cells is significantly different to that of the WT cells. Furthermore, the graph does not look ‘sciency’ enough; there are no error bars or P -values. Besides, by showing the data for only one plate we are breaking the fundamental rule of science that all relevant data should be reported and subjected to analysis, unless good reasons can be given why some data should be omitted.
Displaying data from replicates—what not to do. ( A ) Data for plate 1 only (shown in Table 1 ). ( B ) Means ± SE for replicate plates 1–3 (in Table 1 ), * P > 0.05. ( C ) Means ± SE for replicate plates 1–10 (in Table 1 ), * P < 0.0001. ( D ) Means ± SE for HH-CSF-treated replicate plates 1–10 (in Table 1 ). Statistics should not be shown for replicates because they merely indicate the fidelity with which the replicates were made, and have no bearing on the hypothesis being tested. In each of these figures, n = 1 and the size of the error bars in ( B ), ( C ) and ( D ) reflect sampling variation of the replicates. The SDs of the replicates would be expected to be roughly the square root of the mean number of colonies. Also, axes should commence at 0, other than in exceptional circumstances, such as for log scales. SD, standard deviation; SE, standard error.
To make it look better, we could add the mean numbers of colonies in the first three plates of each type to the graph ( Fig 1B ), with error bars that report the standard error (SE) of the three values of each type. Now it is looking more like a figure in a high-profile journal, but when we use the data from the three replicate plates of each type to assess the statistical significance of the difference in the responses of the WT and Bdl −/− cells to HH-CSF, we find P > 0.05, indicating they are not significantly different.
As we have another seven plates from each group, we can plot the means and SEs of all ten plates and re-calculate P ( Fig 1C ). Now we are delighted to find that there is a highly significant difference between the Bdl −/− and WT cells, with P < 0.0001.
However, although the differences are highly statistically significant, the heights of the columns are not dramatically different, and it is hard to see the error bars. To remedy this, we could simply start the y -axis at 40 rather than zero ( Fig 1D ), to emphasize the differences in the response to HH-CSF. Although this necessitates removing the saline controls, these are not as important as visual impact for high-profile journals.
With a small amount of effort, and no additional experiments, we have transformed an unimpressive result ( Fig 1A,B ) into one that gives strong support to our hypothesis that BDL is required for a response to HH-CSF, with a highly significant P -value, and a figure ( Fig 1D ) that looks like it could belong in one of the top journals.
So, what is wrong? The first problem is that our data do not confirm the hypothesis that BDL is required for bone marrow colonies to grow in response to HH-CSF, they actually refute it. Clearly, bone marrow colonies are growing in the absence of BDL, even if the number is not as great as when the Bdl genes are intact. Terms such as ‘required’, ‘essential’ and ‘obligatory’ are not relative, yet are still often incorrectly used when partial effects are seen. At the very least, we should reformulate our hypothesis, perhaps to “BDL is needed for a full response of bone marrow colony-forming cells to the cytokine HH-CSF”.
The second major problem is that the calculations of P and statistical significance are based on the SE of replicates, but the ten replicates in any of the four conditions were each made from a single suspension of bone marrow cells from just one mouse. As such, we can at best infer a statistically significant difference between the concentration of colony-forming cells in the bone marrow cell suspension from that particular WT mouse and the bone marrow suspension from that particular gene-deleted mouse. We have made just one comparison, so n = 1, no matter how many replicate plates we count. To make an inference that can be generalized to all WT mice and Bdl −/− mice, we need to repeat our experiments a number of times, making several independent comparisons using several mice of each type.
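The effect of replicate number on the test statistic can be reproduced from the Table 1 counts. The sketch below is an illustration rather than the authors' own calculation: it assumes an ordinary two-sample t statistic with pooled variance, and the function names are hypothetical. It uses only the Python standard library, so it reports the t statistic rather than a P value.

```python
from math import sqrt
from statistics import mean, stdev

# Colony counts for the HH-CSF-treated plates, copied from Table 1.
WT_HHCSF = [61, 59, 55, 64, 57, 69, 63, 51, 61, 61]
BDL_HHCSF = [48, 34, 50, 59, 37, 46, 44, 39, 51, 47]


def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(count)."""
    return stdev(values) / sqrt(len(values))


def t_statistic(a, b):
    """Two-sample t statistic for groups of equal size (pooled variance)."""
    # With equal group sizes, the pooled SE of the difference reduces to
    # sqrt(SE_a^2 + SE_b^2).
    se_diff = sqrt(standard_error(a) ** 2 + standard_error(b) ** 2)
    return (mean(a) - mean(b)) / se_diff


# Compare the first three replicate plates with all ten replicate plates.
for plates in (3, 10):
    wt, bdl = WT_HHCSF[:plates], BDL_HHCSF[:plates]
    print(
        f"plates 1-{plates}: "
        f"WT {mean(wt):.1f} +/- {standard_error(wt):.2f}, "
        f"Bdl-/- {mean(bdl):.1f} +/- {standard_error(bdl):.2f}, "
        f"t = {t_statistic(wt, bdl):.2f}"
    )
```

With three plates the statistic stays just below the two-sided 5% critical value for 4 degrees of freedom (2.776); with ten plates it is far above the critical value for 18 degrees of freedom (2.101). Nothing about the mice has changed between the two calculations. Only the number of aliquots counted from the same two suspensions has changed, which is why the statistic says nothing about the hypothesis.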
Rather than providing independent data, the results from the replicate plates are linked because they all came from the same suspension of bone marrow cells. For example, if we made any error in determining the concentration of bone marrow cells, this error would be systematically applied to all of the plates. In this case, we determined the initial number of bone marrow cells by performing a cell count using a haemocytometer, a method that typically only gives an accuracy of ±10%. Therefore, no matter how many plates are counted, or how small the error bars are in Fig 1 , it is not valid to conclude that there is a difference between the WT and Bdl −/− cells. Moreover, even if we had used a flow cytometer to sort exactly the same number of bone marrow cells into each of the plates, we would still have only tested cells from a single Bdl −/− mouse, so n would still equal 1 (see Fundamental principle 1 in Sidebar A ).
Sidebar A | Fundamental principles of statistical design
Fundamental principle 1
Science is knowledge obtained by repeated experiment or observation: if n = 1, it is not science, as it has not been shown to be reproducible. You need a random sample of independent measurements.
Fundamental principle 2
Experimental design, at its simplest, is the art of varying one factor at a time while controlling others: an observed difference between two conditions can only be attributed to Factor A if that is the only factor differing between the two conditions. We always need to consider plausible alternative interpretations of an observed result. The differences observed in Fig 1 might only reflect differences between the two suspensions, or be due to some other (of the many) differences between the two individual mice, besides the particular genotypes of interest.
Fundamental principle 3
A conclusion can only apply to the population from which you took the random sample of independent measurements: so if we have multiple measures on a single suspension from one individual mouse, we can only draw a conclusion about that particular suspension from that particular mouse. If we have multiple measures of the activity of a single vial of cytokine, then we can only generalize our conclusion to that vial.
Fundamental principle 4
Although replicates cannot support inference on the main experimental questions, they do provide important quality controls of the conduct of experiments. Values from an outlying replicate can be omitted if a convincing explanation is found, although repeating part or all of the experiment is a safer strategy. Results from an independent sample, however, can only be left out in exceptional circumstances, and only if there are especially compelling reasons to justify doing so.
To be convincing, a scientific paper describing a new finding needs to provide evidence that the results are reproducible. While it might be argued that a hypothetical talking dog would represent an important scientific discovery even if n = 1, few people would be convinced if someone claimed to have a talking dog that had been observed on one occasion to speak a single word. Most people would require several words to be spoken, with a number of independent observers, on several occasions. The cloning of Dolly the sheep represented a scientific breakthrough, but she was one of five cloned sheep described by Campbell et al [ 1 ]. Eight fetuses and sheep were typed by microsatellite analysis and shown to be identical to the cell line used to provide the donor nuclei.
Inferences can only be made about the population from which the independent samples were drawn. In our original experiment, we took individual replicate aliquots from the suspensions of bone marrow cells ( Fig 2A ). We can therefore only generalize our conclusions to the ‘population’ from which our sample aliquots came: in this case the population is that particular suspension of bone marrow cells. To test our hypothesis, it is necessary to carry out an experiment similar to that shown in Fig 2B . Here, bone marrow has been independently isolated from a random sample of WT mice and another random sample of Bdl −/− mice. In this case, we can draw conclusions about Bdl −/− mice in general, and compare them with WT mice (in general). In Fig 2A , the number of Bdl −/− mice that have been compared with WT mice (which is the comparison relevant to our hypothesis) is one, so n = 1, regardless of how many replicate plates are counted. Conversely, in Fig 2B we are comparing three Bdl −/− mice with WT controls, so n = 3, whether we plate three replicate plates of each type or 30. Note, however, that it is highly desirable for statistical reasons to have samples larger than n = 3, and/or to test the hypothesis by some other approach, for example, by using antibodies that block HH-CSF or BDL, or by re-expressing a Bdl cDNA in the Bdl −/− cells (see Fundamental principle 2 in Sidebar A ).
Sample variation. Variation between samples can be used to make inferences about the population from which the independent samples were drawn (red arrows). For replicates, as in ( A ), inferences can only be made about the bone marrow suspensions from which the aliquots were taken. In ( A ), we might be able to infer that the plates on the left and the right contained cells from different suspensions, and possibly that the bone marrow cells came from two different mice, but we cannot make any conclusions about the effects of the different genotypes of the mice. In ( B ), three independent mice were chosen from each genotype, so we can make inferences about all mice of that genotype. Note that in the experiments in ( B ), n = 3, no matter how many replicate plates are created.
One of the most commonly used methods to determine the abundance of mRNA is real-time quantitative reverse transcription PCR (qRT-PCR; although the following example applies equally well to an ELISA or similar). Typically, multi-well plates are used so that many samples can be simultaneously read in a PCR machine. Let us suppose we are going to use qRT-PCR to compare levels of Boojum mRNA ( Bjm ) in control bone marrow cells (treated with medium alone) with Bjm levels in bone marrow cells treated with HH-CSF, in order to test the hypothesis that HH-CSF induces expression of the Bjm gene.
We isolate bone marrow cells from a normal mouse, and dispense equal aliquots containing a million cells into each of two wells of a six-well plate. For the moment we use only two of the six wells. We then add 4 ml of plain medium to one of the wells (the control), and 4 ml of medium supplemented with HH-CSF to the other well (the experimental well). We incubate the plate for 24 h and then transfer the cells into two tubes, in which we extract the RNA using TRIzol. We then suspend the RNA in 50 μl Tris-buffered RNase-free water.
We put 10 μl from each tube into each of two fresh tubes, so that both Actin (as a control) and Bjm message can be determined in each sample. We now have four tubes, each with 10 μl of mRNA solution. We make two sets of ‘reaction mix’ with the only difference being that one contains Actin PCR primers and the other Bjm primers. We add 40 μl of one or the other ‘reaction mix’ to each of the four tubes, so we now have 50 μl in each tube. After mixing, we take three aliquots of 10 μl from each of the four tubes and put them into three wells of a 384-well plate, so that 12 wells in total contain the RT-PCR mix. We then put the plate into the thermocycler. After an hour, we get an Excel spreadsheet of results.
We then calculate the ratio of the Bjm signal to the Actin signal for each of the three pairs of reactions that contained RNA from the HH-CSF-treated cells, and for each of the three pairs of control reactions. In this case, the variation among the three replicates will not be affected by sampling error (which was what caused most of the variation in colony number in the earlier bone marrow colony-forming assay), but will only reflect the fidelity with which the replicates were made, and perhaps some variation in the heating of the separate wells in the PCR machine. The three 10 μl aliquots each came from the same, single, mRNA preparation, so we can only make inferences about the contents of that particular tube. As in the previous example, in this case n still equals 1, and no inferences about the main experimental hypothesis can be made. The same would be true if each RNA sample were analysed in 10 or 100 wells; we are only comparing one control sample to one experimental sample, so n = 1 ( Fig 3A ). To draw a general inference about the effect of HH-CSF on Bjm expression, we would have to perform the experiment on several independent samples derived from independent cultures of HH-CSF-stimulated bone marrow cells ( Fig 3B ).
Means of replicates compared with means of independent samples. ( A ) The ratios of the three-replicate Bjm PCR reactions to the three-replicate Actin PCR reactions from the six aliquots of RNA from one culture of HH-CSF-stimulated cells and one culture of unstimulated cells are shown (filled squares). The means of the ratios are shown as columns. The close correlation of the three replicate values (blue lines) indicates that the replicates were created with high fidelity and the pipetting was consistent, but is not relevant to the hypothesis being tested. It is not appropriate to show P -values here, because n = 1. ( B ) The ratios of the replicate PCR reactions using mRNA from the other cultures (two unstimulated, and two treated with HH-CSF) are shown as triangles and circles. Note how the correlation between the replicates (that is, the groups of three shapes) is much greater than the correlation between the mean values for the three independent untreated cultures and the three independent HH-CSF-treated cultures (green lines). Error bars indicate SE of the ratios from the three independent cultures, not the replicates for any single culture. P > 0.05. SE, standard error.
For example, we could have put the bone marrow cells in all six wells of the tissue culture plate, and performed three independent cultures with HH-CSF, and three independent control cultures in medium without HH-CSF. mRNA could then have been extracted from the six cultures, and each split into six wells to measure Actin and Bjm mRNA levels by using qRT-PCR. In this case, 36 wells would have been read by the machine. If the experiment were performed this way, then n = 3, as there were three independent control cultures, and three independent HH-CSF-dependent cultures, that were testing our hypothesis that HH-CSF induces Bjm expression. We then might be able to generalize our conclusions about the effect of that vial of recombinant HH-CSF on expression of Bjm mRNA. However, in this case ( Fig 3B ) P > 0.05, so we cannot exclude the possibility that the differences observed were just due to chance, and that HH-CSF has no effect on Bjm mRNA expression. Note that we also cannot conclude that it has no effect; if P > 0.05, the only conclusion we can make is that we cannot make any conclusions. Had we calculated and shown errors and P -values for replicates in Fig 3A , we might have incorrectly concluded, and perhaps misled the readers to conclude that there was a statistically significant effect of HH-CSF in stimulating Bjm transcription (see Fundamental principle 3 in Sidebar A ).
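One way to keep replicates out of the inference is to collapse each culture's replicate wells to a single value before any statistics are computed, so that n counts cultures rather than wells. The sketch below illustrates this under stated assumptions: the Bjm/Actin ratios are invented for illustration and are not the data behind Fig 3, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical Bjm/Actin ratios, invented for illustration only:
# three independent cultures per condition, three replicate PCR wells each.
control = [[1.00, 1.02, 0.98], [1.40, 1.38, 1.42], [0.70, 0.72, 0.68]]
treated = [[1.90, 1.88, 1.92], [1.10, 1.12, 1.08], [2.60, 2.58, 2.62]]


def culture_means(cultures):
    """Collapse each culture's replicate wells to a single value."""
    return [mean(wells) for wells in cultures]


def summarize(cultures):
    """Return (n, mean, SE) computed across independent cultures only."""
    values = culture_means(cultures)
    n = len(values)
    return n, mean(values), stdev(values) / sqrt(n)


for label, cultures in (("control", control), ("HH-CSF", treated)):
    n, m, se = summarize(cultures)
    print(f"{label}: n = {n}, mean = {m:.2f}, SE = {se:.2f}")
```

Adding more wells per culture changes how precisely each culture's value is measured, but `n` stays at 3 until more independent cultures are added.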
Why bother with replicates at all? In the previous sections we have seen that replicates do not allow inferences to be made, or allow us to draw conclusions relevant to the hypothesis we are testing. So should we dispense with replicates altogether? The answer, of course, is ‘no’. Replicates serve as internal quality checks on how the experiment was performed. If, for example, in the experiment described in Table 1 and Fig 1 , one of the replicate plates with saline-treated WT bone marrow contained 100 colonies, you would immediately suspect that something was wrong. You could check the plate to see if it had been mislabelled. You might look at the colonies using a microscope and discover that they are actually contaminating colonies of yeast. Had you not made any replicates, it is possible you would not have realized that a mistake had occurred.
Fig 4 shows the results of the same qRT-PCR experiment as in Fig 3 , but in this case, for one of the sets of triplicate PCR ratios there is much more variation than in the others. Furthermore, this large variation can be accounted for by just one value of the three replicates—that is, the uppermost circle in the graph. If you had results such as those in Fig 4A , you would look at the individual values for the Actin PCR and Bjm PCR for the replicate that had the strange result. If the Bjm PCR sample was unusually high, you could check the corresponding well in the PCR plate to see if it had the same volume as the other wells. Conversely, if the Actin PCR value was much lower than those for the other two replicates, on checking the well in the plate you might find that the volume was too low. Alternatively, the unusual results might have been due to accidentally adding two aliquots of RNA, or two of PCR primer-reaction mix. Or perhaps the pipette tip came loose, or there were crystals obscuring the optics, or the pipette had been blocked by some debris, etc., etc., etc. Replicates can thus alert you to aberrant results, so that you know when to look further and when to repeat the experiment. Replicates can act as an internal check of the fidelity with which the experiment was performed. They can alert you to problems with plumbing, leaks, optics, contamination, suspensions, mixing or mix-ups. But they cannot be used to infer conclusions.
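The quality-control role of replicates can be partly automated by comparing the spread within each replicate set against the spread that is typical for the experiment. The sketch below is one possible approach, not a method described by the authors: the ratios are invented, the function name is hypothetical, and the threshold `factor` is an arbitrary illustrative value.

```python
from statistics import median

# Hypothetical triplicate ratios, invented for illustration; the last
# set contains one aberrant well.
replicate_sets = {
    "control culture 1": [1.00, 1.02, 0.98],
    "control culture 2": [1.40, 1.38, 1.42],
    "HH-CSF culture 1": [1.90, 1.88, 1.92],
    "HH-CSF culture 2": [1.10, 1.12, 2.30],
}


def flag_aberrant_sets(sets, factor=5.0):
    """Return names of replicate sets whose range is far above typical.

    `factor` is an arbitrary illustrative threshold, not a recommended value.
    """
    # Range (max minus min) of each replicate set.
    ranges = {name: max(v) - min(v) for name, v in sets.items()}
    # The median range serves as the "typical" spread for this experiment.
    typical = median(ranges.values())
    return [name for name, r in ranges.items() if r > factor * typical]


print(flag_aberrant_sets(replicate_sets))
```

A flagged set is a prompt to inspect the wells and the raw Actin and Bjm values, not a licence to delete the value automatically.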
Interpreting data from replicates. ( A ) Mean ± SE of three independent cultures each with ratios from triplicate PCR measurements. P > 0.05. This experiment is much like the one in Fig 3B . However, notice in this case, for one of the sets of replicates (the circles from one of the HH-CSF-treated replicate values), there is a much greater range than for the other five sets of triplicate values. Because replicates are carefully designed to be as similar to each other as possible, finding unexpected variation should prompt an investigation into what went wrong during the conduct of the experiment. Note how in this case, an increase in variation among one set of replicates causes a decrease in the SEs for the values for the independent HH-CSF results: the SE bars for the HH-CSF condition are shorter in Fig 4A than in Fig 3B . Failure to take note of abnormal variation in replicates can lead to incorrect statistical inferences. ( B ) Bjm mRNA levels (relative to Actin ) for three independent cultures each with ratios from triplicate PCR measurements. Means are shown by a horizontal line. The data here are the same as those for Fig 3B or Fig 4A with the aberrant value deleted. When n is as small as 3, it is better to just plot the data points, rather than showing statistics. SE, standard error.
Because replicate values are not relevant to the hypothesis being tested, they—and statistics derived from them—should not be shown in figures. In Fig 4B , the large dots show the means of the replicate values in Fig 4A , after the aberrant replicate value has been excluded. While in this figure you could plot the means and SEs of the mRNA results from the three independent medium- and HH-CSF-treated cultures, in this case, the independent values are plotted and no error bars are shown. When the number of independent data points is low, and they can easily be seen when plotted on the graph, we recommend simply doing this, rather than showing means and error bars.
What should we look for when reading papers? Although replicates can be a valuable internal control to monitor the performance of your experiments, there is no point in showing them in the figures in publications because the statistics from replicates are not relevant to the hypothesis being tested. Indeed, if statistics, error bars and P -values for replicates are shown, they can mislead the readers of a paper who assume that they are relevant to the paper's conclusions. The corollary of this is that if you are reading a paper and see a figure in which the error bars—whether standard deviation, SE or CI—are unusually small, it might alert you that they come from replicates rather than independent samples. You should carefully scrutinize the figure legend to determine whether the statistics come from replicates or independent experiments. If the legend does not state what the error bars are, what n is, or whether the results come from replicates or independent samples, ask yourself whether these omissions undermine the paper, or whether some knowledge can still be gained from reading it.
You should also be sceptical if the figure contains data from only a single experiment with statistics for replicates, because in this case, n = 1, and no valid conclusions can be made, even if the authors state that the results were ‘representative’—if the authors had more data, they should have included them in the published results (see Sidebar B for a checklist of what to look for). If you wish to see more examples of what not to do, search the Internet for the phrases ‘SD of one representative’, ‘SE of one representative’, ‘SEM of one representative’, ‘SD of replicates’ or ‘SEM of replicates’.
Sidebar B | Error checklist when reading papers
- If error bars are shown, are they described in the legend?
- If statistics or error bars are shown, is n stated?
- If the standard deviations (SDs) are less than 10%, do the results come from replicates?
- If the SDs of a binomial distribution are consistently less than √(np(1 – p)), where n is the sample size and p is the probability, are the data too good to be true?
- If the SDs of a Poisson distribution are consistently less than √(mean), are the data too good to be true?
- If the statistics come from replicates, or from a single ‘representative’ experiment, consider whether the experiments offer strong support for the conclusions.
- If P -values are shown for replicates or a single ‘representative’ experiment, consider whether the experiments offer strong support for the conclusions.
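The two "too good to be true" checks in the list above compare an observed SD with the SD expected from counting statistics. A minimal sketch, using the HH-CSF rows of Table 1 as input; the helper names are hypothetical:

```python
from math import sqrt
from statistics import mean, stdev

# Colony counts for the HH-CSF-treated plates, copied from Table 1.
WT_HHCSF = [61, 59, 55, 64, 57, 69, 63, 51, 61, 61]
BDL_HHCSF = [48, 34, 50, 59, 37, 46, 44, 39, 51, 47]


def poisson_sd(mean_count):
    """SD expected for Poisson-distributed counts: sqrt(mean)."""
    return sqrt(mean_count)


def binomial_sd(n, p):
    """SD expected for a binomial count: sqrt(n * p * (1 - p))."""
    return sqrt(n * p * (1 - p))


for label, counts in (("WT + HH-CSF", WT_HHCSF), ("Bdl-/- + HH-CSF", BDL_HHCSF)):
    observed = stdev(counts)
    expected = poisson_sd(mean(counts))
    print(f"{label}: observed SD = {observed:.2f}, Poisson SD = {expected:.2f}")
```

One set with an SD somewhat below √(mean) is unremarkable, because the sample SD itself varies from set to set. The checklist asks whether the reported SDs are consistently smaller than sampling variation allows.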
Acknowledgments
This work was made possible through Victorian State Government Operational Infrastructure Support, and Australian Government NHMRC IRIISS and NHMRC grants 461221 and 433063.
The authors declare that they have no conflict of interest.
1. Campbell KH, McWhir J, Ritchie WA, Wilmut I (1996) Sheep cloned by nuclear transfer from a cultured cell line. Nature 380: 64–66
Why is Replication in Research Important?
Replication in research is important because it allows for the verification and validation of study findings, building confidence in their reliability and generalizability. It also fosters scientific progress by promoting the discovery of new evidence, expanding understanding, and challenging existing theories or claims.
Updated on June 30, 2023
Often viewed as a cornerstone of science , replication builds confidence in the scientific merit of a study’s results. The philosopher Karl Popper argued that, “we do not take even our own observations quite seriously, or accept them as scientific observations, until we have repeated and tested them.”
As such, creating the potential for replication is a common goal for researchers. The methods section of scientific manuscripts is vital to this process as it details exactly how the study was conducted. From this information, other researchers can replicate the study and evaluate its quality.
This article discusses replication as a rational concept integral to the philosophy of science and as a process validating the continuous loop of the scientific method. By considering both the ethical and practical implications, we may better understand why replication is important in research.
What is replication in research?
As a fundamental tool for building confidence in the value of a study’s results, replication has power. Some would say it has the power to make or break a scientific claim when, in reality, it is simply part of the scientific process, neither good nor bad.
When Nosek and Errington propose that replication is a study for which any outcome would be considered diagnostic evidence about a claim from prior research, they revive its neutrality. The true purpose of replication, therefore, is to advance scientific discovery and theory by introducing new evidence that broadens the current understanding of a given question.
Why is replication important in research?
The great philosopher and scientist, Aristotle , asserted that a science is possible if and only if there are knowable objects involved. There cannot be a science of unicorns, for example, because unicorns do not exist. Therefore, a ‘science’ of unicorns lacks knowable objects and is not a ‘science’.
This philosophical foundation of science illustrates why replication is important in research. When an outcome is not replicable, it is not knowable and, in this sense, does not truly exist. Conversely, each time a study or result is successfully replicated, its credibility and validity grow.
The lack of replicability is just as vital to the scientific process. It pushes researchers in new and creative directions, compelling them to continue asking questions and to never become complacent. Replication is as much a part of the scientific method as formulating a hypothesis or making observations.
Types of replication
Historically, replication has been divided into two broad categories:
- Direct replication : performing a new study that follows a previous study’s original methods and then comparing the results. While direct replication follows the protocols from the original study, the samples and conditions, time of day or year, lab space, research team, etc. are necessarily different. In this way, a direct replication uses empirical testing to reflect the prevailing beliefs about what is needed to produce a particular finding.
- Conceptual replication : performing a study that employs different methodologies to test the same hypothesis as an existing study. By applying diverse manipulations and measures, conceptual replication aims to operationalize a study’s underlying theoretical variables. In doing so, conceptual replication promotes collaborative research and explanations that are not based on a single methodology.
Though these general divisions provide a helpful starting point for both conducting and understanding replication studies, they are not polar opposites. There are nuances that produce countless subcategories such as:
- Internal replication: when the same research team conducts the same study while taking negative and positive factors into account
- Microreplication: conducting partial replications of the findings of other research groups
- Constructive replication: both manipulations and measures are varied
- Participant replication: changes only the participants
Many researchers agree these labels should be confined to study design, as direction for the research team, not a preconceived notion. In fact, Nosek and Errington conclude that distinctions between “direct” and “conceptual” are at least irrelevant and possibly counterproductive for understanding replication and its role in advancing knowledge.
How do researchers replicate a study?
Like all research studies, replication studies require careful planning. The Open Science Framework (OSF) offers a practical guide which details the following steps:
- Identify a study that is feasible to replicate given the time, expertise, and resources available to the research team.
- Determine and obtain the materials used in the original study.
- Develop a plan that details the type of replication study and research design intended.
- Outline and implement the study’s best practices.
- Conduct the replication study, analyze the data, and share the results.
These broad guidelines are expanded in Brown and Wood’s article, “Which tests not witch hunts: a diagnostic approach for conducting replication research.” Brown further condenses their findings in a blog post outlining four main procedural categories:
- Assumptions: identifying the contextual assumptions of the original study and research team
- Data transformations: using the study data to answer questions about data transformation choices by the original team
- Estimation: determining if the most appropriate estimation methods were used in the original study and if the replication can benefit from additional methods
- Heterogeneous outcomes: establishing whether the data from an original study lends itself to exploring separate heterogeneous outcomes
At the suggestion of peer reviewers from the e-journal Economics, Brown also discusses what not to do when conducting a replication study:
- Do not use critiques of the original study’s design as a basis for replication findings.
- Do not perform robustness testing before completing a direct replication study.
- Do not neglect to communicate with the original authors before, during, and after the replication.
- Do not label the original findings as errors solely based on different outcomes in the replication.
Again, replication studies are full-blown, legitimate research endeavors that contribute directly to scientific knowledge. They require the same levels of planning and dedication as any other study.
What happens when replication fails?
There are some obvious and agreed-upon contextual factors that can result in the failure of a replication study, such as:
- The detection of unknown effects
- Inconsistencies in the system
- The inherent nature of complex variables
- Substandard research practices
- Pure chance
While these variables affect all research studies, they have particular impact on replication as the outcomes in question are not novel but predetermined.
The constant flux of contexts and variables makes assessing replicability, and determining success or failure, very tricky. A publication from the National Academy of Sciences points out that replicability is obtaining consistent, not identical, results across studies aimed at answering the same scientific question. The authors further provide eight core principles that are applicable to all disciplines.
While there are no straightforward criteria for determining whether a replication is a failure or a success, the National Library of Science and the Open Science Collaboration suggest asking some key questions, such as:
- Does the replication produce a statistically significant effect in the same direction as the original?
- Is the effect size in the replication similar to the effect size in the original?
- Does the original effect size fall within the confidence or prediction interval of the replication?
- Does a meta-analytic combination of results from the original experiment and the replication yield a statistically significant effect?
- Do the results of the original experiment and the replication appear to be consistent?
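Several of these questions can be made concrete with a small calculation. The sketch below is illustrative only. It assumes each study is summarized by an effect estimate and a standard error, and it uses a normal approximation, a 95% confidence interval (rather than a prediction interval), and a fixed-effect inverse-variance combination. The function name, the numbers, and these analytic choices are assumptions made for illustration, not a procedure prescribed by the organizations mentioned above.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # critical value for a two-sided 95% interval

def compare_replication(orig_est, orig_se, rep_est, rep_se):
    """Hypothetical helper: evaluate three common replication criteria."""
    # Two-sided p-value for the replication estimate (normal approximation).
    p_rep = 2 * (1 - NormalDist().cdf(abs(rep_est / rep_se)))
    same_direction = (orig_est > 0) == (rep_est > 0)

    # 95% confidence interval of the replication estimate.
    ci_low, ci_high = rep_est - Z95 * rep_se, rep_est + Z95 * rep_se

    # Fixed-effect (inverse-variance weighted) combination of both studies.
    w_orig, w_rep = 1 / orig_se**2, 1 / rep_se**2
    pooled = (w_orig * orig_est + w_rep * rep_est) / (w_orig + w_rep)
    pooled_se = math.sqrt(1 / (w_orig + w_rep))
    p_pooled = 2 * (1 - NormalDist().cdf(abs(pooled / pooled_se)))

    return {
        "significant_same_direction": same_direction and p_rep < 0.05,
        "original_in_replication_ci": ci_low <= orig_est <= ci_high,
        "pooled_estimate": pooled,
        "pooled_significant": p_pooled < 0.05,
    }

# Made-up numbers: a large, noisy original and a smaller, more precise replication.
result = compare_replication(orig_est=0.50, orig_se=0.20, rep_est=0.20, rep_se=0.10)
print(result)
```

With these made-up inputs the criteria disagree: the replication is significant in the same direction and the pooled estimate is significant, yet the original estimate falls outside the replication's interval. That disagreement is one reason no single question in the list is treated as decisive.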
While many clearly have an opinion about how and why replication fails, calling a replication a failure is at best a null statement and at worst an unfair accusation. It misses the point and sidesteps the role of replication as a mechanism for furthering scientific endeavor by presenting new evidence on an existing question.
Can the replication process be improved?
The replication crisis has prompted calls both to restructure the definition of replication to account for variations across scientific fields and to recognize the degrees of potential outcomes when comparing a replication with the original data. Listen to this Hidden Brain podcast from NPR for an intriguing case study on this phenomenon.
Considered academia’s self-made disaster, the replication crisis is spurring other improvements in the replication process. Most broadly, it has prompted the resurgence and expansion of metascience, a field with roots in both philosophy and science that is widely referred to as "research on research" and "the science of science." By holding a mirror up to the scientific method, metascience is not only elucidating the purpose of replication but also guiding the rigors of its techniques.
Further efforts to improve replication are threaded throughout the industry, from updated research practices and study design to revised publication practices and oversight organizations, such as:
- Requiring full transparency of the materials and methods used in a study
- Pushing for statistical reform, including redefining the significance of the p-value
- Using preregistration reports that present the study’s plan for methods and analysis
- Adopting result-blind peer review, which allows journals to accept a study based on its methodological design and justifications, not its results
- Founding organizations like the EQUATOR Network that promote transparent and accurate reporting
Final thoughts
In the realm of scientific research, replication is a form of checks and balances. Neither the probability of a finding nor prominence of a scientist makes a study immune to the process.
And, while a single replication does not validate or nullify the original study’s outcomes, accumulating evidence from multiple replications does boost the credibility of its claims. At the very least, the findings offer insight to other researchers and enhance the pool of scientific knowledge.
After exploring the philosophy and the mechanisms behind replication, it is clear that the process is not perfect, but evolving. Its value lies within the irreplaceable role it plays in the scientific method. Replication is no more or less important than the other parts, simply necessary to perpetuate the infinite loop of scientific discovery.
Charla Viera, MS
The Happy Scientist
What is Science: Repeat and Replicate
In the scientific process, we should not rely on the results of a single test. Instead, we should perform the test over and over. Why? If it works once, shouldn't it work the same way every time? Yes, it should, so if we repeat the experiment and get a different result, then we know that there is something about the test that we are not considering.
In studying the processes of science, you will often run into two words that seem similar: repetition and replication.
Sometimes it is a matter of random chance, as in the case of flipping a coin. Just because it comes up heads the first time does not mean that it will always come up heads. By repeating the experiment over and over, we can see if our result really supports our hypothesis ( What is a Hypothesis? ), or if it was just random chance.
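The coin example can be tried out on a computer. The short sketch below is only an illustration: it assumes a perfectly fair simulated coin, and the function name and the seed are made up for this example. It shows why a single flip tells us very little, while many repeats settle close to one half.

```python
import random

def heads_fraction(n_flips, rng):
    """Flip a simulated fair coin n_flips times and return the fraction of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

rng = random.Random(42)  # fixed seed so the same run can be repeated exactly
for n in (1, 10, 100, 10_000):
    print(n, heads_fraction(n, rng))
```

With one flip the fraction can only be 0 or 1. As the number of flips grows, the fraction drifts toward 0.5, which is what repeating an experiment is meant to reveal.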
Sometimes the result might be due to some variable that you have not recognized. In our example of flipping a coin, the individual's technique for flipping the coin might influence the results. To take that into consideration, we repeat the experiment over and over with different people, looking closely for any results that don't fit into the idea we are testing.
Results that don't fit are important! Figuring out why they do not fit our hypothesis can give us an opportunity to learn new things, and get a better understanding of the idea we are testing.
Replication
Once we have repeated our testing over and over, and think we understand the results, then it is time for replication. That means getting other scientists to perform the same tests, to see whether they get the same results. As with repetition, the most important things to watch for are results that don't fit our hypothesis, and for the same reason. Those different results give us a chance to discover more about our idea. The different results may be because the person replicating our tests did something different, but they also might be because that person noticed something that we missed.
What if you are wrong?
If we did miss something, it is OK, as long as we performed our tests honestly and scientifically. Science is not about proving that "I am right!" Instead, it is a process for trying to learn more about the universe and how it works. It is usually a group effort, with each scientist adding her own perspective to the idea, giving us a better understanding and often raising new questions to explore.
Difference between replication and repeated measurements
The following quote is from Montgomery's Experimental Design:
There is an important distinction between replication and repeated measurements . For example, suppose that a silicon wafer is etched in a single-wafer plasma etching process, and a critical dimension on this wafer is measured three times. These measurements are not replicates; they are a form of repeated measurements, and in this case, the observed variability in the three repeated measurements is a direct reflection of the inherent variability in the measurement system or gauge. As another illustration, suppose that as part of an experiment in semiconductor manufacturing, four wafers are processed simultaneously in an oxidation furnace at a particular gas flow rate and time and then a measurement is taken on the oxide thickness of each wafer. Once again, the measurement on the four wafers are not replicates but repeated measurements. In this case they reflect differences among the wafers and other sources of variability within that particular furnace run. Replication reflects sources of variability both between runs and (potentially) within runs.
I don't quite understand the difference between replication and repeated measurements. Wikipedia says:
The repeated measures design (also known as a within-subjects design) uses the same subjects with every condition of the research, including the control.
According to Wikipedia, the two examples in Montgomery's book aren't repeated measurement experiments.
In the first example, the wafer is used under only one condition, isn't it?
In the second example, each wafer is used with only one condition: "processed simultaneously in an oxidation furnace at a particular gas flow rate and time", isn't it?
"Replication reflects sources of variability both between runs and (potentially) within runs". Then what do repeated measurements reflect?
- experiment-design
- terminology
- Simply put, replication involves same technique on different sample – user36297 Commented Dec 17, 2013 at 3:13
5 Answers
I don't think his second example is replication OR repeated measurements.
Any study involves multiple cases (subjects, people, silicon chips, whatever).
Repeated measures involves measuring the same cases multiple times. So, if you measured the chips, then did something to them, then measured them again, etc it would be repeated measures.
Replication involves running the same study on different subjects but identical conditions. So, if you did the study on n chips, then did it again on another n chips that would be replication.
- How about an almost-mnemonic: you can replicate conditions but not subjects, though you can repeat a measurement on the same subject. – Wayne Commented Mar 5, 2014 at 22:23
Unfortunately, terminology varies quite a bit and in confusing ways, especially between disciplines. There will be many people who will use different terms for the same thing, and/or the same terms for different things (this is a pet peeve of mine). I gather the book in question is this . This is design of experiments from the perspective of engineering (as opposed to the biomedical or the social science perspectives). The Wikipedia entry seems to be coming from the biomedical / social science perspective.
In engineering, an experimental run is typically thought of as having set up your equipment and run it. This produces, in a sense, one data point. Running your experiment again is a replication; it gets you a second data point. In a biomedical context, you run an experiment and get $N$ data. Someone else replicates your experiment on a new sample with another $N'$ data. These constitute different ways of thinking about what you call an "experimental run". Tragically, they are very confusing.
Montgomery is referring to multiple data from the same run as "repeated measurements". Again, this is common in engineering. A way to think about this from outside the engineering context is to think about a hierarchical analysis, where you are interested in estimating and drawing inferences about the level 2 units. That is, treatments are randomly assigned to doctors and every patient (on whom you take a measurement) is a repeated measurement with respect to the doctor. Within the same doctor, those measurements "reflect differences among the wafers [patients] and other sources of variability within that particular furnace run [doctor's care]".
- (+1) Montgomery is referring to multiple data from the same run as "repeated measures" -- the quote actually says "repeated measurements". Is this slight difference in wording important? – amoeba Commented Mar 5, 2014 at 21:24
- Thanks for the catch, @amoeba. I'm used to saying / thinking / typing "repeated measures". It was just a slip of the fingers. – gung - Reinstate Monica Commented Mar 5, 2014 at 21:26
- So just to be clear: Montgomery's "repeated measurements" of wafers are not "repeated measures" of wafers, right? I would say that your answer lacks this stated explicitly. You say that Montgomery's "repeated measurements" can be interpreted as repeated measures with respect to furnaces (fair enough), but furnaces are not the object of study in this quote; wafers are. – amoeba Commented Mar 5, 2014 at 21:30
- @amoeba, off the top of my head, I'm not sure what corresponds to what I would call "repeated measures" in the engineering perspective on DoE. I suppose you could say "Montgomery's 'repeated measurements' can be interpreted as repeated measures with respect to furnaces (fair enough), but furnaces are not the object of study in this quote; wafers are", but M's point is that the repeated measurements are information about "differences among the wafers and other sources of variability within that particular furnace run". Identifying sources of variability is the point of DoE in engineering. – gung - Reinstate Monica Commented Mar 5, 2014 at 22:06
- Imagine you manufacture gears to be used in a machine. The gears must be 3.000 cm in diameter. If they are too small, there will be play in the gears & they will wear out prematurely, shortening the life of the machine. If they are too large, they will cause the machine to seize up & explode, potentially causing other damage or injury. The idea is to identify sources of variability (& subsequently determine how to control them). This is different from biomedical experiments in which the idea is to find viable treatments. – gung - Reinstate Monica Commented Mar 5, 2014 at 22:09
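To make the distinction in Montgomery's furnace example concrete, here is a small simulation sketch. Everything numeric in it is an assumption chosen for illustration (the target thickness, the two standard deviations, the number of runs), and `furnace_run` is a hypothetical helper. It only shows that wafers processed in the same run share a run-level offset, so their spread does not capture run-to-run variability.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)
TRUE_THICKNESS = 100.0   # hypothetical target oxide thickness (arbitrary units)
BETWEEN_RUN_SD = 5.0     # assumed run-to-run variability
WITHIN_RUN_SD = 1.0      # assumed wafer-to-wafer variability inside one run

def furnace_run(n_wafers=4):
    """Simulate one furnace run: all wafers share the same run-level offset."""
    run_offset = rng.gauss(0.0, BETWEEN_RUN_SD)
    return [TRUE_THICKNESS + run_offset + rng.gauss(0.0, WITHIN_RUN_SD)
            for _ in range(n_wafers)]

runs = [furnace_run() for _ in range(200)]

# Spread among wafers of the same run (repeated measurements), averaged over runs.
within_sd = mean([stdev(run) for run in runs])
# Spread among run means (replicated runs): includes the run-to-run component.
between_sd = stdev([mean(run) for run in runs])

print(f"typical SD within a run: {within_sd:.2f}")
print(f"SD across run means:     {between_sd:.2f}")
```

Under these assumptions the within-run spread is much smaller than the spread across runs, so an error estimate built only from the four wafers of one run would understate the variability that replication is meant to capture.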
What's going on here is the confusion in terminology. Here in the book, measurements refer to a single experimental trial observation, and the experiment calls for several observations to be made.
The term 'repeated measures' refers to measuring subjects in multiple conditions.
That is, in a within-subject design (aka crossed design, or repeated measures), you have, say, two conditions: a treatment and a control, and each subject goes through both conditions, usually in a counter-balanced way. This means that you have subjects act as their own control, and this design helps you deal with between-subject variability. One disadvantage of this research design is the problem of carryover effects, where the first condition that the subject goes through adversely influences the other condition.
In other words, don't confuse 'repeated measures' and multiple observations under the same experimental condition.
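As a rough illustration of why a within-subject design helps with between-subject variability, the sketch below simulates subjects whose baselines differ a lot and who are each measured under both conditions. All numbers are made up, and the simulation deliberately ignores carryover effects and counterbalancing, so treat it as a simplified picture only.

```python
import random
from statistics import mean, stdev

rng = random.Random(1)
N_SUBJECTS = 20
TREATMENT_EFFECT = 2.0   # assumed true effect (arbitrary units)

control, treatment = [], []
for _ in range(N_SUBJECTS):
    baseline = rng.gauss(50.0, 10.0)   # large differences between subjects
    control.append(baseline + rng.gauss(0.0, 1.0))
    treatment.append(baseline + TREATMENT_EFFECT + rng.gauss(0.0, 1.0))

# Each subject acts as their own control: analyze the per-subject differences.
differences = [t - c for t, c in zip(treatment, control)]

print(f"SD of control scores:     {stdev(control):.2f}")
print(f"SD of paired differences: {stdev(differences):.2f}")
print(f"mean paired difference:   {mean(differences):.2f}")
```

Because each subject's baseline cancels out in the difference, the paired differences vary far less than the raw scores, which is what makes a small treatment effect easier to see.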
See also: Are Measurements made on the same patient independent?
- (+1) Do you mean that by "repeated measurements" Montgomery did not mean "repeated measures"? I think it's exactly what you mean, and I agree, but I find that your wording could be a bit more explicit about that. – amoeba Commented Mar 5, 2014 at 21:20
http://blog.gembaacademy.com/2007/05/08/repetitions-versus-replications/
Repetitions versus Replications
May 8, 2007, by Ron
Many Six Sigma practitioners struggle to differentiate between a repetition and replication. Normally this confusion arises when dealing with Design of Experiments (DOE).
Let’s use an example to explain the difference.
Sallie wants to run a DOE in her paint booth. After some brainstorming and data analysis she decides to experiment with the “fluid flow” and “attack angle” of the paint gun. Since she has 2 factors and wants to test a “high” and “low” level for each factor she decides on a 2 factor, 2 level full factorial DOE. Here is what this basic design would look like.
Now then, Sallie decides to paint 6 parts during each run. Since there are 4 runs she needs at least 24 parts (6 x 4). These 6 parts per run are what we call repetitions. Here is what the design looks like with the 6 repetitions added to the design.
Finally, since this painting process is ultra critical to her company Sallie decides to do the entire experiment twice. This helps her add some statistical power and serves as a sort of confirmation. If she wanted to she could do the first 4 runs with the day shift staff and the second 4 runs with the night shift staff.
Completing the DOE a second time is what we call replication. You may also hear the term blocking used instead of replicating. Here is what the design looks like with the 6 repetitions and replication in place (in yellow).
So there you have it! That is the difference between repetition and replication.
- Not able to see the figures you are referring to. – user3024069 Commented Mar 11, 2021 at 7:17
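The figures from the quoted post are not reproduced here. As a stand-in, the sketch below enumerates one way the run table for Sallie's experiment could be laid out: 2 factors at 2 levels, 6 repetitions per run, and the whole experiment carried out twice. The column names and the "low"/"high" labels are assumptions for illustration, not a reconstruction of the original figures.

```python
from itertools import product

FACTORS = {"fluid_flow": ("low", "high"), "attack_angle": ("low", "high")}
REPETITIONS = 6    # parts painted within each run
REPLICATIONS = 2   # times the whole 4-run experiment is carried out

rows = []
for replicate in range(1, REPLICATIONS + 1):
    # One run per combination of factor levels: 2 x 2 = 4 runs per replicate.
    for run, levels in enumerate(product(*FACTORS.values()), start=1):
        for part in range(1, REPETITIONS + 1):
            rows.append({"replicate": replicate, "run": run,
                         **dict(zip(FACTORS, levels)), "part": part})

runs_per_replicate = len({row["run"] for row in rows})
print("runs per replicate:", runs_per_replicate)
print("parts per replicate:", len(rows) // REPLICATIONS)
print("parts in total:", len(rows))
for row in rows[:3]:
    print(row)
```

The table has 4 runs and 24 parts per replicate, and 48 parts once the experiment is done twice. In practice the run order would usually be randomized within each replicate, which this sketch leaves out.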
Let me add an interesting factor, lot. In the above example, instead of making six tests with the same lot of paint (which, per the above definitions, means six repetitions per combination of conditions), she tests with six different paint lots per combination of conditions, which also means 24 total experiments; does this mean she is doing six replications per combination of conditions?
Another example: A liquid pigment is measured for color intensity I. The lab method of analysis has two factors: suspension clarification time T and sample size W. Each factor has two levels, i.e., short and long T, and small and large W. That makes a 2x2 design. Testing the same lot sample under the four different conditions means there are 4 experiments in total, no repetitions. Testing the same lot twice each time means there would be two repetitions per condition, 8 experiments in total. But what if we test samples from six different lots per condition? Does this mean there are six replications per combination of conditions? The number of experiments would be 24.
Now, we may want to make the method more precise and ask the lab technician to repeat the test twice (from the same sample) every time he makes a measurement, and report only the average per lot sample. I assume we could use the averages as a single result per lot sample, and for DoE, say a 2-way layout ANOVA with replications, each lot sample result is a replication. Please comment.
- 15 December 2021
- Correction 16 December 2021
Replicating scientific results is tough — but essential
Funders and publishers need to take replication studies much more seriously than they do at present. Credit: Anne-Christine Poujoulat/AFP/Getty
Replicability, the ability to obtain the same result when an experiment is repeated, is foundational to science. But in many research fields it has proved difficult to achieve. An important and much-anticipated brace of research papers now show just how complicated, time-consuming and difficult it can be to conduct and interpret replication studies in cancer biology [1, 2].
Nearly a decade ago, research teams organized by the non-profit Center for Open Science in Charlottesville, Virginia, and ScienceExchange, a research-services company based in Palo Alto, California, set out to systematically test whether selected experiments in highly cited papers published in prestigious scientific journals could be replicated. The effort was part of the high-profile Reproducibility Project: Cancer Biology (RPCB) initiative. The researchers assessed experimental outcomes or ‘effects’ by seven metrics, five of which could apply to numerical results. Overall, 46% of these replications were successful by three or more of these metrics, such as whether results fell within the confidence interval predicted by the experiment or retained statistical significance.
The project was launched in the wake of reports from drug companies that they could not replicate findings in many cancer-biology papers. But those reports did not identify the papers, nor the criteria for replication. The RPCB was conceived to bring research rigour to such retrospective replication studies.
Initial findings
One of the clearest findings was that the effects of an experimental treatment (such as killing cancer cells or shrinking tumours) were drastically smaller in replications, overall 85% smaller, than what had been reported originally. It’s hard to know why. There could have been a statistical fluke, for example; bias in the original study or in the replication; or lack of know-how by the replicators that caused the repeated study to miss some essential quality of the original.
The project also took more than five years longer than expected, and, despite taking the extra time, the teams were able to assess only one-quarter of the experiments they had originally planned to cover. This underscores the fact that such assessments take much more time and effort than expected.
The RPCB studies were budgeted to cost US$1.3 million over three years. That was increased to $1.5 million, not including the costs of personnel or project administration.
None of the 53 papers selected contained enough detail for the researchers to repeat the experiments. So the replicators had to contact authors for information, such as how many cells were injected, by what route, or the exact reagent used. Often, these were details that even the authors could not provide because the information had not been recorded or laboratory members had moved on. And one-third of authors either refused requests for more information or did not respond. For 136 of the 193 experimental effects assessed, replicators also had to request a key reagent from the original authors (such as a cell line, plasmid or model organism) because they could not buy it or get it from a repository. Some 69% of the authors were willing to share their reagents.
Openness and precision
Since the reproducibility project began, several efforts have encouraged authors to share more-precise methodological details of their studies. Nature, along with other journals, introduced a reproducibility checklist in 2013. It requires that authors report key experimental data, such as the strain, age and sex of animals used. Authors are also encouraged to deposit their experimental protocols in repositories, so that other researchers can access them.
Furthermore, the ‘Landis 4’ criteria were published in 2012 to promote rigorous animal research. They include the requirement for blinding, randomization and statistically assessed sample sizes. Registered Reports, an article format in which researchers publish the design of their studies before doing their experiments, is another key development. It means that ‘null effects’ are more likely to be published than buried in a file drawer. The project team found that null effects were more likely to be replicated; 80% of such studies passed by three metrics, compared with only 40% of ‘positive effects’.
Harder to resolve is the fact that what works in one lab might not work in another, possibly because of inherent variation or unrecognized methodological differences. Take the following example: one study tracked whether a certain type of cell contributes to blood supply in tumours [3]. Tracking these cells required that they express a ‘reporter’ molecule (in this case, green fluorescent protein). But, despite many attempts and tweaks, the replicating team couldn’t make the reporter sufficiently active in the cells to be tracked [4], so the replication attempt was stopped.
The RPCB teams vetted replication protocols with the original authors, and also had them peer reviewed. But detailed advance agreement on experimental designs will not necessarily, on its own, account for setbacks encountered when studies are repeated, in some cases many years after the originals. That is why another approach to replication is used by the US Defense Advanced Research Projects Agency (DARPA). In one DARPA programme, research teams are assigned independent verification teams. The research teams must help to troubleshoot and provide support for the verification teams so that key results can be obtained in another lab even before work is published. This approach is built into programme requirements: 3–8% of funds allocated for research programmes go towards such verification efforts [5].
Such studies also show that researchers, research funders and publishers must take replication studies much more seriously. Researchers need to engage in such actions, funders must ramp up investments in these studies, and publishers, too, must play their part so that researchers can be confident that this work is important. It is laudable that the press conference announcing the project’s results included remarks and praise by the leaders of the US National Academies of Sciences, Engineering, and Medicine and the National Institutes of Health. But the project was funded by a philanthropic investment fund, Arnold Ventures in Houston, Texas.
The entire scientific community must recognize that replication is not for replication’s sake, but to gain an assurance central to the progress of science: that an observation or result is sturdy enough to spur future work. The next wave of replication efforts should be aimed at making this everyday essential easier to achieve.
Nature 600, 359-360 (2021)
doi: https://doi.org/10.1038/d41586-021-03736-4
Updates & Corrections
Correction 16 December 2021: This article originally mischaracterized the RPCB’s analysis of replication attempts. Rather than recording seven experimental outcomes, it assessed experimental effects using seven metrics, and it also assessed 193 experimental effects not 193 experiments.
1. Errington, T. M., Denis, A., Perfito, N., Iorns, E. & Nosek, B. A. eLife 10, e67995 (2021).
2. Errington, T. M. et al. eLife 10, e71601 (2021).
3. Ricci-Vitiani, L. et al. Nature 468, 824–828 (2010).
4. Errington, T. M. et al. eLife 10, e73430 (2021).
5. Raphael, M. P., Sheehan, P. E. & Vora, G. J. Nature 579, 190–192 (2020).
Repeated Measures Designs: Benefits, Challenges, and an ANOVA Example
Topics: ANOVA , Data Analysis , Statistics
Repeated measures designs don’t fit our impression of a typical experiment in several key ways. When we think of an experiment, we often think of a design that has a clear distinction between the treatment and control groups. Each subject is in one, and only one, of these non-overlapping groups. Subjects who are in a treatment group are exposed to only one type of treatment. This is the common independent groups experimental design.
These ideas seem important, but repeated measures designs throw them out the window! What if you have a subject in the control group and all the treatment groups? Is this a problem? Not necessarily. In fact, repeated measures designs can provide tremendous benefits!
In this post, I’ll highlight the advantages and disadvantages of using a repeated measures design and show an example of how to analyze a repeated measures design using ANOVA in Minitab .
What Are Repeated Measures Designs?
As you'd expect, repeated measures designs involve multiple measurements of each subject. That’s no surprise, but there is more to it than just that. In repeated measures designs, the subjects are typically exposed to all of the treatment conditions. Surprising, right?
In this type of design, each subject functions as an experimental block . A block is a categorical variable that explains variation in the response variable that is not caused by the factors that you really want to know about. You use blocks in designed experiments to minimize bias and variance of the error because of these nuisance factors.
In repeated measures designs, the subjects are their own controls because the model assesses how a subject responds to all of the treatments. By including the subject block in the analysis, you can control for factors that cause variability between subjects. The result is that only the variability within subjects is included in the error term, which usually results in a smaller error term and a more powerful analysis.
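To make the "smaller error term" concrete, here is a minimal sketch that compares the noise around a treatment effect when the same data are treated as two independent groups versus as paired measurements on the same subjects. The scores are hypothetical, invented purely for illustration, and the variable names are assumptions rather than anything from Minitab.

```python
from statistics import mean, variance

# Hypothetical scores for five subjects, each measured under both conditions.
# Subjects differ a lot from one another (between-subject variability).
control = [52.0, 61.0, 70.0, 78.0, 85.0]
treatment = [55.0, 65.0, 72.0, 83.0, 88.0]
n = len(control)

# Estimated treatment effect is the same under either analysis.
effect = mean(treatment) - mean(control)

# Independent groups view: the variance of the difference in means
# includes the between-subject variability of both groups.
var_independent = variance(control) / n + variance(treatment) / n

# Repeated measures view: each subject is their own control, so only the
# variability of the within-subject differences enters the error term.
differences = [t - c for c, t in zip(control, treatment)]
var_paired = variance(differences) / n

print(f"effect estimate:            {effect:.2f}")
print(f"variance (independent):     {var_independent:.2f}")
print(f"variance (repeated measures): {var_paired:.2f}")
```

With data like these, where subjects differ far more from each other than they do from themselves across conditions, the repeated measures error variance is much smaller, which is where the extra power comes from.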
The Benefits of Repeated Measures Designs
More statistical power : Repeated measures designs can be very powerful because they control for factors that cause variability between subjects.
Fewer subjects : Thanks to the greater statistical power , a repeated measures design can use fewer subjects to detect a desired effect size. Further sample size reductions are possible because each subject is involved with multiple treatments. For example, if an independent groups design requires 20 subjects per experimental group, a repeated measures design may only require 20 total.
Quicker and cheaper : Fewer subjects need to be recruited, trained, and compensated to complete an entire experiment.
Assess an effect over time: Repeated measures designs can track an effect over time, such as the learning curve for a task. In this situation, it’s often better to measure the same subject at multiple times rather than different subjects at one point in time for each.
Managing the Challenges of Repeated Measures Designs
Repeated measures designs have some disadvantages compared to designs that have independent groups. The biggest drawbacks are known as order effects, and they are caused by exposing the subjects to multiple treatments. Order effects are related to the order in which treatments are given, not to the treatment itself. For example, scores can decrease over time due to fatigue, or increase due to learning. In taste tests, a dry wine may get a higher rank if it was preceded by a dryer wine and a lower rank if preceded by a sweeter wine. Order effects can interfere with the analysis’s ability to correctly estimate the effect of the treatment itself.
There are various methods you can use to reduce these problems in repeated measures designs. These methods include randomization, allowing time between treatments, and counterbalancing the order of treatments among others. Finally, it’s always good to remember that an independent groups design is an alternative for avoiding order effects.
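As one way to picture counterbalancing, the sketch below builds a simple cyclic Latin square of treatment orders and randomly assigns subjects to its rows, so that every treatment appears in every position equally often. The function names, treatment labels, and subject IDs are hypothetical. Note that a cyclic square balances position only; it does not balance which treatment immediately precedes which.

```python
import random

def latin_square_orders(treatments):
    """Return k treatment orders in which each treatment occupies each position once."""
    k = len(treatments)
    return [[treatments[(row + col) % k] for col in range(k)] for row in range(k)]

def assign_orders(subjects, treatments, seed=0):
    """Randomly assign subjects to the Latin square rows, cycling through the rows."""
    orders = latin_square_orders(treatments)
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return {subject: orders[i % len(orders)] for i, subject in enumerate(shuffled)}

if __name__ == "__main__":
    plan = assign_orders(["s1", "s2", "s3", "s4", "s5", "s6"], ["A", "B", "C"])
    for subject in sorted(plan):
        print(subject, "->", " then ".join(plan[subject]))
```

Using a number of subjects that is a multiple of the number of treatments keeps the orders equally represented.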
Below is a very common crossover repeated measures design. Studies that use this type of design are as diverse as assessing different advertising campaigns, training programs, and pharmaceuticals. In this design, subjects are randomly assigned to the two groups and you can add additional treatments and a control group as needed.
There are many different types of repeated measures designs and it’s beyond the scope of this post to cover all of them. Each study must carefully consider which design meets the specific needs of the study.
For more information about different types of repeated measures designs, how to arrange the worksheet, and how to perform the analysis in Minitab, see Analyzing a repeated measures design . Also, learn how to use Minitab to analyze a Latin square with repeated measures design . Now, let’s use Minitab to perform a complex repeated measures ANOVA!
Example of Repeated Measures ANOVA
An experiment was conducted to determine how several factors affect subject accuracy in adjusting dials. Three subjects performed tests conducted at one of two noise levels. At each of three time periods, the subjects monitored three different dials and made adjustments as needed. The response is an accuracy score. The noise, time, and dial factors are crossed, fixed factors. Subject is a random factor, nested within noise. Noise is a between-subjects factor; time and dial are within-subjects factors.
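To visualize how this design is laid out, the sketch below enumerates one worksheet row per combination of factors. It assumes three subjects within each of the two noise levels (six people in total); the description above is ambiguous on this point, so treat that as an assumption. The column names follow the factor names used in the steps below, and the SubjectID column is a hypothetical helper that makes the nesting explicit.

```python
from itertools import product

noise_levels = [1, 2]          # between-subjects factor
subjects_per_noise = [1, 2, 3] # ASSUMPTION: three subjects within each noise level
time_periods = [1, 2, 3]       # within-subjects factor
dials = [1, 2, 3]              # within-subjects factor

rows = [
    {
        "Noise": noise,
        "Subject": subject,
        # Subject is nested within Noise: subject 1 under noise 1 is a
        # different person from subject 1 under noise 2.
        "SubjectID": f"N{noise}-S{subject}",
        "ETime": time,
        "Dial": dial,
    }
    for noise, subject, time, dial in product(
        noise_levels, subjects_per_noise, time_periods, dials
    )
]

print(len(rows), "rows; each would carry one accuracy Score")
print(rows[0])
```

Each person appears under exactly one noise level but under every time and dial combination, which is what makes noise a between-subjects factor and time and dial within-subjects factors.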
Here are the data to try this yourself. If you're not already using our software and you want to play along, you can get a free 30-day trial version .
To analyze this repeated measures design using ANOVA in Minitab, choose: Stat > ANOVA > General Linear Model > Fit General Linear Model , and follow these steps:
- In Responses , enter Score .
- In Factors , enter Noise Subject ETime Dial .
- Click Random/Nest .
- Under Nesting , enter Noise in the cell to the right of Subject .
- Under Factor type , choose Random in the cell to the right of Subject .
- Click OK , and then click Model .
- Under Factors and Covariates , select all of the factors.
- From the pull-down to the right of Interactions through order , choose 3 .
- Click the Add button.
- From Terms in model , choose Subject*Etime*Dial(Noise) and click Delete .
- Click OK in all dialog boxes.
Below are the highlights.
You can gain some idea about how the design affected the sensitivity of the F-tests by viewing the variance components below. The variance components used in testing within-subjects factors are smaller (7.13889, 1.75, 7.94444) than the between-subjects variance (65.3519). It is typical that a repeated measures model can detect smaller differences in means within subjects as compared to between subjects.
Of the four interactions among fixed factors, the noise by time interaction was the only one with a low p-value (0.029). This implies that there is significant evidence that a subject's sensitivity to noise changed over time. There is also significant evidence for a dial effect (p-value < 0.0005). Among random terms, there is significant evidence for time by subject (p-value = 0.013) and subject (p-value < 0.0005) effects.
In closing, I'll graph these effects using Stat > ANOVA > General Linear Model > Factorial Plots . This handy tool takes our ANOVA model and produces a main effects plot and an interactions plot to help us understand what the results really mean.
The Importance of Replicable Data
Replicable data is the crux of any scientific research, and it is crucial if you plan to publish your research in the future. Data replicability simply means that it is possible for an experiment to be carried out again, either by the same scientist or by another. If data is not replicable, your blood, sweat and tears could be all for nothing. We want to make sure that doesn’t happen! Read on so you can correctly execute your experiments without having to send them to execution.
Why is the ability to repeat experiments important?
How can you ensure data replicability?
1. Reliability
Replication lets you see patterns and trends in your results. This strengthens your work and makes it better able to support your claims, which helps maintain the integrity of the data. Repeating experiments also allows you to identify mistakes, flukes, and falsifications. A mistake may be a misread result or incorrectly entered data; these are sometimes inevitable, as we are only human. Replication can also identify falsifications, which can carry serious implications in the future.
2. Peer review
If someone is to thoroughly peer review your work, they would carry out the experiments again themselves. If someone wants to replicate an experiment, the original scientist should do everything possible to make that replication feasible.
3. Publications
If your work is to be published, it is crucial for there to be a section on the methods of your work. Hence this should be replicable in order to enable others to repeat your methodology. Also, if your methods are reliable, the results are more likely to be reliable. Furthermore, it will indicate whether your data was collected in a generally accepted way, which others are able to repeat.
4. Variable checking
Being able to replicate experiments and the resulting data allows you to check the extraneous variables. These are variables that you are not actually testing, but that may be influencing your results. Through replication, you can see how and if any extraneous variables have affected your experiment and if they need to be made note of. Through replication, you are more likely to be able to identify the undesirable variables and then decrease or control their influence where possible.
5. Avoid retractions
Replicating data yourself, as well as others doing it, is advisable before you publish the work, if that is your intention. This is because if the data has been replicated and confirmed before publication, it is again more likely to have integrity. In turn, the chance of your paper being retracted decreases. Making it easier for others to replicate data then makes it easier for them to support your data and claims, so it is definitely in your interest to make data replicable.
1. Record everything you do
While carrying out your experiment, you should record every step you take in the process. This is not only because it is good practice and is often required to track what you are doing, but it provides a log to look back at. This, in turn, gives you something to refer back to and enables you to repeat the experiment. It also makes it easier for others to follow the same steps to see if they obtain the same results, which is the whole aim of replicability.
2. Be totally transparent
Sometimes it can be tempting to ignore mistakes or write results more favorably than they actually came out. This also applies when you repeat experiments: if one result is a bit of an outlier, don’t brush it under the rug. That is the point of repeats: to check your methods and equipment. If you are not truthful in what others will read and base their own experiments on, you could significantly skew their results.
3. Make your raw data available
You should make your raw data available to others, so long as doing so does not compromise patents or similar protections. The data should be accompanied by the step-by-step process that you went through and a description of each step. Having the raw data to compare against makes it easier to repeat the experiments yourself, or for others to replicate them in the future, since there is something to refer back to.
4. Store your data in an electronic lab notebook
All of these problems with regard to data reproducibility can be tackled using an electronic lab notebook (ELN). An ELN’s data management allows you to enter data directly into your lab notebook, with an automatic full audit trail. This includes dates and times of creation, editing, deletion, signing and witnessing. Moreover, with an ELN you can create and share protocols or templates, thus making reproducible instructions for future use.
Data replicability is one of three main conditions for data integrity. Research also has to have data reproducibility and research reproducibility. These may sound similar, but they are actually quite different.
How Many Times Should an Experiment be Replicated?
- Natalia Juristo 2 &
- Ana M. Moreno 2
An important decision in any problem of experimental design is to determine how many times an experiment should be replicated. Note that we are referring to the internal replication of an experiment. Generally, the more it is replicated, the more accurate the results of the experiment will be. However, resources tend to be limited, which places constraints on the number of replications. In this chapter, we will consider several methods for determining the best number of replications for a given experiment. We will focus on one-factor designs, but the general-purpose methodology can be extended to more complex experimental situations.
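As a rough illustration of how the number of replications trades off against the effect you want to detect, here is a minimal sketch based on the standard normal-approximation formula for comparing two group means, n = 2((z_{1-α/2} + z_{power}) σ / δ)². This is a textbook approximation and not necessarily one of the methods the chapter itself presents; the function name and default values are assumptions.

```python
from math import ceil
from statistics import NormalDist

def replications_per_group(effect, sigma, alpha=0.05, power=0.80):
    """Approximate replications per group for a two-sided, two-group comparison.

    effect: smallest difference in means worth detecting (same units as sigma)
    sigma:  assumed standard deviation of a single observation
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = standard_normal.inv_cdf(power)          # desired power
    return ceil(2 * ((z_alpha + z_power) * sigma / effect) ** 2)

if __name__ == "__main__":
    for effect in (2.0, 1.0, 0.5):
        n = replications_per_group(effect=effect, sigma=1.0)
        print(f"effect = {effect} sd -> about {n} replications per group")
```

Halving the effect size roughly quadruples the required replications, which is why limited resources so often constrain what an experiment can detect.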
Universidad Politecnica de Madrid, Spain
Natalia Juristo & Ana M. Moreno
Juristo, N., Moreno, A.M. (2001). How Many Times Should an Experiment be Replicated?. In: Basics of Software Engineering Experimentation. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3304-4_15
Print ISBN : 978-1-4419-5011-6
Online ISBN : 978-1-4757-3304-4
Replicates and repeats—what is the difference and is it significant?: A brief discussion of statistics and experimental design
…replicates are not independent tests of the hypothesis, and so they cannot provide evidence of the reproducibility of the main results
| Condition (by plate number) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| WT + saline | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| + saline | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 2 |
| WT + HH‐CSF | 61 | 59 | 55 | 64 | 57 | 69 | 63 | 51 | 61 | 61 |
| + HH‐CSF | 48 | 34 | 50 | 59 | 37 | 46 | 44 | 39 | 51 | 47 |
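One way to handle counts like these, in line with the argument that replicates are not independent tests of the hypothesis, is to collapse the replicate plates into a single value per condition for that experiment. The sketch below does this under the assumption that the ten plates are replicate platings within one experiment; the dictionary keys copy the row labels as they appear in the table (two of which have lost their genotype label), and the variable names are hypothetical.

```python
from statistics import mean

# Counts copied from the table above; each list holds the ten replicate plates.
colony_counts = {
    "WT + saline": [0, 0, 0, 1, 1, 0, 0, 0, 0, 0],
    "+ saline": [0, 0, 0, 0, 0, 1, 0, 0, 0, 2],
    "WT + HH-CSF": [61, 59, 55, 64, 57, 69, 63, 51, 61, 61],
    "+ HH-CSF": [48, 34, 50, 59, 37, 46, 44, 39, 51, 47],
}

# One summary value per condition for this single experiment (n = 1 each).
plate_means = {condition: mean(counts) for condition, counts in colony_counts.items()}

for condition, value in plate_means.items():
    print(f"{condition:12s} mean of replicate plates = {value:.1f}")
```

Under this view, inference about the hypothesis would need such a value from each of several independently repeated experiments; the spread among the ten plates only indicates how consistently the plating was performed.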
…by showing the data for only one plate we are breaking the fundamental rule of science that all relevant data should be reported and subjected to analysis…
Sidebar A | Fundamental principles of statistical design
To be convincing, a scientific paper needs to provide evidence that the results are reproducible
…should we dispense with replicates altogether? The answer, of course, is ‘no’. Replicates serve as internal quality checks on how the experiment was performed
Replicates […] cannot be used to infer conclusions
…if statistics, error bars and P ‐values for replicates are shown, they can mislead the readers of a paper who assume that they are relevant to the paper's conclusions
Sidebar B | Error checklist when reading papers
Statistics By Jim
Making statistics intuitive
Repeated Measures Designs: Benefits and an ANOVA Example
By Jim Frost
Repeated measures designs, also known as within-subjects designs, can seem like oddball experiments. When you think of a typical experiment, you probably picture an experimental design that uses mutually exclusive, independent groups. These experiments have a control group and treatment groups that have clear divisions between them. Each subject is in only one of these groups.
These rules for experiments seem crucial, but repeated measures designs regularly violate them! For example, a subject is often in all the experimental groups. Far from causing problems, repeated measures designs can yield significant benefits.
In this post, I’ll explain how repeated measures designs work along with their benefits and drawbacks. Additionally, I’ll work through a repeated measures ANOVA example to show you how to analyze this type of design and interpret the results.
To learn more about ANOVA tests, read my ANOVA Overview .
Drawbacks of Independent Groups Designs
To understand the benefits of repeated measures designs, let’s first look at the independent groups design to highlight a problem. Suppose you’re conducting an experiment on drugs that might improve memory. In a typical independent groups design, each subject is in one experimental group. They’re either in the control group or one of the treatment groups. After the experiment, you score them on a memory test and then compare the group means.
In this design, you obtain only one score from each subject. You don’t know whether a subject scores higher or lower on the test because of an inherently better or worse memory. Some portion of the observed scores is based on the memory traits of the subjects rather than because of the drug. This example illustrates how people introduce an uncontrollable factor into the study.
Imagine that a person in the control group scores high while someone else in a treatment group scores low, not due to the treatment, but due to differing baseline memory capabilities. This “fuzziness” makes it harder to assess differences between the groups.
If only there were some way to know whether subjects tend to measure high or low. We need some way of incorporating each person’s variability into the model. Oh wait, that’s what we’re talking about—repeated measures designs!
How Repeated Measures Designs Work
As the name implies, you need to measure each subject multiple times in a repeated measures design. Shocking! They are longitudinal studies. However, there’s more to it. The subjects usually experience all of the experimental conditions, which allows them to serve as experimental blocks or as their own control. Statisticians refer to this as dependent samples because one observation provides information about another observation. What does that mean? Let me break this down one piece at a time.
The effects of the controllable factors in an experiment are what you really want to learn. However, as we saw in our example above, there can also be uncontrolled sources of variation that make it harder to learn about those things that we can control.
Experimental blocks explain some of the uncontrolled variability in an experiment. While you can’t control the blocks, you can include them in the model to reduce the amount of unexplained variability. By accounting for more of the uncontrolled variability, you can learn more about the controllable variables that are the entire point of your experiment.
Let’s go back to our longitudinal study for the drug’s effectiveness. We saw how subjects are an uncontrolled factor that makes it harder to assess the effects of the drugs. However, if we took multiple measurements from each person, we gain more information about their personal outcome measures under a variety of conditions. We might see that some subjects tend to score high or low on the memory tests. Then, we can compare their scores for each treatment group to their general baseline.
And, that’s how repeated measures designs work. You understand each person better so that you can place their personal reaction to each experimental condition into their particular context. Repeated measures designs use dependent samples because one observation provides information about another observation.
Related posts : Independent and Dependent Samples and Longitudinal Studies: Overview, Examples & Benefits .
Benefits of Repeated Measures Designs
In statistical terms, we say that experimental blocks reduce the variance and bias of the model’s error by controlling for factors that cause variability between subjects. The error term contains only the variability within-subjects and not the variability between subjects. The result is that the error term tends to be smaller, which produces the following benefits:
Greater statistical power : By controlling for differences between subjects, this type of design can have much more statistical power . If an effect exists, your statistical test is more likely to detect it.
Requires a smaller number of subjects: Because of the increased power, you can recruit fewer people and still have a good probability of detecting an effect that truly exists. If you’d need 20 people in each group for a design with independent groups, you might only need a total of 20 for repeated measures.
Faster and less expensive: The time and costs associated with administering repeated measures designs can be much lower because there are fewer people to recruit, train, and compensate.
Time-related effects: As we saw, an independent groups design collects only one measurement from each person. By collecting data from multiple points in time for each subject, repeated measures designs can assess effects over time. This tracking is particularly useful when there are potential time effects, such as learning or fatigue.
Managing the Challenges of Repeated Measures Designs
Repeated measures designs have some great benefits, but there are a few drawbacks that you should consider. The largest downside is the problem of order effects, which can happen when you expose subjects to multiple treatments. These effects are associated with the treatment order but are not caused by the treatment.
Order effects can impede the ability of the model to estimate the effects correctly. For example, in a wine taste test, subjects might give a dry wine a lower score if they sample it after a sweet wine.
You can use different strategies to minimize this problem. These approaches include randomizing or reversing the treatment order and providing sufficient time between treatments. Don’t forget, using an independent groups design is an efficient way to eliminate order effects.
Crossover Repeated Measures Designs
I’ve diagramed a crossover repeated measures design, which is a very common type of experiment. Study volunteers are assigned randomly to one of the two groups. Everyone in the study receives all of the treatments, but the order is reversed for the second group to reduce the problems of order effects. In the diagram, there are two treatments, but the experimenter can add more treatment groups.
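For readers who want to see the assignment step spelled out, here is a minimal sketch of a two-treatment crossover: volunteers are shuffled, half receive treatment A then B, and the other half receive B then A. The function name, treatment labels, and subject IDs are hypothetical, and the fixed seed is only there so the example is repeatable.

```python
import random

def assign_crossover(subject_ids, seed=0):
    """Randomly split subjects into two groups with reversed treatment orders."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    sequences = {}
    for subject in shuffled[:half]:
        sequences[subject] = ("A", "B")  # group 1: treatment A first, then B
    for subject in shuffled[half:]:
        sequences[subject] = ("B", "A")  # group 2: the order is reversed
    return sequences

if __name__ == "__main__":
    plan = assign_crossover(range(1, 11))
    for subject in sorted(plan):
        first, second = plan[subject]
        print(f"subject {subject:2d}: {first} then {second}")
```

Everyone still receives both treatments; only the order differs, so any order effect is spread evenly across the two treatments rather than favoring one of them.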
Studies from a diverse array of subject areas use crossover designs. These areas include weight loss plans, marketing campaigns, and educational programs among many others. Even our theoretical memory pill study can use it.
Repeated measures designs come in many flavors, and it’s impossible to cover them all here. You need to look at your study area and research goals to determine which type of design best meets your requirements. Weigh the benefits and challenges of repeated measures designs to decide whether you can use one for your study.
Repeated Measures ANOVA Example
Let’s imagine that we used a repeated measures design to study our hypothetical memory drug. For our study, we recruited five people, and we tested four memory drugs. Everyone in the study tried all four drugs and took a memory test after each one. We obtain the data below. You can also download the CSV file for the Repeated_measures_data .
In the dataset, you can see that each subject has an ID number so we can associate each person with all of their scores. We also know which drug they took for each score. Together, this allows the model to develop a baseline for each subject and then compare the drug specific scores to that baseline.
How do we fit this model? In your preferred statistical software package, you need to fit an ANOVA model like this:
- Score is the response variable.
- Subject and Drug are the factors.
- Subject should be a random factor .
Subject is a random factor because we randomly selected the subjects from the population and we want them to represent the entire population. If we were to include Subject as a fixed factor, the results would apply only to these five people and would not be generalizable to the larger population.
Drug is a fixed factor because we picked these drugs intentionally and we want to estimate the effects of these four drugs particularly.
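If you would like to see what such a model computes, the sketch below works out the F-statistic for Drug by hand for a one-way repeated measures layout, removing the between-subject variability from the error term. The scores are hypothetical placeholders (the downloadable dataset is not reproduced here), and the function name is an assumption. It stops at the F-statistic and its degrees of freedom; a statistics package would convert these into a P-value.

```python
def repeated_measures_anova(scores):
    """scores[i][j] is the score of subject i under drug j (balanced, no missing cells)."""
    n_subjects = len(scores)
    n_drugs = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n_subjects * n_drugs)

    subject_means = [sum(row) / n_drugs for row in scores]
    drug_means = [sum(row[j] for row in scores) / n_subjects for j in range(n_drugs)]

    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_subject = n_drugs * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_drug = n_subjects * sum((m - grand_mean) ** 2 for m in drug_means)
    # What is left after removing subject and drug effects is the error term.
    ss_error = ss_total - ss_subject - ss_drug

    df_drug = n_drugs - 1
    df_error = (n_subjects - 1) * (n_drugs - 1)
    f_drug = (ss_drug / df_drug) / (ss_error / df_error)
    return {
        "ss_subject": ss_subject,
        "ss_drug": ss_drug,
        "ss_error": ss_error,
        "df_drug": df_drug,
        "df_error": df_error,
        "f_drug": f_drug,
    }

# Hypothetical memory scores: five subjects (rows) by four drugs (columns).
example_scores = [
    [30, 28, 16, 34],
    [14, 18, 10, 22],
    [24, 20, 18, 30],
    [38, 34, 20, 44],
    [26, 28, 14, 30],
]

if __name__ == "__main__":
    result = repeated_measures_anova(example_scores)
    print(f"F({result['df_drug']}, {result['df_error']}) = {result['f_drug']:.2f}")
```

Because the subject sum of squares is pulled out before the error term is formed, large differences between people do not inflate the denominator of the F-ratio.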
Repeated Measures ANOVA Results
After we fit the repeated measures ANOVA model, we obtain the following results.
The P-value for Drug is displayed as 0.000, which means it is less than 0.0005. This low P-value indicates that not all four group means are equal. Because the model includes Subjects, we know that the Drug effect and its P-value account for the variability between subjects.
Below is the main effects plot for Drug, which displays the fitted mean for each drug.
Clearly, Drug 4 is the best. Tukey’s multiple comparisons (not shown) indicate that the differences Drug 4 – Drug 3 and Drug 4 – Drug 2 are statistically significant.
Have you used a repeated measures design for your study?
Reader Interactions
December 15, 2023 at 2:24 pm
thanks for these posts and comments. question – in a repeated measures analysis within SPSS, the first output is the multivariate effect, and the second is the within-subjects effect. I imagine both analyses approach the effect from a different point of view. I’m trying to understand the difference, similarity, when to use multivariate vs. within-subjects. My data has three time points. one between-subjects factor.
November 30, 2022 at 11:14 am
Hi Jim – Thank you for your posts, which are always comprehensive and value-adding.
If my subjects are not individual respondents, but an aggregated group of respondents in a geography (example: respondents in a geographic area forms my subjects G1, G2, …,Gn), do I need to normalize the output variable to handle the fluctuation across the subjects due to population variations across geographies? Or will the Repeated Measures ANOVA handle that if I add Subject (Geography) as my factor?
September 26, 2022 at 6:37 am
Hi and thank you for a clarifying page! But I still haven’t found what I’m looking for… I have conducted a test with 2 groups, approx 25 persons randomly allocated to each group. They were given two different drug treatments. We measured several variables before the drug was given. After the drug was given, the same variables were measured after 1, 5, 20 and 60 minutes. Let’s say these variables were AA, BB, CC, DD and EE. Let’s assume they are normally distributed at all times. Variable types are Heart Rate, Blood Pressure, and such. How am I supposed to perform statistics in this case? Just comparing drug effects at each time point will inevitably produce Type I errors? These are Repeated Measurements, but is R.M. ANOVA really appropriate here?
September 26, 2022 at 8:28 pm
Hi Tony, yes, I think you need to use repeated measures MANOVA. That should allow you to accomplish all that while controlling Type I errors by avoiding multiple tests.
August 3, 2022 at 3:56 am
Hi Jim, I have 3 samples (say A, B and C) that are being tasted and rated on a hedonic scale by panelists. Each panelist will be given the 3 samples (one at a time) to taste and evaluate. A total of 100 respondents are selected from a particular population. Can repeated measures ANOVA be used? These are considered related samples, right? If not, can you suggest the appropriate test to use?
June 25, 2022 at 1:06 am
I’m very mathematically challenged and your posts really simplify things. I’m trying to help a singer determine which factors interactively determine his commission during livestreams, as the commission is different each time. For each date, I have the amount of coins gifted from viewers, the average viewer count, the total number of viewers, and the average time viewers have spent watching. The dependent variable is the commission. Would I use an ANOVA for this?
December 7, 2021 at 11:24 pm
Hi Jim Please if I have the following data, which Test is most appropriate Comparing mean of BMI, diastolic pressure, cholesterol between two age groups (15 to 30) and above 30 years? Thank you
December 9, 2021 at 6:21 pm
Hi Salma, I’m not sure about your IV and DVs. What’s your design? I can’t answer your question without knowing what you want to test. What do you want to learn using those variables?
October 25, 2021 at 4:32 pm
Jim, Isn’t there a sphericity requirement for data in repeated measures anova?
October 25, 2021 at 10:48 pm
Spherical errors are those that have no autocorrelation and have a constant variance. In my post about OLS assumptions, they’re assumptions #4 and #5 with a note to that effect in the text for #5. It’s a standard linear models assumption.
May 6, 2021 at 5:09 pm
Hi Jim, we have data from analysis of different sources of gluten free flour analysed together and compared to wheat flour for different properties. What would be the best test to use in this case please.
September 14, 2020 at 6:41 pm
Hi Jim, I found your post helpful and was wondering if repeated measures ANOVA would be an appropriate analysis for a project I am working on. I have collected pre, post, and delayed post survey data. All participants first complete a pre survey, then engage in an intervention, and directly after the intervention they all complete a post survey. Then 4 months later they all complete a delayed post survey. My interest is to see if there is any long-term impact of the intervention. Would repeated measures ANOVA be appropriate to compare the participants’ pre, post, and delayed post scores?
June 12, 2020 at 8:28 pm
Thank you for another great post! I am doing a study protocol and the primary hypothesis is that a VR intervention will show improvement in postural control (4 CoP parameters), comparing the experimental and inactive control group (post-intervention). I was advised to use a repeated measures ANOVA to test the primary hypothesis but reading your post made me realize that might not be correct because my study subjects are not experiencing all the experimental conditions. Do you recommend another type of ANOVA?
Thanks in advance.
June 12, 2020 at 9:07 pm
I should probably clarify this better in the post. The subjects don’t have to experience all the treatment conditions, although many studies use these designs for that reason. It’s not a requirement. If you’ve measured your subjects multiple times, you probably do need to use a repeated measures design.
June 2, 2020 at 11:31 am
Thank you so much for your helpful posts about statistics! I’ve tried doing a repeated measures analysis but have gotten a bit confused. I administered 3 different questionnaires on social behavior (all continuous outcomes, but on different scales [two ranging 0-50, the third 0-90]) on 4 different time points. The questionnaires are correlated to each other so I would prefer to put them in the same analysis. I was planning on doing this by making one within subject variable “time” and one within subject variable “questionnaire”. I would like to know what the effect is of time on social behavior and whether this effect is different depending on the specific questionnaire used. Is it ok to add these questionnaires in the same analysis even though they do not have the same range of scores or should I first center the total scores of the questionnaires?
Many thanks, Laura
June 3, 2020 at 7:44 pm
ANOVA can handle DVs that use different measurement units/scales without problems. However, if you want to determine which DV/survey is more important, you might consider standardizing them. Read more about that in my post about identifying the most important variables in your model. It discusses it in the regression context but the same applies to ANOVA.
You’ll obtain valid and consistent results using either standardized or unstandardized values. It just depends on what you want to learn.
I hope that helps!
May 30, 2020 at 4:53 pm
Hi Jim, thanks for your effort and time to make statistics understandable to the wider public. Your style of teaching is quite simple.
I didn’t see any questions or responses from 2019 to date, but I hope you’re still there anyway.
I have this stat problem I need your opinion on. There are 8 drinking water wells clustered at different distances around an injection well. To simulate direction and concentration of contaminant within subsurface around the well area, a contaminant was injected/pumped continuously into the subsurface through the injection well. This happened for 6 weeks; pH samples were taken from the 8 wells daily for the 6 weeks. I need to test for 2 things, namely: 1. Is there any significant statistical difference in pH within the wells within the 6 weeks (6 weeks as a single time period) 2. Is there any statistical significant difference in pH for each well within the weeks (6 weeks time step)
Which statistical test best captures this analysis? I think of repeated measure ANOVA, what do you think please?
May 30, 2020 at 4:55 pm
Yes, because you’re looking at the same subjects (wells) over time, you need repeated measures ANOVA.
December 24, 2018 at 2:31 pm
Name: Vidya Kulkarni
Comment: Shall appreciate a reply. My friend has performed experiments with rats in 3 groups by administering a certain drug. Group 1 is not given any drug, Group 2 is given 50 mg and Group 3 is given 100 mg. In each group there are 3 rats, and for each of these rats the tumor volume has been recorded for 9 consecutive days. Thus for each group we have 27 observations. We want to show that the difference in their means is significant at some confidence level. Please let me know what statistical test we should use, and if you can send a link to some similar example, that would be a great help. Looking forward to quick help. Thanks
December 11, 2018 at 8:44 pm
I wanted to thank you for your post! It was really helpful for me. In my design I have 30 subjects with 10 readings (from different electrodes on the scalp) for each subject in two sessions (immediate test, post test). I used repeated measures ANOVA and I found a significant main effect of sessions and also a significant interaction of sessions and electrodes. Main effect means I have a significant difference between session 1 data and session 2 data, but I am not sure about the interaction effect. I would appreciate it if you could help me with that.
Thanks, Mary
December 12, 2018 at 9:32 am
I’m not sure what your outcome variable is or what the electrodes variable measures precisely. But, here’s how you’d interpret the results generally.
The relationship between sessions and your outcome variable depends on the value of your electrodes variable. While there is a significant difference between sessions, that difference depends on the value of electrodes. If you create an interactions plot, it should be easier to see what is going on! For more information, see my post about interaction effects.
October 23, 2018 at 4:12 pm
Hello Jim ! I am very pleased to meet you and I greatly appreciate your work !
The Repeated Measures ANOVA that I have encountered in my study is as follows :
A number of subject groups, of n people each, selected e.g. by age, are all tested repeatedly the same number of times with the same drug! I.e. there is only one drug!
The score is the effectiveness of the drug on a specific body parameter, e.g. on blood pressure. And the question is to assess the effectiveness of the drug.
Subject group is not a random factor, as it is an age group. Score also is not an independent r.v., as it reflects the effect of the previous day of the drug.
Do you have any notes on this type of problems or recommend a literature I can access from web ?
My best regards Elias Athens / Greece
October 24, 2018 at 4:26 pm
It’s OK to not have more than one drug. You just need to be able to compare the one drug to not taking the drug. You can do that both in a traditional control group/treatment group setting or by using repeated measures. However, given that you talk about repeated measures and everyone taking the drug, my guess is that it is some type of crossover design, which I describe in this post.
In this scenario, everyone would eventually take the same drug over the course of the study, but some subjects might start out by not taking the drug while the other subjects do. Then, the subjects switch.
You can include Subjects as a random factor if you randomly selected them from the population. Then, include Age as an additional Fixed factor if you’re specifying the age groups or as a covariate if you’re using their actual age (rather than dividing them into groups based on age ranges).
I hope this helps!
August 27, 2018 at 2:24 pm
I am getting conflicting advice. I ran a: pre-test, intervention, post-test study. Where I had 4 groups (3 experimental and one control). I tested hamstring strength. In my repeated measures ANOVA I had an effect of time but NO interaction effect. I have been told due to no interaction effect I do NOT run a post-hoc analysis. Is this correct as someone else has told me the complete opposite (I only run a post-hoc analysis when I do not have an interaction effect)?
August 28, 2018 at 11:11 pm
The correct action to do depends on the specifics of your study, which might be why you’re getting conflicting advice!
As a general statistical principle, it’s perfectly fine to perform post-hoc tests regardless of whether the interaction effect is significant or not. The only time that it makes no sense to perform a post hoc test is when no terms in your model are statistically significant. Although, even in that case, post hoc tests can sometimes detect statistical significance–but that’s another story. But, in a nutshell, you can perform post hoc tests whether or not your interaction term is significant.
However, I suspect that the real question is whether it makes sense given the pre-test/post-test nature of your study. You have measurements before and after the intervention. If the intervention is effective, you’d expect the differences to show up after the intervention but not before. Consequently, that is an interaction effect because it depends on the time of measurement. Read my blog post about interaction effects to see how these are “it depends” effects. So, if your interaction effect is not significant, it might not make sense to analyze your data further.
If the main effect for the treatment group variable is significant but not the interaction effect, it’s a bit difficult because it says that the treatment groups cause a difference between group means even in the pre-test measurement! That might represent only the differences between the subjects within those groups–it’s hard to say. You really want that interaction term to be significant!
If only the time effect is significant and nothing else, it’s probably not worth further investigation.
One thing I can say definitively is that the person who said that you can only perform a post-hoc analysis when the interaction is not significant is wrong! As a general principle, it’s OK to perform post-hoc analyses when an interaction term is significant. For your study, you particularly want a significant interaction term!
How many times should an experiment be repeated?
I am doing an experiment as part of a school project. In order to decrease the random error I repeat the measurements.
How to define if I have made enough tries? Should it be 10? Or 20? Mathematically speaking the more tries I have done the better the precision is, however, this way I need to repeat the measurements an infinite number of times.
- experimental-physics
- error-analysis
- The ideal is 3 times or more. – QuIcKmAtHs, commented Dec 29, 2017 at 15:55
- Trials and Experiments – user179430, commented Dec 29, 2017 at 16:06
- The answer depends on what you're measuring. Giving some details about your experiment would help. – lemon, commented Dec 29, 2017 at 17:27
- The answer also depends on a characteristic of the measuring device, called "gauge capability". This has to do with how accurately and repeatably your measuring device does its job. Knowing the capability of your gauge allows you to determine whether differences in your measurements are dominated by flaws in the gauge rather than real differences between your experimental measurements. My own rule of thumb is 5 measurements are suggestive, 10 are data, and 50 are information, but this assumes a "capable" gauge. – niels nielsen, commented Dec 29, 2017 at 20:39
2 Answers
The answer depends on the degree of accuracy needed, and how noisy the measurements are. The requirements are set by the task (and your resources, such as time and effort), the noisiness depends on the measurement method (and perhaps on the measured thing, if it behaves a bit randomly).
For normally distributed errors (commonly but not always true), if you do $N$ independent measurements $x_i$ where each measurement is normally distributed around the true mean $\mu$ with standard deviation $\sigma$, you get an estimated mean by averaging your measurements: $\hat{\mu}=(1/N)\sum_i x_i$. The neat thing is that the error in the estimate (the standard error of the mean) declines as you make more measurements, as $$\sigma_{mean}=\frac{\sigma}{\sqrt{N}}.$$ So if you knew that the standard deviation $\sigma$ of a single measurement was (say) 1 and you wanted a mean with a standard error of 0.1, you can see that having $N=100$ would bring you down to that level of precision. Or, if $\delta$ is the desired accuracy, you need to make $\approx (\sigma/\delta)^2$ tries.
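The two relations in this paragraph can be turned into a small calculator. This is only a sketch under the paragraph's assumptions (independent, normally distributed measurement errors and a known $\sigma$); the function names are made up for illustration.

```python
import math


def standard_error_of_mean(sigma, n):
    """Spread of the averaged result after n independent measurements."""
    return sigma / math.sqrt(n)


def trials_needed(sigma, delta):
    """Smallest whole number of measurements with sigma / sqrt(N) <= delta."""
    return math.ceil((sigma / delta) ** 2)


# The example from the text: per-measurement spread of 1, target precision 0.1.
print(trials_needed(1.0, 0.1))
print(standard_error_of_mean(1.0, 100))
```

Because the required number of tries grows with the square of $\sigma/\delta$, halving the target error costs four times as many measurements.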
But when starting you do not know $\sigma$. You can get an estimate of the standard deviation of your measurements: $\hat{\sigma}=\sqrt{\frac{1}{N-1}\sum_i (x_i-\hat{\mu})^2}$. This is a noisy result, since it is all based on your noisy measurements. If everything has gone right it is somewhere in the vicinity of the true $\sigma$, and you can use further statistical formulas to bound how much in error you might be in the error of your estimate. There are lots of annoying/interesting/subtle issues here that fill statistics courses.
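As a rough sketch of that estimate, the snippet below computes $\hat{\sigma}$ with the $N-1$ denominator and compares it with Python's built-in `statistics.stdev`, which uses the same definition. The measurement values are hypothetical, chosen only to have something to compute with.

```python
import math
import statistics


def sample_std(xs):
    """Estimate sigma from the data, dividing by N - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


# Hypothetical repeated measurements of the same quantity.
measurements = [9.8, 10.1, 10.0, 9.7, 10.4, 10.2, 9.9, 10.0, 10.3, 9.6]

sigma_hat = sample_std(measurements)
print(sigma_hat, statistics.stdev(measurements))

# Plugging the estimate into sigma / sqrt(N) gives the estimated error of the mean.
print(sigma_hat / math.sqrt(len(measurements)))
```

With only ten values the estimate of $\sigma$ is itself uncertain, so any number of further tries derived from it is a rough guide rather than a guarantee.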
In practice, for a school project : define how you make your measurements beforehand, make 10 or more, calculate the mean and standard error, and look at the data you have (this last step is often missed even by professional data scientists!) If the data is roughly normally distributed most measurements should be bunched up with a few outliers that are larger and smaller, and about half should be below the mean and half above. If you want to be cautious, check that the median (the middlemost data point) is close to the mean.
If the data is pretty normal, estimate how many tries you need and do them.
If the data does not look normal (very remote outliers, clumps away from the mean, skew with more high or low data points), then the above statistics are suspect. Calculating means and standard errors still makes sense and can/should be reported, but the formula for the accuracy will not be accurate. In cases like this it is often best to make a lot of measurements and in the report show the distribution of results to get a sense of the accuracy.
Things to look out for that this will not fix: biased measurements (whether that is due to always rounding up, always measuring from one side with a ruler, or a thermometer that shows values slightly too high), too crude measurements, calculation errors (embarrassingly common even in published science), errors in the experimental setup (are you really measuring what you want to measure?) and model errors (are you thinking about the problem in the right way?). No amount of statistics will fix this, but some planning and experimentation may help reduce the risk. Biased measurements can be corrected by checking that you get the right results for known cases and/or calibrating the device. Having two or more ways of measuring or calculating is a great sanity check. Experimental setup and model errors can be corrected by listening to annoying critics (who you can then magnanimously thank in your acknowledgement section).
Pick a number, let's say ten. Record your measurements. Determine the mean. Determine the standard deviation. Determine the standard error (the standard deviation divided by the square root of the number of measurements). Mean +/- 2*standard error gives an approximate 95% confidence interval for the true mean.
A chi-squared goodness-of-fit test can indicate whether the distribution of your data is acceptable.
If the standard error is too high, do more trials to reduce it. If the chi-squared result is off, your data may be skewed, which likely means there is some error in your measurement process. Correct that and try again.
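The recipe in this answer (mean, standard deviation, standard error, then mean +/- 2 standard errors) might be sketched as follows. The helper name `mean_with_interval` and the sample values are hypothetical, and the multiplier of 2 is the answer's rule of thumb rather than an exact critical value for small samples.

```python
import math
import statistics


def mean_with_interval(xs, multiplier=2.0):
    """Return (mean, lower, upper) using mean +/- multiplier * standard error."""
    mean = statistics.mean(xs)
    std_dev = statistics.stdev(xs)           # sample standard deviation (N - 1)
    std_err = std_dev / math.sqrt(len(xs))   # standard error of the mean
    return mean, mean - multiplier * std_err, mean + multiplier * std_err


# Ten hypothetical trial results.
trials = [4.9, 5.2, 5.0, 5.1, 4.8, 5.3, 5.0, 4.7, 5.1, 4.9]
mean, low, high = mean_with_interval(trials)
print(f"mean = {mean:.3f}, interval = ({low:.3f}, {high:.3f})")
```

If the interval is wider than you can accept, adding more trials narrows it in proportion to the square root of the number of measurements.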
What is the reason for the replication of experiments in the design of Experiments?
Hidden Brain
Scientific findings often fail to be replicated, researchers say.
Shankar Vedantam
A massive effort to test the validity of 100 psychology experiments finds that more than 50 percent of the studies fail to replicate. This is based on a new study published in the journal "Science."
Copyright © 2015 NPR. All rights reserved. Visit our website terms of use and permissions pages at www.npr.org for further information.
Although replicates cannot support inference on the main experimental questions, they do provide important quality controls of the conduct of experiments. Values from an outlying replicate can be omitted if a convincing explanation is found, although repeating part or all of the experiment is a safer strategy.
Learn why scientists repeat experiments and how repetition helps them verify results, rule out errors, and discover new things. Find out how repetition is also key to learning and mastery in various fields.
Replication is a fundamental tool for verifying and validating study findings, building confidence in their reliability and generalizability. It also fosters scientific progress by promoting the discovery of new evidence, expanding understanding, and challenging existing theories or claims. Learn about the types, methods, and challenges of replication in research.
Learn how repetition and replication are alike and different in experimental design and data analysis. Repetition refers to taking multiple measurements within the same or similar conditions, while replication involves conducting the same experiment or measurements under identical conditions across different runs.
Learn the difference between replicates and repeats in experimental design, and how they affect the sources of variability and the data analysis. Replicates are multiple runs with the same factor settings, while repeats are multiple measurements within the same run.
Learn how to design and analyze experiments to measure effects more accurately. Explore six techniques to reduce noise and increase signal, such as making repeated measurements, increasing sample size, randomizing samples and experiments, and including covariates.
For example, if the cost units of animals to cells to measurements is 10:1:0.1 (biological replicates are likely more expensive than technical ones) then an experiment with $n_A, n_C, n_M$ of 12, 12, 1 ...
Learn the difference between repetition and replication in science, and why they are important for testing hypotheses and discovering new knowledge. Repetition is repeating the same experiment over and over, while replication is having other scientists do the same experiment.
further and when to repeat the experiment. Replicates can act as an internal check of the fidelity with which the experiment was performed. They can alert you to problems.
Replication is the process of repeating a study or experiment under the same or similar conditions to support the original claim. Learn about the types, methods and examples of replication in statistics, and how it differs from repetition.
Reproducing experiments is one of the cornerstones of the scientific process. Learn why it's important, how it works, and what challenges it faces in this article from NOVA.
This editorial discusses the challenges and importance of replication studies in cancer biology, which aim to test whether selected experiments in highly cited papers can be repeated. It ...
Learn how to recognize and analyze repeated measures designs in time series, where the same subjects or units are measured repeatedly. Explore the different covariance structures and software tools for repeated measures models.
Repeated measures designs don't fit our impression of a typical experiment in several key ways. When we think of an experiment, we often think of a design that has a clear distinction between the treatment and control groups. Each subject is in one, and only one, of these non-overlapping groups. Subjects who are in a treatment group are ...
Why is the ability to repeat experiments important? 1. Reliability. Replication lets you see patterns and trends in your results. This is affirmative for your work, making it stronger and better able to support your claims. This helps maintain integrity of data. On the other hand, repeating experiments allows you to identify mistakes, flukes ...
An important decision in any problem of experimental design is to determine how many times an experiment should be replicated. Note that we are referring to the internal replication of an experiment. Generally, the more it is replicated, the more accurate the results of the experiment will be.
Repeated measures designs, also known as a within-subjects designs, can seem like oddball experiments. When you think of a typical experiment, you probably picture an experimental design that uses mutually exclusive, independent groups. These experiments have a control group and treatment groups that have clear divisions between them.
To repeat an experiment, under the same conditions, allows you to (a) estimate the variability of the results (how close to each other they are) and (b) to increase the accuracy of the estimate ...