

Controlled experiments


Introduction

How are hypotheses tested? Usually, with a controlled experiment. For example, to test whether water affects seed germination, you could set up two pots:

  • One pot of seeds gets watered every afternoon.
  • The other pot of seeds doesn't get any water at all.

Control and experimental groups

The sections below cover independent and dependent variables, variability and repetition, and a controlled-experiment case study: CO₂ and coral bleaching.

  • What your control and experimental groups would be
  • What your independent and dependent variables would be
  • What results you would predict in each group

Experimental setup

  • Some corals were grown in tanks of normal seawater, which is not very acidic (pH around 8.2). The corals in these tanks served as the control group.
  • Other corals were grown in tanks of seawater made more acidic than usual by the addition of CO₂. One set of tanks was medium-acidity (pH about 7.9), while another set was high-acidity (pH about 7.65). Both the medium-acidity and high-acidity groups were experimental groups.
  • In this experiment, the independent variable was the acidity (pH) of the seawater. The dependent variable was the degree of bleaching of the corals.
  • The researchers used a large sample size and repeated their experiment. Each tank held 5 fragments of coral, and there were 5 identical tanks for each group (control, medium-acidity, and high-acidity). Note: none of these tanks was "acidic" on an absolute scale; the pH values were all above the neutral pH of 7.0. However, the two groups of experimental tanks were moderately and highly acidic to the corals, that is, relative to their natural habitat of plain seawater.
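The replication scheme described above can be sketched in a few lines of Python (a purely illustrative restatement of the numbers in the text; the variable names are invented):

```python
# Illustrative sketch of the replication scheme: 3 groups, 5 tanks per
# group, 5 coral fragments per tank.
groups = ["control (pH 8.2)", "medium-acidity (pH 7.9)", "high-acidity (pH 7.65)"]
TANKS_PER_GROUP = 5
FRAGMENTS_PER_TANK = 5

for group in groups:
    per_group = TANKS_PER_GROUP * FRAGMENTS_PER_TANK
    print(f"{group}: {TANKS_PER_GROUP} tanks x {FRAGMENTS_PER_TANK} fragments = {per_group} corals")

total = len(groups) * TANKS_PER_GROUP * FRAGMENTS_PER_TANK
print(f"total corals across all groups: {total}")  # 75
```

Replicating each condition across five identical tanks is what lets the researchers separate a real pH effect from tank-to-tank variability.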

Analyzing the results

Works cited:

  • Hoegh-Guldberg, O. (1999). Climate change, coral bleaching, and the future of the world's coral reefs. Mar. Freshwater Res., 50, 839-866. Retrieved from www.reef.edu.au/climate/Hoegh-Guldberg%201999.pdf.
  • Anthony, K. R. N., Kline, D. I., Diaz-Pulido, G., Dove, S., and Hoegh-Guldberg, O. (2008). Ocean acidification causes bleaching and productivity loss in coral reef builders. PNAS, 105(45), 17442-17446. http://dx.doi.org/10.1073/pnas.0804478105.
  • University of California Museum of Paleontology. (2016). Misconceptions about science. In Understanding science. Retrieved from http://undsci.berkeley.edu/teaching/misconceptions.php.
  • Hoegh-Guldberg, O. and Smith, G. J. (1989). The effect of sudden changes in temperature, light and salinity on the density and export of zooxanthellae from the reef corals Stylophora pistillata (Esper, 1797) and Seriatopora hystrix (Dana, 1846). J. Exp. Mar. Biol. Ecol., 129, 279-303. Retrieved from http://www.reef.edu.au/ohg/res-pic/HG%20papers/HG%20and%20Smith%201989%20BLEACH.pdf.




What An Experimental Control Is And Why It’s So Important


Daniel Nelson


An experimental control is used in scientific experiments to minimize the effect of variables that are not the focus of the study. The control can be an object, a population, or any other variable that a scientist would like to hold constant.

You may have heard of experimental control, but what is it? Why is an experimental control important? The function of an experimental control is to hold constant the variables that an experimenter isn’t interested in measuring.

This helps scientists ensure that there have been no deviations in the environment of the experiment that could end up influencing the outcome of the experiment, besides the variable they are investigating. Let’s take a closer look at what this means.

In short, a control is important because it allows the experimenter to minimize changes in every variable except the one being tested.

To start with, it is important to define some terminology.

Terminology Of A Scientific Experiment

  • Negative control: a variable or group for which no response is expected
  • Positive control: a group or variable that receives a treatment with a known, positive result
  • Randomization: random assignment of subjects or treatments; randomized controlled trials use it to reduce bias when testing a new treatment
  • Blind experiment: participants are not told the full details of the trial, so that their expectations cannot skew the results
  • Double-blind experiment: neither the participants nor the researchers administering the treatment know which individuals receive the experimental treatment

Randomization is important because it reduces bias; random number generators are commonly used in scientific studies to assign subjects or order treatments fairly.

Scientists use the scientific method to ask questions and come to conclusions about the nature of the world. After making an observation about some phenomenon they would like to investigate, a scientist asks what the cause of that phenomenon could be. The scientist creates a hypothesis, a proposed explanation that answers the question they asked. A hypothesis doesn't need to be correct; it just has to be testable.

The hypothesis is a prediction about what will happen during the experiment, and if the hypothesis is correct then the results of the experiment should align with the scientist’s prediction. If the results of the experiment do not align with the hypothesis, then a good scientist will take this data into consideration and form a new hypothesis that can better explain the phenomenon in question.

Independent and Dependent Variables

In order to form an effective hypothesis and do meaningful research, the researcher must define the experiment’s independent and dependent variables . The independent variable is the variable which the experimenter either manipulates or controls in an experiment to test the effects of this manipulation on the dependent variable. A dependent variable is a variable being measured to see if the manipulation has any effect.


Photo: frolicsomepl via Pixabay, CC0

For instance, if a researcher wanted to see how temperature impacts the behavior of a certain gas, the temperature they adjust would be the independent variable and the behavior of the gas the dependent variable.
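The roles of the two variables in this gas example can be made explicit in a short sketch (hypothetical numbers, ideal-gas behavior assumed):

```python
# Hypothetical sketch: temperature is the independent variable we manipulate;
# pressure is the dependent variable we measure. Amount of gas and volume are
# held constant, i.e. they are controlled variables (ideal gas law: p = nRT/V).
R = 8.314           # gas constant, J/(mol*K)
n_mol = 1.0         # amount of gas, held constant
volume_m3 = 0.0224  # container volume, held constant

def pressure_pa(temperature_k: float) -> float:
    """Dependent variable (pressure) as a function of the independent variable."""
    return n_mol * R * temperature_k / volume_m3

for t_k in (273.15, 300.0, 350.0):  # only temperature is varied
    print(f"T = {t_k:6.2f} K  ->  p = {pressure_pa(t_k):8.0f} Pa")
```

Because everything except temperature is pinned down, any change in the measured pressure can be attributed to the temperature change alone.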

Control Groups and Experimental Groups

There will frequently be two groups under observation in an experiment, the experimental group, and the control group . The control group is used to establish a baseline that the behavior of the experimental group can be compared to. If two groups of people were receiving an experimental treatment for a medical condition, one would be given the actual treatment (the experimental group) and one would typically be given a placebo or sugar pill (the control group).

Without an experimental control group, it is difficult to determine the effects of the independent variable on the dependent variable in an experiment. This is because there can always be outside factors that are influencing the behavior of the experimental group. The function of a control group is to act as a point of comparison, by attempting to ensure that the variable under examination (the impact of the medicine) is the thing responsible for creating the results of an experiment. The control group is holding other possible variables constant, such as the act of seeing a doctor and taking a pill, so only the medicine itself is being tested.
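A toy simulation (all numbers invented) shows how the control group supplies the baseline against which the treatment effect is estimated:

```python
import random

random.seed(0)

# Invented recovery scores. The placebo (control) group shares everything with
# the experimental group, seeing a doctor and taking a pill included, except
# the drug itself. The "+1.5" true effect is a simulation input; in a real
# trial it is exactly the unknown we are trying to estimate.
control = [random.gauss(5.0, 1.0) for _ in range(50)]          # placebo
treatment = [random.gauss(5.0 + 1.5, 1.0) for _ in range(50)]  # real drug

def mean(xs):
    return sum(xs) / len(xs)

# Because the groups differ only in the drug, the difference in means
# estimates the effect of the drug itself.
effect = mean(treatment) - mean(control)
print(f"control baseline: {mean(control):.2f}")
print(f"estimated treatment effect: {effect:.2f}")
```

Without the control list there would be no baseline, and the treatment mean on its own would be uninterpretable.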

Why Are Experimental Controls So Important?

Experimental controls allow scientists to eliminate varying amounts of uncertainty in their experiments. Whenever a researcher does an experiment and wants to ensure that only the variable they are interested in changing is changing, they need to utilize experimental controls.

Experimental controls have been dubbed “controls” precisely because they allow researchers to control the variables they think might have an impact on the results of the study. If a researcher believes that some outside variables could influence the results of their research, they’ll use a control group to try and hold that thing constant and measure any possible influence it has on the results. It is important to note that there may be many different controls for an experiment, and the more complex a phenomenon under investigation is, the more controls it is likely to have.

Not only do controls establish a baseline that the results of an experiment can be compared to, they also allow researchers to correct for possible errors. If something goes wrong in the experiment, a scientist can check on the controls of the experiment to see if the error had to do with the controls. If so, they can correct this next time the experiment is done.

A Practical Example

Let’s take a look at a concrete example of experimental control. If an experimenter wanted to determine how different soil types impacted the germination period of seeds , they could set up four different pots. Each pot would be filled with a different soil type, planted with seeds, then watered and exposed to sunlight. Measurements would be taken regarding how long it took for the seeds to sprout in the different soil types.


Photo: Kaz via Pixabay, CC0

A control for this experiment might be to fill more pots with just the different types of soil and no seeds, or to set aside some seeds in a pot with no soil. The goal is to rule out causes other than the soil, such as the nature of the seeds themselves, the amount of sun they were exposed to, or how much water they were given, for differences in how quickly the seeds sprouted. The more variables a researcher controls for, the surer they can be that it is the type of soil affecting the germination period.
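The pot layout above might be encoded like this (a hypothetical sketch; the field names and values are invented):

```python
# Hypothetical sketch of the soil experiment: only the soil type varies;
# water, light, and seed source are identical for every pot.
SOILS = ("clay", "sand", "loam", "silt")
SHARED = {"water_ml_per_day": 100, "sun_hours_per_day": 6, "seed_batch": "A"}

experimental_pots = [dict(SHARED, soil=s, seeds=True) for s in SOILS]
control_pots = (
    [dict(SHARED, soil=s, seeds=False) for s in SOILS]  # soil alone, no seeds
    + [dict(SHARED, soil=None, seeds=True)]             # seeds alone, no soil
)

# Sanity check: every pot shares the controlled conditions.
for pot in experimental_pots + control_pots:
    assert all(pot[k] == v for k, v in SHARED.items())

print(len(experimental_pots), "experimental pots;", len(control_pots), "control pots")
```

Encoding the shared conditions once and asserting them everywhere mirrors what a careful experimenter does by hand: verify that nothing but the soil differs between pots.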

Not All Experiments Are Controlled

“It doesn’t matter how beautiful your theory is, it doesn’t matter how smart you are. If it doesn’t agree with experiment, it’s wrong.” — Richard P. Feynman

While experimental controls are important, it is also important to remember that not all experiments are controlled. In the real world, there are limits on which variables a researcher can control. Scientists therefore often record as much data as they can during an experiment, so they can later check whether any variable they didn't control for might have influenced the outcome. It is still possible to draw useful data from experiments without controls, but it is much harder to draw meaningful conclusions from uncontrolled data.

Though it is often impossible in the real world to control for every possible variable, experimental controls are an invaluable part of the scientific process, and in general, the more controls an experiment has, the more reliable its results.



Why control an experiment?

John S Torday

1 Department of Pediatrics, Harbor‐UCLA Medical Center, Torrance, CA, USA

František Baluška

2 IZMB, University of Bonn, Bonn, Germany

Empirical research is based on observation and experimentation. Yet, experimental controls are essential for overcoming our sensory limits and generating reliable, unbiased and objective results.


We made a deliberate decision to become scientists and not philosophers, because science offers the opportunity to test ideas using the scientific method. And once we began our formal training as scientists, the greatest challenge beyond formulating a testable or refutable hypothesis was designing appropriate controls for an experiment. In theory, this seems trivial, but in practice, it is often difficult. But where and when did this concept of controlling an experiment start? It is largely attributed to Francis Bacon, who emphasized the use of artificial experiments to provide additional evidence for observations in his Novum Organum Scientiarum in 1620. Other philosophers took up the concept of empirical research: in 1877, Charles Peirce redefined the scientific method in The Fixation of Belief as the most efficient and reliable way to prove a hypothesis. In the 1930s, Karl Popper emphasized the necessity of refuting hypotheses in The Logic of Scientific Discovery. While these influential works do not explicitly discuss controls as an integral part of experiments, their importance for generating solid and reliable results is nonetheless implicit.

… once we began our formal training as scientists, the greatest challenge beyond formulating a testable or refutable hypothesis was designing appropriate controls for an experiment.

But the scientific method based on experimentation and observation has come under criticism of late in light of the ever more complex problems faced in physics and biology. Chris Anderson, the editor of Wired Magazine, proposed that we should turn to statistical analysis, machine learning, and pattern recognition instead of creating and testing hypotheses, based on the Informatics credo that if you cannot answer the question, you need more data. However, this attitude assumes that we already have enough data and that we just cannot make sense of it. This assumption is in direct conflict with David Bohm's thesis that there are two “Orders”, the Explicate and Implicate 1 . The Explicate Order is the way in which our subjective sensory systems perceive the world 2 . In contrast, Bohm's Implicate Order would represent the objective reality beyond our perception. This view—that we have only a subjective understanding of reality—dates back to Galileo Galilei who, in 1623, criticized the Aristotelian concept of absolute and objective qualities of our sensory perceptions 3 and to Plato's cave allegory that reality is only what our senses allow us to see.

The only way for systematically overcoming the limits of our sensory apparatus and to get a glimpse of the Implicate Order is through the scientific method, through hypothesis‐testing, controlled experimentation. Beyond the methodology, controlling an experiment is critically important to ensure that the observed results are not just random events; they help scientists to distinguish between the “signal” and the background “noise” that are inherent in natural and living systems. For example, the detection method for the recent discovery of gravitational waves used four‐dimensional reference points to factor out the background noise of the Cosmos. Controls also help to account for errors and variability in the experimental setup and measuring tools: The negative control of an enzyme assay, for instance, tests for any unrelated background signals from the assay or measurement. In short, controls are essential for the unbiased, objective observation and measurement of the dependent variable in response to the experimental setup.

The only way for systematically overcoming the limits of our sensory apparatus […] is through the Scientific Method, through hypothesis‐testing, controlled experimentation.

Nominally, both positive and negative controls are material and procedural; that is, they control for variability of the experimental materials and the procedure itself. But beyond the practical issues to avoid procedural and material artifacts, there is an underlying philosophical question. The need for experimental controls is a subliminal recognition of the relative and subjective nature of the Explicate Order. It requires controls as “reference points” in order to transcend it, and to approximate the Implicate Order.

This is similar to Peter Rowlands’ 4 dictum that everything in the Universe adds up to zero, the universal attractor in mathematics. Prior to the introduction of zero, mathematics lacked an absolute reference point similar to a negative or positive control in an experiment. The same is true of biology, where the cell is the reference point owing to its negative entropy: It appears as an attractor for the energy of its environment. Hence, there is a need for careful controls in biology: The homeostatic balance that is inherent to life varies during the course of an experiment and therefore must be precisely controlled to distinguish noise from signal and approximate the Implicate Order of life.

P < 0.05 tacitly acknowledges the Explicate Order

Another example of the “subjectivity” of our perception is the level of accuracy we accept for differences between groups. For example, when we use statistical methods to determine whether an observed difference between control and experimental groups is a random occurrence or a specific effect, we conventionally consider a P value of less than or equal to 0.05 as statistically significant; that is, there is less than a 5% probability that the effect is random. The efficacy of this arbitrary convention has been debated for decades; suffice it to say that, despite questioning of its validity, a P value of < 0.05 reflects our acceptance of the subjectivity of our perception of reality.
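The convention can be made concrete with a permutation test (a sketch in plain Python; the data are invented): shuffle the group labels many times and count how often a difference at least as large as the observed one arises by chance alone.

```python
import random

random.seed(1)

def permutation_p_value(control, experimental, n_perm=10_000):
    """Fraction of label shufflings producing a between-group difference
    at least as large as the observed one (a two-sided permutation test)."""
    def mean(xs):
        return sum(xs) / len(xs)
    observed = abs(mean(experimental) - mean(control))
    pooled = control + experimental
    n = len(control)
    hits = 0
    for _ in range(n_perm):
        random.shuffle(pooled)  # reassign group labels at random
        if abs(mean(pooled[n:]) - mean(pooled[:n])) >= observed:
            hits += 1
    return hits / n_perm

control = [4.8, 5.1, 5.0, 4.9, 5.2, 4.7, 5.0, 5.1]       # invented data
experimental = [5.9, 6.2, 5.8, 6.1, 6.0, 5.7, 6.3, 5.9]  # invented data
p = permutation_p_value(control, experimental)
print(f"p = {p:.4f}")
```

For these toy data the two groups barely overlap, so almost no random relabeling reproduces the observed gap and the estimated p falls far below the 0.05 convention.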

… controls are essential for the unbiased, objective observation and measurement of the dependent variable in response to the experimental setup.

Thus, if we do away with hypothesis‐testing science in favor of informatics based on data and statistics—referring to Anderson's suggestion—it reflects our acceptance of the noise in the system. However, mere data analysis without any underlying hypothesis is tantamount to “garbage in‐garbage out”, in contrast to well‐controlled imaginative experiments to separate the wheat from the chaff. Albert Einstein was quoted as saying that imagination was more important than knowledge.

The ultimate purpose of the scientific method is to understand ourselves and our place in Nature. Conventionally, we subscribe to the Anthropic Principle, that we are “in” this Universe, whereas the Endosymbiosis Theory, advocated by Lynn Margulis, stipulates that we are “of” this Universe as a result of the assimilation of the physical environment. According to this theory, the organism endogenizes external factors to make them physiologically “useful”, such as iron as the core of the hemoglobin molecule, or ancient bacteria as mitochondria.

… there is a fundamental difference between knowing via believing and knowing based on empirical research.

By applying the developmental mechanism of cell–cell communication to phylogeny, we have revealed the interrelationships between cells and explained evolution from its origin as the unicellular state to multicellularity via cell–cell communication. The ultimate outcome of this research is that consciousness is the product of cellular processes and cell–cell communication in order to react to the environment and better anticipate future events 5 , 6 . Consciousness is an essential prerequisite for transcending the Explicate Order toward the Implicate Order via cellular sensory and cognitive systems that feed an ever‐expanding organismal knowledge about both the environment and itself.

It is here where the empirical approach to understanding nature comes in with its emphasis that knowledge comes only from sensual experience rather than innate ideas or traditions. In the context of the cell or higher systems, knowledge about the environment can only be gained by sensing and analyzing the environment. Empiricism is similar to an equation in which the variables and terms form a product, or a chemical reaction, or a biological process where the substrates, aka sensory data, form products, that is, knowledge. However, it requires another step—imagination, according to Albert Einstein—to transcend the Explicate Order in order to gain insight into the Implicate Order. Take for instance, Dmitri Ivanovich Mendeleev's Periodic Table of Elements: his brilliant insight was not just to use Atomic Number to organize it, but also to consider the chemical reactivities of the Elements by sorting them into columns. By introducing chemical reactivity to the Periodic Table, Mendeleev provided something like the “fourth wall” in Drama, which gives the audience an omniscient, god‐like perspective on what is happening on stage.

The capacity to transcend the subjective Explicate Order to approximate the objective Implicate Order is not unlike Eastern philosophies like Buddhism or Taoism, which were practiced long before the scientific method. An Indian philosopher once pointed out that the Hindus have known for 30,000 years that the Earth revolves around the sun, while the Europeans only realized this a few hundred years ago based on the work of Copernicus, Brahe, and Galileo. However, there is a fundamental difference between knowing via believing and knowing based on empirical research. A similar example is Aristotle's refusal to test whether a large stone would fall faster than a small one, as he knew the answer already 7 . Galileo eventually performed the experiment from the Leaning Tower in Pisa to demonstrate that the fall time of two objects is independent of their mass—which disproved Aristotle's theory of gravity that stipulated that objects fall at a speed proportional to their mass. Again, it demonstrates the power of empiricism and experimentation as formulated by Francis Bacon, John Locke, and others, over intuition and rationalizing.

Even if our scientific instruments provide us with objective data, we still need to apply our consciousness to evaluate and interpret such data.

Following the evolution from the unicellular state to multicellular organisms—and reverse‐engineering it to a minimal‐cell state—reveals that biologic diversity is an artifact of the Explicate Order. Indeed, the unicell seems to be the primary level of selection in the Implicate Order, as it remains proximate to the First Principles of Physiology, namely negative entropy (negentropy), chemiosmosis, and homeostasis. The first two principles are necessary for growth and proliferation, whereas the last reflects Newton's Third Law of Motion that every action has an equal and opposite reaction so as to maintain homeostasis.

All organisms interact with their surroundings and assimilate their experience as epigenetic marks. Such marks extend to the DNA of germ cells and thus change the phenotypic expression of the offspring. The offspring, in turn, interacts with the environment in response to such epigenetic modifications, giving rise to the concept of the phenotype as an agent that actively and purposefully interacts with its environment in order to adapt and survive. This concept of phenotype based on agency linked to the Explicate Order fundamentally differs from its conventional description as a mere set of biologic characteristics. Organisms’ capacities to anticipate future stress situations from past memories are obvious in simple animals such as nematodes, as well as in plants and bacteria 8 , suggesting that the subjective Explicate Order controls both organismal behavior and trans‐generational evolution.

That perspective offers insight to the nature of consciousness: not as a “mind” that is separate from a “body”, but as an endogenization of physical matter, which complies with the Laws of Nature. In other words, consciousness is the physiologic manifestation of endogenized physical surroundings, compartmentalized, and made essential for all organisms by forming the basis for their physiology. Endocytosis and endocytic/synaptic vesicles contribute to endogenization of cellular surroundings, allowing eukaryotic organisms to gain knowledge about the environment. This is true not only for neurons in brains, but also for all eukaryotic cells 5 .

Such a view of consciousness offers insight to our awareness of our physical surroundings as the basis for self‐referential self‐organization. But this is predicated on our capacity to “experiment” with our environment. The burgeoning idea that we are entering the Anthropocene, a man‐made world founded on subjective senses instead of Natural Laws, is a dangerous step away from our innate evolutionary arc. Relying on just our senses and emotions, without experimentation and controls to understand the Implicate Order behind reality, is not just an abandonment of the principles of the Enlightenment, but also endangers the planet and its diversity of life.

Further reading

Anderson C (2008) The End of Theory: the data deluge makes the scientific method obsolete. Wired (December 23, 2008)

Bacon F (1620, 2011) Novum Organum Scientiarum. Nabu Press

Baluška F, Gagliano M, Witzany G (2018) Memory and Learning in Plants. Springer Nature

Charlesworth AG, Seroussi U, Claycomb JM (2019) Next‐Gen learning: the C. elegans approach. Cell 177: 1674–1676

Eliezer Y, Deshe N, Hoch L, Iwanir S, Pritz CO, Zaslaver A (2019) A memory circuit for coping with impending adversity. Curr Biol 29: 1573–1583

Gagliano M, Renton M, Depczynski M, Mancuso S (2014) Experience teaches plants to learn faster and forget slower in environments where it matters. Oecologia 175: 63–72

Gagliano M, Vyazovskiy VV, Borbély AA, Grimonprez M, Depczynski M (2016) Learning by association in plants. Sci Rep 6: 38427

Katz M, Shaham S (2019) Learning and memory: mind over matter in C. elegans . Curr Biol 29: R365‐R367

Kováč L (2007) Information and knowledge in biology – time for reappraisal. Plant Signal Behav 2: 65–73

Kováč L (2008) Bioenergetics – a key to brain and mind. Commun Integr Biol 1: 114–122

Koshland DE Jr (1980) Bacterial chemotaxis in relation to neurobiology. Annu Rev Neurosci 3: 43–75

Lyon P (2015) The cognitive cell: bacterial behavior reconsidered. Front Microbiol 6: 264

Margulis L (2001) The conscious cell. Ann NY Acad Sci 929: 55–70

Maxwell N (2018) The Metaphysics of Science and Aim‐Oriented Empiricism. Springer: New York

Mazzocchi F (2015) Could Big Data be the end of theory in science? EMBO Rep 16: 1250–1255

Moore RS, Kaletsky R, Murphy CT (2019) Piwi/PRG‐1 argonaute and TGF‐β mediate transgenerational learned pathogenic avoidance. Cell 177: 1827–1841

Peirce CS (1877) The Fixation of Belief. Popular Science Monthly 12: 1–15

Pigliucci M (2009) The end of theory in science? EMBO Rep 10: 534

Popper K (1959) The Logic of Scientific Discovery. Routledge: London

Posner R, Toker IA, Antonova O, Star E, Anava S, Azmon E, Hendricks M, Bracha S, Gingold H, Rechavi O (2019) Neuronal small RNAs control behavior transgenerationally. Cell 177: 1814–1826

Russell B (1912) The Problems of Philosophy. Henry Holt and Company: New York

Scerri E (2006) The Periodic Table: Its Story and Significance. Oxford University Press, Oxford

Shapiro JA (2007) Bacteria are small but not stupid: cognition, natural genetic engineering and socio‐bacteriology. Stud Hist Philos Biol Biomed Sci 38: 807–818

Torday JS, Miller WB Jr (2016) Biologic relativity: who is the observer and what is observed? Prog Biophys Mol Biol 121: 29–34

Torday JS, Rehan VK (2017) Evolution, the Logic of Biology. Wiley: Hoboken

Torday JS, Miller WB Jr (2016) Phenotype as agent for epigenetic inheritance. Biology (Basel) 5: 30

Wasserstein RL, Lazar NA (2016) The ASA's statement on p‐values: context, process and purpose. Am Statist 70: 129–133

Yamada T, Yang Y, Valnegri P, Juric I, Abnousi A, Markwalter KH, Guthrie AN, Godec A, Oldenborg A, Hu M, Holy TE, Bonni A (2019) Sensory experience remodels genome architecture in neural circuit to drive motor learning. Nature 569: 708–713

Ladislav Kováč discussed the advantages and drawbacks of the inductive method for science and the logic of scientific discoveries 9 . Obviously, technological advances have enabled scientists to expand the borders of knowledge, and informatics allows us to objectively analyze ever larger data‐sets. It was the telescope that enabled Tycho Brahe, Johannes Kepler, and Galileo Galilei to make accurate observations and infer the motion of the planets. The microscope provided Robert Koch and Louis Pasteur insights into the microbial world and determines the nature of infectious diseases. Particle colliders now give us a glimpse into the birth of the Universe, while DNA sequencing and bioinformatics have enormously advanced biology's goal to understand the molecular basis of life.

However, Kováč also reminds us that Bayesian inferences and reasoning have serious drawbacks, as documented in the instructive example of Bertrand Russell's “inductivist turkey”, which collected large amounts of reproducible data each morning about feeding time. Based on these observations, the turkey correctly predicted the feeding time for the next morning—until Christmas Eve when the turkey's throat was cut 9 . In order to avoid the fate of the “inductivist turkey”, mankind should also rely on Popperian deductive science, namely formulating theories, concepts, and hypotheses, which are either confirmed or refuted via stringent experimentation and proper controls. Even if our scientific instruments provide us with objective data, we still need to apply our consciousness to evaluate and interpret such data. Moreover, before we start using our scientific instruments, we need to pose scientific questions. Therefore, as suggested by Albert Szent‐Györgyi, we need both Dionysian and Apollonian types of scientists 10 . Unfortunately, as was the case in Szent‐Györgyi's times, the Dionysians are still struggling to get proper support.

There have been pleas for reconciling philosophy and science, which parted ways owing to the rise of empiricism. This essay recognizes the centrality of experiments and their controls for the advancement of scientific thought, and the attendant advance in philosophy needed to cope with many extant and emerging issues in science and society. We need a common “will” to do so; the rationale is provided herein.

Acknowledgements

John Torday has been a recipient of NIH Grant HL055268. František Baluška is thankful to numerous colleagues for very stimulating discussions on topics analyzed in this article.

EMBO Reports (2019) 20: e49110

Contributor Information

John S Torday, Email: jtorday@ucla.edu

František Baluška, Email: baluska@uni-bonn.de


What Is a Controlled Experiment? | Definitions & Examples

Published on April 19, 2021 by Pritha Bhandari. Revised on June 22, 2023.

In experiments, researchers manipulate independent variables to test their effects on dependent variables. In a controlled experiment, all variables other than the independent variable are controlled or held constant so they don’t influence the dependent variable.

Controlling variables can involve:

  • holding variables at a constant or restricted level (e.g., keeping room temperature fixed).
  • measuring variables to statistically control for them in your analyses.
  • balancing variables across your experiment through randomization (e.g., using a random order of tasks).

Why does control matter in experiments?

Control in experiments is critical for internal validity, which allows you to establish a cause-and-effect relationship between variables. Strong validity also helps you avoid research biases, particularly ones related to issues with generalizability (like sampling bias and selection bias).

For example, suppose you’re testing whether the color used in advertising affects how much people will pay for fast food:

  • Your independent variable is the color used in advertising.
  • Your dependent variable is the price that participants are willing to pay for a standard fast food meal.

Extraneous variables are factors that you’re not interested in studying, but that can still influence the dependent variable. For strong internal validity, you need to remove their effects from your experiment.

In the advertising example, extraneous variables might include the:

  • Design and description of the meal,
  • Study environment (e.g., temperature or lighting),
  • Participant’s frequency of buying fast food,
  • Participant’s familiarity with the specific fast food brand,
  • Participant’s socioeconomic status.

Methods of control

You can control some variables by standardizing your data collection procedures. All participants should be tested in the same environment with identical materials. Only the independent variable (e.g., ad color) should be systematically changed between groups.

Other extraneous variables can be controlled through your sampling procedures. Ideally, you’ll select a sample that’s representative of your target population by using relevant inclusion and exclusion criteria (e.g., including participants from a specific income bracket, and not including participants with color blindness).

By measuring extraneous participant variables (e.g., age or gender) that may affect your experimental results, you can also include them in later analyses.

After gathering your participants, you’ll need to place them into groups to test different independent variable treatments. The types of groups and method of assigning participants to groups will help you implement control in your experiment.

Control groups

Controlled experiments require control groups. Control groups allow you to test a comparable treatment, no treatment, or a fake treatment (e.g., a placebo to control for a placebo effect), and compare the outcome with your experimental treatment.

You can assess whether it’s your treatment specifically that caused the outcomes, or whether time or any other treatment might have resulted in the same effects.

To test the effect of colors in advertising, each participant is placed in one of two groups:

  • A control group that’s presented with red advertisements for a fast food meal.
  • An experimental group that’s presented with green advertisements for the same fast food meal.

Random assignment

To avoid systematic differences and selection bias between the participants in your control and treatment groups, you should use random assignment.

This helps ensure that any extraneous participant variables are evenly distributed, allowing for a valid comparison between groups.

Random assignment is a hallmark of a “true experiment”—it differentiates true experiments from quasi-experiments.
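As a sketch, random assignment can be implemented in a few lines of Python. The participant IDs, group names, and seed below are hypothetical, chosen only to make the example reproducible:

```python
import random

def randomly_assign(participants, groups=("control", "treatment"), seed=None):
    """Shuffle participants, then deal them round-robin into groups so
    group sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    assignment = {g: [] for g in groups}
    for i, person in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(person)
    return assignment

ids = list(range(40))                      # hypothetical participant IDs
assigned = randomly_assign(ids, seed=42)   # seed only for reproducibility
print(len(assigned["control"]), len(assigned["treatment"]))  # 20 20
```

The shuffle keeps the assignment unbiased, while the round-robin deal keeps the groups balanced in size.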

Masking (blinding)

Masking in experiments means hiding condition assignment from participants or researchers—or, in a double-blind study, from both. It’s often used in clinical studies that test new treatments or drugs and is critical for avoiding several types of research bias.

Sometimes, researchers may unintentionally encourage participants to behave in ways that support their hypotheses, leading to observer bias. In other cases, cues in the study environment may signal the goal of the experiment to participants and influence their responses. These are called demand characteristics. If participants behave a particular way due to awareness of being observed (called a Hawthorne effect), your results could be invalidated.

Using masking means that participants don’t know whether they’re in the control group or the experimental group. This helps you control biases from participants or researchers that could influence your study results.

You use an online survey form to present the advertisements to participants, and you leave the room while each participant completes the survey on the computer so that you can’t tell which condition each participant was in.

Problems with controlled experiments

Although controlled experiments are the strongest way to test causal relationships, they also involve some challenges.

Difficult to control all variables

Especially in research with human participants, it’s impossible to hold all extraneous variables constant, because every individual has different experiences that may influence their perception, attitudes, or behaviors.

But measuring or restricting extraneous variables allows you to limit their influence or statistically control for them in your study.

Risk of low external validity

Controlled experiments have disadvantages when it comes to external validity—the extent to which your results can be generalized to broad populations and settings.

The more controlled your experiment is, the less it resembles real world contexts. That makes it harder to apply your findings outside of a controlled setting.

There’s always a tradeoff between internal and external validity. It’s important to consider your research aims when deciding whether to prioritize control or generalizability in your experiment.

Frequently asked questions about controlled experiments

In a controlled experiment, all extraneous variables are held constant so that they can’t influence the results. Controlled experiments require:

  • A control group that receives a standard treatment, a fake treatment, or no treatment.
  • Random assignment of participants to ensure the groups are equivalent.

Depending on your study topic, there are various other methods of controlling variables.

An experimental group, also known as a treatment group, receives the treatment whose effect researchers wish to study, whereas a control group does not. They should be identical in all other ways.

Experimental design means planning a set of procedures to investigate a relationship between variables. To design a controlled experiment, you need:

  • A testable hypothesis
  • At least one independent variable that can be precisely manipulated
  • At least one dependent variable that can be precisely measured

When designing the experiment, you decide:

  • How you will manipulate the variable(s)
  • How you will control for any potential confounding variables
  • How many subjects or samples will be included in the study
  • How subjects will be assigned to treatment levels

Experimental design is essential to the internal and external validity of your experiment.

Bhandari, P. (2023, June 22). What Is a Controlled Experiment? | Definitions & Examples. Scribbr. Retrieved August 12, 2024, from https://www.scribbr.com/methodology/controlled-experiment/


Microbe Notes

Controlled Experiments: Definition, Steps, Results, Uses

Controlled experiments ensure valid and reliable results by minimizing biases and controlling variables effectively.

Rigorous planning, ethical considerations, and precise data analysis are vital for successful experiment execution and meaningful conclusions.

Real-world applications demonstrate the practical impact of controlled experiments, guiding informed decision-making in diverse domains.

Controlled Experiments

Controlled experiments are a systematic research method in which variables are intentionally manipulated and controlled to observe their effects on a particular phenomenon. The aim is to isolate and measure the impact of specific variables, allowing a more accurate assessment of causality.


Importance of controlled experiments in various fields

Controlled experiments are significant across diverse fields, including science, psychology, economics, healthcare, and technology.

They provide a systematic approach to test hypotheses, establish cause-and-effect relationships, and validate the effectiveness of interventions or solutions.

Why Do Controlled Experiments Matter?

Validity and Reliability of Results

Controlled experiments uphold the gold standard for scientific validity and reliability. By meticulously controlling variables and conditions, researchers can attribute observed outcomes accurately to the independent variable being tested. This precision ensures that the findings can be replicated and are trustworthy.

Minimizing Biases and Confounding Variables

One of the core benefits of controlled experiments lies in their ability to minimize biases and confounding variables. Extraneous factors that could distort results are mitigated through careful control and randomization. This enables researchers to isolate the effects of the independent variable, leading to a more accurate understanding of causality.

Achieving Causal Inference

Controlled experiments provide a strong foundation for establishing causal relationships between variables. Researchers can confidently infer causation by manipulating specific variables and observing the resulting changes. This capability informs decision-making, policy formulation, and advancements across various fields.

Planning a Controlled Experiment

Formulating Research Questions and Hypotheses

Formulating clear research questions and hypotheses is paramount at the outset of a controlled experiment. These inquiries guide the direction of the study, defining the variables of interest and setting the stage for structured experimentation.

Well-defined questions and hypotheses contribute to focused research and facilitate meaningful data collection.

Identifying Variables and Control Groups

Identifying and defining independent, dependent, and control variables is fundamental to experimental planning. 

Precise identification ensures that the experiment is designed to isolate the effect of the independent variable while controlling for other influential factors. Establishing control groups allows for meaningful comparisons and robust analysis of the experimental outcomes.

Designing Experimental Procedures and Protocols

Careful design of experimental procedures and protocols is essential for a successful controlled experiment. This step involves outlining the methodology, data collection techniques, and the sequence of activities in the experiment.

A well-designed experiment is structured to maintain consistency, control, and accuracy throughout the study, thereby enhancing the validity and credibility of the results.

Conducting a Controlled Experiment

Randomization and Participant Selection

Randomization is a critical step in ensuring the fairness and validity of a controlled experiment. It involves assigning participants to different experimental conditions in a random and unbiased manner. 

The selection of participants should accurately represent the target population, enhancing the results’ generalizability.

Data Collection Methods and Instruments

Selecting appropriate data collection methods and instruments is pivotal in gathering accurate and relevant data. Researchers often employ surveys, observations, interviews, or specialized tools to record and measure the variables of interest. 

The chosen methods should align with the experiment’s objectives and provide reliable data for analysis.

Monitoring and Maintaining Experimental Conditions

Maintaining consistent and controlled experimental conditions throughout the study is essential. Regular monitoring helps ensure that variables remain constant and uncontaminated, reducing the risk of confounding factors. 

Rigorous monitoring protocols and timely adjustments are crucial for the accuracy and reliability of the experiment.

Analysing Results and Drawing Conclusions

Data Analysis Techniques

Data analysis involves employing appropriate statistical and analytical techniques to process the collected data. This step helps derive meaningful insights, identify patterns, and draw valid conclusions. 

Common techniques include regression analysis, t-tests, ANOVA, and more, tailored to the research design and data type.
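As an illustration (not taken from the article), a simple two-group comparison such as Welch’s t statistic can be computed with the Python standard library. The measurements below are made-up values:

```python
from statistics import mean, stdev

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples
    (does not assume equal variances)."""
    na, nb = len(sample_a), len(sample_b)
    var_a, var_b = stdev(sample_a) ** 2, stdev(sample_b) ** 2
    return (mean(sample_a) - mean(sample_b)) / (var_a / na + var_b / nb) ** 0.5

control = [4.1, 3.9, 4.3, 4.0, 4.2]      # hypothetical control measurements
treatment = [4.8, 5.1, 4.9, 5.0, 4.7]    # hypothetical treatment measurements
t = welch_t(treatment, control)
print(round(t, 2))  # 8.0
```

A large t value like this suggests the group difference is unlikely to be due to chance; in practice the statistic is converted to a p value using the t distribution with Welch-adjusted degrees of freedom.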

Interpretation of Results

Interpreting the results entails understanding the statistical outcomes and their implications for the research objectives. 

Researchers analyze patterns, trends, and relationships revealed by the data analysis to infer the experiment’s impact on the variables under study. Clear and accurate interpretation is crucial for deriving actionable insights.

Implications and Potential Applications

Identifying the broader implications and potential applications of the experiment’s results is fundamental. Researchers consider how the findings can inform decision-making, policy development, or further research. 

Understanding the practical implications helps bridge the gap between theoretical insights and real-world application.

Common Challenges and Solutions

Addressing Ethical Considerations

Ethical challenges in controlled experiments include ensuring informed consent, protecting participants’ privacy, and minimizing harm. 

Solutions involve thorough ethics reviews, transparent communication with participants, and implementing safeguards to uphold ethical standards throughout the experiment.

Dealing with Sample Size and Statistical Power

The sample size is crucial for achieving statistically significant results. Adequate sample sizes enhance the experiment’s power to detect meaningful effects accurately. 

Statistical power analysis guides researchers in determining the optimal sample size for the experiment, minimizing the risk of Type I and Type II errors.
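A rough power calculation can be sketched with the normal approximation for a two-sided, two-sample comparison of means. The effect size and thresholds below are illustrative defaults; dedicated software should be used for real studies:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided
    two-sample comparison of means (effect size = Cohen's d)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # ~1.96 for a two-sided 5% test
    z_beta = z(power)            # ~0.84 for 80% power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

n = n_per_group(0.5)   # "medium" effect size in Cohen's d units
print(n)  # 63 participants per group
```

Note how the required sample size scales with the inverse square of the effect size: halving the expected effect roughly quadruples the participants needed per group.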

Mitigating Unforeseen Variables

Unforeseen variables can introduce bias and affect the experiment’s validity. Researchers employ meticulous planning and robust control measures to minimize the impact of unforeseen variables. 

Pre-testing and pilot studies help identify potential confounders, allowing researchers to adapt the experiment accordingly.

A controlled experiment involves meticulous planning, precise execution, and insightful analysis. Adhering to ethical standards, optimizing sample size, and adapting to unforeseen variables are key challenges that require thoughtful solutions. 

Real-world applications showcase the transformative potential of controlled experiments across varied domains, emphasizing their indispensable role in evidence-based decision-making and progress.


Author: Krisha Karki

Primers provide a concise introduction into an important aspect of biology highlighted by a current PLOS Biology research article.

Controlling control—A primer in open-source experimental control systems

Christopher James Forman

Affiliation: Department of Chemistry, Northwestern University, Evanston, Illinois, United States of America

* E-mail: [email protected]

Published: September 10, 2020

https://doi.org/10.1371/journal.pbio.3000858

Biological systems are composed of countless interlocking feedback loops. Reactor control systems—such as Chi-Bio (https://chi.bio/), recently published in PLOS Biology—enable biologists to drive multiple processes within living biological samples, using a single experimental framework. Consequently, the dynamic relationships between many biological variables can be explored simultaneously in situ. Similar multivariable experimental reactors are employed beyond biology in the study of active matter and non-equilibrium chemical reactions, in which physical systems are maintained far from equilibrium through the continuous introduction of energy or matter. Inexpensive state-of-the-art components enable open-source implementation of such multiparameter architectures, which represent a move away from expensive systems optimised for single measurements, towards affordable and reconfigurable multi-measurement systems. The transfer of well-understood engineering knowledge into the hands of biological and chemical specialists via open-source channels allows rapid cycles of experimental development and heralds a change in experimental capability that is driving increased theoretical and practical understanding of out-of-equilibrium systems across a wide range of scientific fields.

Citation: Forman CJ (2020) Controlling control—A primer in open-source experimental control systems. PLoS Biol 18(9): e3000858. https://doi.org/10.1371/journal.pbio.3000858

Copyright: © 2020 Christopher James Forman. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The authors received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Abbreviations: APEX, Automated Parametric Explorer; DC, direct current; GFP, green fluorescent protein; I²C, inter-integrated circuit; LED, light emitting diode; PID, proportional-integral-derivative; SPI, serial peripheral interface; USB, universal serial bus; UV-Vis, ultraviolet to visible

Multiparameter measurement and feedback systems

Soyuz and SpaceX capsules allow mission specialists to reach orbit. Along similar lines, well-conceived open-source control systems and high-quality state-of-the-art components give specialists the ability to construct sophisticated experimental configurations with little to no engineering expertise, bringing domain-specific knowledge closer to the laboratory equipment design process. The demand for such capabilities has resulted in the development of core control systems into which a range of high-quality state-of-the-art sensors, actuators, and control algorithms can be inserted with minimal effort. Instead of many separate machines optimised for single measurements—such as ultraviolet to visible (UV-Vis) light spectrometers, or microscopes—these generalised frameworks allow for simultaneous measurement and control of multiple experimental parameters, under the complete direction of the end-user. Such systems are made affordable by the diverse range of extremely cheap—yet high-quality—components available on the consumer market, from supply companies such as Mouser, RS, or Digikey.

Multiparameter experimental arrangements introduce the possibility of sophisticated feedback loops to investigate coupling between several measurable quantities. By modifying an input perturbation in response to observed system behaviour, it is possible to drive the system out of equilibrium to specific steady states and actively keep it there—like balancing a pencil on its tip. The amount of energy or matter needed to maintain such a state can be tracked, allowing characterisation of the interaction between perturbed quantities and other co-monitored quantities. Although each measurement may not be as accurate as a measurement on a dedicated system, such loss of fidelity is more than compensated for by enhanced understanding of the interactions between parameters. For example, with the Chi-Bio bioreactor, Steel and colleagues [1] made use of the sensor histidine kinase—CcaS—and its cognate response receptor—CcaR—to form an optogenetic system coupled to green fluorescent protein (GFP) expression. They were able to drive the quantity of GFP in a culture of cells so it tracked an externally defined profile. The impact of the expression of GFP on other gene circuits—coupled via cellular burden—could then be monitored at distinct wavelengths with more than sufficient accuracy and precision to understand the behaviour.

Experimental feedback loops are becoming increasingly popular in other fields such as physics and chemistry.

A beautiful example of a feedback loop experiment [2]—with relevance to biologists—contributed evidence to support the topical and beguiling idea of dissipative adaptation [3, 4] from the field of non-equilibrium physics, in which time-varying energy input into a system can be used to select assembly processes. Control over the assembly of a wide range of objects—including living cells—was established far from equilibrium by regulating the flow of energy supplied by a laser. Such observations are clearly of direct relevance in the behaviour of biological cells which exist in external flows of matter or energy.

In an intriguing chemical example, a feedback loop enabled external selection of the morphology of gold nanoparticles, directing their evolution [5]. A pedagogical example is shown in Fig 1, using the author’s own setup in which a dye (fluorescein) is combined with a continuous flow of water and the resulting emission is tracked to ensure the concentration stays the same.


Fig 1. (A) Engineered feedback loops are extremely common and well understood for driving and monitoring physical processes. A sensor and actuator work in tandem to drive an arbitrary physical process—typically using an error-driven algorithm. (B) An example feedback loop uses spectral intensity as an input into a PID algorithm to regulate the concentration of a dye (fluorescein) in a continuous flow of water. (C) Only a narrow band of the spectrum is monitored in this setup. In principle many such bands can be monitored, allowing tracking of multiple species. (D) The emission intensity oscillates about the set-point, allowing the system to respond to impulses such as the rapid manual addition of water directly into the solution. Data acquisition was intermittent to simulate lost data. The time taken to return to the steady state behaviour after a transient impulse is known as the response time. Em, Emission; Ex, Excitation; I²C, inter-integrated circuit; LED, light emitting diode; PID, Proportional-Integral-Derivative.

https://doi.org/10.1371/journal.pbio.3000858.g001

The concept of a feedback loop is universal (Fig 1A) and could be applied at any time and length scale. The properties of the physical system dictate the specification of the necessary hardware, which in turn drives the cost of the system. Biological and chemical processes are generally, but not always, quite slow and respond on the order of seconds to hours. For a modern processor, such timescales are easy to cope with, so simple computers such as Arduinos (Arduino, Somerville, MA), Raspberry Pis (Raspberry Pi Foundation, Cambridge, UK), or Beaglebones (Beagleboard.org Foundation, Oakland, MI) have the capacity to drive biological and chemical systems with reasonably sophisticated algorithms.

Exploring picosecond chemical processes—such as photoexcitation—needs femtosecond laser pulses and substantially more expensive electronics. Consequently, re-fitting engineered bioreactors, such as Chi-Bio, for use as chemical reactors is probably more likely to work for larger macromolecular polymerisations and self-assembly reactions, which can take hours or days. Such an observation is unsurprising when you consider a cell to be, in part, a macromolecular polymerisation reaction network.

To design a specific reactor such as Chi-Bio, or the author’s system (Fig 1B), a range of concepts need to be understood, which are explained in Box 1. Perhaps the most enticing aspect of these kinds of system is the requirement to include software as a key part of the experiment, which often incorporates a model, such as the digital twinning process described in Box 1. Indeed, such is the scientific goal of building feedback systems: if you can understand how a physical system responds to arbitrary input, drive it to any particular state from any other state, and know what all its states are, both numerically and in real life, is there anything left to learn about that system, besides pushing it into new conditions?

Box 1. Universal control principles: A brief definition of key concepts associated with control systems

Open or closed

Control systems operate in open or closed configuration. In closed loops, the perturbing value is adjusted in response to observed behaviour, enabling stabilisation of physical quantities—such as the temperature. Open loops passively record behaviour in response to predetermined or uncontrolled input, e.g., observing turbidity as a cell culture matures or reaction colour as reagents are introduced at unregulated rates.

Response time

Every physical system consists of a set of processes that operate at different rates. Such a system will take time to settle to a new state in response to an input perturbation. Whether open or closed loop, the time taken for the system to respond depends on which processes are perturbed.

Sampling rate

The rate at which information about the physical system is captured determines the fastest process that the control system can respond to. The sampling rate should be high enough to capture information about the fastest process of interest.

Bandwidth

The bandwidth of the control system refers to the range of rates of processes that a control system can monitor and respond to. The wider the bandwidth, the higher the sampling rate and the faster a process that the control system can monitor and control.

Sensitivity

The more sensitive the control system, the smaller the detectable change in the physical system. Sensitivity is generally governed by the quality of the sensor and the amount of noise.

Signals that are dominated by noise can be detected by summing multiple measurements. The random noise cancels out, but the signal does not. However, control decisions must be made within a finite time, so long integration times are not always possible. Continuous accumulation of historical data is therefore extremely helpful but creates memory and processing requirements. As each new measurement arrives, it is weighed against previously collected data. Does a change in a measured quantity correspond to noise? Or is it genuinely a true change in the measured quantity of the system?
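The benefit of summing repeated measurements can be illustrated with a short simulation (not part of the original article). The signal level and noise amplitude below are arbitrary; averaging N independent readings shrinks random noise by roughly √N:

```python
import random
from statistics import mean, stdev

rng = random.Random(1)
TRUE_VALUE = 2.0    # the quantity being measured (arbitrary)
NOISE_SD = 1.0      # random noise on each reading (arbitrary)

def reading():
    """One noisy measurement of the true value."""
    return TRUE_VALUE + rng.gauss(0.0, NOISE_SD)

# Spread of single readings versus spread of 100-reading averages.
singles = [reading() for _ in range(2000)]
averages = [mean(reading() for _ in range(100)) for _ in range(2000)]

ratio = stdev(singles) / stdev(averages)
print(round(ratio))  # close to sqrt(100) = 10
```

The trade-off described above is visible here: each 100-reading average takes 100 times longer to acquire, which is why long integration times are not always compatible with timely control decisions.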

Processing gain

Additional signal processing can make up for poor signal-to-noise ratio. Some examples follow.

Proportional-integral-derivative controller

The proportional-integral-derivative (PID) controller used in Fig 1 is one example of how to maintain a steady state in a closed-loop configuration using a relatively simple algorithm. The value to apply to the actuator is calculated from the error between the desired value (the set-point) and the current sensor value. “P,” “I,” and “D” refer to 3 terms of the equation used in the calculation, which are values proportional to the error (P), the integrated sum of the error (I), and the derivative of the error (D). Often only P and I are necessary.
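A minimal PID loop can be sketched in Python. This is an illustrative toy (the first-order plant model, gains, and set-point are invented), not the controller used in Fig 1:

```python
class PID:
    """Minimal proportional-integral-derivative (PID) controller."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measurement, dt):
        error = self.setpoint - measurement          # P: current error
        self.integral += error * dt                  # I: accumulated error
        derivative = (
            0.0 if self.prev_error is None else (error - self.prev_error) / dt
        )                                            # D: rate of change of error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Drive a toy first-order process (think: dye concentration with constant
# dilution) towards a set-point of 1.0 using a PI controller (kd = 0).
pid = PID(kp=0.8, ki=0.4, kd=0.0, setpoint=1.0)
level, dt = 0.0, 0.1
for _ in range(200):
    actuator = pid.update(level, dt)                 # e.g., pump drive value
    level += (actuator - 0.5 * level) * dt           # inflow minus losses

print(round(level, 2))  # settles at the set-point, 1.0
```

The integral term is what removes the steady-state error: at the set-point the proportional term vanishes, and the accumulated integral alone supplies the actuator value needed to hold the state.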

Advanced algorithms

In complex systems like spacecraft or automated factories, many sensors and actuators are monitored and adjusted continuously, so covariant effects must be considered in the model used to make control decisions. A process called digital twinning compares experimental and theoretical versions of a system to check performance. The better characterised a system becomes, the more likely a model can correctly predict system behaviour, yielding excellent knowledge of how to control the system.
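Digital twinning can be sketched as integrating a model in lockstep with the (here simulated) real system and flagging when the residual between the two exceeds a threshold. Everything below is an invented toy, with the plant and model deliberately mismatched:

```python
def step(level, inflow, loss_rate, dt):
    """First-order process: d(level)/dt = inflow - loss_rate * level."""
    return level + (inflow - loss_rate * level) * dt

real_level, twin_level = 0.0, 0.0
dt, inflow = 0.1, 0.5
mismatch_at = None
for k in range(300):
    real_level = step(real_level, inflow, loss_rate=0.45, dt=dt)  # true plant
    twin_level = step(twin_level, inflow, loss_rate=0.50, dt=dt)  # model
    if mismatch_at is None and abs(real_level - twin_level) > 0.05:
        mismatch_at = k * dt   # time at which the twin measurably disagrees

print(mismatch_at is not None)  # True: the model mismatch was detected
```

In a real digital twin the detected residual would be fed back to refine the model parameters, so that prediction and experiment converge over time.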

Comparison of systems

The Chi-Bio [1] is a great example of an open-source multiparameter control system, in which a team of engineers and specialists has taken care of a host of power, control, and mechanical engineering problems, enabling biologists (and chemists) to put together a wide range of experiments involving combinations of a broad variety of components. A comparison of Chi-Bio with the author’s own home-built feedback control system for polymer chemistry, dubbed Automated Parametric Explorer (APEX), reveals some key similarities and differences that will help provide a baseline of examples at different price points for anyone thinking about entering the space of multiparameter control systems.

Chi-Bio consists of a series of predesigned circuit boards that assemble to form a mechanical infrastructure that creates a cavity to host a monitored reaction chamber in standard reaction vessels. The control and power system service a range of sensors (spectrometer, thermometer) and actuators (light emitting diodes [LEDs], laser, fluidic pumps, stirrer) enabling interactions to occur within the reaction chamber volume. Up to 8 bioreactors can be operated in parallel from the same controlling computer (a Beaglebone).

APEX is part off-the-shelf and part homemade. It is mainly a software framework for linking together a wide variety of devices that are controlled via universal serial bus (USB) and a network of Arduinos. Each Arduino controls a specific laboratory actuator (such as an LED, a motor, pneumatic solenoids) which can be distributed around an opto-fluidic setup and hooked into a single control and power framework. The assembly in Fig 1 took less than a day to set up and begin acquiring data. It is trivial to upload complex functions to each Arduino allowing low-level commands to trigger complex actuation behaviour. Sensing is performed via standard devices—such as cameras and spectrometers—which employ conventional optics. A simple potentiostat has also been integrated. Such a setup transfers laboratory control into user-written software allowing all these devices to be incorporated into a single data-driven application.

Power supply

Chi-Bio uses a standard 12 V power supply that is down-regulated to 6 V and 3.3 V. This direct current (DC) to DC conversion is achieved with buck converters, which are inexpensive circuits. In APEX, power comes from a standard 450 W computer power supply delivering 5 V and 12 V rails. APEX also employs a 24 V rail that enables control over higher-power components, such as pump motors, through H-bridge circuits. Chi-Bio additionally employs a watch circuit that monitors for short circuits caused by splashes and automatically cuts power in that eventuality.
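The step-down conversion can be illustrated with the ideal buck-converter relation V_out = D · V_in, where D is the switching duty cycle. A minimal sketch, ignoring losses:

```python
def buck_duty_cycle(v_in: float, v_out: float) -> float:
    """Ideal (lossless) buck converter: V_out = D * V_in, so D = V_out / V_in."""
    if not 0 < v_out <= v_in:
        raise ValueError("a buck converter can only step voltage down")
    return v_out / v_in


# Chi-Bio's 12 V supply stepped down to its 6 V and 3.3 V rails:
print(round(buck_duty_cycle(12.0, 6.0), 2))   # 0.5
print(round(buck_duty_cycle(12.0, 3.3), 3))   # 0.275
```

A real converter's duty cycle runs slightly higher than this ideal figure to cover switching and conduction losses.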

Optical system

The Chi-Bio optical input consists of a 7-LED board that generates narrow-band light signals across the visible spectrum as well as a UV LED and a 650 nm diode laser for optical density measurements. The output is monitored by an 11-channel spectrometer chip that consists of a single light-sensitive array with a checker-board pattern of 11 filters. Thus, each region of the sensor chip monitors a slightly different frequency, trading resolution for number of channels. Chi-Bio monitors laser brightness through the sample and—since the LED optical path is perpendicular to the spectrometer’s optical axis—Chi-Bio can monitor fluorescence emissions caused by one of the 8 excitation LEDs without the need for expensive dichroic mirrors and filters.
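Converting transmitted laser brightness to optical density follows the Beer-Lambert relation OD = -log10(I / I0). A minimal sketch (the chip's actual firmware is not described in the source):

```python
import math


def optical_density(i_transmitted: float, i_reference: float) -> float:
    """OD = -log10(I / I0): attenuation of the 650 nm beam through the sample."""
    return -math.log10(i_transmitted / i_reference)


# A sample transmitting 10% of the reference beam has an OD of 1.0:
print(optical_density(0.1, 1.0))  # 1.0
```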

In contrast, APEX employs individual LEDs mounted on threaded 1-inch discs (SM1CP2M, Thorlabs, Newton, NJ), enabling their simple incorporation into a standard 1-inch optical setup. A high-power broadband LED can be filtered with narrow-band filters to achieve a wide range of input signals, which can be focused using normal optics as shown in Fig 1B. Two bandpass filters and a dichroic are used to isolate emission and excitation through the same objective lens. This design keeps optics and fluids well separated. Signals across a range of wavelengths and integration times can be detected by a spectrometer (CCS250, Thorlabs, Newton, NJ). To put APEX's full-blown optical solution in perspective: the precision-milled 1-inch disc alone costs around $20, a basic optical filter set starts at $1,000, and the spectrometer costs $2,000 or more. While the optical arrangement and signal quality are no doubt superior in APEX, the cost difference is enormous. The spectrometer in Chi-Bio is a single chip that costs $9 but still provides results good enough to make excellent scientific progress; the value lies in its flexibility and number of simultaneous channels. The entire Chi-Bio kit costs around $800 and measures temperature, absorbance, and fluorescence. APEX can be a fluorimeter, a microscope, a spectrometer, a potentiostat, a camera, a fluidic control system, or a machine-learning and data management centre, and it probably comes in at around $20,000 for parts (not counting design labour), which is about the price of a high-end UV-Vis instrument.

Control protocols

In both cases, a universal protocol enables communications across the devices in the system. Several low-level standards exist, such as the inter-integrated circuit (I2C) bus and serial peripheral interfaces (SPIs), in which one wire provides a clock to synchronise data transfer and the remaining wires provide one or more data channels plus a mechanism for selecting which secondary circuit is active. The I2C system used in Chi-Bio and APEX is an industry standard for linking microchips together with just 2 wires. Typically, these are primary-secondary configurations in which a single primary can control many secondary circuits.
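The primary-secondary selection works by prefixing every I2C transaction with an address byte: the secondary's 7-bit address shifted left one bit, with the read/write flag in the least significant bit. A small sketch of that framing:

```python
def i2c_address_byte(address: int, read: bool) -> int:
    """First byte of an I2C transaction: 7-bit secondary address + R/W bit.

    The primary broadcasts this byte on the shared 2-wire bus; only the
    secondary whose address matches responds.
    """
    if not 0 <= address <= 0x7F:
        raise ValueError("I2C addresses are 7 bits")
    return (address << 1) | (1 if read else 0)


# Writing to, then reading from, a device at address 0x48:
print(hex(i2c_address_byte(0x48, read=False)))  # 0x90
print(hex(i2c_address_byte(0x48, read=True)))   # 0x91
```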

As open-source systems employ ever more sophisticated architectures, the gaps between the knowledge of biologists, engineers, physicists, and chemists dwindle to occupy a similar realm, in which inert matter is pushed out of equilibrium to bring it one step closer to living material.



Controlled Experiments | Methods & Examples of Control

Published on 19 April 2022 by Pritha Bhandari . Revised on 10 October 2022.

In experiments , researchers manipulate independent variables to test their effects on dependent variables. In a controlled experiment , all variables other than the independent variable are controlled or held constant so they don’t influence the dependent variable.

Controlling variables can involve:

  • Holding variables at a constant or restricted level (e.g., keeping room temperature fixed)
  • Measuring variables to statistically control for them in your analyses
  • Balancing variables across your experiment through randomisation (e.g., using a random order of tasks)
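The third method, balancing through randomisation, can be sketched in a few lines (the task names are invented for illustration):

```python
import random


def randomised_task_order(tasks, seed=None):
    """Return a fresh random ordering of tasks for one participant.

    Randomising order per participant balances order effects (fatigue,
    practice) across the experiment instead of holding order constant.
    """
    rng = random.Random(seed)
    order = list(tasks)
    rng.shuffle(order)
    return order


tasks = ["rate red advert", "rate green advert", "rate blue advert"]
print(randomised_task_order(tasks, seed=1))
```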

Table of contents

  • Why does control matter in experiments?
  • Methods of control
  • Problems with controlled experiments
  • Frequently asked questions about controlled experiments

Why does control matter in experiments?

Control in experiments is critical for internal validity , which allows you to establish a cause-and-effect relationship between variables.

Suppose, for example, you're studying whether the colour used in fast food advertising affects how much people will pay:

  • Your independent variable is the colour used in advertising.
  • Your dependent variable is the price that participants are willing to pay for a standard fast food meal.

Extraneous variables are factors that you’re not interested in studying, but that can still influence the dependent variable. For strong internal validity, you need to remove their effects from your experiment.

In the advertising example, extraneous variables might include:

  • Design and description of the meal
  • Study environment (e.g., temperature or lighting)
  • Participant’s frequency of buying fast food
  • Participant’s familiarity with the specific fast food brand
  • Participant’s socioeconomic status

Methods of control

You can control some variables by standardising your data collection procedures. All participants should be tested in the same environment with identical materials. Only the independent variable (e.g., advert colour) should be systematically changed between groups.

Other extraneous variables can be controlled through your sampling procedures . Ideally, you’ll select a sample that’s representative of your target population by using relevant inclusion and exclusion criteria (e.g., including participants from a specific income bracket, and not including participants with colour blindness).

By measuring extraneous participant variables (e.g., age or gender) that may affect your experimental results, you can also include them in later analyses.
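One simple way to use a measured extraneous variable in later analyses is stratification: compare the groups within each level of the measured variable. A sketch with invented numbers:

```python
from collections import defaultdict


def mean(xs):
    return sum(xs) / len(xs)


def stratified_group_means(records):
    """Compare group outcomes within levels of a measured extraneous
    variable (here, an age band): a simple form of statistical control."""
    strata = defaultdict(lambda: defaultdict(list))
    for group, age_band, outcome in records:
        strata[age_band][group].append(outcome)
    return {band: {g: mean(v) for g, v in groups.items()}
            for band, groups in strata.items()}


# (group, age band, willingness to pay) -- illustrative numbers only
data = [("red", "18-30", 8.0), ("green", "18-30", 9.0),
        ("red", "31-50", 9.5), ("green", "31-50", 10.5)]
print(stratified_group_means(data))
```

A full analysis would instead enter the measured variable as a covariate in a regression model, but the within-stratum comparison conveys the idea.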

After gathering your participants, you’ll need to place them into groups to test different independent variable treatments. The types of groups and method of assigning participants to groups will help you implement control in your experiment.

Control groups

Controlled experiments require control groups . Control groups allow you to test a comparable treatment, no treatment, or a fake treatment, and compare the outcome with your experimental treatment.

You can assess whether it’s your treatment specifically that caused the outcomes, or whether time or any other treatment might have resulted in the same effects.

To test the effect of advert colour, you could compare:

  • A control group that’s presented with red advertisements for a fast food meal
  • An experimental group that’s presented with green advertisements for the same fast food meal

Random assignment

To avoid systematic differences between the participants in your control and treatment groups, you should use random assignment .

This helps ensure that any extraneous participant variables are evenly distributed, allowing for a valid comparison between groups .

Random assignment is a hallmark of a ‘true experiment’ – it differentiates true experiments from quasi-experiments .
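Random assignment itself is straightforward to implement; a minimal sketch (shuffle, then deal round-robin so group sizes stay balanced; not taken from any specific study protocol):

```python
import random


def randomly_assign(participants, groups=("control", "treatment"), seed=None):
    """Shuffle participants, then deal them round-robin into groups so
    group sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    assignment = {g: [] for g in groups}
    for i, participant in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(participant)
    return assignment


# Ten participant IDs split into two balanced groups:
print(randomly_assign(range(1, 11), seed=42))
```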

Masking (blinding)

Masking in experiments means hiding condition assignment from participants or researchers – or, in a double-blind study , from both. It’s often used in clinical studies that test new treatments or drugs.

Sometimes, researchers may unintentionally encourage participants to behave in ways that support their hypotheses. In other cases, cues in the study environment may signal the goal of the experiment to participants and influence their responses.

Using masking means that participants don’t know whether they’re in the control group or the experimental group. This helps you control biases from participants or researchers that could influence your study results.

Although controlled experiments are the strongest way to test causal relationships, they also involve some challenges.

Difficult to control all variables

Especially in research with human participants, it’s impossible to hold all extraneous variables constant, because every individual has different experiences that may influence their perception, attitudes, or behaviours.

But measuring or restricting extraneous variables allows you to limit their influence or statistically control for them in your study.

Risk of low external validity

Controlled experiments have disadvantages when it comes to external validity – the extent to which your results can be generalised to broad populations and settings.

The more controlled your experiment is, the less it resembles real world contexts. That makes it harder to apply your findings outside of a controlled setting.

There’s always a tradeoff between internal and external validity . It’s important to consider your research aims when deciding whether to prioritise control or generalisability in your experiment.

Experimental designs are a set of procedures that you plan in order to examine the relationship between variables that interest you.

To design a successful experiment, first identify:

  • A testable hypothesis
  • One or more independent variables that you will manipulate
  • One or more dependent variables that you will measure

When designing the experiment, first decide:

  • How your variable(s) will be manipulated
  • How you will control for any potential confounding or lurking variables
  • How many subjects you will include
  • How you will assign treatments to your subjects

Cite this Scribbr article


Bhandari, P. (2022, October 10). Controlled Experiments | Methods & Examples of Control. Scribbr. Retrieved 12 August 2024, from https://www.scribbr.co.uk/research-methods/controlled-experiments/


What are Controlled Experiments?

Determining Cause and Effect


A controlled experiment is a highly focused way of collecting data and is especially useful for determining patterns of cause and effect. This type of experiment is used in a wide variety of fields, including medical, psychological, and sociological research. Below, we’ll define what controlled experiments are and provide some examples.

Key Takeaways: Controlled Experiments

  • A controlled experiment is a research study in which participants are randomly assigned to experimental and control groups.
  • A controlled experiment allows researchers to determine cause and effect between variables.
  • One drawback of controlled experiments is that they lack external validity (which means their results may not generalize to real-world settings).

Experimental and Control Groups

To conduct a controlled experiment , two groups are needed: an experimental group and a control group . The experimental group is a group of individuals that are exposed to the factor being examined. The control group, on the other hand, is not exposed to the factor. It is imperative that all other external influences are held constant . That is, every other factor or influence in the situation needs to remain exactly the same between the experimental group and the control group. The only thing that is different between the two groups is the factor being researched.

For example, if you were studying the effects of taking naps on test performance, you could assign participants to two groups: participants in one group would be asked to take a nap before their test, and those in the other group would be asked to stay awake. You would want to ensure that everything else about the groups (the demeanor of the study staff, the environment of the testing room, etc.) would be equivalent for each group. Researchers can also develop more complex study designs with more than two groups. For example, they might compare test performance among participants who had a 2-hour nap, participants who had a 20-minute nap, and participants who didn’t nap.

Assigning Participants to Groups

In controlled experiments, researchers use  random assignment (i.e. participants are randomly assigned to be in the experimental group or the control group) in order to minimize potential confounding variables in the study. For example, imagine a study of a new drug in which all of the female participants were assigned to the experimental group and all of the male participants were assigned to the control group. In this case, the researchers couldn’t be sure if the study results were due to the drug being effective or due to gender—in this case, gender would be a confounding variable.

Random assignment is done in order to ensure that participants are not assigned to experimental groups in a way that could bias the study results. A study that compares two groups but does not randomly assign participants to the groups is referred to as quasi-experimental, rather than a true experiment.

Blind and Double-Blind Studies

In a blind experiment, participants don’t know whether they are in the experimental or control group. For example, in a study of a new experimental drug, participants in the control group may be given a pill (known as a placebo ) that has no active ingredients but looks just like the experimental drug. In a double-blind study , neither the participants nor the experimenter knows which group the participant is in (instead, someone else on the research staff is responsible for keeping track of group assignments). Double-blind studies prevent the researcher from inadvertently introducing sources of bias into the data collected.

Example of a Controlled Experiment

If you were interested in studying whether or not violent television programming causes aggressive behavior in children, you could conduct a controlled experiment to investigate. In such a study, the dependent variable would be the children’s behavior, while the independent variable would be exposure to violent programming. To conduct the experiment, you would expose an experimental group of children to a movie containing a lot of violence, such as martial arts or gun fighting. The control group, on the other hand, would watch a movie that contained no violence.

To test the aggressiveness of the children, you would take two measurements : one pre-test measurement made before the movies are shown, and one post-test measurement made after the movies are watched. Pre-test and post-test measurements should be taken of both the control group and the experimental group. You would then use statistical techniques to determine whether the experimental group showed a significantly greater increase in aggression, compared to participants in the control group.
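The pre-test/post-test comparison boils down to comparing mean gain scores between the groups. A sketch with invented scores (a real analysis would add a significance test):

```python
def mean(xs):
    return sum(xs) / len(xs)


def mean_gain(pre, post):
    """Average post-test minus pre-test change for one group."""
    return mean([b - a for a, b in zip(pre, post)])


# Illustrative aggression scores, not real data:
control_pre, control_post = [4, 5, 3, 4], [4, 5, 4, 4]
treat_pre, treat_post = [4, 4, 3, 5], [6, 7, 5, 7]

# The quantity a statistical test would examine: the difference in gains.
effect = mean_gain(treat_pre, treat_post) - mean_gain(control_pre, control_post)
print(effect)  # 2.0
```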

Studies of this sort have been done many times and they usually find that children who watch a violent movie are more aggressive afterward than those who watch a movie containing no violence.

Strengths and Weaknesses

Controlled experiments have both strengths and weaknesses. Among the strengths is the fact that results can establish causation. That is, they can determine cause and effect between variables. In the above example, one could conclude that being exposed to representations of violence causes an increase in aggressive behavior. This kind of experiment can also zero in on a single independent variable, since all other factors in the experiment are held constant.

On the downside, controlled experiments can be artificial. That is, they are done, for the most part, in a manufactured laboratory setting and therefore tend to eliminate many real-life effects. As a result, analysis of a controlled experiment must include judgments about how much the artificial setting has affected the results. Results from the example given might be different if, say, the children studied had a conversation about the violence they watched with a respected adult authority figure, like a parent or teacher, before their behavior was measured. Because of this, controlled experiments can sometimes have lower external validity (that is, their results might not generalize to real-world settings).

Updated  by Nicki Lisa Cole, Ph.D.



Experimental Design - Independent, Dependent, and Controlled Variables


Scientific experiments are meant to show the cause and effect of a phenomenon (relationships in nature). The “variables” are any factor, trait, or condition that can be changed in the experiment and that can have an effect on its outcome.

An experiment can have three kinds of variables: independent, dependent, and controlled.

  • The independent variable is the single factor that the scientist changes, followed by observation to watch for effects. It is important that there is just one independent variable, so that results are not confusing.
  • The dependent variable is the factor that changes as a result of the change to the independent variable.
  • The controlled variables (or constant variables) are factors that the scientist wants to remain constant if the experiment is to show accurate results. To be able to measure results, each of the variables must be able to be measured.

For example, let’s design an experiment with two plants sitting in the sun side by side. The controlled variables (or constants) are that at the beginning of the experiment, the plants are the same size, get the same amount of sunlight, experience the same ambient temperature and are in the same amount and consistency of soil (the weight of the soil and container should be measured before the plants are added). The independent variable is that one plant is getting watered (1 cup of water) every day and one plant is getting watered (1 cup of water) once a week. The dependent variables are the changes in the two plants that the scientist observes over time.


Can you describe the dependent variable that may result from this experiment? After four weeks, the dependent variable may be that one plant is taller, heavier and more developed than the other. These results can be recorded and graphed by measuring and comparing both plants’ height, weight (removing the weight of the soil and container recorded beforehand) and a comparison of observable foliage.
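The weight measurement described above, which removes the pre-recorded container and soil weight, can be sketched as follows (the measurement values are invented for illustration):

```python
def plant_mass(total_mass_g, container_and_soil_mass_g):
    """Recover the plant's own mass by subtracting the container and soil
    mass recorded before the plants were added (a controlled variable)."""
    return total_mass_g - container_and_soil_mass_g


# Illustrative measurements after four weeks:
daily_watered = plant_mass(412.0, 350.0)   # grams
weekly_watered = plant_mass(371.0, 350.0)

# The difference in the dependent variable between the two plants:
print(daily_watered - weekly_watered)  # 41.0
```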

Using What You Learned: Design another experiment using the two plants, but change the independent variable. Can you describe the dependent variable that may result from this new experiment?

Think of another simple experiment and name the independent, dependent, and controlled variables. Use the graphic organizer included in the PDF below to organize your experiment's variables.


Citing Research References

When you research information you must cite the reference. Citing for websites is different from citing from books, magazines and periodicals. The style of citing shown here is from the MLA Style Citations (Modern Language Association).

When citing a WEBSITE the general format is as follows. Author Last Name, First Name(s). "Title: Subtitle of Part of Web Page, if appropriate." Title: Subtitle: Section of Page if appropriate. Sponsoring/Publishing Agency, If Given. Additional significant descriptive information. Date of Electronic Publication or other Date, such as Last Updated. Day Month Year of access < URL >.

Here is an example of citing this page:

Amsel, Sheri. "Experimental Design - Independent, Dependent, and Controlled Variables" Exploring Nature Educational Resource ©2005-2024. March 25, 2024 < http://www.exploringnature.org/db/view/Experimental-Design-Independent-Dependent-and-Controlled-Variables >


Positive Control vs Negative Control: Differences & Examples

Chris Drew (PhD)

Dr. Chris Drew is the founder of the Helpful Professor. He holds a PhD in education and has published over 20 articles in scholarly journals. He is the former editor of the Journal of Learning Development in Higher Education.

A positive control is designed to confirm a known response in an experimental design , while a negative control ensures there’s no effect, serving as a baseline for comparison.

The two terms are defined as below:

  • Positive control refers to a group in an experiment that receives a procedure or treatment known to produce a positive result. It serves the purpose of affirming the experiment’s capability to produce a positive outcome.
  • Negative control refers to a group that does not receive the procedure or treatment and is expected not to yield a positive result. Its role is to ensure that a positive result in the experiment is due to the treatment or procedure.

The experimental group is then compared to these control groups, which can help demonstrate efficacy of the experimental treatment in comparison to the positive and negative controls.

Positive Control vs Negative Control: Key Terms

Control Groups

A control group serves as a benchmark in an experiment. Typically, it is a subset of participants, subjects, or samples that do not receive the experimental treatment (as in negative control).

This could mean assigning a placebo to a human subject or leaving a sample unaltered in chemical experiments. By comparing the results obtained from the experimental group to the control, you can ascertain whether any differences are due to the treatment or random variability.

A well-configured experimental control is critical for drawing valid conclusions from an experiment. Correct use of control groups permits specificity of findings, ensuring the integrity of experimental data.

See More: Control Variables Examples

The Negative Control

Negative control is a group or condition in an experiment that ought to show no effect from the treatment.

It is useful in ensuring that the outcome isn’t accidental or influenced by an external cause. Imagine a medical test, for instance. You use distilled water, anticipating no reaction, as a negative control.

If a significant result occurs, it warns you of a possible contamination or malfunction during the testing. Failure of negative controls to stay ‘negative’ risks misinterpretation of the experiment’s result, and could undermine the validity of the findings.

The Positive Control

A positive control, on the other hand, affirms an experiment’s functionality by demonstrating a known reaction.

This might be a group or condition where the expected output is known to occur, which you include to ensure that the experiment can produce positive results when they are present. For instance, in testing an antibiotic, a well-known pathogen, susceptible to the medicine, could be the positive control.

Positive controls affirm that under appropriate conditions your experiment can produce a result. Without this reference, experiments could fail to detect true positive results, leading to false negatives. These two controls, used judiciously, are backbones of effective experimental practice.

Experimental Groups

Experimental groups are primarily characterized by their exposure to the examined variable.

That is, these are the test subjects that receive the treatment or intervention under investigation. The performance of the experimental group is then compared against the well-established markers – our positive and negative controls.

For example, an experimental group may consist of rats undergoing a pharmaceutical testing regime, or students learning under a new educational method. Fundamentally, this unit bears the brunt of the investigation and their response powers the outcomes.

However, without positive and negative controls, gauging the results of the experimental group could become erratic. Both control groups exist to highlight what outcomes are expected with and without the application of the variable in question. By comparing results, a clearer connection between the experiment variables and the observed changes surfaces, creating robust and indicative scientific conclusions.
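The gating role the two controls play can be sketched as a small decision rule (the function names and return strings are invented for illustration):

```python
def controls_valid(positive_effect: bool, negative_effect: bool) -> bool:
    """An experimental result is only interpretable when the positive
    control responded and the negative control did not."""
    return positive_effect and not negative_effect


def interpret(positive_effect, negative_effect, experimental_effect):
    """Interpret an experimental outcome against its two controls."""
    if not controls_valid(positive_effect, negative_effect):
        return "invalid run: re-check setup for contamination or failure"
    return "treatment effective" if experimental_effect else "no effect"


print(interpret(True, False, True))   # treatment effective
print(interpret(True, True, True))    # invalid run: controls failed
```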

Positive and Negative Control Examples

1. A Comparative Study of Old and New Pesticides’ Effectiveness

This hypothetical study aims to evaluate the effectiveness of a new pesticide by comparing its pest-killing potential with old pesticides and an untreated set. The investigation involves three groups: an untouched space (negative control), another treated with an established pesticide believed to kill pests (positive control), and a third area sprayed with the new pesticide (experimental group).

  • Negative Control: This group consists of a plot of land infested by pests and not subjected to any pesticide treatment. It acts as the negative control. You expect no decline in pest populations in this area. Any unexpected decrease could signal external influences (i.e. confounding variables ) on the pests unrelated to pesticides, affecting the experiment’s validity.
  • Positive Control: Another similar plot, this time treated with a well-established pesticide known to reduce pest populations, constitutes the positive control. A significant reduction in pests in this area would affirm that the experimental conditions are conducive to detect pest-killing effects when a pesticide is applied.
  • Experimental Group: This group consists of the third plot, treated with the new pesticide. Carefully monitoring the pest level in this research area against the backdrop of the control groups will reveal whether the new pesticide is effective. Through comparison with the other groups, any difference observed can be attributed to the new pesticide.

2. Evaluating the Effectiveness of a Newly Developed Weight Loss Pill

In this hypothetical study, the effectiveness of a newly formulated weight loss pill is scrutinized. The study involves three groups: a negative control group given a placebo with no weight-reducing effect, a positive control group provided with an approved weight loss pill known to cause a decrease in weight, and an experimental group given the newly developed pill.

  • Negative Control: The negative control is comprised of participants who receive a placebo with no known weight loss effect. A significant reduction in weight in this group would indicate confounding factors such as dietary changes or increased physical activity, which may invalidate the study’s results.
  • Positive Control: Participants in the positive control group receive an FDA-approved weight loss pill, anticipated to induce weight loss. The success of this control would prove that the experiment conditions are apt to detect the effects of weight loss pills.
  • Experimental Group: This group contains individuals receiving the newly developed weight loss pill. Comparing the weight change in this group against both the positive and negative control, any difference observed would offer evidence about the effectiveness of the new pill.

3. Testing the Efficiency of a New Solar Panel Design

This hypothetical study focuses on assessing the efficiency of a new solar panel design. The study involves three sets of panels: a set that is shaded to yield no solar energy (negative control), a set with traditional solar panels that are known to produce an expected level of solar energy (positive control), and a set fitted with the new solar panel design (experimental group).

  • Negative Control: The negative control involves a set of solar panels that are deliberately shaded, thus expecting no solar energy output. Any unexpected energy output from this group could point towards measurement errors, needed to be rectified for a valid experiment.
  • Positive Control: The positive control set up involves traditional solar panels known to produce a specific amount of energy. If these panels produce the expected energy, it validates that the experiment conditions are capable of measuring solar energy effectively.
  • Experimental Group: The experimental group features the new solar panel design. By comparing the energy output from this group against both the controls, any significant output variation would indicate the efficiency of the new design.

4. Investigating the Efficacy of a New Fertilizer on Plant Growth

This hypothetical study investigates the efficacy of a newly formulated fertilizer on plant growth. The study involves three sets of plants: a set without any fertilizer (negative control), a set treated with an established fertilizer known to promote plant growth (positive control), and a third set fed with the new fertilizer (experimental group).

  • Negative Control: The negative control involves a set of plants not receiving any fertilizer. Lack of significant growth in this group will confirm that any observed growth in other groups is due to the applied fertilizer rather than other uncontrolled factors.
  • Positive Control: The positive control involves another set of plants treated with a well-known fertilizer, expected to promote plant growth. Adequate growth in these plants will validate that the experimental conditions are suitable to detect the influence of a good fertilizer on plant growth.
  • Experimental Group: The experimental group consists of the plants subjected to the newly formulated fertilizer. Investigating the growth in this group against the growth in the control groups will provide ascertained evidence whether the new fertilizer is efficient or not.

5. Evaluating the Impact of a New Teaching Method on Student Performance

This hypothetical study aims to evaluate the impact of a new teaching method on students’ performance. The study involves three groups: a group of students taught through traditional methods (negative control), a group taught through an established, effective teaching strategy (positive control), and a group taught through the new teaching method (experimental group).

  • Negative Control: The negative control comprises students taught by standard teaching methods, where you expect satisfactory but not top-performing results. Any unexpected high results in this group could signal external factors such as private tutoring or independent study, which in turn may distort the experimental outcome.
  • Positive Control: The positive control consists of students taught by a known, effective teaching strategy. High performance in this group would confirm that the experimental conditions are capable of detecting the efficiency of a teaching method.
  • Experimental Group: This group consists of students receiving instruction via the new teaching method. By analyzing their performance against both control groups, any difference in results could be attributed to the new teaching method, determining its efficacy.

Table Summary

The table below has been flattened into a list, comparing positive and negative controls aspect by aspect:

  • Purpose: A positive control confirms that the experiment is working properly and that results can be detected. A negative control ensures that there is no effect when there shouldn’t be, providing a baseline for comparison.
  • Expected result: A positive control shows a known effect or change; a negative control shows no effect or change.
  • Role: A positive control demonstrates that the experimental setup can produce a positive result; a negative control demonstrates that any observed effects are due to the experimental treatment and not other factors.
  • Example (plant growth): Plants given known amounts of sunlight to ensure they grow (positive) versus plants given no sunlight to ensure they don’t grow (negative).
  • Example (enzyme assay): A substrate known to be acted upon by the enzyme (positive) versus a substrate the enzyme doesn’t act upon (negative).
  • Example (bacterial culture): A medium known to support bacterial growth (positive) versus a sterile medium that doesn’t support growth (negative).
  • What it validates: The positive control validates that the experimental system is sensitive and can detect changes if they occur; the negative control validates that observed effects are due to the variable being tested and not to external or unknown factors.
  • Troubleshooting: If the positive control doesn’t produce the expected result, the experimental setup or procedure may be flawed. If the negative control shows an effect, there may be contamination or other unexpected variables influencing the results.
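The troubleshooting logic above can be sketched as a small validation routine run before interpreting experimental data. This is a hypothetical illustration; the function name, signal values, and threshold are invented, not taken from any particular assay:

```python
def validate_run(positive_signal, negative_signal, threshold=1.0):
    """Check control readings before trusting experimental data.

    positive_signal: reading from the positive control (should exceed threshold)
    negative_signal: reading from the negative control (should stay below threshold)
    Returns a list of problems; an empty list means the run is interpretable.
    """
    problems = []
    if positive_signal < threshold:
        # Positive control failed: the setup may be unable to detect any effect.
        problems.append("positive control failed: setup or procedure may be flawed")
    if negative_signal >= threshold:
        # Negative control showed an effect: contamination or unexpected variables.
        problems.append("negative control showed signal: possible contamination")
    return problems

print(validate_run(positive_signal=2.5, negative_signal=0.1))  # -> []
```

In practice the threshold and signal units would come from the specific instrument and protocol being used.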

Biology Dictionary

Experimental Group

BD Editors

Reviewed by: BD Editors

Experimental Group Definition

In a comparative experiment, the experimental group (aka the treatment group) is the group being tested for a reaction to a change in the variable. There may be multiple experimental groups in a study, each testing a different level or amount of the variable. The other type of group, the control group , can show the effects of the variable by having a set amount, or none, of the variable. The experimental groups vary in the level of variable they are exposed to, which shows the effects of various levels of a variable on similar organisms.

In biological experiments, the subjects being studied are often living organisms. In such cases, it is desirable that all the subjects be closely related, to reduce the amount of genetic variation present in the experiment. The complicated interactions between genetics and the environment can produce very different results in organisms exposed to the same variable. If the organisms being tested are not related, the results could reflect the effects of genetics rather than the variable. This is why new human drugs must be rigorously tested in a variety of animals before they can be tested on humans. These different experimental groups allow researchers to see the effects of their drug on different genetic backgrounds. By using animals progressively more closely related to humans, human trials can eventually take place without severe risks for the first people to try the drug.

Examples of Experimental Group

A simple experiment.

A student is conducting an experiment on the effects music has on growing plants. The student wants to know if music can help plants grow and, if so, which type of music the plants prefer. The student divides a group of plants into two main groups, the control group and the experimental group. The control group will be kept in a room with no music, while the experimental group will be further divided into smaller experimental groups. Each of the experimental groups is placed in a separate room with a different type of music.

Ideally, each room would have many plants in it, and all the plants used in the experiment would be clones of the same plant. Even more ideally, the plant would breed true, or would be homozygous for all genes. This would introduce the smallest amount of genetic variation into the experiment. By limiting all other variables, such as the temperature and humidity, the experiment can determine with validity that the effects produced in each room are attributable to the music, and nothing else.
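To see how the groups would be compared once data came in, here is a minimal sketch using invented growth measurements for the music experiment described above (all numbers and group labels are hypothetical):

```python
# Invented growth data (cm) for the plant-and-music experiment.
growth = {
    "no music (control)": [4.1, 3.9, 4.0, 4.2],
    "classical": [4.8, 5.0, 4.7, 4.9],
    "rock": [4.0, 4.3, 3.8, 4.1],
}

def mean(values):
    return sum(values) / len(values)

control_mean = mean(growth["no music (control)"])
for group, heights in growth.items():
    # The difference from the control mean estimates each treatment's effect.
    diff = mean(heights) - control_mean
    print(f"{group}: mean {mean(heights):.2f} cm ({diff:+.2f} cm vs control)")
```

A real analysis would also ask whether each difference is larger than the variation within groups, which is where statistical tests come in.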

Bugs in the River

To study the effects of a variable on many organisms at once, scientists sometimes study ecosystems as a whole. The productivity of these ecosystems is often determined by the amount of oxygen they produce, which is an indication of how much algae is present. Ecologists sometimes study the interactions of organisms in these environments by excluding or adding organisms to an experimental group of ecosystems, and testing the effects of their variable against ecosystems left untouched. This method can show the drastic effects that various organisms have on an ecosystem.

Many experiments of this kind take place, and a common theme is to separate a single ecosystem into parts with artificial divisions. Thus, a river could be divided by netting into areas with and without bugs. The unnetted area allows bugs into the water. The bugs not only eat algae, but die and provide nutrients for the algae to grow. Without the bugs, various effects can be seen in the experimental portion of the river, the part covered by netting. The levels of oxygen in the water in each system can be measured, as well as other indicators of water quality. By comparing these groups, ecologists can begin to discern the complex relationships between populations of organisms in the environment.

Related Biology Terms

  • Control Group – The group that remains unchanged during the experiment, to provide comparison.
  • Scientific Method – The process scientists use to obtain valid, repeatable results.
  • Comparative Experiment – An experiment in which two groups, the control and experimental groups, are compared.
  • Validity – A measure of whether the results of an experiment were caused by changes in the variable, rather than by chance.

Control Group Definition and Examples

Control Group in an Experiment

The control group is the set of subjects that does not receive the treatment in a study. In other words, it is the group where the independent variable is held constant. This is important because the control group is a baseline for measuring the effects of a treatment in an experiment or study. A controlled experiment is one which includes one or more control groups.

  • The experimental group experiences a treatment or change in the independent variable. In contrast, the independent variable is constant in the control group.
  • A control group is important because it allows meaningful comparison. The researcher compares the experimental group to it to assess whether or not there is a relationship between the independent and dependent variable and the magnitude of the effect.
  • There are different types of control groups. A controlled experiment has one or more control groups.

Control Group vs Experimental Group

The only difference between the control group and experimental group is that subjects in the experimental group receive the treatment being studied, while participants in the control group do not. Otherwise, all other variables between the two groups are the same.

Control Group vs Control Variable

A control group is not the same thing as a control variable. A control variable or controlled variable is any factor that is held constant during an experiment. Examples of common control variables include temperature, duration, and sample size. The control variables are the same for both the control and experimental groups.

Types of Control Groups

There are different types of control groups:

  • Placebo group : A placebo group receives a placebo , which is a fake treatment that resembles the treatment in every respect except for the active ingredient. Both the placebo and treatment may contain inactive ingredients that produce side effects. Without a placebo group, these effects might be attributed to the treatment.
  • Positive control group : A positive control group has conditions that guarantee a positive test result. The positive control group demonstrates an experiment is capable of producing a positive result. Positive controls help researchers identify problems with an experiment.
  • Negative control group : A negative control group consists of subjects that are not exposed to a treatment. For example, in an experiment looking at the effect of fertilizer on plant growth, the negative control group receives no fertilizer.
  • Natural control group : A natural control group usually is a set of subjects who naturally differ from the experimental group. For example, if you compare the effects of a treatment on women who have had children, the natural control group includes women who have not had children. Non-smokers are a natural control group in comparison to smokers.
  • Randomized control group : The subjects in a randomized control group are randomly selected from a larger pool of subjects. Often, subjects are randomly assigned to either the control or experimental group. Randomization reduces bias in an experiment. There are different methods of randomly assigning test subjects.
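Random assignment, as described in the last bullet, can be sketched in a few lines. The helper below is illustrative only; real trials use more rigorous randomization schemes:

```python
import random

def randomize_groups(subjects, group_names, seed=None):
    """Shuffle subjects, then deal them round-robin into groups.

    Shuffling before splitting reduces assignment bias; round-robin dealing
    keeps group sizes within one of each other.
    """
    rng = random.Random(seed)
    pool = list(subjects)
    rng.shuffle(pool)
    groups = {name: [] for name in group_names}
    for i, subject in enumerate(pool):
        groups[group_names[i % len(group_names)]].append(subject)
    return groups

groups = randomize_groups(range(1, 21), ["control", "experimental"], seed=42)
print({name: len(members) for name, members in groups.items()})
# -> {'control': 10, 'experimental': 10}
```

The fixed seed makes the sketch reproducible; in an actual study the assignment would not be seeded to a known value.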

Control Group Examples

Here are some examples of different control groups in action:

Negative Control and Placebo Group

For example, consider a study of a new cancer drug. The experimental group receives the drug. The placebo group receives a placebo, which contains the same ingredients as the drug formulation, minus the active ingredient. The negative control group receives no treatment. The negative group is included because the placebo group still experiences some level of placebo effect, a response to receiving a form of false treatment.

Positive and Negative Controls

For example, consider an experiment looking at whether a new drug kills bacteria. The experimental group exposes bacterial cultures to the drug. If the group survives, the drug is ineffective. If the group dies, the drug is effective.

The positive control group has a culture of bacteria that carry a drug resistance gene. If the bacteria survive drug exposure (as intended), then it shows the growth medium and conditions allow bacterial growth. If the positive control group dies, it indicates a problem with the experimental conditions. A negative control group of bacteria lacking drug resistance should die. If the negative control group survives, something is wrong with the experimental conditions.
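The decision logic in this bacteria example can be written out explicitly. The function below is a hypothetical sketch of how the three outcomes (positive control, negative control, experimental group) combine:

```python
def interpret_drug_test(pos_control_survived, neg_control_survived,
                        experimental_survived):
    """Interpret a bacterial drug test from its control outcomes.

    pos_control_survived: the drug-resistant strain (should survive if the
        growth medium and conditions are sound)
    neg_control_survived: the non-resistant strain (should die under the drug)
    experimental_survived: the culture exposed to the new drug
    """
    if not pos_control_survived:
        # Resistant bacteria died: growth conditions themselves are suspect.
        return "invalid: growth conditions may be faulty"
    if neg_control_survived:
        # Non-resistant bacteria survived: something is wrong with the setup.
        return "invalid: possible contamination or flawed conditions"
    return "drug ineffective" if experimental_survived else "drug effective"

print(interpret_drug_test(True, False, False))  # -> drug effective
```

Note that either control failing makes the experimental result uninterpretable, regardless of what the experimental group did.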


Definitions of Control, Constant, Independent and Dependent Variables in a Science Experiment


The point of an experiment is to help define the cause and effect relationships between components of a natural process or reaction. The factors that can change value during an experiment or between experiments, such as water temperature, are called scientific variables, while those that stay the same, such as acceleration due to gravity at a certain location, are called constants.

The scientific method includes three main types of variables: constants, independent variables, and dependent variables. In a science experiment, each of these defines a different measured or constrained aspect of the system.

Constant Variables

Experimental constants are values that should not change either during or between experiments. Many natural forces and properties, such as the speed of light and the atomic weight of gold, are experimental constants. In some cases, a property can be considered constant for the purposes of an experiment even though it technically could change under certain circumstances. The boiling point of water changes with altitude and acceleration due to gravity decreases with distance from the earth, but for experiments in one location these can also be considered constants.

A constant, sometimes also called a controlled variable, is a variable that could change but that the experimenter intentionally keeps constant in order to more clearly isolate the relationship between the independent variable and the dependent variable.

If extraneous variables are not properly constrained, they are referred to as confounding variables, as they interfere with the interpretation of the results of the experiment.

Some examples of control variables might be found with an experiment examining the relationship between the amount of sunlight plants receive (independent variable) and subsequent plant growth (dependent variable). The experiment should control the amount of water the plants receive and when, what type of soil they are planted in, the type of plant, and as many other different variables as possible. This way, only the amount of light is being changed between trials, and the outcome of the experiment can be directly applied to understanding only this relationship.

Independent Variable

The independent variable in an experiment is the variable whose value the scientist systematically changes in order to see what effect the changes have. A well-designed experiment has only one independent variable in order to maintain a fair test. If the experimenter were to change two or more variables, it would be harder to explain what caused the changes in the experimental results. For example, someone trying to find how quickly water boils could alter the volume of water or the heating temperature, but not both.

Dependent Variable

A dependent variable – sometimes called a responding variable – is what the experimenter observes to find the effect of systematically varying the independent variable. While an experiment may have multiple dependent variables, it is often wisest to focus the experiment on one dependent variable so that the relationship between it and the independent variable can be clearly isolated. For example, an experiment could examine how much sugar can dissolve in a set volume of water at various temperatures. The experimenter systematically alters temperature (independent variable) to see its effect on the quantity of dissolved sugar (dependent variable).

Control Groups

In some experimental designs, a manipulated variable is applied to the subjects being measured, while a separate collection of measurements or subjects is kept entirely apart from that variable; this is the control group. Control groups are held as a standard against which the results of a scientific experiment are measured.

An example of such a situation might be a study regarding the effectiveness of a certain medication. There might be multiple experimental groups that receive the medication in varying doses and applications, and there would likely be a control group that does not receive the medication at all.

Representing Results

Identifying which variables are independent, dependent, and controlled helps to collect data, perform useful experiments, and accurately communicate results. When graphing or displaying data, it is crucial to represent data accurately and understandably. Typically, the independent variable goes on the x-axis, and the dependent variable goes on the y-axis.
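As a small illustration of that convention, the sugar-solubility example from earlier can be arranged with the independent variable as x and the dependent variable as y. The solubility figures below are approximate and used purely for illustration:

```python
# Approximate sucrose-in-water solubility data, for illustration only.
temperature_c = [20, 40, 60, 80]          # independent variable -> x-axis
sugar_g_per_100ml = [204, 238, 287, 362]  # dependent variable   -> y-axis

# Pair the variables the way a graph or data table would present them.
pairs = list(zip(temperature_c, sugar_g_per_100ml))
for x, y in pairs:
    print(f"x = {x:>2} C  ->  y = {y} g per 100 mL")
```

The same pairing carries straight into a plotting library: the first list supplies the x-axis values, the second the y-axis values.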

About the Author

Benjamin Twist has worked as a writer, editor and consultant since 2007. He writes fiction and nonfiction for online and print publications, as well as offering one-on-one writing consultations and tutoring. Twist holds a Master of Arts in Bible exposition from Columbia International University.


MiniOne Systems

Experimental Design


The best way to test a hypothesis is through controlled, systematic, and reproducible experiments. Proper experimental design is critical for obtaining usable, reliable, and applicable data. There are two important components to be considered: variables and controls.

Variables are the conditions and components that are changeable and controllable during an experiment. There are both independent variables and dependent variables.

Independent variables are factors that can be intentionally changed and their impact can be measured on the dependent variable.

Dependent variables are the outcomes or what happens as a result of the independent variable(s).

There are typically many independent and dependent variables to consider when performing an experiment, but it is often best to alter only one independent variable at a time to accurately gauge the effect of that variable on the dependent variable. Multiple independent variables can be changed if you are interested in determining a combined effect.

Controls are the components and conditions that are known and kept constant during an experiment. Controls are used as a point of reference, and they often safeguard against internal factors that may influence the outcome of an experiment. Different types of experiments may require different types of controls, depending on the testing procedures. The three main types of controls are positive, negative, and experimental controls.

A positive control is something known to produce a positive result and will often be included (especially for diagnostic tests) to ensure that a negative result is not due to experimental or reaction failure.

A negative control is something known to produce a negative result and will often be included to ensure that a positive result is truly positive and not due to contamination or other interference.

Experimental controls (or “control groups”) are used in controlled experiments to acquire baseline data. This baseline data can be compared to the experimental data to see the relative effect (if any) of the independent variable(s) on the dependent variable. This type of control is a parallel of the experiment, except no changes are made to any of the independent variables. Sometimes an experimental control is also a negative control, depending on the expected outcome and type of experiment. An experimental control can have an outcome similar to the experimental subject if the independent variable does not greatly impact it, whereas with a negative control, no outcome is expected at all.

Determining what types and how many controls to include in an experiment can affect the reliability and accuracy of your data and ultimately your conclusions.
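Comparing experimental data against the baseline from an experimental control often reduces to a relative-effect calculation. A minimal sketch, using invented readings:

```python
def relative_effect(experimental, baseline):
    """Percent change of an experimental measurement relative to the
    baseline measured in the experimental control group."""
    return 100.0 * (experimental - baseline) / baseline

# Invented readings: baseline from the control group, treated value
# from the experimental group.
print(f"{relative_effect(experimental=6.0, baseline=4.0):.1f}%")  # -> 50.0%
```

A value near zero suggests the independent variable had little effect relative to the untouched control; a large positive or negative value suggests a real difference worth testing statistically.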

Data Collection and Analysis

Once the experiment is designed, decide what type of data you need to collect and how you will collect it in order to evaluate your hypothesis. Consistency when making measurements and collecting data is important to ensure accuracy, precision, and ultimately repeatability of your experiment. When you perform any experiment, be sure to record all your findings, preferably in ink. Good record keeping, observations, and notes will help you make a more thorough and reliable analysis of your data and will give more credibility to your results.


Control Group vs Experimental Group

Julia Simkus


In a controlled experiment , scientists compare a control group and an experimental group that are identical in all respects except for one difference: the experimental manipulation.

Differences

Unlike the experimental group, the control group is not exposed to the independent variable under investigation. So, it provides a baseline against which any changes in the experimental group can be compared.

Since experimental manipulation is the only difference between the experimental and control groups, we can be sure that any differences between the two are due to experimental manipulation rather than chance.

Almost all experimental studies are designed to include a control group and one or more experimental groups. In most cases, participants are randomly assigned to either a control or experimental group.

Because participants are randomly assigned to either group, we can assume that the groups are identical except for manipulating the independent variable in the experimental group.

It is important that every aspect of the experimental environment is the same and that the experimenters carry out the exact same procedures with both groups so researchers can confidently conclude that any differences between groups are actually due to the difference in treatments.

Control Group

A control group consists of participants who do not receive any experimental treatment. The control participants serve as a comparison group.

The control group is matched as closely as possible to the experimental group, including age, gender, social class, ethnicity, etc.

The difference between the control and experimental groups is that the control group is not exposed to the independent variable , which is thought to be the cause of the behavior being investigated.

Researchers will compare the individuals in the control group to those in the experimental group to isolate the independent variable and examine its impact.

The control group is important because it serves as a baseline, enabling researchers to see what impact changes to the independent variable produce and strengthening researchers’ ability to draw conclusions from a study.

Without the presence of a control group, a researcher cannot determine whether a particular treatment truly has an effect on an experimental group.

Control groups are critical to the scientific method as they help ensure the internal validity of a study.

Assume you want to test a new medication for ADHD . One group would receive the new medication, and the other group would receive a pill that looked exactly the same but contained no active ingredient: a placebo. The group that takes the placebo would be the control group.

Types of Control Groups

Positive control group.

  • A positive control group is an experimental control that will produce a known response or the desired effect.
  • A positive control is used to ensure a test’s success and confirm an experiment’s validity.
  • For example, when testing for a new medication, an already commercially available medication could serve as the positive control.

Negative Control Group

  • A negative control group is an experimental control that does not result in the desired outcome of the experiment.
  • A negative control is used to ensure that there is no response to the treatment and help identify the influence of external factors on the test.
  • An example of a negative control would be using a placebo when testing for a new medication.

Experimental Group

An experimental group consists of participants exposed to a particular manipulation of the independent variable. These are the participants who receive the treatment of interest.

Researchers will compare the responses of the experimental group to those of a control group to see if the independent variable impacted the participants.

An experiment must have at least one control group and one experimental group; however, a single experiment can include multiple experimental groups, which are all compared against the control group.

Having multiple experimental groups enables researchers to vary different levels of an experimental variable and compare the effects of these changes to the control group and among each other.

Assume you want to conduct a study to determine whether listening to different types of music can help with focus while studying.

You randomly assign participants to one of three groups: one group that listens to music with lyrics, one group that listens to music without lyrics, and another group that listens to no music.

The group of participants listening to no music while studying is the control group, and the groups listening to music, whether with or without lyrics, are the two experimental groups.
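Random assignment like this can be sketched in a few lines of Python. The participant IDs and group names below are invented for illustration; the point is simply that shuffling before dealing participants out removes systematic bias in who ends up in which group.

```python
import random

def assign_groups(participants, groups, seed=None):
    """Shuffle participants, then deal them evenly across the groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    assignment = {g: [] for g in groups}
    for i, person in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(person)
    return assignment

# One control group and two experimental groups, as in the music example
groups = ["no_music", "music_with_lyrics", "music_without_lyrics"]
participants = [f"P{i:02d}" for i in range(30)]
assignment = assign_groups(participants, groups, seed=42)
for name, members in assignment.items():
    print(name, len(members))  # each group receives 10 participants
```

Fixing the seed makes the assignment reproducible for auditing, while still being random with respect to participant characteristics.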

Frequently Asked Questions

1. What is the difference between the control group and the experimental group in an experimental study?

Put simply, an experimental group is a group that receives the variable, or treatment, that the researchers are testing, whereas the control group does not. These two groups should be identical in all other respects.

2. What is the purpose of a control group in an experiment?

A control group is essential in experimental research because it:

Provides a baseline against which the effects of the manipulated variable (the independent variable) can be measured.

Helps to ensure that any changes observed in the experimental group are indeed due to the manipulation of the independent variable and not due to other extraneous or confounding factors.

Helps to account for the placebo effect, where participants’ beliefs about the treatment can influence their behavior or responses.

In essence, it increases the internal validity of the results and the confidence we can have in the conclusions.

3. Do experimental studies always need a control group?

Not all experiments require a control group, but a true “controlled experiment” does require at least one control group. For example, experiments that use a within-subjects design do not have a control group.

In  within-subjects designs , all participants experience every condition and are tested before and after being exposed to treatment.

These experimental designs tend to have weaker internal validity as it is more difficult for a researcher to be confident that the outcome was caused by the experimental treatment and not by a confounding variable.

4. Can a study include more than one control group?

Yes, studies can include multiple control groups. For example, if several distinct groups of subjects do not receive the treatment, these would be the control groups.

5. How is the control group treated differently from the experimental groups?

The control group and the experimental group(s) are treated identically except for one key difference: exposure to the independent variable, which is the factor being tested. The experimental group is subjected to the independent variable, whereas the control group is not.

This distinction allows researchers to measure the effect of the independent variable on the experimental group by comparing it to the control group, which serves as a baseline or standard.

Bailey, R. A. (2008). Design of Comparative Experiments. Cambridge University Press. ISBN 978-0-521-68357-9.

Hinkelmann, Klaus; Kempthorne, Oscar (2008). Design and Analysis of Experiments, Volume I: Introduction to Experimental Design (2nd ed.). Wiley. ISBN 978-0-471-72756-9.


Ray and Stephanie Lane Computational Biology Department

School of Computer Science


What is Computational Biology?

Modern biology is in the middle of a paradigm shift….

Robert F. Murphy Ray and Stephanie Lane Professor of Computational Biology Emeritus

Computational biology   is the science that answers the question “How can we learn and use models of biological systems constructed from experimental measurements?”  These models may describe what biological tasks are carried out by particular nucleic acid or peptide sequences, which gene (or genes) when expressed produce a particular phenotype or behavior, what sequence of changes in gene or protein expression or localization lead to a particular disease, and how changes in cell organization influence cell behavior.   This field is sometimes referred to as   bioinformatics , but many scientists use the latter term to describe the field that answers the question “How can I efficiently store, annotate, search and compare information from biological measurements and observations?”  (This subject has been discussed previously by an early NIH task force report and by Raul Isea.)

A number of factors contribute to the confusion between the terms, including the fact that one of the top journals in computational biology is entitled “Bioinformatics” and that in German for example, computer science is referred to as “informatik” and computational biology is referred to as “bioinformatik.”  Some also feel that bioinformatics emphasizes the information flow in biology.  In any case, the two fields are closely linked, since “bioinformatics” systems typically are needed to provide data to “computational biology” systems that create models, and the results of those models are often returned for storage in “bioinformatics” databases.

Computational biology is a very broad discipline, in that it seeks to build models for diverse types of experimental data (e.g., concentrations, sequences, images, etc.) and biological systems (e.g., molecules, cells, tissues, organs, etc.), and that it uses methods from a wide range of mathematical and computational fields (e.g., complexity theory, algorithmics, machine learning, robotics, etc.).

Perhaps the most important task that computational biologists carry out (and that training in computational biology should equip prospective computational biologists to do) is to   frame   biomedical problems as computational problems.  This often means looking at a biological system in a new way, challenging current assumptions or theories about the relationships between parts of the system, or integrating different sources of information to make a more comprehensive model than had been attempted before.  In this context, it is worth noting that the primary goal need not be to increase human understanding of the system; even small biological systems can be sufficiently complex that scientists cannot fully comprehend or predict their properties.  Thus the goal can be the creation of the model itself; the model should account for as much currently available experimental data as possible.  Note that this does not mean that the model has been   proven , even if the model makes one or more correct predictions about new experiments.  With the exception of very restricted cases, it is not possible to   prove   that a model is correct, only to   disprove   it and then   improve   it by modifying it to incorporate the new results.

This view emphasizes the importance of machine learning for constructing models.  In most current machine learning applications, statistical and computational methods are used to construct models from large existing datasets and those models are used to process new data.  Examples include learning to classify spam emails, to enable fingerprint access to your phone, and to recognize human speech.  However, an increasing number of machine learning applications don’t stop learning after their initial training.  They can either learn from additional data as it becomes available, or, even choose what additional data they would like to learn from.  This last area is termed   active   machine learning, and it promises to play a very important role in biomedical research in the coming years.
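The "choose what additional data to learn from" idea behind active learning can be illustrated with a minimal pool-based uncertainty-sampling step. The candidate-experiment names and predicted probabilities below are invented; a real system would query a trained model for its predictions.

```python
def pick_most_uncertain(pool, k=1):
    """Pool-based uncertainty sampling: select the k unlabeled items whose
    predicted probability of the positive class is closest to 0.5."""
    ranked = sorted(pool, key=lambda item: abs(item[1] - 0.5))
    return [name for name, _ in ranked[:k]]

# (candidate experiment, model's current predicted probability)
pool = [("exp_a", 0.95), ("exp_b", 0.52), ("exp_c", 0.10), ("exp_d", 0.48)]
print(pick_most_uncertain(pool, k=2))  # the two predictions nearest 0.5
```

The selected experiments would then be performed, labeled, and fed back into training, so the model spends its experimental budget where it is least certain.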

Once the problem has been framed, the second major task of computational biologists begins.  This is to borrow, refine, or invent methods to solve the problem.  Current computational biology research can be divided into a number of broad areas, mainly based on the type of experimental data that is analyzed or modeled.  Among these are analysis of protein and nucleic acid structure and function, gene and protein sequence, evolutionary genomics and proteomics, population genomics, regulatory and metabolic networks, biomedical image analysis and modeling, gene-disease associations, and development and spread of disease.

  • Open access
  • Published: 16 August 2024

Overlooked poor-quality patient samples in sequencing data impair reproducibility of published clinically relevant datasets

  • Maximilian Sprang   ORCID: orcid.org/0000-0002-8438-4747 1 ,
  • Jannik Möllmann 1 ,
  • Miguel A. Andrade-Navarro 1 &
  • Jean-Fred Fontaine   ORCID: orcid.org/0000-0002-1101-4091 1 , 2  

Genome Biology volume  25 , Article number:  222 ( 2024 )


Reproducibility is a major concern in biomedical studies, and existing publication guidelines do not solve the problem. Batch effects and quality imbalances between groups of biological samples are major factors hampering reproducibility. Yet, the latter is rarely considered in the scientific literature.

Our analysis uses 40 clinically relevant RNA-seq datasets to quantify the impact of quality imbalance between groups of samples on the reproducibility of gene expression studies. A high quality imbalance is frequent (14 datasets; 35%), and hundreds of quality markers are present in more than 50% of the datasets. Enrichment analysis suggests common stress-driven effects among the low-quality samples and highlights a complementary role of transcription factors and miRNAs in regulating the stress response. Preliminary ChIP-seq results show similar trends. Quality imbalance has an impact on the number of differential genes derived by comparing control to disease samples (the higher the imbalance, the higher the number of genes), on the proportion of quality markers in top differential genes (the higher the imbalance, the higher the proportion; up to 22%), and on the proportion of known disease genes in top differential genes (the higher the imbalance, the lower the proportion). We show that removing outliers based on their quality score improves the resulting downstream analysis.

Conclusions

Thanks to a stringent selection of well-designed datasets, we demonstrate that quality imbalance between groups of samples can significantly reduce the relevance of differential genes, consequently reducing reproducibility between studies. Appropriate experimental design and analysis methods can substantially reduce the problem.

Lack of reproducibility is a major concern in biomedical research, for example in clinical studies, neuroscience, or cancer biology [ 1 , 2 , 3 ], and also in other scientific fields such as artificial intelligence, drug discovery, or computer science [ 4 , 5 , 6 ]. There have been already many publications and initiatives to address the problem such as community and statistically driven guidelines for publication or data deposition in scientific repositories [ 7 , 8 , 9 , 10 , 11 , 12 ]. In functional genomics, complex sequencing technologies were instrumental in producing an unprecedented amount of data covering a great variety of topics relevant to life sciences. Community-derived experimental guidelines [ 13 ] and the latest computational and mathematical methods to produce and analyze results from those technologies did not solve the reproducibility problem. Gene expression studies based on RNA sequencing are a prominent example where reproducibility is limited by factors such as batch effect or quality differences between groups of biological samples. Although methods have been proposed to identify and correct batch effects [ 14 , 15 , 16 ], the impact of quality imbalance (QI) between groups of biological samples or patients is largely ignored in the biomedical literature. It is therefore critical to characterize the impact of quality differences on gene expression to enable studies that can be successfully reproduced.

Gene-expression analysis of clinical datasets is impacted by various factors such as sample extraction conditions [ 17 ], experimental protocols [ 18 ], batch effects, or non-homogeneous sample quality [ 19 ]. Methods correcting batch effect from the data have been used successfully to integrate data from different batches or from independent datasets although they should be used only when necessary, as they could remove biological signals from the data, and do not address the problem of factors highly confounded with study design (sample groups being compared) [ 16 , 20 ]. Unfortunately, design-quality problems are rarely considered in published studies and are often difficult or impossible to derive from open-access datasets commonly used for testing reproducibility [ 19 ]. Very few studies report comprehensive quality control results necessary to evaluate the quality of the samples [ 21 ]. Quality metrics are also not perfect and their usefulness can be highly specific to the experimental conditions including cell and assay types [ 12 ]. The poor reporting or documentation of methods, data, and analysis results is also a problem for reproducibility [ 22 ]. Although it is a common practice to filter out mitochondrial and ribosomal genes from gene-expression data [ 23 , 24 ], this is not always possible or desired, for example when studying topics related to mitochondria, respiration, or programmed cell death. In addition, there could be other genes equally related to sample quality that should be considered. In statistics, the impact of a confounding factor not considered, such as the sample quality, is known as the omitted variable bias [ 25 ]. Depending on the correlation of this factor to the dependent or independent variables, it can lead to hide, reverse, strengthen, or weaken an effect under study (e.g., the expression of a gene as a result of a variable condition). 
Taken together, given the complexity of biomedical experiments and the low reporting standard of the literature, the impact of variable sample quality on biomedical results and thus on reproducibility still needs to be characterized.

In order to increase our understanding about why reproducibility is low in clinically relevant RNA-seq results, we have studied 40 disease-related datasets. The public availability of many published datasets together with recent quality-related research in machine learning gives the opportunity to discover factors of reproducibility at the level of gene expression [ 26 ]. We studied the impact of quality imbalance on the number of differential genes associated with groups of disease and control patients in each dataset. We also searched if some genes could correlate with sample quality in the datasets. Finally, we evaluated the reproducibility potential of differential gene lists derived from different datasets depending on quality imbalance and showed preliminary results on ChIP-seq data.

In order to better understand the impact of low-quality samples in RNA-seq datasets on reproducibility, we have stringently selected 40 publicly available and clinically relevant human datasets comparing disease to control patient samples (Additional file 2 : Table S1). The stringency of this selection aimed at minimizing potential confounding factors that would prevent us from observing the impact of sample quality over gene expression. Accordingly, we chose groups of patients that were as homogeneous as possible within each dataset based on provided clinical information such as age range, gender ratio, and other clinical features when available. We then derived the quality of each sample as a probability of being of low quality from an accurate machine learning algorithm [ 26 ]. Important to our study was the definition of a quality imbalance (QI) index for each dataset which ranges from 0 to 1, where 0 means that quality is not confounded with the groups, and 1 means that quality is fully confounded with the groups (see the “ Methods ” section for details).
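Taking the QI index definition literally (the absolute correlation between per-sample low-quality probabilities and group membership), a minimal Pearson-correlation sketch might look like the following. The sample probabilities below are invented; the paper derives them from a trained machine learning model, and the Methods section may use a different correlation estimator.

```python
from statistics import mean

def qi_index(quality_probs, group_labels):
    """Absolute Pearson correlation between each sample's probability of
    being low quality and its group (0 = control, 1 = disease)."""
    mq, mg = mean(quality_probs), mean(group_labels)
    cov = sum((q - mq) * (g - mg) for q, g in zip(quality_probs, group_labels))
    sq = sum((q - mq) ** 2 for q in quality_probs) ** 0.5
    sg = sum((g - mg) ** 2 for g in group_labels) ** 0.5
    return abs(cov / (sq * sg))

# Low-quality samples concentrated in the disease group -> index near 1
probs  = [0.90, 0.80, 0.85, 0.10, 0.20, 0.15]
labels = [1, 1, 1, 0, 0, 0]
print(round(qi_index(probs, labels), 2))  # 0.99
```

When low-quality samples are spread evenly across both groups, the same function returns a value near 0, i.e., quality is not confounded with the comparison.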

Dataset quality

From the 40 datasets, our stringent manual selection included a total of 1164 human samples. There was an average number of samples equal to 29.1 per dataset ranging from 8 to 96. Our selection defined two equally sized groups of samples per dataset as disease and control. Fourteen (35%) datasets had a high QI index above 0.30 (Fig.  1 A). Note that various statistical methods or tests provide comparable numbers of high-QI datasets, ranging from 9 to 15 (Additional file 1 ). In total, twenty-six diseases were covered (Fig.  1 B). Eleven (27.5%) datasets had few significant differential genes ( n  ≤ 50) (Fig.  1 E). The QI index moderately correlated positively with pair-ended library layout ( r  = 0.266), moderately correlated negatively with 5-year journal impact factors ( r  =  − 0.272), but did not correlate with paired dataset design ( r  = 0.03). Figure  1 C to F gives an overview of the library layout of the datasets, as well as whether datasets had paired control and disease samples, the number of differentially expressed genes, and the impact factors of the journals in which the data were published. We did a similar analysis for ChIP-seq data sets and found that 3 out of 10 datasets had a high QI index, indicating that such quality imbalance exists in other assays too (Additional file 3 : Table S2, Additional file 1 : Fig. S1; see the “ Methods ” section for details).

figure 1

Quality imbalance of the 40 datasets. Clinically relevant human datasets were selected for patient samples’ homogeneity in the comparison groups (control vs disease groups). A quality imbalance (QI) index ( x -axis) of each dataset ( y -axis). The QI index is calculated as the absolute correlation coefficient between the samples’ probabilities of being of low quality and their groups. If its QI index is above 0.3, a dataset is considered highly imbalanced (red bars). If it is less than 0.18, the QI index is considered low (blue bars). The number of significant differential genes is given as annotation to the bars. B stacked barplot for the number of samples ( y -axis) in each dataset. C number of datasets designed with sample pairing. D number of datasets sequenced with either single- or paired-end reads. E distribution of journal impact factors for the published articles related to the datasets

Impact of quality imbalance on differential gene analysis results

Analysis of data subsets of the same size

Working with three of the largest datasets (GSE105130, GSE174330, and GSE54456) and the quality probabilities associated with their samples, we built several data subsets based on different selections of patients in the control and disease groups to represent low- and high-QIs. In a differential gene analysis, the larger the patient groups, the more powerful the statistical tests, and thus the more significant differential genes can be found. Therefore, we set the same number of samples ( n  = 20) to each data subset to focus our observations on the impact of QI on the number of differential genes derived by comparing the disease and control groups in each subset. The analysis of the subsets of the three selected datasets shows a clear linear relationship between QI and the number of differential genes ( R 2  = 0.57, 0.43, and 0.44, respectively): the higher the QI, the more the differential genes (Fig.  2 ). For those 3 datasets, an increase of the QI from 0 to 1 translates into an increase of 1222 differential genes on average (1160, 1720, and 785, respectively). This large variability might be due to a synergistic effect between quality and other confounding factors in some subsets that could have been created by the random selection of a limited number of patients per group, although patient characteristics are comparable between groups when considering full datasets.

figure 2

QI and differential genes in data subsets. For three large datasets (panels) in our study, we randomly sampled several smaller subsets (points) of 20 samples each to compare their number of differential genes to their respective QI indices in equally sized and sourced datasets (see the “ Methods ” section for details). This simulation allows us to isolate and observe the effect of a quality imbalance on the number of differential genes from the effect of any other confounding factors such as particular patient characteristics (e.g., age, gender, or ethnic group) or dataset size (number of samples). For each dataset, we can observe a positive correlation between the quality imbalance and the number of differential genes of its subsets. Gray areas indicate confidence intervals

Analysis of 40 full datasets of various sizes

The impact of the QI could also be observed on the 40 full datasets with different characteristics including other diseases and various numbers of samples (Fig.  3 and Additional file 1 : Fig. S2). Based on linear regressions, the number of differential genes derived using a false discovery rate cutoff (FDR < 0.05) increased four times faster with the dataset size for highly imbalanced datasets compared to more balanced datasets (slope = 114 vs 23.8, respectively; Fig.  3 A). When removing datasets designed with sample pairing from the analysis, we observed a similar difference (slope = 108 vs 23.5, respectively; Additional file 1 : Fig. S2A). Analyzed separately, a smaller number of paired-sample datasets showed a similar trend (Additional file 1 : Fig. S2C). We could also observe that deriving differential genes using not only an FDR cutoff but also a fold-change cutoff decreased the slope for the high-QI datasets and consequently considerably reduced the differences (Fig.  3 B and Additional file 1 : Fig. S2B and D).

figure 3

Impact of quality imbalance (QI) on the number of differential genes. On the scatter plots, points represent datasets colored by quality imbalance status: low (blue; QI index ≤ 0.3) or high (red; QI index > 0.3). The plotted datasets have each a minimum of 50 significant differential genes. X -axis indicates the number of samples and y -axis the number of differential genes in the datasets. Solid blue and red lines show linear regression results (confidence interval in gray). On panel A , the number of differential genes was derived by using a false discovery rate (FDR) cutoff in the differential analysis, while on panel B this number was derived by using both an FDR cutoff and a fold change cutoff. Panel A shows a faster increase of the number of differential genes in relation to the number of samples for high QI datasets, while on panel B this difference is reduced
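The two selection strategies compared above (FDR cutoff alone vs FDR plus fold-change cutoff) can be mimicked with a toy filter. The gene names and statistics below are invented, and a real analysis would take these values from a differential expression tool rather than a hand-written list.

```python
def select_degs(results, fdr_cutoff=0.05, min_abs_log2fc=None):
    """Keep genes passing the FDR cutoff and, optionally, a fold-change cutoff."""
    selected = []
    for gene, log2fc, fdr in results:
        if fdr >= fdr_cutoff:
            continue
        if min_abs_log2fc is not None and abs(log2fc) < min_abs_log2fc:
            continue
        selected.append(gene)
    return selected

# Hypothetical (gene, log2 fold change, FDR) tuples
results = [
    ("GENE1",  2.1, 0.001),  # large effect, significant
    ("GENE2",  0.3, 0.010),  # significant but small effect
    ("GENE3", -1.5, 0.040),  # large effect, significant
    ("GENE4",  1.8, 0.200),  # not significant
]
print(select_degs(results))                      # FDR cutoff only
print(select_degs(results, min_abs_log2fc=1.0))  # FDR + fold-change cutoff
```

Genes like GENE2, significant but with a small effect size, are exactly the kind of call the added fold-change cutoff removes, which is consistent with the reduced slope observed for high-QI datasets in panel B.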

Recurrence of quality-associated genes

In order to identify quality-associated genes occurring in several datasets (quality markers), we analyzed 13 datasets with the lowest QI indices (index ≤ 0.18; only one dataset per disease). We considered a gene to be a low- or a high-quality marker if its expression significantly correlated with the low- or high-quality of the samples, respectively.
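The recurrence counting described here can be sketched as follows. The gene lists are toy examples; in the actual study, each per-dataset marker list comes from significance tests of the correlation between gene expression and sample quality.

```python
from collections import Counter

def recurrent_markers(per_dataset_markers, min_datasets=2):
    """Count in how many datasets each gene is flagged as a quality marker
    and keep genes recurring in at least `min_datasets` of them."""
    counts = Counter(
        gene
        for markers in per_dataset_markers
        for gene in set(markers)  # count each dataset at most once per gene
    )
    return {gene: n for gene, n in counts.items() if n >= min_datasets}

# Toy per-dataset marker lists (gene symbols invented for illustration)
datasets = [
    {"mt-co1", "rps6", "hspa5"},
    {"mt-co1", "rps6"},
    {"mt-co1", "setd7"},
]
print(recurrent_markers(datasets))
```

Raising `min_datasets` trades sensitivity for robustness: markers recurring in many independent datasets are less likely to reflect dataset-specific artifacts.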

We found a total of 7708 low-quality markers occurring in at least 2 (15%) out of 13 datasets (Fig.  4 A, Table  1 and Additional file 4 : Table S3). There were low-quality markers occurring in up to 10 (77%) datasets. The list of top low-quality markers was significantly enriched in targets of 48 transcription factors which could be themselves low- (e.g., snrnp70, thap1, psmb5) or high-quality markers (e.g., setd7, fxr1) (Fig.  5 A and Additional file 5 : Table S4). The list was also enriched in various molecular pathways, including the expected mitochondria-related pathways (e.g., cell respiration and oxidative phosphorylation) and ribosomal pathways, but also other pathways such as response to starvation, response to ultra-violet radiation, housekeeping genes (largely overlapping mitochondrial and ribosomal pathways) and various diseases such as influenza (not included in our dataset selection), neuro-degenerative and cancer diseases (Fig.  5 C).

figure 4

Top 25 markers of quality. A low-quality marker genes. B high-quality marker genes. Genes whose expression correlates positively or negatively with low sample quality in multiple datasets were considered low- or high-quality markers, respectively. Computations were done on 13 datasets with the lowest QI index values (QI index < 0.18) and each representing a different disease. Comprehensive gene lists, including more low-quality markers found in 8 datasets and more high-quality markers found in 6 datasets are provided as supplementary material (Additional file 4 : Table S3 and Additional file 6 : Table S5)

figure 5

Gene set enrichment analysis. Database annotations of the low- and high-quality markers were used to find related regulators (top regulators in panels A and B , respectively) and pathways (top pathways in panels C and D , respectively). In the regulatory enrichment analysis, we found a regulation of low-quality markers by transcription factors ( A ; n  = 50), while high-quality markers are mostly regulated by miRNAs ( B ; n  = 302; 89% miRNAs and 11% transcription factors). In the pathway enrichment analysis, low-quality markers were notably enriched in mitochondria-related and ribosomal pathways. The pathway enrichment analysis for high-quality markers found various regulators and diseases

We found a total of 5243 high-quality markers occurring in at least 2 (15%) out of 13 datasets (Fig.  4 B, Table  1 and Additional file 6 : Table S5). There were high-quality markers occurring in up to 7 (54%) datasets. The list of top high-quality markers was significantly enriched in targets of 306 regulators including 280 (91.5%) miRNAs (Fig.  5 B and Additional file 7 : Table S6). There were also 19 (6.2%) transcription factors but they were not represented in the top 100 regulators of the gene set enrichment analysis. Some of these transcription factors were observed as low- (e.g., kdm7a or tfeb) or high-quality markers (e.g., taf9b or znf184). The list was also enriched in various molecular pathways, including some diseases (e.g., skin carcinogenesis, bladder cancer, uveal melanoma, diabetic nephropathy) and other pathways (e.g., uv response, serum response) (Fig.  5 D).

Recurrence of quality-associated genes in multiple data types

Adapting the approach above, we derived quality marker genes using protein-DNA binding data from 10 human ChIP-seq datasets targeting the H3K27ac histone mark. When considering only genes that arise in more than 20% of datasets in each data type (RNA-seq or ChIP-seq), we see an overlap of low- and high-quality marker genes of 438 and 298, respectively (Additional file 8 : Table S7 and Additional file 10 : S9). For the molecular pathways related to those markers in each data type, we see an overlap of pathways for low- and high-quality marker genes of 5 and 11, respectively (Additional file 9 : Table S8 and Additional file 11 : S10).

Since in ChIP-seq analysis, mitochondrial genes are blacklisted [ 27 ], they cannot be found in these overlaps. However, we still find the related pathway for oxidative phosphorylation enriched in the ChIP-seq analysis for low-quality markers, and subsequently in the overlap with RNA-seq-derived pathways.

Impact of quality on reproducibility

As seen above, quality markers were identified in a substantial proportion of the datasets. Considering the lists of significant differentially expressed genes from those datasets, if the quality markers are also found in the lists, they will reduce the reproducibility between gene-expression studies.

The proportion of quality markers in the top 500 differential genes had a strong linear relationship with the QI index of the related datasets: the higher the QI index, the higher the proportion of quality markers in the differential genes (Fig. 6A). The proportion increased approximately 2 times faster for unpaired datasets than for paired datasets. On average, this proportion was 13%, and it could represent up to 22% of the differential genes (namely, 110 of the top 500 differential genes).

figure 6

Quality and disease gene proportions in differential genes. To test the presence of quality markers or known disease genes in lists of differentially expressed genes (DEGs), we selected datasets with 50 samples or less, with at least 500 differential genes, and about a disease with at least 50 known genes. Only the top 500 DEGs per dataset and the top 50 known disease genes per disease were used in this analysis. For those datasets (black points on the plots), the proportion of potential quality markers ( A /left) or known disease genes ( B /right) in the DEGs on the y -axis is compared to the dataset QI index on the x -axis. Sample pairing design of the datasets is detailed (paired or unpaired design)

For the same datasets, we also compared the proportion of the top 50 known disease genes in the top 500 differential genes to the QI index (Fig.  6 B). There was a negative linear relationship: the higher the QI index, the lower the proportion of known disease genes in the differential genes. The proportion decreased approximately 2 times faster for unpaired datasets than for paired datasets. In 10 selected diseases (with more than 300 associated disease genes), potential markers for low- and high-quality constitute up to 19% and 9% of the 300 top-associated disease genes, respectively (Additional file 12 : Table S11). This proportion was significant or marginally significant for high-quality markers in half of the diseases (e.g., Alzheimer’s disease, colorectal neoplasms, amyotrophic lateral sclerosis) and significant for the low-quality markers in two datasets: Parkinson’s disease and amyotrophic lateral sclerosis. Notably, based on the literature analysis, quality markers cannot be easily filtered out from lists of disease genes as they could be highly ranked as known disease genes (Additional file 1 : Fig. S3).

Quality imbalance mitigation

The low- or high-quality information of the samples provided by machine learning was used either as a confounding factor in the differential gene analysis or to remove quality outlier samples from the datasets before the analysis, to see if it could impact the results of the downstream pathways enrichment analysis (Additional file 13 : Table S12).

When using the quality information as a confounding factor in the differential gene analysis, we could observe a reduction of low-quality marker pathways in the results of the downstream gene set enrichment analysis. However, the results did not change for most datasets. When using the quality information to filter out outlier samples, we observed a strong decrease in low-quality marker pathways for almost all datasets. A combination of both methods did not further impact the outcome of the enrichment analysis.
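The outlier-removal step that performed best can be sketched as a simple threshold filter on the per-sample quality probabilities. The sample names, probabilities, and the 0.75 cutoff below are illustrative only; the paper obtains the probabilities from a machine learning model and does not prescribe this particular threshold.

```python
def drop_quality_outliers(sample_quality, max_low_quality_prob=0.75):
    """Split samples into kept and removed sets based on their predicted
    probability of being low quality."""
    kept = {s: p for s, p in sample_quality.items() if p <= max_low_quality_prob}
    removed = sorted(set(sample_quality) - set(kept))
    return kept, removed

# Hypothetical per-sample probabilities of being low quality
sample_quality = {"ctrl_1": 0.10, "ctrl_2": 0.92, "dis_1": 0.30, "dis_2": 0.81}
kept, removed = drop_quality_outliers(sample_quality)
print(sorted(kept))  # samples entering the differential analysis
print(removed)       # ['ctrl_2', 'dis_2']
```

Filtering before the differential analysis, rather than modeling quality as a covariate, matches the option that showed the strongest reduction in low-quality marker pathways here.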

While Next Generation Sequencing has revolutionized biomedical science with an unprecedented amount of data and novel clinically relevant applications, reproducibility between research results is still limited even in well-designed studies. Although reasons for low reproducibility in poorly designed studies may be easily identified (e.g., imbalanced sex ratio or large age difference between sample groups), they are not obvious for well-designed studies. We hypothesized that differences between the quality of the samples within a study may bias the results and negatively impact reproducibility. Therefore, we studied 40 well-designed and clinically relevant datasets in order to evaluate and quantify the impact of sample quality differences on significant differential genes (disease vs control samples). We found that the expression profile of many genes correlated with quality, we classified those genes as markers of quality, and we were able to relate them to molecular regulators. Finally, we quantified the negative impact of QI between sample groups on the relevance of top differential genes.

Although the GEO database has only minimal requirements for standardized metadata [ 28 ], it is still the largest repository for gene expression datasets and contains a great variety of clinically relevant datasets. While exploring the GEO database for RNA-seq datasets, we found a large majority of poorly designed datasets and only a minority that could meet our inclusion standards. Patient sample metadata was often inconsistently distributed across the three data and information sources considered (scientific journal article, GEO, and SRA databases) and across different locations within each source (main text and supplementary files in journals; web pages and downloaded metadata files in GEO and SRA). Notably, the GEO and SRA web pages often contained considerably less information than the corresponding downloaded metadata files (GEOSeries and SraRunTable files, respectively). Yet the metadata was, in general, not sufficient to identify common confounding factors such as age and gender. Sample metadata was sometimes provided only as summary tables, giving age and gender balance in percentages within each comparison group but with no possibility to trace the information back to the sample level [ 29 , 30 , 31 ]. As another example, batch effects or sample pairing could be recognized from the journal article, but the information needed to identify the corresponding samples was missing or unclear [ 32 , 33 , 34 , 35 , 36 , 37 ]. Nevertheless, we used the available information to include 40 well-designed datasets according to our criteria. These criteria were also met thanks to a sub-selection of samples in several datasets to ensure homogeneity between the comparison groups (controls vs patient samples with similar age ranges and gender ratios). Because good experimental designs minimize the effect of common confounders (including batch) [ 38 ], we could focus our observations on the effect of sample quality on gene expression.

Sample quality was rarely mentioned in the dataset articles or databases, and sample-level quality information was provided only for a very few datasets, using the RNA integrity number [ 39 , 40 ]. While definitely useful for filtering out samples of the lowest quality, the RNA integrity number is not commonly used for a fine classification by quality [ 41 ]. This lack of reporting may indicate the low importance experimentalists place on further quality-related analyses after data generation. Thanks to a finer classification of sample quality using machine learning models, we observed that 35% of well-designed gene-expression datasets had a high QI between comparison groups (QI index > 0.3). Importantly, this result, derived from 40 independent, well-designed datasets free from other confounding factors, shows that a substantial proportion of the published datasets in the biomedical literature is flawed by quality problems. A comparable percentage was found in ChIP-seq experiments. It should be noted that when running the analysis with more robust, non-parametric measures such as Spearman’s correlation and the median-based central tendency difference [ 42 ], this ratio stayed rather stable at around 30%. Unfortunately, the repositories contained many more poorly designed or poorly described datasets in which additional confounding factors such as imbalances in age, gender, weight, or active medication would have to be considered; we could not include these, since they were not stratifiable. Interestingly, there was only a moderate negative correlation between the QI index and the journal impact factor.

The impact of such quality differences between sample groups has been largely overlooked in the literature (Fig.  1 ). The number of significant differential genes is expected to increase with the number of samples (by the definition of statistical tests), and it has been shown that this relation can be linear in gene expression studies [ 43 ]. Irrespective of the number of samples, we found that QI has a direct impact on the number of differential genes: the higher the QI, the more differential genes (Figs. 2 and 3 ). Between a perfectly balanced dataset and a perfectly imbalanced dataset, the number of differential genes could increase by 1222 genes on average. Selecting differential genes using both FDR and fold change cutoffs, as done by default in our study, seems to considerably reduce this difference in comparison to using FDR cutoffs only (Fig.  3 ), supporting a common usage in the literature. These results partly explain the difficulty that studies based on current sequencing technologies have in detecting small but relevant gene-expression changes.

By comparing gene expression profiles to sample quality across a sub-selection of the least quality-imbalanced datasets, we found many markers of low or high sample quality that can be associated neither with the diseases studied nor with other common confounding factors (Fig.  4 ). A number of markers could be identified in a large proportion of the studied datasets (up to 77%). Among low-quality markers, we found enrichment of (i) targets of transcription factors (e.g., psmb5, gtf2a2) (Fig.  5 A) and (ii) mitochondria-related and ribosomal pathways [ 23 , 44 ] (Fig.  5 C). This is consistent with the activation of additional regions of the gene regulatory network in response to stress in low-quality samples [ 45 , 46 ] and confirms that those marker genes are intrinsically linked to quality. Interestingly, the enrichment among high-quality markers (genes with lower expression in low-quality samples) points to miRNA targets (e.g., mir659, mir142) and targets of various regulators (Fig.  5 B and D , respectively). This result could reflect the role of miRNAs activated in low-quality samples in regulating stress response, cell repair, or cell proliferation mechanisms [ 47 , 48 , 49 , 50 , 51 ], resulting in decreased expression of particular genes, including various transcription factors, in the low-quality samples. Our results across datasets and diseases identify the molecular regulators (transcription factors and miRNAs) that are the most sensitive to quality changes. Research fields such as Systems Biology could use this information to calibrate regulatory network models, which are often sensitive to small changes in their parameters [ 52 ].

Given the high proportion of datasets in the literature that are highly quality imbalanced, it is likely that many differential genes are associated with quality rather than disease. Indeed, on the one hand, we found that the proportion of quality markers in the differential gene lists (up to 22%) positively correlated with the QI index of the dataset. Many of those recurring genes were also related to QI in ChIP-seq. For example, the ChIP-seq low-quality markers were still enriched for the oxidative phosphorylation pathway, despite mitochondrial genes being removed from the analysis through blacklists. This suggests, in this context, that genes related to the mitochondrial environment or mitochondrial pathways hold similar information about quality status as the mitochondrial RNA transcripts themselves. On the other hand, the proportion of known disease genes negatively correlated with the QI index (Fig.  6 ). Interestingly, those correlations were lower for paired-sample datasets. Provided that pairs of samples are not confounded with quality differences or batch processing, designing large datasets with sample pairing and applying appropriate statistics (paired statistical tests) would be the optimal way to maximize the relevance and reproducibility of RNA-seq results. We also found that known disease genes derived from the literature could be significantly enriched in quality markers (up to 19%). Although this questions the validity of the literature, we must note that RNA integrity can be impacted by a disease state, such as in some types of cancer [ 53 ]. Therefore, some genes could logically be both quality and disease markers. However, our analyses highlight the possibility that some genes deemed disease markers, even if recurrently observed in different experiments in the literature, may actually indicate quality bias. Further investigations paired with wet-lab experiments would be required to identify those genes more precisely.

Taken together, we have demonstrated that QI between sample groups impairs the reproducibility of clinically relevant RNA-seq results. In the future, it would be interesting to reproduce our analyses with many more datasets to test statistical associations with additional clinical parameters. We already mentioned an association with tumor stage [ 53 ], but other parameters may be relevant for other diseases. The collection of studied datasets could also be extended to a more comprehensive selection of diseases, or to non-disease datasets. It included many cancer-like diseases that could have influenced some of our results. We tried to minimize such an impact by carefully selecting the datasets used in each analysis. For example, quality markers were derived from a subset of the datasets with the lowest QI indices, each representing a different disease. It would also be interesting to study other sequencing data types, such as ATAC-seq or ChIP-seq, in more detail. Our preliminary results show the same proportion of high QI in clinically relevant human ChIP-seq datasets. These preliminary results should be further investigated and compared to the literature on non-biologically relevant binding sites [ 27 , 54 , 55 , 56 ]. We could also show that removing quality outliers decreases the number of pathways of low-quality marker genes in almost all cases. We suggest using the seqQscorer tool not only as a quality control tool but also as an additional means to search for outliers in the data, as these will not always coincide with outliers in a PCA projection [ 57 ].

Conclusions

A difference in quality between the sample groups of clinically relevant gene-expression datasets impairs reproducibility across studies through a compendium of genes that masks the relevance of true disease genes in statistical results. A large proportion of the published datasets present a substantial imbalance between the quality of the sample groups, which calls their results into question. Beyond individual studies showing sample quality to be a major confounder in a limited number of experimental conditions [ 57 , 58 , 59 ], the field of genome-wide data analysis would benefit from studies accurately comparing many possible confounders and their impact on gene expression and reproducibility, although the latter is expected to be low for some diseases [ 59 ] and some quality markers are likely specific to particular experimental conditions [ 12 ]. In this way, we will be able to optimize study design, moving away from generic blacklists towards more rational filters of gene-based results, specific to data type and study.

Methods

All the statistical methods, scripts, and software used in this study are described in this section.

RNA-seq datasets

Dataset metadata was derived by gathering information from the following three sources: the metadata from the GEO database (web pages and GEOSeries file), the metadata from the SRA database, which hosts the raw data files (SraRunTable.csv files), and the corresponding published article (PubMed Central or publisher’s web site: main article and supplementary files if available). Starting from the GEO database, we searched these metadata sources to select 40 well-designed, publicly available, and clinically relevant RNA-seq datasets matching the following criteria:

Assay: single-end or paired-end Illumina sequencing technology.

Samples: human primary cells or tissues from a disease and its control group. Duplicate samples were not used.

Design: At least four samples per group. All samples from the same batch, if batch information was documented. Paired or unpaired samples.

Sample groups: Samples were selected to balance the age range and gender ratio per group. When possible, only samples of the same gender were selected (either male or female). If documented, other clinical factors were used to balance the sample groups, such as BMI, ethnicity, or disease-relevant mutations (no mutations if possible).

If samples were paired and if pairing identifiers were available at the sample level, we used the information to run appropriate analysis methods.

Quality imbalance (QI) index and quality marker genes

For RNA-seq or ChIP-seq data, sample quality was evaluated by the seqQscorer machine learning tool [ 26 ], which returns a probability of low quality per sample: P_low. For each dataset, we first derived a P_low value for each sample. Additional file 14 : Table S13 contains the list of all samples with their quality features and low-quality probabilities P_low, as used and returned by seqQscorer, respectively. Additional file 1 : Fig. S7A shows a heatmap of FastQC’s ordinal output (0 = fail, 1 = warn, 2 = pass) and the mapping metrics of Bowtie2 in percent. Additional file 1 : Fig. S7B and S7C show the distributions of uniquely mapped reads and of sequence duplication levels over bins of P_low, two of the most relevant features for the classification of the samples by quality. Using these low-quality probabilities, we derived a quality imbalance (QI) index for each dataset to quantify how much the sample quality is confounded with the sample group (control or disease). The QI index is equal to the absolute value of Pearson’s correlation coefficient between the P_low values and the sample group numerical codes (0 for control and 1 for disease, or the other way around); this is equivalent to a point-biserial correlation, usually used for correlations between numerical and dichotomous variables [ 60 ]. The QI index is equal to 0 if the quality is not confounded with the group, and equal to 1 if the quality is fully confounded with the group (e.g., all disease samples have lower quality than control samples, or vice versa). Additional file 1 : Fig. S8 shows exemplary datasets and their distributions of samples over P_low. After visual inspection of the sample low-quality probabilities within the datasets, a QI index greater than 0.30 was considered high, and a QI index less than 0.18 was considered low. These cutoffs were also chosen so that the two resulting groups of RNA-seq datasets would not be too small for analytical purposes: there were 18 low-QI datasets and 14 high-QI datasets.
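The QI index definition above can be sketched in a few lines of Python; this is a minimal illustration of the definition, not the authors' code. For a binary 0/1 grouping variable, Pearson's coefficient equals the point-biserial correlation.

```python
import numpy as np

def qi_index(p_low, groups):
    """Quality imbalance index of a dataset.

    p_low  : per-sample low-quality probabilities (e.g., from seqQscorer)
    groups : sample group codes (0 = control, 1 = disease)

    Pearson's r against a 0/1 variable is the point-biserial correlation;
    the QI index is its absolute value.
    """
    r = np.corrcoef(np.asarray(p_low, float), np.asarray(groups, float))[0, 1]
    return abs(r)

# Quality fully confounded with group: QI index close to 1.
qi_index([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
# Quality independent of group: QI index close to 0.
qi_index([0.1, 0.9, 0.1, 0.9], [0, 0, 1, 1])
```

A dataset with all low-quality samples in one group therefore scores near 1 and would fall into the high-QI group under the 0.30 cutoff.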

For RNA-seq data, quality marker genes are genes whose expression strongly correlates with P_low values independently in multiple datasets. A Pearson’s correlation coefficient with an absolute value greater than 0.4 was considered a strong correlation. A gene is a low-quality marker if its expression significantly and positively correlates with P_low in multiple datasets, and a high-quality marker if its expression significantly and negatively correlates with P_low in multiple datasets.
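The recurrence-based marker definition can be sketched as follows. This is a simplified sketch: it applies only the correlation cutoff, whereas the study additionally required statistical significance, and all names and data structures are illustrative.

```python
import numpy as np

def pearson_r(x, y):
    """Plain Pearson correlation coefficient between two vectors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    x, y = x - x.mean(), y - y.mean()
    return float((x * y).sum() / np.sqrt((x * x).sum() * (y * y).sum()))

def recurrent_markers(expr_by_dataset, p_low_by_dataset, r_cut=0.4, min_datasets=2):
    """Count, per gene, in how many datasets its expression correlates with
    P_low beyond +/- r_cut; genes recurring in >= min_datasets datasets are
    called low-quality (positive r) or high-quality (negative r) markers."""
    low_counts, high_counts = {}, {}
    for ds, genes in expr_by_dataset.items():
        p_low = p_low_by_dataset[ds]
        for gene, expr in genes.items():
            r = pearson_r(expr, p_low)
            if r > r_cut:
                low_counts[gene] = low_counts.get(gene, 0) + 1
            elif r < -r_cut:
                high_counts[gene] = high_counts.get(gene, 0) + 1
    return ({g for g, n in low_counts.items() if n >= min_datasets},
            {g for g, n in high_counts.items() if n >= min_datasets})
```

Requiring recurrence across independent datasets is what separates quality markers from genes whose correlation with P_low is disease- or dataset-specific.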

For ChIP-seq data, quality marker bins are bins whose enrichment values (see definition below) correlate with P_low values across samples independently in multiple datasets (within each dataset, only bins with peaks in at least 3 samples were considered for correlations). For quality marker bins, a Pearson’s correlation coefficient with an absolute value greater than 0.3 was considered significant; we used a less stringent cutoff because the binning and the subsequent annotation of bins to the nearest gene introduce uncertainty into the gene-to-quality correlation. A bin is a low-quality marker bin if its peak enrichment values significantly and positively correlate with P_low in multiple datasets, and a high-quality marker bin if they significantly and negatively correlate with P_low in multiple datasets.

Alternative metrics to define the QI index

As a first alternative, the QI index of a dataset was defined as the absolute value of Spearman’s correlation coefficient between the P_low values of the samples and the sample group numerical codes (0 for control and 1 for disease, or the other way around). This method has the advantage of producing a QI index that also ranges from 0 to 1, so we used the same cutoff values to define the low- and high-QI groups of datasets (0.18 and 0.3, respectively). As a second alternative, the QI index of a dataset was defined as the central tendency difference, CTDiff, as defined by Lötsch and Ultsch [ 42 ]. Briefly, the CTDiff of a dataset is equal to the absolute difference between the median P_low values of the control and disease groups of samples, divided by the expected value of the absolute inner differences in the dataset. This metric is bounded below by 0 but has no upper limit. After manual review, we set the cutoff values defining the low- and high-QI groups of datasets to 0.5 and 1, respectively, a much more stringent cutoff than for the two correlation metrics.
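A direct reading of this CTDiff description can be sketched in Python. This implements the formula as stated in the text (median difference over the mean absolute pairwise difference of all values); it is not the authors' reference implementation and may differ from it in normalization details.

```python
import numpy as np

def ct_diff(control, disease):
    """Central tendency difference of two groups of P_low values:
    |median(control) - median(disease)| divided by the mean absolute
    difference over all pairs of samples in the pooled dataset."""
    a, b = np.asarray(control, float), np.asarray(disease, float)
    pooled = np.concatenate([a, b])
    n = len(pooled)
    # expected absolute inner difference over all distinct ordered pairs
    denom = np.abs(pooled[:, None] - pooled[None, :]).sum() / (n * (n - 1))
    return abs(np.median(a) - np.median(b)) / denom
```

Unlike the two correlation-based indices, this ratio has no upper bound, which is why a separate pair of cutoffs (0.5 and 1) was needed.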

Simulated RNA-seq subsets

To study the impact of quality imbalance under similar experimental conditions, we created different subsets of 20 samples from 3 of the largest datasets in our selection (GSE105130, GSE174330, and GSE54456). These 3 datasets were selected to have at least 50 samples each and a broad distribution of P_low values. Each subset was composed of 10 samples in the control group and 10 samples in the disease group from the same dataset. The 10 samples of a given group (control or disease) were all randomly selected either from the 15 top-quality samples (lowest P_low values) or from the 15 bottom-quality samples (highest P_low values) of the same group in the source dataset. In a first iteration, for each dataset, we sampled 2 control and 2 disease groups of 10 samples each and combined them in pairs to build 4 data subsets as follows: bottom-quality control vs bottom-quality disease samples, bottom-quality control vs top-quality disease samples, top-quality control vs bottom-quality disease samples, and top-quality control vs top-quality disease samples. We then performed a total of 3 iterations to produce 12 subsets of different QI indices for each source dataset. It was not always possible to cover the theoretical range of P_low values from 0 to 1, as the groups sometimes had an intrinsic QI bias.
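The per-group sampling step can be sketched as follows; the function name, seed handling, and data structures are illustrative assumptions, not the authors' code.

```python
import random

def quality_subset(samples, p_low, top=True, k=10, pool=15, seed=None):
    """Draw k samples at random from one group's `pool` extreme-quality
    samples: the lowest-P_low samples if top=True (top quality), else the
    highest-P_low samples (bottom quality)."""
    rng = random.Random(seed)
    ranked = sorted(samples, key=lambda s: p_low[s], reverse=not top)
    return rng.sample(ranked[:pool], k)

# One dataset, one group: 20 samples with increasing low-quality probability.
samples = [f"s{i}" for i in range(20)]
p_low = {f"s{i}": i / 20 for i in range(20)}
top_draw = quality_subset(samples, p_low, top=True, seed=1)      # drawn from s0..s14
bottom_draw = quality_subset(samples, p_low, top=False, seed=1)  # drawn from s5..s19
```

Pairing a top-quality draw of one group with a bottom-quality draw of the other yields the most imbalanced subsets, while same-quality pairings yield the most balanced ones.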

Processing and analysis of RNA-seq data

The RNA-seq data was downloaded from the SRA database (single-end reads: 10 M reads downloaded and 1 M randomly selected for analysis; paired-end reads: 10 M read pairs downloaded and 1 M randomly selected for analysis). Sequencing reads were mapped to the transcriptome using Salmon v1.6.0 (index with decoy; quantification parameters: –seqBias –gcBias –posBias –reduceGCMemory; transcriptome GRCh38.101). Sample quality control was performed using Picard v2.23.6 and MultiQC v1.9 [ 61 , 62 ]. Data manipulation was done using Samtools v1.9, seqtk v1.3, and bedtools v2.29.2 [ 63 , 64 , 65 ]. Differential genes were derived using DESeq2 v1.22.1 [ 66 ]. Unless stated otherwise, significance was defined as an adjusted p-value < 0.05 and an absolute log2 fold change > 1 (p-value adjustment using the Benjamini–Hochberg method). Gene set enrichment analysis was done using the function fora from the fgsea package v1.20.0 [ 67 ] and gene annotations from MSigDB v7.4 [ 68 , 69 ] and GS2D [ 70 ].

Processing and analysis of ChIP-seq data

We selected 10 human ChIP-seq datasets from the GEO database targeting the H3K27ac histone mark and having a design with a healthy group and a disease group. The ChIP-seq data was aligned to the genome using bowtie2 v2.3.5 and Samtools v1.9, and significant peaks were called using MACS2 v2.2.7 (narrow peak mode; adjusted p-value < 0.05) [ 71 ]. Peaks were assigned to genomic bins of 500 bp using bedtools v2.29.2 (function makewindows). The bedtools functions intersect, groupby, and coverage were used to count the number of peaks per bin and to generate min, max, and mean peak enrichment values for each bin.

For the gene set enrichment analysis of the ChIP-seq data, the KEGG subset of canonical pathways included in MSigDB version 7.4 was used along with the human reference genome hg38 (fgsea function from the R package fgsea with parameters minSize = 15, maxSize = 500, and scoreType = “pos”).

Mitigating impact of quality imbalances

When using the quality information as a confounding factor in the differential gene analysis, we used 1 − P_low as a continuous covariate in DESeq2, so that samples of higher quality received a higher covariate value. This reflects the fact that the low-quality marker genes we found were more highly expressed in samples with high P_low, and the model should weight those samples lower.

When using the quality information to filter out outlier samples of a given dataset, outlier samples were defined as samples with P_low values more than 1.5 times the interquartile range above the third quartile or below the first quartile of the P_low values within the dataset (function is_outlier() from the rstatix package [ 72 ]). The impact of these two mitigation methods was evaluated for each dataset on the lists of pathways from the gene set enrichment analysis of the differential genes (differential pathways) and of the low-quality markers (low-quality pathways). We first calculated the percentage of low-quality pathways among the differential pathways with and without each mitigation method. We then compared the mitigation methods only for datasets in which the pathway overlap contained at least 1 pathway. A mitigation method was deemed to have a positive or negative impact if the overlap decreased or increased by at least 15%, respectively.
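The outlier rule can be sketched in Python. This reimplements the standard 1.5 × IQR (Tukey) convention that rstatix's is_outlier() follows; it is a sketch, not the package itself, and quartile interpolation details may differ slightly.

```python
import numpy as np

def quality_outliers(p_low, k=1.5):
    """Tukey-style rule: flag samples whose P_low lies more than k * IQR
    above the third quartile or below the first quartile."""
    p_low = np.asarray(p_low, float)
    q1, q3 = np.percentile(p_low, [25, 75])
    iqr = q3 - q1
    return (p_low < q1 - k * iqr) | (p_low > q3 + k * iqr)

# One clearly degraded sample is flagged and dropped before the analysis.
p = [0.10, 0.12, 0.11, 0.13, 0.12, 0.95]
kept = [x for x, out in zip(p, quality_outliers(p)) if not out]
```

Because the fences are computed within each dataset, a sample is only an outlier relative to its own study, not on an absolute quality scale.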

Availability of data and materials

All the data used in this study is publicly available at the GEO database. GEO identifiers of the ChIP-seq datasets: GSE104404, GSE107734, GSE112342, GSE120738, GSE124070, GSE126571, GSE128242, GSE139377, GSE153875, GSE74230. GEO identifiers of the RNA-seq datasets: GSE100925, GSE105130, GSE108643, GSE114564, GSE116250, GSE117875, GSE119834, GSE123496, GSE125583, GSE147339, GSE148355, GSE159851, GSE25599, GSE54456, GSE73189, GSE74697, GSE76220, GSE85567, GSE101432, GSE117970, GSE133039, GSE135036, GSE136630, GSE144269, GSE147352, GSE148036, GSE151243, GSE151282, GSE155800, GSE164158, GSE164213, GSE165595, GSE173078, GSE174330, GSE179397, GSE179448, GSE182440, GSE77314, GSE79362 and GSE82177.

Begley CG, Ioannidis JP. Reproducibility in science: improving the standard for basic and preclinical research. Circ Res. 2015;116(1):116–26.

Gilmore RO, Diaz MT, Wyble BA, Yarkoni T. Progress toward openness, transparency, and reproducibility in cognitive neuroscience. Ann N Y Acad Sci. 2017;1396(1):5–18.

Errington TM, Iorns E, Gunn W, Tan FE, Lomax J, Nosek BA. An open investigation of the reproducibility of cancer biology research. Elife. 2014;10(3):e04333.

Prinz F, Schlange T, Asadullah K. Believe it or not: how much can we rely on published data on potential drug targets? Nat Rev Drug Discov. 2011;10(9):712.

Hutson M. Artificial intelligence faces reproducibility crisis. Science. 2018;359(6377):725–6.

Stodden V, McNutt M, Bailey DH, Deelman E, Gil Y, Hanson B, et al. Enhancing reproducibility for computational methods. Science. 2016;354(6317):1240–1.

Marcial LH, Hemminger BM. Scientific data repositories on the Web: An initial survey. J Am Soc Inform Sci Technol. 2010;61(10):2029–48.

Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016;3: 160018.

Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, et al. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet. 2001;29(4):365–71.

Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, et al. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res. 2013;41(Database issue):D991-5.

Tonzani S, Fiorani S. The STAR Methods way towards reproducibility and open science. iScience. 2021;24(4).

Sprang M, Krüger M, Andrade-Navarro MA, Fontaine JF. Statistical guidelines for quality control of next-generation sequencing techniques. Life Sci Alliance. 2021;4(11):e202101113.

Consortium EP. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science. 2004;306(5696):636–40.

Jacob L, Gagnon-Bartsch JA, Speed TP. Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed. Biostatistics. 2016;17(1):16–28.

Leek JT. svaseq: removing batch effects and other unwanted noise from sequencing data. Nucleic Acids Res. 2014;42(21): e161.

Zhang Y, Parmigiani G, Johnson WE. ComBat-seq: batch effect adjustment for RNA-seq count data. NAR Genom Bioinform. 2020;2(3):lqaa078.

Murkin JT, Amos HE, Brough DW, Turley KD. In Silico Modeling Demonstrates that User Variability During Tumor Measurement Can Affect In Vivo Therapeutic Efficacy Outcomes. Cancer Inform. 2022;21:11769351221139256.

Chao HP, Chen Y, Takata Y, Tomida MW, Lin K, Kirk JS, et al. Systematic evaluation of RNA-Seq preparation protocol performance. BMC Genomics. 2019;20(1):571.

Simeon-Dubach D, Perren A. Better provenance for biobank samples. Nature. 2011;475(7357):454–5.

Soneson C, Gerster S, Delorenzi M. Batch effect confounding leads to strong bias in performance estimates obtained by cross-validation. PLoS ONE. 2014;9(6): e100335.

Hamilton DG, Page MJ, Finch S, Everitt S, Fidler F. How often do cancer researchers make their data and code available and what factors are associated with sharing? BMC Med. 2022;20(1):438.

Simeon-Dubach D, Burt AD, Hall PA. Quality really matters: the need to improve specimen quality in biomedical research. J Pathol. 2012;228(4):431–3.

Subramanian A, Alperovich M, Yang Y, Li B. Biology-inspired data-driven quality control for scientific discovery in single-cell transcriptomics. Genome Biol. 2022;23(1):267.

10x Genomics®. CG000130 Rev A Technical Note – Removal of Dead Cells from Single Cell Suspensions Improves Performance for 10xGenomics® Single Cell Applications. 2019. https://www.10xgenomics.com/support/single-cell-gene-expression/documentation/steps/sample-prep/removal-of-dead-cells-from-single-cell-suspensions-improves-performance-for-10-x-genomics-r-single-cell-applications . Accessed 01 Jun 2023.

Wilms R, Mäthner E, Winnen L, Lanwehr R. Omitted variable bias: A threat to estimating causal relationships. Methods in Psychology. 2021;5: 100075.

Albrecht S, Sprang M, Andrade-Navarro MA, Fontaine JF. seqQscorer: automated quality control of next-generation sequencing data using machine learning. Genome Biol. 2021;22(1):75.

Amemiya HM, Kundaje A, Boyle AP. The ENCODE Blacklist: Identification of Problematic Regions of the Genome. Sci Rep. 2019;9(1):9354.

Wang Z, Lachmann A, Ma’ayan A. Mining data and metadata from the gene expression omnibus. Biophys Rev. 2019;11(1):103–10.

Nicodemus-Johnson J, Myers RA, Sakabe NJ, Sobreira DR, Hogarth DK, Naureckas ET, et al. DNA methylation in lung cells is associated with asthma endotypes and genetic risk. JCI Insight. 2016;1(20):e90151.

Li B, Tsoi LC, Swindell WR, Gudjonsson JE, Tejasvi T, Johnston A, et al. Transcriptome analysis of psoriasis in a large case-control sample: RNA-seq provides insights into disease mechanisms. J Invest Dermatol. 2014;134(7):1828–38.

Jin Y, Lee WY, Toh ST, Tennakoon C, Toh HC, Chow PK, et al. Comprehensive analysis of transcriptome profiles in hepatocellular carcinoma. J Transl Med. 2019;17(1):273.

Cassetta L, Fragkogianni S, Sims AH, Swierczak A, Forrester LM, Zhang H, et al. Human Tumor-Associated Macrophage and Monocyte Transcriptional Landscapes Reveal Cancer-Specific Reprogramming, Biomarkers, and Therapeutic Targets. Cancer Cell. 2019;35(4):588-602 e10.

Bondar G, Togashi R, Cadeiras M, Schaenman J, Cheng RK, Masukawa L, et al. Association between preoperative peripheral blood mononuclear cell gene expression profiles, early postoperative organ function recovery potential and long-term survival in advanced heart failure patients undergoing mechanical circulatory support. PLoS ONE. 2017;12(12):e0189420.

Garrido-Martin EM, Mellows TWP, Clarke J, Ganesan AP, Wood O, Cazaly A. M1hot tumor-associated macrophages boost tissue-resident memory T cells infiltration and survival in human lung cancer. J Immunother Cancer. 2020;8(2):e000778.

Suppli MP, Rigbolt KTG, Veidal SS, Heeboll S, Eriksen PL, Demant M, et al. Hepatic transcriptome signatures in patients with varying degrees of nonalcoholic fatty liver disease compared with healthy normal-weight individuals. Am J Physiol Gastrointest Liver Physiol. 2019;316(4):G462–72.

Wang Y, Tatakis DN. Human gingiva transcriptome during wound healing. J Clin Periodontol. 2017;44(4):394–402.

Kim SK, Kim SY, Kim JH, Roh SA, Cho DH, Kim YS, et al. A nineteen gene-based risk score classifier predicts prognosis of colorectal cancer patients. Mol Oncol. 2014;8(8):1653–66.

Auer PL, Doerge RW. Statistical design and analysis of RNA sequencing data. Genetics. 2010;185(2):405–16.

Li P, Ensink E, Lang S, Marshall L, Schilthuis M, Lamp J, et al. Hemispheric asymmetry in the human brain and in Parkinson’s disease is linked to divergent epigenetic patterns in neurons. Genome Biol. 2020;21(1):61.

Lim Y, Beane-Ebel JE, Tanaka Y, Ning B, Husted CR, Henderson DC, et al. Exploration of alcohol use disorder-associated brain miRNA-mRNA regulatory networks. Transl Psychiatry. 2021;11(1):504.

Wang L, Nie J, Sicotte H, Li Y, Eckel-Passow JE, Dasari S, et al. Measure transcript integrity using RNA-seq data. BMC Bioinformatics. 2016;17:58.

Lötsch J, Ultsch A. A non-parametric effect-size measure capturing changes in central tendency and data distribution shape. PLoS ONE. 2020;15(9):e0239623.

Boer JM, Huber WK, Sultmann H, Wilmer F, von Heydebreck A, Haas S, et al. Identification and classification of differentially expressed genes in renal cell carcinoma by expression profiling on a global human 31,500-element cDNA array. Genome Res. 2001;11(11):1861–70.

Marquez-Jurado S, Diaz-Colunga J, das Neves RP, Martinez-Lorente A, Almazan F, Guantes R, et al. Mitochondrial levels determine variability in cell death by modulating apoptotic gene expression. Nat Commun. 2018;9(1):389.

Vihervaara A, Duarte FM, Lis JT. Molecular mechanisms driving transcriptional stress responses. Nat Rev Genet. 2018;19(6):385–97.

Zhang Q, Andersen ME. Dose response relationship in anti-stress gene regulatory networks. PLoS Comput Biol. 2007;3(3): e24.

Herrera JA, Schwartz MA. MicroRNAs in Mechanical Homeostasis. Cold Spring Harb Perspect Med. 2022;12(8):a041220.

Chiarella E, Aloisio A, Scicchitano S, Bond HM, Mesuraca M. Regulatory Role of microRNAs Targeting the Transcription Co-Factor ZNF521 in Normal Tissues and Cancers. Int J Mol Sci. 2021;22(16):8461.

Jie M, Feng T, Huang W, Zhang M, Feng Y, Jiang H, Wen Z. Subcellular Localization of miRNAs and Implications in Cellular Homeostasis. Genes (Basel). 2021;12(6):856.

Valeri N, Gasparini P, Fabbri M, Braconi C, Veronese A, Lovat F, et al. Modulation of mismatch repair and genomic stability by miR-155. Proc Natl Acad Sci U S A. 2010;107(15):6982–7.

Babar IA, Slack FJ, Weidhaas JB. miRNA modulation of the cellular stress response. Future Oncol. 2008;4(2):289–98.

Macneil LT, Walhout AJ. Gene regulatory networks and the role of robustness and stochasticity in the control of gene expression. Genome Res. 2011;21(5):645–57.

Zheng XH, Zhou T, Li XZ, Zhang PF, Jia WH. Banking of Tumor Tissues: Effect of Preanalytical Variables in the Phase of Pre- and Postacquisition on RNA Integrity. Biopreserv Biobank. 2023;21(1):56–64.

Andreani T, Albrecht S, Fontaine JF, Andrade-Navarro MA. Computational identification of cell-specific variable regions in ChIP-seq data. Nucleic Acids Res. 2020;48(9):e53.

Wreczycka K, Franke V, Uyar B, Wurmus R, Bulut S, Tursun B, et al. HOT or not: examining the basis of high-occupancy target regions. Nucleic Acids Res. 2019;47(11):5735–45.

Jain D, Baldi S, Zabel A, Straub T, Becker PB. Active promoters give rise to false positive “Phantom Peaks” in ChIP-seq experiments. Nucleic Acids Res. 2015;43(14):6959–68.

Sprang M, Andrade-Navarro MA, Fontaine JF. Batch effect detection and correction in RNA-seq data using machine-learning-based automated assessment of quality. BMC Bioinformatics. 2022;23(Suppl 6):279.

Ilicic T, Kim JK, Kolodziejczyk AA, Bagger FO, McCarthy DJ, Marioni JC, et al. Classification of low quality cells from single-cell RNA-seq data. Genome Biol. 2016;17:29.

Hoffman GE, Jaffe AE, Gandal MJ, Collado-Torres L, Sieberts SK, Devlin B, et al. Comment on: What genes are differentially expressed in individuals with schizophrenia? A systematic review Mol Psychiatry. 2023;28(2):523–5.

MacCallum RC, Zhang S, Preacher KJ, Rucker DD. On the practice of dichotomization of quantitative variables. Psychol Methods. 2002;7(1):19–40.

Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32(19):3047–8.

Picard toolkit. https://broadinstitute.github.io/picard/ : Broad Institute; 2019.

Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–2.

Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. Twelve years of SAMtools and BCFtools. Gigascience. 2021;10(2):giab008.

Li H. seqtk. https://github.com/lh3/seqtk 2023.

Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550.

Korotkevich G, Sukhov V, Budin N, Shpak B, Artyomov MN, Sergushichev A. Fast gene set enrichment analysis. bioRxiv 060012. https://doi.org/10.1101/060012

Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–50.

Liberzon A, Subramanian A, Pinchback R, Thorvaldsdottir H, Tamayo P, Mesirov JP. Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011;27(12):1739–40.

Andrade-Navarro MA, Fontaine JF. Gene set to Diseases (GS2D): disease enrichment analysis on human gene sets with literature data. Genomics and Computational Biology. 2016;2(1):e33.

Liu T. Use model-based Analysis of ChIP-Seq (MACS) to analyze short reads generated by sequencing protein-DNA interactions in embryonic stem cells. Methods Mol Biol. 2014;1150:81–95.

Kassambara A. rstatix. https://rpkgs.datanovia.com/rstatix/ 2023.

Download references






