Experimental manipulation refers to the intentional alteration of an independent variable by a researcher to observe its effects on a dependent variable in a controlled setting. This technique is crucial for establishing causal relationships and understanding how specific factors influence behavior, communication, and social dynamics among animals.

  • In studies involving visual signals, researchers might manipulate the color or intensity of signals to understand how these changes affect animal communication and responses.
  • Manipulating physical traits associated with sexual dimorphism can help scientists observe differences in behavior or mating success between the sexes in various species.
  • In dominance hierarchies, experimental manipulation might involve altering social structures to see how changes impact aggression or submission behaviors among individuals.
  • Behavioral plasticity can be examined by manipulating environmental conditions, allowing researchers to observe how animals adapt their behaviors based on specific stimuli or stressors.
  • Through experimental manipulation, scientists can provide insights into the mechanisms behind evolutionary adaptations, allowing for predictions about future behavioral changes.

  • Experimental manipulation allows researchers to systematically change aspects of visual signals, such as color or size, and then measure animal responses. By observing how alterations in these signals affect communication, researchers can identify which traits are most effective in conveying information. This helps clarify the role of visual signals in social interactions and mating behaviors among different species.
  • By experimentally manipulating traits that differ between sexes, such as size or coloration, researchers can assess how these differences influence behaviors like mate choice or competition. For example, if male coloration is altered, scientists can observe changes in female preferences during mating rituals. This experimental approach sheds light on the evolutionary pressures that shape sexual dimorphism and its implications for reproductive strategies.
  • Experimental manipulation is crucial for studying behavioral plasticity because it allows researchers to create specific environmental scenarios and observe how different species adapt their behaviors in response. For instance, altering food availability or social structures can reveal insights into survival strategies and adaptability. Such manipulative experiments deepen our understanding of evolutionary biology, inform conservation efforts, and help predict behavioral trends in changing environments.

Independent Variable: The factor that is changed or manipulated in an experiment to test its effects on the dependent variable.

Dependent Variable: The outcome or response that is measured in an experiment to assess the impact of the independent variable.

Controlled Experiment: An experimental setup where all variables except the independent variable are kept constant to ensure that any observed effects can be attributed solely to the manipulation.

© 2024 fiveable inc. all rights reserved.


What Is a Controlled Experiment? | Definitions & Examples

Published on April 19, 2021 by Pritha Bhandari. Revised on June 22, 2023.

In experiments, researchers manipulate independent variables to test their effects on dependent variables. In a controlled experiment, all variables other than the independent variable are controlled or held constant so they don’t influence the dependent variable.

Controlling variables can involve:

  • holding variables at a constant or restricted level (e.g., keeping room temperature fixed).
  • measuring variables to statistically control for them in your analyses.
  • balancing variables across your experiment through randomization (e.g., using a random order of tasks).
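Randomizing task order, the third method above, can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed procedure: the task names and the seeding scheme are hypothetical. Each participant receives an independent random permutation of the tasks, so order effects are spread across the sample instead of being tied to one fixed sequence.

```python
import random

# Hypothetical task names; replace them with the tasks in your own study.
TASKS = ["view_ad", "price_estimate", "brand_recall"]


def task_order(participant_id: int, base_seed: int = 2023) -> list:
    """Return a random order of TASKS for one participant.

    The generator is seeded from the participant ID, so the same
    participant always gets the same order (useful for auditing).
    """
    rng = random.Random(base_seed * 100_000 + participant_id)
    return rng.sample(TASKS, k=len(TASKS))  # a random permutation of all tasks


for pid in range(1, 4):
    print(pid, task_order(pid))
```

Seeding from the participant ID is only a convenience for reproducing the orders later; an unseeded generator would balance order effects just as well.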


Control in experiments is critical for internal validity, which allows you to establish a cause-and-effect relationship between variables. Strong validity also helps you avoid research biases, particularly ones related to issues with generalizability (like sampling bias and selection bias).

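For example, suppose you’re testing whether the color of an advertisement affects how much people are willing to pay for fast food:

  • Your independent variable is the color used in advertising.
  • Your dependent variable is the price that participants are willing to pay for a standard fast food meal.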

Extraneous variables are factors that you’re not interested in studying, but that can still influence the dependent variable. For strong internal validity, you need to remove their effects from your experiment.

In the fast food advertising study, extraneous variables might include:

  • Design and description of the meal
  • Study environment (e.g., temperature or lighting)
  • Participant’s frequency of buying fast food
  • Participant’s familiarity with the specific fast food brand
  • Participant’s socioeconomic status

You can control some variables by standardizing your data collection procedures. All participants should be tested in the same environment with identical materials. Only the independent variable (e.g., ad color) should be systematically changed between groups.

Other extraneous variables can be controlled through your sampling procedures. Ideally, you’ll select a sample that’s representative of your target population by using relevant inclusion and exclusion criteria (e.g., including participants from a specific income bracket, and not including participants with color blindness).

By measuring extraneous participant variables (e.g., age or gender) that may affect your experimental results, you can also include them in later analyses.

After gathering your participants, you’ll need to place them into groups to test different independent variable treatments. The types of groups and method of assigning participants to groups will help you implement control in your experiment.

Control groups

Controlled experiments require control groups. Control groups allow you to test a comparable treatment, no treatment, or a fake treatment (e.g., a placebo to control for a placebo effect), and compare the outcome with your experimental treatment.

You can assess whether it’s your treatment specifically that caused the outcomes, or whether time or any other treatment might have resulted in the same effects.

To test the effect of colors in advertising, each participant is placed in one of two groups:

  • A control group that’s presented with red advertisements for a fast food meal.
  • An experimental group that’s presented with green advertisements for the same fast food meal.
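Once outcomes are collected, the control and experimental groups are compared. A minimal sketch, assuming made-up willingness-to-pay values, computes Welch's t statistic, a two-sample comparison that does not assume equal variances in the two groups. Turning t into a p-value requires the t distribution, for example via `scipy.stats.ttest_ind(..., equal_var=False)` if SciPy is available.

```python
import math
from statistics import mean, variance


def welch_t(control, experimental):
    """Return Welch's t statistic and its approximate degrees of freedom.

    A positive t means the experimental group scored higher on average.
    """
    n1, n2 = len(control), len(experimental)
    se1_sq = variance(control) / n1       # squared standard error, control
    se2_sq = variance(experimental) / n2  # squared standard error, experimental
    t = (mean(experimental) - mean(control)) / math.sqrt(se1_sq + se2_sq)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se1_sq + se2_sq) ** 2 / (se1_sq**2 / (n1 - 1) + se2_sq**2 / (n2 - 1))
    return t, df


# Hypothetical willingness-to-pay values (in dollars) for each ad condition.
red_ads = [5.50, 6.00, 6.25, 5.75, 6.10]    # control group
green_ads = [6.40, 6.80, 6.30, 7.00, 6.60]  # experimental group

t_stat, df = welch_t(red_ads, green_ads)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```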

Random assignment

To avoid systematic differences and selection bias between the participants in your control and treatment groups, you should use random assignment.

This helps ensure that any extraneous participant variables are evenly distributed, allowing for a valid comparison between groups.
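Random assignment itself is straightforward to implement. The following sketch uses hypothetical participant IDs and an optional seed for reproducibility. It shuffles the full list once and splits it in half, which guarantees groups of equal or nearly equal size; flipping a coin for each participant would also be random but can leave the groups unbalanced in size.

```python
import random


def randomly_assign(participants, seed=None):
    """Randomly split participants into a control and an experimental group.

    The list is shuffled once and cut in half, so group sizes differ by
    at most one. Pass a seed to make the assignment reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "experimental": shuffled[half:]}


# Hypothetical participant IDs.
ids = [f"P{n:02d}" for n in range(1, 11)]
groups = randomly_assign(ids, seed=42)
print(groups)
```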

Random assignment is a hallmark of a “true experiment”: it differentiates true experiments from quasi-experiments.

Masking (blinding)

Masking in experiments means hiding condition assignment from participants or researchers, or, in a double-blind study, from both. It’s often used in clinical studies that test new treatments or drugs and is critical for avoiding several types of research bias.

Sometimes, researchers may unintentionally encourage participants to behave in ways that support their hypotheses, leading to observer bias. In other cases, cues in the study environment may signal the goal of the experiment to participants and influence their responses. These are called demand characteristics. If participants behave a particular way due to awareness of being observed (called a Hawthorne effect), your results could be invalidated.

Using masking means that participants don’t know whether they’re in the control group or the experimental group. This helps you control biases from participants or researchers that could influence your study results.

For example, in the advertising study, you could use an online survey form to present the advertisements to participants and leave the room while each participant completes the survey on the computer, so that you can’t tell which condition each participant was in.
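In computerized studies, a complementary step is to keep condition names out of the data that researchers see during collection and analysis. The sketch below shows one hypothetical way to do this: each condition is replaced by a random code, and the key linking codes to conditions is stored separately (for example, by someone not involved in data collection) until unmasking is appropriate. The function names and the six-character code format are illustrative assumptions.

```python
import secrets


def make_condition_key(conditions):
    """Map each condition name to a random six-character code."""
    key = {}
    for condition in conditions:
        code = secrets.token_hex(3).upper()  # 3 random bytes -> 6 hex characters
        while code in key.values():  # regenerate on the (unlikely) collision
            code = secrets.token_hex(3).upper()
        key[condition] = code
    return key


def mask(assignments, key):
    """Replace each participant's condition name with its code."""
    return {pid: key[condition] for pid, condition in assignments.items()}


def unmask(masked, key):
    """Recover condition names once unmasking is appropriate."""
    lookup = {code: condition for condition, code in key.items()}
    return {pid: lookup[code] for pid, code in masked.items()}


# Hypothetical assignments from the random assignment step.
assignments = {"P01": "red_ad", "P02": "green_ad", "P03": "green_ad", "P04": "red_ad"}
key = make_condition_key(["red_ad", "green_ad"])  # store apart from the data
masked = mask(assignments, key)
print(masked)  # this copy contains only codes, not condition names
```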

Although controlled experiments are the strongest way to test causal relationships, they also involve some challenges.

Difficult to control all variables

Especially in research with human participants, it’s impossible to hold all extraneous variables constant, because every individual has different experiences that may influence their perception, attitudes, or behaviors.

But measuring or restricting extraneous variables allows you to limit their influence or statistically control for them in your study.

Risk of low external validity

Controlled experiments have disadvantages when it comes to external validity: the extent to which your results can be generalized to broad populations and settings.

The more controlled your experiment is, the less it resembles real-world contexts. That makes it harder to apply your findings outside of a controlled setting.

There’s always a tradeoff between internal and external validity. It’s important to consider your research aims when deciding whether to prioritize control or generalizability in your experiment.


In a controlled experiment, all extraneous variables are held constant so that they can’t influence the results. Controlled experiments require:

  • A control group that receives a standard treatment, a fake treatment, or no treatment.
  • Random assignment of participants to ensure the groups are equivalent.

Depending on your study topic, there are various other methods of controlling variables.

An experimental group, also known as a treatment group, receives the treatment whose effect researchers wish to study, whereas a control group does not. They should be identical in all other ways.

Experimental design means planning a set of procedures to investigate a relationship between variables. To design a controlled experiment, you need:

  • A testable hypothesis
  • At least one independent variable that can be precisely manipulated
  • At least one dependent variable that can be precisely measured

When designing the experiment, you decide:

  • How you will manipulate the variable(s)
  • How you will control for any potential confounding variables
  • How many subjects or samples will be included in the study
  • How subjects will be assigned to treatment levels
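Deciding how many subjects to include is often done with a power analysis. As a rough sketch, assuming two equal-sized groups, a two-sided test, and an expected standardized effect size d (Cohen's d), the normal approximation gives the per-group sample size below. Exact calculations based on the t distribution give slightly larger numbers (about 64 per group for d = 0.5, alpha = 0.05, power = 0.80), so dedicated power-analysis software is preferable for a real study.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants per group for a two-sided two-sample comparison.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2,
    rounded up to the next whole participant.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


print(n_per_group(0.5))  # prints 63
```

Smaller expected effects or higher desired power both raise the required sample size.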

Experimental design is essential to the internal and external validity of your experiment.

Construct Validation of Experimental Manipulations in Social Psychology: Current Practices and Recommendations for the Future

Experimental manipulations in social psychology must exhibit construct validity by influencing their intended psychological constructs. Yet how do experimenters in social psychology attempt to establish the construct validity of their manipulations? Following a preregistered plan, we coded 348 experimental manipulations from the 2017 issues of the Journal of Personality and Social Psychology. Representing a reliance upon ‘on the fly’ experimentation, the vast majority of these manipulations were created ad hoc for a given study and not validated prior to implementation. A minority of manipulations had their construct validity evaluated by pilot testing prior to implementation or via a manipulation check. Of the manipulation checks administered, most were face-valid, single-item self-reports and only a few met criteria for ‘true’ validation. In aggregate, roughly two-fifths of manipulations relied solely on face validity. To the extent that they are representative of the field, these results suggest that best practices for validating manipulations are not commonplace, a potential contributor to replicability issues. These issues can be remedied by validating manipulations prior to implementation, using validated manipulation checks, standardizing manipulation protocols, estimating the size and duration of manipulations’ effects, and estimating each manipulation’s effects on multiple constructs within the target nomological network.


Social psychology emphasizes the power of the situation (Lewin, 1939). To examine the causal effects of situational variables, social psychological studies often employ experimental manipulations of such factors and examine their impact on human thoughts, feelings, and behaviors (Campbell, 1957; Cook & Campbell, 1979). However, experimental manipulations are only as useful as the extent to which they exhibit construct validity (i.e., that they meaningfully affect the psychological processes that they are intended to affect; Brewer, 2000; Garner, Hake, & Eriksen, 1956; Wilson, Aronson, & Carlsmith, 2010). Yet few recent studies have systematically documented the approaches that social psychological experiments use to estimate and establish the construct validity of their manipulations. Towards addressing this limitation in our understanding, we meta-analyzed the frequency with which various manipulation validation practices were adopted (or not adopted) by a representative sample of studies from what is widely perceived as the flagship publication for experimental social psychology: the Journal of Personality and Social Psychology (JPSP).

Validity in Experimental Manipulations of Psychological Processes

Experimental social psychologists often focus on ‘internal validity’ and ‘external validity’ (Haslam & McGarty, 2004). Internal validity is present when experimenters (I) eliminate extraneous variables that might incidentally influence the outcome-of-interest and (II) maximize features of the experimental manipulation that ensure a precise, causal conduit from manipulation to outcome (Brewer, 2000). Experimenters establish internal validity via practices such as removing sources of experimenter bias and demand characteristics and by cultivating ‘experimental realism’, which maximize the chances that the manipulation is the source of experimental effects and not some unwanted artifact of design (Cook & Campbell, 1979; Wilson et al., 2010). Other efforts are directed toward maximizing ‘external validity’, ensuring that the experiment captures effects that exist in the ‘real world’ and that findings of the experiment are able to generalize to other settings, populations, time periods, and cultures (Highhouse, 2009; cf. Berkowitz & Donnerstein, 1982; Mook, 1983). Integral to both internal and external validity is a concept most often invoked in the context of clinical assessments and personality questionnaires: construct validity.

Psychological Constructs and the Nomological Network

Psychological scientists often seek to measure and manipulate psychological constructs, so called because they are psychological entities constructed by people rather than objective realities (Cronbach & Meehl, 1955). Such constructs are considered latent because they are not directly observable, unlike their associated manifestations, which are designed to capture (e.g., psychological questionnaires) or influence (e.g., experimental manipulations) them. Latent constructs exist in a nomological (i.e., lawful) network, which is a prescribed array of relationships (or lack thereof) with other constructs (Cronbach & Meehl, 1955). In a nomological network, constructs exist in varying degrees of proximity to one another, with closer proximities reflecting stronger patterns of association. Each construct has its own idiographic network, including construct-specific arrays of associated constructs and construct-specific patterns of associations with those constructs. The constellations of constructs within each nomological network are articulated by psychological theory (Gray, 2017). Nomological networks, when distilled accurately from strong theory, are the basis of construct validity (Messick, 1995).

Construct Validity of Psychological Measures

Construct validity is a methodological and philosophical property that largely reflects how accurately a given manifestation of a study has mapped onto a construct’s latent nomological network (Borsboom, Mellenbergh, & van Heerden, 2004; Embretson, 1983; Strauss & Smith, 2009). Conventionally, construct validity has been largely invoked in the context of psychological measurement, assessment, and tests. In this context, construct validity is present when a manifest psychological measure (I) accurately quantifies its intended latent psychological construct, (II) shares theoretically-appropriate associations with other latent variables in that construct’s nomological network, and (III) does not capture confounding extraneous latent constructs (Cronbach & Meehl, 1955; Messick, 1995; Figure 1). According to modern standards in psychology, construct validity is not a property of a given measure or the scores derived from it, but instead such validity pertains to the uses and interpretations of the scores that are derived from the measure (AERA, APA, & NCME, 2014).

Figure 1. Schematic depiction of a hypothetical nomological network surrounding the construct of ‘rejection’. Plus signs depict positive associations and minus signs depict negative associations. Greater numbers of plus signs and thicker arrows depict stronger associations and effects.

As depicted in the above schematic, a measure of a given construct (e.g., a scale that measures feelings of rejection) should exhibit a pattern of associations with theoretically-linked variables (e.g., positive correlations with pain and shame, negative correlation with happiness) and null associations with variables outside of the nomological network (e.g., awe).
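The expected pattern in this example can be written down and checked against observed correlations. This is a minimal sketch under stated assumptions: the construct names follow the rejection example, the observed correlations are hypothetical, and the ±0.10 band used to decide whether a correlation counts as null is an arbitrary illustration. A real validation study would rely on confidence intervals or equivalence tests instead of a fixed cutoff.

```python
def prediction_met(r, sign, null_band=0.10):
    """Return True if correlation r agrees with a predicted sign.

    sign is "+", "-", or "0"; "0" means a null association, treated here as
    |r| <= null_band (an arbitrary illustrative cutoff).
    """
    if sign == "+":
        return r > null_band
    if sign == "-":
        return r < -null_band
    if sign == "0":
        return abs(r) <= null_band
    raise ValueError(f"unknown sign: {sign!r}")


def pattern_violations(observed, expected, null_band=0.10):
    """List the constructs whose observed correlation contradicts the theory."""
    return [
        construct
        for construct, sign in expected.items()
        if not prediction_met(observed[construct], sign, null_band)
    ]


# Predicted pattern for a rejection scale, following the example in the text.
expected = {"pain": "+", "shame": "+", "happiness": "-", "awe": "0"}
# Hypothetical observed correlations with the new rejection scale.
observed = {"pain": 0.45, "shame": 0.30, "happiness": -0.25, "awe": 0.04}

print(pattern_violations(observed, expected))  # prints []
```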

Estimating the Construct Validity of Psychological Measures

The process of testing the construct validity of measures is well defined (for an overview see Flake, Pek, & Hehman, 2017). First, investigators should conduct a comprehensive literature review to define the properties of the construct, prominent theories of the construct, and its associated nomological network (Simms, 2008). This substantive portion of construct validation, and of research design more broadly, is perhaps the most crucial (and oft-neglected) aspect. Rigorous theoretical work prior to measure construction is needed to ensure that the manifestation of the measure accurately captures the full range of the construct, distinguishes it from related constructs, and includes measures of other constructs to test the construct’s nomological network (Benson, 1998; Loevinger, 1957; Zumbo & Chan, 2014).

Second, researchers apply their theoretical understanding to design the content of the measure to capture the breadth and depth of the construct (i.e., content validity; Haynes, Richard, & Kubany, 1995), often in consultation with experts outside the study team. Third, this preliminary measure is administered and empirical analyses (e.g., item response theory, exploratory and confirmatory factor analyses) are used on the resulting data to (A) ensure that the measure’s data structure exhibits the expected form, (B) select content with good empirical qualities, and (C) ensure the measure is invariant across groups it should be invariant across (Clark & Watson, 2019). Fourth, a refined version of the measure is administered alongside other measures to ensure that it (A) positively corresponds to measures of the same or similar constructs (i.e., convergent validity), (B) negatively or weakly corresponds to measures of different or dissimilar constructs (i.e., discriminant validity), (C) is linked to theoretically-appropriate real-world outcomes (i.e., criterion validity), and (D) differs across groups that it should differ across (Smith, 2005). Measures that meet these stringent psychometric criteria can be said to exhibit construct validity (i.e., they measure the construct they are intended to measure and do not capture problematically large amounts of unintended constructs). Yet how do these concepts and practices translate to experimental manipulations of psychological processes?

Construct Validity of Psychological Manipulations

Construct validity is not confined to psychometrics and is a crucial element in experimental psychology (Cook & Campbell, 1979). Translated to an experimental setting, construct validity is present when a manifest psychological manipulation (I) accurately and causally affects its intended latent psychological construct in the intended direction, (II) exerts theoretically-appropriate effects upon other latent variables in that construct’s nomological network, and (III) does not affect, or only weakly affects, confounding extraneous latent constructs (Campbell, 1957; Shadish, Cook, & Campbell, 2002). This desired pattern of effects is illustrated in a phenomenon we deem the nomological shockwave.

The nomological shockwave.

In a nomological shockwave, a psychological manipulation (e.g., a social rejection manipulation; Chester, DeWall, & Pond, 2016) exerts its initial and strongest causal effects on the target latent construct in the intended direction (e.g., greatly increased feelings of rejection; Figure 2). This change in the target construct then ripples out through that construct’s latent nomological network, causally affecting related constructs in ways that reflect the degree and strength of their latent associations with the target construct. More specifically, the shockwave exerts stronger effects upon constructs that are closer to the manipulation’s point of impact (e.g., moderately increased pain). Conversely, the shockwave’s effects get progressively weaker as the theoretical distance from the target construct increases (e.g., modestly increased shame, modestly reduced happiness). The shockwave will not reach constructs that lie beyond the target construct’s nomological network (e.g., no effect on awe). Back in the manifest domain, these latent shockwave effects are then captured by the manipulation check and the various discriminant validity checks that are causally affected by the latent nomological shockwave.
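The shockwave's core prediction, that effects shrink as theoretical distance from the target construct grows, can be expressed as a simple check on standardized effect sizes. The sketch below computes Cohen's d with a pooled standard deviation and then tests whether the absolute effects are non-increasing with distance. The distance ranks and d values are hypothetical, chosen to mirror the rejection example; the paper does not prescribe this procedure.

```python
import math
from statistics import mean, variance


def cohens_d(control, treatment):
    """Standardized mean difference (treatment minus control), pooled SD."""
    n1, n2 = len(control), len(treatment)
    pooled_var = ((n1 - 1) * variance(control) + (n2 - 1) * variance(treatment)) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / math.sqrt(pooled_var)


def shockwave_attenuates(effects):
    """Check that |d| never increases as theoretical distance increases.

    effects holds (construct, distance, d) tuples, where distance 0 is the
    target construct. Sorting ties by descending |d| lets constructs at the
    same distance appear in any order.
    """
    ordered = sorted(effects, key=lambda e: (e[1], -abs(e[2])))
    sizes = [abs(d) for _, _, d in ordered]
    return all(a >= b for a, b in zip(sizes, sizes[1:]))


# Hypothetical effects of a rejection manipulation (distance ranks are illustrative).
effects = [
    ("rejection", 0, 1.20),  # target construct (manipulation check)
    ("pain", 1, 0.60),
    ("shame", 2, 0.30),
    ("happiness", 2, -0.30),
    ("awe", 3, 0.02),        # outside the nomological network
]
print(shockwave_attenuates(effects))  # prints True
```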

Figure 2. Schematic depiction of a hypothetical nomological shockwave elicited by a construct valid social rejection manipulation. Plus signs depict positive effects and minus signs depict negative effects. Greater numbers of plus signs and thicker arrows depict stronger associations and effects.

Internal versus construct validity.

Construct validity differs from another type of validity that is critical for experimental manipulations: internal validity. Internal validity reflects the extent to which the intended aspects of the manifest experimental manipulation, and not some artifact(s) of the research methodology, exerted a causal effect on an outcome (Campbell, 1957; Shadish et al., 2002; Wilson et al., 2010). Threats to internal validity include unintended differences between the participants in the experimental conditions, participant attrition and fatigue over the course of the experiment, environmental and experimenter effects that undermine the manipulation, measures that are not valid or reliable, and participant awareness (of the experiment’s hypotheses, of deceptive elements of the study, or that they are being studied; Shadish et al., 2002; Wilson et al., 2010). Each of these issues can elicit spurious effects that are not due to the intended aspects of the experimental manipulation.

Although construct validity requires that the causal chain of events from manipulation to outcome effect was intact (i.e., that the manipulation possessed internal validity), its focus is on the ability of the manipulation to impact the intended constructs in the intended manner (Shadish et al., 2002). In other words, internal validity ensures that the manipulation’s effect was causal and construct validity ensures that the manipulation’s effect was accurate. Threats to a manipulation’s construct validity are ‘instrumental incidentals’: confounding aspects of the manipulation that elicited the intended effect in the targeted constructs but were not the aspects of the manipulation that were intended to elicit that effect (Campbell, 1969). For instance, imagine that an experimental condition (e.g., writing an essay that recalls an experience of rejection) was compared to an inappropriate control condition (e.g., writing an essay that tells a story of a brave and adorable otter). This manipulation design would cause an intended increase in rejection, but this effect would be due to both the intended aspect of the manipulation (i.e., the rejection-related content of the essay) and unintended, confounding aspects as well (e.g., positive attitudes towards brave and adorable otters, ease of writing about a fictional character). Another threat to construct validity is a lack of specificity, in which a manipulation exerts a similarly-sized impact on a broad array of constructs instead of isolating the target construct (e.g., a rejection manipulation that also increases sadness and anger to the same extent as it does feelings of rejection). A construct valid experimental manipulation will exert its intended, targeted effects on the intended, specific constructs only through theoretically-appropriate aspects of the manipulation (Reichardt, 2006).

Whereas internal validity can be established prior to testing the construct validity of a manipulation, construct validity first requires that a manipulation exhibit internal validity. Indeed, if an experimental artifact caused by some other aspect of the experiment (e.g., participant selection bias caused by a lack of random assignment) was the actual and unintended source of an observed experimental effect, then it is impossible to claim that the manipulation is what affected the target construct (Cook & Campbell, 1979). This is akin to how psychological questionnaires can have internal consistency among their items without exhibiting construct validity, yet the construct validity of this measure requires the presence of internal consistency. The process through which measures are validated can be instructive for determining how to establish the construct validity of experimental manipulations.
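Internal consistency, the questionnaire analogue mentioned above, is most often summarized with Cronbach's alpha. A minimal Python sketch follows, assuming items are stored as lists of respondents' scores in a consistent order and using sample variances throughout. As the paragraph notes, a high alpha is necessary but not sufficient for construct validity.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items is a list of items, each a list of scores from the same respondents
    in the same order. Formula: k / (k - 1) * (1 - sum of item variances /
    variance of respondents' total scores), using sample variances.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # one total per respondent
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical two-item scale answered by four respondents.
print(round(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]), 2))  # prints 0.75
```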

Current Construct Validity Practices for Psychological Manipulations

A survey of the literature on experimental manipulation in social psychology revealed three primary approaches to establishing that a given manipulation has construct validity. These approaches do not map neatly onto the process through which psychological measures are validated, an issue we return to in the Discussion.

Employ previously validated manipulations.

The simplest means to establish the validity of a manipulation is to replicate one that has already been validated in previous research. Many experimental paradigms are frequently re-used in other investigations and modified for other purposes. For instance, the seminal article that introduced the Cyberball social rejection paradigm has been cited over 1,900 times (Williams, Cheung, & Choi, 2000). However, the value of employing previously-used manipulations is predicated on the extent to which they were adequately validated in such pre-existing work. Previously-used manipulations, whether they have been validated or not, are often modified prior to implementation (e.g., the identities of the Cyberball partners are varied; Gonsalkorale & Williams, 2007) or are conceptually replicated by implementing the manipulation through an entirely different paradigm (e.g., being left out of an online chatroom instead of a ball-tossing game; Donate et al., 2017). These conceptual replications are important means of establishing the manipulated construct’s ability to exert its effects irrespective of the manifest characteristics of the manipulation. However, conceptual replication cannot alone establish construct validity.

Pilot validity studies.

Whether a manipulation is newly created or acquired from a prior publication, authors often ‘pilot test’ it prior to implementation in hypothesis testing. This practice entails conducting at least one separate ‘pilot study’ of the manipulation outside of the context of the full study procedure (Ellsworth & Gonzalez, 2003). Such pilot studies are used to examine various aspects of the manipulation, from its feasibility to participant comprehension of the instructions to various forms of validity. Of particular interest to the present research, pilot validity studies (a subset of the broader ‘pilot study’ category) estimate the manipulation’s effect on the target construct (i.e., they pilot test the manipulation’s construct validity). In this way, pilot validity studies are a hybrid of experimental pilot studies and the ‘validation studies’ used by clinical and personality psychologists who examine the psychometric properties of new measures using the steps we previously outlined.

Pilot validity testing of a new manipulation is an essential step to ensure that the manipulation has the intended effect on a target manipulation check and to rule out confounding processes (Wilson et al., 2010). Pilot validity testing can also estimate the magnitude and duration of the intended effect. If the effect is so small or transient that it is nearly impossible to detect, or if the effect is so strong or long-lasting that it produces ceiling effects or excessive distress among your participants, then the manipulation can be altered to address these issues and re-piloted. If deception is used, suspicion probes can be included in a pilot study to estimate whether the deception was perceived by your participants (Blackhart, Brown, Clark, Pierce, & Shell, 2012). Even if the manipulation has been acquired from previous work, pilot validity testing is a crucial way to ensure that you have accurately recreated the protocol and replicated the validity of the manipulation (Ellsworth & Gonzalez, 2003). As all of these factors have an immense impact on whether a given manipulation will affect its target construct, pilot validity studies are an important means of ensuring the construct validity of a manipulation.
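One of the pilot checks described here, spotting ceiling effects, can be sketched as the share of manipulation-condition responses that sit at the top of the response scale. The ratings below are hypothetical, and the 15% flagging threshold is an assumed rule of thumb rather than a standard taken from this paper.

```python
def ceiling_share(scores, scale_max):
    """Proportion of responses at the maximum of the response scale."""
    return sum(1 for s in scores if s >= scale_max) / len(scores)


CEILING_FLAG = 0.15  # assumed rule-of-thumb threshold, not from the source

# Hypothetical pilot ratings of felt rejection (1-7 scale) after the manipulation.
pilot_scores = [7, 7, 6, 7, 5, 7, 6, 7, 7, 4]
share = ceiling_share(pilot_scores, scale_max=7)
if share > CEILING_FLAG:
    print(f"{share:.0%} of responses hit the scale maximum; consider revising and re-piloting.")
```

The same function applied to the lowest scale point (with the comparison reversed) would flag floor effects.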

Manipulation checks.

A diverse array of measurements fall under the umbrella term of ‘manipulation check’. The overarching theme of such measures is to ensure that a given manipulation had its intended effect (Hauser, Ellsworth, & Gonzalez, 2018). We adopt a narrower definition to conform to the topic of construct validity: manipulation checks are measures of the construct that the manipulation is intended to affect. This definition excludes attention checks, comprehension checks, and other forms of instructional manipulation checks (Oppenheimer, Meyvis, & Davidenko, 2009), as they do not explicitly quantify the target construct. These instructional manipulation checks are useful tools, especially because they can identify construct-irrelevant variance that is caused by the manipulation. However, our present focus on construct validity entails that we apply the label of ‘manipulation check’ only to measures of a manipulation’s target construct. We refer to measures of different constructs, used to ensure that a given manipulation did not exert similarly robust effects on other, non-target constructs, as ‘discriminant validity checks’. Such discriminant validity checks are specific to each investigation and should include constructs that are theoretically related to the target construct so that the manipulation’s specificity and nomological shockwave can be estimated.

Many articles have debated the utility and validity of manipulation checks, with some scholars arguing for their exclusion ( Fayant, Sigall, Lemonnier, Retsin, & Alexopoulos, 2017 ; Sigall & Mills, 1998 ). Indeed, manipulation checks can have unintended consequences (e.g., drawing participants’ attention to deceptive elements of the experiment, interrupting naturally unfolding psychological processes). Minimally intrusive validation assessments are thus preferable to overt self-report scales ( Hauser et al., 2018 ). Although many such challenges remain with the use of manipulation checks, they are a necessary source of construct validity data that an empirical science cannot forego. Without manipulation checks, the validity of experimental manipulations would be asserted by weaker forms of validity (e.g., face validity), which provide deeply flawed footing when used as the sole basis for construct validity ( Grand, Ryan, Schmitt, & Hmurovic, 2010 ). In an ideal world, such manipulation checks would be validated according to best psychometric practices (see Flake et al., 2017 ). Without validated manipulation checks, it is uncertain what construct the given check is capturing. As such, an apparently ‘successful’ manipulation check could be an artifact of another construct entirely.

The Present Research

The central aim of the present research was descriptive and concerned construct validation practices for experimental manipulations in social psychology: to document how often manipulations were (I) acquired from previous research or newly created, (II) paired with a pilot validity study, and/or (III) paired with a manipulation check. Because it was impractical to determine whether each manipulation acquired from previous research had been adequately validated by that prior work, we gave authors the benefit of the doubt and assumed that the research they cited alongside their manipulations presented sufficient evidence of the manipulation’s construct validity. Based on the findings of the present research, many of these cited papers likely did not report such evidence. This is therefore a relatively liberal criterion that probably overestimates the extent to which manipulations have been truly validated.

We focused on social psychology given its heavy reliance upon experimental manipulations, our membership in this field, and this field’s ongoing reckoning with replication issues that may result, in part, from experimental practices. We hope that other experimentally-focused fields such as cognitive and developmental psychology, economics, management, marketing, and neuroscience may glean insights into their own manipulation validation practices and standards from this investigation. Further, clinical and counseling psychologists might learn approaches to improving the construct validity of clinical trials, which are similar to experiments in many ways.

In addition to these descriptive analyses, we empirically examined several important qualities of pilot validity studies and manipulation checks, aiming to fill a gap in the sparse literature on these topics. Given the widespread evidence for publication bias in psychology ( Head, Holman, Lanfear, Kahn, & Jennions, 2015 ), our primary goal in these analyses was to estimate the extent to which pilot validity and manipulation check effects are affected by such biases. First, we tested the evidentiary value of these effects via p-curve analyses to estimate the extent to which pilot validity studies and manipulation checks capture ‘true’ underlying effects rather than merely reflecting publication bias and questionable research practices ( Simonsohn, Nelson, & Simmons, 2014 ). Second, p-curve analyses estimated the statistical power of the reported pilot validity and manipulation check effects, allowing us to evaluate long-standing claims that pilot validity studies in social psychology are underpowered ( Albers & Lakens, 2018 ; Kraemer, Mintz, Noda, Tinklenberg, & Yesavage, 2006 ). Third, we used conventional meta-analyses to estimate the average size and heterogeneity of pilot validity and manipulation check effects, information that is useful for future power analyses. Fourth, these meta-analyses also tested for publication bias to establish the extent to which pilot validity studies and manipulation checks are selectively reported based on the favorability of their results.

Finally, we returned to our descriptive approach to examine the presence of suspicion probes in the literature. Given the crucial role of suspicion probes in many social psychological experiments ( Blackhart et al., 2012 ; Nichols & Edlund, 2015 ), we examined whether manipulations were accompanied by a suspicion probe and whether suspicious participants were retained in or excluded from analyses.

Open Science Statement

This project was intended to capture an exploratory snapshot of the literature and therefore no hypotheses were advanced a priori . The preregistration plan for the present research is publicly available online (original plan: ; amendment: ), as is the disclosure table of all included studies and their associated codes ( ).

Literature Search Strategy

We conducted our literature search within a journal that is often reputed to be the flagship journal of experimental social psychology, JPSP . We limited our literature search to a single year of publication (as in Flake et al., 2017 ), selecting the year 2017 because it was recent enough to reflect current practices in the field. Our preregistration plan stated that we would examine volume 113 of JPSP , limiting our coding procedures to the two experimentally focused sections: Attitudes and Social Cognition ( ASC ) and Interpersonal Relations and Group Processes ( IRGP ). We excluded the Personality Processes and Individual Differences ( PPID ) section of JPSP due to its focus on measurement and not manipulation. However, we deviated from our preregistration plan by also including volume 112 in our analysis in order to increase our sample size and therefore our confidence in our findings.

Inclusion Criteria

We first sought to identify every experimental manipulation within the articles captured by our literature search. In our initial preregistration plan, we defined experimental manipulations as “any systematic alteration of a study’s procedure meant to change a specific psychological construct.” However, this definition often failed to provide clear guidance on whether a systematically altered aspect of a given study constituted an experimental manipulation. The ambiguity surrounding many of these early decisions quickly convinced us that this definition could not be implemented in a rigorous or objective manner. Instead, we revised our preregistration plan to follow two simple heuristics. First, a study aspect was deemed an experimental manipulation if the authors described it as a ‘manipulation’. This approach lifted from the coders the burden of determining whether a given aspect of a study was a ‘true’ manipulation, and instead allowed each article’s authors, peer reviewers, and editor to determine whether it could be accurately described as an experimental manipulation. Second, if participants were ‘randomly assigned’ to different treatments or conditions, this aspect of the study procedure was considered an experimental manipulation, as random assignment is the core aspect of experimental manipulation ( Wilson et al., 2010 ). We also deviated from our preregistration plan by excluding from our analyses any studies that were not presented as part of each paper’s main sequence of hypothesis-testing studies (e.g., pilot studies). This deviation was motivated by the realization that pilot validity studies were often provided as the very sources of purported validity evidence we sought to identify for each paper’s main experiments, and therefore should be examined separately.

Coding Strategy

We coded every experimental manipulation for several criteria that either provided descriptive detail or spoke to the evidence put forward for the construct validity of the manipulation.

Coding process.

All manipulations were coded independently by the first and last author, who each possess considerable expertise and training in experimental social psychology, research methodology, and construct validation. The first and last authors met frequently throughout the coding process to identify coding discrepancies. Such discrepancies were reviewed by both authors until both authors agreed upon one coding outcome (as in Flake et al., 2017 ). Prior to such discrepancy reviews and meetings, the authors each created 459 codes of the nine key coded variables of our meta-analysis (e.g., whether a given study included a manipulation, how many manipulations were included in each study, whether a manipulation was paired with a manipulation check) from the first 11 articles in our literature review. In an exploratory fashion, we examined the inter-rater agreement in these initial codes (459 codes per rater × 2 raters = 918 codes; 102 codes per coded variable), which were uncontaminated because the authors had yet to meet and conduct a discrepancy review. These initial codes exhibited substantial inter-rater agreement across all coded variables, κ = .89. Inter-rater agreement estimates for each of the uncontaminated coded variables are presented below.
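To make the agreement statistic concrete, the sketch below computes Cohen’s κ for two raters’ categorical codes: observed agreement is compared with the agreement expected by chance from each rater’s marginal proportions. The paper does not name the estimator or software used, so unweighted Cohen’s κ for two raters is an assumption here, and the rater codes in the example are hypothetical rather than drawn from the study’s data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of items both raters coded identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal proportions per category.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    if p_e == 1:
        return 1.0  # both raters used one identical category throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical yes/no codes for "manipulation check present" on 10 manipulations.
a = ["yes", "yes", "no", "no", "yes", "no", "no", "yes", "no", "no"]
b = ["yes", "yes", "no", "no", "no", "no", "no", "yes", "no", "no"]
print(round(cohens_kappa(a, b), 3))
```

Here the raters agree on 9 of 10 items, but because both code "no" more often than "yes", chance agreement is already .54, which is why κ is noticeably lower than the raw 90% agreement.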

Condition number and type.

Each manipulation was coded for the number of conditions it contained, κ = .94, and whether it was administered in a between- or within-participants fashion, κ = .92. Deviating from our preregistration plan, we also coded whether each between-participants manipulation was described as randomly assigning participants to its conditions, κ = .63.

Use in prior research.

We coded each manipulation for whether it was paired with a citation indicating that it was acquired from previously published research, κ = .84. If this was not the case, we assumed that the manipulation was uniquely created for the given study. Manipulations acquired from prior publications were then coded for whether the authors stated that they had been modified from the referenced version, κ = .75. Crucially, we did not code or select manipulations based on whether they had been previously validated by the cited work, for two reasons. First, each cited manipulation could have required a laborious search through a trail of citations to find evidence of validation. Second, simply citing a paper in which the manipulation was previously used is likely an implicit argument that the manipulation has been validated by that work.

As a deviation from our preregistration plans, we also coded each manipulation for whether the manipulation’s construct validity was pilot tested. More specifically, we coded whether each manipulation was paired with any pilot validity studies that empirically tested the effect of the manipulation on the intended construct (i.e., tested the manipulation’s construct validity), κ = .91.

Manipulation checks.

Each manipulation was coded for whether a manipulation check was employed, κ = .88. If so, we coded whether the check had been validated in previously published research or was created uniquely for the given study without validation. We did not rely on authors to make the manipulation check determination (i.e., we did not deem a measure a manipulation check simply because the authors referred to it as such, nor did we exclude a measure simply because the authors did not). Instead, we defined a manipulation check as any measure of the construct that the given manipulation was intended to influence ( Hauser et al., 2018 ; Lench, Taylor, & Bench, 2014 ) and included any measure that met this criterion. This process therefore excluded instructional manipulation checks and other measures that authors deemed ‘manipulation checks’ but that did not actually assess the construct the manipulation was designed to alter (as in Lench et al., 2014 ). For each manipulation check we identified, we then coded the form that it took (e.g., self-report questionnaire) and the number of measurements that comprised it (e.g., the number of items in the questionnaire).

Suspicion probes.

We also coded whether investigators assessed participants’ suspicion of the manipulation, κ = .92. If a suspicion probe was used, we coded the form that it took and whether participants deemed ‘suspicious’ were excluded from analyses, κ = .92.

Volumes 112 and 113 of the ASC and IRGP sections of JPSP contained 58 articles. Four of these articles were excluded as they were meta-analyses or non-empirical, leaving 54 articles that summarized 355 independent studies. Of these studies, 244 (68.73%) presented at least one experimental manipulation for a total of 348 experimental manipulations acquired from 49 articles.

Manipulations Per Study

The majority of studies that contained experimental manipulations reported one (66.80%) or two (25.00%) manipulations, though there was considerable variability in the number of manipulations per study: M = 1.43, SD = 0.68, mode = 1, range = 1 – 4.

Conditions Per Manipulation

The majority of manipulations contained two (82.18%) or three (12.64%) conditions, though the number of conditions per manipulation varied widely: M = 2.30, SD = 0.98, mode = 2, range = 2 – 13.

Between- Versus Within-Participants Designs

The overwhelming majority of manipulations were conducted in a between-participants manner (94.54%) rather than a within-participants manner (5.46%). The number of conditions varied in both within- and between-participants manipulations. These frequencies are depicted in Figure 3, an alluvial plot created with SankeyMATIC: . Alluvial plots visually mimic the way a river fans out into the smaller channels of an alluvial fan, depicting how a frequency distribution divides from left to right into a hierarchy of categories. In each plot, the full distribution originates on the left-hand side and ‘flows’ to the right into different categories, each drawn with a width proportional to the number of cases it contains. These streams then divide further into more specific sub-categories according to their proportions.


Figure 3. Alluvial plot of condition frequencies by condition type.

Manipulation Validation Practices

Only a modest majority of manipulations, 202 (58.04%), were accompanied by at least one of the following sources of purported validity evidence: a citation indicating that the manipulation was used in prior research, a pilot validity study, and/or a manipulation check (see Table 1 and Figure 4 for a breakdown of these statistics). Pilot validity study analyses were not preregistered and are therefore exploratory.


Figure 4. Alluvial plot depicting distributions of the types of purported validity evidence reported for each manipulation.

Table 1. Frequencies and percentages (in parentheses) of the number of manipulations that were presented alongside each type of purported validity evidence (i.e., a citation indicating published research that the manipulation had been acquired from, a pilot validity study, and/or a manipulation check measure).

Check status | No citation, not piloted | No citation, piloted | With citation, not piloted | With citation, piloted
No check | 146 (41.96%) | 35 (10.06%) | 36 (10.34%) | 4 (1.15%)
With check | 63 (18.10%) | 37 (10.63%) | 26 (7.47%) | 1 (0.29%)
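Because Table 1 cross-classifies all 348 manipulations, the marginal counts reported in the text (citations, pilot validity studies, manipulation checks, and any evidence at all) can be recovered from its eight cells. The short sketch below performs that arithmetic; the cell counts are copied from Table 1, while the tuple-keyed dictionary is simply an illustrative layout. Percentages are printed to one decimal, so they may differ from the two-decimal figures in the text only by rounding.

```python
# Cell counts from Table 1, keyed by (has_citation, piloted, has_check).
cells = {
    (False, False, False): 146, (False, True, False): 35,
    (True, False, False): 36,   (True, True, False): 4,
    (False, False, True): 63,   (False, True, True): 37,
    (True, False, True): 26,    (True, True, True): 1,
}

total = sum(cells.values())


def count(predicate):
    """Sum the cells whose (citation, pilot, check) key satisfies the predicate."""
    return sum(n for key, n in cells.items() if predicate(*key))


cited = count(lambda cit, pil, chk: cit)
piloted = count(lambda cit, pil, chk: pil)
checked = count(lambda cit, pil, chk: chk)
any_evidence = count(lambda cit, pil, chk: cit or pil or chk)

for label, n in [("citation", cited), ("pilot", piloted),
                 ("check", checked), ("any evidence", any_evidence)]:
    print(f"{label}: {n} of {total} ({100 * n / total:.1f}%)")
```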

Citations from previous publications.

Of all manipulations, 67 (19.25%) were paired with a citation indicating that the manipulation was used in previously published research. Of these cited manipulations, 16 (23.88%) were described as being modified in some way from their original version. For most of the remaining 51 cited manipulations, the description did not make clear whether they had been modified from the cited original. The number of modified manipulations reported here may therefore underestimate their prevalence in the larger literature.

Manipulation checks.

Across all manipulations, 127 (36.49%) were accompanied by a manipulation check measure. These 127 manipulation checks took the form of self-report questionnaires ( n = 105; 82.68%), coded behavior ( n = 3; 2.36%), behavioral task performance ( n = 9; 7.09%), or an unspecified format ( n = 10; 7.87%; Figure 5 ). Of the 105 self-report manipulation check questionnaires, 68 (64.76%) consisted of a single item; across all 105, the number of items varied: M = 1.68, SD = 1.27, range = 1 – 10 ( Figure 5 ).


Figure 5. Alluvial plot depicting distributions of the types of manipulation check measures reported for each manipulation and numbers of self-report items.

Suspicion Probes

Of all manipulations, only 31 (8.90%) were accompanied by a suspicion probe. Probing procedures were invariably described in vague terms (e.g., ‘a funnel interview’), and no experimenter scripts or sample materials were provided to give further detail. Of these probed manipulations, only five (16.10%), drawn from two articles, were reported as having excluded ‘suspicious’ participants from analyses. In none of these cases were the exact criteria for classifying a participant as ‘suspicious’ provided, nor was the impact of excluding these participants estimated.

Exploratory Analyses

Random assignment.

We found that 205 (62.31%) of between-participants manipulations declared that participants were randomly assigned to conditions. No articles described the method they used to randomly assign participants.

Pilot validity study meta-analyses.

Pilot validity studies were reported as purported validity evidence for 77 (22.13%) of all manipulations. However, most of these studies did not report inferential statistics, described their results too vaguely to identify the target effect, or drew on overlapping samples of participants. Often, the results of pilot validity studies were summarized qualitatively without accompanying inferential statistics or methodological details (e.g., “Pilot testing suggested that the effect … tended to be large”; Gill & Cerce, 2017 , p. 364). Based on the 15 pilot validity study effects that we could extract, p-curve analyses revealed that pilot validity studies exhibited remarkable evidentiary value, with an estimated statistical power of 99% ( Figure 6 ).
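For readers unfamiliar with p-curve, its core right-skew test can be sketched in a few lines. Each statistically significant p-value is rescaled to a ‘pp-value’ (p / .05), which is uniform on (0, 1) if there is no true effect, and the pp-values are combined with Stouffer’s method; a strongly negative combined Z indicates right skew, that is, evidentiary value. This is a simplified sketch only: the published p-curve procedure (Simonsohn et al., 2014) recomputes p-values from the reported test statistics and adds half-curve, flatness, and power-estimation steps that are omitted here, and the p-values in the example are hypothetical.

```python
from statistics import NormalDist


def pcurve_right_skew(p_values, alpha=0.05):
    """Simplified p-curve right-skew test (Stouffer's method on pp-values).

    Only significant p-values (0 < p < alpha) enter the curve. Under the null
    of no true effect they are uniform on (0, alpha), so pp = p / alpha is
    uniform on (0, 1).
    """
    significant = [p for p in p_values if 0 < p < alpha]
    if not significant:
        raise ValueError("p-curve needs at least one significant p-value")
    std_normal = NormalDist()
    pp_values = [p / alpha for p in significant]
    # Stouffer's combined Z; strongly negative values indicate right skew.
    combined_z = sum(std_normal.inv_cdf(pp) for pp in pp_values) / len(pp_values) ** 0.5
    p_right_skew = std_normal.cdf(combined_z)  # one-sided test for right skew
    return combined_z, p_right_skew


# Hypothetical reported p-values; 0.20 is non-significant and is dropped.
z, p = pcurve_right_skew([0.001, 0.0004, 0.012, 0.003, 0.20, 0.0001])
print(f"Z = {z:.2f}, p = {p:.6f}")
```

The intuition is that true effects pile significant p-values up near zero (right skew), whereas selective reporting of null effects produces a flat curve, and p-hacking tends to push p-values just under .05.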


Figure 6. Results of the p-curve analysis on pilot validity study effects.

Exploratory random-effects meta-analyses on 14 of the Fisher’s Z-transformed pilot validity effects (one effect could not be converted into an effect size estimate) revealed an overall medium-to-large effect size, r = .46 [ 95% CI = .34, .59], SE = 0.06, Z = 7.28, p < .001, with significant inter-study heterogeneity, Q (13) = 136.70, p < .001. The average sample size of these studies was N = 186.47, which helps explain the high statistical power we observed for such relatively strong effects. Little evidence was found for publication bias in pilot validity studies (see Supplemental Document 1 ).
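The pooled estimates above come from a random-effects meta-analysis of Fisher’s Z-transformed correlations. The sketch below shows one common way to do this, the DerSimonian–Laird estimator, assuming each effect is a correlation r from a sample of size N with sampling variance 1 / (N − 3) on the Z scale. The paper does not state which between-study variance (τ²) estimator or software was used, so this illustrates the general approach rather than reproducing the reported analysis, and the input effects are hypothetical.

```python
import math
from statistics import NormalDist


def random_effects_r(rs, ns):
    """DerSimonian-Laird random-effects pooling of correlations via Fisher's Z."""
    zs = [math.atanh(r) for r in rs]   # Fisher's Z transform
    vs = [1.0 / (n - 3) for n in ns]   # sampling variance of each Z
    w = [1.0 / v for v in vs]          # fixed-effect (inverse-variance) weights
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    # Cochran's Q and the DL between-study variance tau^2 (floored at zero).
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    df = len(zs) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each study's sampling variance.
    w_re = [1.0 / (v + tau2) for v in vs]
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    crit = NormalDist().inv_cdf(0.975)
    lo, hi = z_re - crit * se, z_re + crit * se
    # Back-transform the pooled Z and its 95% CI to the correlation metric.
    return {"r": math.tanh(z_re), "ci": (math.tanh(lo), math.tanh(hi)),
            "se_z": se, "z_test": z_re / se, "Q": q, "df": df, "tau2": tau2}


# Hypothetical correlations and sample sizes.
result = random_effects_r([0.30, 0.55, 0.62, 0.41], [120, 200, 90, 310])
lo, hi = result["ci"]
print(f"r = {result['r']:.2f} [95% CI {lo:.2f}, {hi:.2f}], "
      f"Q({result['df']}) = {result['Q']:.2f}, tau^2 = {result['tau2']:.3f}")
```

Working on the Z scale makes the sampling variance depend only on N, and back-transforming with tanh is why confidence intervals around a pooled r are typically slightly asymmetric.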

Manipulation check meta-analyses.

Of the 127 manipulations with manipulation checks, six did not report the results of the manipulation check and 14 others reported incomplete inferential statistics (e.g., a range of p-values, no test statistics), making it difficult to verify their claims. In total, 82 independent manipulation check effects were extracted and submitted to exploratory p-curve analyses, which revealed that manipulation checks exhibited remarkable evidentiary value, with an estimated statistical power of 99% ( Figure 7 ).


Figure 7. Results of the p-curve analysis of manipulation check effects.

Exploratory random-effects meta-analyses on these Fisher’s Z -transformed manipulation check effects revealed an overall medium-to-large effect size, r = .55 [ 95% CI = .48, .62], SE = 0.03, Z = 16.31, p < .001, with significant underlying inter-study heterogeneity, Q (81) = 2,167.90, p < .001. The average sample size of these studies was N = 304.79, which explains the high statistical power we observed for such relatively strong effects. No evidence was found for publication bias (see Supplemental Document 1 ).

Internal consistency of manipulation checks.

Among the 37 manipulation checks that took the form of multiple-item self-report scales, exact Cronbach’s alphas were reported for 18 (48.65%), and these estimates by and large indicated sufficient internal consistency: M = .83, SD = .12, range = .49 – .98.
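Cronbach’s α compares the summed variance of a scale’s individual items with the variance of the total score: α = k / (k − 1) × (1 − Σ item variances / total-score variance), where k is the number of items. The minimal sketch below applies that formula with sample (n − 1) variances; the three-item manipulation check and its responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha from a list of item columns (one list of scores per item)."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha requires at least two items")
    n = len(items[0])
    if n < 2 or any(len(col) != n for col in items):
        raise ValueError("each item needs the same number (>= 2) of respondents")
    # Total score per respondent, summed across items.
    totals = [sum(col[i] for col in items) for i in range(n)]
    item_var_sum = sum(variance(col) for col in items)  # sample variances (n - 1)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))


# Hypothetical 3-item manipulation check answered by 5 participants (1-7 scale).
item1 = [6, 5, 7, 2, 3]
item2 = [5, 5, 6, 1, 3]
item3 = [7, 4, 6, 2, 2]
print(round(cronbach_alpha([item1, item2, item3]), 2))
```

Because α rises with the number of items as well as with inter-item correlation, it cannot be computed for the single-item checks that made up most self-report manipulation checks in this review.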

Validity of manipulation checks.

Crucially, only eight of all manipulation checks (6.30%) were accompanied by a citation indicating that the check was acquired from previous research. After reading the cited validity evidence in each case, we found that only six (4.72%) manipulation checks actually met the criteria for established validation; these took the form of the Need Threat Scale (NTS; Williams, 2009 ) and the Positive and Negative Affect Schedule (PANAS; Watson, Clark, & Tellegen, 1988 ).

Construct valid measures in psychology are able to accurately capture the target construct and not extraneous variables ( Borsboom et al., 2004 ; Cronbach & Meehl, 1955 ; Embretson, 1983 ; Strauss & Smith, 2009 ). Such construct validity is not limited to psychometrics but applies equally to experimental manipulations of psychological processes. Indeed, construct valid manipulations must affect their intended construct in the intended way, and not exert their effect via confounding variables ( Cook & Campbell, 1979 ). To better understand the current practices through which experimental social psychologists provide evidence that their manipulations possess construct validity, we examined published articles from the field’s flagship journal: JPSP .

Chief among our findings was that approximately 42% of experimental manipulations were accompanied by no evidence of their construct validity beyond face validity: no citations, no pilot validity testing, and no manipulation checks. Indeed, presenting no construct validity evidence whatsoever was the most common approach in our review. To the extent that this estimate generalizes across the field, social psychology’s experimental foundations rest on largely untested ground rather than on solid empirical footing. In what follows, we highlight other key findings from each domain of our meta-analysis and offer recommendations for future practice in the hope of improving the state of experimental psychological science.

Prevalence and Complexity of Experimental Manipulations

At first glance, experimental manipulation appears alive and well in social psychology: a little more than two-thirds of the studies we reviewed had at least one experimental manipulation. Suggesting a preference for simplicity, over 90% of studies with manipulations employed only one or two manipulations, and a similar proportion of manipulations contained only two or three conditions. This prevalence of relatively simple experimental designs is promising, as exceedingly complex designs (e.g., a 2 × 3 × 2 factorial design) undermine statistical power and inflate Type I and Type II error rates ( Smith, Levine, Lachlan, & Fediuk, 2002 ).

Over 90% of manipulations were conducted in a between-participants manner, suggesting a relative neglect of within-participants designs. Within-participants designs typically afford greater statistical power than between-participants designs ( Aberson, 2019 ). As such, the over-reliance we observed on between-participants designs may undermine the overall power of findings from experimental social psychology. However, many manipulations may simply be impossible to present in a repeated-measures fashion without undermining their internal validity (e.g., through carryover effects).
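The power advantage of within-participants designs can be illustrated with standard normal-approximation sample-size formulas. Assuming a standardized mean difference d, a two-sided α, target power 1 − β, and a correlation ρ between a participant’s scores in the two conditions, a two-group between-participants design needs about 2(z₁₋α/₂ + z₁₋β)² / d² participants per group, whereas a two-condition within-participants design needs about 2(1 − ρ)(z₁₋α/₂ + z₁₋β)² / d² participants in total. The function name and the values of d and ρ below are hypothetical, and the approximation ignores carryover effects and the small-sample t correction.

```python
import math
from statistics import NormalDist


def approx_sample_sizes(d, rho, alpha=0.05, power=0.80):
    """Normal-approximation total N for a two-condition comparison.

    d: standardized mean difference between conditions.
    rho: correlation between a participant's scores across conditions
         (only relevant to the within-participants design).
    """
    std_normal = NormalDist()
    k = (std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)) ** 2
    n_per_group = math.ceil(2 * k / d ** 2)           # between-participants, per group
    n_within = math.ceil(2 * (1 - rho) * k / d ** 2)  # within-participants, total
    return {"between_total": 2 * n_per_group, "within_total": n_within}


print(approx_sample_sizes(d=0.4, rho=0.5))
```

Even with ρ = 0, the within-participants design needs only half the total sample, because every participant contributes to both conditions; positive ρ shrinks the requirement further by removing stable individual differences from the error term.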

Random Assignment and the Lack of Detail in Descriptions of Manipulations

Approximately two-fifths of the between-participants manipulations failed to mention whether participants were randomly assigned to their experimental conditions. Given that random assignment is a necessary condition for a true experimental manipulation ( Cook & Campbell, 1979 ; Wilson et al., 2010 ), every report of experimental results should state explicitly what procedure was used to assign participants to conditions. Furthermore, none of the manipulations that did mention random assignment described precisely what procedure was used to randomize assignment. Without this information, it is impossible to know whether condition assignment was truly random or whether the procedure introduced some systematic bias. Relatedly, we did not examine whether or how within-participants manipulations randomized the order of conditions across participants. Future research would benefit from examining the prevalence of these practices and their impact on the construct validity of within-participants manipulations.
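To illustrate the kind of procedural detail that could be reported, the sketch below implements seeded permuted-block randomization, one common assignment scheme among several (simple randomization, stratified randomization, and assignment handled by survey software are others). The function name, block size, and seed are hypothetical choices for illustration; reporting the scheme, block size, and seed would let readers audit and reproduce the exact assignment sequence.

```python
import random


def block_randomize(n_participants, conditions, block_size=None, seed=None):
    """Permuted-block random assignment to conditions.

    Each block contains every condition equally often in a freshly shuffled
    order, so group sizes never differ by more than one block's imbalance.
    """
    if block_size is None:
        block_size = len(conditions)
    if block_size % len(conditions) != 0:
        raise ValueError("block_size must be a multiple of the number of conditions")
    rng = random.Random(seed)  # a seeded generator makes the sequence reproducible
    schedule = []
    while len(schedule) < n_participants:
        block = list(conditions) * (block_size // len(conditions))
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n_participants]


assignments = block_randomize(12, ["control", "treatment"], block_size=4, seed=2017)
print(assignments)
```

Blocking keeps condition sizes balanced throughout data collection (useful if a study stops early), at the cost of making the last assignments in a block partly predictable to anyone who knows the block size.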

This lack of information about random assignment reflected a much more general lack of basic information that authors provided about their manipulations. Manuscripts often did not even mention the validity information we sought, and pilot validity studies and manipulation checks were frequently described in a cursory fashion, without the necessary methodological detail and inferential statistics. More transparency is needed both to evaluate each manipulation’s validity and to allow researchers to replicate the procedure in their own labs. Towards this end, we have created a checklist of information that we hope peer reviewers will apply to new research to ensure that each manipulation, manipulation check, and pilot validity study is described in sufficient detail ( Appendix A ). We further encourage experimenters to use this checklist when describing these important aspects of their experimental methodology.

Previously Used vs. ‘On The Fly’ Manipulations

Approximately 80% of manipulations were not acquired from previous research and were instead created ad hoc for a given study. This suggests that researchers heavily rely upon ‘on the fly’ manipulation (term adapted from Flake et al., 2017 ), in which ad hoc manipulations are routinely created from scratch to fit the parameters of a given study. The prevalence of this ‘on the fly’ manipulation is almost twice that of ‘on the fly’ measurement in social and personality psychology (~46%; Flake et al., 2017 ). This prevalence rate may be inflated by a tendency for authors to simply fail to provide such citations for manipulations that have, in fact, been implemented in prior publications. We encourage experimenters to cite publications that empirically examine the validity of their manipulations, whenever they exist. These ad hoc procedures appear to acutely afflict experimental designs and future work is needed to determine the reasons underlying this disproportionate practice.

The field’s reliance on creating manipulations de novo is concerning. It means that considerable time and resources are spent creating new manipulations instead of implementing and improving upon existing, validated ones. This tendency towards ‘on the fly’ manipulation may reflect psychological science’s bias towards novelty and away from replicating past research ( Neuliep & Crandall, 1993 ), which has known adverse consequences ( Open Science Collaboration, 2015 ). We therefore recommend that experimenters avoid ‘on the fly’ manipulation and instead employ existing, previously validated manipulations whenever possible (Recommendation 1), though we note that few such manipulations are likely to be available.

Of the relatively small number of manipulations acquired from previous research, roughly one-fourth were modified from their original form. This is likely an underestimate, as none of the articles we coded explicitly stated that their manipulation was unmodified, so modification rates may be considerably higher. Modification has consequences: altering a manipulation undermines its established validity, just as modifying a questionnaire often requires it to be re-validated ( Flake et al., 2017 ). Unvalidated modification compounds these issues when the original manipulation was never validated itself. We therefore recommend that experimenters avoid modifying previously validated manipulations whenever possible (Recommendation 2A). When modification is unavoidable, we recommend that investigators re-validate the modified manipulation prior to implementation (Recommendation 2B).

We realize that Recommendations 1 and 2 are likely to be difficult to adhere to given the pessimistic nature of our findings. Indeed, it is difficult to avoid ‘on the fly’ manipulation development and modification when there are no validated versions of a given manipulation already in existence. However, we are optimistic that if experimenters begin to improve their validation practices, this will not be an issue for long. These recommendations are given with that bright future in mind.

Pilot Validity Testing

Approximately one in five manipulations were associated with a pilot validity study prior to implementation in hypothesis testing. This low adoption rate of pilot validity studies suggests that the practice of pilot validity testing is somewhat rare, which is problematic as such testing is a critical means of establishing the construct validity of a manipulation ( Ellsworth & Gonzalez, 2003 ; Wilson et al., 2010 ). Pilot validity testing has several advantages over simply including manipulation checks during hypothesis testing. First, pilot validity testing prevents unwanted effects of a manipulation check from intruding upon other aspects of the study ( Hauser et al., 2018 ). Second, pilot validity studies allow for changes to be made to the manipulation to optimize its effects before it is implemented. Pilot validity testing would further ensure that time and resources are not wasted on testing hypotheses with manipulations of unknown construct validity. We therefore recommend that experimenters conduct well-powered pilot validity studies for each manipulation prior to implementation in hypothesis testing (Recommendation 3A).

These relatively rare reports of pilot validity studies may have been artificially suppressed by the practice of not publishing pilot validity evidence ( Westlund & Stuart, 2017 ). However, all pilot validity evidence should be published alongside the later studies it was used to develop, in order to transparently communicate the evidence for and against the validity of the given manipulation ( Asendorpf et al., 2013 ). Keeping pilot validity studies hidden may also reflect a broader culture that undervalues this crucial phase of the manipulation validation process. Pilot validity studies should not be viewed as mere ‘dress rehearsals’ for the main event (i.e., hypothesis testing), but should be granted the same importance, resources, and time as the studies in which the manipulations are subsequently employed. Robust training, investment, and transparency in pilot validity testing will produce more valid manipulations and, therefore, more valid experimental findings. We therefore recommend that the results of pilot validity studies be published as validation articles (Recommendation 3B), accompanied by the detailed protocols and stimuli needed to replicate the manipulation (Recommendation 3C).

On an optimistic note, meta-analyses revealed that pilot validity studies exhibited substantial evidentiary value and a robust meta-analytic effect size. These findings imply that researchers are conducting pilot validity tests that capture real and impactful effects rather than merely capitalizing on sources of flexibility or variability. Little evidence of p-hacking (Simonsohn et al., 2014) or publication bias was observed, suggesting that researchers are not selectively reporting their pilot validity data to artificially evince an underlying effect, nor are they consigning unsuccessful pilot validity studies to the ‘file drawer’ and cherry-picking those that obtain effects. These meta-analyses also revealed that these studies were statistically powered to a maximal degree, arguing against characterizations of pilot validity studies as underpowered (Albers & Lakens, 2018; Kraemer et al., 2006).
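To make the phrase “robust meta-analytic effect size” concrete, here is a minimal sketch of one standard way to pool per-study effect sizes: a DerSimonian-Laird random-effects model. This is a generic estimator chosen for illustration; we are assuming, not reporting, that it resembles the models used in this review, and it does not implement the p-hacking or publication-bias analyses. The function name and the example effect sizes are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool per-study effect sizes with a DerSimonian-Laird random-effects model.

    effects   -- per-study standardized effect sizes (e.g., Hedges' g)
    variances -- matching per-study sampling variances
    Returns (pooled_effect, standard_error, tau_squared).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q measures dispersion of study effects around the fixed estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau^2 to each study's sampling variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2


# Hypothetical pilot-study effects, for illustration only
g, se, tau2 = dersimonian_laird([0.8, 0.6, 1.1, 0.7], [0.04, 0.05, 0.06, 0.03])
print(f"pooled g = {g:.2f}, 95% CI [{g - 1.96 * se:.2f}, {g + 1.96 * se:.2f}], tau^2 = {tau2:.3f}")
```

The confidence interval here uses a simple normal approximation; fuller meta-analytic workflows often add refinements, such as alternative estimators of the between-study variance, that this sketch omits.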

Manipulation Checks

Approximately one-third of manipulations were paired with a manipulation check measure. This estimate is much lower than those from other meta-analyses. Hauser and colleagues (2018) reported that 63% of articles in the Attitudes & Social Cognition section of the 2016 volume of JPSP included at least one manipulation check. Sigall and Mills (1998) reported that 68% of JPSP articles in 1998 reported a manipulation check. The differences in our estimates are likely due to our focus on the manipulation level rather than the article level, which we adopted because articles present multiple studies with multiple manipulations and article-level analyses obscure these statistics. We also applied a strict definition of a manipulation check, whereas the authors of these other investigations may have counted any measure that the authors referred to as a ‘manipulation check’. It is also possible that manipulation check prevalence rates have actually decreased in recent years due to published critiques of manipulation checks (e.g., Fayant et al., 2017; Sigall & Mills, 1998).
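The gap between manipulation-level and article-level estimates is partly arithmetic: an article counts as “including a manipulation check” if any one of its manipulations has one. The sketch below, using a hypothetical coding scheme (one boolean per manipulation, grouped by article) and a hypothetical function name, shows how the same coded data can yield very different rates at the two levels.

```python
def check_prevalence(articles):
    """Return (manipulation-level rate, article-level rate) of manipulation checks.

    articles -- one list per article, holding one boolean per manipulation
                (True if that manipulation was paired with a manipulation check)
    """
    manipulations = [has_check for article in articles for has_check in article]
    manipulation_level = sum(manipulations) / len(manipulations)
    # An article "counts" if at least one of its manipulations had a check
    article_level = sum(any(article) for article in articles) / len(articles)
    return manipulation_level, article_level


# Hypothetical coding of four multi-study articles
coded = [[True, False, False], [True, False], [False], [True, False, False, False]]
print(check_prevalence(coded))  # (0.3, 0.75)
```

Here only 3 of 10 manipulations have a check, yet 3 of 4 articles would be counted as including one.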

A central issue with manipulation checks is that they intrude upon the experiment, calling participants’ attention and suspicion to the manipulation and subsequently to the construct under study (Hauser et al., 2018). For instance, asking participants how rejected they felt may raise suspicions about the ball-tossing task they were just excluded from. Such effects can be manifold and insidious: the check may lead participants to guess at the experimenters’ hypotheses, heighten their suspicion, or change their thoughts or feelings by prompting reflection upon them, and it may even change the nature of the manipulation itself (Hauser et al., 2018). However, the concerns raised by these critiques are obviated if the manipulation check is administered during the pilot validation of the manipulation and excluded during implementation of the manipulation in hypothesis testing. We therefore recommend that experimenters administer manipulation checks during the pilot validity testing of each manipulation (Recommendation 4A) and that post-pilot manipulation checks be administered only if they do not negatively impact other aspects of the study (Recommendation 4B).

Pilot validity studies may differ substantially from the primary experiments that employ the manipulations they seek to validate. Indeed, the presence of other manipulations, measures, and environmental factors might cause a manipulation that previously showed evidence of construct validity to no longer exert its ‘established’ effect on the target construct. When such differences exist between pilot validity studies and focal experiments, including a manipulation check in the focal experiment could establish whether these changes have affected the manipulation’s construct validity. If there are legitimate concerns that including a manipulation check could negatively impact the validity of the manipulation, then experimenters could randomly assign participants to receive the check or not, in order to estimate the effect that the check has on the manipulation’s hypothesized effects (assuming sufficient power to detect such effects).
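One way to implement the design described above is a 2 × 2 layout that crosses the manipulation (control vs. treatment) with the check (absent vs. present), with both factors randomly assigned. A minimal sketch, assuming hypothetical condition labels, a hypothetical function name, and a single continuous outcome, computes the manipulation's effect with and without the check and their difference:

```python
from statistics import mean


def check_interaction(data):
    """Estimate whether administering a manipulation check changes the manipulation's effect.

    data -- (manipulation, check, outcome) tuples from a 2 x 2 design in which
            manipulation is "control"/"treatment" and check is "no_check"/"check",
            both randomly assigned.
    Returns (effect without check, effect with check, difference).
    """
    def cell_mean(manipulation, check):
        return mean(y for m, c, y in data if m == manipulation and c == check)

    effect_without = cell_mean("treatment", "no_check") - cell_mean("control", "no_check")
    effect_with = cell_mean("treatment", "check") - cell_mean("control", "check")
    # A nonzero difference (interaction contrast) suggests the check altered the effect
    return effect_without, effect_with, effect_with - effect_without
```

The difference is the interaction contrast. In practice it would be tested formally (for example, as the interaction term in a 2 × 2 ANOVA or regression), and, as the paragraph notes, detecting it requires adequate power.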

As with the manipulations themselves, the overwhelming majority of manipulation checks were created ad hoc for the given manipulation. The purported validity evidence provided for the manipulation checks was often simple face validity and, in some cases, a Cronbach’s α. Many were single-item self-report measures. These forms of purported validity evidence are insufficient to establish the construct validity of a measure (Flake et al., 2017). Not knowing whether the check captured the latent construct of interest, or instead tapped into some other construct(s), theoretically compromises any inferences drawn from such measures. We therefore recommend that experimenters validate the instruments they use as manipulation checks prior to use in pilot validity testing (Recommendation 4C). Requiring that manipulation checks be validated would entail a large-scale shift in the practices of experimental social psychologists, who would often find themselves having to precede new experiments with the task of creating and validating a new state measure. This would require a new emphasis on training in psychometrics, resources devoted to the manipulation check validation process, and rewards for those who undertake it.
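For readers unfamiliar with it, Cronbach's α compares the variances of individual items with the variance of their sum. The sketch below uses the standard formula (the function name is hypothetical) and illustrates why α alone cannot establish construct validity: it only indexes how consistently the items covary with one another, not what they measure, and it is undefined for the single-item checks described above.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a multi-item scale.

    responses -- one list per participant, each holding that participant's k item scores
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha is undefined for single-item measures")
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)
```

Two items that move in lockstep yield α = 1 even if both tap the wrong construct, which is why internal consistency is no substitute for construct validation.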

Meta-analyses revealed that manipulation checks exhibited evidentiary value and a robust meta-analytic effect size. Though these findings are promising indicators that the manipulations employed in these studies exerted true effects that these checks were able to capture, they cannot speak to the underlying construct validity of these manipulation effects. Indeed, the fact that manipulations exert some effect on their manipulation checks does not tell us whether the intended aspect of the manipulation produced the observed effect or whether the manipulation checks measured the target construct. Manipulation check effects were also maximally statistically powered, which implies that the manipulations are at least strong enough for their effects on the checks to be reliably detected. As with pilot validity studies, there was no evidence of publication bias.
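Because statistical power depends jointly on effect size and sample size, large manipulation check effects can be detected with high power even in modest samples. A rough sketch, assuming a two-sided comparison of two equal-sized groups and a normal approximation to the test statistic (the helper name is hypothetical):

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of a standardized mean
    difference d with equal group sizes, using the normal approximation
    (slightly optimistic relative to an exact t-test for small samples)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected z statistic under the alternative for two equal groups of size n
    shift = abs(d) * (n_per_group / 2) ** 0.5
    # Probability that the observed statistic lands in either rejection region
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


for n in (20, 64, 100):
    print(n, round(approx_power(0.5, n), 2))
```

With no true effect (d = 0) the function returns α itself, the false-positive rate; power climbs toward 1 as either d or the group size grows.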

Only approximately one-tenth of manipulations assessed the extent to which participants were suspicious of the deceptive elements of the study. Though studies vary in how deceptive they are, almost all experimental manipulations entail some degree of deception, in that participants are influenced without explicit awareness of the full nature and intent of the manipulation. As such, the majority of studies were unable to estimate the extent to which participants detected their manipulation procedures. Even fewer studies adequately described how suspicion was assessed, often referring vaguely to an experimenter interview or an open-ended survey question. No specific criteria were given for what delineated ‘suspicious’ from ‘non-suspicious’ participants, and only five studies excluded participants in the former group. Given that no well-validated, standardized suspicion assessment procedures exist and there are few data on what effect removing ‘suspicious’ participants from analyses might have on subsequent results (Blackhart et al., 2012), we do not make any recommendations in this domain. Much work is needed to establish best practices for suspicion assessment and analysis.

Size and Duration of Manipulation Effects

Although many articles established the size of a manipulation’s effect on the manipulation check, no manipulation checks repeatedly assessed any manipulation’s effect in order to estimate the timecourse of these effects. The effect of a given experimental manipulation wanes over time (e.g., Zadro, Boland, & Richardson, 2006), and its timecourse is critical to determine for several reasons. First, experimenters need to know whether the manipulation’s effect is still psychologically active at the time point at which they administer their outcome measures, and how strong it is at that time point. This would allow experimenters to identify an experimental ‘sweet spot’ when the manipulation’s effect is strongest. Second, for ethical reasons it is crucial to ensure that the manipulation’s effect has adequately decayed by the time the study has ended and participants return to the real world. This is especially important when the manipulated process is distressing or interferes with daily functioning (Miketta & Friese, 2019). We therefore recommend that, whenever possible, experimenters estimate the timecourse of their manipulation’s effect by repeatedly administering manipulation checks during pilot validity testing (Recommendation 5).
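If manipulation checks were administered repeatedly during pilot testing, their effect sizes could be modeled over time. The sketch below assumes, purely for illustration, that the effect decays exponentially; the source does not specify a decay model, and the function names are hypothetical. It fits the decay rate by least squares on the log scale and reports a half-life and the time at which the effect falls to a chosen threshold.

```python
import math


def fit_exponential_decay(times, effects):
    """Fit effect(t) = a * exp(-k * t) by least squares on log(effect).

    times   -- time since the manipulation (any unit), one per repeated check
    effects -- effect sizes observed at those times; must all be positive
    Returns (a, k, half_life).
    """
    if len(times) != len(effects) or len(times) < 2:
        raise ValueError("need at least two (time, effect) pairs")
    if any(e <= 0 for e in effects):
        raise ValueError("a log-linear fit needs strictly positive effects")
    logs = [math.log(e) for e in effects]
    n = len(times)
    t_bar, l_bar = sum(times) / n, sum(logs) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (l - l_bar) for t, l in zip(times, logs))
    slope = sxy / sxx
    a, k = math.exp(l_bar - slope * t_bar), -slope
    half_life = math.log(2) / k if k > 0 else math.inf
    return a, k, half_life


def time_to_threshold(a, k, threshold):
    """Time at which the fitted effect first drops to `threshold` (e.g., a negligible d)."""
    if a <= threshold:
        return 0.0
    if k <= 0:
        return math.inf
    return math.log(a / threshold) / k
```

Under an exponential model the effect is always largest immediately after the manipulation, so identifying a later ‘sweet spot’ would require a model with a rising phase. The threshold calculation speaks to the ethical question of whether the effect has decayed before participants leave.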

Estimating the Nomological Shockwave via Discriminant Validity Checks

Across the manipulations we surveyed, construct validity was most often assessed (when it was assessed) by estimating the manipulation’s effect on the construct that the manipulation was primarily intended to affect. However, a requisite of construct validity is discriminant validity: the given manipulation should influence the target construct and not a different, confounding construct (Cronbach & Meehl, 1955). Absent this practice, ‘successful’ manipulation checks may obscure the possibility that although the manipulation influences the desired construct, it also impacts a related, non-targeted variable to a confounding degree. In this context, discriminant validity can be established by examining the manipulation’s nomological shockwave (i.e., the manipulation’s effect on other constructs that exist within the target construct’s nomological network). This can be done by administering discriminant validity checks, which are measures of constructs within the target construct’s nomological network. In its simplest form, the nomological shockwave can be empirically established by demonstrating that the manipulation’s largest effect is upon the target construct and that it exerts progressively weaker and non-overlapping effects on theoretically related constructs as a function of their proximity to the target construct in the nomological network. We therefore recommend that experimenters administer measures of theoretically related constructs in pilot testing (i.e., discriminant validity checks; Recommendation 6A) and that these be used to estimate the nomological shockwave of the manipulation (Recommendation 6B).
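A minimal sketch of the “simplest form” test described above: compute a standardized effect (Cohen's d with an approximate confidence interval) for each construct, order the constructs by their assumed theoretical distance from the target, and check that effects shrink with distance and that adjacent intervals do not overlap. The CI formula is a common large-sample approximation; the function names, the ordering, and the sign convention are assumptions for illustration.

```python
import math
from statistics import mean, stdev


def cohens_d_ci(treatment, control, z=1.96):
    """Cohen's d (pooled SD) with an approximate large-sample confidence interval."""
    n1, n2 = len(treatment), len(control)
    pooled_var = ((n1 - 1) * stdev(treatment) ** 2 + (n2 - 1) * stdev(control) ** 2) / (n1 + n2 - 2)
    d = (mean(treatment) - mean(control)) / math.sqrt(pooled_var)
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d, d - z * se, d + z * se


def shows_shockwave(results):
    """results -- (construct, d, ci_low, ci_high) tuples ordered from the target
    construct outward by assumed theoretical distance, with effects coded so that
    positive values point in the target's direction. Returns True if each effect is
    smaller than the previous one and the two confidence intervals do not overlap."""
    for (_, d_near, low_near, _), (_, d_far, _, high_far) in zip(results, results[1:]):
        if not (d_near > d_far and low_near > high_far):
            return False
    return True
```

The ordering of constructs must come from theory, not from the observed effects; otherwise the check is circular.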

Estimating the nomological shockwave by simply comparing effect sizes and their confidence intervals is admittedly a crude empirical approach. Inherently, the shockwave rests on the assumption that the manipulation exerts a causal effect on the target construct and that this target construct then exerts a causal effect on the discriminant validity constructs by virtue of their latent associations. Ideally, causal models could test this sequence of effects, though such quantitative approaches are often limited in their ability to do so (Fiedler, Schott, & Meiser, 2011). Future research is needed to understand the accuracy and utility of employing causal modeling to estimate nomological shockwaves.
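As one illustration of the kind of causal model mentioned above (not a method the source endorses), a simple product-of-coefficients mediation model estimates the manipulation-to-target path (a), the target-to-related-construct path holding the manipulation constant (b), and their product (the indirect effect). This minimal ordinary-least-squares sketch omits standard errors and bootstrap intervals, and its causal reading rests on strong assumptions, such as no unmeasured confounding of the target and related constructs, which is one reason such approaches are limited. The function names are hypothetical.

```python
def _centered(values):
    m = sum(values) / len(values)
    return [v - m for v in values]


def _dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def indirect_effect(x, m, y):
    """Product-of-coefficients estimate for the path x -> m -> y via OLS.

    x -- manipulation coded 0/1; m -- target construct; y -- related construct
    Returns (a, b, c_prime, indirect), where indirect = a * b.
    """
    xc, mc, yc = _centered(x), _centered(m), _centered(y)
    sxx, smm, sxm = _dot(xc, xc), _dot(mc, mc), _dot(xc, mc)
    sxy, smy = _dot(xc, yc), _dot(mc, yc)
    a = sxm / sxx  # effect of the manipulation on the target construct
    det = sxx * smm - sxm ** 2  # zero if m is perfectly collinear with x
    b = (sxx * smy - sxm * sxy) / det  # target -> related construct, holding x constant
    c_prime = (smm * sxy - sxm * smy) / det  # remaining direct effect of x on y
    return a, b, c_prime, a * b
```

A large indirect effect with a negligible direct effect would be consistent with the shockwave account, in which the manipulation reaches related constructs only through the target.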

Limitations and Future Directions

This project only examined articles from JPSP and did not include a wider array of publication outlets in social psychology. Our assessment of validation practices might change if we had cast a wider meta-analytic net. Future work should test whether our findings replicate in other journals and in other subfields of psychology. Other experimentally focused fields, such as cognitive, developmental, and biological psychology, may also vary in their approaches to the validation of their experimental manipulations; future research is needed to see whether this is the case. We also used subjective codes and definitions of the manipulation features that we coded, which may have allowed our own biases to influence our findings. We have made all of our codes publicly available so that interested parties can review them for such biases, modify the codes according to their own sensibilities, and examine the effect of those modifications on our results. Indeed, we do not see our findings as conclusive; rather, we hope that the coded dataset we have created will be a resource for other investigators to examine in the future.

Experimental manipulations are the methodological foundation of much of social psychology. Our meta-analytic review suggests that the construct validity of such manipulations rests on practices that could be improved. We have made recommendations for how to make such changes, which largely revolve around translating the validation approach taken towards personality questionnaires to experimental manipulations. This new model would entail that validated manipulations are used whenever available and when new manipulations are created, they are validated (i.e., pilot validated) prior to implementation in hypothesis testing. Validity would then be established by demonstrating that the manipulation has its strongest effect on the target construct and theoretically appropriate effects on the nomological network surrounding it. Adopting this model would mean a dramatic change in practices for most laboratories in experimental social psychology. The costs inherent in doing so should be counteracted by a rise in replicability and veridicality of the field’s findings. We hope that our assessment of the field’s practices is an important initial step in that direction.

Research reported in this publication was supported by the National Institute on Alcohol Abuse and Alcoholism (NIAAA) of the National Institutes of Health under award number K01AA026647 (PI: Chester).

Below are pieces of information that should be included for research using experimental manipulations in psychology. If you don’t see them mentioned, consider requesting that the authors ensure that this information is explicitly stated in the manuscript.

  • The number of manipulations in each study.
  • The number of conditions in each manipulation.
  • The definition of the construct that each manipulation was intended to affect.
  • Whether each manipulation was administered between- or within-participants.
  • Whether random assignment (for between-participants designs) or counterbalancing (for within-participants designs) was used in each manipulation.
  • How random assignment or counterbalancing was conducted in each manipulation.
  • Whether each manipulation was acquired from previous research or newly created for the study.
  • The pre-existing validity evidence for each manipulation that was acquired from previous research.
  • Whether each manipulation that was acquired from previous research was modified from the version of the manipulation detailed in the previous research.
  • The validity evidence for each manipulation that was modified from previous research.
  • Whether each manipulation was pilot tested prior to implementation.
  • The validity evidence for each measure employed in each pilot study.
  • The pilot validity evidence for each manipulation that was pilot tested.
  • The detailed Methods and Results of each pilot study.
  • Whether each manipulation was paired with a manipulation check that quantified the manipulation’s target construct.
  • The validity evidence for each manipulation check.
  • Whether each manipulation was paired with a discriminant validity check that quantified potentially confounding constructs.
  • The validity evidence for each discriminant validity check.
  • Whether deception-by-omission was used for each manipulation (i.e., facts about the manipulation were withheld from participants).
  • Whether deception-by-commission was used for each manipulation (i.e., untrue information about the manipulation was provided to participants).
  • Whether each deceptive manipulation was paired with a suspicion probe.
  • The methodological details of each suspicion probe.
  • The validity evidence for each suspicion probe.
  • How each suspicion probe was scored.
  • How participants were deemed to be suspicious or not for each suspicion probe.
  • How suspicious participants were handled (e.g., excluded from analysis, suspicion used as a covariate) in each manipulation study.
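The checklist asks how counterbalancing was conducted. One widely used scheme for within-participant designs is a balanced (Williams) Latin square, in which every condition appears once in each serial position and, across orders, each condition immediately follows every other condition equally often. The sketch below is an illustration rather than a procedure prescribed by the source; the helper names are hypothetical.

```python
def balanced_latin_square(n):
    """Condition orders for a balanced (Williams) Latin square over n conditions.

    Each row is one presentation order; conditions are numbered 0..n-1. For odd n,
    the reversed rows are appended so that every condition immediately precedes
    every other condition equally often.
    """
    first, low, high = [0], 1, n - 1
    for i in range(1, n):
        # Alternate 0, 1, n-1, 2, n-2, ... to build the first row
        if i % 2 == 1:
            first.append(low)
            low += 1
        else:
            first.append(high)
            high -= 1
    rows = [[(c + r) % n for c in first] for r in range(n)]
    if n % 2 == 1:
        rows += [row[::-1] for row in rows]
    return rows


def order_for(participant_index, rows):
    """Cycle participants through the rows (hypothetical helper)."""
    return rows[participant_index % len(rows)]


for row in balanced_latin_square(4):
    print(row)
```

Participants are typically assigned to rows in a randomized sequence, with the sample size chosen as a multiple of the number of rows so that every order is used equally often.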
How the Experimental Method Works in Psychology

The experimental method is a type of research procedure that involves manipulating variables to determine if there is a cause-and-effect relationship. The results obtained through the experimental method are useful but do not prove with 100% certainty that a singular cause always creates a specific effect. Instead, they show the probability that a cause will or will not lead to a particular effect.

What Is the Experimental Method in Psychology?

The experimental method involves manipulating one variable to determine if this causes changes in another variable. This method relies on controlled research methods and random assignment of study subjects to test a hypothesis.

For example, researchers may want to learn how different visual patterns affect perception, or whether certain actions can improve memory. Experiments are conducted on a wide range of behavioral topics.

The scientific method forms the basis of the experimental method. This is a process used to determine the relationship between two variables (in this case, to explain human behavior).

Positivism is also important in the experimental method. It holds that trustworthy factual knowledge is obtained through observation.

When using the experimental method, researchers first identify and define key variables. Then they formulate a hypothesis, manipulate the variables, and collect data on the results. Unrelated or irrelevant variables are carefully controlled to minimize the potential impact on the experiment outcome.

History of the Experimental Method

The idea of using experiments to better understand human psychology began toward the end of the nineteenth century. Wilhelm Wundt established the first formal laboratory for experimental psychology in 1879.

Wundt is often called the father of experimental psychology. He believed that experiments could help explain how the mind works, and he used this approach to study consciousness.

Wundt coined the term "physiological psychology." This is a hybrid of physiology and psychology: the study of how bodily processes relate to mental processes.

Other early contributors to the development and evolution of experimental psychology as we know it today include:

  • Gustav Fechner (1801-1887), who helped develop procedures for measuring sensations according to the size of the stimulus
  • Hermann von Helmholtz (1821-1894), who analyzed philosophical assumptions through research in an attempt to arrive at scientific conclusions
  • Franz Brentano (1838-1917), who called for a combination of first-person and third-person research methods when studying psychology
  • Georg Elias Müller (1850-1934), who performed an early experiment on attitude, which involved the sensory discrimination of weights and revealed how anticipation can affect this discrimination

To understand how the experimental method works, it is important to know some key terms.

Dependent Variable

The dependent variable is the effect that the experimenter is measuring. If a researcher were investigating how sleep influences test scores, for example, the test scores would be the dependent variable.

Independent Variable

The independent variable is the variable that the experimenter manipulates. In the previous example, the amount of sleep an individual gets would be the independent variable.

Hypothesis

A hypothesis is a tentative statement or a guess about the possible relationship between two or more variables. In looking at how sleep influences test scores, the researcher might hypothesize that people who get more sleep will perform better on a math test the following day. The purpose of the experiment, then, is to either support or reject this hypothesis.
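Continuing the sleep example, a hypothesis test asks whether the observed difference in test scores between groups is larger than chance variation would plausibly produce. A minimal sketch using Welch's t statistic, with hypothetical scores and a hypothetical helper name:

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and degrees of freedom for two independent groups."""
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1))
    return t, df


# Hypothetical math-test scores after a full night of sleep vs. restricted sleep
rested = [78, 85, 82, 90, 74, 88]
restricted = [70, 72, 80, 65, 77, 69]
t, df = welch_t(rested, restricted)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The t statistic still needs to be compared against a t distribution with df degrees of freedom; if SciPy is available, `scipy.stats.ttest_ind(rested, restricted, equal_var=False)` reports the same Welch statistic along with a p-value.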

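Operational Definitions

Operational definitions are necessary when performing an experiment. When we say that something is an independent or dependent variable, we must have a very clear and specific definition of the meaning and scope of that variable. For example, "amount of sleep" might be operationally defined as the number of hours a participant slept the night before the test.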

Extraneous Variables

Extraneous variables are other variables that may also affect the outcome of an experiment. Types of extraneous variables include participant variables, situational variables, demand characteristics, and experimenter effects. In some cases, researchers can take steps to control for extraneous variables.

Demand Characteristics

Demand characteristics are subtle hints that indicate what an experimenter is hoping to find in a psychology experiment. This can sometimes cause participants to alter their behavior, which can affect the results of the experiment.

Intervening Variables

Intervening variables are factors that can affect the relationship between two other variables. 

Confounding Variables

Confounding variables are variables that can affect the dependent variable, but that experimenters cannot control for. Confounding variables can make it difficult to determine if the effect was due to changes in the independent variable or if the confounding variable may have played a role.

The Experimental Process

Psychologists, like other scientists, use the scientific method when conducting an experiment. The scientific method is a set of procedures and principles that guide how scientists develop research questions, collect data, and come to conclusions.

The five basic steps of the experimental process are:

  • Identifying a problem to study
  • Devising the research protocol
  • Conducting the experiment
  • Analyzing the data collected
  • Sharing the findings (usually in writing or via presentation)

Most psychology students are expected to use the experimental method at some point in their academic careers. Learning how to conduct an experiment is important to understanding how psychologists support and refute theories in this field.

Types of Experiments

There are a few different types of experiments that researchers might use when studying psychology. Each has pros and cons depending on the participants being studied, the hypothesis, and the resources available to conduct the research.

Lab Experiments

Lab experiments are common in psychology because they allow experimenters more control over the variables. These experiments can also be easier for other researchers to replicate. The drawback of this research type is that what takes place in a lab is not always what takes place in the real world.

Field Experiments

Sometimes researchers opt to conduct their experiments in the field. For example, a social psychologist interested in researching prosocial behavior might have a person pretend to faint and observe how long it takes onlookers to respond.

This type of experiment can be a great way to see behavioral responses in realistic settings. But it is more difficult for researchers to control the many variables existing in these settings that could potentially influence the experiment's results.


Quasi-Experiments

While lab experiments are known as true experiments, researchers can also use a quasi-experiment. Quasi-experiments are often referred to as natural experiments because the researchers do not have true control over the independent variable.

A researcher looking at personality differences and birth order, for example, is not able to manipulate the independent variable in the situation (birth order). Participants also cannot be randomly assigned because they naturally fall into pre-existing groups based on their birth order.

So why would a researcher use a quasi-experiment? This is a good choice in situations where scientists are interested in studying phenomena in natural, real-world settings. It's also beneficial if there are limits on research funds or time.

Field experiments can be either quasi-experiments or true experiments.

Examples of the Experimental Method in Use

The experimental method can provide insight into human thoughts and behaviors. Researchers use experiments to study many aspects of psychology.

A 2019 study investigated whether splitting attention between electronic devices and classroom lectures had an effect on college students' learning abilities. It found that dividing attention between these two mediums did not affect lecture comprehension. However, it did impact long-term retention of the lecture information, which affected students' exam performance.

An experiment used participants' eye movements and electroencephalogram (EEG) data to better understand cognitive processing differences between experts and novices. It found that experts had higher power in their theta brain waves than novices, suggesting that they also had a higher cognitive load.

A study looked at whether chatting online with a computer via a chatbot changed the positive effects of emotional disclosure often received when talking with an actual human. It found that the effects were the same in both cases.

One experimental study evaluated whether exercise timing impacts information recall. It found that engaging in exercise prior to performing a memory task helped improve participants' short-term memory abilities.

Sometimes researchers use the experimental method to get a bigger-picture view of psychological behaviors and impacts. For example, one 2018 study examined several lab experiments to learn more about the impact of various environmental factors on building occupant perceptions.

A 2020 study set out to determine the role that sensation-seeking plays in political violence. This research found that sensation-seeking individuals have a higher propensity for engaging in political violence. It also found that providing access to a more peaceful, yet still exciting political group helps reduce this effect.

Potential Pitfalls of the Experimental Method

While the experimental method can be a valuable tool for learning more about psychology and its impacts, it also comes with a few pitfalls.

Experiments may produce artificial results that are difficult to apply to real-world situations. Similarly, researcher bias can affect the data collected. Results may also fail to replicate, meaning they have low reliability.

Since humans are unpredictable and their behavior can be subjective, it can be hard to measure responses in an experiment. In addition, political pressure may alter the results. The subjects may not be a good representation of the population, or groups used may not be comparable.

And finally, since researchers are human too, results may be degraded due to human error.

Every psychological research method has its pros and cons. The experimental method can help establish cause and effect, and it's also beneficial when research funds are limited or time is of the essence.

At the same time, it's essential to be aware of this method's pitfalls, such as how biases can affect the results or the potential for low reliability. Keeping these in mind can help you review and assess research studies more accurately, giving you a better idea of whether the results can be trusted or have limitations.

Unlike a descriptive study, an experiment is a study in which a treatment, procedure, or program is intentionally introduced and a result or outcome is observed. The American Heritage Dictionary of the English Language defines an experiment as “A test under controlled conditions that is made to demonstrate a known truth, to examine the validity of a hypothesis, or to determine the efficacy of something previously untried.”

True experiments have four elements, the most important of which are manipulation and control. Manipulation means that something in the environment is purposefully changed by the researcher. Control is used to prevent outside factors from influencing the study outcome. When something is manipulated and controlled and then the outcome happens, it makes us more confident that the manipulation “caused” the outcome. In addition, experiments involve highly controlled procedures in an effort to minimize error and bias, which also increases our confidence that the manipulation “caused” the outcome.

Another key element of a true experiment is random assignment. Random assignment means that if there are groups or treatments in the experiment, participants are assigned to these groups or treatments randomly (like the flip of a coin). This means that no matter who the participant is, each participant has an equal chance of being assigned to each of the groups or treatments in the experiment. This process helps to ensure that the groups or treatments are similar at the beginning of the study, so that there is more confidence that the manipulation (group or treatment) “caused” the outcome. More information about random assignment may be found in section
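Random assignment is straightforward to implement in code. The sketch below contrasts simple “coin flip” randomization, which can leave groups unequal in size, with a shuffle-and-deal approach that keeps group sizes within one of each other. The function names, condition labels, and seed parameter are hypothetical conveniences.

```python
import random


def coin_flip_assignment(participants, conditions=("treatment", "control"), seed=None):
    """Simple randomization: each participant independently gets a random condition,
    like a coin flip. Group sizes can end up unequal."""
    rng = random.Random(seed)
    return {p: rng.choice(conditions) for p in participants}


def permuted_assignment(participants, conditions=("treatment", "control"), seed=None):
    """Shuffle participants, then deal them into conditions in turn, so group sizes
    differ by at most one. When the number of participants is a multiple of the
    number of conditions, each participant is equally likely to land in any condition."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {p: conditions[i % len(conditions)] for i, p in enumerate(shuffled)}


ids = [f"P{i:02d}" for i in range(10)]
print(permuted_assignment(ids, seed=42))
```

Recording the seed lets the assignment be reproduced and audited later.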




Experimental Studies and Observational Studies


Experimental studies: Experiments, Randomized controlled trials (RCTs); Observational studies: Non-experimental studies, Non-manipulation studies, Naturalistic studies


The experimental study is a powerful methodology for testing causal relations between one or more explanatory variables (i.e., independent variables) and one or more outcome variables (i.e., dependent variables). In order to accomplish this goal, experiments have to meet three basic criteria: (a) experimental manipulation (variation) of the independent variable(s), (b) randomization, in which participants are randomly assigned to one of the experimental conditions, and (c) experimental control for the effect of third variables by eliminating them or keeping them constant.

In observational studies, investigators observe or assess individuals without manipulation or intervention. Observational studies are used for assessing the mean levels, the natural variation, and the structure of variables, as well as...
