Frequently asked questions

How do I prevent confounding variables from interfering with my research?

There are several methods you can use to decrease the impact of confounding variables on your research: restriction, matching, statistical control, and randomization.

In restriction, you restrict your sample by only including subjects who have the same values of potential confounding variables.

In matching, you match each of the subjects in your treatment group with a counterpart in the comparison group. The matched subjects have the same values on any potential confounding variables, and only differ in the independent variable.

In statistical control, you include potential confounders as variables in your regression.

In randomization, you randomly assign the treatment (or independent variable) in your study to a sufficiently large number of subjects, which allows you to control for all potential confounding variables.

Frequently asked questions: Methodology

Attrition refers to participants leaving a study. It always happens to some extent—for example, in randomized controlled trials for medical research.

Differential attrition occurs when attrition or dropout rates differ systematically between the intervention and the control group. As a result, the characteristics of the participants who drop out differ from the characteristics of those who stay in the study. Because of this, study results may be biased.

Action research is conducted in order to solve a particular issue immediately, while case studies are often conducted over a longer period of time and focus more on observing and analyzing a particular ongoing phenomenon.

Action research is focused on solving a problem or informing individual and community-based knowledge in a way that impacts teaching, learning, and other related processes. It is less focused on contributing theoretical input, instead producing actionable input.

Action research is particularly popular with educators as a form of systematic inquiry because it prioritizes reflection and bridges the gap between theory and practice. Educators are able to simultaneously investigate an issue as they solve it, and the method is very iterative and flexible.

A cycle of inquiry is another name for action research. It is usually visualized in a spiral shape following a series of steps, such as “planning → acting → observing → reflecting.”

To make quantitative observations, you need to use instruments that are capable of measuring the quantity you want to observe. For example, you might use a ruler to measure the length of an object or a thermometer to measure its temperature.

Criterion validity and construct validity are both types of measurement validity. In other words, they both show you how accurately a method measures something.

While construct validity is the degree to which a test or other measurement method measures what it claims to measure, criterion validity is the degree to which a test can predictively (in the future) or concurrently (in the present) measure something.

Construct validity is often considered the overarching type of measurement validity. You need to have face validity, content validity, and criterion validity in order to achieve construct validity.

Convergent validity and discriminant validity are both subtypes of construct validity. Together, they help you evaluate whether a test measures the concept it was designed to measure.

  • Convergent validity indicates whether a test that is designed to measure a particular construct correlates with other tests that assess the same or similar construct.
  • Discriminant validity indicates whether two tests that should not be highly related to each other are indeed not related. This type of validity is also called divergent validity.

You need to assess both in order to demonstrate construct validity. Neither one alone is sufficient for establishing construct validity.

Content validity shows you how accurately a test or other measurement method taps into the various aspects of the specific construct you are researching.

In other words, it helps you answer the question: “Does the test measure all aspects of the construct I want to measure?” If it does, then the test has high content validity.

The higher the content validity, the more accurate the measurement of the construct.

If the test fails to include parts of the construct, or irrelevant parts are included, the validity of the instrument is threatened, which brings your results into question.

Face validity and content validity are similar in that they both evaluate how suitable the content of a test is. The difference is that face validity is subjective, and assesses content at surface level.

When a test has strong face validity, anyone would agree that the test’s questions appear to measure what they are intended to measure.

For example, looking at a 4th grade math test consisting of problems in which students have to add and multiply, most people would agree that it has strong face validity (i.e., it looks like a math test).

On the other hand, content validity evaluates how well a test represents all the aspects of a topic. Assessing content validity is more systematic and relies on expert evaluation of each question, analyzing whether each one covers the aspects that the test was designed to cover.

A 4th grade math test would have high content validity if it covered all the skills taught in that grade. Experts (in this case, math teachers) would have to evaluate the content validity by comparing the test to the learning objectives.

Snowball sampling is a non-probability sampling method. Unlike probability sampling (which involves some form of random selection), the initial individuals selected to be studied are the ones who recruit new participants.

Because not every member of the target population has an equal chance of being recruited into the sample, selection in snowball sampling is non-random.

Snowball sampling is a non-probability sampling method, where there is not an equal chance for every member of the population to be included in the sample.

This means that you cannot use inferential statistics or make generalizations, which are often the goal of quantitative research. As such, a snowball sample is not representative of the target population and is usually a better fit for qualitative research.

Snowball sampling relies on the use of referrals. Here, the researcher recruits one or more initial participants, who then recruit the next ones.

Participants share similar characteristics and/or know each other. Because of this, not every member of the population has an equal chance of being included in the sample, giving rise to sampling bias.

Snowball sampling is best used in the following cases:

  • If there is no sampling frame available (e.g., people with a rare disease)
  • If the population of interest is hard to access or locate (e.g., people experiencing homelessness)
  • If the research focuses on a sensitive topic (e.g., extramarital affairs)

The reproducibility and replicability of a study can be ensured by writing a transparent, detailed method section and using clear, unambiguous language.

Reproducibility and replicability are related terms.

  • Reproducing research entails reanalyzing the existing data in the same manner.
  • Replicating (or repeating ) the research entails reconducting the entire analysis, including the collection of new data . 
  • A successful reproduction shows that the data analyses were conducted in a fair and honest manner.
  • A successful replication shows that the reliability of the results is high.

Stratified sampling and quota sampling both involve dividing the population into subgroups and selecting units from each subgroup. The purpose in both cases is to select a representative sample and/or to allow comparisons between subgroups.

The main difference is that in stratified sampling, you draw a random sample from each subgroup (probability sampling). In quota sampling, you select a predetermined number or proportion of units in a non-random manner (non-probability sampling).

Purposive and convenience sampling are both sampling methods that are typically used in qualitative data collection.

A convenience sample is drawn from a source that is conveniently accessible to the researcher. Convenience sampling does not distinguish characteristics among the participants. On the other hand, purposive sampling focuses on selecting participants possessing characteristics associated with the research study.

The findings of studies based on either convenience or purposive sampling can only be generalized to the (sub)population from which the sample is drawn, and not to the entire population.

Random sampling, or probability sampling, is based on random selection. This means that each unit has an equal chance (i.e., equal probability) of being included in the sample.

On the other hand, convenience sampling involves selecting whoever happens to be available, which means that not everyone has an equal chance of being selected, depending on the place, time, or day you are collecting your data.

Convenience sampling and quota sampling are both non-probability sampling methods. They both use non-random criteria like availability, geographical proximity, or expert knowledge to recruit study participants.

However, in convenience sampling, you continue to sample units or cases until you reach the required sample size.

In quota sampling, you first need to divide your population of interest into subgroups (strata) and estimate their proportions (quota) in the population. Then you can start your data collection, using convenience sampling to recruit participants, until the proportions in each subgroup coincide with the estimated proportions in the population.
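As a minimal sketch of this logic in Python (the subgroups, quotas, and recruitment function are all invented for the example), you keep recruiting conveniently but only accept participants whose subgroup quota is not yet filled:

```python
import random

# Hypothetical quotas: 50 participants per gender subgroup (invented numbers).
quotas = {"male": 50, "female": 50}
counts = {group: 0 for group in quotas}
sample = []

def next_available_participant():
    """Stand-in for convenience recruitment (e.g., whoever walks by)."""
    return {"gender": random.choice(["male", "female"])}

# Recruit conveniently, accepting people only while their quota is unfilled.
while any(counts[g] < quotas[g] for g in quotas):
    person = next_available_participant()
    group = person["gender"]
    if counts[group] < quotas[group]:
        sample.append(person)
        counts[group] += 1

print(counts)  # each subgroup matches its quota: {'male': 50, 'female': 50}
```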

A sampling frame is a list of every member in the entire population . It is important that the sampling frame is as complete as possible, so that your sample accurately reflects your population.

Stratified and cluster sampling may look similar, but bear in mind that groups created in cluster sampling are heterogeneous, so the individual characteristics in the cluster vary. In contrast, groups created in stratified sampling are homogeneous, as units share characteristics.

Relatedly, in cluster sampling you randomly select entire groups and include all units of each group in your sample. However, in stratified sampling, you select some units of all groups and include them in your sample. In this way, both methods can ensure that your sample is representative of the target population.

A systematic review is secondary research because it uses existing research. You don’t collect new data yourself.

The key difference between observational studies and experimental designs is that a well-done observational study does not influence the responses of participants, while experiments do have some sort of treatment condition applied to at least some participants by random assignment.

An observational study is a great choice for you if your research question is based purely on observations. If there are ethical, logistical, or practical concerns that prevent you from conducting a traditional experiment, an observational study may be a good choice. In an observational study, there is no interference or manipulation of the research subjects, and no control or treatment groups.

It’s often best to ask a variety of people to review your measurements. You can ask experts, such as other researchers, or laypeople, such as potential participants, to judge the face validity of tests.

While experts have a deep understanding of research methods , the people you’re studying can provide you with valuable insights you may have missed otherwise.

Face validity is important because it’s a simple first step to measuring the overall validity of a test or technique. It’s a relatively intuitive, quick, and easy way to start checking whether a new measure seems useful at first glance.

Good face validity means that anyone who reviews your measure says that it seems to be measuring what it’s supposed to. With poor face validity, someone reviewing your measure may be left confused about what you’re measuring and why you’re using this method.

Face validity is about whether a test appears to measure what it’s supposed to measure. This type of validity is concerned with whether a measure seems relevant and appropriate for what it’s assessing only on the surface.

Statistical analyses are often applied to test validity with data from your measures. You test convergent validity and discriminant validity with correlations to see if results from your test are positively or negatively related to those of other established tests.

You can also use regression analyses to assess whether your measure is actually predictive of outcomes that you expect it to predict theoretically. A regression analysis that supports your expectations strengthens your claim of construct validity.
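As a sketch of what the correlation checks look like in practice, the Python snippet below simulates scores (all data invented) and evaluates convergent and discriminant validity:

```python
import numpy as np

rng = np.random.default_rng(42)
true_anxiety = rng.normal(size=100)

# Invented scores: a new anxiety measure, an established anxiety measure
# (same construct), and a vocabulary test (distinct construct).
new_anxiety = true_anxiety + 0.3 * rng.normal(size=100)
established_anxiety = true_anxiety + 0.3 * rng.normal(size=100)
vocabulary = rng.normal(size=100)

# Convergent validity: correlation with the same construct should be strong.
print(np.corrcoef(new_anxiety, established_anxiety)[0, 1])  # close to +1

# Discriminant validity: correlation with a distinct construct should be weak.
print(np.corrcoef(new_anxiety, vocabulary)[0, 1])           # close to 0
```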

When designing or evaluating a measure, construct validity helps you ensure you’re actually measuring the construct you’re interested in. If you don’t have construct validity, you may inadvertently measure unrelated or distinct constructs and lose precision in your research.

Construct validity is often considered the overarching type of measurement validity, because it covers all of the other types. You need to have face validity, content validity, and criterion validity to achieve construct validity.

Construct validity is about how well a test measures the concept it was designed to evaluate. It’s one of four types of measurement validity; the others are face validity, content validity, and criterion validity.

There are two subtypes of construct validity.

  • Convergent validity : The extent to which your measure corresponds to measures of related constructs
  • Discriminant validity : The extent to which your measure is unrelated or negatively related to measures of distinct constructs

Naturalistic observation is a valuable tool because of its flexibility, external validity, and suitability for topics that can’t be studied in a lab setting.

The downsides of naturalistic observation include its lack of scientific control, ethical considerations, and potential for bias from observers and subjects.

Naturalistic observation is a qualitative research method where you record the behaviors of your research subjects in real-world settings. You avoid interfering with or influencing anything in a naturalistic observation.

You can think of naturalistic observation as “people watching” with a purpose.

A dependent variable is what changes as a result of the independent variable manipulation in experiments. It’s what you’re interested in measuring, and it “depends” on your independent variable.

In statistics, dependent variables are also called:

  • Response variables (they respond to a change in another variable)
  • Outcome variables (they represent the outcome you want to measure)
  • Left-hand-side variables (they appear on the left-hand side of a regression equation)

An independent variable is the variable you manipulate, control, or vary in an experimental study to explore its effects. It’s called “independent” because it’s not influenced by any other variables in the study.

Independent variables are also called:

  • Explanatory variables (they explain an event or outcome)
  • Predictor variables (they can be used to predict the value of a dependent variable)
  • Right-hand-side variables (they appear on the right-hand side of a regression equation)

As a rule of thumb, questions related to thoughts, beliefs, and feelings work well in focus groups. Take your time formulating strong questions, paying special attention to phrasing. Be careful to avoid leading questions, which can bias your responses.

Overall, your focus group questions should be:

  • Open-ended and flexible
  • Impossible to answer with “yes” or “no” (questions that start with “why” or “how” are often best)
  • Unambiguous, getting straight to the point while still stimulating discussion
  • Unbiased and neutral

A structured interview is a data collection method that relies on asking questions in a set order to collect data on a topic. Structured interviews are often quantitative in nature. They are best used when:

  • You already have a very clear understanding of your topic. Perhaps significant research has already been conducted, or you have done some prior research yourself, so you already possess a baseline for designing strong structured questions.
  • You are constrained in terms of time or resources and need to analyze your data quickly and efficiently.
  • Your research question depends on strong parity between participants, with environmental conditions held constant.

More flexible interview options include semi-structured interviews, unstructured interviews, and focus groups.

Social desirability bias is the tendency for interview participants to give responses that will be viewed favorably by the interviewer or other participants. It occurs in all types of interviews and surveys, but is most common in semi-structured interviews, unstructured interviews, and focus groups.

Social desirability bias can be mitigated by ensuring participants feel at ease and comfortable sharing their views. Make sure to pay attention to your own body language and any physical or verbal cues, such as nodding or widening your eyes.

This type of bias can also occur in observations if the participants know they’re being observed. They might alter their behavior accordingly.

The interviewer effect is a type of bias that emerges when a characteristic of an interviewer (race, age, gender identity, etc.) influences the responses given by the interviewee.

There is a risk of an interviewer effect in all types of interviews, but it can be mitigated by writing well-constructed, high-quality interview questions.

A semi-structured interview is a blend of structured and unstructured types of interviews. Semi-structured interviews are best used when:

  • You have prior interview experience. Spontaneous questions are deceptively challenging, and it’s easy to accidentally ask a leading question or make a participant uncomfortable.
  • Your research question is exploratory in nature. Participant answers can guide future research questions and help you develop a more robust knowledge base for future research.

An unstructured interview is the most flexible type of interview, but it is not always the best fit for your research topic.

Unstructured interviews are best used when:

  • You are an experienced interviewer and have a very strong background in your research topic, since it is challenging to ask spontaneous, colloquial questions.
  • Your research question is exploratory in nature. While you may have developed hypotheses, you are open to discovering new or shifting viewpoints through the interview process.
  • You are seeking descriptive data, and are ready to ask questions that will deepen and contextualize your initial thoughts and hypotheses.
  • Your research depends on forming connections with your participants and making them feel comfortable revealing deeper emotions, lived experiences, or thoughts.

The four most common types of interviews are:

  • Structured interviews : The questions are predetermined in both topic and order. 
  • Semi-structured interviews : A few questions are predetermined, but other questions aren’t planned.
  • Unstructured interviews : None of the questions are predetermined.
  • Focus group interviews : The questions are presented to a group instead of one individual.

Deductive reasoning is commonly used in scientific research, and it’s especially associated with quantitative research .

In research, you might have come across something called the hypothetico-deductive method . It’s the scientific method of testing hypotheses to check whether your predictions are substantiated by real-world data.

Deductive reasoning is a logical approach where you progress from general ideas to specific conclusions. It’s often contrasted with inductive reasoning , where you start with specific observations and form general conclusions.

Deductive reasoning is also called deductive logic.

There are many different types of inductive reasoning that people use formally or informally.

Here are a few common types:

  • Inductive generalization: You use observations about a sample to come to a conclusion about the population it came from.
  • Statistical generalization: You use specific numbers about samples to make statements about populations.
  • Causal reasoning: You make cause-and-effect links between different things.
  • Sign reasoning: You make a conclusion about a correlational relationship between different things.
  • Analogical reasoning: You make a conclusion about something based on its similarities to something else.

Inductive reasoning is a bottom-up approach, while deductive reasoning is top-down.

Inductive reasoning takes you from the specific to the general, while in deductive reasoning, you make inferences by going from general premises to specific conclusions.

In inductive research , you start by making observations or gathering data. Then, you take a broad scan of your data and search for patterns. Finally, you make general conclusions that you might incorporate into theories.

Inductive reasoning is a method of drawing conclusions by going from the specific to the general. It’s usually contrasted with deductive reasoning, where you proceed from general information to specific conclusions.

Inductive reasoning is also called inductive logic or bottom-up reasoning.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Triangulation can help:

  • Reduce research bias that comes from using a single method, theory, or investigator
  • Enhance validity by approaching the same topic with different tools
  • Establish credibility by giving you a complete picture of the research problem

But triangulation can also pose problems:

  • It’s time-consuming and labor-intensive, often involving an interdisciplinary team.
  • Your results may be inconsistent or even contradictory.

There are four main types of triangulation :

  • Data triangulation : Using data from different times, spaces, and people
  • Investigator triangulation : Involving multiple researchers in collecting or analyzing data
  • Theory triangulation : Using varying theoretical perspectives in your research
  • Methodological triangulation : Using different methodologies to approach the same topic

Many academic fields use peer review, largely to determine whether a manuscript is suitable for publication. Peer review enhances the credibility of the published manuscript.

However, peer review is also common in non-academic settings. The United Nations, the European Union, and many individual nations use peer review to evaluate grant applications. It is also widely used in medical and health-related fields as a teaching or quality-of-care measure. 

Peer assessment is often used in the classroom as a pedagogical tool. Both receiving feedback and providing it are thought to enhance the learning process, helping students think critically and collaboratively.

Peer review can stop obviously problematic, falsified, or otherwise untrustworthy research from being published. It also represents an excellent opportunity to get feedback from renowned experts in your field. It acts as a first defense, helping you ensure your argument is clear and that there are no gaps, vague terms, or unanswered questions for readers who weren’t involved in the research process.

Peer-reviewed articles are considered a highly credible source due to the stringent process they go through before publication.

In general, the peer review process follows these steps:

  • First, the author submits the manuscript to the editor.
  • Next, the editor decides whether to reject the manuscript and send it back to the author, or to send it onward to the selected peer reviewer(s).
  • Then the peer review process occurs: the reviewer provides feedback, addressing any major or minor issues with the manuscript, and gives their advice regarding what edits should be made.
  • Lastly, the edited manuscript is sent back to the author. They input the edits and resubmit it to the editor for publication.

Exploratory research is often used when the issue you’re studying is new or when the data collection process is challenging for some reason.

You can use exploratory research if you have a general idea or a specific question that you want to study but there is no preexisting knowledge or paradigm with which to study it.

Exploratory research is a methodological approach that explores research questions that have not previously been studied in depth. It is often used when the issue you’re studying is new, or the data collection process is challenging in some way.

Exploratory research is often conducted when a phenomenon is new or has not yet been studied in depth. Therefore, this type of research is often one of the first stages in the research process, serving as a jumping-off point for future research.

Exploratory research aims to explore the main aspects of an under-researched problem, while explanatory research aims to explain the causes and consequences of a well-defined problem.

Explanatory research is a research method used to investigate how or why something occurs when only a small amount of information is available pertaining to that topic. It can help you increase your understanding of a given topic.

Clean data are valid, accurate, complete, consistent, unique, and uniform. Dirty data include inconsistencies and errors.

Dirty data can come from any part of the research process, including poor research design, inappropriate measurement materials, or flawed data entry.

Data cleaning takes place between data collection and data analyses. But you can use some methods even before collecting data.

For clean data, you should start by designing measures that collect valid data. Data validation at the time of data entry or collection helps you minimize the amount of data cleaning you’ll need to do.

After data collection, you can use data standardization and data transformation to clean your data. You’ll also deal with any missing values, outliers, and duplicate values.

Every dataset requires different techniques to clean dirty data, but you need to address these issues in a systematic way. You focus on finding and resolving data points that don’t agree or fit with the rest of your dataset.

These data might be missing values, outliers, duplicate values, incorrectly formatted, or irrelevant. You’ll start with screening and diagnosing your data. Then, you’ll often standardize and accept or remove data to make your dataset consistent and valid.
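As a minimal sketch (using pandas, with invented survey data), screening and resolving duplicates, missing values, and implausible entries might look like this:

```python
import pandas as pd

# Invented survey data containing typical "dirty" points:
# a duplicate record, a missing value, and an impossible age.
df = pd.DataFrame({
    "id": [1, 2, 2, 3, 4],
    "age": [25.0, 31.0, 31.0, None, 240.0],
})

df = df.drop_duplicates(subset="id")   # remove the duplicate record
df = df.dropna(subset=["age"])         # handle the missing value (here: drop)
df = df[df["age"].between(0, 120)]     # remove the implausible outlier

print(df)  # a consistent, valid dataset remains
```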

Data cleaning is necessary for valid and appropriate analyses. Dirty data contain inconsistencies or errors, but cleaning your data helps you minimize or resolve these.

Without data cleaning, you could end up with a Type I or II error in your conclusion. These types of erroneous conclusions can be practically significant with important consequences, because they lead to misplaced investments or missed opportunities.

Data cleaning involves spotting and resolving potential data inconsistencies or errors to improve your data quality. An error is any value (e.g., recorded weight) that doesn’t reflect the true value (e.g., actual weight) of something that’s being measured.

In this process, you review, analyze, detect, modify, or remove “dirty” data to make your dataset “clean.” Data cleaning is also called data cleansing or data scrubbing.

Research misconduct means making up or falsifying data, manipulating data analyses, or misrepresenting results in research reports. It’s a form of academic fraud.

These actions are committed intentionally and can have serious consequences; research misconduct is not a simple mistake or a point of disagreement but a serious ethical failure.

Anonymity means you don’t know who the participants are, while confidentiality means you know who they are but remove identifying information from your research report. Both are important ethical considerations.

You can only guarantee anonymity by not collecting any personally identifying information—for example, names, phone numbers, email addresses, IP addresses, physical characteristics, photos, or videos.

You can keep data confidential by using aggregate information in your research report, so that you only refer to groups of participants rather than individuals.

Research ethics matter for scientific integrity, human rights and dignity, and collaboration between science and society. These principles make sure that participation in studies is voluntary, informed, and safe.

Ethical considerations in research are a set of principles that guide your research designs and practices. These principles include voluntary participation, informed consent, anonymity, confidentiality, potential for harm, and results communication.

Scientists and researchers must always adhere to a certain code of conduct when collecting data from others.

These considerations protect the rights of research participants, enhance research validity, and maintain scientific integrity.

In multistage sampling, you can use probability or non-probability sampling methods.

For a probability sample, you have to conduct probability sampling at every stage.

You can mix it up by using simple random sampling, systematic sampling, or stratified sampling to select units at different stages, depending on what is applicable and relevant to your study.

Multistage sampling can simplify data collection when you have large, geographically spread samples, and you can obtain a probability sample without a complete sampling frame.

But multistage sampling may not lead to a representative sample, and larger samples are needed for multistage samples to achieve the statistical properties of simple random samples.

These are four of the most common mixed methods designs :

  • Convergent parallel: Quantitative and qualitative data are collected at the same time and analyzed separately. After both analyses are complete, the results are compared to draw overall conclusions.
  • Embedded: Quantitative and qualitative data are collected at the same time, but within a larger quantitative or qualitative design. One type of data is secondary to the other.
  • Explanatory sequential: Quantitative data is collected and analyzed first, followed by qualitative data. You can use this design if you think your qualitative data will explain and contextualize your quantitative findings.
  • Exploratory sequential: Qualitative data is collected and analyzed first, followed by quantitative data. You can use this design if you think the quantitative data will confirm or validate your qualitative findings.

Triangulation in research means using multiple datasets, methods, theories and/or investigators to address a research question. It’s a research strategy that can help you enhance the validity and credibility of your findings.

Triangulation is mainly used in qualitative research, but it’s also commonly applied in quantitative research. Mixed methods research always uses triangulation.

In multistage sampling, or multistage cluster sampling, you draw a sample from a population using smaller and smaller groups at each stage.

This method is often used to collect data from a large, geographically spread group of people in national surveys, for example. You take advantage of hierarchical groupings (e.g., from state to city to neighborhood) to create a sample that’s less expensive and time-consuming to collect data from.

No, the steepness or slope of the line isn’t related to the correlation coefficient value. The correlation coefficient only tells you how closely your data fit on a line, so two datasets with the same correlation coefficient can have very different slopes.

To find the slope of the line, you’ll need to perform a regression analysis.
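A quick numerical sketch (Python, with made-up data) shows why: two perfectly linear datasets can share a correlation coefficient of 1 while having very different slopes.

```python
import numpy as np

x = np.arange(10.0)
y_steep = 5.0 * x     # slope of 5
y_shallow = 0.1 * x   # slope of 0.1

# Both relationships are perfectly linear, so r = 1.0 in both cases...
print(np.corrcoef(x, y_steep)[0, 1], np.corrcoef(x, y_shallow)[0, 1])

# ...but a regression recovers the very different slopes.
print(np.polyfit(x, y_steep, 1)[0], np.polyfit(x, y_shallow, 1)[0])  # ~5.0, ~0.1
```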

Correlation coefficients always range between -1 and 1.

The sign of the coefficient tells you the direction of the relationship: a positive value means the variables change together in the same direction, while a negative value means they change together in opposite directions.

The absolute value of a number is equal to the number without its sign. The absolute value of a correlation coefficient tells you the magnitude of the correlation: the greater the absolute value, the stronger the correlation.

These are the assumptions your data must meet if you want to use Pearson’s r :

  • Both variables are on an interval or ratio level of measurement
  • Data from both variables follow normal distributions
  • Your data have no outliers
  • Your data are from a random or representative sample
  • You expect a linear relationship between the two variables

Quantitative research designs can be divided into two main categories:

  • Correlational and descriptive designs are used to investigate characteristics, averages, trends, and associations between variables.
  • Experimental and quasi-experimental designs are used to test causal relationships .

Qualitative research designs tend to be more flexible. Common types of qualitative design include case study, ethnography, and grounded theory designs.

A well-planned research design helps ensure that your methods match your research aims, that you collect high-quality data, and that you use the right kind of analysis to answer your questions, drawing on credible sources. This allows you to draw valid, trustworthy conclusions.

The priorities of a research design can vary depending on the field, but you usually have to specify:

  • Your research questions and/or hypotheses
  • Your overall approach (e.g., qualitative or quantitative)
  • The type of design you’re using (e.g., a survey, experiment, or case study)
  • Your sampling methods or criteria for selecting subjects
  • Your data collection methods (e.g., questionnaires, observations)
  • Your data collection procedures (e.g., operationalization, timing, and data management)
  • Your data analysis methods (e.g., statistical tests or thematic analysis)

A research design is a strategy for answering your research question. It defines your overall approach and determines how you will collect and analyze data.

Questionnaires can be self-administered or researcher-administered.

Self-administered questionnaires can be delivered online or in paper-and-pen formats, in person or through mail. All questions are standardized so that all respondents receive the same questions with identical wording.

Researcher-administered questionnaires are interviews that take place by phone, in-person, or online between researchers and respondents. You can gain deeper insights by clarifying questions for respondents or asking follow-up questions.

You can organize the questions logically, with a clear progression from simple to complex, or randomly between respondents. A logical flow helps respondents process the questionnaire more easily and quickly, but it may lead to bias. Randomization can minimize bias from order effects.

Closed-ended, or restricted-choice, questions offer respondents a fixed set of choices to select from. These questions are easier to answer quickly.

Open-ended or long-form questions allow respondents to answer in their own words. Because there are no restrictions on their choices, respondents can answer in ways that researchers may not have otherwise considered.

A questionnaire is a data collection tool or instrument, while a survey is an overarching research method that involves collecting and analyzing data from people using questionnaires.

The third variable and directionality problems are two main reasons why correlation isn’t causation.

The third variable problem means that a confounding variable affects both variables to make them seem causally related when they are not.

The directionality problem is when two variables correlate and might actually have a causal relationship, but it’s impossible to conclude which variable causes changes in the other.

Correlation describes an association between variables : when one variable changes, so does the other. A correlation is a statistical indicator of the relationship between variables.

Causation means that changes in one variable bring about changes in the other (i.e., there is a cause-and-effect relationship between variables). The two variables are correlated with each other, and there’s also a causal link between them.

While causation and correlation can exist simultaneously, correlation does not imply causation. In other words, correlation is simply a relationship where A relates to B, but A doesn’t necessarily cause B to happen (or vice versa). Mistaking correlation for causation is a common error and can lead to the false cause fallacy.

Controlled experiments establish causality, whereas correlational studies only show associations between variables.

  • In an experimental design, you manipulate an independent variable and measure its effect on a dependent variable. Other variables are controlled so they can’t impact the results.
  • In a correlational design, you measure variables without manipulating any of them. You can test whether your variables change together, but you can’t be sure that one variable caused a change in another.

In general, correlational research is high in external validity while experimental research is high in internal validity.

A correlation is usually tested for two variables at a time, but you can test correlations between three or more variables.

A correlation coefficient is a single number that describes the strength and direction of the relationship between your variables.

Different types of correlation coefficients might be appropriate for your data based on their levels of measurement and distributions. The Pearson product-moment correlation coefficient (Pearson’s r) is commonly used to assess a linear relationship between two quantitative variables.

A correlational research design investigates relationships between two variables (or more) without the researcher controlling or manipulating any of them. It’s a non-experimental type of quantitative research.

A correlation reflects the strength and/or direction of the association between two or more variables.

  • A positive correlation means that both variables change in the same direction.
  • A negative correlation means that the variables change in opposite directions.
  • A zero correlation means there’s no relationship between the variables.

Random error is almost always present in scientific studies, even in highly controlled settings. While you can’t eradicate it completely, you can reduce random error by taking repeated measurements, using a large sample, and controlling extraneous variables.

You can avoid systematic error through careful design of your sampling, data collection, and analysis procedures. For example, use triangulation to measure your variables using multiple methods; regularly calibrate instruments or procedures; use random sampling and random assignment; and apply masking (blinding) where possible.

Systematic error is generally a bigger problem in research.

With random error, multiple measurements will tend to cluster around the true value. When you’re collecting data from a large sample, the errors in different directions will cancel each other out.

Systematic errors are much more problematic because they can skew your data away from the true value. This can lead you to false conclusions (Type I and II errors) about the relationship between the variables you’re studying.

Random and systematic error are two types of measurement error.

Random error is a chance difference between the observed and true values of something (e.g., a researcher misreading a weighing scale records an incorrect measurement).

Systematic error is a consistent or proportional difference between the observed and true values of something (e.g., a miscalibrated scale consistently records weights as higher than they actually are).

On graphs, the explanatory variable is conventionally placed on the x-axis, while the response variable is placed on the y-axis.

  • If you have quantitative variables, use a scatterplot or a line graph.
  • If your response variable is categorical, use a scatterplot or a line graph.
  • If your explanatory variable is categorical, use a bar graph.

The term “explanatory variable” is sometimes preferred over “independent variable” because, in real-world contexts, independent variables are often influenced by other variables. This means they aren’t totally independent.

Multiple independent variables may also be correlated with each other, so “explanatory variables” is a more appropriate term.

The difference between explanatory and response variables is simple:

  • An explanatory variable is the expected cause, and it explains the results.
  • A response variable is the expected effect, and it responds to other variables.

In a controlled experiment, all extraneous variables are held constant so that they can’t influence the results. Controlled experiments require:

  • A control group that receives a standard treatment, a fake treatment, or no treatment.
  • Random assignment of participants to ensure the groups are equivalent.

Depending on your study topic, there are various other methods of controlling variables.

There are four main types of extraneous variables:

  • Demand characteristics : environmental cues that encourage participants to conform to researchers’ expectations.
  • Experimenter effects : unintentional actions by researchers that influence study outcomes.
  • Situational variables : environmental variables that alter participants’ behaviors.
  • Participant variables : any characteristic or aspect of a participant’s background that could affect study results.

An extraneous variable is any variable that you’re not investigating that can potentially affect the dependent variable of your research study.

A confounding variable is a type of extraneous variable that not only affects the dependent variable, but is also related to the independent variable.

In a factorial design, multiple independent variables are tested.

If you test two variables, each level of one independent variable is combined with each level of the other independent variable to create different conditions.
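For example, here is a small Python sketch of a hypothetical 2 × 3 factorial design; the variables and levels are invented:

```python
from itertools import product

caffeine = ["no caffeine", "caffeine"]       # first independent variable (2 levels)
sleep = ["4 hours", "6 hours", "8 hours"]    # second independent variable (3 levels)

# Crossing each level of one variable with each level of the other
# yields 2 x 3 = 6 experimental conditions.
conditions = list(product(caffeine, sleep))
for condition in conditions:
    print(condition)

print(len(conditions))  # 6
```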

Within-subjects designs have many potential threats to internal validity, but they are also very statistically powerful.

Advantages:

  • Only requires small samples
  • Statistically powerful
  • Removes the effects of individual differences on the outcomes

Disadvantages:

  • Internal validity threats reduce the likelihood of establishing a direct relationship between variables
  • Time-related effects, such as growth, can influence the outcomes
  • Carryover effects mean that the specific order of different treatments affects the outcomes

While a between-subjects design has fewer threats to internal validity, it also requires more participants for high statistical power than a within-subjects design.

Advantages:

  • Prevents carryover effects of learning and fatigue.
  • Shorter study duration.

Disadvantages:

  • Needs larger samples for high power.
  • Uses more resources to recruit participants, administer sessions, cover costs, etc.
  • Individual differences may be an alternative explanation for results.

Yes. Between-subjects and within-subjects designs can be combined in a single study when you have two or more independent variables (a factorial design). In a mixed factorial design, one variable is altered between subjects and another is altered within subjects.

In a between-subjects design, every participant experiences only one condition, and researchers assess group differences between participants in various conditions.

In a within-subjects design, each participant experiences all conditions, and researchers test the same participants repeatedly for differences between conditions.

The word “between” means that you’re comparing different conditions between groups, while the word “within” means you’re comparing different conditions within the same group.

Random assignment is used in experiments with a between-groups or independent measures design. In this research design, there’s usually a control group and one or more experimental groups. Random assignment helps ensure that the groups are comparable.

In general, you should always use random assignment in this type of experimental design when it is ethically possible and makes sense for your study topic.

To implement random assignment, assign a unique number to every member of your study’s sample.

Then, you can use a random number generator or a lottery method to randomly assign each number to a control or experimental group. You can also do so manually, by flipping a coin or rolling a die to randomly assign participants to groups.
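A minimal Python sketch of this lottery method, assuming 20 hypothetical participants:

```python
import random

participant_ids = list(range(1, 21))  # a unique number for each participant

random.shuffle(participant_ids)       # the "lottery": randomize the order
half = len(participant_ids) // 2
control_group = participant_ids[:half]
experimental_group = participant_ids[half:]

print(sorted(control_group))
print(sorted(experimental_group))
```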

Random selection, or random sampling, is a way of selecting members of a population for your study’s sample.

In contrast, random assignment is a way of sorting the sample into control and experimental groups.

Random sampling enhances the external validity or generalizability of your results, while random assignment improves the internal validity of your study.

In experimental research, random assignment is a way of placing participants from your sample into different groups using randomization. With this method, every member of the sample has a known or equal chance of being placed in a control group or an experimental group.

“Controlling for a variable” means measuring extraneous variables and accounting for them statistically to remove their effects on other variables.

Researchers often model control variable data along with independent and dependent variable data in regression analyses and ANCOVAs. That way, you can isolate the control variable’s effects from the relationship between the variables of interest.
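The sketch below (Python, with simulated data; all numbers invented) shows the idea: once the control variable is included in the regression, the coefficient on the independent variable reflects its effect with the control’s influence separated out.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
control = rng.normal(size=n)                       # measured extraneous variable
x = 0.5 * control + rng.normal(size=n)             # independent variable
y = 2.0 * x + 1.5 * control + rng.normal(size=n)   # dependent variable

# Regression with an intercept, the independent variable, and the control.
X = np.column_stack([np.ones(n), x, control])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)

print(coefs)  # roughly [0, 2.0, 1.5]: x's effect isolated from the control's
```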

Control variables help you establish a correlational or causal relationship between variables by enhancing internal validity.

If you don’t control relevant extraneous variables, they may influence the outcomes of your study, and you may not be able to demonstrate that your results are really an effect of your independent variable.

A control variable is any variable that’s held constant in a research study. It’s not a variable of interest in the study, but it’s controlled because it could influence the outcomes.

Including mediators and moderators in your research helps you go beyond studying a simple relationship between two variables for a fuller picture of the real world. They are important to consider when studying complex correlational or causal relationships.

Mediators are part of the causal pathway of an effect, and they tell you how or why an effect takes place. Moderators usually help you judge the external validity of your study by identifying the limitations of when the relationship between variables holds.

If something is a mediating variable:

  • It’s caused by the independent variable.
  • It influences the dependent variable.
  • When it’s taken into account, the statistical correlation between the independent and dependent variables is lower than when it isn’t considered, because the mediator explains (part of) their relationship.

A confounder is a third variable that affects variables of interest and makes them seem related when they are not. In contrast, a mediator is the mechanism of a relationship between two variables: it explains the process by which they are related.

A mediator variable explains the process through which two variables are related, while a moderator variable affects the strength and direction of that relationship.

There are three key steps in systematic sampling:

  • Define and list your population, ensuring that it is not arranged in any cyclical or periodic pattern.
  • Decide on your sample size and calculate your interval, k, by dividing your population size by your target sample size.
  • Choose every kth member of the population as your sample (sketched in the code below).
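A minimal sketch of these steps in Python, with an invented population list of 1,000 members and a target sample of 100:

```python
import random

population = [f"person_{i}" for i in range(1, 1001)]  # listed, not cyclically ordered
target_sample_size = 100

k = len(population) // target_sample_size  # interval: 1000 / 100 = 10
start = random.randrange(k)                # random starting point between 0 and k-1
sample = population[start::k]              # every kth member from the start

print(k, len(sample))  # 10 100
```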

Systematic sampling is a probability sampling method where researchers select members of the population at a regular interval – for example, by selecting every 15th person on a list of the population. If the population is in a random order, this can imitate the benefits of simple random sampling .

Yes, you can create a stratified sample using multiple characteristics, but you must ensure that every participant in your study belongs to one and only one subgroup. In this case, you multiply the numbers of subgroups for each characteristic to get the total number of groups.

For example, if you were stratifying by location with three subgroups (urban, rural, or suburban) and marital status with five subgroups (single, divorced, widowed, married, or partnered), you would have 3 x 5 = 15 subgroups.

You should use stratified sampling when your sample can be divided into mutually exclusive and exhaustive subgroups that you believe will take on different mean values for the variable that you’re studying.

Using stratified sampling will allow you to obtain more precise (with lower variance) statistical estimates of whatever you are trying to measure.

For example, say you want to investigate how income differs based on educational attainment, but you know that this relationship can vary based on race. Using stratified sampling, you can ensure you obtain a large enough sample from each racial group, allowing you to draw more precise conclusions.

In stratified sampling , researchers divide subjects into subgroups called strata based on characteristics that they share (e.g., race, gender, educational attainment).

Once divided, each subgroup is randomly sampled using another probability sampling method.
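As a sketch, here is proportionate stratified sampling in Python with an invented population of 600 urban and 400 rural residents, sampling 10% of each stratum:

```python
import random

population = (
    [{"id": i, "location": "urban"} for i in range(600)]
    + [{"id": i, "location": "rural"} for i in range(600, 1000)]
)

# Divide the population into strata based on a shared characteristic.
strata = {}
for person in population:
    strata.setdefault(person["location"], []).append(person)

# Draw a simple random sample within each stratum (10% of each).
sample = []
for group in strata.values():
    sample.extend(random.sample(group, k=len(group) // 10))

print(len(sample))  # 100 (60 urban + 40 rural)
```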

Cluster sampling is more time- and cost-efficient than other probability sampling methods, particularly when it comes to large samples spread across a wide geographical area.

However, it provides less statistical certainty than other methods, such as simple random sampling, because it is difficult to ensure that your clusters properly represent the population as a whole.

There are three types of cluster sampling: single-stage, double-stage, and multi-stage clustering. In all three types, you first divide the population into clusters, then randomly select clusters for use in your sample (see the sketch after this list).

  • In single-stage sampling, you collect data from every unit within the selected clusters.
  • In double-stage sampling, you select a random sample of units from within the clusters.
  • In multi-stage sampling, you repeat the procedure of randomly sampling elements from within the clusters until you have reached a manageable sample.
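Here is a small Python sketch of the single-stage and double-stage variants, using invented schools as clusters and students as units:

```python
import random

# 20 invented clusters (schools) of 30 units (students) each.
schools = {f"school_{s}": [f"student_{s}_{i}" for i in range(30)] for s in range(20)}

# Both variants start by randomly selecting clusters.
chosen = random.sample(list(schools), k=5)

# Single-stage: collect data from every unit in the selected clusters.
single_stage = [u for school in chosen for u in schools[school]]

# Double-stage: randomly sample units within each selected cluster.
double_stage = [u for school in chosen for u in random.sample(schools[school], k=10)]

print(len(single_stage), len(double_stage))  # 150 50
```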

Cluster sampling is a probability sampling method in which you divide a population into clusters, such as districts or schools, and then randomly select some of these clusters as your sample.

The clusters should ideally each be mini-representations of the population as a whole.

If properly implemented, simple random sampling is usually the best sampling method for ensuring both internal and external validity. However, it can sometimes be impractical and expensive to implement, depending on the size of the population to be studied.

If you have a list of every member of the population and the ability to reach whichever members are selected, you can use simple random sampling.

The American Community Survey is an example of simple random sampling. In order to collect detailed data on the population of the US, Census Bureau officials randomly select 3.5 million households per year and use a variety of methods to convince them to fill out the survey.

Simple random sampling is a type of probability sampling in which the researcher randomly selects a subset of participants from a population. Each member of the population has an equal chance of being selected. Data is then collected from as large a percentage as possible of this random subset.
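In code, simple random sampling is a one-liner once you have a complete sampling frame. A Python sketch with an invented frame of 10,000 members:

```python
import random

sampling_frame = [f"member_{i}" for i in range(1, 10001)]  # every population member

# Each member has an equal chance of selection (sampling without replacement).
sample = random.sample(sampling_frame, k=500)

print(len(sample))  # 500
```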

Quasi-experimental design is most useful in situations where it would be unethical or impractical to run a true experiment.

Quasi-experiments have lower internal validity than true experiments, but they often have higher external validity as they can use real-world interventions instead of artificial laboratory settings.

A quasi-experiment is a type of research design that attempts to establish a cause-and-effect relationship. The main difference with a true experiment is that the groups are not randomly assigned.

Blinding is important to reduce research bias (e.g., observer bias, demand characteristics) and ensure a study’s internal validity.

If participants know whether they are in a control or treatment group, they may adjust their behavior in ways that affect the outcome that researchers are trying to measure. If the people administering the treatment are aware of group assignment, they may treat participants differently and thus directly or indirectly influence the final results.

  • In a single-blind study, only the participants are blinded.
  • In a double-blind study, both participants and experimenters are blinded.
  • In a triple-blind study, the assignment is hidden not only from participants and experimenters, but also from the researchers analyzing the data.

Blinding means hiding who is assigned to the treatment group and who is assigned to the control group in an experiment .

A true experiment (a.k.a. a controlled experiment) always includes at least one control group that doesn’t receive the experimental treatment.

However, some experiments use a within-subjects design to test treatments without a control group. In these designs, you usually compare one group’s outcomes before and after a treatment (instead of comparing outcomes between different groups).

For strong internal validity, it’s usually best to include a control group if possible. Without a control group, it’s harder to be certain that the outcome was caused by the experimental treatment and not by other variables.

An experimental group, also known as a treatment group, receives the treatment whose effect researchers wish to study, whereas a control group does not. They should be identical in all other ways.

Individual Likert-type questions are generally considered ordinal data, because the items have clear rank order, but don’t have an even distribution.

Overall Likert scale scores are sometimes treated as interval data. These scores are considered to have directionality and even spacing between them.

The type of data determines what statistical tests you should use to analyze your data.

A Likert scale is a rating scale that quantitatively assesses opinions, attitudes, or behaviors. It is made up of four or more questions that measure a single attitude or trait when response scores are combined.

To use a Likert scale in a survey, you present participants with Likert-type questions or statements, and a continuum of items, usually with 5 or 7 possible responses, to capture their degree of agreement.
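As a small sketch, here is how one hypothetical participant’s responses to a four-item, 5-point Likert scale might be combined into an overall score:

```python
# Invented responses on a 5-point agreement scale
# (1 = strongly disagree ... 5 = strongly agree).
responses = {"item_1": 4, "item_2": 5, "item_3": 3, "item_4": 4}

# Individual items are ordinal; the combined score across items
# is what is sometimes treated as interval data.
scale_score = sum(responses.values())

print(scale_score)  # 16 out of a possible 20
```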

In scientific research, concepts are the abstract ideas or phenomena that are being studied (e.g., educational achievement). Variables are properties or characteristics of the concept (e.g., performance at school), while indicators are ways of measuring or quantifying variables (e.g., yearly grade reports).

The process of turning abstract concepts into measurable variables and indicators is called operationalization.

There are various approaches to qualitative data analysis , but they all share five steps in common:

  • Prepare and organize your data.
  • Review and explore your data.
  • Develop a data coding system.
  • Assign codes to the data.
  • Identify recurring themes.

The specifics of each step depend on the focus of the analysis. Some common approaches include textual analysis , thematic analysis , and discourse analysis .

There are five common approaches to qualitative research :

  • Grounded theory involves collecting data in order to develop new theories.
  • Ethnography involves immersing yourself in a group or organization to understand its culture.
  • Narrative research involves interpreting stories to understand how people make sense of their experiences and perceptions.
  • Phenomenological research involves investigating phenomena through people’s lived experiences.
  • Action research links theory and practice in several cycles to drive innovative changes.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.
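
As a minimal illustration, the Python sketch below (using simulated scores, not real data) runs a two-sample t-test; the p-value quantifies how likely a difference at least as large as the observed one would be if it had arisen by chance alone:

```python
# A minimal sketch of a formal hypothesis test on simulated data.
# H0: the two groups have the same mean; H1: the means differ.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(loc=100, scale=15, size=50)    # simulated control scores
treatment = rng.normal(loc=108, scale=15, size=50)  # simulated treatment scores

t_stat, p_value = stats.ttest_ind(treatment, control)

# The p-value estimates how likely a difference at least this large would be
# if the pattern had arisen by chance alone (i.e., if H0 were true).
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print("Reject H0 at alpha = 0.05" if p_value < 0.05 else "Fail to reject H0")
```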

Operationalization means turning abstract conceptual ideas into measurable observations.

For example, the concept of social anxiety isn’t directly observable, but it can be operationally defined in terms of self-rating scores, behavioral avoidance of crowded places, or physical anxiety symptoms in social situations.

Before collecting data , it’s important to consider how you will operationalize the variables that you want to measure.

When conducting research, collecting original data has significant advantages:

  • You can tailor data collection to your specific research aims (e.g. understanding the needs of your consumers or user testing your website)
  • You can control and standardize the process for high reliability and validity (e.g. choosing appropriate measurements and sampling methods )

However, there are also some drawbacks: data collection can be time-consuming, labor-intensive and expensive. In some cases, it’s more efficient to use secondary data that has already been collected by someone else, but the data might be less reliable.

Data collection is the systematic process by which observations or measurements are gathered in research. It is used in many different contexts by academics, governments, businesses, and other organizations.

A confounding variable is closely related to both the independent and dependent variables in a study. An independent variable represents the supposed cause , while the dependent variable is the supposed effect . A confounding variable is a third variable that influences both the independent and dependent variables.

Failing to account for confounding variables can cause you to wrongly estimate the relationship between your independent and dependent variables.

To ensure the internal validity of your research, you must consider the impact of confounding variables. If you fail to account for them, you might over- or underestimate the causal relationship between your independent and dependent variables , or even find a causal relationship where none exists.
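
The following Python simulation sketches this risk with made-up data: a confounder drives both variables, the true effect is zero, and a naive regression finds a strong "effect" until the confounder is statistically controlled.

```python
# A simulation sketch: omitting a confounder distorts the estimated effect.
# Z (confounder) raises both X (supposed cause) and Y (supposed effect);
# the true causal effect of X on Y here is exactly zero.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
z = rng.normal(size=n)        # confounder
x = z + rng.normal(size=n)    # independent variable, driven by Z
y = 2 * z + rng.normal(size=n)  # dependent variable, driven only by Z

# Naive slope of Y on X (no adjustment): biased away from zero.
naive = np.polyfit(x, y, 1)[0]

# Adjusted slope: include Z as a covariate in the regression (statistical control).
X = np.column_stack([x, z, np.ones(n)])
adjusted = np.linalg.lstsq(X, y, rcond=None)[0][0]

print(f"naive slope:    {naive:.3f}")     # ~1.0, a spurious 'effect'
print(f"adjusted slope: {adjusted:.3f}")  # ~0.0, the true effect
```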

Yes, but including more than one of either type requires multiple research questions .

For example, if you are interested in the effect of a diet on health, you can use multiple measures of health: blood sugar, blood pressure, weight, pulse, and many more. Each of these is its own dependent variable with its own research question.

You could also choose to look at the effect of exercise levels as well as diet, or even the additional effect of the two combined. Each of these is a separate independent variable .

To ensure the internal validity of an experiment , you should only change one independent variable at a time.

No. The value of a dependent variable depends on an independent variable, so a variable cannot be both independent and dependent at the same time. It must be either the cause or the effect, not both!

You want to find out how blood sugar levels are affected by drinking diet soda and regular soda, so you conduct an experiment .

  • The type of soda – diet or regular – is the independent variable .
  • The level of blood sugar that you measure is the dependent variable – it changes depending on the type of soda.

Determining cause and effect is one of the most important parts of scientific research. It’s essential to know which is the cause – the independent variable – and which is the effect – the dependent variable.

In non-probability sampling , the sample is selected based on non-random criteria, and not every member of the population has a chance of being included.

Common non-probability sampling methods include convenience sampling , voluntary response sampling, purposive sampling , snowball sampling, and quota sampling .

Probability sampling means that every member of the target population has a known chance of being included in the sample.

Probability sampling methods include simple random sampling , systematic sampling , stratified sampling , and cluster sampling .
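
As a minimal sketch (over a hypothetical population frame, 30% of whom belong to group "B"), the Python snippet below contrasts simple random sampling with proportionate stratified sampling:

```python
# A minimal sketch of two probability sampling methods over a hypothetical
# population frame of 10,000 people, 30% of whom belong to group "B".
import random

random.seed(1)
population = [("B" if i < 3000 else "A", i) for i in range(10_000)]

# Simple random sampling: every member has an equal chance of selection.
srs = random.sample(population, 200)

# Stratified sampling: sample within each stratum proportionally to its size.
strata = {"A": [p for p in population if p[0] == "A"],
          "B": [p for p in population if p[0] == "B"]}
stratified = random.sample(strata["A"], 140) + random.sample(strata["B"], 60)

print("SRS share of B:       ", sum(p[0] == "B" for p in srs) / 200)
print("Stratified share of B:", sum(p[0] == "B" for p in stratified) / 200)  # exactly 0.30
```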

Using careful research design and sampling procedures can help you avoid sampling bias . Oversampling can be used to correct undercoverage bias .

Some common types of sampling bias include self-selection bias , nonresponse bias , undercoverage bias , survivorship bias , pre-screening or advertising bias, and healthy user bias.

Sampling bias is a threat to external validity – it limits the generalizability of your findings to a broader group of people.

A sampling error is the difference between a population parameter and a sample statistic .

A statistic refers to measures about the sample , while a parameter refers to measures about the population .
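
A small simulation makes the distinction concrete; the population values below are made up for illustration:

```python
# A simulation sketch: the sampling error is the gap between a population
# parameter (here, the true mean) and the statistic computed from one sample.
import numpy as np

rng = np.random.default_rng(7)
population = rng.normal(loc=170, scale=10, size=1_000_000)  # hypothetical heights
parameter = population.mean()                               # population parameter

sample = rng.choice(population, size=100, replace=False)
statistic = sample.mean()                                   # sample statistic

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
print(f"sampling error:              {statistic - parameter:+.2f}")
```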

Populations are used when a research question requires data from every member of the population. This is usually only feasible when the population is small and easily accessible.

Samples are used to make inferences about populations . Samples are easier to collect data from because they are practical, cost-effective, convenient, and manageable.

There are seven threats to external validity : selection bias , history, experimenter effect, Hawthorne effect , testing effect, aptitude-treatment interaction, and situation effect.

The two types of external validity are population validity (whether you can generalize to other groups of people) and ecological validity (whether you can generalize to other situations and settings).

The external validity of a study is the extent to which you can generalize your findings to different groups of people, situations, and measures.

Cross-sectional studies cannot establish a cause-and-effect relationship or analyze behavior over a period of time. To investigate cause and effect, you need to do a longitudinal study or an experimental study .

Cross-sectional studies are less expensive and time-consuming than many other types of study. They can provide useful insights into a population’s characteristics and identify correlations for further research.

Sometimes only cross-sectional data is available for analysis; other times your research question may only require a cross-sectional study to answer it.

Longitudinal studies can last anywhere from weeks to decades, although they tend to be at least a year long.

The 1970 British Cohort Study , which has collected data on the lives of 17,000 Brits since their births in 1970, is one well-known example of a longitudinal study .

Longitudinal studies are better to establish the correct sequence of events, identify changes over time, and provide insight into cause-and-effect relationships, but they also tend to be more expensive and time-consuming than other types of studies.

Longitudinal studies and cross-sectional studies are two different types of research design . In a cross-sectional study you collect data from a population at a specific point in time; in a longitudinal study you repeatedly collect data from the same sample over an extended period of time.

Longitudinal study                        | Cross-sectional study
Repeated observations over time           | Observations at a single point in time
Observes the same sample multiple times   | Observes different samples (a "cross-section") of the population
Follows changes in participants over time | Provides a snapshot of society at a given point

There are eight threats to internal validity : history, maturation, instrumentation, testing, selection bias , regression to the mean, social interaction and attrition .

Internal validity is the extent to which you can be confident that a cause-and-effect relationship established in a study cannot be explained by other factors.

In mixed methods research , you use both qualitative and quantitative data collection and analysis methods to answer your research question .

The research methods you use depend on the type of data you need to answer your research question .

  • If you want to measure something or test a hypothesis , use quantitative methods . If you want to explore ideas, thoughts and meanings, use qualitative methods .
  • If you want to analyze a large amount of readily-available data, use secondary data. If you want data specific to your purposes with control over how it is generated, collect primary data.
  • If you want to establish cause-and-effect relationships between variables , use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.

A confounding variable , also called a confounder or confounding factor, is a third variable in a study examining a potential cause-and-effect relationship.

A confounding variable is related to both the supposed cause and the supposed effect of the study. It can be difficult to separate the true effect of the independent variable from the effect of the confounding variable.

In your research design , it’s important to identify potential confounding variables and plan how you will reduce their impact.

Discrete and continuous variables are two types of quantitative variables :

  • Discrete variables represent counts (e.g. the number of objects in a collection).
  • Continuous variables represent measurable amounts (e.g. water volume or weight).

Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age).

Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips).

You need to know what type of variables you are working with to choose the right statistical test for your data and interpret your results .

You can think of independent and dependent variables in terms of cause and effect: an independent variable is the variable you think is the cause , while a dependent variable is the effect .

In an experiment, you manipulate the independent variable and measure the outcome in the dependent variable. For example, in an experiment about the effect of nutrients on crop growth:

  • The independent variable is the amount of nutrients added to the crop field.
  • The dependent variable is the biomass of the crops at harvest time.

Defining your variables, and deciding how you will manipulate and measure them, is an important part of experimental design .

Experimental design means planning a set of procedures to investigate a relationship between variables . To design a controlled experiment, you need:

  • A testable hypothesis
  • At least one independent variable that can be precisely manipulated
  • At least one dependent variable that can be precisely measured

When designing the experiment, you decide:

  • How you will manipulate the variable(s)
  • How you will control for any potential confounding variables
  • How many subjects or samples will be included in the study
  • How subjects will be assigned to treatment levels (a minimal sketch follows this list)
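
To make the last decision in the list above concrete, here is a minimal randomization sketch in Python; the subject IDs and treatment levels are hypothetical:

```python
# A minimal sketch of randomly assigning subjects to treatment levels
# (subject IDs and level names here are hypothetical).
import random

random.seed(3)
subjects = [f"S{i:02d}" for i in range(1, 21)]
levels = ["control", "low dose", "high dose"]

random.shuffle(subjects)
# Deal the shuffled subjects round-robin into near-equal groups.
assignment = {level: subjects[i::len(levels)] for i, level in enumerate(levels)}

for level, group in assignment.items():
    print(level, group)
```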

Experimental design is essential to the internal and external validity of your experiment.

Internal validity is the degree of confidence that the causal relationship you are testing is not influenced by other factors or variables .

External validity is the extent to which your results can be generalized to other contexts.

The validity of your experiment depends on your experimental design .

Reliability and validity are both about how well a method measures something:

  • Reliability refers to the consistency of a measure (whether the results can be reproduced under the same conditions).
  • Validity refers to the accuracy of a measure (whether the results really do represent what they are supposed to measure).

If you are doing experimental research, you also have to consider the internal and external validity of your experiment.

A sample is a subset of individuals from a larger population . Sampling means selecting the group that you will actually collect data from in your research. For example, if you are researching the opinions of students in your university, you could survey a sample of 100 students.

In statistics, sampling allows you to test a hypothesis about the characteristics of a population.

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to systematically measure variables and test hypotheses . Qualitative methods allow you to explore concepts and experiences in more detail.

Methodology refers to the overarching strategy and rationale of your research project . It involves studying the methods used in your field and the theories or principles behind them, in order to develop an approach that matches your objectives.

Methods are the specific tools and procedures you use to collect and analyze data (for example, experiments, surveys , and statistical tests ).

In shorter scientific papers, where the aim is to report the findings of a specific study, you might simply describe what you did in a methods section .

In a longer or more complex research project, such as a thesis or dissertation , you will probably include a methodology section , where you explain your approach to answering the research questions and cite relevant sources to support your choice of methods.



Confounding Variables

Travis Dixon October 24, 2016 Research Methodology



Sometimes factors other than the IV may influence the DV in an experiment. These unwanted influences are called confounding variables . In laboratory experiments, researchers attempt to minimize their influence by carefully designing their experiment so all conditions are exactly the same – the only thing that’s different is the independent variable.


Here are some confounding variables that you need to be looking out for in experiments:

  • Order Effects
  • Participant variability
  • Social desirability effect
  • Hawthorne effect
  • Demand characteristics
  • Evaluation apprehension

ORDER EFFECTS: In repeated measures experiments, one must be careful of order effects. Sometimes the order in which a participant completes tasks can alter the results: they may improve with practice, or they may remember something from the first condition that changes how they respond in the second.

Counterbalancing is one way of controlling for order effects. Counterbalancing means that repeated measures are still used, but half the group does Condition A then Condition B, while the other half does them in the opposite order. Using an independent samples design also controls for order effects.
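
A minimal sketch of counterbalancing in Python (participant IDs are hypothetical): half the participants are randomly assigned to do A then B, the rest B then A.

```python
# A minimal counterbalancing sketch: half the participants do A then B,
# the other half B then A, so order effects cancel out across the group.
import random

random.seed(5)
participants = [f"P{i}" for i in range(1, 11)]
random.shuffle(participants)

half = len(participants) // 2
orders = {p: ["Condition A", "Condition B"] for p in participants[:half]}
orders.update({p: ["Condition B", "Condition A"] for p in participants[half:]})

for p, order in sorted(orders.items()):
    print(p, "->", " then ".join(order))
```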

IB Psych IA Tips: When explaining your Design in the IB Psych IA, try to identify one or more extraneous variables you’re controlling for. 

PARTICIPANT VARIABILITY is the extent to which participants differ from one another, and it is another potential factor that could influence an experiment’s results. For instance, in a study on the effects of a new training technique on fitness levels, the existing fitness of the participants might be quite varied. This can be controlled for through either random allocation or a matched pairs design.

DEMAND CHARACTERISTICS are the cues in a study (characteristics) that may lead the participant to figure out how they’re supposed to act (according to the demands of the researcher/experiment). It leads to participants behaving in a way that they think they’re supposed to, not how they would naturally.


The placebo effect is a type of participant expectancy effect.

PARTICIPANT EXPECTANCY EFFECT is the name given to the change in behaviour that results from participants acting in the way they think they’re expected to. In other words, demand characteristics in an experiment’s design might lead to a participant expectancy effect. These terms are commonly used incorrectly. (Read more: Demand characteristics: What are they really?)

SOCIAL DESIRABILITY EFFECT is when people change their behaviour because they have a natural desire to be liked by other people. Another factor that influences people’s behaviour is that they don’t act as they normally would simply because they are being watched. This was first recorded in a study at the Hawthorne Electrical Plant in the USA and has become known as the HAWTHORNE EFFECT. In the original Hawthorne Plant research, the workers were found to be working harder simply because they were being watched. Doesn’t this happen in the classroom? The moment the teacher starts walking around the room checking work, you close YouTube, put away your phone, tuck away the love letter, and so on.

The terms “confounding variable” and “extraneous variable” are often used interchangeably. Technically speaking, an extraneous variable is any variable that could affect the results, whereas “Confounding occurs when the influence of extraneous variables on the DVs cannot be separated and measured” (Street et al. 1995).

EVALUATION APPREHENSION might occur when participants are anxious about being evaluated on a particular task or skill (sometimes called the spotlight effect). This might change their behaviour. Think about the oral assignments in some of your subjects, for instance. If you weren’t being graded you might be fine talking in front of your class, but as soon as your teacher gets out their big red pen and begins giving you a grade, you’re likely to become nervous, and this will affect your performance. People are often nervous about being in an “experiment” because the word can conjure many scary thoughts.


Psychologists must balance validity, practicality and ethicality when designing experiments.

Some textbooks also mention maturation – when participants get better on the second or third trial simply because they have practiced the skill (similar to order effects). Information contamination is another term sometimes used: it occurs when outside information affects the results of the experiment.

You don’t have to know a confounding variable by name to evaluate an experiment. The following “experiment” has chewing gum as its independent variable, but it contains many flaws. These flaws raise issues about the experiment’s validity (it’s really a commercial for gum, so it’s heavily biased). What confounding variables and/or methodological limitations can you find in it?

Street, D. L. (1995).  Controlling extraneous variables in experimental research: a research note. Accounting Education, 4(2), 169–188.


PH717 Module 11 - Confounding and Effect Measure Modification


Three Methods for Minimizing Confounding in the Study Design Phase



Confounding is a major problem in epidemiologic research, and it accounts for many of the discrepancies among published studies. Nevertheless, there are ways of minimizing confounding in the design phase of a study, and there are also methods for adjusting for confounding during analysis of a study.

Randomization in a Clinical Trial

The ideal way to minimize the effects of confounding is to conduct a large randomized clinical trial, so that each subject has an equal chance of being assigned to any of the treatment options. If this is done with a sufficiently large number of subjects, other risk factors (i.e., confounding factors) should be equally distributed among the exposure groups. The beauty of this is that even unknown confounding factors will be equally distributed among the comparison groups. If all of these other factors are distributed equally among the groups being compared, they will not distort the association between the treatment being studied and the outcome.

The success of randomization is usually evaluated in one of the first tables in a clinical trial, i.e., a table comparing characteristics of the exposure groups. If the groups have similar distributions of all of the known confounding factors, then randomization was successful. However, if randomization was not successful in producing equal distributions of confounding factors, then methods of adjusting for confounding must be used in the analysis of the data.
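
The simulation sketch below mimics such a balance check with made-up data: subjects are randomized to two arms, and the distribution of a known risk factor (age) is compared across them.

```python
# A simulation sketch of the "first table" balance check: after randomizing,
# compare the distribution of a known confounder (age) across arms.
import numpy as np

rng = np.random.default_rng(11)
n = 500
age = rng.normal(55, 12, size=n)  # hypothetical risk factor
arm = rng.permutation(np.repeat(["treatment", "control"], n // 2))

for group in ("treatment", "control"):
    ages = age[arm == group]
    print(f"{group}: n={ages.size}, mean age={ages.mean():.1f}, sd={ages.std():.1f}")
# With randomization (and enough subjects) the two means should be close;
# a marked imbalance would call for adjustment in the analysis phase.
```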

Strengths of randomization:

  • There is no limit on the number of confounders that can be controlled
  • It controls for both known and unknown confounders
  • If successful, there is no need to "adjust" for confounding

Limitations of randomization to control for confounding:

  • It is limited to intervention studies (clinical trials)
  • It may not be completely effective for small trials

Restriction of Enrollment

Limiting the study to subjects in one category of the confounder is a simple way of ensuring that all participants have the same level of the confounder. For example,

  • If smoking is a confounding factor, one could limit the study population to only non-smokers or only smokers.
  • If sex is a confounding factor, limit the participants to only men or only women
  • If age is a confounding factor, restrict the study to subjects in a specific age category, e.g., persons >65.

Restriction is simple and generally effective, but it has several drawbacks:

  • It can only be used for known confounders and only when the status of potential subjects is known with respect to that variable
  • Residual confounding may occur if restriction is not narrow enough. For example, a study of the association between physical activity and heart disease might be restricted to subjects between the ages of 30-60, but that is a wide age range, and the risk of heart disease still varies widely within that range.
  • Investigators cannot evaluate the effect of the restricted variable, since it doesn't vary
  • Restriction limits the number of potential subjects and may limit sample size
  • If restriction is used, one cannot generalize the findings to those who were excluded.
  • Restriction is particularly cumbersome if used to control for multiple confounding variables.

Matching Compared Groups

Another risk factor can only cause confounding if it is distributed differently in the groups being compared. Therefore, another method of preventing confounding is to match the subjects with respect to confounding variables. This method can be used in both cohort studies and in case-control studies in order to enroll a reference group that has artificially been created to have the same distribution of a confounding factor as the index group. For example,

  • In a case-control study of lung cancer where age is a potential confounding factor, match each case with one or more control subjects of similar age. If this is done, the age distribution of the comparison groups will be the same, and there will be no confounding by age.
  • In a cohort study on the effects of smoking, each smoker (the index group) who is enrolled is matched with a non-smoker (reference group) of similar age. Once again, the groups being compared will have the same age distribution, so confounding by age will be prevented.

Advantages of matching:

  • Matching is particularly useful when trying to control for complex or difficult-to-measure confounding variables, e.g., matching by neighborhood to control for confounding by air pollution.
  • It can also be used in case-control studies with few cases when additional control subjects are enrolled to increase statistical power, e.g., 4-to-1 matching of controls to cases.

Drawbacks of matching:

  • It can only be used for known confounders.
  • It can be difficult, expensive, and time-consuming to find appropriate matches.
  • One cannot evaluate the effect of the matched variable.
  • Matching requires special analytic methods.
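
To make the matching idea concrete, here is a minimal greedy one-to-one age-matching sketch in Python; the cases, controls, and ages are hypothetical, and real studies would use more sophisticated matching algorithms:

```python
# A minimal matching sketch: pair each case with the closest-aged available
# control (greedy 1:1 matching; all names and ages are hypothetical).
cases = {"case1": 63, "case2": 47, "case3": 55}
controls = {"ctrlA": 46, "ctrlB": 66, "ctrlC": 54, "ctrlD": 71}

available = dict(controls)
matches = {}
for case, age in cases.items():
    best = min(available, key=lambda c: abs(available[c] - age))
    matches[case] = best
    del available[best]  # each control can be matched only once

for case, ctrl in matches.items():
    print(f"{case} (age {cases[case]}) <-> {ctrl} (age {controls[ctrl]})")
```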



Confound (Experimental)

Sven Hilbert

Synonyms: confounding factor; confounding variable

An (experimental) confound is a factor affecting both the dependent and the independent variables systematically, thus being responsible for (at least part of) their statistical relationship.

Introduction

In quantitative psychological investigations, a researcher tries to discover statistical relationships between variables. This relationship is commonly quantified in terms of covariation in a statistical model. It is impossible to include all variables in the model, so any relationship revealed by the model may be caused or influenced by a variable that is not considered in the model. The variable responsible for such a spurious relationship is called a “confound.”

The Role of Confounds in (Psychological) Research

An empirical researcher conducting an investigation typically analyzes the relationship between dependent and independent variables using statistical models. These models have to be formulated including all variables...


Confounded Experimental Designs, Part 1: Incomplete Factorial Designs


Earlier we wrote about different kinds of variables . In short, dependent variables are what you get (outcomes), independent variables are what you set, and extraneous variables are what you can’t forget (to account for).

When you measure a user experience using metrics—for example, the SUPR-Q, SUS, SEQ, or completion rate—and conclude that one website or product design is good, how do you know it’s really the design that is good and not something else? While it could be due to the design, it could also be that extraneous (or nuisance) variables, such as prior experiences, brand attitudes, and recruiting practices, are confounding your findings.

A critical skill when reviewing UX research findings and published research is the ability to identify when the experimental design is confounded .

Confounding can happen when there are variables in play that the design does not control and can also happen when there is insufficient control of an independent variable.

There are numerous strategies for dealing with confounding that are outside the scope of this article. In fact, it’s a topic that covers several years of graduate work in disciplines such as experimental psychology.

Our goal in this first of a series of articles is to show how to identify a specific type of confounded design in published experiments and demonstrate how their data can be reinterpreted once you’ve identified the confounding.

Incomplete Factorial Designs

One of the great scientific innovations in the early 20th century was the development of the analysis of variance (ANOVA) and its use in analyzing factorial designs . A full factorial design is one that includes multiple independent variables (factors), with experimental conditions set up to obtain measurements under each combination of levels of factors. This approach allows experimenters to estimate the significance of each factor individually (main effects) and see how the different levels of the factors might behave differently in combination (interactions). This is all great when the factorial design is complete, but when it’s incomplete, it becomes impossible to untangle potential interactions among the factors.

For example, imagine an experiment in which participants sort cards and there are two independent variables—the size of the cards (small and large) and the size of the print on the cards (small and large). This is the simplest full factorial experiment, having two independent variables (card size and print size), each with two levels (small and large). For this 2×2 factorial experiment, there are four experimental conditions:

  • Large cards, large print
  • Large cards, small print
  • Small cards, large print
  • Small cards, small print

The graph below shows hypothetical results for this imaginary experiment. There is an interaction such that the combination of large cards and large print led to a faster sort time (45 s), but all the other conditions have the same sort time (60 s).


But what if, for some reason, the experimenter had not collected data for the small card/small print condition? Then the marginal means for the two factors would be identical: averaging the large-print conditions across card size gives (60+45)/2 = 52.5 s, exactly what you get when averaging the large-card conditions across print size, while the single remaining cell for each factor averages 60 s. An experimenter focused on the effect of print size might claim that the data show a benefit to larger print, but the counterargument would be that the effect is due to card size instead. With this incomplete design, you couldn’t say with certainty whether the benefit in the large card/large print condition was due to card size, print size, or that specific combination.
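
The ambiguity is easy to verify numerically. The Python sketch below encodes the three observed cell means from the hypothetical card-sorting experiment and shows that both factors yield identical marginal means:

```python
# A numeric sketch of the card-sorting example: with the small-card/small-print
# cell missing, the two factors produce identical marginal means.
sort_times = {
    ("large card", "large print"): 45,
    ("large card", "small print"): 60,
    ("small card", "large print"): 60,
    # ("small card", "small print") was never run
}

def marginal(factor_index, level):
    """Mean sort time over all observed cells at one level of one factor."""
    vals = [t for cell, t in sort_times.items() if cell[factor_index] == level]
    return sum(vals) / len(vals)

print("large cards:", marginal(0, "large card"))   # 52.5
print("small cards:", marginal(0, "small card"))   # 60.0
print("large print:", marginal(1, "large print"))  # 52.5
print("small print:", marginal(1, "small print"))  # 60.0
# Both factors show the same apparent 7.5 s advantage; the design cannot tell
# us whether card size, print size, or their combination is responsible.
```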

Moving from hypothetical to published experiments, we first show confounding in a famous psychological study, then in a somewhat less famous but influential human factors study, and finally in UX measurement research.

Harry Harlow’s Monkeys and Surrogate Mothers

In the late 1950s and early 1960s, psychologist Harry Harlow conducted a series of studies with infant rhesus monkeys, most of which would be considered unethical by modern standards. In his most famous study, infant monkeys were removed from their mothers and given access to two surrogate mothers, one made of terry cloth (providing tactile comfort but no food) and one made of wire with a milk bottle (providing food but no tactile comfort). The key finding was that the infant monkeys preferred to spend more time close to the terry cloth mother, using the wire mother only to feed. The image below shows both mothers.


Image from Wikipedia.

In addition to the manipulation of comfort and food, there was also a clear manipulation of the surrogate mothers’ faces. The terry cloth mother’s face was rounded and had ears, nose, big eyes, and a smile. The wire mother’s face was square and devoid of potentially friendly features. With this lack of control, it’s possible that the infants’ preference for the terry cloth mother might have been due to just tactile comfort, just the friendly face, or a combination of the two. In addition to ethical issues associated with traumatizing infant monkeys, the experiment was deeply confounded.

Split Versus Standard Keyboards

Typing keyboards have been around for over 100 years, and there has been a lot of research on their design —different types of keys, different key layouts, and, from the 1960s through the 1990s, different keyboard configurations. Specifically, researchers conducted studies of different types of split keyboards intended to make typing more comfortable and efficient by allowing a more natural wrist posture. The first split keyboard design was the Klockenberg keyboard, described in Klockenberg’s 1926 book .

One of the most influential papers promoting split keyboards was “ Studies on Ergonomically Designed Alphanumeric Keyboards ” by Nakaseko et al., published in 1985 in the journal Human Factors. In that study, they described an experiment in which participants used three different keyboards—a split keyboard with a large wrist rest (see the figure below), a split keyboard with a small wrist rest, and a standard keyboard with a large wrist rest. They did not provide a rationale for failing to include a standard keyboard with a small wrist rest, and this omission made their experiment an incomplete factorial.


Image from Lewis et al. (1997) “ Keys and Keyboards .”

They had participants rank the keyboards by preference, with the following results:

Rank | Split with Large Rest | Split with Small Rest | Standard with Large Rest
1    | 16                    | 7                     | 9
2    | 6                     | 13                    | 11
3    | 9                     | 11                    | 11

The researchers’ primary conclusion was “After the typing tasks, about two-thirds of the subjects asserted that they preferred the split keyboard models.” This is true because 23/32 participants’ first choice was a split keyboard condition. What they failed to note was that 25/32 participants’ first choice was a keyboard condition that included a large wrist rest. If they had collected data with a standard keyboard and small wrist rest, it would have been possible to untangle the potential interaction—but they didn’t.

Effects of Verbal Labeling and Branching in Surveys

In recent articles, we explored the effect of verbal labeling of rating scale response options; specifically, whether partial or full labeling affects the magnitude of responses, first in a literature review , and then in a designed experiment .

One of the papers in our literature review was Krosnick and Berent (1993) [pdf]. They reported the results of a series of political science studies investigating the effects of full versus partial labeling of response options and branching. In the Branching condition, questions were split into two parts, with the first part capturing the direction of the response (e.g., “Are you a Republican, Democrat, or independent?”) and the second capturing the intensity (e.g., “How strong or weak is your party affiliation?”). In the Nonbranching condition, both direction and intensity were captured in one question. The key takeaway from their abstract was, “We report eight experiments … demonstrating that fully labeled branching measures of party identification and policy attitudes are more reliable than partially labeled nonbranching measures of those attitudes. This difference seems to be attributable to the effects of both verbal labeling and branching.”

If all you read was the abstract, you’d think that full labeling was a better measurement practice than partial labeling. But when you review research, you can’t just read and accept the claims in the abstract. The figure below shows part of Table 1 from Krosnick and Berent (1993). Note that they list only three question formats. If their experimental designs had been full factorials, there would have been four. Missing from the design is the combination of partial labeling and branching. The first four studies also omitted the combination of full labeling with nonbranching, so any “significant” findings in those studies could be due to labeling or branching differences.


Image from Krosnick and Berent (1993) [pdf].

The fifth study at least included the Fully Labeled Nonbranching condition and produced the following results (numbers in cells are the percentage of respondents who gave the same answer on two different administrations of the same survey questions):

             | Full  | Partial | Diff
Branching    | 68.4% | NA      | NA
Nonbranching | 57.8% | 58.9%   | 1.1%
Diff         | 10.6% | NA      |

To analyze these results, Krosnick and Berent conducted two tests, one on the differences between Branching and Nonbranching holding Full Labeling constant and the second on the differences between Full and Partial Labeling holding Nonbranching constant. They concluded there was a significant effect of branching but no significant effect of labeling, bringing into question the claim they made in their abstract.

If you really want to understand the effects of labeling and branching on response consistency, the missing cell in the table above is a problem. Consider two possible hypothetical sets of results, one in which the missing cell matches the cell to its left and one in which it matches the cell below.

Hypothetical 1 (missing cell matches the cell to its left):

             | Full  | Partial | Diff
Branching    | 68.4% | 68.4%   | 0.0%
Nonbranching | 57.8% | 58.9%   | 1.1%
Difference   | 10.6% | 9.5%    |

Hypothetical 2 (missing cell matches the cell below):

             | Full  | Partial | Diff
Branching    | 68.4% | 58.9%   | -9.5%
Nonbranching | 57.8% | 58.9%   | 1.1%
Difference   | 10.6% | 0.0%    |

In the first hypothetical, the conclusion would be that branching is more reliable than nonbranching and labeling doesn’t matter. For the second hypothetical, the conclusion would be that there is an interaction suggesting that full labeling is better than partial, but only for branching questions and not for nonbranching. But without data for the missing cell, you just don’t know!

Summary and Discussion

When reading published research, it’s important to read critically. One aspect of critical reading is to identify whether the design of the reported experiment is confounded in a way that casts doubt on the researchers’ claims.

This is not a trivial issue, and as we’ve shown, influential research has been published that has affected social policy (Harlow’s infant monkeys), product claims (split keyboards), and survey design practices (labeling and branching). But upon close and critical inspection, the experimental designs were flawed by virtue of confounding; specifically, the researchers were drawing conclusions from incomplete factorial experimental designs.

In future articles, we’ll revisit this topic from time to time with analyses of other published experiments we’ve reviewed that, unfortunately, were confounded.


Confound It! Or, Why It's Important Not To


Before you begin any research study — including those on the impact of Quality Matters — you’ll need to be aware of all the components involved. That includes components that you may not have thought about. These components, known as confounding variables, can have a major impact on your study, so it’s important to know what they are and how you can minimize their influence.

What Are Confounding Variables?

Confounding variables are the stowaways in a research study that can result in misleading findings about the relationship between the independent variable (IV), the input in the study, and the dependent variable (DV), the results of the study. Confounding variables are the extra, unaccounted-for variables that can stealthily have a hidden impact on the outcome being explored. The results of any study can easily be distorted due to one or more confounding variables .

An example of a study that reveals confounding variables at work (that may be all too real for many of us!) is one that seeks to find the impact of an increase in activity level (IV) on weight loss (DV). Sounds simple enough, right? But, what about study participants’ sex, age, food consumption, and any medications they take? Might any or all of these variables affect the correlation between activity level and weight loss? These are all confounding variables — and probably not the only ones that would exist in such a study.

In education, there are many studies that investigate the effect of an independent variable — or treatment — on learners. For example, student engagement in an online course (IV) will result in improved learning (DV). Might students’ prior learning, age, experience with online courses, the course content, and numerous other variables skew or cloud any results of the study that might be linked to student engagement?

Confounding variables are frequently present in studies related to Quality Matters. For example, suppose you want to design a study to find evidence that having a QM-Certified course (IV) will result in an increase in student learning (DV) the semester after the course has met Standards. Several confounding variables would be involved, including delivery, student readiness, and, perhaps the biggest one in this example: the condition of the course prior to its meeting Standards. For example, the course may have been originally designed using the QM Rubric and, therefore, did not experience many changes in order to receive QM Certification. If this is the case, an impact on student learning may not be seen, but it would be erroneous to assume that QM Certification does not impact student learning.

What Confounding Variables Obscure the Impact of Quality Matters?

The QM Rubric™ is the validated instrument containing standards of quality course design. It is the bedrock from which quality in online learning can be launched within a quality assurance system at an educational institution. We know from research that online learning is impacted by a number of variables; however, the “Quality Pie” illustrates some of the prominent categories of confounding variables, including:

  • Pre-condition of the design of a course related to officially meeting QM Standards (course design)
  • Teaching presence  as well as other interactive features that are well described by the Community of Inquiry (course delivery)
  • Breadth and depth of the course content, especially related to learners’ anticipated knowledge of the content (course content)
  • Benign or proactive policies and practices for online learning at the institution (institutional infrastructure)
  • Technologies that enable effective and efficient interactivity between the learner, instructor,  and the institution’s learning technologies (LMS)
  • Dimensions of instructors’ online teaching experience and pedagogical approach, as well as specific online learning training (faculty readiness)
  • Pre-conditions learners bring into a course, such as online experience, educational history (student readiness)

How Can You Guard Against Confounding Variables?

The unfortunate answer in educational research is that you can’t completely guard against confounding variables. But, becoming aware of possible confounding variables related to any study you want to conduct helps. So, how can that be done? Reviewing previous research in peer-reviewed publications on your topic and those similar to yours will inform you about the range of confounding variables to account for in the design of your study. Analysis of related previous research findings will guide you to design a research question that addresses likely confounding variables.

What Else Can Be Done About Confounding Variables?

If you are reading about the results of a research study, having a good grasp of what confounding variables may be present and the fact that they may have had some impact on the dependent variable will give you a contextual, nuanced understanding of the results. A well-done study will address possible confounding variables in the discussion and limitations sections of the write-up.

If you are designing a research study , having a grasp of the possible confounding variables will help you design the study in a way that will address as many confounding variables as possible. Randomization in assigning students to one of two different groups (the control group or the treatment group) can help reduce the impact of confounding variables. But, randomization requires dedication in sample selection and access to a large number of participants so that they, regardless of their assigned group, would experience the same confounding variables.

So, does all of this mean you should throw up your hands since designing a study that will produce valid findings is so challenging? Definitely not! It does mean, however, that you’ll want to keep the possibility of confounding variables in mind as you design studies that collect and use learning data to benchmark your rigorous quality assurance process and achievements.

Want to Learn More and Practice Identifying and Addressing Confounding Variables?

When it comes to research, confounding variables is an important topic. Take time to learn more about them and other key components of a research study by participating in QM’s three-week online workshop, Designing Quality Online Learning Research . Register for the next session today.



Testing for the Unconfoundedness Assumption Using an Instrumental Assumption

The identification of average causal effects of a treatment in observational studies is typically based either on the unconfoundedness assumption (exogeneity of the treatment) or on the availability of an instrument. When available, instruments may also be used to test for the unconfoundedness assumption. In this paper, we present a set of assumptions on an instrumental variable which allows us to test for the unconfoundedness assumption, although they do not necessarily yield nonparametric identification of an average causal effect. We propose a test for the unconfoundedness assumption based on the instrumental assumptions introduced and give conditions under which the test has power. We perform a simulation study and apply the results to a case study where the interest lies in evaluating the effect of job practice on employment.

1 Introduction

Identification of the causal effect of a treatment T on an outcome Y in observational studies is typically based either on the unconfoundedness assumption (also called selection on observables, exogeneity, ignorability, see, e.g. de Luna and Johansson [ 1 ], Imbens and Wooldridge [ 2 ], Pearl [ 3 ]) or on the availability of an instrument. The unconfoundedness assumption says loosely that all the variables affecting both the treatment T and the outcome Y are observed (we call them covariates) and can be controlled for. An instrument is usually defined as a variable affecting the treatment T , and such that it is related to the outcome Y only through T (and possibly the observed covariates). When available, instruments can be used to identify causal effects in parametric situations. Nonparametric identification is also possible with the help of instruments, and Angrist et al. [ 4 ] develop a theory for the nonparametric identification and estimation of local average causal effects. Abadie [ 5 ] and Frölich [ 6 ] extended these results to the situation where the observed covariates are related to the instrument. Note also that nonparametric identification can be obtained with the related concept of (fuzzy) regression discontinuity designs; see Hahn et al. [ 7 ], Battistin and Retore [ 8 ], Dias et al. [ 9 ] and Lee [ 10 , Sec. 5.5.3]. When a causal effect is identified, a test of the unconfoundedness assumption may be devised by comparing the estimates of the causal effects obtained both under the unconfoundedness assumption and using the instrument (classical Durbin–Wu–Hausman (DWH) test in a parametric setting). This was recently used by Donald et al. [ 11 ] to propose a test of the unconfoundedness assumption in a nonparametric framework.
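
To illustrate the DWH idea of comparing the two estimates, here is a simulation sketch in Python (simulated data, not the paper's proposed test): OLS, which is only valid under unconfoundedness, is set against a simple instrumental-variable (2SLS/Wald) estimate, and a gap between them signals confounding.

```python
# A simulation sketch of the Durbin-Wu-Hausman idea: compare the OLS estimate
# (valid under unconfoundedness) with a 2SLS estimate built from an instrument.
# All data are simulated; the true effect of T on Y is 1.0.
import numpy as np

rng = np.random.default_rng(2024)
n = 50_000
u = rng.normal(size=n)  # unobserved confounder
z = rng.normal(size=n)  # instrument: affects T, not Y directly
t = 0.8 * z + u + rng.normal(size=n)
y = 1.0 * t + u + rng.normal(size=n)

ols = np.cov(t, y)[0, 1] / np.var(t, ddof=1)   # confounded: biased upward
tsls = np.cov(z, y)[0, 1] / np.cov(z, t)[0, 1]  # Wald/2SLS ratio estimator

print(f"OLS estimate:  {ols:.3f}")   # noticeably above 1.0
print(f"2SLS estimate: {tsls:.3f}")  # close to 1.0
# A large gap between the two estimates is evidence against unconfoundedness.
```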

In this paper, we introduce general instrumental conditions under which it is possible to test for the unconfoundedness or exogeneity assumption. The instrumental assumptions are general and, for instance, they do not necessarily yield identification of a causal effect when the unconfoundedness assumption does not hold. Indeed, to obtain the nonparametric identification of a local average causal effects stronger (and untestable) assumptions must be made on the instrument, see, e.g. Imbens and Angrist [ 12 ], Angrist et al. [ 4 ], Angrist and Fernandez-Val [ 13 ], Donald et al. [ 11 ] and Guo et al. [ 14 ]. In particular, these papers use a monotonicity assumption saying that the instrument must affect the treatment in a monotone fashion, as well as do not allow for unobserved heterogeneity to affect both the instrument and the treatment. Based on our general instrumental conditions we can propose a statistic to test the unconfoundedness assumption. The proposed test is related to the use of two control groups to test the unconfoundedness assumption, an idea previously used, e.g. in Rosenbaum [ 15 ], de Luna and Johansson [ 1 ] and Dias et al. [ 9 ]. Rosenbaum [ 15 ] was probably first to formalize the idea that two control groups provide information on the unconfoundedness assumption and described actual observational studies where different control groups were available. One of our contributions in this context is the introduction of general assumptions under which an observed variable can be used to split an available control group in order to test the unconfoundedness assumption nonparametrically. However, the test statistic we eventually propose does not actually require the split to be done.

In Section 4 , we present a motivating example where Swedish register data are used to study the causal effect of job practice (JP) on employment. We have access to a rich set of background characteristics on unemployed individuals, although the question remains whether the effect of JP on employment is confounded by unobserved heterogeneity. In this study, unemployed have access to JP through their participation into a labor market program. During 1998 there were two such labor market programs available in Sweden offering JP with different probabilities. Because we know that the two programs differ mainly only with respect to their propensity to offer JP, the participation into the two programs may be assumed to affect employment differently only through JP. We, thus, argue that program participation fulfills our instrumental conditions. In contrast with usual instrumental assumptions this allows potential unobserved heterogeneity in the program and JP assignment to be correlated. We apply the introduced test to check whether the estimated effect of JP on employment is biased due to unobservables affecting both JP and employment.

Before treating this motivating example in more detail in Section 4, Section 2 presents the model, introduces instrumental assumptions and develops the theoretical results which then allow us to introduce a test of the unconfoundedness assumption. Section 3 presents a simulation study of the finite sample properties of the proposed test. In particular, one of the designs used illustrates the situation where the monotonicity assumption mentioned above does not hold. The paper is concluded in Section 5.

2 Theory and method

We use the Neyman–Rubin model [16, 17] for causal inference when the interest lies in the causal effect of a binary treatment T, taking values in T = {0, 1}, on an outcome. Let us thus define Y(t), t ∈ T, called potential outcomes. The latter are interpreted as the outcomes resulting from the assignment T = t, t ∈ T, respectively. We then observe Y = T Y(1) + (1 − T) Y(0). Let us also assume that we observe a set of variables which are not affected by the treatment assignment. We will need to distinguish in particular X and Z, two vectors of such variables, the latter of dimension one.

For t ∈ T, we consider (X, Z, T, Y(t)) as a random vector with a given joint distribution, from which a random sample is drawn. Population parameters that are often of interest in this context are the average causal effect θ = E(Y(1) − Y(0)), the average causal effect on the treated θ_t = E(Y(1) − Y(0) | T = 1) and the average causal effect on the non-treated θ_nt = E(Y(1) − Y(0) | T = 0).

In observational studies, where the treatment assignment T is not randomized, an identifying assumption (e.g. Rosenbaum and Rubin [ 18 ]; Imbens [ 19 ]) for the average causal effect is the following.

(A.1) For t ∈ T, Y(t) ╨ T | X and 0 < Pr(T = 1 | X) < 1 (unconfoundedness and common support, respectively).

The common support assumption can be investigated by looking at the data. The unconfoundedness assumption may be considered realistic in situations where the set of characteristics X is rich enough, and when there is subject-matter theory to support the assumption.

2.2 Instrumental assumptions, test and power

Let us now consider situations where the variable Z takes values in T (if not, it may be made dichotomous using a threshold) and fulfills the following assumption.

(A.2) For t ∈ T, Y(t) ╨ Z | X and 0 < Pr(Z = 1 | X) < 1.

Assumption (A.2) prohibits (a) a direct effect from Z to Y(t), i.e. an effect not going through T, and (b) unobserved variables affecting both Z and Y(t). On the other hand, (A.2) allows unobserved variables to affect both Z and T, which is typically prohibited by usual instrumental assumptions [4–6]. Note that when assuming (A.2) in the sequel, Z and Y(t) may also be independent conditional on a subset of X, and, e.g. Z may be randomized, as discussed after Proposition 1. We also need the following regularity condition.

(A.3) If (A.1) and (A.2) hold, then T ╨ Y(t) | Z, X, for t ∈ T, respectively.

Assumption (A.3) is a regularity condition and is violated only in specific situations, of which Example 1 is typical.

Figure 1: Graph illustrating model (1) in Example 1.

Example 1 Let us assume that the vector (Z*, T*, Y(0), U, V) has a joint normal distribution, where U and V are two unobserved covariates and the set of observed covariates X is empty. Assume now that the following model generates the data:

where U, V, ε_Z, ε_T and ε_Y are jointly normal and independently distributed. Let Z = I(Z* > 0) and T = I(T* > 0), where I(⋅) is the indicator function. Figure 1 gives a graphical representation of the model, where ε_Z, ε_T and ε_Y are omitted. We can write the conditional expectations

In Example 1, (A.1) and (A.2) will typically be violated, unless we assume that ξ_1 = −ξ_2 γ, in which case Z* ╨ Y(0) by joint normality, and thereby Z ╨ Y(0) and T ╨ Y(0). The constrained parametrization ξ_1 = −ξ_2 γ thus yields an example where (A.3) is violated, since (A.1) and (A.2) hold while one can check that T ╨ Y(0) | Z does not necessarily hold.

This type of example is called unstable [3, Sec. 2.4] in the sense that (A.1) and (A.2) will cease to hold as soon as the parameter values do not fulfill the constraint ξ_1 = −ξ_2 γ. Using directed acyclic graphs (see Footnote 1), it can be shown that assumption (A.3) holds as soon as the distribution is stable, where, e.g. a distribution P(ψ) parametrized with a parameter vector ψ is said to be stable if no independence can be destroyed by varying the parameter ψ; see Pearl [3, Sec. 2.4] for a formal general definition. Note that (A.3) does not impose any parametrized functional form.

Proposition 1 Assume (A.1)–(A.3); then, for t ∈ T,

Y(t) ╨ Z | T, X.  (2)

Proof. By assumption, (A.1) and (A.2) hold. Then, for t ∈ T,

(A.1) and (A.2) ⟹ Y(t) ╨ T | Z, X ⟹ Y(t) ╨ (T, Z) | X ⟹ Y(t) ╨ Z | T, X.

The first implication holds by assumption (A.3), the other two by the properties of conditional independence relations; see Dawid [21], Lauritzen [22, Sec. 3.1] and Pearl [3, Sec. 1.1.5]. ■

The conditional independence statement obtained in Proposition 1 is testable from the data when conditioning on T = t (see next section). Finding evidence in the data against (2) yields evidence against the assumptions of the proposition. Thus, evidence against (2) can be interpreted as evidence against the unconfoundedness assumption (A.1) if (A.2) is known to hold from subject-matter considerations, (A.3) being a regularity condition. One application is a randomized experiment (where Z is a random assignment to treatment) with imperfect compliance T [4, 12]. Another application is treated in detail in Section 4. Note that while identification of the causal effect of T on Y may follow from (A.2) with linear models, see, e.g. Pearl [3, p. 248], this is not true in general, and stronger assumptions are needed to obtain nonparametric identification of a causal effect such as, e.g. a local average treatment effect [4–6]. In particular, our result does not rely on two assumptions typically made to obtain such identification: that the instrument must affect the treatment in a monotone fashion, and that no unobserved heterogeneity is allowed to affect both the instrument and the treatment.

For a test based on (2) to have power against violations of (A.1), we further need Z and T to be dependent conditional on X. This is typically assumed for instrumental variables to be useful for identification. Examples of situations (expressed with directed acyclic graphs; see Footnote 1) where a test based on (2) has power against (A.1) are given in Figure 2, panels (a)–(c), while panel (d) shows a case where such a test would not have power. A caveat here is that (2) can be tested only when conditioning on T = t. This has no practical consequence if the test rejects this null hypothesis. On the other hand, in cases where (2) is not rejected for T = t, we have no information on whether it is violated for T = 1 − t. In independent and related work, Guo et al. [14, eqs (3) and (4)] give an example where (2) holds for T = t although not for T = 1 − t, and yet a specific causal effect is identified without the help of Z when the earlier mentioned monotonicity assumption holds.

Figure 2: Four directed acyclic graphs, each with a stable joint distribution for the variables included. Only cases (a)–(c) are such that a test based on (2) may have power: if (A.1) does not hold, e.g. through the introduction of a variable V with arrows pointing toward T and Y(t), then Y(t) ╨ Z | T, X does not hold either.

Different strategies may be adopted to test the two null hypotheses given by Proposition 1, i.e.

H_0a: Y(0) ╨ Z | T, X  and  H_0b: Y(1) ╨ Z | T, X.

Note that for θ_t, (A.1)–(A.3) need to hold only for t = 0 and, thus, only H_0a is to be tested. Similarly, H_0b is relevant when θ_nt is of interest, while both null hypotheses are relevant for θ. In this paper we propose a testing strategy (see Footnote 2) based on the fact that under H_0a and H_0b we have δ_0(X) = 0 and δ_1(X) = 0 for all X, respectively, where

δ_j(X) = E(Y | T = j, Z = 0, X) − E(Y | T = j, Z = 1, X),  j = 0, 1.

Given a random sample of n individuals indexed by i = 1, …, n, we consider a nonparametric estimator δ̂_j of δ_j = E(δ_j(X_i)), j = 0, 1,

where N_jk = card({i : T_i = j, Z_i = k}), k = 0, 1, with card(A) denoting the cardinality of the set A, and Ŷ_ji and Ỹ_ji are nonparametric estimators of E(Y_i | T_i = j, X_i, Z_i = 0) and E(Y_i | T_i = j, X_i, Z_i = 1), respectively. The two latter estimates may be obtained by nearest neighbor matching, or any other smoothing technique. Since δ_0 = 0 and δ_1 = 0, respectively, under H_0a and H_0b, the test statistics

C_j = δ̂_j / s_j,  j = 0, 1,  (3)

will then, under the necessary regularity conditions, be asymptotically normally distributed with mean zero and variance one, where s_j is the standard error of δ̂_j, for j = 0, 1. For instance, if nearest neighbor matching estimators are used, then the asymptotic theory, and in particular s_j, can be found in Abadie and Imbens [23]. A subsampling estimator of s_j is also available in this case in de Luna et al. [24]. As noted above, when θ is of interest, both hypotheses H_0a and H_0b are relevant and higher power may be obtained by considering the joint statistic

C = C_0² + C_1²,

which is asymptotically χ² distributed with two degrees of freedom, since C_0 and C_1 are independent.

We should note here that the statistics above are testing conditional mean independence, which is relevant when average causal effects are targeted. Alternatively, one may wish to use tests of conditional independence statements based on all the moments of the underlying distribution [ 25 ], thereby making the methods relevant when quantile or distributional causal effects are of interest.
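To make the construction above concrete, here is a minimal sketch in Python. It is not the authors' code: the exact averaging used for δ̂_j is not reproduced in this copy, so the sketch simply imputes the two conditional means E(Y | T = j, X, Z = k) by K-nearest-neighbour regression and uses a naive i.i.d. standard error in place of the Abadie–Imbens estimator; all function names are hypothetical.

```python
# Sketch of the test based on C = C0^2 + C1^2 (illustrative, simplified s_j).
import numpy as np
from scipy.stats import chi2
from sklearn.neighbors import NearestNeighbors

def knn_mean(X_train, y_train, X_query, k=5):
    """K-nearest-neighbour estimate of E(Y | X) at the query points."""
    k = min(k, len(y_train))
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(X_query)
    return y_train[idx].mean(axis=1)

def C_statistic(Y, T, Z, X, j, k=5):
    """Studentized contrast C_j = delta_hat_j / s_j on the subsample with T = j."""
    m = T == j
    Xj, Yj, Zj = X[m], Y[m], Z[m]
    y0 = knn_mean(Xj[Zj == 0], Yj[Zj == 0], Xj, k)  # imputes E(Y | T=j, X, Z=0)
    y1 = knn_mean(Xj[Zj == 1], Yj[Zj == 1], Xj, k)  # imputes E(Y | T=j, X, Z=1)
    d = y0 - y1                                     # pointwise delta_j(X_i)
    s = d.std(ddof=1) / np.sqrt(len(d))             # naive s_j, not Abadie-Imbens
    return d.mean() / s

def joint_test(Y, T, Z, X, k=5):
    """C = C0^2 + C1^2, asymptotically chi-squared with 2 df under H0a and H0b."""
    C0 = C_statistic(Y, T, Z, X, 0, k)
    C1 = C_statistic(Y, T, Z, X, 1, k)
    C = C0 ** 2 + C1 ** 2
    return C, chi2.sf(C, df=2)
```

Under H_0a and H_0b each C_j is approximately standard normal, so rejecting when the χ² p-value is small mirrors the joint test C described above; testing H_0a or H_0b alone amounts to comparing a single C_j with normal critical values.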

3 Monte Carlo study

We use a Monte Carlo study to investigate the finite sample properties (empirical size and power) of the test C_0 in (3), where K-nearest neighbor matching is used as the nonparametric estimator of Ŷ_0i and Ỹ_0i, together with the Abadie and Imbens [23] variance estimator. As noted above, in situations where θ is of interest and (A.1)–(A.3) are assumed to hold for t = 0, 1 instead of only t = 0, then C could be used instead of C_0, thereby increasing the power of the test. As a benchmark we also implement a parametric Durbin–Wu–Hausman (DWH) test, where we first regress T on X and Z and then add the residuals from this fit as a covariate in the outcome equation for Y. The test for the unconfoundedness assumption is then a Wald test on the parameter for the included residual covariate (see, e.g. Wooldridge [26, Chap. 6], and Rivers and Vuong [27]). We use a robust covariance matrix [28].
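For comparison, here is a sketch of that DWH benchmark as just described: a linear first stage of T on X and Z, with the first-stage residuals added to the outcome regression and a Wald test on their coefficient under a robust covariance matrix. This is a generic control-function implementation under those assumptions, not the authors' code.

```python
# Sketch of the parametric DWH (control-function) benchmark.
import numpy as np
import statsmodels.api as sm

def dwh_test(Y, T, Z, X):
    """Wald test on first-stage residuals included in the outcome equation."""
    W = sm.add_constant(np.column_stack([X, Z]))
    resid = sm.OLS(T, W).fit().resid               # first stage: T on X and Z
    V = sm.add_constant(np.column_stack([T, X, resid]))
    fit = sm.OLS(Y, V).fit(cov_type="HC1")         # robust (White) covariance
    return fit.tvalues[-1], fit.pvalues[-1]        # test on the residual term
```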

We consider a data generating process (DGP) which mimics a situation with randomized assignment to a treatment (Z) with imperfect compliance (δ_0 = 0 below), where T denotes the actual treatment assignment, as well as more general situations where the effect of Z on T is allowed to be confounded by unobservables. For unit i, let

We let ε_Yi, ε_Zi, ε_Ti(0), ε_Ti(1), U_0i, U_1i and U_2i be independently distributed as N(0, 0.25). Moreover, we let X_i ∼ N(0, 2), and consider two cases for θ_i: θ_i = 1 (homogeneous treatment effect) and θ_i = 1 + X_i (heterogeneous treatment effect). Parameters are varied in order to study the empirical size and power of the test C_0. Five designs, denoted D.1–D.5, are considered and described in Table 1. For the situation where we set δ_2 = 8 (Design D.2), the instrumental variable Z is non-monotone, i.e. there exist individuals j for which T_j(Z_j = 0) = 1 and T_j(Z_j = 1) = 0 (called defiers), where T_j(Z_j = k), k ∈ T, are potential treatment values for individual j when switching Z_j (everything else equal) to k = 0 or 1. The proportion of defiers when δ_2 = 8 is 8.4%. Thus, for design D.2 the monotonicity assumption necessary for the nonparametric identification of the local average causal effect is violated [4–6]. Another assumption for identification made in the latter references is that δ_0 × δ_1 = 0; hence, the instrument does not recover identification in designs D.3 and D.5.
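The paper's DGP display is not reproduced in this copy. For readers who want to replicate the flavour of the study, the following generator uses only the parameter roles described above (δ_0 confounds Z and T, δ_1 is the instrument strength, δ_2 creates defiers, δ_3 induces confounding between T and Y so that the null is false when δ_3 ≠ 0); every functional form in it is an assumption for illustration, not the paper's design.

```python
# Illustrative DGP in the spirit of designs D.1-D.5 (assumed functional forms).
import numpy as np

def simulate(n, d0=0.0, d1=1.0, d2=0.0, d3=0.0, heterogeneous=False, seed=None):
    rng = np.random.default_rng(seed)
    X = rng.normal(0.0, np.sqrt(2.0), n)
    U0 = rng.normal(0.0, 0.5, n)                 # confounds Z and T when d0 != 0
    U1 = rng.normal(0.0, 0.5, n)                 # confounds T and Y when d3 != 0
    Z = (0.5 * X + d0 * U0 + rng.normal(0.0, 0.5, n) > 0).astype(int)
    # d2 reverses Z's effect for part of the population, creating defiers
    T = (0.5 * X + (d1 - d2 * (U0 > 0)) * Z + d0 * U0 + d3 * U1
         + rng.normal(0.0, 0.5, n) > 0).astype(int)
    theta = 1.0 + X if heterogeneous else np.ones(n)
    Y = theta * T + X + d3 * U1 + rng.normal(0.0, 0.5, n)
    return Y, T, Z, X.reshape(-1, 1)

# e.g. size under the null: repeatedly simulate(3000, d3=0.0), apply joint_test,
# and record the rejection frequency at the 5% level.
```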

Table 1: Designs considered, with the resulting instrumental property of Z and whether nonparametric identification of the (local) average causal effect holds.

Design | Nonparametric identification
D.1 | Yes
D.2 | No
D.3 | No
D.4 | Yes
D.5 | No

The two tests mentioned above, C_0 and DWH, are evaluated in testing the null hypothesis δ_3 = 0, and empirical size and power of the tests are obtained by letting δ_3 ∈ {0, 0.1, 0.2, 0.3, 0.6, 0.9, 1.5, 2}. K-nearest neighbor matching estimators with K = 1, 3, 5 and 7 are used to compute C_0, and we restrict X to have common support when conditioning on Z = 1 and Z = 0. We consider sample sizes N = 500, 1,500 and 3,000. In the continuous response cases, DWH should have correct size when θ_i = 1, irrespective of whether the instrument is monotone or not, or whether its relation with T is confounded or not. DWH is also expected to have correct size [27] in the binary response case with a homogeneous causal effect (θ_i = 1). In contrast, DWH is expected to break down in all heterogeneous cases (θ_i = 1 + X_i), since the response model is then misspecified. To our knowledge, no nonparametric test has previously been proposed in the literature for the situations in Table 1 where an average causal effect is not nonparametrically identified. The test C_0, on the other hand, is expected to have correct size and power in all situations simulated.

3.2 Results

The results from the Monte Carlo simulations are displayed in Figures 3 and 4. The empirical sizes are also displayed in Table 2. The nonparametric test C_0 with K = 5 behaves well with all the DGPs considered, with empirical size close to 5% and power increasing with sample size. Results for other values of K can be obtained from the authors. Empirical sizes were comparable for all K values considered, while power increased with K: significantly so from K = 1 to K = 3, and only marginally from K = 5 to K = 7. Power is further increased when using C instead of C_0 (see Table 3 for design D.1; similar increases were obtained for the other designs), as expected, since the former is based on stronger assumptions. On the other hand, the DWH test has too large an empirical size in the heterogeneous cases (θ_i = 1 + X_i). In the homogeneous treatment setup (θ_i = 1), DWH behaves well with respect to its empirical size. This was expected, as noted in the previous section, thereby yielding an interesting benchmark. In such homogeneous cases, the nonparametric test C_0 has similar or better power than DWH, except for Design D.1, where DWH is based on correctly specified models. For Design D.2 (non-monotone instrument), C_0 has markedly higher power than DWH.

Table 2: Empirical sizes (nominal size is 5%) obtained with the DWH test and the nonparametric test C_0 (K = 5) for simulated DGPs with a continuous response, at sample sizes N = 500, 1,500 and 3,000.

Design (δ_0, δ_1, δ_2) | θ_i | DWH (500 / 1,500 / 3,000) | C_0 (500 / 1,500 / 3,000)
D.1 (0, 0, 0) | 1 | 5.46 / 4.55 / 4.94 | 5.47 / 5.14 / 5.24
D.1 (0, 0, 0) | 1 + X_i | 5.85 / 6.15 / 7.79 | 5.47 / 5.16 / 5.22
D.2 (0, 0, 8) | 1 | 5.42 / 5.24 / 5.21 | 5.49 / 5.72 / 5.65
D.2 (0, 0, 8) | 1 + X_i | 99.51 / 100 / 100 | 5.50 / 5.73 / 5.65
D.3 (1, 0.2, 0) | 1 | 5.11 / 4.72 / 4.87 | 5.67 / 5.26 / 5.27
D.3 (1, 0.2, 0) | 1 + X_i | 5.66 / 6.62 / 8.18 | 5.67 / 5.28 / 5.26
D.4 (1, 0, 0) | 1 | 3.98 / 4.81 / 5.20 | 5.35 / 4.83 / 5.09
D.4 (1, 0, 0) | 1 + X_i | 5.47 / 6.09 / 7.43 | 5.35 / 4.85 / 5.09
D.5 (1, 0.2, 0) | 1 | 4.09 / 4.65 / 4.98 | 5.47 / 5.02 / 5.18
D.5 (1, 0.2, 0) | 1 + X_i | 5.75 / 6.99 / 8.70 | 5.47 / 5.03 / 5.18

Table 3: Empirical size (nominal size is 5%, δ_3 = 0) and power obtained with the test statistics C_0 and C (both with K = 5) for simulated DGP D.1 with θ_i = 1 + X_i; sample size 3,000.

δ_3 | 0 | 0.1 | 0.2 | 0.3 | 0.6 | 0.9 | 1.5 | 2
C_0 | 5.22 | 6.79 | 9.83 | 14.78 | 35.72 | 55.23 | 75.72 | 81.90
C | 5.25 | 7.61 | 14.00 | 23.51 | 58.59 | 81.84 | 91.30 | 97.55

In summary, the results obtained show that the nonparametric test (3) performs well in the situations where DWH is consistent. By making fewer assumptions, (3) is also shown to work with non-monotone instruments and with instruments whose effect on the treatment is confounded by unobservables, i.e. in situations where a local average causal effect is not identified.

Figure 3: Empirical size (δ_3 = 0) and power for the nonparametric test C_0 and the DWH test (based on a robust covariance matrix) for Design D.1 (first row) and Design D.2 (second row); homogeneous causal effect (first column) and heterogeneous causal effect (second column). Designs are described in Table 1.

Figure 4: Empirical size (δ_3 = 0) and power for the nonparametric test C_0 and the DWH test [27] for Designs D.3–D.5 (rows 1–3); homogeneous causal effect (first column) and heterogeneous causal effect (second column). Designs are described in Table 1.

4 Effect of JP

We consider a case study where the interest lies in estimating the effect of JP for the unemployed on employment status. JP was offered within two separate labor market training (LMT) programs in Sweden during 1998. One program was run by the regular program provider in Sweden, the Swedish National Labor Market Board (AMV). The other program was offered by the Federation of Swedish Industries (Swit). To be eligible for the programs, unemployed individuals had to be at least 20 years of age and enrolled at the public employment service. There was no difference in benefits between the two groups of trainees. The fundamental idea of the Swit program was to increase the contacts between unemployed individuals and employers by providing JP. In a survey conducted in June 2000 on 1,000 program participants from both programs, 69.5% of the Swit participants and 52% of the AMV participants stated that they obtained access to JP (see Footnote 3). Except for the idea of providing more contacts with employers, the two programs were similar. Both programs tested the individual's motivation and ability before recruitment, using similar selection procedures (see Johansson [30] for a thorough description of the selection). The types of courses given within the Swit and AMV programs are displayed in Table 4. The similarities between the two programs are apparent. Thus, despite differences in procurement between the two organizations (Swit and AMV), there do not seem to be any large differences in the types of LMT courses offered or in the selection of participants. The fact that the programs distinguish themselves only with respect to JP availability suggests that LMT program choice should affect labor market outcomes only through the effect of JP. This suggests that LMT program choice has the property (A.2) of an instrument for JP.

Table 4: The frequency distribution (%) of the courses within the two programs.

Course | Swit (%) | AMV (%)
Programmer | 32 | 27
Computer technician | 31 | 29
Application support | 10 | 16
IT-pedagogue | 2 | 6
IT-entrepreneur | 1 | 3
Other | 17 | 15
Missing | 7 | 4

Based on the survey, one can see in Table 5 that there is a statistically significant difference of 18.1 percentage points in employment six months after leaving the program (the two programs have the same average length) when comparing individuals having JP with those without. The table also includes some individual background variables: (i) education, (ii) work handicap (disabled), (iii) gender (1 if man, 0 if woman) and (iv) immigration status (1 if immigrant, 0 otherwise). Finally, since the propensity to receive JP is higher in larger labor markets, which also offer better labor market opportunities, we need to control for region of residence when estimating the effect of JP. Sweden was divided into four regions: Stockholm, Skåne, Västra Götaland and the rest of the country. Stockholm, Skåne and Västra Götaland are the three regions with the largest populations. Note that we have good reasons to assume that the two LMT programs differ only in their JP prospects; thus, even if labor market opportunities affect access to the LMT programs, this does not invalidate their use as an instrument for JP.

Table 5: Descriptive statistics (%) for the outcome employment and background characteristics, and how they differ between JP and non-JP individuals.

Variable | JP | No JP | Difference | t-test
Employment | 64.9 | 46.8 | 18.1 | 6.82
Compulsory educ. | 5.1 | 7.6 | −2.5 | −1.9
Upper sec. educ. | 67.8 | 62.1 | 5.7 | 2.19
College | 27.1 | 30.3 | −3.2 | −1.3
Disabled | 7.5 | 11.5 | −4.0 | −2.5
Man | 62.1 | 61.9 | 0.2 | 0.1
Immigrant | 5.6 | 6.4 | −0.9 | −0.7
Stockholm | 21.4 | 27.8 | −6.5 | −2.7
Skåne | 10.6 | 8.3 | 2.3 | 1.5
Västra Götaland | 13.8 | 16.5 | −2.6 | −1.3
Rest of the country | 54.2 | 47.3 | 6.9 | 2.53
Sample size | 969 | 528 | |

We can see some average differences between the two samples. Those with JP are (i) less likely to be disabled and (ii) less likely to live in Stockholm. The level of education also differs: those with JP have on average more upper secondary education, but less compulsory-only and college education, than those with no JP. Based on these average differences, it is difficult to argue that those with JP have better labor market prospects than those without JP. The single factor suggesting that the JP population has better labor market opportunities absent JP is that they are less often disabled. In order to further study the selection into JP, we used the covariates from the table and estimated a logistic regression model (a propensity score) including only main effects. The results from this estimation (not displayed) show that individuals who are from Stockholm or Västra Götaland, or disabled, are less likely to receive JP. There are, for instance, no statistically significant (at the 5% level) differences in education between the two groups. Figure 5, left panel, displays the estimated propensity scores, which give evidence for the common support assumption in (A.1). In order to investigate the related assumption 0 < Pr(Z = 1 | X) < 1 included in (A.2), we also fit the probability of getting into Swit versus AMV with a logistic regression including main effects; Figure 5, right panel, provides evidence for the latter assumption.
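A sketch of the two overlap checks just described, using main-effects logistic regressions for Pr(T = 1 | X) and Pr(Z = 1 | X); the data frame and column names are hypothetical stand-ins for the Table 5 covariates.

```python
# Sketch: fitted propensity scores for the overlap checks in (A.1) and (A.2).
import statsmodels.api as sm

def fitted_propensity(outcome, covariates):
    """Main-effects logistic regression; returns fitted probabilities."""
    design = sm.add_constant(covariates)
    return sm.Logit(outcome, design).fit(disp=0).predict(design)

# Hypothetical usage with Table 5-style covariates:
# ps_T = fitted_propensity(df["jp"], df[["upper_sec", "college", "disabled",
#                                        "man", "immigrant", "stockholm",
#                                        "skane", "vastra_gotaland"]])
# ps_Z = fitted_propensity(df["swit"], df[[...same covariates...]])
# Overlap holds if histograms of ps_T (ps_Z) by group share common support.
```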

Because there are 969 JP (treated) individuals and only 528 non-treated individuals, an estimate of the average causal effect of JP on the treated (ACT), θ_t, will typically suffer from severe bias due to difficulties in finding matches for the treated. We therefore estimate instead the average causal effect of JP on the non-treated (ACNT), θ_nt. A reasonable assumption is that individuals with higher than average returns from JP are the ones who select themselves into JP. This means that the ACNT yields a lower bound for the ACT, θ_t ≥ θ_nt.

Assumption (A.1) needs to be fulfilled only for t = 1 in order for us to estimate the ACNT, i.e. Y(1) ╨ T | X, where the covariates are displayed in Table 5. A K = 5 nearest neighbor matching estimator, using the minimum Mahalanobis distance between the covariates of Table 5, is used to estimate the parameter θ_nt, yielding θ̂_nt = 12 percentage points, with a standard error [23, Theorems 6 and 7] estimated at 5 percentage points. Hence, there is a significant effect of JP.
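A minimal sketch of that matching step: K = 5 nearest neighbours in Mahalanobis distance impute Y(1) for each non-treated unit, and the ACNT estimate is the mean imputed-minus-observed difference. The standard error below is a naive i.i.d. one, standing in for the Abadie–Imbens estimator used in the paper.

```python
# Sketch: ACNT by K-NN Mahalanobis matching (simplified standard error).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def acnt_matching(Y, T, X, k=5):
    """theta_nt-hat: match each non-treated unit to k treated units."""
    VI = np.linalg.inv(np.cov(X, rowvar=False))   # inverse covariance matrix
    nn = NearestNeighbors(n_neighbors=k, metric="mahalanobis",
                          metric_params={"VI": VI}).fit(X[T == 1])
    _, idx = nn.kneighbors(X[T == 0])
    y1_hat = Y[T == 1][idx].mean(axis=1)          # imputed Y(1) for non-treated
    diffs = y1_hat - Y[T == 0]
    return diffs.mean(), diffs.std(ddof=1) / np.sqrt(len(diffs))  # naive SE
```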

Figure 5: The distribution (percent) of the estimated probabilities (as functions of the covariates) of (not) having JP (T, left panel) and of getting into the two alternative LMT programs (Z, right panel).

4.1 Testing the unconfoundedness assumption

We test the null hypothesis H_0b using C_1 in (3). Nonparametric estimation is performed with K = 5 nearest neighbor matching on the covariates displayed in Table 5, using the minimum Mahalanobis distance, also for computing the standard error s_1; see Abadie and Imbens [23]. The resulting value of the test statistic is 1.31. Hence, we cannot reject the unconfoundedness assumption (p-value of 0.18). We also perform a DWH test by estimating a linear probability model with the discrete covariates displayed in Table 5, yielding a p-value of 0.09. Thus, given the maintained assumption (A.2), neither test can reject, at the 5% level, the null hypothesis that the effect of JP on employment is unconfounded, although the DWH test, which makes stronger assumptions, has a p-value below 10%.

5 Conclusions

Identification of the causal effect of a treatment on an outcome in observational studies is typically based either on the unconfoundedness assumption or on the availability of an instrument (e.g. Angrist et al. [4]). In this paper, by introducing general instrumental assumptions, we are able to propose an easy-to-use nonparametric test of the unconfoundedness assumption in situations where the same assumptions do not allow for the nonparametric identification of a causal effect. We illustrate the framework introduced with a study of the effect of JP for the unemployed on employment, where we argue that an instrument fulfilling our conditions is available through the existence of two LMT programs with different degrees of accessibility to JP.

In many applications, nonparametric identification of causal effects using instruments is non-trivial, e.g. when a non-testable monotonicity property for the instrument must hold [4–6] and/or when a large set of control variables is needed for the instrument to be valid. Using our weaker instrumental conditions, one may test the unconfoundedness assumption. If the latter is not rejected, this gives the analyst some grounds to proceed with an identification strategy based on the unconfoundedness assumption. We have operationalized the theoretical results with a test statistic based on K-nearest neighbor matching estimators. Other nonparametric regression estimators may be used instead, such as, e.g. local polynomial regression and splines. Finally, it is worth noting that for duration outcomes with censored data, the test proposed herein may be implemented by making use of the matching estimators for censored duration responses presented in Fredriksson and Johansson [31] and de Luna and Johansson [32].

Acknowledgments

This paper has benefited from useful comments from Martin Huber, Ingeborg Waernbaum, an editor, an anonymous referee and seminar participants at Johns Hopkins University, the University of Maryland and the third Joint IZA/IFAU Conference on Labor Market Policy Evaluation. De Luna acknowledges the financial support of the Swedish Research Council through the Swedish Initiative for Research on Microdata in the Social and Medical Sciences (SIMSAM), the Ageing and Living Conditions Program and grant 70246501. Johansson acknowledges the financial support of the Swedish Council for Working Life and Social Research (grant 2004–2005).

References

1. de Luna X, Johansson P. Exogeneity in structural equation models. J Econometrics 2006;132:527–43. 10.1016/j.jeconom.2005.02.010
2. Imbens GW, Wooldridge JM. Recent developments in the econometrics of program evaluation. J Econ Lit 2009;47:5–86. 10.1257/jel.47.1.5
3. Pearl J. Causality, 2nd ed. Cambridge: Cambridge University Press, 2009.
4. Angrist JD, Imbens GW, Rubin DB. Identification of causal effects using instrumental variables. J Am Stat Assoc 1996;91:444–55.
5. Abadie A. Semiparametric instrumental variable estimation of treatment response models. J Econometrics 2003;113:231–63. 10.1016/S0304-4076(02)00201-4
6. Frölich M. Nonparametric IV estimation of local average treatment effects with covariates. J Econometrics 2007;139:35–75. 10.1016/j.jeconom.2006.06.004
7. Hahn J, Todd P, Van der Klaauw W. Identification and estimation of treatment effects with a regression-discontinuity design. Econometrica 2001;69:201–9. 10.1111/1468-0262.00183
8. Battistin E, Rettore E. Ineligibles and eligible non-participants as a double comparison group in regression-discontinuity designs. J Econometrics 2008;142:715–30. 10.1016/j.jeconom.2007.05.006
9. Dias M, Ichimura H, van den Berg G. The matching method for treatment evaluation with selective participation and ineligibles. IFAU Working Paper 2008:6, Institute for Labour Market Policy Evaluation, Uppsala, 2008. 10.1920/wp.cem.2007.3307
10. Lee M-J. Micro-econometrics for policy, program and treatment effects. Oxford: Oxford University Press, 2005. 10.1093/0199267693.001.0001
11. Donald SG, Hsu Y-C, Lieli RP. Testing the unconfoundedness assumption via inverse probability weighted estimators of (L)ATT. Working Paper, 2011.
12. Imbens GW, Angrist JD. Identification and estimation of local average treatment effects. Econometrica 1994;62:467–75. 10.2307/2951620
13. Angrist J, Fernandez-Val I. ExtrapoLATE-ing: external validity and overidentification in the LATE framework. NBER Working Paper 16566, National Bureau of Economic Research, Cambridge, MA, 2010. 10.3386/w16566
14. Guo Z, Cheng J, Lorch S, Small D. Using an instrumental variable to test for unmeasured confounding. Working Paper, 2013.
15. Rosenbaum PR. The role of a second control group in an observational study (with discussion). Stat Sci 1987;2:292–316. 10.1214/ss/1177013236
16. Neyman J. Sur les applications de la théorie des probabilités aux expériences agricoles: essai des principes. Roczniki Nauk Rolniczych X 1923:1–51. In Polish; English translation by D. Dabrowska and T. Speed in Stat Sci 1990;5:465–72.
17. Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol 1974;66:688–701. 10.1037/h0037350
18. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983;70:41–55. 10.1093/biomet/70.1.41
19. Imbens GW. Nonparametric estimation of average treatment effects under exogeneity: a review. Rev Econ Stat 2004;86:4–29. 10.1162/003465304323023651
20. de Luna X, Waernbaum I, Richardson T. Covariate selection for the non-parametric estimation of an average treatment effect. Biometrika 2011;98:861–75. 10.1093/biomet/asr041
21. Dawid AP. Conditional independence in statistical theory. J R Stat Soc Ser B 1979;41:1–31.
22. Lauritzen S. Graphical models. Oxford: Oxford University Press, 1996.
23. Abadie A, Imbens GW. Large sample properties of matching estimators for average treatment effects. Econometrica 2006;74:235–67. 10.1111/j.1468-0262.2006.00655.x
24. de Luna X, Johansson P, Sjöstedt-de Luna S. Bootstrap inference for k-nearest neighbour matching estimators. IZA Discussion Paper 5361, Institute for the Study of Labor, Bonn, 2010. 10.2139/ssrn.1723999
25. Su L, White H. A consistent characteristic function-based test for conditional independence. J Econometrics 2007;141:807–34. 10.1016/j.jeconom.2006.11.006
26. Wooldridge J. Econometric analysis of cross section and panel data. Cambridge: MIT Press, 2002.
27. Rivers D, Vuong QH. Limited information estimators and exogeneity tests for simultaneous probit models. J Econometrics 1988;39:347–66. 10.1016/0304-4076(88)90063-2
28. White H. Maximum likelihood estimation of misspecified models. Econometrica 1982;50:1–25. 10.2307/1912526
29. Johansson P, Martinson S. Det nationella IT-programmet – en slutrapport om Swit [The national IT program – a final report on Swit]. Forskningsrapport 2000:8, Institute for Labour Market Policy Evaluation, Uppsala, 2000.
30. Johansson P. The importance of employer contacts: evidence based on selection on observables and internal replication. Labour Econ 2008;15:350–69. 10.1016/j.labeco.2007.06.010
31. Fredriksson P, Johansson P. Dynamic treatment assignment – the consequences for evaluations using observational data. J Bus Econ Stat 2008;26:435–45. 10.1198/073500108000000033
32. de Luna X, Johansson P. Non-parametric inference for the effect of a treatment on survival times with application in the health and social sciences. J Stat Plann Inference 2010;140:2122–37. 10.1016/j.jspi.2010.02.012

Footnotes

1. Directed acyclic graphs, e.g. Figure 1, together with a stable (also called faithful) distribution for the variables, are used to describe conditional independence relations between variables; see Lauritzen [22] for a general account of graphical models and de Luna et al. [20] for their use together with potential outcomes.

2. One related strategy could be to use the concept of two independent control groups [15]. Under H_0a we can use Z to obtain two independent control groups (one defined by Z = 1 and one by Z = 0) for estimating θ, yielding θ̂_{z=0} and θ̂_{z=1}, respectively. Under H_0a the difference θ̂_{z=0} − θ̂_{z=1} has expectation zero, and this forms the basis for a test statistic. However, since we need to compute two nonparametric estimators of θ, the resulting statistic has poor finite sample properties, for instance when the covariates have different supports in the two control groups created. This has been confirmed in simulation experiments not presented here.

3. A detailed description of the survey can be found in Johansson and Martinson [29]. The survey contained a total of 19 questions. These concerned (i) the individual's background, (ii) the individual's labor market training and (iii) the individual's present labor market situation.



How to solve confounding issues in experimental design?

In an experimental design, how can one address confounding issues? Is building a regression to control for confounding variables one solution?

  • experiment-design
  • confounding


The issue you raise is a big one, and there is a huge statistical and scientific literature on experimental design, and methods for dealing with confounding variables. I cannot do justice to this literature in a short answer, but I will try to give you some basics to get you started. Regression analysis allows you to take account of confounding variables that are in the data by including them in the regression analysis. You can obtain inferences about the "effects" of other variables, conditional on these would-be confounders, and this allows you to "filter them out" of your analysis, so that they do not confound your other inferences. So yes, regression analysis is one method of dealing with confounding variables, so long as you can identify the relevant confounding variable, and obtain adequate data on it, to include it in your regression.

However, if this is the path you are inclined to take, there are several issues you will need to consider. If you decide to try to "filter out" confounders using regression analysis, it is important to ensure that the variables you are filtering out adequately capture the actual confounder, and this can be tricky to do. For example, if you think "education" is a confounding variable in some analysis, you will need to decide what operational variables capture "education". It is common to use some crude metric like "highest qualification awarded", but this does not fully (or even closely) capture the broader concept of "education". It is therefore common in these situations for you to encounter a confounder that is difficult to measure adequately.
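As an illustration of the regression approach described above, the toy simulation below compares the treatment coefficient with and without a measured confounder in the design matrix; all data, names and effect sizes are made up for the example.

```python
# Toy example: "filtering out" a measured confounder by including it in the
# regression. True treatment effect is 1.0; selection depends on education.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 10_000
education = rng.normal(size=n)                        # measured confounder
T = (education + rng.normal(size=n) > 0).astype(int)  # selection on education
Y = 1.0 * T + 2.0 * education + rng.normal(size=n)

naive = sm.OLS(Y, sm.add_constant(T)).fit()
adjusted = sm.OLS(Y, sm.add_constant(np.column_stack([T, education]))).fit()
print(naive.params[1])      # badly inflated (~3.3) by the omitted confounder
print(adjusted.params[1])   # close to the true 1.0 once education is included
```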

Another important issue with confounding in experimental design is that there may be a large number of possible confounders, and it is not always possible even to identify what these might be, let alone collect adequate data for them. For this reason, an ideal method of dealing with confounding (in circumstances where it is feasible, ethical, etc.) is to design a randomised controlled trial (RCT) to determine the effects of a "treatment" relative to a "control". It is also possible to use other experimental protocols such as "blinding". Both "randomisation" and "blinding" are experimental protocols that are imposed to try to sever the statistical connection between the variable of interest in your study and any would-be confounding variables. If used properly, these protocols can sever the statistical link between these variables, which allows you to treat statistical inferences about the treatment variable as being causal in nature. What is especially nice (amazing!) about these methods is that they do not even require you to know what the confounding variables are in order to filter them out.
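And a companion sketch of why randomisation works even when the confounder is unmeasured: the same outcome model, but with treatment assigned by coin flip; again, every number here is made up for illustration.

```python
# Toy example: randomisation severs the statistical link between treatment
# and an *unmeasured* confounder U, so the naive contrast becomes causal.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
U = rng.normal(size=n)                                 # unmeasured confounder

def mean_contrast(T):
    """Naive treated-minus-control contrast for Y = T + 2U + noise."""
    Y = 1.0 * T + 2.0 * U + rng.normal(size=n)
    return Y[T == 1].mean() - Y[T == 0].mean()

T_selected = (U + rng.normal(size=n) > 0).astype(int)  # self-selection on U
T_randomized = rng.integers(0, 2, n)                   # coin-flip assignment

print(mean_contrast(T_selected))    # biased (~3.3): confounded by U
print(mean_contrast(T_randomized))  # ~1.0 without ever measuring U
```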

This answer should give you some basic points you need to consider. However, I would stress that experimental design is a large and well-developed field, and it is important to familiarise yourself with the literature on this matter if you are conducting experimental design. I recommend you start by examining experimental protocols like randomisation and blinding, to learn why these work. This will lead you into broader discussions of causal analysis, and you can then start to learn about the interaction between statistical inferences, and inferences about underlying causal structures. This will be a long but rewarding task. Good luck.
