StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2024 Jan-.

Hypothesis Testing, P Values, Confidence Intervals, and Significance.

Jacob Shreffler; Martin R. Huecker.


Last Update: March 13, 2023.

  • Definition/Introduction

Medical providers often rely on evidence-based medicine to guide decision-making in practice. Often a research hypothesis is tested with results provided, typically with p values, confidence intervals, or both. Additionally, statistical or research significance is estimated or determined by the investigators. Unfortunately, healthcare providers may have different comfort levels in interpreting these findings, which may affect how appropriately they apply the data in practice.

  • Issues of Concern

Without a foundational understanding of hypothesis testing, p values, confidence intervals, and the difference between statistical and clinical significance, healthcare providers may struggle to make clinical decisions without relying purely on the level of significance the research investigators deemed acceptable. This overview is therefore provided to allow medical professionals to use their expertise to determine whether results are reported sufficiently and whether the study outcomes are clinically appropriate to apply in healthcare practice.

Hypothesis Testing

Investigators conducting studies need research questions and hypotheses to guide analyses. Starting with broad research questions (RQs), investigators then identify a gap in current clinical practice or research. Any research problem or statement is grounded in a better understanding of relationships between two or more variables. For this article, we will use the following research question example:

Research Question: Is Drug 23 an effective treatment for Disease A?

Research questions do not directly imply specific guesses or predictions; we must formulate research hypotheses. A hypothesis is a predetermined declaration regarding the research question in which the investigator(s) makes a precise, educated guess about a study outcome. This is sometimes called the alternative hypothesis and ultimately allows the researcher to take a stance based on experience or insight from medical literature. An example of a hypothesis is below.

Research Hypothesis: Drug 23 will significantly reduce symptoms associated with Disease A compared to Drug 22.

The null hypothesis states that there is no statistical difference between groups based on the stated research hypothesis. An example of a null hypothesis is below.

Null Hypothesis: There is no statistically significant difference in the reduction of symptoms for Disease A between Drug 23 and Drug 22.

Researchers should be aware of journal recommendations when considering how to report p values, and manuscripts should remain internally consistent.

Regarding p values, as the number of individuals enrolled in a study (the sample size) increases, the likelihood of finding a statistically significant effect increases; with very large sample sizes, even a small difference can yield a very low p-value. The null hypothesis is deemed true until a study presents significant data to support rejecting it. Based on the results, the investigators will either reject the null hypothesis (if they found significant differences or associations) or fail to reject it (if they could not demonstrate significant differences or associations).

To test a hypothesis, researchers obtain data on a representative sample to determine whether to reject or fail to reject a null hypothesis. In most research studies, it is not feasible to obtain data for an entire population. Using a sampling procedure allows for statistical inference, though this involves a certain possibility of error. [1]  When determining whether to reject or fail to reject the null hypothesis, mistakes can be made: Type I and Type II errors. Though it is impossible to ensure that these errors have not occurred, researchers should limit the possibilities of these faults. [2]

Significance

Significance is a term used to describe the substantive importance of medical research. Statistical significance is the likelihood that the results are due to chance. [3] Healthcare providers should always delineate statistical significance from clinical significance, a common error when reviewing biomedical research. [4] When conceptualizing findings reported as either significant or not significant, healthcare providers should not simply accept researchers' results or conclusions without considering the clinical significance. Healthcare professionals should consider the clinical importance of findings and understand both p values and confidence intervals so they do not have to rely on the researchers to determine the level of significance. [5] One criterion often used to determine statistical significance is the utilization of p values.

P values are used in research to determine whether the sample estimate is significantly different from a hypothesized value. The p-value is the probability that the observed effect within the study would have occurred by chance if, in reality, there was no true effect. Conventionally, data yielding a p<0.05 or p<0.01 are considered statistically significant. While some have argued that the 0.05 threshold should be lowered, it is still widely practiced. [6] Note, however, that hypothesis testing alone does not tell us the size of the effect.

Examples of findings reported with p values are below:

Statement: Drug 23 reduced patients' symptoms compared to Drug 22. Patients who received Drug 23 (n=100) were 2.1 times less likely than patients who received Drug 22 (n = 100) to experience symptoms of Disease A, p<0.05.

Statement: Individuals who were prescribed Drug 23 experienced fewer symptoms (M = 1.3, SD = 0.7) compared to individuals who were prescribed Drug 22 (M = 5.3, SD = 1.9). This finding was statistically significant, p = 0.02.

For either statement, if the threshold had been set at 0.05, the null hypothesis (that there was no relationship) should be rejected, and we should conclude significant differences. Noticeably, as seen in the two statements above, some researchers report findings with < or > while others provide an exact p-value (e.g., 0.000001), but never zero. [6] When examining research, readers should understand how p values are reported. The best practice is to report p values for all variables within a study design, rather than only for variables with significant findings. [7] The inclusion of all p values provides evidence for study validity and limits suspicion of selective reporting/data mining.
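To make this concrete, here is a minimal Python sketch of how a p value like the one in the second statement might be computed; the symptom scores below are invented for illustration and are not the study's actual data or analysis.

    from scipy import stats

    # Hypothetical symptom scores for each group (lower = fewer symptoms)
    drug_23 = [1.2, 0.8, 1.5, 2.0, 1.1, 0.9, 1.6, 1.3]
    drug_22 = [5.1, 4.8, 6.2, 5.5, 4.9, 5.8, 5.0, 5.3]

    # Two-sample t-test of the null hypothesis that both groups share the same mean
    t_stat, p_value = stats.ttest_ind(drug_23, drug_22)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # tiny p -> reject the null at 0.05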

While researchers have historically used p values, experts who find p values problematic encourage the use of confidence intervals. [8] P-values alone do not allow us to understand the size or the extent of the differences or associations. [3] In March 2016, the American Statistical Association (ASA) released a statement on p values, noting that scientific decision-making and conclusions should not be based on a fixed p-value threshold (e.g., 0.05). They recommend focusing on the significance of results in the context of study design, quality of measurements, and validity of data. Ultimately, the ASA statement noted that in isolation, a p-value does not provide strong evidence. [9]

When conceptualizing clinical work, healthcare professionals should consider p values alongside a concurrent appraisal of study design validity. For example, a p-value from a double-blinded randomized clinical trial (designed to minimize bias) should be weighted more heavily than one from a retrospective observational study. [7] The p-value debate has smoldered since the 1950s, [10] and replacement with confidence intervals has been suggested since the 1980s. [11]

Confidence Intervals

A confidence interval provides a range of values, at a given level of confidence (e.g., 95%), that is expected to contain the true value of the statistical parameter for a targeted population. [12] Most research uses a 95% CI, but investigators can set any level (e.g., 90% CI, 99% CI). [13] A CI gives the lower-bound and upper-bound limits of a difference or association that would be plausible for the population. [14] Therefore, a 95% CI indicates that if a study were carried out 100 times, the resulting intervals would contain the true value in 95 of them. [15] Confidence intervals provide more evidence regarding the precision of an estimate than p-values do. [6]

Returning to the research example above, one could make the following statement with a 95% CI:

Statement: Individuals who were prescribed Drug 23 had no symptoms after three days, which was significantly faster than those prescribed Drug 22; the mean difference in days to recovery between the two groups was 4.2 days (95% CI: 1.9 – 7.8).

It is important to note that the width of the CI is affected by the standard error and the sample size; reducing a study's sample size will make the CI less precise (wider). [14] A larger width indicates a smaller sample size or larger variability. [16] A researcher generally wants to increase the precision of the CI. For example, a 95% CI of 1.43 – 1.47 is much more precise than the one provided in the example above. In research and clinical practice, CIs provide valuable information on whether the interval includes or excludes clinically significant values. [14]

A CI is often compared against the null value (zero for differences and 1 for ratios), but CIs provide more information than that. [15] Consider this example: a hospital implements a new protocol that reduces wait time for patients in the emergency department by an average of 25 minutes (95% CI: -2.5 – 41 minutes). Because the range crosses zero, implementing this protocol in different populations could result in longer wait times; however, most of the range lies on the positive side. Thus, while a p-value for this finding may come out "not significant," individuals should examine the range, consider the study design, and weigh whether it is still worth piloting in their workplace.

Similarly to p-values, 95% CIs cannot control for researchers' errors (e.g., study bias or improper data analysis). [14]  In consideration of whether to report p-values or CIs, researchers should examine journal preferences. When in doubt, reporting both may be beneficial. [13]  An example is below:

Reporting both: Individuals who were prescribed Drug 23 had no symptoms after three days, which was significantly faster than those prescribed Drug 22, p = 0.009. The mean difference in days to recovery between the two groups was 4.2 days (95% CI: 1.9 – 7.8).
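As an illustration of where such an interval comes from, the following sketch computes a mean difference and its 95% CI in Python; the recovery times are hypothetical, and the pooled degrees of freedom are a simplification (Welch's correction would be more precise).

    import numpy as np
    from scipy import stats

    # Hypothetical days-to-recovery for each group
    drug_22 = np.array([9.0, 10.5, 8.0, 12.0, 11.0, 9.5, 10.0, 8.5])
    drug_23 = np.array([5.0, 6.5, 4.0, 7.0, 5.5, 6.0, 4.5, 5.0])

    diff = drug_22.mean() - drug_23.mean()                   # estimated mean difference
    se = np.sqrt(drug_22.var(ddof=1) / len(drug_22)
                 + drug_23.var(ddof=1) / len(drug_23))       # standard error of the difference
    df = len(drug_22) + len(drug_23) - 2                     # simplified df (Welch df is more exact)
    t_crit = stats.t.ppf(0.975, df)                          # critical value for a 95% interval
    lo, hi = diff - t_crit * se, diff + t_crit * se
    print(f"mean difference = {diff:.1f} days (95% CI: {lo:.1f} to {hi:.1f})")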

  • Clinical Significance

Recall that clinical significance and statistical significance are two different concepts. Healthcare providers should remember that a study with statistically significant differences and large sample size may be of no interest to clinicians, whereas a study with smaller sample size and statistically non-significant results could impact clinical practice. [14]  Additionally, as previously mentioned, a non-significant finding may reflect the study design itself rather than relationships between variables.

Healthcare providers using evidence-based medicine to inform practice should use clinical judgment to determine the practical importance of studies through careful evaluation of the design, sample size, power, likelihood of type I and type II errors, data analysis, and reporting of statistical findings (p values, 95% CI or both). [4]  Interestingly, some experts have called for "statistically significant" or "not significant" to be excluded from work as statistical significance never has and will never be equivalent to clinical significance. [17]

The decision on what is clinically significant can be challenging, depending on the provider's experience and especially the severity of the disease. Providers should use their knowledge and experience to determine the meaningfulness of study results and make inferences based not only on the significant or non-significant results reported by researchers but also on their own understanding of study limitations and practical implications.

  • Nursing, Allied Health, and Interprofessional Team Interventions

All physicians, nurses, pharmacists, and other healthcare professionals should strive to understand the concepts in this chapter. These individuals should maintain the ability to review and incorporate new literature for evidence-based and safe care. 


Disclosure: Jacob Shreffler declares no relevant financial relationships with ineligible companies.

Disclosure: Martin Huecker declares no relevant financial relationships with ineligible companies.

This book is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) ( http://creativecommons.org/licenses/by-nc-nd/4.0/ ), which permits others to distribute the work, provided that the article is not altered or used commercially. You are not required to obtain permission to distribute this article, provided that you credit the author and journal.

  • Cite this Page Shreffler J, Huecker MR. Hypothesis Testing, P Values, Confidence Intervals, and Significance. [Updated 2023 Mar 13]. In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2024 Jan-.


What Is the Difference Between Alpha and P-Values?


In conducting a test of significance or hypothesis test, there are two numbers that are easy to get confused, because both are probabilities between zero and one. One number is called the p-value of the test statistic. The other number of interest is the level of significance, or alpha. We will examine these two probabilities and determine the difference between them.

Alpha Values

The number alpha is the threshold value that we measure p-values against. It tells us how extreme observed results must be in order to reject the null hypothesis of a significance test.

The value of alpha is associated with the confidence level of our test. The following lists some levels of confidence with their related values of alpha:

  • For results with a 90 percent level of confidence, the value of alpha is 1 − 0.90 = 0.10.
  • For results with a 95 percent level of confidence, the value of alpha is 1 − 0.95 = 0.05.
  • For results with a 99 percent level of confidence, the value of alpha is 1 − 0.99 = 0.01.
  • And in general, for results with a C percent level of confidence, the value of alpha is 1 − C/100.

Although in theory and practice many numbers can be used for alpha, the most commonly used is 0.05. The reason for this is both because consensus shows that this level is appropriate in many cases, and historically, it has been accepted as the standard. However, there are many situations when a smaller value of alpha should be used. There is not a single value of alpha that always determines statistical significance.

The alpha value gives us the probability of a type I error. Type I errors occur when we reject a null hypothesis that is actually true. Thus, in the long run, for a test with a level of significance of 0.05 = 1/20, a true null hypothesis will be rejected one out of every 20 times.
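A quick simulation illustrates this long-run behavior. In this hedged sketch, a true null hypothesis is tested 10,000 times at alpha = 0.05; the setup (normal data, one-sample t-test) is chosen only for illustration.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_trials, rejections = 10_000, 0
    for _ in range(n_trials):
        sample = rng.normal(loc=0, scale=1, size=30)  # the null is true: the mean really is 0
        _, p = stats.ttest_1samp(sample, popmean=0)
        if p <= 0.05:                                 # reject at alpha = 0.05
            rejections += 1
    print(rejections / n_trials)                      # prints a value close to 0.05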

The other number that is part of a test of significance is a p-value. A p-value is also a probability, but it comes from a different source than alpha. Every test statistic has a corresponding probability or p-value. This value is the probability that the observed statistic occurred by chance alone, assuming that the null hypothesis is true.

Since there are a number of different test statistics, there are a number of different ways to find a p-value. For some cases, we need to know the probability distribution of the population.

The p-value of the test statistic is a way of saying how extreme that statistic is for our sample data. The smaller the p-value, the more unlikely the observed sample.

Difference Between P-Value and Alpha

To determine if an observed outcome is statistically significant, we compare the values of alpha and the p-value. There are two possibilities that emerge:

  • The p-value is less than or equal to alpha. In this case, we reject the null hypothesis. When this happens, we say that the result is statistically significant. In other words, we are reasonably sure that there is something besides chance alone that gave us an observed sample.
  • The p-value is greater than alpha. In this case, we fail to reject the null hypothesis . When this happens, we say that the result is not statistically significant. In other words, we are reasonably sure that our observed data can be explained by chance alone.

The implication of the above is that the smaller the value of alpha is, the more difficult it is to claim that a result is statistically significant. On the other hand, the larger the value of alpha is, the easier it is to claim that a result is statistically significant. Coupled with this, however, is the higher probability that what we observed can be attributed to chance.


P-Value in Statistical Hypothesis Tests: What is it?

P Value Definition

A p value is used in hypothesis testing to help you support or reject the null hypothesis. The p value is the evidence against a null hypothesis. The smaller the p-value, the stronger the evidence that you should reject the null hypothesis.

P values are expressed as decimals, although it may be easier to understand them if you convert them to a percentage. For example, a p value of 0.0254 is 2.54%. This means that, if the null hypothesis were true, there would be only a 2.54% chance of seeing results at least this extreme. That's pretty tiny. On the other hand, a large p-value of .9 (90%) means the data are entirely consistent with the null hypothesis and give you no reason to reject it. Therefore, the smaller the p-value, the more important ("significant") your results.

When you run a hypothesis test , you compare the p value from your test to the alpha level you selected when you ran the test. Alpha levels can also be written as percentages.


P Value vs Alpha level

Alpha levels are controlled by the researcher and are related to confidence levels. You get an alpha level by subtracting your confidence level from 100%. For example, if you want to be 98 percent confident in your research, the alpha level would be 2% (100% − 98%). When you run the hypothesis test, the test will give you a value for p. Compare that value to your chosen alpha level. For example, let's say you chose an alpha level of 5% (0.05). If the results from the test give you:

  • A small p (≤ 0.05), reject the null hypothesis . This is strong evidence that the null hypothesis is invalid.
  • A large p (> 0.05) means the evidence against the null hypothesis is weak, so you do not reject the null.


What if I Don’t Have an Alpha Level?

In an ideal world, you’ll have an alpha level. But if you do not, you can still use the following rough guidelines in deciding whether to support or reject the null hypothesis:

  • If p > .10 → “not significant”
  • If .05 < p ≤ .10 → “marginally significant”
  • If .01 < p ≤ .05 → “significant”
  • If p ≤ .01 → “highly significant”

How to Calculate a P Value on the TI 83

Example question: The average wait time to see an E.R. doctor is said to be 150 minutes. You think the wait time is actually less. You take a random sample of 30 people and find their average wait is 148 minutes with a standard deviation of 5 minutes. Assume the distribution is normal. Find the p value for this test.

  • Press STAT then arrow over to TESTS.
  • Press ENTER for Z-Test .
  • Arrow over to Stats. Press ENTER.
  • Arrow down to μ0 and type 150. This is our null hypothesis mean.
  • Arrow down to σ. Type in your std dev: 5.
  • Arrow down to xbar. Type in your sample mean : 148.
  • Arrow down to n. Type in your sample size : 30.
  • Arrow to <μ0 for a left tail test . Press ENTER.
  • Arrow down to Calculate. Press ENTER. P is given as .014, or about 1%.

The probability that you would get a sample mean of 148 minutes is tiny, so you should reject the null hypothesis.

Note : If you don’t want to run a test, you could also use the TI 83 NormCDF function to get the area (which is the same thing as the probability value).
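For readers working in Python rather than on a TI-83, here is a minimal equivalent sketch; SciPy has no dedicated one-sample z-test, so the statistic is computed directly from the values in the example question.

    from math import sqrt
    from scipy.stats import norm

    mu0, xbar, sigma, n = 150, 148, 5, 30
    z = (xbar - mu0) / (sigma / sqrt(n))  # test statistic
    p = norm.cdf(z)                       # left-tail area, like NormCDF
    print(f"z = {z:.2f}, p = {p:.3f}")    # z ≈ -2.19, p ≈ 0.014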



Hypothesis Testing

Key Topics:

  • Basic approach
  • Null and alternative hypothesis
  • Decision making and the p -value
  • Z-test & Nonparametric alternative

Basic approach to hypothesis testing

  • State a model describing the relationship between the explanatory variables and the outcome variable(s) in the population and the nature of the variability. State all of your assumptions .
  • Specify the null and alternative hypotheses in terms of the parameters of the model.
  • Invent a test statistic that will tend to be different under the null and alternative hypotheses.
  • Using the assumptions of step 1, find the theoretical sampling distribution of the statistic under the null hypothesis of step 2. Ideally, the form of the sampling distribution should be one of the “standard distributions” (e.g., normal, t, binomial).
  • Calculate a p-value as the area under the sampling distribution more extreme than your statistic; what counts as “more extreme” depends on the form of the alternative hypothesis.
  • Choose your acceptable type 1 error rate (alpha) and apply the decision rule : reject the null hypothesis if the p-value is less than alpha, otherwise do not reject.
One-sample z-test

  • Assume data are independently sampled from a normal distribution with unknown mean μ and known variance σ².
  • Hypotheses (two-sided): H0: μ = μ0 vs. HA: μ ≠ μ0. One-sided versions: H0: μ ≤ μ0 vs. HA: μ > μ0, or H0: μ ≥ μ0 vs. HA: μ < μ0.
  • z-statistic: \(\frac{\bar{X}-\mu_0}{\sigma / \sqrt{n}}\)
  • general form is: (estimate - value we are testing)/(st.dev of the estimate)
  • the z-statistic follows the N(0,1) distribution under the null hypothesis
  • p-value: 2 × the area above |z| (two-sided), the area above z, or the area below z (one-sided); or compare the statistic to a critical value: |z| ≥ z α/2, z ≥ z α, or z ≤ −z α
  • Choose an acceptable level of alpha (e.g., 0.05) and state your conclusion.

Making the Decision

It is either likely or unlikely that we would collect the evidence we did given the initial assumption. (Note: “likely” or “unlikely” is measured by calculating a probability!)

If it is likely , then we “ do not reject ” our initial assumption. There is not enough evidence to do otherwise.

If it is unlikely , then:

  • either our initial assumption is correct and we experienced an unusual event or,
  • our initial assumption is incorrect

In statistics, if it is unlikely, we decide to “ reject ” our initial assumption.

Example: Criminal Trial Analogy

First, state 2 hypotheses, the null hypothesis (“H 0 ”) and the alternative hypothesis (“H A ”)

  • H 0 : Defendant is not guilty.
  • H A : Defendant is guilty.

Usually the H 0 is a statement of “no effect”, or “no change”, or “chance only” about a population parameter.

While the H A , depending on the situation, is that there is a difference, trend, effect, or a relationship with respect to a population parameter.

  • It can be one-sided or two-sided.
  • In a two-sided test we only care whether there is a difference, not the direction of it. In a one-sided test we care about a particular direction of the relationship: we want to know if the value is strictly larger or smaller.

Then, collect evidence, such as finger prints, blood spots, hair samples, carpet fibers, shoe prints, ransom notes, handwriting samples, etc. (In statistics, the data are the evidence.)

Next, you make your initial assumption.

  • Defendant is innocent until proven guilty.

In statistics, we always assume the null hypothesis is true .

Then, make a decision based on the available evidence.

  • If there is sufficient evidence (“beyond a reasonable doubt”), reject the null hypothesis . (Behave as if defendant is guilty.)
  • If there is not enough evidence, do not reject the null hypothesis . (Behave as if defendant is not guilty.)

If the observed outcome, e.g., a sample statistic, is surprising under the assumption that the null hypothesis is true, but more probable if the alternative is true, then this outcome is evidence against H 0 and in favor of H A .

An observed effect so large that it would rarely occur by chance is called statistically significant (i.e., not likely to happen by chance).

Using the p -value to make the decision

The p -value represents how likely we would be to observe such an extreme sample if the null hypothesis were true. The p -value is a probability, computed assuming the null hypothesis is true, that the test statistic would take a value as extreme or more extreme than that actually observed. Since it's a probability, it is a number between 0 and 1; the closer it is to 0, the more unlikely the observed result under the null. So if the p -value is “small” (typically, less than 0.05), we can reject the null hypothesis.

Significance level and p -value

Significance level, α, is the decisive value for the p -value. In this context, significant does not mean “important”; it means “not likely to have happened just by chance”.

α is the maximum probability of rejecting the null hypothesis when the null hypothesis is true. If α = 1 we always reject the null; if α = 0 we never reject the null hypothesis. In articles, journals, etc., you may read: “The results were significant (p < 0.05).” So if p = 0.03, it's significant at the level of α = 0.05 but not at the level of α = 0.01. If we reject H0 at the level of α = 0.05 (which corresponds to a 95% CI), we are saying that if H0 is true, the observed phenomenon would happen no more than 5% of the time (that is, 1 in 20). If we choose to compare the p -value to α = 0.01, we are insisting on stronger evidence!

Neither rejecting nor failing to reject H0 entails proving the null hypothesis or the alternative hypothesis. We merely state there is enough evidence to behave one way or the other. This is always true in statistics!

So, what kind of error could we make? No matter what decision we make, there is always a chance we made an error.

Errors in Criminal Trial: convicting a defendant who is actually innocent is a false positive, while letting a guilty defendant go free is a false negative.

Errors in Hypothesis Testing

Type I error (False positive): The null hypothesis is rejected when it is true.

  • α is the maximum probability of making a Type I error.

Type II error (False negative): The null hypothesis is not rejected when it is false.

  • β is the probability of making a Type II error

There is always a chance of making one of these errors. But, a good scientific study will minimize the chance of doing so!

The power of a statistical test is its probability of rejecting the null hypothesis if the null hypothesis is false. That is, power is the ability to correctly reject H 0 and detect a significant effect. In other words, power is one minus the type II error risk.

\(\text{Power} = 1-\beta = P\left(\text{reject } H_0 \mid H_0 \text{ is false}\right)\)
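As a rough numerical illustration of this formula, the sketch below computes the power of a one-sided z-test; all of the values (μ0, μ1, σ, n, α) are hypothetical.

    from math import sqrt
    from scipy.stats import norm

    mu0, mu1, sigma, n, alpha = 65, 66, 3, 54, 0.05
    z_crit = norm.ppf(1 - alpha)              # reject H0 when z >= z_crit
    shift = (mu1 - mu0) / (sigma / sqrt(n))   # how far the true mean shifts the z statistic
    power = 1 - norm.cdf(z_crit - shift)      # P(reject H0 | true mean is mu1) = 1 - beta
    print(f"power = {power:.3f}")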

Which error is worse?

Type I = you are innocent, yet accused of cheating on the test. Type II = you cheated on the test, but you are found innocent.

This depends on the context of the problem, too. But in most cases scientists try to be “conservative”: it is worse to make a spurious discovery than to fail to make a good one. Our goal is to increase the power of the test, that is, to minimize the length of the CI.

We need to keep in mind:

  • the effect of the sample size,
  • the correctness of the underlying assumptions about the population,
  • statistical vs. practical significance, etc…

(see the handout). To study the tradeoffs between the sample size, α, and Type II error we can use power and operating characteristic curves.

Assume data are independently sampled from a normal distribution with unknown mean μ and known variance σ² = 9. Make an initial assumption that μ = 65.

Specify the hypotheses: H0: μ = 65 vs. HA: μ ≠ 65

z-statistic: 3.58

The z-statistic follows the N(0,1) distribution.

The p-value, approximately 0.0003, indicates that, if the average height in the population is 65 inches, it is very unlikely that a sample of 54 students would have an average height of 66.4630 inches.

Alpha = 0.05. Decision: p-value < alpha, thus reject the null hypothesis.

Conclude that the average height is not equal to 65 inches.
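These numbers can be checked with a few lines of Python (a sketch; it assumes σ = 3, i.e., variance 9, as stated above):

    from math import sqrt
    from scipy.stats import norm

    xbar, mu0, sigma, n = 66.4630, 65, 3, 54
    z = (xbar - mu0) / (sigma / sqrt(n))  # ≈ 3.58
    p = 2 * (1 - norm.cdf(abs(z)))        # two-sided p-value ≈ 0.0003
    print(f"z = {z:.2f}, p = {p:.4f}")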

What type of error might we have made?

Type I error is claiming that average student height is not 65 inches when it really is 65. Type II error is failing to claim that the average student height is not 65 inches when it really is not.

We rejected the null hypothesis, i.e., claimed that the height is not 65, thus making potentially a Type I error. But sometimes the p -value is too low because of the large sample size, and we may have statistical significance but not really practical significance! That's why most statisticians are much more comfortable with using CI than tests.

Based on the CI only, how do you know that you should reject the null hypothesis?

The 95% CI is (65.6628, 67.2631); because this interval does not contain the hypothesized value of 65, we reject the null hypothesis.

What about practical and statistical significance now? Is there another reason to suspect this test and the p-value calculations?

There is a need for a further generalization. What if we can't assume that σ is known? In this case we would use s (the sample standard deviation) to estimate σ.

If the sample is very large, we can treat σ as known by assuming that σ = s . According to the law of large numbers, this is not too bad a thing to do. But if the sample is small, the fact that we have to estimate both the standard deviation and the mean adds extra uncertainty to our inference. In practice this means that we need a larger multiplier for the standard error.

We need one-sample t -test.

One sample t -test

  • Assume data are independently sampled from a normal distribution with unknown mean μ and variance σ². Make an initial assumption, μ0.
  • Hypotheses (two-sided): H0: μ = μ0 vs. HA: μ ≠ μ0. One-sided versions: H0: μ ≤ μ0 vs. HA: μ > μ0, or H0: μ ≥ μ0 vs. HA: μ < μ0.
  • t-statistic: \(\frac{\bar{X}-\mu_0}{s / \sqrt{n}}\) where s is a sample st.dev.
  • t-statistic follows t -distribution with df = n - 1
  • Alpha = 0.05, we conclude ….
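A minimal sketch of this test in Python, using SciPy's one-sample t-test on invented height data:

    from scipy import stats

    heights = [66.1, 67.0, 64.8, 66.5, 65.9, 67.3, 66.0, 65.2, 66.8, 67.1]
    t_stat, p_value = stats.ttest_1samp(heights, popmean=65)  # two-sided by default
    print(f"t = {t_stat:.2f}, df = {len(heights) - 1}, p = {p_value:.4f}")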

Testing for the population proportion

Let's go back to our CNN poll. Assume we have a SRS of 1,017 adults.

We are interested in testing the following hypothesis: H0: p = 0.50 vs. HA: p > 0.50

What is the test statistic?

If alpha = 0.05, what do we conclude?

We will see more details in the next lesson on proportions, then distributions, and possible tests.
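As a preview, the following sketch computes the large-sample proportion z-statistic; the observed proportion p_hat = 0.53 is invented for illustration, while n = 1,017 comes from the poll above.

    from math import sqrt
    from scipy.stats import norm

    n, p0 = 1017, 0.50
    p_hat = 0.53                                  # hypothetical observed proportion
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)    # standard error uses p0 under H0
    p_value = 1 - norm.cdf(z)                     # right tail, since HA is p > 0.50
    print(f"z = {z:.2f}, p = {p_value:.4f}")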

P-Value And Statistical Significance: What It Is & Why It Matters

Saul McLeod, PhD


Olivia Guy-Evans, MSc


The p-value in statistics quantifies the evidence against a null hypothesis. A low p-value suggests data is inconsistent with the null, potentially favoring an alternative hypothesis. Common significance thresholds are 0.05 or 0.01.


Hypothesis testing

When you perform a statistical test, a p-value helps you determine the significance of your results in relation to the null hypothesis.

The null hypothesis (H0) states no relationship exists between the two variables being studied (one variable does not affect the other). It states the results are due to chance and are not significant in supporting the idea being investigated. Thus, the null hypothesis assumes that whatever you try to prove did not happen.

The alternative hypothesis (Ha or H1) is the one you would believe if the null hypothesis is concluded to be untrue.

The alternative hypothesis states that the independent variable affected the dependent variable, and the results are significant in supporting the theory being investigated (i.e., the results are not due to random chance).

What a p-value tells you

A p-value, or probability value, is a number describing how likely it is that your data would have occurred by random chance (i.e., under the assumption that the null hypothesis is true).

The level of statistical significance is often expressed as a p-value between 0 and 1.

The smaller the p -value, the less likely the results occurred by random chance, and the stronger the evidence that you should reject the null hypothesis.

Remember, a p-value doesn’t tell you if the null hypothesis is true or false. It just tells you how likely you’d see the data you observed (or more extreme data) if the null hypothesis was true. It’s a piece of evidence, not a definitive proof.

Example: Test Statistic and p-Value

Suppose you’re conducting a study to determine whether a new drug has an effect on pain relief compared to a placebo. If the new drug has no impact, your test statistic will be close to the one predicted by the null hypothesis (no difference between the drug and placebo groups), and the resulting p-value will be close to 1. It may not be precisely 1 because real-world variations may exist. Conversely, if the new drug indeed reduces pain significantly, your test statistic will diverge further from what’s expected under the null hypothesis, and the p-value will decrease. The p-value will never reach zero because there’s always a slim possibility, though highly improbable, that the observed results occurred by random chance.

P-value interpretation

The significance level (alpha) is a set probability threshold (often 0.05), while the p-value is the probability you calculate based on your study or analysis.

A p-value less than or equal to a predetermined significance level (often 0.05 or 0.01) indicates a statistically significant result, meaning the observed data provide strong evidence against the null hypothesis.

This suggests the effect under study likely represents a real relationship rather than just random chance.

For instance, if you set α = 0.05, you would reject the null hypothesis if your p -value ≤ 0.05. 

It indicates strong evidence against the null hypothesis, as there would be less than a 5% probability of obtaining such results if the null were correct (i.e., if the results were due to chance alone).

Therefore, we reject the null hypothesis and accept the alternative hypothesis.

Example: Statistical Significance

Upon analyzing the pain relief effects of the new drug compared to the placebo, the computed p-value is less than 0.01, which falls well below the predetermined alpha value of 0.05. Consequently, you conclude that there is a statistically significant difference in pain relief between the new drug and the placebo.

What does a p-value of 0.001 mean?

A p-value of 0.001 is highly statistically significant beyond the commonly used 0.05 threshold. It indicates strong evidence of a real effect or difference, rather than just random variation.

Specifically, a p-value of 0.001 means there is only a 0.1% chance of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is correct.

Such a small p-value provides strong evidence against the null hypothesis, leading to rejecting the null in favor of the alternative hypothesis.

A p-value greater than the significance level (typically p > 0.05) is not statistically significant. It indicates insufficient evidence against the null hypothesis, not strong evidence for it.

This means we retain the null hypothesis. You should note that you cannot accept the null hypothesis; we can only reject it or fail to reject it.

Note : when the p-value is above your threshold of significance,  it does not mean that there is a 95% probability that the alternative hypothesis is true.


How do you calculate the p-value?

Most statistical software packages like R, SPSS, and others automatically calculate your p-value. This is the easiest and most common way.

Online resources and tables are available to estimate the p-value based on your test statistic and degrees of freedom.

These tables help you understand how often you would expect to see your test statistic under the null hypothesis.
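For example, here is a small sketch of converting a test statistic and its degrees of freedom into a p-value with SciPy instead of a printed table; the statistics shown are hypothetical.

    from scipy import stats

    t_stat, df = 2.30, 28
    p_t = 2 * stats.t.sf(abs(t_stat), df)   # two-sided p for a t statistic
    chi2_stat, df2 = 7.81, 3
    p_chi2 = stats.chi2.sf(chi2_stat, df2)  # upper-tail p for a chi-squared statistic
    print(f"t-test p = {p_t:.4f}, chi-squared p = {p_chi2:.4f}")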

Understanding the Statistical Test:

Different statistical tests are designed to answer specific research questions or hypotheses. Each test has its own underlying assumptions and characteristics.

For example, you might use a t-test to compare means, a chi-squared test for categorical data, or a correlation test to measure the strength of a relationship between variables.

Be aware that the number of independent variables you include in your analysis can influence the magnitude of the test statistic needed to produce the same p-value.

This factor is particularly important to consider when comparing results across different analyses.

Example: Choosing a Statistical Test

If you’re comparing the effectiveness of just two different drugs in pain relief, a two-sample t-test is a suitable choice for comparing these two groups. However, when you’re examining the impact of three or more drugs, it’s more appropriate to employ an Analysis of Variance (ANOVA). Utilizing multiple pairwise comparisons in such cases can lead to artificially low p-values and an overestimation of the significance of differences between the drug groups.
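A short sketch of this choice in SciPy, with invented pain scores for three drug groups:

    from scipy import stats

    # Hypothetical pain scores under three drugs
    drug_a = [3.1, 2.8, 3.5, 3.0, 2.9]
    drug_b = [3.3, 3.6, 3.2, 3.8, 3.4]
    drug_c = [4.0, 4.2, 3.9, 4.4, 4.1]

    f_stat, p_anova = stats.f_oneway(drug_a, drug_b, drug_c)  # one overall test
    print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.4f}")
    # Three separate pairwise t-tests would inflate the overall false-positive
    # rate unless the alpha level is corrected (e.g., Bonferroni).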

How to report

A statistically significant result cannot prove that a research hypothesis is correct (which implies 100% certainty).

Instead, we may state our results “provide support for” or “give evidence for” our research hypothesis (as there is still a slight probability that the results occurred by chance and the null hypothesis was correct – e.g., less than 5%).

Example: Reporting the results

In our comparison of the pain relief effects of the new drug and the placebo, we observed that participants in the drug group experienced a significant reduction in pain (M = 3.5, SD = 0.8) compared to those in the placebo group (M = 5.2, SD = 0.7), resulting in an average difference of 1.7 points on the pain scale (t(98) = -9.36, p < 0.001).

The 6th edition of the APA style manual (American Psychological Association, 2010) states the following on the topic of reporting p-values:

“When reporting p values, report exact p values (e.g., p = .031) to two or three decimal places. However, report p values less than .001 as p < .001.

The tradition of reporting p values in the form p < .10, p < .05, p < .01, and so forth, was appropriate in a time when only limited tables of critical values were available.” (p. 114)

  • Do not use a 0 before the decimal point for the statistical value p, as it cannot be greater than 1. In other words, write p = .001 instead of p = 0.001.
  • Please pay attention to issues of italics ( p is always italicized) and spacing (either side of the = sign).
  • p = .000 (as outputted by some statistical packages such as SPSS) is impossible and should be written as p < .001.
  • The opposite of significant is “nonsignificant,” not “insignificant.”

Why is the p -value not enough?

A lower p-value  is sometimes interpreted as meaning there is a stronger relationship between two variables.

However, statistical significance only means that the observed data would be unlikely (e.g., a less than 5% chance) if the null hypothesis were true; it does not measure the strength of a relationship.

To understand the strength of the difference between the two groups (control vs. experimental) a researcher needs to calculate the effect size .
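One common effect-size measure is Cohen's d. The sketch below computes it from group summary statistics; the means and SDs echo the reporting example above, while n = 50 per group is an assumption inferred from t(98).

    from math import sqrt

    m1, s1, n1 = 3.5, 0.8, 50  # experimental group (assumed n)
    m2, s2, n2 = 5.2, 0.7, 50  # control group (assumed n)
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    print(f"Cohen's d = {d:.2f}")  # rough guide: |d| ~ 0.2 small, 0.5 medium, 0.8 large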

When do you reject the null hypothesis?

In statistical hypothesis testing, you reject the null hypothesis when the p-value is less than or equal to the significance level (α) you set before conducting your test. The significance level is the probability of rejecting the null hypothesis when it is true. Commonly used significance levels are 0.01, 0.05, and 0.10.

Remember, rejecting the null hypothesis doesn’t prove the alternative hypothesis; it just suggests that the alternative hypothesis may be plausible given the observed data.

The p -value is conditional upon the null hypothesis being true but is unrelated to the truth or falsity of the alternative hypothesis.

What does p-value of 0.05 mean?

If your p-value is less than or equal to 0.05 (the significance level), you would conclude that your result is statistically significant. This means the evidence is strong enough to reject the null hypothesis in favor of the alternative hypothesis.

Are all p-values below 0.05 considered statistically significant?

No, not all p-values below 0.05 are considered statistically significant. The threshold of 0.05 is commonly used, but it’s just a convention. Statistical significance depends on factors like the study design, sample size, and the magnitude of the observed effect.

A p-value below 0.05 means there is evidence against the null hypothesis, suggesting a real effect. However, it’s essential to consider the context and other factors when interpreting results.

Researchers also look at effect size and confidence intervals to determine the practical significance and reliability of findings.

How does sample size affect the interpretation of p-values?

Sample size can impact the interpretation of p-values. A larger sample size provides more reliable and precise estimates of the population, leading to narrower confidence intervals.

With a larger sample, even small differences between groups or effects can become statistically significant, yielding lower p-values. In contrast, smaller sample sizes may not have enough statistical power to detect smaller effects, resulting in higher p-values.

Therefore, a larger sample size increases the chances of finding statistically significant results when there is a genuine effect, making the findings more trustworthy and robust.
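A brief simulation illustrates this: with the same true effect, larger samples tend to produce smaller p-values (all data simulated; the seed is fixed for reproducibility).

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    for n in (20, 200, 2000):
        a = rng.normal(0.0, 1.0, n)   # control group
        b = rng.normal(0.2, 1.0, n)   # true effect of 0.2 standard deviations
        _, p = stats.ttest_ind(a, b)
        print(f"n = {n:>4} per group: p = {p:.4f}")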

Can a non-significant p-value indicate that there is no effect or difference in the data?

No, a non-significant p-value does not necessarily indicate that there is no effect or difference in the data. It means that the observed data do not provide strong enough evidence to reject the null hypothesis.

There could still be a real effect or difference, but it might be smaller or more variable than the study was able to detect.

Other factors like sample size, study design, and measurement precision can influence the p-value. It’s important to consider the entire body of evidence and not rely solely on p-values when interpreting research findings.

Can P values be exactly zero?

While a p-value can be extremely small, it cannot technically be exactly zero. When a p-value is reported as p = 0.000, the actual p-value is too small for the software to display. This is often interpreted as strong evidence against the null hypothesis. For p values less than 0.001, report them as p < .001.

Further Information

  • P Value Calculator From T Score
  • P-Value Calculator For Chi-Square
  • P-values and significance tests (Khan Academy)
  • Hypothesis testing and p-values (Khan Academy)
  • Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond “p < 0.05”.
  • Criticism of using the “p < 0.05” threshold.
  • Publication manual of the American Psychological Association
  • Statistics for Psychology Book Download




Week 5 Introduction to Hypothesis Testing Reading

An introduction to hypothesis testing.

What are you interested in learning about? Perhaps you’d like to know if there is a difference in average final grade between two different versions of a college class? Does the Fort Lewis women’s soccer team score more goals than the national Division II women’s average? Which outdoor sport do Fort Lewis students prefer the most?  Do the pine trees on campus differ in mean height from the aspen trees? For all of these questions, we can collect a sample, analyze the data, then make a statistical inference based on the analysis.  This means determining whether we have enough evidence to reject our null hypothesis (what was originally assumed to be true, until we prove otherwise). The process is called hypothesis testing .

A really good Khan Academy video to introduce the hypothesis test process: Khan Academy Hypothesis Testing . As you watch, please don’t get caught up in the calculations, as we will use SPSS to do these calculations.  We will also use SPSS p-values, instead of the referenced Z-table, to make statistical decisions.

The Six-Step Process

Hypothesis testing requires very specific, detailed steps.  Think of it as a mathematical lab report where you have to write out your work in a particular way.  There are six steps that we will follow for ALL of the hypothesis tests that we learn this semester.

Six Step Hypothesis Testing Process

1. Research Question

All hypothesis tests start with a research question.  This is literally a question that includes what you are trying to prove, like the examples earlier:  Which outdoor sport do Fort Lewis students prefer the most? Is there sufficient evidence to show that the Fort Lewis women’s soccer team scores more goals than the national Division 2 women’s average?

In this step, besides literally being a question, you’ll want to include:

  • mention of your variable(s)
  • wording specific to the type of test that you’ll be conducting (mean, mean difference, relationship, pattern)
  • specific wording that indicates directionality (are you looking for a ‘difference’, are you looking for something to be ‘more than’ or ‘less than’ something else, or are you comparing one pattern to another?)

Consider this research question: Do the pine trees on campus differ in mean height from the aspen trees?

  • The wording of this research question clearly mentions the variables being studied. The independent variable is the type of tree (pine or aspen), and these trees are having their heights compared, so the dependent variable is height.
  • ‘Mean’ is mentioned, so this indicates a test with a quantitative dependent variable.
  • The question also asks if the tree heights ‘differ’. This specific word indicates that the test being performed is a two-tailed (i.e. non-directional) test. More about the meaning of one/two-tailed will come later.

2. Statistical Hypotheses

A statistical hypothesis test has a null hypothesis, the status quo, what we assume to be true. Notation is H0, read as “H naught”. The alternative hypothesis is what you are trying to prove (mentioned in your research question), H1 or HA. All hypothesis tests must include a null and an alternative hypothesis. We also note which hypothesis test is being done in this step.

The notation for your statistical hypotheses will vary depending on the type of test that you’re doing. Writing statistical hypotheses is NOT the same as most scientific hypotheses. You are not writing sentences explaining what you think will happen in the study. Here is an example of what statistical hypotheses look like using the research question: Do the pine trees on campus differ in mean height from the aspen trees?

\(H_0: \mu_{\text{pine}} = \mu_{\text{aspen}}\) vs. \(H_1: \mu_{\text{pine}} \neq \mu_{\text{aspen}}\) (a two-tailed comparison of two means)

3. Decision Rule

In this step, you state which alpha value you will use, and when appropriate, the directionality, or tail, of the test.  You also write a statement: “I will reject the null hypothesis if p < alpha” (insert actual alpha value here).  In this introductory class, alpha is the level of significance, how willing we are to make the wrong statistical decision, and it will be set to 0.05 or 0.01.

Example of a Decision Rule:

Let alpha=0.01, two-tailed. I will reject the null hypothesis if p<0.01.

4. Assumptions, Analysis and Calculations

Quite a bit goes on in this step. The assumptions for the particular hypothesis test must be checked. SPSS will be used to create appropriate graphs and test output tables. Where appropriate, calculations of the test's effect size will also be done in this step.

All hypothesis tests have assumptions that we hope to meet. For example, tests with a quantitative dependent variable consider a histogram(s) to check if the distribution is normal, and whether there are any obvious outliers. Each hypothesis test has different assumptions, so it is important to pay attention to the specific test’s requirements.

Required SPSS output will also depend on the test.

5. Statistical Decision

It is in Step 5 that we determine if we have enough statistical evidence to reject our null hypothesis.  We will consult the SPSS p-value and compare to our chosen alpha (from Step 3: Decision Rule).

Put very simply, the p -value is the probability that, if the null hypothesis is true, the results from another randomly selected sample will be as extreme or more extreme as the results obtained from the given sample. The p -value can also be thought of as the probability that the results (from the sample) that we are seeing are solely due to chance. This concept will be discussed in much further detail in the class notes.

Based on this numerical comparison between the p-value and alpha, we’ll either reject or retain our null hypothesis.  Note: You may NEVER ‘accept’ the null hypothesis. This is because it is impossible to prove a null hypothesis to be true.

Retaining the null means that you just don’t have enough evidence to prove your alternative hypothesis to be true, so you fall back to your null. (You retain the null when p is greater than or equal to alpha.)

Rejecting the null means that you did find enough evidence to prove your alternative hypothesis as true. (You reject the null when p is less than alpha.)

Example of a Statistical Decision:

Retain the null hypothesis, because p=0.12 > alpha=0.01.

The p-value will come from SPSS output, and the alpha will have already been determined back in Step 3. You must be very careful when you compare the decimal values of the p-value and alpha. If, for example, you mistakenly think that p=0.12 < alpha=0.01, then you will make the incorrect statistical decision, which will likely lead to an incorrect interpretation of the study’s findings.
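One way to avoid this comparison mistake is to let the computer make the comparison; a tiny sketch, using the values from the example above:

    p_value, alpha = 0.12, 0.01
    if p_value < alpha:
        print(f"Reject the null hypothesis (p = {p_value} < alpha = {alpha}).")
    else:
        print(f"Retain the null hypothesis (p = {p_value} >= alpha = {alpha}).")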

6. Interpretation

The interpretation is where you write up your findings. The specifics will vary depending on the type of hypothesis test you performed, but you will always include a plain English, contextual conclusion of what your study found (i.e. what it means to reject or retain the null hypothesis in that particular study).  You’ll have statistics that you quote to support your decision.  Some of the statistics will need to be written in APA style citation (the American Psychological Association style of citation).  For some hypothesis tests, you’ll also include an interpretation of the effect size.

Some hypothesis tests will also require an additional (non-parametric) test after the completion of your original test, if the test's assumptions have not been met. These tests are also called “post-hoc tests”.

As previously stated, hypothesis testing is a very detailed process. Do not be concerned if you have read through all of the steps above, and have many questions (and are possibly very confused). It will take time, and a lot of practice to learn and apply these steps!

This Reading is just meant as an overview of hypothesis testing. Much more information is forthcoming in the various sets of Notes about the specifics needed in each of these steps. The Hypothesis Test Checklist will be a critical resource for you to refer to during homeworks and tests.

Student Course Learning Objectives

4.  Choose, administer and interpret the correct tests based on the situation, including identification of appropriate sampling and potential errors

c. Choose the appropriate hypothesis test given a situation

d. Describe the meaning and uses of alpha and p-values

e. Write the appropriate null and alternative hypotheses, including whether the alternative should be one-sided or two-sided

f. Determine and calculate the appropriate test statistic (e.g. z-test, multiple t-tests, Chi-Square, ANOVA)

g. Determine and interpret effect sizes.

h. Interpret results of a hypothesis test

  • Use technology in the statistical analysis of data
  • Communicate in writing the results of statistical analyses of data

Attributions

Adapted from "Week 5 Introduction to Hypothesis Testing Reading" by Sherri Spriggs and Sandi Dang, licensed under CC BY-NC-SA 4.0.

Math 132 Introduction to Statistics Readings, copyright © Sherri Spriggs, is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.


An Easy Introduction to Statistical Significance (With Examples)

Published on January 7, 2021 by Pritha Bhandari. Revised on June 22, 2023.

If a result is statistically significant , that means it’s unlikely to be explained solely by chance or random factors. In other words, a statistically significant result has a very low chance of occurring if there were no true effect in a research study.

The p value , or probability value, tells you the statistical significance of a finding. In most studies, a p value of 0.05 or less is considered statistically significant, but this threshold can also be set higher or lower.

Table of contents

  • How do you test for statistical significance?
  • What is a significance level?
  • Problems with relying on statistical significance
  • Other types of significance in research
  • Frequently asked questions about statistical significance

How do you test for statistical significance?

In quantitative research, data are analyzed through null hypothesis significance testing, or hypothesis testing. This is a formal procedure for assessing whether a relationship between variables or a difference between groups is statistically significant.

Null and alternative hypotheses

To begin, research predictions are rephrased into two main hypotheses: the null and alternative hypothesis.

  • A null hypothesis (H0) always predicts no true effect, no relationship between variables, or no difference between groups.
  • An alternative hypothesis (Ha or H1) states your main prediction of a true effect, a relationship between variables, or a difference between groups.

Hypothesis testing always starts with the assumption that the null hypothesis is true. Using this procedure, you can assess the likelihood (probability) of obtaining your results under this assumption. Based on the outcome of the test, you can reject or retain the null hypothesis. For example, in a study of whether actively smiling makes people happier:

  • H0: There is no difference in happiness between actively smiling and not smiling.
  • Ha: Actively smiling leads to more happiness than not smiling.

Test statistics and p values

Every statistical test produces:

  • A test statistic that indicates how closely your data match the null hypothesis.
  • A corresponding p value that tells you the probability of obtaining this result if the null hypothesis is true.

The p value determines statistical significance. An extremely low p value indicates high statistical significance, while a high p value means low or no statistical significance.

To test the smiling hypothesis, you collect happiness ratings from a group that actively smiles and a control group that does not, then perform a t test. Using the difference in average happiness between the two groups, you calculate:

  • a t value (the test statistic) that tells you how much the sample data differs from the null hypothesis,
  • a p value showing the likelihood of finding this result if the null hypothesis is true.
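As a concrete sketch, here is a two-sample t test for the smiling example using Python's SciPy, with made-up happiness ratings:

```python
# Two-sample t test for the smiling example (hypothetical data).
from scipy import stats

smiling     = [7.1, 6.8, 7.4, 6.9, 7.6, 7.2, 7.0, 7.5]  # happiness ratings
not_smiling = [6.5, 6.9, 6.4, 6.7, 6.3, 6.8, 6.6, 6.2]

t_value, p_value = stats.ttest_ind(smiling, not_smiling)
print(f"t = {t_value:.3f}, p = {p_value:.4f}")  # small p suggests a difference
```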

What is a significance level?

The significance level, or alpha (α), is a value that the researcher sets in advance as the threshold for statistical significance. It is the maximum risk of making a false positive conclusion (Type I error) that you are willing to accept.

In a hypothesis test, the p value is compared to the significance level to decide whether to reject the null hypothesis.

  • If the p value is higher than the significance level, the null hypothesis is not refuted, and the results are not statistically significant.
  • If the p value is lower than the significance level, the results are interpreted as refuting the null hypothesis and reported as statistically significant.

Usually, the significance level is set to 0.05 or 5%. That means your results must have a 5% or lower chance of occurring under the null hypothesis to be considered statistically significant.

The significance level can be lowered for a more conservative test. That means an effect has to be larger to be considered statistically significant.

The significance level may also be set higher for significance testing in non-academic marketing or business contexts. This makes the study less rigorous and increases the probability of finding a statistically significant result.

As best practice, you should set a significance level before you begin your study. Otherwise, you can easily manipulate your results to match your research predictions.

It’s important to note that hypothesis testing can only show you whether or not to reject the null hypothesis in favor of the alternative hypothesis. It can never “prove” the null hypothesis, because the lack of a statistically significant effect doesn’t mean that absolutely no effect exists.

When reporting statistical significance, include relevant descriptive statistics about your data (e.g., means and standard deviations) as well as the test statistic and p value.
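One way to see what the significance level buys you is a small simulation. The sketch below draws two samples from the same population many times, so the null hypothesis is true by construction, and roughly alpha of the tests still come out "significant":

```python
# When the null is true, about alpha (here 5%) of tests are false positives.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_sims, false_positives = 0.05, 10_000, 0

for _ in range(n_sims):
    a = rng.normal(0, 1, 30)  # both samples come from the SAME population,
    b = rng.normal(0, 1, 30)  # so any "significant" result is a Type I error
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_positives += 1

print(false_positives / n_sims)  # ≈ 0.05
```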

Problems with relying on statistical significance

There are various critiques of the concept of statistical significance and how it is used in research.

Researchers classify results as statistically significant or non-significant using a conventional threshold that lacks any theoretical or practical basis. This means that even a tiny 0.001 decrease in a p value can convert a research finding from statistically non-significant to significant with almost no real change in the effect.

On its own, statistical significance may also be misleading because it’s affected by sample size. In extremely large samples , you’re more likely to obtain statistically significant results, even if the effect is actually small or negligible in the real world. This means that small effects are often exaggerated if they meet the significance threshold, while interesting results are ignored when they fall short of meeting the threshold.

The strong emphasis on statistical significance has led to a serious publication bias and replication crisis in the social sciences and medicine over the last few decades. Results are usually only published in academic journals if they show statistically significant results—but statistically significant results often can’t be reproduced in high quality replication studies.

As a result, many scientists call for retiring statistical significance as a decision-making tool in favor of more nuanced approaches to interpreting results.

That’s why APA guidelines advise reporting not only p values but also  effect sizes and confidence intervals wherever possible to show the real world implications of a research outcome.

Other types of significance in research

Aside from statistical significance, clinical significance and practical significance are also important research outcomes.

Practical significance shows you whether the research outcome is important enough to be meaningful in the real world. It’s indicated by the effect size of the study.

Clinical significance is relevant for intervention and treatment studies. A treatment is considered clinically significant when it tangibly or substantially improves the lives of patients.
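Because practical significance is indicated by effect size, a sketch of one common effect size measure, Cohen's d, may help; the data and the equal-group-size pooled standard deviation are illustrative assumptions:

```python
# Cohen's d: standardized difference between two group means.
import numpy as np

group_a = np.array([12.1, 13.4, 11.8, 12.9, 13.0])
group_b = np.array([10.2, 11.1, 10.8, 10.5, 11.3])

# Pooled standard deviation (simple form for equal group sizes)
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
d = (group_a.mean() - group_b.mean()) / pooled_sd
print(f"Cohen's d ≈ {d:.2f}")  # ~0.2 small, ~0.5 medium, ~0.8 large (rough guide)
```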

Frequently asked questions about statistical significance

Statistical significance is a term used by researchers to state that it is unlikely their observations could have occurred under the null hypothesis of a statistical test. Significance is usually denoted by a p-value, or probability value.

Statistical significance is arbitrary – it depends on the threshold, or alpha value, chosen by the researcher. The most common threshold is p < 0.05, which means that the data is likely to occur less than 5% of the time under the null hypothesis.

When the p-value falls below the chosen alpha value, then we say the result of the test is statistically significant.

A p-value, or probability value, is a number describing how likely it is that your data would have occurred under the null hypothesis of your statistical test.

P-values are usually automatically calculated by the program you use to perform your statistical test. They can also be estimated using p-value tables for the relevant test statistic.

P-values are calculated from the null distribution of the test statistic. They tell you how often a test statistic is expected to occur under the null hypothesis of the statistical test, based on where it falls in the null distribution.

If the test statistic is far from the mean of the null distribution, then the p-value will be small, showing that the test statistic is not likely to have occurred under the null hypothesis.

Does a significant p-value mean your alternative hypothesis is true? No. The p-value only tells you how likely the data you have observed is to have occurred under the null hypothesis.

If the p-value is below your threshold of significance (typically p < 0.05), then you can reject the null hypothesis, but this does not necessarily mean that your alternative hypothesis is true.

Cite this Scribbr article


Bhandari, P. (2023, June 22). An Easy Introduction to Statistical Significance (With Examples). Scribbr. Retrieved September 2, 2024, from https://www.scribbr.com/statistics/statistical-significance/



Alphas, P-Values, and Confidence Intervals, Oh My!

Topics: Hypothesis Testing

Trying to remember what the alpha-level, p-value, and confidence interval all mean for a hypothesis test—and how they relate to one another—can seem about as daunting as Dorothy’s trek down the yellow brick road.

Rather than sitting through a semester of Intro Stats, let's get right to the point and explain in clear language what all these statistical terms mean and how they relate to one another.

What Does Alpha Mean in a Hypothesis Test?

Before you run any statistical test, you must first determine your alpha level, which is also called the “significance level.” By definition, the alpha level is the probability of rejecting the null hypothesis when the null hypothesis is true.

Translation: It’s the probability of making a wrong decision.

Thanks to famed statistician R. A. Fisher, most folks typically use an alpha level of 0.05. However, if you’re analyzing airplane engine failures, you may want to lower the probability of making a wrong decision and use a smaller alpha. On the other hand, if you're making paper airplanes, you might be willing to increase alpha and accept the higher risk of making the wrong decision.

Like all probabilities, alpha ranges from 0 to 1.

What Is the P-Value of a Hypothesis Test?


Statistically speaking, the p-value is the probability of obtaining a result as extreme as, or more extreme than, the result actually obtained when the null hypothesis is true. If that makes your head spin like Dorothy’s house in a Kansas tornado, just pretend Glenda has waved her magic wand and zapped it from your memory. Then ponder this for a moment.

The p-value is basically the probability of obtaining your sample data IF the null hypothesis (e.g., the average cost of Cairn terriers = $400) were true. So if you obtain a p-value of 0.85, then you have little reason to doubt the null hypothesis. However, if your p-value is, say, 0.02, there's only a very small chance you would have obtained that data if the null hypothesis were in fact true.

And since the p-value is a probability just like alpha, p-values also range from 0 to 1.

What Is the Confidence Interval for a Hypothesis Test?

When you run a hypothesis test, Minitab also provides a confidence interval. P-values and confidence intervals are like Dorothy and Toto—where you find one, you will likely find the other.

The confidence interval is the range of likely values for a population parameter, such as the population mean. For example, if you compute a 95% confidence interval for the average price of a Cairn terrier, then you can be 95% confident that the interval contains the true average cost of all Cairn terriers.

Interpreting Hypothesis Test Statistics

Now let's put it all together. These three facts should help you interpret the results of your hypothesis test.

Fact 1: Confidence level + alpha = 1

If alpha equals 0.05, then your confidence level is 0.95. If you increase alpha, you both increase the probability of incorrectly rejecting the null hypothesis and also decrease your confidence level.

Fact 2: If the p-value is low, the null must go.

If the p-value is less than alpha—the risk you're willing to take of making a wrong decision—then you reject the null hypothesis. For example, if the p-value were 0.02 (as in the Minitab output below) and we're using an alpha of 0.05, we'd reject the null hypothesis and conclude that the average price of Cairn terriers is NOT $400.

[Minitab 1-sample t-test output: 95% CI (365.58, 396.75), P = 0.020]

If the p-value is low, the null must go. Alternatively, if the p-value is greater than alpha, then we fail to reject the null hypothesis. Or, to put it another way, if the p-value is high, the null will fly.

Fact 3: The confidence interval and p-value will always lead you to the same conclusion.

If the p-value is less than alpha (i.e., it is significant), then the confidence interval will NOT contain the hypothesized mean. Looking at the Minitab output above, the 95% confidence interval of 365.58 - 396.75 does not include $400. Thus, we know that the p-value will be less than 0.05.

If the p-value is greater than alpha (i.e., it is not significant), then the confidence interval will include the hypothesized mean.
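This agreement is easy to verify outside Minitab as well. A sketch in Python's SciPy with hypothetical terrier prices (not the data behind the output above):

```python
# One-sample t test and 95% CI computed from the same hypothetical sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
prices = rng.normal(loc=380, scale=40, size=30)  # hypothetical Cairn prices

t_stat, p_value = stats.ttest_1samp(prices, popmean=400)

mean, sem = prices.mean(), stats.sem(prices)
ci_low, ci_high = stats.t.interval(0.95, df=len(prices) - 1, loc=mean, scale=sem)

print(f"p = {p_value:.4f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
# If p < 0.05, the CI excludes 400; if p >= 0.05, the CI includes 400.
```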

I hope this post has helped to lift the curtain if you've had questions regarding alpha, the p-value, confidence intervals, and how they all relate to one another. If you want more details about these statistical terms and hypothesis testing, I’d recommend giving Quality Trainer a try. Quality Trainer is Minitab’s e-learning course that teaches you both statistical concepts and how to analyze your data using Minitab, and at a cost of only $30 US for one month, it’s well worth the investment.



The P value, or calculated probability, is the probability of finding the observed, or more extreme, results when the null hypothesis (H0) of a study question is true – the definition of 'extreme' depends on how the hypothesis is being tested. P is also described in terms of rejecting H0 when it is actually true; however, it is not a direct probability of this state.

The null hypothesis is usually a hypothesis of "no difference", e.g. no difference between blood pressures in group A and group B. Define a null hypothesis for each study question clearly before the start of your study.

The only situation in which you should use a one sided P value is when a large change in an unexpected direction would have absolutely no relevance to your study. This situation is unusual; if you are in any doubt then use a two sided P value.

The term significance level (alpha) is used to refer to a pre-chosen probability and the term "P value" is used to indicate a probability that you calculate after a given study.

The alternative hypothesis (H1) is the opposite of the null hypothesis; in plain language terms this is usually the hypothesis you set out to investigate. For example, the question is "Is there a significant (not due to chance) difference in blood pressures between groups A and B if we give group A the test drug and group B a sugar pill?" and the alternative hypothesis is "There is a difference in blood pressures between groups A and B if we give group A the test drug and group B a sugar pill."

If your P value is less than the chosen significance level then you reject the null hypothesis i.e. accept that your sample gives reasonable evidence to support the alternative hypothesis. It does NOT imply a "meaningful" or "important" difference; that is for you to decide when considering the real-world relevance of your result.

The choice of significance level at which you reject H 0 is arbitrary. Conventionally the 5% (less than 1 in 20 chance of being wrong), 1% and 0.1% (P < 0.05, 0.01 and 0.001) levels have been used. These numbers can give a false sense of security.

In the ideal world, we would be able to define a "perfectly" random sample, the most appropriate test and one definitive conclusion. We simply cannot. What we can do is try to optimise all stages of our research to minimise sources of uncertainty. When presenting P values some groups find it helpful to use the asterisk rating system as well as quoting the P value:

P < 0.05 *

P < 0.01 **

P < 0.001 ***

Most authors refer to statistically significant as P < 0.05 and statistically highly significant as P < 0.001 (less than one in a thousand chance of being wrong).

The asterisk system avoids the woolly term "significant". Please note, however, that many statisticians do not like the asterisk rating system when it is used without showing P values. As a rule of thumb, if you can quote an exact P value then do. You might also want to refer to a quoted exact P value as an asterisk in text narrative or tables of contrasts elsewhere in a report.

At this point, a word about error. Type I error is the false rejection of the null hypothesis and Type II error is the false acceptance of the null hypothesis. As an aide-mémoire: think that our cynical society rejects before it accepts.

The significance level (alpha) is the probability of type I error. The power of a test is one minus the probability of type II error (beta). Power should be maximised when selecting statistical methods. If you want to estimate sample sizes then you must understand all of the terms mentioned here.

The following table shows the relationship between power and error in hypothesis testing:

 
                     Accept H0                          Reject H0
H0 is true:          correct decision (P = 1 − alpha)   Type I error (P = alpha)
H0 is false:         Type II error (P = beta)           correct decision (P = 1 − beta = power)

H0 = null hypothesis
P = probability

If you are interested in further details of probability and sampling theory at this point then please refer to one of the general texts listed in the reference section .

You must understand confidence intervals if you intend to quote P values in reports and papers. Statistical referees of scientific journals expect authors to quote confidence intervals with greater prominence than P values.

Notes about Type I error:

  • is the incorrect rejection of the null hypothesis
  • maximum probability is set in advance as alpha
  • is not affected by sample size as it is set in advance
  • increases with the number of tests or end points (i.e. test 20 true null hypotheses and one is likely to be wrongly significant at alpha = 0.05; see the sketch after this list)
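A quick back-of-the-envelope sketch of that last point:

```python
# With 20 independent tests of true null hypotheses at alpha = 0.05,
# the chance of at least one false positive is about 64%.
alpha, n_tests = 0.05, 20
p_at_least_one = 1 - (1 - alpha) ** n_tests
print(round(p_at_least_one, 2))  # ≈ 0.64
```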

Notes about Type II error:

  • is the incorrect acceptance of the null hypothesis
  • probability is beta
  • beta depends upon sample size and alpha
  • can't be estimated except as a function of the true population effect
  • beta gets smaller as the sample size gets larger (see the sketch after this list)
  • beta gets smaller as the number of tests or end points increases
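Beta can be estimated by simulation, which also makes the sample-size point concrete. A sketch, assuming a true difference of 0.5 standard deviations between groups:

```python
# Estimate power (1 - beta) by simulation for a two-sample t test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
alpha, effect, reps = 0.05, 0.5, 5_000  # true mean difference = 0.5 SD

for n in (20, 50, 100):
    rejections = sum(
        stats.ttest_ind(rng.normal(0, 1, n),
                        rng.normal(effect, 1, n)).pvalue < alpha
        for _ in range(reps)
    )
    power = rejections / reps
    print(f"n = {n:3d}: power ≈ {power:.2f}, beta ≈ {1 - power:.2f}")
```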

Copyright © 1987-2024 Iain E. Buchan, all rights reserved.


The true meaning/difference of alpha values and p-values

EDIT: I may have been confused by the confusion of others. In any case, it helped a lot when I came to know that the $p$-value is stochastic. It does make sense, given the $p$-value is a transformation of the test statistic, which is stochastic!

So, what I gather so far (thanks for my statistics course for not being helpful!):

  • The $\alpha$-value is the maximum chance of making a type I error, given we assume the null hypothesis is correct AND the null actually is correct. (I simulated 10,000 $p$-values from two-sample t-tests on two $N(2,1)$ distributions, with n = 3 or n = 30.)
  • In this case, $p$ follows a uniform distribution between 0 and 1.
  • If I calculate $p$-values when the null hypothesis cannot be true (like t-test of $N(2,1)$ vs $N(4,1)$; n = 3 or n = 30), the lower $p$-values tend to have a higher probability than the rest.
  • If a person does an experiment with, say two samples of n = 30 and he tries to test the difference (or, rather, equality) of the means, it is perfectly possible to obtain a $p$-value > 0.05 purely by chance, even if the samples are truly different. Following the existing protocol in science, he accepts the null hypothesis (but it is debated why we use such an arbitrary limit!).
  • Another scientist might try to replicate the experiment (or, more realistically, the first person will try to replicate the experiment because he can't publish this), and he gets, by pure chance, $p$ < 0.05 (say 0.03). Now he's in business.
  • If you really want to see if the distribution of $p$ is inconsistent with the null hypothesis, you have to repeat the exact same experiment and analysis a lot of times!

So, the $\alpha$ value indicates how willing we are to make a type I error, assuming the null hypothesis is correct. It might be useful if we want to accept or reject, say, batches of a product. The $p$-value is something we calculate after we have done our experiments, and one single value does not seem to tell us much about how far we are from the null being true (or rather, from the alternative being indistinguishable from the null) ( http://amstat.tandfonline.com/doi/pdf/10.1198/000313008X332421 ). Then we need to look at the whole distribution of $p$. Can we bootstrap our way out of this?
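For reference, here is roughly the simulation I described, sketched in Python (SciPy):

```python
# p-values from repeated two-sample t-tests: roughly uniform when the
# null is true, piled up near 0 when it is false (n = 30 per group).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def simulate_pvalues(mu_a, mu_b, n=30, reps=10_000):
    return np.array([
        stats.ttest_ind(rng.normal(mu_a, 1, n),
                        rng.normal(mu_b, 1, n)).pvalue
        for _ in range(reps)
    ])

p_null   = simulate_pvalues(2, 2)  # N(2,1) vs N(2,1): null is true
p_effect = simulate_pvalues(2, 4)  # N(2,1) vs N(4,1): null is false

print((p_null < 0.05).mean())    # ≈ 0.05, as expected for uniform p
print((p_effect < 0.05).mean())  # ≈ 1 here, since the effect is large
```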

Oh yeah, and regarding the CI: people tend to advise that the CI be used instead of the p-value (unless the intervals overlap and you have to check whether the estimates really differ). Do you agree?

------ RANDOM NONSENSE GIVEN BELOW --------

I have been puzzled by the seemingly great number of texts, that disagree on the usage of alpha-values and p-values. Even my own textbook looks like it mixes the philosophy and methods of two schools!

So I just want to really clarify what is going on - I'm not too strong in the statistics or math jargon, so I get lost in most articles beyond the simplest examples (like tossing a coin)! Therefore, I hope that you out there can help me out.

My current understanding goes like this: The alpha value is useful for the creation of confidence intervals (CIs) for the sample means, and an alpha value of for instance 0.05 (95% CI) will assure you, that only 5% of all future samples of the population will fail to contain the true mean in their CIs. This might be useful if you, for instance, continuously take samples from a production to check the product quality (discard the 5% of the batches when using 95% CI or change some production variables if >5% of samples disagree? I'm not sure.).

The big trouble is, that the alpha value is very often used mistakenly together with hypothesis testing, e.g. testing the difference of means between samples and using p < 0.05 as criteria. The p-value is interpreted as "there is a <5% chance that the effect is not significant, while it actually is (type I error?)". This, I have learned is wrong. The p-value rather describes that, given the null hypothesis is true, what the probability is to see a given or more extreme value. Like testing that the difference in means are zero, but when comparing to samples, the difference of the means might be e.g. 3 units by pure chance. I have stumbled upon some papers where they 'calibrate' p-values and relate them directly to the error rate, which is what many people actually mean by saying "p < 0.05". Here, you have to state the % of true nulls - I cannot readily interpret this, can anybody help? It seems it relates to, for instance, the empirical knowledge of how often an effect is seen as effective. Am I wrong?

Anyhow, for the error rate, I previously had a good idea about how to calculate it by Monte Carlo methods. But now I can't recall my reasoning! But I remember I figured the error rate depends on 1) sample sizes, 2) the distributions of the populations you sample, 3) how "far" the H0 and H1 hypotheses are from each other. You are less likely to be wrong if H0 states equal means, but the true means of the populations are far apart.

Cheers, Steffen

  • hypothesis-testing
  • confidence-interval


  • 1 $\begingroup$ A review of the posts at stats.stackexchange.com/questions/31/… might be of some help with these questions. $\endgroup$ –  whuber ♦ Commented Jun 18, 2016 at 15:22
The big trouble is, that the alpha value is very often used mistakenly together with hypothesis testing,

Sorry, you don't have it correct there.

Indeed, $\alpha$ is fundamental to hypothesis testing. Explicitly, it is your chosen (maximum) type I error rate under the null hypothesis, and the basis on which the rejection region is chosen.

Let me start with a basic/general discussion of hypothesis testing with a more-or-less Neyman-Pearson flavor (but which is not formally NP).

Let's take as given that you want to test some hypothesis about some population characteristic, and you have a null hypothesis and an alternative hypothesis. Let's assume for now that the null is a point null but that the alternative is not.

  • You choose* some test statistic that will tend to behave differently when the alternative is true than when the null is true.

* there's theory that can help pick good ones if you know a lot about the population distribution but that's entirely beside the point here, and we rarely actually know that kind of information in any case.

You then compute the distribution of the test statistic when the null is true (perhaps by making assumptions about the population distribution and computing the sampling distribution of the test statistic when the null is true, or perhaps by making exchangeability assumptions and invoking some form of resampling for that purpose - such as randomization tests or bootstrap tests)

You identify a proportion $\alpha$ (or no more than $\alpha$ ) of the distribution that's more consistent with the alternative** than the null and call that your rejection region.

If your test statistic falls into the rejection region you reject the null hypothesis. If the null was actually true this happens rarely (i.e. with probability $\alpha$ , whereas if the alternative is true it should happen considerably more often).

** However, it's possible to base a test purely on likelihood and take a more Fisherian-style point of view, where all the lowest-likelihood samples (under the null) comprise the rejection region.

Note that we haven't mentioned p-values at all yet, though they're common in the Fisherian-style approach. [However, they also fit in with the NP approach if you recognize that $p\leq\alpha$ precisely when the test statistic is in the rejection region.]
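To make the resampling route concrete, here is a minimal sketch of a two-sample randomization (permutation) test with made-up data; the null distribution of the mean difference is built by reshuffling group labels:

```python
# Permutation test: build the null distribution of the mean difference
# by resampling, then locate the observed statistic in it.
import numpy as np

rng = np.random.default_rng(3)
a = np.array([5.1, 4.9, 6.2, 5.8, 5.5])  # hypothetical sample A
b = np.array([4.2, 4.8, 4.5, 4.1, 4.7])  # hypothetical sample B

observed = a.mean() - b.mean()
pooled = np.concatenate([a, b])

diffs = np.empty(10_000)
for i in range(diffs.size):
    perm = rng.permutation(pooled)  # labels are exchangeable under the null
    diffs[i] = perm[:a.size].mean() - perm[a.size:].mean()

# Two-sided p-value: proportion of resampled statistics at least as extreme
p_value = np.mean(np.abs(diffs) >= abs(observed))
print(p_value)  # p <= alpha exactly when `observed` lands in the rejection region
```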

My current understanding goes like this: The alpha value is useful for the creation of confidence intervals (CIs)

No. Well, $\alpha$ does come into it in the sense that you choose a coverage probability of $(1-\alpha)$ .

e.g. testing the difference of means between samples

Weren't you just calculating a CI? How did we jump to doing a test? What are you trying to do, produce a CI or test something?

The p-value is interpreted as "there is a <5% chance that the effect is not significant, while it actually is (type I error?)".

Can you show us someone saying exactly that? I can't quite follow what you're saying there.

This, I have learned is wrong.

Well, yes it's wrong, whatever it was trying to say.

I have stumbled upon some papers where they 'calibrate' p-values and relate them directly to the error rate,

I have no clear idea what you're saying there, but

a. in general don't try to learn statistics from what people do in papers, at least not until you have a solid grounding in it.

b. it's impossible to discuss a second hand report of what people do. If you want to discuss what you see in a paper, quote it and give a proper reference (so we can see more context if needed).

You have many confusions here. It might have been more useful to have given explicit examples of what you've found that were directly contradictory than present your own understanding.

Here, you have to state the % of true nulls

With point nulls, this is usually 0.

I cannot readily interpret this, can anybody help? It seems it relates to, for instance, the empirical knowledge of how often an effect is seen as effective. Am I wrong?

Yes, you're wrong. The word "effective" doesn't belong there. If your null is "no effect" it relates to how often the effect is completely absent.

But you have to be careful about what is being done here -- it sounds like someone is maybe taking a Bayesian approach but it's impossible to tell -- because again all we have is a somewhat muddled second hand report. It's impossible to untangle your misunderstandings from the misunderstandings of the people you're talking about.

Anyhow, for the error rate, I previously had a good idea about how to calculate it by Monte Carlo methods. But now I can't recall my reasoning!

It's not clear to me what you seek here.

While the first half of your question was answerable enough (by explaining some of your misconceptions), if you can clarify and narrow (and add context to) the later part of your question you might like to post that as one or two new questions.


  • $\begingroup$ Thanks for your reply, although I would find it much more constructive if you could point out what I have misinterpreted and how to correctly interpret it, instead of just stating 'you're wrong' and then quoting what basically stands in my stats textbook. $\endgroup$ –  pseudoninja Commented Jun 19, 2016 at 11:44
  • $\begingroup$ But taking alpha values first, you state that it is the maximum chosen type I error rate given null is true, and that if the test statistics falls into the rejection region, there is a 5% chance of rejecting a true null hypothesis (point 4). However, S.W. Huck (Statistical Misconceptions) states that this is false: "Alpha, the level of significance, defines the probability of a Type I error. For example, if alpha is set equal to .05, there will then necessarily be a 5% chance that a true null hypothesis will be rejected." (Ch. 8.1) is categorised as a misconception. $\endgroup$ –  pseudoninja Commented Jun 19, 2016 at 11:51
  • $\begingroup$ Also, using p as a measure of if we should reject or accept a null hypothesis is widely critisised, since using the standard test of p < 0.05 still makes it quite plausible that your results can not be replicated and any "significant effect" may or may not be found in replicate experiments: nature.com/news/scientific-method-statistical-errors-1.14700 , ncbi.nlm.nih.gov/pmc/articles/PMC1119478/pdf/226.pdf , onlinelibrary.wiley.com/doi/10.1111/j.1476-5381.2012.01931.x/… $\endgroup$ –  pseudoninja Commented Jun 19, 2016 at 11:54
  • 1 $\begingroup$ @Steffen: Have you gone on to read Huck's reasons for calling that a misconception? (1) He separates the full null hypothesis (in his example two independent random variables following the same normal distribution) into two parts: the one you're interested in testing (the distributions' having equal means), which he calls "the null hypothesis"; & one that you're not interested in (all the rest, including their having equal variances), which he calls "assumptions". Fair enough, but it seems odd to characterize a common, unexceptionable way of putting things as a misconception - perhaps he ... $\endgroup$ –  Scortchi - Reinstate Monica ♦ Commented Jun 20, 2016 at 12:25
  • 2 $\begingroup$ @Steffen, at each point where I agreed something was wrong, I attempted (where I was able) to explain what the issue was. If you would like clarification or further explanation on any point, I'm happy to try to answer a specific question on that point, if I can. $\endgroup$ –  Glen_b Commented Jun 20, 2016 at 12:58




Statistics By Jim

Making statistics intuitive

How to Find the P value: Process and Calculations

By Jim Frost

P values are everywhere in statistics. They're in all types of hypothesis tests. But how do you calculate a p-value? Unsurprisingly, the precise calculations depend on the test. However, there is a general process that applies to finding a p value.

In this post, you’ll learn how to find the p value. I’ll start by showing you the general process for all hypothesis tests. Then I’ll move on to a step-by-step example showing the calculations for a p value. This post includes a calculator so you can apply what you learn.

General Process for How to Find the P value

To find the p value for your sample , do the following:

  • Identify the correct test statistic.
  • Calculate the test statistic using the relevant properties of your sample.
  • Specify the characteristics of the test statistic’s sampling distribution.
  • Place your test statistic in the sampling distribution to find the p value.

Before moving on to the calculations example, I’ll summarize the purpose for each step. This part tells you the “why.” In the example calculations section, I show the “how.”

Identify the Correct Test Statistic

All hypothesis tests boil your sample data down to a single number known as a test statistic. T-tests use t-values. F-tests use F-values. Chi-square tests use chi-square values. Choosing the correct one depends on the type of data you have and how you want to analyze it. Before you can find the p value, you must determine which hypothesis test and test statistic you’ll use.

Test statistics assess how consistent your sample data are with the null hypothesis. As a test statistic becomes more extreme, it indicates a larger difference between your sample data and the null hypothesis.

Calculate the Test Statistic

How you calculate the test statistic depends on which one you’re using. Unsurprisingly, the method for calculating test statistics varies by test type. Consequently, to calculate the p value for any test, you’ll need to know the correct test statistic formula.

To learn more about test statistics and how to calculate them for other tests, read my article, Test Statistics .

Specify the Properties of the Test Statistic’s Sampling Distribution

Test statistics are unitless, making them tricky to interpret on their own. You need to place them in a larger context to understand how extreme they are.

The sampling distribution for the test statistic provides that context. Sampling distributions are a type of probability distribution. Consequently, they allow you to calculate probabilities related to your test statistic’s extremeness, which lets us find the p value!

Probability distribution plot that displays a t-distribution.

Like any distribution, the same sampling distribution (e.g., the t-distribution) can have a variety of shapes depending upon its parameters . For this step, you need to determine the characteristics of the sampling distribution that fit your design and data.

That usually entails specifying the degrees of freedom (changes its shape) and whether the test is one- or two-tailed (affects the directions the test can detect effects). In essence, you’re taking the general sampling distribution and tailoring it to your study so it provides the correct probabilities for finding the p value.

Each test statistic’s sampling distribution has unique properties you need to specify. At the end of this post, I provide links for several.

Learn more about degrees of freedom and one-tailed vs. two-tailed tests .

Placing Your Test Statistic in its Sampling Distribution to Find the P value

Finally, it’s time to find the p value because we have everything in place. We have calculated our test statistic and determined the correct properties for its sampling distribution. Now, we need to find the probability of values more extreme than our observed test statistic.

In this context, more extreme means further away from the null value in both directions for a two-tailed test or in one direction for a one-tailed test.

At this point, there are two ways to use the test statistic and distribution to calculate the p value. The formulas for probability distributions are relatively complex. Consequently, you won’t calculate it directly. Instead, you’ll use either an online calculator or a statistical table for the test statistic. I’ll show you both approaches in the step-by-step example.

In summary, calculating a p-value involves identifying and calculating your test statistic and then placing it in its sampling distribution to find the probability of more extreme values!

Let’s see this whole process in action with an example!

Step-by-Step Example of How to Find the P value for a T-test

For this example, assume we’re tasked with determining whether a sample mean is different from a hypothesized value. We’re given the sample statistics below and need to find the p value.

  • Mean: 330.6
  • Standard deviation: 154.2
  • Sample size: 25
  • Null hypothesis value: 260

Let’s work through the step-by-step process of how to calculate a p-value.

First, we need to identify the correct test statistic. Because we’re comparing one mean to a null value, we need to use a 1-sample t-test. Hence, the t-value is our test statistic, and the t-distribution is our sampling distribution.

Second, we’ll calculate the test statistic. The t-value formula for a 1-sample t-test is the following:

t = (x̄ − µ0) / (s / √n)

  • x̄ is the sample mean.
  • µ0 is the null hypothesis value.
  • s is the sample standard deviation.
  • n is the sample size.
  • Collectively, the denominator is the standard error of the mean.

Let’s input our sample values into the equation to calculate the t-value.

t = (330.6 − 260) / (154.2 / √25) = 70.6 / 30.84 ≈ 2.289

Third, we need to specify the properties of the sampling distribution to find the p value. We’ll need the degrees of freedom.

The degrees of freedom for a 1-sample t-test is n – 1. Our sample size is 25. Hence, we have 24 DF. We’ll use a two-tailed test, which is the standard.

Now we’ve got all the necessary information to calculate the p-value. I’ll show you two ways to take the final step!

P-value Calculator

One method is to use an online p-value calculator, like the one I include below.

Enter the following in the calculator for our t-test example.

  • In What do you want? , choose Two-tailed p-value (the default).
  • In What do you have? , choose t-score .
  • In Degrees of freedom (d) , enter 24 .
  • In Your t-score , enter 2.289 .

The calculator displays a result of 0.031178.

There you go! Using the standard significance level of 0.05, our results are statistically significant!
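If you prefer code to an online calculator, the same numbers fall out of the t-distribution in Python's SciPy; a sketch:

```python
# Recompute the example's t-value and two-tailed p-value with SciPy.
import math
from scipy import stats

mean, sd, n, null_value = 330.6, 154.2, 25, 260

t_score = (mean - null_value) / (sd / math.sqrt(n))  # ≈ 2.289
p_value = 2 * stats.t.sf(t_score, df=n - 1)          # sf(x) = 1 - CDF(x)

print(f"t = {t_score:.3f}, p = {p_value:.6f}")       # p ≈ 0.0312, matching the calculator
```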

Using a Statistical Table to Find the P Value

The other common method is using a statistical table. In this case, we’ll need to use a t-table. For this example, I’ll truncate the rows. You can find my full table here: T-Table .

This method won’t find the exact p value, but you’ll find a range and know whether your results are statistically significant.

T-table for finding the p value.

Start by looking in the row for 24 degrees of freedom, highlighted in light green. We need to find where our t-score of 2.289 fits in. I highlight the two table values that our t-value fits between, 2.064 and 2.492. Then we look at the two-tailed row at the top to find the corresponding p values for the two t-values.

In this case, our t-value of 2.289 produces a p value between 0.02 and 0.05 for a two-tailed test. Our results are statistically significant, and they are consistent with the calculator’s more precise results.

Displaying the P value in a Chart

In the example above, you saw how to calculate a p-value starting with the sample statistics. We calculated the t-value and placed it in the applicable t-distribution. I find that the calculations and numbers are dry by themselves. I love graphing things whenever possible, so I’ll use a probability distribution plot to illustrate the example.

Using statistical software, I’ll create the graphical equivalent of calculating the p-value above.

Chart of finding p value.

This chart has two shaded regions because we performed a two-tailed test. Each region has a probability of 0.01559. When you sum them, you obtain the p-value of 0.03118. In other words, the likelihood of a t-value falling in either shaded region when the null hypothesis is true is 0.03118.

I showed you how to find the p value for a t-test. Click the links below to see how it works for other hypothesis tests:

  • One-Way ANOVA F-test
  • Chi-square Test of Independence

Now that we’ve found the p value, how do you interpret it precisely? If you’re going beyond the significant/not significant decision and really want to understand what it means, read my posts, Interpreting P Values  and Statistical Significance: Definition & Meaning .

If you’re learning about hypothesis testing and like the approach I use in my blog, check out my Hypothesis Testing book! You can find it at Amazon and other retailers.



Reader Interactions


January 9, 2024 at 9:58 am

how did you get the 0.01559? is it from the t table or somewhere else. please put me through


January 9, 2024 at 3:13 pm

The value of 0.01559 comes from the t-distribution. It’s the probability of each red shaded region in the graph I show. These regions are based on the t-value. Typically, you’ll use either statistical software or a t-distribution calculator to find probabilities associated with t-values. Or use a t-table. I used my statistical software. You don’t calculate those probabilities yourself because the calculations are complex.

I hope that helps!


November 23, 2022 at 2:08 am

Simply superb. Easy for us who are starters to enjoy statistic made enjoyable.


November 22, 2022 at 6:41 pm

I like the way of your presentation, so that everyone can understand in the simplest way. If you can support this with PowerPoint it will be more interesting. I know it takes your valuable time. However, forwarding your knowledge to those who need it is more valuable, supportive and appreciated. Continue doing this teaching approach. Thank you. I wish you all the best. God bless you.

