## Section 10.4: Hypothesis Tests for a Population Standard Deviation


By the end of this lesson, you will be able to...

- test hypotheses about a population standard deviation


Before we begin this section, we need a quick refresher on the χ² distribution.

## The Chi-Square (χ²) Distribution

Reminder: "chi-square" is pronounced "kai" as in sky, not "chai" like the tea.

If a random sample of size n is obtained from a normally distributed population with mean μ and standard deviation σ, then

χ² = (n − 1)s² / σ²

has a chi-square distribution with n − 1 degrees of freedom, where s² is the sample variance.
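The statistic is simple to compute from raw data. Below is a minimal Python sketch using only the standard library; `chi_square_statistic` is a hypothetical helper name, and the five-value sample is illustrative only (it is not from the text).

```python
from statistics import variance

def chi_square_statistic(sample, sigma0):
    """Return (n - 1) * s**2 / sigma0**2 for a list of observations.

    sigma0 is the hypothesized population standard deviation;
    s**2 is the sample variance (computed with divisor n - 1).
    """
    n = len(sample)
    s_squared = variance(sample)  # sample variance, divisor n - 1
    return (n - 1) * s_squared / sigma0 ** 2

# Illustrative sample: n = 5 and s**2 = 2.5
print(chi_square_statistic([1, 2, 3, 4, 5], sigma0=1))
```

Note that `statistics.variance` already divides by n − 1, so multiplying by n − 1 simply recovers the sum of squared deviations from the sample mean.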

## Properties of the χ² Distribution

- It is not symmetric; it is skewed to the right.
- The shape depends on the degrees of freedom.
- As the number of degrees of freedom increases, the distribution becomes more symmetric.
- χ² ≥ 0; values of χ² are never negative.

## Finding Probabilities Using StatCrunch

Open StatCrunch's chi-square calculator. Enter the degrees of freedom, the direction of the inequality, and the value of X, then compute the probability.
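If StatCrunch is not available, the same probabilities can be computed directly. The sketch below is a standard-library-only Python approach (it is not how StatCrunch computes them): it evaluates the chi-square CDF through the series expansion of the regularized lower incomplete gamma function, P(χ² ≤ x) = P(df/2, x/2). The function names are hypothetical. It is intended for moderate values of x; for very small right-tail probabilities, computing 1 − CDF loses precision.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for X ~ chi-square(df).

    Uses the series for the regularized lower incomplete gamma function
    P(a, z) with a = df/2 and z = x/2:
        P(a, z) = z**a * e**(-z) / Gamma(a) * sum_{k>=0} z**k / (a (a+1) ... (a+k))
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a           # k = 0 term of the series
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)  # next term: multiply by z / (a + k)
        total += term
        k += 1
    # Apply the prefactor on the log scale to avoid overflow
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total

def chi2_left_tail(x, df):
    """P(chi-square < x)."""
    return chi2_cdf(x, df)

def chi2_right_tail(x, df):
    """P(chi-square > x)."""
    return 1.0 - chi2_cdf(x, df)

print(chi2_left_tail(15.89, 24))
```

For 24 degrees of freedom, the left-tail area at 15.89 should come out at roughly half of the two-tailed P-value of 0.2159 in the resting heart rate example.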

As with the earlier tests, some conditions must be true in order to perform the test:

- the sample was randomly selected, and
- the population from which the sample is drawn is normally distributed

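Note the second requirement: the population itself must be normally distributed. Unlike the tests for a population mean, this test is very sensitive to departures from normality, so a large sample size does not make up for a non-normal population. The steps in performing the hypothesis test should be familiar by now.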

## Performing a Hypothesis Test Regarding σ

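Step 1: State the null and alternative hypotheses.

| Two-Tailed | Left-Tailed | Right-Tailed |
|---|---|---|
| H₀: σ = σ₀ | H₀: σ = σ₀ | H₀: σ = σ₀ |
| H₁: σ ≠ σ₀ | H₁: σ < σ₀ | H₁: σ > σ₀ |

Step 2: Decide on a level of significance, α.

Step 3: Compute the test statistic χ²₀ = (n − 1)s² / σ₀², where σ₀ is the value of σ stated in the null hypothesis.

Step 4: Determine the P-value.

Step 5: Reject the null hypothesis if the P-value is less than the level of significance, α.

Step 6: State the conclusion.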

In Example 2 in Section 10.2, we assumed that the standard deviation for the resting heart rates of ECC students was 12 bpm. Later, in Example 2 in Section 10.3, we considered the actual sample data below.

Resting heart rates (bpm), n = 25:

61, 63, 64, 65, 65, 67, 71, 72, 73, 74, 75, 77, 79, 80, 81, 82, 83, 83, 84, 85, 86, 86, 89, 95, 95


Based on this sample, is there enough evidence to say that the standard deviation of the resting heart rates for students in this class is different from 12 bpm?

Note: Be sure to check that the conditions for performing the hypothesis test are met.


From the earlier examples, we know that the resting heart rates could come from a normally distributed population and there are no outliers.

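Step 1: H₀: σ = 12 vs. H₁: σ ≠ 12

Step 2: α = 0.05

Step 3: From the data, s ≈ 9.764, so χ²₀ = (25 − 1)(9.764)² / 12² ≈ 15.89.

Step 4: The test statistic 15.89 is less than 24, the mean of the χ² distribution with 24 degrees of freedom, so it lies in the left tail. Therefore the P-value = 2P(χ² < 15.89) ≈ 0.2159.

Step 5: Since the P-value > α, we do not reject H₀.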

Step 6 : There is not enough evidence at the 5% level of significance to support the claim that the standard deviation of the resting heart rates for students in this class is different from 12 bpm.
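To check this example with code, here is a standard-library Python sketch (it is not the StatCrunch or DDXL procedure). Because df = 24 is even, it uses the closed form P(χ² > x) = e^(−x/2) Σ (x/2)^j / j!, summed over j = 0, …, df/2 − 1, which holds only for an even number of degrees of freedom. The helper name `chi2_sf_even_df` is our own.

```python
import math
from statistics import stdev

# Resting heart rates (bpm) from the example, n = 25
heart_rates = [61, 63, 64, 65, 65, 67, 71, 72, 73, 74, 75, 77, 79,
               80, 81, 82, 83, 83, 84, 85, 86, 86, 89, 95, 95]
sigma0 = 12                  # hypothesized sigma under H0
n = len(heart_rates)
df = n - 1                   # 24 degrees of freedom
s = stdev(heart_rates)       # sample standard deviation (divisor n - 1)

# Step 3: test statistic
chi_sq = df * s ** 2 / sigma0 ** 2

def chi2_sf_even_df(x, df):
    """P(chi-square > x), valid only for an EVEN number of degrees of freedom."""
    if df % 2 != 0:
        raise ValueError("closed form requires even df")
    half = x / 2.0
    return sum(math.exp(-half) * half ** j / math.factorial(j)
               for j in range(df // 2))

# Step 4: two-tailed P-value = twice the smaller tail area
right_tail = chi2_sf_even_df(chi_sq, df)
left_tail = 1.0 - right_tail
p_value = 2 * min(left_tail, right_tail)

# Step 5: compare with alpha
alpha = 0.05
print(f"s = {s:.3f}, chi-square = {chi_sq:.2f}, P-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

Running it should reproduce s ≈ 9.764 and χ² ≈ 15.89, with a P-value close to 0.2159, leading to the same decision.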

## Hypothesis Testing Regarding σ Using StatCrunch

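In StatCrunch, open the one-sample variance test and choose the option for raw data if you have the data, or the option for summary statistics if you only have the summary statistics. Enter the hypotheses, then compute the results.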

Let's look at Example 1 again, and try the hypothesis test with technology.


## Hypothesis Test for Population Standard Deviation for normal population

We are testing a claim about the standard deviation, σ, of a population. For this situation it is important that the population has a normal distribution, but we do not need to know, ahead of time, the mean or standard deviation of that distribution. That is, someone (perhaps us) claims that H₀: σ = a, for some value a. We test this against an alternative hypothesis H₁: σ > a, H₁: σ < a, or H₁: σ ≠ a. We also choose the level of significance, α, that we will use for this test. The level of significance, α, is the chance that we are willing to take that we will make a Type I error, that is, that we will reject H₀ when, in fact, it is true.

Samples of size n drawn from this population will have a distribution of the ratio of (n − 1) times the sample variance to the population variance that is a chi-square distribution with n − 1 degrees of freedom. Thus, if H₀ is true and the population standard deviation is a, then for samples of size n the statistic χ² = (n − 1)s² / a² has a chi-square distribution with n − 1 degrees of freedom. At this point we proceed via the critical value approach or by the P-value approach. These are just different ways to create a situation where we can finally make a decision. The critical value approach tended to be used more often when everyone needed to use the tables. The P-value approach is more commonly used now that we have calculators and computers to do the computations. Of course, either approach can be done with tables, calculators, or computers, and either approach gives the same final result.

| Null hypothesis | Sample size n | Sample standard deviation s | Alternative hypothesis | Significance level α |
|---|---|---|---|---|
| H₀: σ = 4.63 | 16 | 3.24 | H₁: σ < 4.63 | 0.075 |
| H₀: σ = 4.63 | 16 | 3.57 | H₁: σ < 4.63 | 0.075 |
| H₀: σ = 18.43 | 32 | 22.52 | H₁: σ > 18.43 | 0.02 |
| H₀: σ = 18.43 | 32 | 23.45 | H₁: σ > 18.43 | 0.02 |
| H₀: σ = 7.35 | 28 | 5.78 | H₁: σ ≠ 7.35 | 0.08 |
| H₀: σ = 7.35 | 41 | 5.78 | H₁: σ ≠ 7.35 | 0.08 |


## 8.6 Hypothesis Tests for a Population Mean with Known Population Standard Deviation

Learning objectives.

- Conduct and interpret hypothesis tests for a population mean with known population standard deviation.

Some notes about conducting a hypothesis test:

- The null hypothesis [latex]H_0[/latex] is always an “equal to.” The null hypothesis is the original claim about the population parameter.
- The alternative hypothesis [latex]H_a[/latex] is a “less than,” “greater than,” or “not equal to.” The form of the alternative hypothesis depends on the context of the question.
- If the alternative hypothesis is a “less than”, then the test is left-tail. The p -value is the area in the left-tail of the distribution.
- If the alternative hypothesis is a “greater than”, then the test is right-tail. The p -value is the area in the right-tail of the distribution.
- If the alternative hypothesis is a “not equal to”, then the test is two-tail. The p -value is the sum of the area in the two-tails of the distribution. Each tail represents exactly half of the p -value.
- Think about the meaning of the p -value. A data analyst (and anyone else) should have more confidence that they made the correct decision to reject the null hypothesis with a smaller p -value (for example, 0.001 as opposed to 0.04) even if using a significance level of 0.05. Similarly, for a large p -value such as 0.4, as opposed to a p -value of 0.056 (a significance level of 0.05 is less than either number), a data analyst should have more confidence that they made the correct decision in not rejecting the null hypothesis. This makes the data analyst use judgment rather than mindlessly applying rules.
- The significance level must be identified before collecting the sample data and conducting the test. Generally, the significance level will be included in the question. If no significance level is given, a common standard is to use a significance level of 5%.
- An alternative approach for hypothesis testing is to use what is called the critical value approach . In this book, we will only use the p -value approach. Some of the videos below may mention the critical value approach, but this approach will not be used in this book.

Suppose the hypotheses for a hypothesis test are:

[latex]\begin{eqnarray*} H_0: & & \mu=5 \\ H_a: & & \mu \lt 5 \end{eqnarray*}[/latex]

Because the alternative hypothesis is a [latex]\lt[/latex], this is a left-tailed test. The p -value is the area in the left-tail of the distribution.

[latex]\begin{eqnarray*} H_0: & & \mu=0.5 \\ H_a: & & \mu \neq 0.5 \end{eqnarray*}[/latex]

Because the alternative hypothesis is a [latex]\neq[/latex], this is a two-tailed test. The p -value is the sum of the areas in the two tails of the distribution. Each tail contains exactly half of the p -value.

[latex]\begin{eqnarray*} H_0: & & \mu=10 \\ H_a: & & \mu \lt 10 \end{eqnarray*}[/latex]

## Steps to Conduct a Hypothesis Test for a Population Mean with Known Population Standard Deviation

- Write down the null and alternative hypotheses in terms of the population mean [latex]\mu[/latex]. Include appropriate units with the values of the mean.
- Use the form of the alternative hypothesis to determine if the test is left-tailed, right-tailed, or two-tailed.
- Collect the sample information for the test and identify the significance level [latex]\alpha[/latex].
- When the population standard deviation is known, we use a normal distribution with [latex]\displaystyle{z=\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}}[/latex] to find the p-value. The p-value is the area in the corresponding tail of the normal distribution.
- If the p-value is less than or equal to [latex]\alpha[/latex], reject the null hypothesis. The results of the sample data are significant: there is sufficient evidence to conclude that the null hypothesis [latex]H_0[/latex] is an incorrect belief and that the alternative hypothesis [latex]H_a[/latex] is most likely correct.
- If the p-value is greater than [latex]\alpha[/latex], do not reject the null hypothesis. The results of the sample data are not significant: there is not sufficient evidence to conclude that the alternative hypothesis [latex]H_a[/latex] is correct.
- Write down a concluding sentence specific to the context of the question.
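The calculation steps above can be sketched in Python with the standard library's `statistics.NormalDist`. The function name `z_test_p_value` and the `alternative` labels are our own, not from the text. The sketch standardizes the sample mean with σ/√n and takes the tail area(s) of the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist

def z_test_p_value(xbar, mu0, sigma, n, alternative):
    """Return (z, p-value) for H0: mu = mu0 when the population sigma is known.

    alternative: "less" (left-tailed), "greater" (right-tailed),
    or "two-sided" (two-tailed).
    """
    z = (xbar - mu0) / (sigma / sqrt(n))  # standardized test statistic
    left = NormalDist().cdf(z)             # area to the left of z under N(0, 1)
    if alternative == "less":
        return z, left
    if alternative == "greater":
        return z, 1 - left
    if alternative == "two-sided":
        return z, 2 * min(left, 1 - left)  # two equal tails
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

# Jeffrey's swim-time example from this section: left-tailed, alpha = 0.05
z, p = z_test_p_value(16, 16.43, 0.8, 15, "less")
alpha = 0.05
print(f"z = {z:.2f}, p-value = {p:.4f}")
print("reject H0" if p <= alpha else "do not reject H0")
```

The p-value should agree with the norm.dist result for Jeffrey's example (0.0187).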

## USING EXCEL TO CALCULATE THE P-VALUE FOR A HYPOTHESIS TEST ON A POPULATION MEAN WITH KNOWN POPULATION STANDARD DEVIATION

The p -value for a hypothesis test on a population mean is the area in the tail(s) of the distribution of the sample mean. When the population standard deviation is known, use the normal distribution to find the p -value.

The p -value is the area in the tail(s) of a normal distribution, so the norm.dist(x,[latex]\mu[/latex],[latex]\sigma[/latex],logic operator) function can be used to calculate the p -value.

- For x , enter the value for [latex]\overline{x}[/latex].
- For [latex]\mu[/latex] , enter the mean of the sample means [latex]\mu[/latex]. Note: Because the test is run assuming the null hypothesis is true, the value for [latex]\mu[/latex] is the claim from the null hypothesis.
- For [latex]\sigma[/latex] , enter the standard error of the mean [latex]\displaystyle{\frac{\sigma}{\sqrt{n}}}[/latex].
- For the logic operator , enter true . Note: Because we are calculating the area under the curve, we always enter true for the logic operator.

Use norm.dist directly to find the area in the left tail, and 1-norm.dist to find the area in the right tail.
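For readers working outside Excel, Python's standard-library `statistics.NormalDist(mean, sd).cdf(x)` gives the same left-tail area as norm.dist(x, mean, sd, true). Below is a minimal sketch mirroring the two Excel formulas; the helper names are hypothetical, and the numbers come from Marco's throwing example later in this section.

```python
from math import sqrt
from statistics import NormalDist

def norm_dist_cdf(x, mean, sd):
    """Counterpart of Excel's norm.dist(x, mean, sd, true): area left of x."""
    return NormalDist(mean, sd).cdf(x)

def left_tail_area(xbar, mu0, sigma, n):
    """Like norm.dist(xbar, mu0, sigma/sqrt(n), true)."""
    return norm_dist_cdf(xbar, mu0, sigma / sqrt(n))

def right_tail_area(xbar, mu0, sigma, n):
    """Like 1-norm.dist(xbar, mu0, sigma/sqrt(n), true)."""
    return 1 - left_tail_area(xbar, mu0, sigma, n)

# Marco's example: right-tailed test with xbar = 41.5, mu0 = 40, sigma = 2, n = 20
p = right_tail_area(41.5, 40, 2, 20)
# Standardizing first and using N(0, 1) gives the same area
z = (41.5 - 40) / (2 / sqrt(20))
print(f"{p:.4f}", f"{1 - NormalDist().cdf(z):.4f}")
```

Both routes give the same area because norm.dist with mean μ and standard deviation σ/√n is just the standard normal curve shifted and rescaled.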

Jeffrey, as an eight-year old, established a mean time of 16.43 seconds with a standard deviation of 0.8 seconds for swimming the 25-meter freestyle. His dad, Frank, thought that Jeffrey could swim the 25-meter freestyle faster using goggles. Frank bought Jeffrey a new pair of goggles and timed Jeffrey swimming the 25-meter freestyle 15 different times. In the sample of 15 swims, Jeffrey’s mean time was 16 seconds. Frank thought that the goggles helped Jeffrey swim faster than 16.43 seconds. At the 5% significance level, did Jeffrey swim faster wearing the goggles? Assume that the swim times for the 25-meter freestyle are normally distributed.

Hypotheses:

[latex]\begin{eqnarray*} H_0: & & \mu=16.43 \mbox{ seconds} \\ H_a: & & \mu \lt 16.43 \mbox{ seconds} \end{eqnarray*}[/latex]

From the question, we have [latex]n=15[/latex], [latex]\overline{x}=16[/latex], [latex]\sigma=0.8[/latex] and [latex]\alpha=0.05[/latex].

This is a test on a population mean where the population standard deviation is known ([latex]\sigma=0.8[/latex]). So we use a normal distribution to calculate the p -value. Because the alternative hypothesis is a [latex]\lt[/latex], the p -value is the area in the left-tail of the distribution.

| Function | Field 1 | Field 2 | Field 3 | Field 4 | p-value |
|---|---|---|---|---|---|
| norm.dist | 16 | 16.43 | 0.8/sqrt(15) | true | 0.0187 |

So the p -value[latex]=0.0187[/latex].

Conclusion:

Because p -value[latex]=0.0187 \lt 0.05=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis. At the 5% significance level there is enough evidence to suggest that Jeffrey’s mean swim time with the goggles is less than 16.43 seconds.

- The null hypothesis [latex]\mu=16.43[/latex] is the claim that Jeffrey’s mean swim time with the goggles is 16.43 seconds (the same as it is without the goggles).
- The alternative hypothesis [latex]\mu \lt 16.43[/latex] is the claim that Jeffrey’s swim time with the goggles is less than 16.43 seconds.
- The function is norm.dist because we are finding the area in the left tail of a normal distribution.
- Field 1 is the value of [latex]\overline{x}[/latex]
- Field 2 is the value of [latex]\mu[/latex] from the null hypothesis. Remember, we run the test assuming the null hypothesis is true, so that means we assume [latex]\mu=16.43[/latex].
- Field 3 is the standard deviation for the sample means [latex]\displaystyle{\frac{\sigma}{\sqrt{n}}}[/latex]. Note that we are not using the standard deviation from the population ([latex]\sigma=0.8[/latex]). This is because the p -value is the area under the curve of the distribution of the sample means, not the distribution of the population.
- The p -value of 0.0187 tells us that under the assumption that Jeffrey’s mean swim time with goggles is 16.43 seconds (the null hypothesis), there is only a 1.87% chance that the mean time for the 15 sample swims is 16 seconds or less. This is a small probability, and so is unlikely to happen assuming the null hypothesis is true. This suggests that the assumption that the null hypothesis is true is most likely incorrect, and so the conclusion of the test is to reject the null hypothesis in favour of the alternative hypothesis.
- The Type I error for this problem is to conclude that Jeffrey swims the 25-meter freestyle, on average, in less than 16.43 seconds (the alternative hypothesis) when, in fact, he actually swims the 25-meter freestyle, on average, in 16.43 seconds (the null hypothesis). That is, reject the null hypothesis when the null hypothesis is actually true.
- The Type II error for this problem is to conclude that Jeffrey swims the 25-meter freestyle, on average, in 16.43 seconds (the null hypothesis) when, in fact, he actually swims the 25-meter freestyle, on average, in less than 16.43 seconds (the alternative hypothesis). That is, do not reject the null hypothesis when the null hypothesis is actually false.

The mean throwing distance of a football for Marco, a high school freshman quarterback, is 40 yards with a standard deviation of 2 yards. The team coach tells Marco to adjust his grip to get more distance. The coach records the distances for 20 throws with the new grip. For the 20 throws, Marco’s mean distance was 41.5 yards. The coach thought the different grip helped Marco throw farther than 40 yards. At the 5% significance level, is Marco’s mean throwing distance higher with the new grip? Assume the throw distances for footballs are normally distributed.

[latex]\begin{eqnarray*} H_0: & & \mu=40 \mbox{ yards} \\ H_a: & & \mu \gt 40 \mbox{ yards} \end{eqnarray*}[/latex]

From the question, we have [latex]n=20[/latex], [latex]\overline{x}=41.5[/latex], [latex]\sigma=2[/latex] and [latex]\alpha=0.05[/latex].

This is a test on a population mean where the population standard deviation is known ([latex]\sigma=2[/latex]). So we use a normal distribution to calculate the p -value. Because the alternative hypothesis is a [latex]\gt[/latex], the p -value is the area in the right-tail of the distribution.

| Function | Field 1 | Field 2 | Field 3 | Field 4 | p-value |
|---|---|---|---|---|---|
| 1-norm.dist | 41.5 | 40 | 2/sqrt(20) | true | 0.0004 |

So the p -value[latex]=0.0004[/latex].

Because p -value[latex]=0.0004 \lt 0.05=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis. At the 5% significance level there is enough evidence to suggest that Marco’s mean throwing distance is greater than 40 yards with the new grip.

- The null hypothesis [latex]\mu=40[/latex] is the claim that Marco’s mean throwing distance with the new grip is 40 yards (the same as it is without the new grip).
- The alternative hypothesis [latex]\mu \gt 40[/latex] is the claim that Marco’s mean throwing distance with the new grip is greater than 40 yards.
- Field 2 is the value of [latex]\mu[/latex] from the null hypothesis.
- Field 3 is the standard deviation for the sample means [latex]\displaystyle{\frac{\sigma}{\sqrt{n}}}[/latex].
- The p-value of 0.0004 tells us that under the assumption that Marco’s mean throwing distance with the new grip is 40 yards, there is only about a 0.04% chance that the mean throwing distance for the 20 sample throws is 41.5 yards or more. This is a small probability, and so is unlikely to happen assuming the null hypothesis is true. This suggests that the assumption that the null hypothesis is true is most likely incorrect, and so the conclusion of the test is to reject the null hypothesis in favour of the alternative hypothesis.

A local college states in its marketing materials that the average age of its first-year students is 18.3 years with a standard deviation of 3.4 years. But this information is based on old data and does not take into account that more older adults are returning to college. A researcher at the college believes that the average age of its first-year students has changed. The researcher takes a sample of 50 first-year students and finds the average age is 19.5 years. At the 1% significance level, has the average age of the college’s first-year students changed?

[latex]\begin{eqnarray*} H_0: & & \mu=18.3 \mbox{ years} \\ H_a: & & \mu \neq 18.3 \mbox{ years} \end{eqnarray*}[/latex]

From the question, we have [latex]n=50[/latex], [latex]\overline{x}=19.5[/latex], [latex]\sigma=3.4[/latex] and [latex]\alpha=0.01[/latex].

This is a test on a population mean where the population standard deviation is known ([latex]\sigma=3.4[/latex]). The question does not state that the ages are normally distributed, but the sample size ([latex]n=50[/latex]) is greater than 30, so the distribution of sample means is approximately normal. So we use a normal distribution to calculate the p-value. Because the alternative hypothesis is a [latex]\neq[/latex], the p-value is the sum of the areas in the two tails of the distribution.

Because there is only one sample, its mean gives us information about only one of the two tails, either the left tail or the right tail. We need to know which tail, because that determines how we calculate its area using the normal distribution. In this case, the sample mean [latex]\overline{x}=19.5[/latex] is greater than the value of the population mean in the null hypothesis [latex]\mu=18.3[/latex] ([latex]\overline{x}=19.5>18.3=\mu[/latex]), so the sample information relates to the right tail of the normal distribution, and we calculate the area in the right tail using 1-norm.dist. However, this is a two-tailed test, so the area in the right tail is only one half of the p-value. Because the normal distribution is symmetric, the area in the left tail equals the area in the right tail, and the p-value is the sum of these two areas.

| Function | Field 1 | Field 2 | Field 3 | Field 4 | Area in right tail |
|---|---|---|---|---|---|
| 1-norm.dist | 19.5 | 18.3 | 3.4/sqrt(50) | true | 0.0063 |

So the area in the right tail is 0.0063 and [latex]\frac{1}{2}[/latex]( p -value)[latex]=0.0063[/latex]. This is also the area in the left tail, so

p -value[latex]=0.0063+0.0063=0.0126[/latex]
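The doubling step can be checked with a short standard-library Python sketch, using `statistics.NormalDist` in place of Excel's norm.dist (an assumption of convenience; the arithmetic is the same).

```python
from math import sqrt
from statistics import NormalDist

# College-age example: H0: mu = 18.3 years, Ha: mu != 18.3 years
xbar, mu0, sigma, n = 19.5, 18.3, 3.4, 50
sampling_dist = NormalDist(mu0, sigma / sqrt(n))  # sample means under H0

# xbar > mu0, so the sample relates to the right tail (1-norm.dist in Excel)
right_area = 1 - sampling_dist.cdf(xbar)
# The normal curve is symmetric, so the left tail has the same area
p_value = 2 * right_area

alpha = 0.01
print(f"right tail = {right_area:.4f}, p-value = {p_value:.4f}")
print("reject H0" if p_value <= alpha else "do not reject H0")
```

The rounded outputs should match the 0.0063 and 0.0126 computed above.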

Because p -value[latex]=0.0126 \gt 0.01=\alpha[/latex], we do not reject the null hypothesis. At the 1% significance level there is not enough evidence to suggest that the average age of the college’s first-year students has changed.

- The null hypothesis [latex]\mu=18.3[/latex] is the claim that the average age of the first-year students is still 18.3 years.
- The alternative hypothesis [latex]\mu \neq 18.3[/latex] is the claim that the average age of the first-year students has changed from 18.3 years.
- If [latex]\overline{x} \lt \mu[/latex], we use norm.dist([latex]\overline{x}[/latex],[latex]\mu[/latex],[latex]\sigma/\mbox{sqrt}(n)[/latex],true) to find the area in the left tail. The area in the right tail equals the area in the left tail, so we can find the p-value by adding the output from this function to itself.
- If [latex]\overline{x} \gt \mu[/latex] (as in this example), we use 1-norm.dist([latex]\overline{x}[/latex],[latex]\mu[/latex],[latex]\sigma/\mbox{sqrt}(n)[/latex],true) to find the area in the right tail. The area in the left tail equals the area in the right tail, so we can find the p-value by adding the output from this function to itself.
- The p-value of 0.0126 is greater than the 1% significance level, so a sample mean this far from 18.3 years is not unusual enough, at this level, to reject the null hypothesis. The conclusion of the test is to not reject the null hypothesis. This does not prove that the average age is still 18.3 years; it means the sample does not provide sufficient evidence, at the 1% significance level, that the average age of first-year students has changed.

Watch this video: Hypothesis Testing: z -test, right tail by ExcelIsFun [33:47]

Watch this video: Hypothesis Testing: z -test, left tail by ExcelIsFun [10:57]

Watch this video: Hypothesis Testing: z -test, two tail by ExcelIsFun [9:56]

## Concept Review

The hypothesis test for a population mean is a well-established process:

- Collect the sample information for the test and identify the significance level.
- When the population standard deviation is known, find the p -value (the area in the corresponding tail) for the test using the normal distribution.
- Compare the p -value to the significance level and state the outcome of the test.

## Attribution

“9.6 Hypothesis Testing of a Single Mean and Single Proportion” in Introductory Statistics by OpenStax is licensed under a Creative Commons Attribution 4.0 International License.

Introduction to Statistics Copyright © 2022 by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

## Hypothesis Testing Framework

Now that we've seen an example and explored some of the themes for hypothesis testing, let's specify the procedure that we will follow.

## Hypothesis Testing Steps

The formal framework and steps for hypothesis testing are as follows:

- Identify and define the parameter of interest
- Define the competing hypotheses to test
- Set the evidence threshold, formally called the significance level
- Generate or use theory to specify the sampling distribution and check conditions
- Calculate the test statistic and p-value
- Evaluate your results and write a conclusion in the context of the problem.

We'll discuss each of these steps below.

## Identify Parameter of Interest

First, I like to specify and define the parameter of interest. What is the population that we are interested in? What characteristic are we measuring?

By defining our population of interest, we can confirm that we are truly using sample data. If we find that we actually have population data, our inference procedures are not needed. We could proceed by summarizing our population data.

By identifying and defining the parameter of interest, we can confirm that we use appropriate methods to summarize our variable of interest. We can also focus on the specific process needed for our parameter of interest.

In our example from the last page, the parameter of interest would be the population mean time that a host has been on Airbnb for the population of all Chicago listings on Airbnb in March 2023. We could represent this parameter with the symbol $\mu$. It is best practice to fully define $\mu$, both in words and with its symbol.

## Define the Hypotheses

For hypothesis testing, we need to decide between two competing theories. These theories must be statements about the parameter. Although we won't have the population data to definitively select the correct theory, we will use our sample data to determine how reasonable our "skeptic's theory" is.

The first hypothesis is called the null hypothesis, $H_0$. This can be thought of as the "status quo", the "skeptic's theory", or that nothing is happening.

Examples of null hypotheses include that the population proportion is equal to 0.5 ($p = 0.5$), the population median is equal to 12 ($M = 12$), or the population mean is equal to 14.5 ($\mu = 14.5$).

The second hypothesis is called the alternative hypothesis, $H_a$ or $H_1$. This can be thought of as the "researcher's hypothesis" or that something is happening. This is what we'd like to convince the skeptic to believe. In most cases, the desired outcome of the researcher is to conclude that the alternative hypothesis is reasonable to use moving forward.

Examples of alternative hypotheses include that the population proportion is greater than 0.5 ($p > 0.5$), the population median is less than 12 ($M < 12$), or the population mean is not equal to 14.5 ($\mu \neq 14.5$).

There are a few requirements for the hypotheses:

- the hypotheses must be about the same population parameter,
- the hypotheses must have the same null value (provided number to compare to),
- the null hypothesis must have the equality (the equals sign must be in the null hypothesis),
- the alternative hypothesis must not have the equality (the equals sign cannot be in the alternative hypothesis),
- there must be no overlap between the null and alternative hypothesis.

You may have previously seen null hypotheses that include more than an equality (e.g. $p \le 0.5$). As long as there is an equality in the null hypothesis, this is allowed. For our purposes, we will simplify this statement to ($p = 0.5$).

To summarize from above, possible hypotheses statements are:

$H_0: p = 0.5$ vs. $H_a: p > 0.5$

$H_0: M = 12$ vs. $H_a: M < 12$

$H_0: \mu = 14.5$ vs. $H_a: \mu \neq 14.5$

In our second example about Airbnb hosts, our hypotheses would be:

$H_0: \mu = 2100$ vs. $H_a: \mu > 2100$.

## Set Threshold (Significance Level)

There is one more step to complete before looking at the data. This is to set the threshold needed to convince the skeptic. This threshold is defined as an $\alpha$ significance level. We'll define exactly what the $\alpha$ significance level means later. For now, smaller $\alpha$s correspond to more evidence being required to convince the skeptic.

A few common $\alpha$ levels include 0.1, 0.05, and 0.01.

For our Airbnb hosts example, we'll set the threshold as 0.02.

## Determine the Sampling Distribution of the Sample Statistic

The first step (as outlined above) is to identify the parameter of interest. What is the best estimate of that parameter? Typically, it will be the sample statistic that corresponds to the parameter. This sample statistic, along with other features of its sampling distribution, will prove especially helpful as we continue the hypothesis testing procedure.

However, we do have a decision at this step. We can choose to use simulations with a resampling approach or we can choose to rely on theory if we are using proportions or means. We then also need to confirm that our results and conclusions will be valid based on the available data.

## Required Condition

The one required assumption, regardless of approach (resampling or theory), is that the sample is random and representative of the population of interest. In other words, we need our sample to be a reasonable sample of data from the population.

## Using Simulations and Resampling

If we'd like to use a resampling approach, we have no (or minimal) additional assumptions to check. This is because we are relying on the available data instead of assumptions.

We do need to adjust our data to be consistent with the null hypothesis (or skeptic's claim). We can then rely on our resampling approach to estimate a plausible sampling distribution for our sample statistic.

Recall that we took this approach on the last page. Before simulating our estimated sampling distribution, we adjusted the mean of the data so that it matched our skeptic's claim.
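The code used on the last page is not reproduced here. As a hedged stand-in, here is a minimal Python sketch of the same idea: shift the data so its mean equals the null value, then resample with replacement to build a simulated sampling distribution of the mean. The function name, the number of repetitions, and the seed are our own choices, and this is not necessarily the code the author used.

```python
import random
from statistics import mean

def null_bootstrap_means(data, mu0, reps=10_000, seed=1):
    """Simulate sample means under H0: mu = mu0 by resampling shifted data.

    1. Shift every observation so the data's mean equals mu0 (the skeptic's claim).
    2. Resample n values with replacement, record the mean, and repeat.
    """
    shift = mu0 - mean(data)
    shifted = [x + shift for x in data]      # mean(shifted) equals mu0
    rng = random.Random(seed)                # seeded for reproducibility
    n = len(shifted)
    return [mean(rng.choices(shifted, k=n)) for _ in range(reps)]
```

The resulting list of simulated means is centred at the null value, which is what lets us ask how unusual the observed sample mean would be if the skeptic were right.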

We'll see a few more examples on the next page.

## Using Theory

On the other hand, we could rely on theory in order to estimate the sampling distribution of our desired statistic. Recall that we had a few different options to rely on:

- the CLT for the sampling distribution of a sample mean
- the binomial distribution for the sampling distribution of a proportion (or count)
- the Normal approximation of a binomial distribution (using the CLT) for the sampling distribution of a proportion

If relying on the CLT to specify the underlying sampling distribution, you also need to confirm:

- having a random sample and
- having a sample size that is less than 10% of the population size if the sampling is done without replacement
- having a Normally distributed population for a quantitative variable OR
- having a large enough sample size (usually at least 25) for a quantitative variable
- having a large enough sample size for a categorical variable (defined by $np$ and $n(1-p)$ being at least 10)

If relying on the binomial distribution to specify the underlying sampling distribution, you need to confirm:

- having a set number of trials, $n$
- having the same probability of success, $p$, for each observation

After determining the appropriate theory to use, we should check our conditions and then specify the sampling distribution for our statistic.

For the Airbnb hosts example, we have what we've assumed to be a random sample. It is not taken with replacement, so we also need to assume that our sample size (700) is less than 10% of our population size. In other words, we need to assume that the population of Chicago Airbnbs in March 2023 was at least 7000. Since we do have our (presumed) population data available, we can confirm that there were at least 7000 Chicago Airbnbs in the population in 2023.

Additionally, we can confirm that the normality condition for the CLT is met: our sample size is more than 25 and the parameter of interest is a mean.

With the conditions now met, we can estimate our sampling distribution. From the CLT, we know that the distribution for the sample mean should be $\bar{X} \sim N(\mu, \frac{\sigma}{\sqrt{n}})$.

Now we face our next challenge: what should we plug in for the mean and standard error of this distribution? Since we are adopting the skeptic's point of view for the purpose of this approach, we can plug in the value of $\mu_0 = 2100$. We also know that the sample size $n$ is 700. But what should we plug in for the population standard deviation $\sigma$?

When we don't know the value of a parameter, we will generally plug in our best estimate for the parameter. In this case, that corresponds to plugging in $\hat{\sigma}$, or our sample standard deviation.

Now, our estimated sampling distribution based on the CLT is: $\bar{X} \sim N(2100, 41.4045)$.

If we compare to our corresponding skeptic's sampling distribution on the last page, we can confirm that the theoretical sampling distribution is similar to the simulated sampling distribution based on resampling.

## Assumptions not met

What do we do if the necessary conditions aren't met for the sampling distribution? Because the simulation-based resampling approach has minimal assumptions, we should be able to use this approach to produce valid results as long as the provided data is representative of the population.

The theory-based approach has more conditions, and we may not be able to meet all of the necessary conditions. For example, if our parameter is something other than a mean or proportion, we may not have appropriate theory. Additionally, we may not have a large enough sample size. If the conditions aren't met, we have a few options:

- First, we could consider changing approaches to the simulation-based one.
- Second, we might look at how we could meet the necessary conditions better. In some cases, we may be able to redefine groups or make adjustments so that the setup of the test is closer to what is needed.
- As a last resort, we may be able to continue following the hypothesis testing steps. In this case, your calculations may not be valid or exact; however, you might be able to use them as an estimate or an approximation. It would be crucial to specify the violation and approximation in any conclusions or discussion of the test.

## Calculate the evidence with statistics and p-values

Now, it's time to calculate how much evidence the sample contains to convince the skeptic to change their mind. As we saw above, we can convince the skeptic to change their mind by demonstrating that our sample is unlikely to occur if their theory is correct.

How do we do this? We do this by calculating a probability associated with our observed value for the statistic.

For example, in our situation, we want to convince the skeptic that the population mean is actually greater than 2100 days. We do that by calculating the probability that a sample mean would be as large as or larger than what we observed in our actual sample, which was 2188 days. Why do we include the larger values? Because a sample mean of 2200 days would also provide evidence that the population mean is larger than 2100 days; the evidence isn't limited to exactly what we observed in our sample. We call this specific probability the p-value.

That is, the p-value is the probability of observing a test statistic as extreme or more extreme (as determined by the alternative hypothesis), assuming the null hypothesis is true.

Our observed p-value for the Airbnb host example demonstrates that the probability of getting a sample mean host time of 2188 days (the value from our sample) or more is 1.46%, assuming that the true population mean is 2100 days.

## Test statistic

Notice that the formal definition of a p-value mentions a test statistic. In most cases, this word can be replaced with "statistic" or "sample" for an equivalent statement.

Oftentimes, we'll see that our sample statistic can be used directly as the test statistic, as it was above. We could equivalently adjust our statistic to calculate a test statistic. This test statistic is often calculated as:

$\text{test statistic} = \frac{\text{estimate} - \text{hypothesized value}}{\text{standard error of estimate}}$

## P-value Calculation Options

Note also that the p-value definition involves the probability of a test statistic being as extreme or more extreme (as determined by the alternative hypothesis). How do we determine which area to consider when calculating the probability? This decision is determined by the inequality in the alternative hypothesis.

For example, when we were trying to convince the skeptic that the population mean is greater than 2100 days, we only considered those sample means that were at least as large as what we observed: 2188 days or more.

If instead we were trying to convince the skeptic that the population mean is less than 2100 days ($H_a: \mu < 2100$), we would consider all sample means that were at most what we observed: 2188 days or less. In this case, our p-value would be quite large, around 98.5% (the complement of the 1.46% found above). This large p-value demonstrates that our sample does not support the alternative hypothesis. In fact, a sample mean above 2100 days points in the opposite direction from $\mu < 2100$, so our sample gives us no reason to abandon the null hypothesis in favor of this alternative.

If we wanted to convince the skeptic that they were wrong and that the population mean is anything other than 2100 days ($H_a: \mu \neq 2100$), then we would want to calculate the probability that a sample mean is at least 88 days away from 2100 days. That is, we would calculate the probability corresponding to 2188 days or more or 2012 days or less. In this case, our p-value would be roughly twice the previously calculated p-value.

We could calculate all of those probabilities using our sampling distributions, either simulated or theoretical, that we generated in the previous step. If we chose to calculate a test statistic as defined in the previous section, we could also rely on standard normal distributions to calculate our p-value.
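Here is a minimal sketch of the theory-based calculation for all three alternatives, assuming the normal sampling distribution $N(2100, 41.4045)$ from the CLT step above and using Python's standard-library `statistics.NormalDist`. Because this is a normal approximation, its p-values may differ slightly from the 1.46% reported above.

```python
from statistics import NormalDist

# Skeptic's (null) sampling distribution of the sample mean, from the CLT
null_dist = NormalDist(mu=2100, sigma=41.4045)
observed = 2188

# Ha: mu > 2100 -> area at or above the observed sample mean
p_greater = 1 - null_dist.cdf(observed)

# Ha: mu < 2100 -> area at or below the observed sample mean
p_less = null_dist.cdf(observed)

# Ha: mu != 2100 -> area at least 88 days away from 2100, on both sides
distance = abs(observed - null_dist.mean)
p_two_sided = null_dist.cdf(null_dist.mean - distance) + (
    1 - null_dist.cdf(null_dist.mean + distance)
)

print(f"greater: {p_greater:.4f}, less: {p_less:.4f}, two-sided: {p_two_sided:.4f}")
```

With these inputs the upper-tail p-value comes out near 1.7%, the lower-tail p-value near 98%, and the two-sided p-value about twice the upper-tail value, following the same pattern described above.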

## Evaluate your results and write conclusion in context of problem

Once you've gathered your evidence, it's now time to make your final conclusions and determine how you might proceed.

In traditional hypothesis testing, you often make a decision. Recall that you have your threshold (significance level $\alpha$) and your level of evidence (p-value). We can compare the two to determine if your p-value is less than or equal to your threshold. If it is, you have enough evidence to persuade your skeptic to change their mind. If it is larger than the threshold, you don't have quite enough evidence to convince the skeptic.

Common formal conclusions (if given in context) would be:

- I have enough evidence to reject the null hypothesis (the skeptic's claim), and I have sufficient evidence to suggest that the alternative hypothesis is instead true.
- I do not have enough evidence to reject the null hypothesis (the skeptic's claim), and so I do not have sufficient evidence to suggest the alternative hypothesis is true.

The only decision that we can make is to either reject or fail to reject the null hypothesis (we cannot "accept" the null hypothesis). Because we aren't actively evaluating the alternative hypothesis, we don't want to make definitive decisions based on that hypothesis. However, when it comes to making our conclusion for what to use going forward, we frame this on whether we could successfully convince someone of the alternative hypothesis.

A less formal conclusion might look something like:

Based on our sample of Chicago Airbnb listings, it seems as if the mean time since a host has been on Airbnb (for all Chicago Airbnb listings) is more than 5.75 years.

## Significance Level Interpretation

We've now seen how the significance level $\alpha$ is used as a threshold for hypothesis testing. What exactly is the significance level?

The significance level $\alpha$ has two primary definitions. One is that the significance level is the largest p-value for which we would reject the null hypothesis; this is based on how the significance level functions within the hypothesis testing framework. The second definition is that it is the probability of rejecting the null hypothesis when the null hypothesis is true; in other words, it is the probability of making a specific type of error called a Type I error.

Why do we have to be comfortable making a Type I error? There is always a chance that the skeptic was originally correct and we obtained a very unusual sample. We don't want the skeptic to be so convinced of their theory that no evidence can convince them. Instead, we need the skeptic to be convinced as long as the evidence is strong enough. Typically, the probability threshold will be low, to reduce the number of Type I errors made. This also means that a decent amount of evidence will be needed to convince the skeptic to abandon their position in favor of the alternative theory.
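The second definition of $\alpha$ can be checked by simulation. The sketch below assumes a setting not taken from the text: samples of size 30 drawn from a normal population where $H_0: \mu = 0$ is true and $\sigma = 1$ is known, each tested two-sided with a z test at $\alpha = 0.05$. The fraction of samples that lead to rejection estimates the Type I error rate.

```python
import random
from statistics import NormalDist, fmean


def type_i_error_rate(alpha: float = 0.05, n: int = 30, reps: int = 4000, seed: int = 1) -> float:
    """Estimate how often a true H0 (mu = 0, sigma = 1) is rejected by a two-sided z test."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # H0 is true for these data
        z = (fmean(sample) - 0.0) / (1.0 / n ** 0.5)
        if abs(z) >= z_crit:
            rejections += 1
    return rejections / reps


rate = type_i_error_rate()
print(f"estimated Type I error rate: {rate:.3f}")
```

The estimate should land close to 0.05: even though every sample comes from a population where the skeptic is right, about 1 in 20 of them is unusual enough to be rejected.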

## p-value Limitations and Misconceptions

Alongside the significance level $\alpha$, we also need to measure the evidence against the null hypothesis, which we do with the p-value.

The p-value is the probability of getting a test statistic as extreme or more extreme (in the direction of the alternative hypothesis), assuming the null hypothesis is true.

Recently, p-values have gotten some bad press in terms of how they are used. However, that doesn't mean that p-values should be abandoned, as they still provide some helpful information. Below, we'll describe what p-values don't mean, and how they should or shouldn't be used to make decisions.

## Factors that affect a p-value

What features affect the size of a p-value?

- the null value, or the value assumed under the null hypothesis
- the effect size (the difference between the null value under the null hypothesis and the true value of the parameter)
- the sample size

More evidence against the null hypothesis (a smaller p-value) tends to be obtained when the effect size is larger and when the sample size is larger.
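To see these effects concretely, the sketch below uses hypothetical values not taken from the text: an observed difference of 2 units from the null value and a known population standard deviation of 10. It computes the one-sided z-test p-value for increasing sample sizes, then for increasing differences at a fixed sample size.

```python
from statistics import NormalDist


def one_sided_p(difference: float, sigma: float, n: int) -> float:
    """Upper-tail z-test p-value for an observed difference from the null value."""
    z = difference / (sigma / n ** 0.5)
    return 1 - NormalDist().cdf(z)


# Same observed difference, growing sample size
sizes = [10, 40, 160, 640]
p_values = [one_sided_p(difference=2.0, sigma=10.0, n=n) for n in sizes]
for n, p in zip(sizes, p_values):
    print(f"n = {n:4d}: p = {p:.4g}")

# Same sample size, growing observed difference
p_by_difference = [one_sided_p(difference=d, sigma=10.0, n=40) for d in (1.0, 2.0, 4.0)]
print(p_by_difference)
```

Quadrupling $n$ halves the standard error, which doubles the test statistic for the same difference and pushes the p-value down.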

## Misconceptions

We gave a definition for p-values above. What are some things that p-values don't mean?

- A p-value is not the probability that the null hypothesis is correct
- A p-value is not the probability that the null hypothesis is incorrect
- A p-value is not the probability of getting your specific sample
- A p-value is not the probability that the alternative hypothesis is correct
- A p-value is not the probability that the alternative hypothesis is incorrect
- A p-value does not indicate the size of the effect

A p-value measures the evidence that your sample provides against the null hypothesis; it is calculated assuming the null hypothesis is in fact correct.

## Using the p-value to make a decision

Why is there bad press for a p-value? You may have heard about the standard $\alpha$ level of 0.05. That is, we would be comfortable with rejecting the null hypothesis once in 20 attempts when the null hypothesis is really true. Recall that we reject the null hypothesis when the p-value is less than or equal to the significance level.

Consider what would happen if you have two different p-values: 0.049 and 0.051.

In essence, these two p-values represent two very similar probabilities (4.9% vs. 5.1%) and very similar levels of evidence against the null hypothesis. However, when we make our decision based on our threshold, we would make two different decisions (reject and fail to reject, respectively). Should this decision really be so simplistic? I would argue that the difference shouldn't be so severe when the sample statistics are likely very similar. For this reason, I (and many other experts) strongly recommend using the p-value as a measure of evidence and including it with your conclusion.

Putting too much emphasis on the decision (and having a significant result) has created a culture of misusing p-values. For this reason, understanding your p-value itself is crucial.

## Searching for p-values

The other concern with setting a definitive threshold of 0.05 is that some researchers will perform multiple tests until they find a p-value that is small enough. However, with a significance level of 0.05, we expect a p-value below 0.05 about 1 time out of every 20 tests, even when the null hypothesis is true.

This means that if researchers start hunting for p-values that are small (sometimes called p-hacking), then they are likely to identify a small p-value every once in a while by chance alone. Researchers might then publish that result, even though the result is actually not informative. For this reason, it is recommended that researchers write a definitive analysis plan to prevent performing multiple tests in search of a result that occurs by chance alone.
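The arithmetic behind this concern is easy to sketch. Assuming $k$ independent tests, each run at $\alpha = 0.05$ with a true null hypothesis, the chance of at least one p-value at or below 0.05 is $1 - (1 - \alpha)^k$. The simulation below relies on the fact that, for a continuous test statistic, p-values are uniformly distributed between 0 and 1 when the null hypothesis is true; the independence of the tests is an assumption.

```python
import random


def prob_any_significant(alpha: float, k: int) -> float:
    """Exact chance of at least one p-value <= alpha among k independent true-null tests."""
    return 1 - (1 - alpha) ** k


def simulate_any_significant(alpha: float, k: int, reps: int = 5000, seed: int = 7) -> float:
    """Monte Carlo check: under H0, each p-value is Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() <= alpha for _ in range(k))
        for _ in range(reps)
    )
    return hits / reps


exact = prob_any_significant(0.05, 20)
approx = simulate_any_significant(0.05, 20)
print(f"exact: {exact:.3f}, simulated: {approx:.3f}")
```

With 20 tests, the chance of at least one "significant" result by chance alone is about 64%, which is why a fixed analysis plan matters.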

## Best Practices

With all of this in mind, what should we do when we have our p-value? How can we prevent or reduce misuse of a p-value?

- Report the p-value along with the conclusion
- Specify the effect size (the value of the statistic)
- Define an analysis plan before looking at the data
- Interpret the p-value clearly to specify what it indicates
- Consider using an alternate statistical approach, the confidence interval, discussed next, when appropriate


## Hypothesis Tests for One or Two Variances or Standard Deviations

Chi-square tests and F-tests for variances or standard deviations both require that the original populations be normally distributed.

## Testing a Claim about a Variance or Standard Deviation

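To test a claim about the value of the variance or the standard deviation of a population, use the following test statistic, where $s$ is the sample standard deviation and $\sigma_0$ is the population standard deviation stated in the null hypothesis. If the null hypothesis is true, this statistic follows a chi-square distribution with $n-1$ degrees of freedom.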

$\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2}$

The television habits of 30 children were observed. The sample mean was found to be 48.2 hours per week, with a standard deviation of 12.4 hours per week. Test the claim that the standard deviation was at least 16 hours per week.

- The hypotheses are: $H_0: \sigma = 16$, $H_a: \sigma < 16$.
- We shall choose $\alpha = 0.05$.
- The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{(30-1)12.4^2}{16^2} = 17.418$.
- The p-value is $p = \chi^2\text{cdf}(0,17.418,29) = 0.0447$.
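- Since $p = 0.0447 < \alpha = 0.05$, we reject $H_0$. There is sufficient evidence to reject the claim that the standard deviation is at least 16 hours per week; the standard deviation of television watching appears to be less than 16 hours per week.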
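The calculation in this example can be reproduced without tables or a graphing calculator. The Python standard library has no chi-square distribution, so the sketch below evaluates the chi-square CDF through the power series of the regularized lower incomplete gamma function $P(df/2, x/2)$, which is adequate for moderate values like these. The function names are our own.

```python
import math


def chi2_cdf(x: float, df: int) -> float:
    """P(X <= x) for a chi-square random variable with df degrees of freedom.

    Computes the regularized lower incomplete gamma function P(df/2, x/2)
    from its power series; fine for the moderate values used in this section.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a  # n = 0 term of the series
    n = 0
    while abs(term) > abs(total) * 1e-15 and n < 10_000:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_sd_test(n: int, s: float, sigma0: float, alternative: str = "two-sided"):
    """Chi-square test of H0: sigma = sigma0; assumes a random sample from a normal population."""
    df = n - 1
    stat = df * s**2 / sigma0**2
    lower = chi2_cdf(stat, df)
    if alternative == "less":
        p = lower
    elif alternative == "greater":
        p = 1.0 - lower
    else:
        p = min(1.0, 2.0 * min(lower, 1.0 - lower))
    return stat, p


# Television example: n = 30 children, s = 12.4, H0: sigma = 16 vs Ha: sigma < 16
stat, p = chi2_sd_test(n=30, s=12.4, sigma0=16, alternative="less")
print(f"chi-square = {stat:.3f}, p-value = {p:.4f}")
```

The printed statistic and p-value should agree with the values in the example to the precision shown there.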

## Testing the Difference of Two Variances or Two Standard Deviations

Two equal variances would satisfy the equation $\sigma_1^2 = \sigma_2^2$, which is equivalent to $\dfrac{ \sigma_1^2}{\sigma_2^2} = 1$. Each sample variance, suitably scaled, follows a chi-square distribution, and the ratio of two independent chi-square random variables, each divided by its degrees of freedom, follows an F-distribution. We can therefore use the F-distribution to test a null hypothesis of equal variances, provided the two samples are independent. Note that this approach does not allow us to test for a particular magnitude of difference between variances or standard deviations.

Given sample sizes of $n_1$ and $n_2$, the test statistic will have $n_1-1$ and $n_2-1$ degrees of freedom, and is given by the following formula.

$F = \dfrac{s_1^2}{s_2^2}$

For a two-tailed test, find the tail area on the side where the test statistic falls: if the larger variance (or standard deviation) is in the first sample, then $F > 1$ and we use the right tail; otherwise, we use the left tail. The p-value is twice this tail area. For a one-tailed test, the tail is set by the alternative hypothesis. Most tables of the F-distribution give only right-tail areas, so placing the larger sample variance in the numerator is convenient with tables; this is not necessary when using technology.

Samples from two makers of ball bearings are collected, and their diameters (in inches) are measured, with the following results:

- Acme: $n_1 = 80$, $s_1 = 0.0395$
- Bigelow: $n_2 = 120$, $s_2 = 0.0428$
- The hypotheses are: $H_0: \sigma_1 = \sigma_2$, $H_a: \sigma_1 \neq \sigma_2$.
- The test statistic is $F = \dfrac{s_1^2}{s_2^2} = \dfrac{0.0395^2}{0.0428^2} = 0.8517$.
- Since the first sample had the smaller standard deviation, $F < 1$ and the statistic falls in the left tail, with area $\operatorname{Fcdf}(0,0.8517,79,119) = 0.2232$. Because the alternative hypothesis is two-sided, the p-value is $p = 2(0.2232) = 0.4464$.
- This p-value is larger than any common significance level, so there is insufficient evidence to conclude that the diameters of the ball bearings from the two companies have different standard deviations.

If the two samples had been reversed in our computations, we would have obtained the test statistic $F = 1.1741$ and the right-tail area $\operatorname{Fcdf}(1.1741,\infty,119,79) = 0.2232$. Of course, the tail area, and therefore the p-value, is the same.
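The ball bearing calculation can also be reproduced with the Python standard library. Because the standard library has no F distribution, the sketch below evaluates the F CDF through the regularized incomplete beta function, computed with the classic continued-fraction method (modified Lentz algorithm); all function names are our own. For a two-sided alternative it reports twice the tail area on the side where $F$ falls.

```python
import math


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even step
        d = 1.0 / (1.0 + aa * d if abs(1.0 + aa * d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 / (1.0 + aa * d if abs(1.0 + aa * d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(f: float, d1: int, d2: int) -> float:
    """P(F <= f) for an F distribution with (d1, d2) degrees of freedom."""
    if f <= 0:
        return 0.0
    return reg_inc_beta(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def f_test_equal_sd(n1: int, s1: float, n2: int, s2: float):
    """Two-tailed F test of H0: sigma1 = sigma2; assumes independent normal samples."""
    stat = s1**2 / s2**2
    lower = f_cdf(stat, n1 - 1, n2 - 1)
    tail = min(lower, 1.0 - lower)  # area in the tail where the statistic falls
    return stat, tail, min(1.0, 2.0 * tail)


stat, tail, p = f_test_equal_sd(80, 0.0395, 120, 0.0428)
print(f"F = {stat:.4f}, tail area = {tail:.4f}, two-sided p = {p:.4f}")
```

Swapping the two samples gives $F = 1/0.8517$ and the same tail area, since the roles of the degrees of freedom are swapped as well.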

## 7.4.1 - Hypothesis Testing

Five step hypothesis testing procedure.

In the remaining lessons, we will use the following five step hypothesis testing procedure. This is slightly different from the five step procedure that we used when conducting randomization tests.

- Check assumptions and write hypotheses. The assumptions will vary depending on the test. In this lesson we'll be confirming that the sampling distribution is approximately normal by visually examining the randomization distribution. In later lessons you'll learn more objective assumptions. The null and alternative hypotheses will always be written in terms of population parameters; the null hypothesis will always contain the equality (i.e., \(=\)).
- Calculate the test statistic. Here, we'll be using the formula below for the general form of the test statistic.
- Determine the p-value. The p-value is the area under the standard normal distribution that is more extreme than the test statistic in the direction of the alternative hypothesis.
- Make a decision. If \(p \leq \alpha\) reject the null hypothesis. If \(p>\alpha\) fail to reject the null hypothesis.
- State a "real world" conclusion. Based on your decision in step 4, write a conclusion in terms of the original research question.

## General Form of a Test Statistic

When using a standard normal distribution (i.e., z distribution), the test statistic is the standardized value that marks the boundary of the p-value. Recall the formula for a z score: \(z=\frac{x-\overline x}{s}\). The formula for a test statistic is similar. When conducting a hypothesis test, the sampling distribution is centered on the null parameter, and its standard deviation is known as the standard error:

\(test\;statistic=\dfrac{sample\;statistic-null\;parameter}{standard\;error}\)

This formula puts our observed sample statistic on a standard scale (e.g., z distribution). A z score tells us where a score lies on a normal distribution in standard deviation units. The test statistic tells us where our sample statistic falls on the sampling distribution in standard error units.

## 7.4.1.1 - Video Example: Mean Body Temperature

Research question: Is the mean body temperature in the population different from 98.6° Fahrenheit?

## 7.4.1.2 - Video Example: Correlation Between Printer Price and PPM

Research question: Is there a positive correlation in the population between the price of an ink jet printer and how many pages per minute (ppm) it prints?

## 7.4.1.3 - Example: Proportion NFL Coin Toss Wins

Research question: Is the proportion of NFL overtime coin tosses that are won different from 0.50?

StatKey was used to construct a randomization distribution:

## Step 1: Check assumptions and write hypotheses

From the given StatKey output, the randomization distribution is approximately normal.

\(H_0\colon p=0.50\)

\(H_a\colon p \ne 0.50\)

## Step 2: Calculate the test statistic

\(test\;statistic=\dfrac{sample\;statistic-null\;parameter}{standard\;error}\)

The sample statistic is the proportion in the original sample, 0.561. The null parameter is 0.50. And, the standard error is 0.024.

\(test\;statistic=\dfrac{0.561-0.50}{0.024}=\dfrac{0.061}{0.024}=2.542\)

## Step 3: Determine the p value

The p value will be the area on the z distribution that is more extreme than the test statistic of 2.542, in the direction of the alternative hypothesis. This is a two-tailed test:

The p value is the area in the left and right tails combined: \(p=0.0055110+0.0055110=0.011022\)

## Step 4: Make a decision

The p value (0.011022) is less than the standard 0.05 alpha level, therefore we reject the null hypothesis.

## Step 5: State a "real world" conclusion

There is evidence that the proportion of all NFL overtime coin tosses that are won is different from 0.50.

## 7.4.1.4 - Example: Proportion of Women Students

Research question : Are more than 50% of all World Campus STAT 200 students women?

Data were collected from a representative sample of 501 World Campus STAT 200 students. In that sample, 284 students were women and 217 were not women.

StatKey was used to construct a sampling distribution using randomization methods:

Because this randomization distribution is approximately normal, we can find the p value by computing a standardized test statistic and using the z distribution.

## 1. Check assumptions and write hypotheses

The assumption here is that the sampling distribution is approximately normal. From the given StatKey output, the randomization distribution is approximately normal.

\(H_0\colon p=0.50\) \(H_a\colon p>0.50\)

## 2. Calculate the test statistic

\(test\;statistic=\dfrac{sample\;statistic-hypothesized\;parameter}{standard\;error}\)

The sample statistic is \(\widehat p = 284/501 = 0.567\).

The hypothesized parameter is the value from the hypotheses: \(p_0=0.50\).

The standard error on the randomization distribution above is 0.022.

\(test\;statistic=\dfrac{0.567-0.50}{0.022}=3.045\)

## 3. Determine the p value

We can find the p value by constructing a standard normal distribution and finding the area under the curve that is more extreme than our observed test statistic of 3.045, in the direction of the alternative hypothesis. In other words, \(P(z>3.045)\):

Our p value is 0.0011634.

## 4. Make a decision

Our p value is less than or equal to the standard 0.05 alpha level, therefore we reject the null hypothesis.

## 5. State a "real world" conclusion

There is evidence that the proportion of all World Campus STAT 200 students who are women is greater than 0.50.

## 7.4.1.5 - Example: Mean Quiz Score

Research question: Is the mean quiz score different from 14 in the population?

\(H_0\colon \mu = 14\)

\(H_a\colon \mu \ne 14\)

The sample statistic is the mean in the original sample, 13.746 points. The null parameter is 14 points. And, the standard error, 0.142, can be found on the StatKey output.

\(test\;statistic=\dfrac{13.746-14}{0.142}=\dfrac{-0.254}{0.142}=-1.789\)

The p value will be the area on the z distribution that is more extreme than the test statistic of -1.789, in the direction of the alternative hypothesis:

This was a two-tailed test. The p value is the area in the left and right tails combined: \(p=0.0368074+0.0368074=0.0736148\)

The p value (0.0736148) is greater than the standard 0.05 alpha level, therefore we fail to reject the null hypothesis.

There is not enough evidence to state that the mean quiz score in the population is different from 14 points.

## 7.4.1.6 - Example: Difference in Mean Commute Times

Research question: Do the mean commute times in Atlanta and St. Louis differ in the population?

From the given StatKey output, the randomization distribution is approximately normal.

\(H_0: \mu_1-\mu_2=0\)

\(H_a: \mu_1 - \mu_2 \ne 0\)

## Step 2: Compute the test statistic

\(test\;statistic=\dfrac{sample\;statistic - null \; parameter}{standard \;error}\)

The observed sample statistic is \(\overline x _1 - \overline x _2 = 7.14\). The null parameter is 0. And, the standard error, from the StatKey output, is 1.136.

\(test\;statistic=\dfrac{7.14-0}{1.136}=6.285\)

The p value will be the area on the z distribution that is more extreme than the test statistic of 6.285, in the direction of the alternative hypothesis:

This was a two-tailed test. The area in the two tails combined is displayed as 0.000000. The p value is not actually 0: the normal curve has positive area in every tail, so this area is simply too small to show at six decimal places. This p value would be written as p < 0.001.

The p value is smaller than the standard 0.05 alpha level, therefore we reject the null hypothesis.

There is evidence that the mean commute times in Atlanta and St. Louis are different in the population.
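The four z-based examples above can be checked with a short stdlib-only Python sketch. The inputs are the rounded sample statistics and StatKey standard errors quoted in each example, so the computed p-values may differ from the reported ones in the last digits. The helper `z_test` is our own name, not a library function.

```python
from statistics import NormalDist


def z_test(statistic: float, null_value: float, standard_error: float,
           alternative: str = "two-sided"):
    """Return (test statistic, p-value) using the standard normal distribution."""
    z = (statistic - null_value) / standard_error
    lower = NormalDist().cdf(z)  # area at or below z
    upper = 1 - lower            # area at or above z
    if alternative == "greater":
        p = upper
    elif alternative == "less":
        p = lower
    else:
        p = 2 * min(lower, upper)
    return z, p


# (sample statistic, null parameter, standard error, alternative)
examples = {
    "NFL coin tosses": (0.561, 0.50, 0.024, "two-sided"),
    "Women in STAT 200": (0.567, 0.50, 0.022, "greater"),
    "Mean quiz score": (13.746, 14, 0.142, "two-sided"),
    "Commute times": (7.14, 0, 1.136, "two-sided"),
}

results = {name: z_test(*args) for name, args in examples.items()}
for name, (z, p) in results.items():
    p_text = "p < 0.001" if p < 0.001 else f"p = {p:.4f}"
    print(f"{name}: z = {z:.3f}, {p_text}")
```

Each decision then follows the same rule as in the examples: reject $H_0$ when the p value is at most 0.05.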

## Hypothesis Testing Calculator


After stating the hypotheses, the first step in hypothesis testing is to calculate the test statistic. The formula for the test statistic depends on whether the population standard deviation (σ) is known or unknown. If σ is known, our hypothesis test is known as a z test and we use the z distribution. If σ is unknown, the sample standard deviation s is used in its place; the test is then known as a t test and we use the t distribution. Use of the t distribution relies on the degrees of freedom, which is equal to the sample size minus one.

| | $\sigma$ Known | $\sigma$ Unknown |
|---|---|---|
| Test Statistic | $z = \dfrac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}$ | $t = \dfrac{\bar{x}-\mu_0}{s/\sqrt{n}}$ |

Next, the test statistic is used to conduct the test using either the p-value approach or the critical value approach. The particular steps taken in each approach largely depend on the form of the hypothesis test: lower tail, upper tail, or two-tailed. The form can be identified by looking at the alternative hypothesis ($H_a$): a less-than sign indicates a lower tail test, a greater-than sign indicates an upper tail test, and a not-equal sign indicates a two-tailed test.

| Lower Tail Test | Upper Tail Test | Two-Tailed Test |
|---|---|---|
| $H_0 \colon \mu \geq \mu_0$ | $H_0 \colon \mu \leq \mu_0$ | $H_0 \colon \mu = \mu_0$ |
| $H_a \colon \mu < \mu_0$ | $H_a \colon \mu > \mu_0$ | $H_a \colon \mu \neq \mu_0$ |

In the p-value approach, the test statistic is used to calculate a p-value. If the test is a lower tail test, the p-value is the probability of getting a value for the test statistic at least as small as the value from the sample. If the test is an upper tail test, the p-value is the probability of getting a value for the test statistic at least as large as the value from the sample. In a two-tailed test, the p-value is the probability of getting a value for the test statistic at least as unlikely as the value from the sample.

To test the hypothesis in the p-value approach, compare the p-value to the level of significance. If the p-value is less than or equal to the level of significance, reject the null hypothesis. If the p-value is greater than the level of significance, do not reject the null hypothesis. This rule is the same for lower tail, upper tail, and two-tailed tests.

In the critical value approach, the level of significance ($\alpha$) is used to calculate the critical value. In a lower tail test, the critical value is the value of the test statistic providing an area of $\alpha$ in the lower tail of the sampling distribution of the test statistic. In an upper tail test, the critical value is the value of the test statistic providing an area of $\alpha$ in the upper tail of the sampling distribution of the test statistic. In a two-tailed test, the critical values are the values of the test statistic providing areas of $\alpha / 2$ in the lower and upper tail of the sampling distribution of the test statistic.

To test the hypothesis in the critical value approach, compare the critical value to the test statistic. Unlike the p-value approach, the method we use to decide whether to reject the null hypothesis depends on the form of the hypothesis test. In a lower tail test, if the test statistic is less than or equal to the critical value, reject the null hypothesis. In an upper tail test, if the test statistic is greater than or equal to the critical value, reject the null hypothesis. In a two-tailed test, if the test statistic is less than or equal to the lower critical value or greater than or equal to the upper critical value, reject the null hypothesis.

| Test | Lower Tail Test | Upper Tail Test | Two-Tailed Test |
|---|---|---|---|
| z test ($\sigma$ known) | If $z \leq -z_\alpha$, reject $H_0$. | If $z \geq z_\alpha$, reject $H_0$. | If $z \leq -z_{\alpha/2}$ or $z \geq z_{\alpha/2}$, reject $H_0$. |
| t test ($\sigma$ unknown) | If $t \leq -t_\alpha$, reject $H_0$. | If $t \geq t_\alpha$, reject $H_0$. | If $t \leq -t_{\alpha/2}$ or $t \geq t_{\alpha/2}$, reject $H_0$. |
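The p-value approach and the critical value approach always reach the same decision. The sketch below checks this for z tests only, on a hypothetical grid of test statistics at $\alpha = 0.05$, using `statistics.NormalDist` for both areas and quantiles; the t version would need t-distribution quantiles, which the Python standard library does not provide (a library such as SciPy offers them via `scipy.stats.t.ppf`).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_test_decisions(z: float, alpha: float, tail: str):
    """Return (p-value, reject by p-value?, reject by critical value?) for a z test."""
    if tail == "lower":
        p = STD_NORMAL.cdf(z)
        reject_cv = z <= -STD_NORMAL.inv_cdf(1 - alpha)  # z <= -z_alpha
    elif tail == "upper":
        p = 1 - STD_NORMAL.cdf(z)
        reject_cv = z >= STD_NORMAL.inv_cdf(1 - alpha)   # z >= z_alpha
    else:  # two-tailed
        p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
        crit = STD_NORMAL.inv_cdf(1 - alpha / 2)         # z_{alpha/2}
        reject_cv = z <= -crit or z >= crit
    return p, p <= alpha, reject_cv


checks = [
    (tail, z, *z_test_decisions(z, alpha=0.05, tail=tail))
    for tail in ("lower", "upper", "two-tailed")
    for z in (-2.5, -1.8, -0.4, 1.2, 1.8, 2.5)
]
for tail, z, p, by_p, by_cv in checks:
    print(f"{tail:>10}, z = {z:5.2f}: p = {p:.4f}, p-value rule: {by_p}, critical value rule: {by_cv}")
```

Note that $z = 1.8$ is rejected in the upper tail test but not in the two-tailed test, because splitting $\alpha$ between two tails moves the critical value from about 1.645 to about 1.96.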

When conducting a hypothesis test, there is always a chance that you come to the wrong conclusion. There are two types of errors you can make: Type I Error and Type II Error. A Type I Error is committed if you reject the null hypothesis when the null hypothesis is true; ideally, we would fail to reject the null hypothesis when it is true. A Type II Error is committed if you fail to reject the null hypothesis when the alternative hypothesis is true; ideally, we would reject the null hypothesis when the alternative hypothesis is true.

| Conclusion | $H_0$ True | $H_a$ True |
|---|---|---|
| Fail to reject $H_0$ | Correct | Type II Error |
| Reject $H_0$ | Type I Error | Correct |

Hypothesis testing is closely related to confidence intervals. For a two-tailed test at significance level $\alpha$, if the hypothesized value of the population mean lies outside the $(1-\alpha)$ confidence interval, we reject the null hypothesis; if it lies inside, we do not. The discussion here covers hypothesis tests for one population mean; hypothesis tests about two population means are handled separately.
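The link between two-tailed tests and confidence intervals can be checked numerically. The sketch below assumes $\sigma$ is known (z procedures) and uses hypothetical values not taken from the text: $\bar{x} = 50$, $\sigma = 8$, $n = 25$, $\alpha = 0.05$. For each candidate $\mu_0$ it compares "$\mu_0$ lies outside the 95% interval" with "the two-tailed test rejects".

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar: float, sigma: float, n: int, conf: float):
    """Confidence interval for mu when sigma is known."""
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z_crit * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


def two_tailed_z_rejects(xbar: float, mu0: float, sigma: float, n: int, alpha: float) -> bool:
    """Two-tailed z test decision via the p-value approach."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return p <= alpha


# Hypothetical data: xbar = 50, sigma = 8, n = 25, alpha = 0.05
low, high = z_interval(50, 8, 25, conf=0.95)
agreement = [
    (not (low <= mu0 <= high)) == two_tailed_z_rejects(50, mu0, 8, 25, alpha=0.05)
    for mu0 in (45, 46.5, 47, 50, 53, 53.5, 55)
]
print(f"95% CI: ({low:.3f}, {high:.3f}); test and interval agree: {all(agreement)}")
```

The agreement holds because both procedures use the same critical value $z_{\alpha/2}$ and the same standard error.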


## Hypothesis Test for a Mean

This lesson explains how to conduct a hypothesis test of a mean, when the following conditions are met:

- The sampling method is simple random sampling.
- The sampling distribution is normal or nearly normal.

Generally, the sampling distribution will be approximately normally distributed if any of the following conditions apply.

- The population distribution is normal.
- The population distribution is symmetric, unimodal, without outliers, and the sample size is 15 or less.
- The population distribution is moderately skewed, unimodal, without outliers, and the sample size is between 16 and 40.
- The sample size is greater than 40, without outliers.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

## State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative hypothesis . The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false; and vice versa.

The table below shows three sets of hypotheses. Each makes a statement about how the population mean μ is related to a specified value M. (In the table, the symbol ≠ means "not equal to".)

| Set | Null hypothesis | Alternative hypothesis | Number of tails |
|---|---|---|---|
| 1 | μ = M | μ ≠ M | 2 |
| 2 | μ ≥ M | μ < M | 1 |
| 3 | μ ≤ M | μ > M | 1 |

The first set of hypotheses (Set 1) is an example of a two-tailed test, since an extreme value on either side of the sampling distribution would cause a researcher to reject the null hypothesis. The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests, since an extreme value on only one side of the sampling distribution would cause a researcher to reject the null hypothesis.

## Formulate an Analysis Plan

The analysis plan describes how to use sample data to decide whether to reject the null hypothesis. It should specify the following elements.

- Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
- Test method. Use the one-sample t-test to determine whether the hypothesized mean differs significantly from the observed sample mean.

## Analyze Sample Data

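Using sample data, conduct a one-sample t-test. This involves finding the standard error, degrees of freedom, test statistic, and the P-value associated with the test statistic.

- Standard error. Compute the standard error (SE) of the sampling distribution.

SE = s * sqrt{ ( 1/n ) * [ ( N - n ) / ( N - 1 ) ] }

where s is the standard deviation of the sample, N is the population size, and n is the sample size. When the population size is much larger (at least 20 times larger) than the sample size, the standard error can be approximated by:

SE = s / sqrt( n )

- Degrees of freedom. The degrees of freedom (DF) is equal to the sample size (n) minus one. Thus, DF = n - 1.

- Test statistic. The test statistic is a t statistic (t) defined by the following equation, where x̄ is the sample mean and μ is the hypothesized population mean.

t = ( x̄ - μ ) / SE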

- P-value. The P-value is the probability of observing a sample statistic at least as extreme as the test statistic, assuming the null hypothesis is true. Since the test statistic is a t statistic, use the t Distribution Calculator to assess the probability associated with the t statistic, given the degrees of freedom computed above. (See the sample problems at the end of this lesson for examples of how this is done.)
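The standard error, degrees of freedom, and t statistic are simple to script. Below is a minimal Python sketch. The function name `one_sample_t` and its parameters are our own and are not part of any library or of the t Distribution Calculator. The finite population correction is applied only when a population size N is supplied.

```python
import math

def one_sample_t(xbar, mu0, s, n, N=None):
    """Return (SE, DF, t) for a one-sample t-test from summary statistics.

    Hypothetical helper. If the population size N is given, the finite
    population correction is applied; otherwise SE = s / sqrt(n).
    """
    if N is None:
        se = s / math.sqrt(n)
    else:
        # SE = s * sqrt{ (1/n) * [ (N - n) / (N - 1) ] }
        se = s * math.sqrt((1 / n) * ((N - n) / (N - 1)))
    df = n - 1
    t = (xbar - mu0) / se
    return se, df, t

# Engine data from Problem 1 below: x̄ = 295, μ = 300, s = 20, n = 50, N = 2000
print(one_sample_t(295, 300, 20, 50))
print(one_sample_t(295, 300, 20, 50, N=2000))
```

With the engine data, SE comes out near 2.83 without the correction and near 2.79 with N = 2000. The correction makes little difference here because the population is 40 times the size of the sample.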


## Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level.

## Test Your Understanding

In this section, two sample problems illustrate how to conduct a hypothesis test of a mean score. The first problem involves a two-tailed test; the second problem, a one-tailed test.

Problem 1: Two-Tailed Test

An inventor has developed a new, energy-efficient lawn mower engine. He claims that the engine will run continuously for 5 hours (300 minutes) on a single gallon of regular gasoline. From his stock of 2000 engines, the inventor selects a simple random sample of 50 engines for testing. The engines run for an average of 295 minutes, with a standard deviation of 20 minutes. Test the null hypothesis that the mean run time is 300 minutes against the alternative hypothesis that the mean run time is not 300 minutes. Use a 0.05 level of significance. (Assume that run times for the population of engines are normally distributed.)

Solution: The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:

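- State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis.

Null hypothesis: μ = 300

Alternative hypothesis: μ ≠ 300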

- Formulate an analysis plan. For this analysis, the significance level is 0.05. The test method is a one-sample t-test.

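- Analyze sample data. Using sample data, we compute the standard error (SE), degrees of freedom (DF), and the t statistic (t).

SE = s / sqrt(n) = 20 / sqrt(50) = 20/7.07 = 2.83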

DF = n - 1 = 50 - 1 = 49

t = ( x̄ - μ ) / SE = (295 - 300)/2.83 = -1.77

where s is the standard deviation of the sample, x̄ is the sample mean, μ is the hypothesized population mean, and n is the sample size.

Since we have a two-tailed test, the P-value is the probability that a t statistic with 49 degrees of freedom is less than -1.77 or greater than 1.77. We use the t Distribution Calculator to find that P(t < -1.77) is about 0.04.

- If you enter 1.77 as the t statistic in the t Distribution Calculator, you will find that P(t < 1.77) is about 0.96. Therefore, P(t > 1.77) is 1 minus 0.96, or 0.04. Thus, the P-value = 0.04 + 0.04 = 0.08.
- Interpret results. Since the P-value (0.08) is greater than the significance level (0.05), we cannot reject the null hypothesis.

Note: If you use this approach on an exam, you may also want to mention why this approach is appropriate. Specifically, the approach is appropriate because the sampling method was simple random sampling, the population was normally distributed, and the sample size was small relative to the population size (less than 5%).

Problem 2: One-Tailed Test

Bon Air Elementary School has 1000 students. The principal of the school thinks that the average IQ of students at Bon Air is at least 110. To prove her point, she administers an IQ test to 20 randomly selected students. Among the sampled students, the average IQ is 108 with a standard deviation of 10. Based on these results, should the principal accept or reject her original hypothesis? Assume a significance level of 0.01. (Assume that test scores in the population of students are normally distributed.)

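- State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis.

Null hypothesis: μ ≥ 110

Alternative hypothesis: μ < 110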

- Formulate an analysis plan. For this analysis, the significance level is 0.01. The test method is a one-sample t-test.

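- Analyze sample data. Using sample data, we compute the standard error (SE), degrees of freedom (DF), and the t statistic (t).

SE = s / sqrt(n) = 10 / sqrt(20) = 10/4.472 = 2.236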

DF = n - 1 = 20 - 1 = 19

t = ( x̄ - μ ) / SE = (108 - 110)/2.236 = -0.894

Here is the logic of the analysis: Given the alternative hypothesis (μ < 110), we want to know whether the observed sample mean is small enough to cause us to reject the null hypothesis.

The observed sample mean produced a t statistic of -0.894. We use the t Distribution Calculator to find that P(t < -0.894) is about 0.19.

- This means we would expect to find a sample mean of 108 or smaller in 19 percent of our samples if the true population mean IQ were 110. Thus the P-value in this analysis is 0.19.
- Interpret results. Since the P-value (0.19) is greater than the significance level (0.01), we cannot reject the null hypothesis.
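Both problems read their P-values from the t Distribution Calculator. As a rough cross-check, the sketch below computes the same tail areas in plain Python. This is an illustrative approach and not the calculator's own method: it integrates the Student t density numerically with Simpson's rule. The names `t_cdf` and `p_value` are hypothetical helpers. In practice, a statistics library would normally be used instead.

```python
import math

def t_cdf(t, df, steps=2000):
    """P(T <= t) for Student's t with df degrees of freedom (hypothetical helper).

    Integrates the t density from 0 to |t| with Simpson's rule (steps must be
    even) and uses the symmetry of the distribution about 0.
    """
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    b = abs(t)
    h = b / steps
    total = pdf(0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # area under the density between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def p_value(t, df, tail):
    """tail: 'left' (H1: mu < M), 'right' (H1: mu > M), or 'two' (H1: mu != M)."""
    if tail == "left":
        return t_cdf(t, df)
    if tail == "right":
        return 1 - t_cdf(t, df)
    return 2 * t_cdf(-abs(t), df)

# Problem 1: two-tailed, DF = 49
t1 = (295 - 300) / (20 / math.sqrt(50))
p1 = p_value(t1, 49, "two")
# Problem 2: left-tailed, DF = 19
t2 = (108 - 110) / (10 / math.sqrt(20))
p2 = p_value(t2, 19, "left")
print(round(p1, 3), round(p2, 3))
```

The printed values should be close to the 0.08 and 0.19 obtained from the calculator.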

## T-test for Two Means – Unknown Population Standard Deviations

A t-test for two independent means compares two population means (\(\mu_1\) and \(\mu_2\)) when the population standard deviations are unknown. It applies when you have two independent samples and the population standard deviations \(\sigma_1\) and \(\sigma_2\) are not known. The inputs are the null and alternative hypotheses, the significance level, and the sample means, sample standard deviations, and sample sizes of the two samples.

## The T-test for Two Independent Samples

A t-test for two means with unknown population variances and two independent samples is a hypothesis test that attempts to make a claim about the population means (\(\mu_1\) and \(\mu_2\)).

More specifically, a t-test uses sample information to assess how plausible it is for the population means \(\mu_1\) and \(\mu_2\) to be equal. The test has two non-overlapping hypotheses, the null and the alternative hypothesis.

The null hypothesis is a statement about the population means, specifically the assumption of no effect, and the alternative hypothesis is the complementary hypothesis to the null hypothesis.

## Properties of the two sample t-test

The main properties of a two sample t-test for two population means are:

- Depending on the form of the alternative hypothesis, the t-test can be two-tailed, left-tailed, or right-tailed
- The main principle of hypothesis testing is that the null hypothesis is rejected if the test statistic obtained is sufficiently unlikely under the assumption that the null hypothesis is true
- The p-value is the probability of obtaining sample results as extreme or more extreme than the sample results obtained, under the assumption that the null hypothesis is true
- In a hypothesis test there are two types of errors. A Type I error occurs when we reject a true null hypothesis, and a Type II error occurs when we fail to reject a false null hypothesis

## How do you compute the t-statistic for the t test for two independent samples?

The formula for the t-statistic for two population means (with two independent samples) and unknown population variances depends on whether the population variances are assumed to be equal. If the population variances are assumed to be unequal, the formula is:

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]

On the other hand, if the population variances are assumed to be equal, the formula is:

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]

A common way to decide whether the population variances should be assumed equal or unequal is an F-test for equality of variances.

With the above t-statistic, we can compute the corresponding p-value, which allows us to assess whether or not there is a statistically significant difference between two means.
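The two t-statistic formulas can be written as short functions. This sketch uses hypothetical function names. It returns the unequal-variance (Welch) t statistic together with the Welch-Satterthwaite approximation to its degrees of freedom. It also returns the pooled t statistic with n1 + n2 - 2 degrees of freedom. It assumes the null hypothesis is μ1 = μ2, so the hypothesized difference is 0. The p-value would then come from a t distribution with the returned degrees of freedom.

```python
import math

def welch_t(x1, s1, n1, x2, s2, n2):
    """Unequal-variance (Welch) t statistic and its approximate df (hypothetical helper)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (x1 - x2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

def pooled_t(x1, s1, n1, x2, s2, n2):
    """Equal-variance t statistic using the pooled variance (hypothetical helper)."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    t = (x1 - x2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical summary statistics, for illustration only
print(welch_t(5.0, 1.0, 8, 4.0, 3.0, 12))
print(pooled_t(5.0, 1.0, 8, 4.0, 3.0, 12))
```

When the two samples have equal sizes and equal standard deviations, both functions give the same t. In that case, the Welch degrees of freedom also reduce to n1 + n2 - 2.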

## Why is it called t-test for independent samples?

This is because the samples are not related to each other: the outcomes in one sample are unrelated to those in the other sample. If the samples are related (for example, you are comparing the answers of husbands and wives, or of identical twins), you should use a t-test for paired samples instead.

## What if the population standard deviations are known?

The main purpose of this test is to compare two population means when σ is unknown for both populations. If the population standard deviations are known, use a z-test for two means instead.


Statistics By Jim


## Kruskal Wallis Test Explained

By Jim Frost

## What is the Kruskal Wallis Test?

The Kruskal Wallis test is a nonparametric hypothesis test that compares three or more independent groups. Statisticians also refer to it as one-way ANOVA on ranks. This analysis extends the Mann-Whitney U nonparametric test that can compare only two groups.

If you analyze data, chances are you’re familiar with one-way ANOVA that compares the means of at least three groups. The Kruskal Wallis test is the nonparametric version of it. Because it is nonparametric, the analysis makes fewer assumptions about your data than its parametric equivalent.

Many analysts use the Kruskal Wallis test to determine whether the medians of at least three groups are unequal. However, it’s important to note that it only assesses the medians in particular circumstances. Interpreting the analysis results can be thorny. More on this later!

If you need a nonparametric test for paired groups or a single sample, consider the Wilcoxon signed rank test.

Learn more about Parametric vs. Nonparametric Tests and Hypothesis Testing Overview.

## What Does the Kruskal Wallis Test Tell You?

At its core, the Kruskal Wallis test evaluates data ranks. The procedure ranks all the sample data from low to high. Then it averages the ranks within each group. If the results are statistically significant, the average group ranks are not all equal. Consequently, the analysis indicates whether any groups have values that rank differently. For instance, one group might have values that tend to rank higher than the other groups.
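The ranking step can be illustrated with a small standard-library Python sketch. The function names and the toy data are hypothetical. The code pools all observations, gives tied values the average of their ranks, and reports each group's mean rank. It also computes the usual Kruskal Wallis H statistic, H = 12 / (N(N + 1)) * sum(n_i * Rbar_i^2) - 3(N + 1). This version omits the tie correction that statistical software typically applies. For reasonably large groups, H is compared with a chi-square distribution with k - 1 degrees of freedom.

```python
def average_ranks(groups):
    """Rank all observations together (ties share the average of their ranks)
    and return the mean rank of each group. Hypothetical helper."""
    pooled = sorted((value, g) for g, values in enumerate(groups) for value in values)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are tied; ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    sums = [0.0] * len(groups)
    for (value, g), r in zip(pooled, ranks):
        sums[g] += r
    return [sums[g] / len(groups[g]) for g in range(len(groups))]

def kruskal_h(groups):
    """Mean ranks and the H statistic, without the tie correction."""
    n_total = sum(len(g) for g in groups)
    mean_ranks = average_ranks(groups)
    weighted = sum(len(g) * r ** 2 for g, r in zip(groups, mean_ranks))
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    return mean_ranks, h

print(kruskal_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

For the three perfectly separated toy groups, the mean ranks are 2, 5, and 8, and H is 7.2 (up to floating-point rounding).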

The Kruskal Wallis test doesn’t involve medians or other distributional properties—just the ranks. In fact, by evaluating ranks, it rolls up both the location and shape parameters into a single evaluation of each group’s average rank.

When their average ranks are unequal, you know a group’s distribution tends to produce higher or lower values than the others. However, you don’t know enough to draw conclusions specifically about the distributions’ locations (e.g., the medians).

## Special Case for Same Shapes

However, when you hold the distribution shapes constant, the Kruskal Wallis test does tell us about the median. That’s not a property of the procedure itself but logic. If several distributions have the same shape, but the average ranks are shifted higher and lower, their medians must differ. But we can only draw that conclusion about the medians when the distributions have the same shapes.

These three distributions have the same shape, but the red and green are shifted right to higher values. Wherever the median falls on the blue distribution, it'll be in the corresponding position in the red and green distributions. In this case, the analysis can assess the medians.

But, if the shapes aren’t similar, we don’t know whether the location, shape, or a combination of the two produced the statistically significant Kruskal Wallis test.

## Analysis Assumptions

Like all statistical analyses, the Kruskal Wallis test has assumptions. Ensuring that your data meet these assumptions is crucial.

- Independent Groups: Each group has a distinct set of subjects or items.
- Independence of Observations: Each observation must be independent of the others. The data points should not influence or predict each other.
- Ordinal or Continuous Data: The Kruskal Wallis test can handle both ordinal data and continuous data, making it flexible for various research situations.
- Same Distribution Shape: This assumption applies only when you want to draw inferences about the medians. If this assumption holds, the analysis can provide insights about the medians.

Violating these assumptions can lead to incorrect conclusions.

## When to Use this Analysis?

Consider using the Kruskal Wallis test in the following cases:

- You have ordinal data.
- Your data follow a nonnormal distribution, and you have a small sample size.
- The median is more relevant to your subject area than the mean.

Learn more about the Normal Distribution.

If you have 3 – 9 groups and more than 15 observations per group or 10 – 12 groups and more than 20 observations per group, you might want to use one-way ANOVA even when you have nonnormal data. The central limit theorem causes the sampling distributions to converge on normality, making ANOVA a suitable choice.

One-way ANOVA has several advantages over the Kruskal Wallis test, including the following:

- More statistical power to detect differences.
- Can handle distributions with different shapes (use Welch's ANOVA).
- Avoids the interpretation issues discussed above.

In short, use this nonparametric method when you’re specifically interested in the medians, have ordinal data, or can’t use one-way ANOVA because you have a small, nonnormal sample.

## Interpreting Kruskal Wallis Test Results

Like one-way ANOVA, the Kruskal Wallis test is an "omnibus" test. Omnibus tests can tell you that not all your groups are equal, but they don't specify which pairs of groups are different.

Specifically, the Kruskal Wallis test evaluates the following hypotheses:

- Null: The average ranks are all the same.
- Alternative: At least one average rank is different.

Again, if the distributions have similar shapes, you can replace “average ranks” with “medians.”

Imagine you’re studying five different diets and their impact on weight loss. The Kruskal Wallis test can confirm that at least two diets have different results. However, it won’t tell you exactly which pairs of diets have statistically significant differences.

So, how do we solve this problem? Enter post hoc tests. Perform these analyses after (i.e., post) an omnibus analysis to identify specific pairs of groups with statistically significant differences. A standard option is Dunn's multiple comparisons procedure. Other options include performing a series of pairwise Mann-Whitney U tests with a Bonferroni correction or the lesser-known but potent Conover-Iman method.
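The Bonferroni correction is simple arithmetic. With m pairwise comparisons, each raw p-value is multiplied by m and capped at 1. This is equivalent to comparing the raw p-values with α/m. Below is a minimal sketch using hypothetical diet labels and hypothetical p-values. The pairwise Mann-Whitney U tests themselves are not shown.

```python
from itertools import combinations

def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of comparisons, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Five diets (hypothetical labels) give C(5, 2) pairwise comparisons
diets = ["A", "B", "C", "D", "E"]
pairs = list(combinations(diets, 2))
print(len(pairs))  # 10

# Hypothetical raw p-values from three pairwise tests
print(bonferroni([0.01, 0.2, 0.5]))
```

For the five-diet example there are 10 pairs. At α = 0.05, each pairwise test would therefore be judged against 0.005.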

Learn about Post Hoc Tests for ANOVA.

## Kruskal Wallis Test Example

Imagine you're a healthcare administrator analyzing the median number of unoccupied beds in three hospitals. Download the CSV dataset: KruskalWallisTest.

For this Kruskal Wallis test, the p-value is 0.029, which is less than the typical significance level of 0.05. Consequently, we can reject the null hypothesis that all groups have the same average rank. At least one group has a different average rank than the others.

Furthermore, if the three hospital distributions have the same shape, we can conclude that the medians differ.

At this point, we might decide to use a post hoc test to compare pairs of hospitals.


## Reader Interactions

May 20, 2024 at 2:07 pm

Sir, is the Kruskal Wallis test a two-tailed or a one-tailed test?

May 20, 2024 at 3:55 pm

It’s a one-tailed test in the same sense that the F-test for one-way ANOVA is one-tailed.


## COMMENTS

A test of a single standard deviation assumes that the underlying distribution is normal. The null and alternative hypotheses are stated in terms of the population standard deviation (or population variance). The test statistic is:

\[ \chi^2 = \frac{(n - 1)s^2}{\sigma^2} \]

where n is the sample size, s is the sample standard deviation, and σ is the population standard deviation specified by the null hypothesis.

Performing a Hypothesis Test Regarding σ:

- Step 1: State the null and alternative hypotheses.
- Step 2: Decide on a level of significance, α.
- Step 3: Compute the test statistic, \( \chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} \), where \(\sigma_0\) is the value of σ stated in the null hypothesis.
- Step 4: Determine the P-value.
- Step 5: Reject the null hypothesis if the P-value is less than the level of significance, α.
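Steps 3 to 5 can be sketched in plain Python. This is an illustrative sketch, not a reference implementation. The helper names are hypothetical. The chi-square CDF is computed from the series expansion of the regularized lower incomplete gamma function, which is adequate for moderate values of the statistic. For a two-tailed test, the sketch uses one common convention: doubling the smaller tail area.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for chi-square with df degrees of freedom (hypothetical helper),
    via the series for the regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = total = 1.0
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return math.exp(a * math.log(z) - z - math.lgamma(a + 1)) * total

def sd_test(n, s2, sigma0_sq, tail, alpha=0.05):
    """Chi-square test for a population standard deviation (hypothetical helper).

    s2: sample variance; sigma0_sq: variance under H0;
    tail: 'left', 'right', or 'two' (direction of the alternative hypothesis).
    Returns (statistic, P-value, reject H0?).
    """
    stat = (n - 1) * s2 / sigma0_sq           # Step 3
    lower = chi2_cdf(stat, n - 1)             # Step 4
    if tail == "left":
        p = lower
    elif tail == "right":
        p = 1 - lower
    else:
        p = min(1.0, 2 * min(lower, 1 - lower))
    return stat, p, p < alpha                 # Step 5

print(sd_test(11, 0.064, 0.06, "right"))
```

The usage line uses the forester's numbers from the example further down: n = 11, S² = 0.064, σ0² = 0.06, right-tailed. With these inputs the statistic is about 10.667 and the P-value is roughly 0.38. This is in line with the decision there to fail to reject the null hypothesis.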

The alternative hypothesis is that the true population standard deviation is not equal to 3.25. We want to test the null hypothesis, H0: σ = 3.25, against the alternative hypothesis, H1: σ ≠ 3.25, at the 0.0333 level of significance. Note that this is a two-tailed test.

This is a test on a population mean where the population standard deviation is unknown (we only know the sample standard deviation \(s = 1.8\)). So we use a t-distribution to calculate the p-value. Because the alternative hypothesis uses <, the p-value is the area in the left tail of the distribution.

The hypothesis test will be evaluated using a significance level of \(\alpha = 0.05\). We want to consider the data under the scenario that the null hypothesis is true. In this case, the sample mean is from a distribution that is nearly normal and has mean 7 and standard deviation of about 0.17. Such a distribution is shown in Figure 4.15.

Below these are summarized into six such steps to conducting a test of a hypothesis. Set up the hypotheses and check conditions: Each hypothesis test includes two hypotheses about the population. One is the null hypothesis, notated as \(H_0\), which is a statement of a particular parameter value. This hypothesis is assumed to be true until there is ...

Use the form of the alternative hypothesis to determine if the test is left-tailed, right-tailed, or two-tailed. Collect the sample information for the test and identify the significance level. When the population standard deviation is known, find the p-value (the area in the corresponding tail) for the test using the normal distribution.

The test statistic is

\[ \chi^2 = \frac{(n - 1)S^2}{\sigma_0^2} = \frac{(11 - 1)(0.064)}{0.06} = 10.667 \]

We fail to reject the null hypothesis. The forester does NOT have enough evidence to support the claim that the variance is greater than 0.06 gal². You can also estimate the p-value using the same method ...

For hypothesis testing, we need to decide between two competing theories. ... But what should we plug in for the population standard deviation $\sigma$? When we don't know the value of a parameter, we will generally plug in our best estimate for the parameter. In this case, that corresponds to plugging in $\hat{\sigma}$, or our sample standard ...

This is the mean. If I did 1 standard deviation, 2 standard deviations, 3 standard deviations-- that's in the positive direction. Actually let me draw it a little bit different than that. This wasn't a nicely drawn bell curve, but I'll do 1 standard deviation, 2 standard deviation, and then 3 standard deviations in the positive direction.

Bigelow: \(n_2 = 120\), \(s_2 = 0.0428\). Assuming that the diameters of the bearings from both companies are normally distributed, test the claim that there is no difference in the variation of the diameters between the two companies. The hypotheses are \(H_0: \sigma_1 = \sigma_2\) and \(H_a: \sigma_1 \neq \sigma_2\).

The p-value is the area under the standard normal distribution that is more extreme than the test statistic in the direction of the alternative hypothesis. Make a decision. If \(p \leq \alpha\) reject the null hypothesis. If \(p>\alpha\) fail to reject the null hypothesis. State a "real world" conclusion.

Distribution for the test: Use \(t_{df}\), where df is calculated using the df formula for independent groups, two population means. Using a calculator, df is approximately 18.8462. Do not pool the variances. Calculate the p-value using a Student's t-distribution: p-value = 0.0054.

A test statistic assesses how consistent your sample data are with the null hypothesis in a hypothesis test. Test statistic calculations take your sample data and boil them down to a single number that quantifies how much your sample diverges from the null hypothesis. As a test statistic value becomes more extreme, it indicates larger ...

Hypothesis Testing Calculator. The first step in hypothesis testing is to calculate the test statistic. The formula for the test statistic depends on whether the population standard deviation (σ) is known or unknown. If σ is known, our hypothesis test is known as a z test and we use the z distribution. If σ is unknown, our hypothesis test is ...

Determine if the data supports a hypothesis at a given significance level using known distributions. Topic. This lesson covers: Hypothesis Testing with Known Standard Deviation. Openstax Introductory Statistics: 9.1 Null and Alternative Hypotheses. 9.2 Outcomes and the Type I and Type II Errors. Introductory Statistics by Sheldon Ross, 3rd ...

Full Hypothesis Test Examples. Example 8.6.4. Statistics students believe that the mean score on the first statistics test is 65. A statistics instructor thinks the mean score is higher than 65. He samples ten statistics students and obtains the scores 65, 65, 70, 67, 66, 63, 63, 68, 72, 71.
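For raw data like these scores, the one-sample t statistic can be computed directly with Python's standard `statistics` module. The short sketch below assumes, as the example states, H0: μ = 65 against H1: μ > 65. The P-value step is left to a t table or calculator with 9 degrees of freedom.

```python
import math
import statistics

scores = [65, 65, 70, 67, 66, 63, 63, 68, 72, 71]
mu0 = 65  # hypothesized mean under H0

n = len(scores)
xbar = statistics.mean(scores)   # sample mean
s = statistics.stdev(scores)     # sample standard deviation (n - 1 in the denominator)
t = (xbar - mu0) / (s / math.sqrt(n))
print(n - 1, xbar, round(s, 3), round(t, 3))
```

This gives a sample mean of 67, s ≈ 3.197, and t ≈ 1.978 with 9 degrees of freedom.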

The hypothesis testing procedure for one standard deviation is called the one standard deviation \(\chi^2\)-test. Hypothesis testing for variances follows the same step-wise procedure as hypothesis tests for the mean: ... We want to test whether the standard deviation of the height of female students is significantly smaller than the standard ...


Exercise 8.3.6. It is believed that a stock price for a particular company will grow at a rate of $5 per week with a standard deviation of $1. An investor believes the stock won't grow as quickly. The changes in stock price are recorded for ten weeks and are as follows: $4, $3, $2, $3, $1, $7, $2, $1, $1, $2.

How to conduct a hypothesis test for a mean value, using a one-sample t-test. The test procedure is illustrated with examples for one- and two-tailed tests. ... where s is the standard deviation of the sample, N is the population size, and n is the sample size. When the population size is much larger (at least 20 times larger) than the sample ...

Calculation Example: There are six steps you would follow in hypothesis testing: Formulate the null and alternative hypotheses in three different ways:

- \(H_0: \theta = \theta_0\) versus \(H_1: \theta \neq \theta_0\)
- \(H_0: \theta \leq \theta_0\) versus \(H_1: \theta > \theta_0\)
- \(H_0: \theta \geq \theta_0\) versus \(H_1: \theta < \theta_0\)

Answer. This is a test of two independent groups, two population means, population standard deviations known. Random Variable: \(\bar{X}_1 - \bar{X}_2\) = difference in the mean number of months the competing floor waxes last. The words "is more effective" say that wax 1 lasts longer than wax 2, on average.

