Takehome Portion Examples

The following examples illustrate the use of the MA145 technology link with the type of problems that appear on the takehome portion of the exam.

The problems fall into two major categories:

Interval Estimation (Confidence Intervals)
Hypothesis Testing

Most of the problems on the exam deal with hypothesis testing.

The hypothesis testing problems fall into two subcategories:

Testing Hypotheses on a single mean or proportion
Testing Hypotheses comparing two means or proportions

Example 1

A researcher counts gypsy moth (Lymantria dispar) egg masses in 76 quarter-acre plots. A much larger survey from the previous season indicated an average of 52.1 egg masses per quarter-acre with a standard deviation of 12.4. If the average number of egg masses per quarter-acre in the sample is 46.1, do the data support the claim that the level of infestation (as measured by the average number of egg masses per quarter-acre) is lower this year?

Solution: Note the following when reading the problem:

Although egg masses are counted, the quantity of interest is an average per plot, not the fraction of a sample having some attribute, so we are not dealing with proportions.
There is no mention of matching or pairing of data from two samples, so we are not dealing with a paired test.
There are two means mentioned (52.1 and 46.1)
There is only one standard deviation mentioned (12.4)
There is only one sample size mentioned (76 quarter-acre plots: n=76)
The standard deviation mentioned does not appear to be calculated from the sample of 76, but from a "much larger survey" for which n is not specified. This suggests that we should interpret this as the known population standard deviation sigma.
Apparently the question of interest is whether or not the level of infestation as measured by egg masses per quarter acre is lower this year.

In general, the null hypothesis can be almost anything, but by convention it is usually associated with the situation where nothing has changed: the treatment had no effect, there was no change in the level of a measured variable, the mean this year is the same as last year.

In this case, the likely choice for a null hypothesis would be a statement contradicting the claim that the infestation is lower:

Null Hypothesis H_0: The mean number of egg masses per quarter-acre is 52.1, the same as last year.

Alternative Hypothesis H_1: The mean number of egg masses per quarter acre is less than 52.1.

The wording of a problem comparing two means is similar to that of a problem testing a hypothesis about a single mean, but the fact that only one sample size is given suggests there is only one sample mean, which would be the case in testing a hypothesis about a mean. The value of the mean under the null hypothesis does not come from a sample, but is purely hypothetical.

So we would conclude that:

In this example, we are testing a hypothesis about a population mean
We consider the population standard deviation sigma to be known
The input parameters are:
- The mean under the null hypothesis is 52.1
- The (known) population standard deviation is 12.4
- The sample mean is 46.1
- The sample size is 76

The MA145 technology page for "hypothesis testing, sigma known" lists the required inputs as the mean under H_0, the sample mean, the (known) population standard deviation sigma, the sample size, and the alpha level (which is chosen by the person analyzing the data).

From the wording of the claim, we would perform a one-sided test, "left tailed" because we want to reject the null hypothesis for sufficiently small values of the sample mean, and we do not care about sample means higher than the mean under the null hypothesis.
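As a cross-check on the inputs identified for Example 1, the left-tailed z test can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the MA145 technology page itself; the function name `left_tailed_z_test` and the alpha level of 0.05 are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist


def left_tailed_z_test(mu0, xbar, sigma, n):
    """Return (z, p_value) for H_0: mu = mu0 against H_1: mu < mu0, sigma known."""
    standard_error = sigma / sqrt(n)      # standard deviation of the sample mean
    z = (xbar - mu0) / standard_error     # standardized distance of xbar from mu0
    p_value = NormalDist().cdf(z)         # area to the left of z under N(0, 1)
    return z, p_value


# Inputs taken from Example 1
z, p_value = left_tailed_z_test(mu0=52.1, xbar=46.1, sigma=12.4, n=76)

alpha = 0.05  # assumed here; the analyst chooses the alpha level
print(f"z = {z:.3f}, p-value = {p_value:.6f}")
print("reject H_0" if p_value < alpha else "fail to reject H_0")
```

With these inputs the test statistic is roughly -4.22 and the p-value is far below 0.001, so under the assumed alpha the null hypothesis would be rejected in favor of a lower mean this year.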

Example 2

Ocean color is known to correlate strongly with phytoplankton levels, green indicating higher and blue lower levels of phytoplankton. Ocean color in the Gulf of Alaska is measured on a grid of 120 randomly selected coordinate points using satellite imagery from 2005. The same analysis is performed on a lower-resolution image taken 3 years earlier, using 80 randomly selected coordinate points. In the 2005 data, the wavelength of the peak absorption color is 492.1 nanometers with a standard deviation of 43.2. The 2002 data has peak absorption at 511 nanometers with a standard deviation of 32.4. Given that the wavelength of green light is 510 nm and that of blue light is 475 nm, does this data indicate a significant shift in ocean color towards blue (and therefore, a reduction in phytoplankton)?

Solution: Note the following when reading the problem:

The data are not counts or percentages, so we are not dealing with proportions.
There is no mention of matching or pairing of data from two samples, so we are not dealing with a paired test.
There are two means mentioned (492.1 and 511)
There are two standard deviations mentioned (43.2 and 32.4)
There are two sample sizes mentioned (120 coordinate points for 2005 and 80 for 2002)
The standard deviations mentioned appear to be calculated from the samples, indicating that the population standard deviation sigma is unknown.
Apparently the question of interest is whether or not the data indicates a shift in ocean color towards blue (i.e., a shorter peak wavelength) from 2002 to 2005.

In this case, the likely choice for a null hypothesis would be a statement contradicting the claim that there is a shift in ocean color towards blue:

Null Hypothesis H_0: There is no significant difference between the peak absorption wavelengths in 2002 and 2005 (i.e., the two populations, 2002 and 2005, have the same mean)

Alternative Hypothesis H_1: The peak absorption wavelength is lower in 2005 (the population means are different, and the 2005 mean is lower)

The wording of a problem comparing two means is similar to that of a problem testing a hypothesis about a single mean, but the fact that we are given two sample means, two apparent sample standard deviations, and two sample sizes suggests that this problem can be handled as inference about two means, with sigma unknown.

So we would conclude that:

In this example, we are comparing two sample means
We consider the population standard deviation sigma to be unknown and estimated from the samples
The input parameters are:
- The two sample means (511 and 492.1)
- The two sample standard deviations (32.4 and 43.2, in the same 2002, 2005 order as the means)
- The two sample sizes (80 and 120)

The MA145 technology page for "inference about two means, sigma unknown" lists the required inputs as the two sample means, the two sample standard deviations, and the two sample sizes (and also the alpha level, which is not based on data but chosen by the data analyst).
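To see how the six inputs of Example 2 combine, here is a minimal Python sketch of the two-sample t statistic computed from summary statistics. It assumes the unpooled (Welch) form of the test; the source does not say whether the MA145 page pools the variances, so treat that choice, and the function name `welch_t`, as assumptions. Only the statistic and the approximate degrees of freedom are computed; the p-value would still come from a t table or from the technology page.

```python
from math import sqrt


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for comparing two means from summary statistics (Welch form)."""
    v1 = sd1 ** 2 / n1                    # estimated variance of the first sample mean
    v2 = sd2 ** 2 / n2                    # estimated variance of the second sample mean
    t = (mean1 - mean2) / sqrt(v1 + v2)   # standardized difference of the sample means
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Sample 1 is 2005 (492.1, 43.2, n=120); sample 2 is 2002 (511, 32.4, n=80).
# A negative t means the 2005 mean wavelength is lower, i.e. shifted towards blue.
t_stat, df = welch_t(492.1, 43.2, 120, 511.0, 32.4, 80)
print(f"t = {t_stat:.3f}, approximate df = {df:.1f}")
```

Because the alternative hypothesis says the 2005 mean is lower, this is a left-tailed test on t when the 2005 sample is entered first. Keeping each standard deviation and sample size matched to its own year matters: swapping them changes the standard error.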

Example 3

Mosquito traps are placed near 43 small ponds and a count of Culex species in the traps is obtained during a baseline period. At the end of the baseline period a spraying program is conducted. One week after the spraying, the traps are cleaned and a second collection period is initiated. Based on an estimate of the size of each pond, the raw counts are converted to a density of Culex mosquitoes per square foot of pond. The difference between the before and after densities is found to have a sample mean of 9.1 and a sample standard deviation of 12.0. Test whether or not the data indicates that the spraying was effective in reducing the density of Culex species mosquitoes.

Solution: Note the following when reading the problem:

The data are not counts or percentages, so we are not dealing with proportions.
The description of the experiment mentions use of the same traps for two samples, before and after spraying.
There is only one mean mentioned (9.1), described as the mean of the differences between the before and after measures (one difference per trap).
There is only one standard deviation mentioned (12.0).
There is only one sample size mentioned (43 ponds).
The standard deviation is described as a sample standard deviation of the differences of the before and after measures.
Apparently the question of interest is whether or not the spraying reduces the density of mosquitoes.

In this case, the likely choice for a null hypothesis would be a statement that the spraying has no effect:

Null Hypothesis H_0: The mean population density of mosquitoes is the same before and after spraying (i.e., spraying is not effective).

Alternative Hypothesis H_1: The mean population density is lower after spraying (spraying is effective).

So we would conclude that:

This is an example of inference about two means with paired or dependent samples.
The input parameters are:
- The sample mean of the difference between before and after measures for each trap (9.1)
- The sample standard deviation of the difference between before and after measures for each trap (12.0)
- The sample size, in this case, the number of traps (pairs) (n=43)

The MA145 technology page for inference about two means with paired samples lists the required inputs as the mean difference between paired measures, the standard deviation of the difference, and the sample size (number of pairs), plus the alpha level.
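The paired test in Example 3 reduces to a one-sample t test on the differences, which the following Python sketch illustrates. It assumes each difference is computed as before minus after, so that a positive mean difference indicates a reduction; the problem statement does not spell out the sign convention, and the function name `paired_t` is hypothetical. The p-value is left to a t table or the technology page.

```python
from math import sqrt


def paired_t(mean_diff, sd_diff, n_pairs):
    """Return (t, df) for H_0: mean difference = 0, from summary statistics."""
    standard_error = sd_diff / sqrt(n_pairs)   # standard error of the mean difference
    t = mean_diff / standard_error             # hypothesized mean difference is 0
    df = n_pairs - 1                           # one difference per pair
    return t, df


# Inputs taken from Example 3 (differences assumed to be before minus after)
t_paired, df_paired = paired_t(mean_diff=9.1, sd_diff=12.0, n_pairs=43)
print(f"t = {t_paired:.3f}, df = {df_paired}")
```

Under the assumed sign convention the alternative hypothesis corresponds to a right-tailed test, and the statistic comes out at roughly 4.97 with 42 degrees of freedom.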

Example 4

In a double-blind study of Major Depressive Disorder (MDD), 43 subjects are treated with serotonin-reuptake inhibitors (SRIs) while 32 are given a placebo. After 8 weeks of treatment, 16 subjects in the SRI group have experienced a remission of MDD, while 6 subjects in the placebo group have. Can we conclude that patients receiving the drug are more likely to remit than those receiving a placebo?

Solution: Note the following when reading the problem:

The data are counts, so we are dealing with proportions.
Two samples are mentioned (The SRI group and the placebo group).
There are two remission counts, one for each group.
Apparently the question of interest is whether SRIs produce remission of MDD at a higher rate than placebo.

The data provided suggests that this is inference about two proportions.

Null Hypothesis H_0: There is no difference between the proportion of subjects with MDD who remit when treated with SRIs and the proportion who remit when given a placebo.

Alternative Hypothesis H_1: A higher proportion of the MDD subjects treated with SRIs remit.

So we would conclude that:

This is an example of inference about two proportions.
The only input parameters are:
- The proportions of the SRI and placebo groups remitting (16/43 and 6/32)
- The two sample sizes (43 and 32)

The MA145 technology page for inference about two proportions lists the required inputs as the two proportions and the two sample sizes, plus the alpha level.
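The calculation behind Example 4 can be sketched in Python as a right-tailed two-proportion z test. This sketch assumes the usual pooled estimate of the common proportion under H_0 and the normal approximation; whether the MA145 page uses exactly this form is not stated in the source, and the function name `two_proportion_z` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Return (z, p_value) for H_0: p1 = p2 against H_1: p1 > p2 (pooled z test)."""
    p1 = x1 / n1                           # sample proportion, group 1
    p2 = x2 / n2                           # sample proportion, group 2
    pooled = (x1 + x2) / (n1 + n2)         # common proportion assumed under H_0
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    p_value = 1 - NormalDist().cdf(z)      # area to the right of z
    return z, p_value


# Group 1 is SRI (16 of 43 remit); group 2 is placebo (6 of 32 remit)
z_prop, p_prop = two_proportion_z(16, 43, 6, 32)
print(f"z = {z_prop:.3f}, one-sided p-value = {p_prop:.4f}")
```

With these counts z is roughly 1.74 and the one-sided p-value is about 0.04, so the conclusion depends on the alpha level chosen: it would be significant at 0.05 but not at 0.01.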

Example 5

A government health agency estimates the level of HIV infection in a certain area at 15.3%. Testing is performed on a random sample of 45 residents and 12 of the tests are positive. Does this data support the agency figure?

Solution: Note the following when reading the problem:

The data are counts, so we are dealing with proportions.
Only one sample size is mentioned.
There is only one count mentioned.
Apparently the question of interest is whether the HIV infection level is 15.3%.

The data provided suggests that this is a test of a hypothesis about a proportion.

Null Hypothesis H_0: The HIV infection level is 15.3%.

Alternative Hypothesis H_1: The HIV infection level is different from 15.3%.

So we would conclude that:

This is an example of hypothesis testing using proportions.
The only input parameters are:
- The proportion when the null hypothesis is true (15.3%).
- The sample proportion (12/45).
- The sample size (45)

The MA145 technology page for tests of hypotheses on proportions lists the required inputs as the proportion when the null hypothesis is true, the sample proportion, and the sample size, plus the alpha level.
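For Example 5, the two-sided test of a single proportion can be sketched as follows. The sketch assumes the normal approximation to the sampling distribution of the sample proportion; the function name `one_proportion_z` is hypothetical, and the source does not say which method the MA145 page applies.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(p0, successes, n):
    """Return (z, p_value) for H_0: p = p0 against H_1: p != p0 (two-sided z test)."""
    p_hat = successes / n                       # sample proportion
    standard_error = sqrt(p0 * (1 - p0) / n)    # computed under H_0
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # both tails
    return z, p_value


# Inputs taken from Example 5: 12 positives out of 45, hypothesized level 15.3%
z_one, p_one = one_proportion_z(p0=0.153, successes=12, n=45)
print(f"z = {z_one:.3f}, two-sided p-value = {p_one:.4f}")
```

Here z is roughly 2.12 and the two-sided p-value is about 0.03. Note that the expected number of positives under H_0 is 45 × 0.153 ≈ 6.9, which is small for a normal approximation, so the result should be read with some caution.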