Large Sample Estimation of a Population Proportion

7.3 Large Sample Estimation of a Population Proportion

Learning Objective

To understand how to apply the formula for a confidence interval for a population proportion.

From Section 6.3 "The Sample Proportion" in Chapter 6 "Sampling Distributions" we know the mean, standard deviation, and sampling distribution of the sample proportion $\hat{p}$, so the ideas of the previous two sections can be applied to produce a confidence interval for a population proportion. Here is the formula.

Large Sample $100 (1 - α) \%$ Confidence Interval for a Population Proportion

$\hat{p} \pm z_{α ∕ 2} \sqrt{\frac{\hat{p} (1 - \hat{p})}{n}}$

A sample is large if the interval $[p − 3 σ_{\hat{P}}, p + 3 σ_{\hat{P}}]$ lies wholly within the interval $[0,1] .$

In actual practice the value of p is not known, hence neither is $σ_{\hat{P}} .$ In that case we substitute the known quantity $\hat{p}$ for p in making the check; this means checking that the interval

$\left[ \hat{p} - 3 \sqrt{\frac{\hat{p} (1 - \hat{p})}{n}}, \hat{p} + 3 \sqrt{\frac{\hat{p} (1 - \hat{p})}{n}} \right]$

lies wholly within the interval $[0,1] .$
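
The check just described can be sketched in a few lines of Python. This is only an illustration: the function name `is_large_sample` and the two sample values are hypothetical and do not come from the text.

```python
import math

def is_large_sample(p_hat: float, n: int) -> bool:
    """Return True if [p_hat - 3*se, p_hat + 3*se] lies wholly within [0, 1].

    se is the estimated standard deviation of the sample proportion,
    sqrt(p_hat * (1 - p_hat) / n), with p_hat substituted for the unknown p.
    """
    margin = 3 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin >= 0 and p_hat + margin <= 1

# Hypothetical samples, chosen only for illustration.
print(is_large_sample(0.30, 200))  # interval is roughly [0.203, 0.397]
print(is_large_sample(0.95, 40))   # upper endpoint is roughly 1.053, which exceeds 1
```

The second sample fails the check even though n is larger than 30, because $\hat{p}$ is close to 1.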

Example 7

To estimate the proportion of students at a large college who are female, a random sample of 120 students is selected. There are 69 female students in the sample. Construct a 90% confidence interval for the proportion of all students at the college who are female.

Solution:

The proportion of students in the sample who are female is $\hat{p} = 69 ∕ 120 = 0.575 .$

Confidence level 90% means that $α = 1 - 0.90 = 0.10$ so $α ∕ 2 = 0.05 .$ From the last line of Figure 12.3 "Critical Values of " we obtain $z_{0.05} = 1.645 .$

Thus

$\hat{p} \pm z_{α ∕ 2} \sqrt{\frac{\hat{p} (1 - \hat{p})}{n}} = 0.575 \pm 1.645 \sqrt{\frac{(0.575) (0.425)}{120}} = 0.575 \pm 0.074$

One may be 90% confident that the true proportion of all students at the college who are female is contained in the interval $(0.575 - 0.074, 0.575 + 0.074) = (0.501, 0.649)$.
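
The arithmetic in this example can be reproduced with a short Python sketch. The function name `proportion_ci` is an assumption made for illustration. The critical value is taken from the standard library's `statistics.NormalDist` rather than from a printed table, so it is about 1.6449 instead of the rounded 1.645.

```python
import math
from statistics import NormalDist
from typing import Tuple

def proportion_ci(x: int, n: int, confidence: float) -> Tuple[float, float]:
    """Large-sample confidence interval for a population proportion.

    x is the number of sample members with the characteristic of interest,
    n is the sample size, and confidence is a level such as 0.90.
    """
    p_hat = x / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Data from the example: 69 female students in a sample of 120, 90% confidence.
low, high = proportion_ci(69, 120, 0.90)
print(f"({low:.3f}, {high:.3f})")  # prints (0.501, 0.649)
```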

Key Takeaways

We have a single formula for a confidence interval for a population proportion, which is valid when the sample is large.
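The condition that a sample be large is not that its size n be at least 30, but that the interval $\left[ \hat{p} - 3 \sqrt{\frac{\hat{p} (1 - \hat{p})}{n}}, \hat{p} + 3 \sqrt{\frac{\hat{p} (1 - \hat{p})}{n}} \right]$ lie wholly within the interval $[0,1]$.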
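
To see why a fixed cutoff such as 30 cannot serve as the criterion, the following Python sketch searches for the smallest sample size that passes the three-standard-deviation check for a given sample proportion. The function name `min_large_n`, the search limit, and the proportions tried are assumptions made for illustration.

```python
import math
from typing import Optional

def min_large_n(p_hat: float, n_max: int = 100_000) -> Optional[int]:
    """Smallest n for which [p_hat - 3*se, p_hat + 3*se] lies within [0, 1].

    Returns None if no n up to n_max passes (for example when p_hat is 0 or 1
    the estimated standard deviation is 0 and the check is trivially passed at n = 1).
    """
    for n in range(1, n_max + 1):
        margin = 3 * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin >= 0 and p_hat + margin <= 1:
            return n
    return None

# Illustrative sample proportions; the required size grows as p_hat nears 0 or 1.
for p_hat in (0.575, 0.22, 0.93):
    print(p_hat, min_large_n(p_hat))
```

With these inputs the thresholds come out to 13, 32, and 120 respectively, so a sample of 100 can still be too small when $\hat{p}$ is 0.93.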

Exercises

Basic

Information about a random sample is given. Verify that the sample is large enough to use it to construct a confidence interval for the population proportion. Then construct a 90% confidence interval for the population proportion.
1. n = 25, $\hat{p} = 0.7$
2. n = 50, $\hat{p} = 0.7$
Information about a random sample is given. Verify that the sample is large enough to use it to construct a confidence interval for the population proportion. Then construct a 95% confidence interval for the population proportion.
1. n = 2500, $\hat{p} = 0.22$
2. n = 1200, $\hat{p} = 0.22$
Information about a random sample is given. Verify that the sample is large enough to use it to construct a confidence interval for the population proportion. Then construct a 98% confidence interval for the population proportion.
1. n = 80, $\hat{p} = 0.4$
2. n = 325, $\hat{p} = 0.4$
Information about a random sample is given. Verify that the sample is large enough to use it to construct a confidence interval for the population proportion. Then construct a 99.5% confidence interval for the population proportion.
1. n = 200, $\hat{p} = 0.85$
2. n = 75, $\hat{p} = 0.85$
In a random sample of size 1,100, 338 have the characteristic of interest.
1. Compute the sample proportion $\hat{p}$ with the characteristic of interest.
2. Verify that the sample is large enough to use it to construct a confidence interval for the population proportion.
3. Construct an 80% confidence interval for the population proportion p.
4. Construct a 90% confidence interval for the population proportion p.
5. Comment on why one interval is longer than the other.
In a random sample of size 2,400, 420 have the characteristic of interest.
1. Compute the sample proportion $\hat{p}$ with the characteristic of interest.
2. Verify that the sample is large enough to use it to construct a confidence interval for the population proportion.
3. Construct a 90% confidence interval for the population proportion p.
4. Construct a 99% confidence interval for the population proportion p.
5. Comment on why one interval is longer than the other.

Applications

A security feature on some web pages is graphic representations of words that are readable by human beings but not machines. When a certain design format was tested on 450 subjects, by having them attempt to read ten disguised words, 448 subjects could read all the words.
1. Give a point estimate of the proportion p of all people who could read words disguised in this way.
2. Show that the sample is not sufficiently large to construct a confidence interval for the proportion of all people who could read words disguised in this way.
In a random sample of 900 adults, 42 defined themselves as vegetarians.
1. Give a point estimate of the proportion of all adults who would define themselves as vegetarians.
2. Verify that the sample is sufficiently large to use it to construct a confidence interval for that proportion.
3. Construct an 80% confidence interval for the proportion of all adults who would define themselves as vegetarians.
In a random sample of 250 employed people, 61 said that they bring work home with them at least occasionally.
1. Give a point estimate of the proportion of all employed people who bring work home with them at least occasionally.
2. Construct a 99% confidence interval for that proportion.
In a random sample of 1,250 household moves, 822 were moves to a location within the same county as the original residence.
1. Give a point estimate of the proportion of all household moves that are to a location within the same county as the original residence.
2. Construct a 98% confidence interval for that proportion.
In a random sample of 12,447 hip replacement or revision surgery procedures nationwide, 162 patients developed a surgical site infection.
1. Give a point estimate of the proportion of all patients undergoing a hip surgery procedure who develop a surgical site infection.
2. Verify that the sample is sufficiently large to use it to construct a confidence interval for that proportion.
3. Construct a 95% confidence interval for the proportion of all patients undergoing a hip surgery procedure who develop a surgical site infection.
In a certain region prepackaged products labeled 500 g must contain on average at least 500 grams of the product, and at least 90% of all packages must weigh at least 490 grams. In a random sample of 300 packages, 288 weighed at least 490 grams.
1. Give a point estimate of the proportion of all packages that weigh at least 490 grams.
2. Verify that the sample is sufficiently large to use it to construct a confidence interval for that proportion.
3. Construct a 99.8% confidence interval for the proportion of all packages that weigh at least 490 grams.
A survey of 50 randomly selected adults in a small town asked them their opinion on a proposed “no cruising” restriction late at night. Responses were coded 1 for in favor, 0 for indifferent, and 2 for opposed, with the results shown in the table.
$\begin{matrix} 1 & 0 & 2 & 0 & 1 & 0 & 0 & 1 & 1 & 2 \\ 0 & 2 & 0 & 0 & 0 & 1 & 0 & 2 & 0 & 0 \\ 0 & 2 & 1 & 2 & 0 & 0 & 0 & 2 & 0 & 1 \\ 0 & 2 & 0 & 2 & 0 & 1 & 0 & 0 & 2 & 0 \\ 1 & 0 & 0 & 1 & 2 & 0 & 0 & 2 & 1 & 2 \end{matrix}$
1. Give a point estimate of the proportion of all adults in the community who are indifferent concerning the proposed restriction.
2. Assuming that the sample is sufficiently large, construct a 90% confidence interval for the proportion of all adults in the community who are indifferent concerning the proposed restriction.
To try to understand the reason for returned goods, the manager of a store examines the records on 40 products that were returned in the last year. Reasons were coded by 1 for “defective,” 2 for “unsatisfactory,” and 0 for all other reasons, with the results shown in the table.
$\begin{matrix} 0 & 2 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 \\ 0 & 0 & 2 & 0 & 0 & 0 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \end{matrix}$
1. Give a point estimate of the proportion of all returns that are because of something wrong with the product, that is, either defective or performed unsatisfactorily.
2. Assuming that the sample is sufficiently large, construct an 80% confidence interval for the proportion of all returns that are because of something wrong with the product.
In order to estimate the proportion of entering students who graduate within six years, the administration at a state university examined the records of 600 randomly selected students who entered the university six years ago, and found that 312 had graduated.
1. Give a point estimate of the six-year graduation rate, the proportion of entering students who graduate within six years.
2. Assuming that the sample is sufficiently large, construct a 98% confidence interval for the six-year graduation rate.
In a random sample of 2,300 mortgages taken out in a certain region last year, 187 were adjustable-rate mortgages.
1. Give a point estimate of the proportion of all mortgages taken out in this region last year that were adjustable-rate mortgages.
2. Assuming that the sample is sufficiently large, construct a 99.9% confidence interval for the proportion of all mortgages taken out in this region last year that were adjustable-rate mortgages.
In a research study in cattle breeding, 159 of 273 cows in several herds that were in estrus were detected by means of an intensive once a day, one-hour observation of the herds in early morning.
1. Give a point estimate of the proportion of all cattle in estrus who are detected by this method.
2. Assuming that the sample is sufficiently large, construct a 90% confidence interval for the proportion of all cattle in estrus who are detected by this method.
A survey of 21,250 households concerning telephone service gave the results shown in the table.

	Landline	No Landline
Cell phone	12,474	5,844
No cell phone	2,529	403
1. Give a point estimate for the proportion of all households in which there is a cell phone but no landline.
2. Assuming the sample is sufficiently large, construct a 99.9% confidence interval for the proportion of all households in which there is a cell phone but no landline.
3. Give a point estimate for the proportion of all households in which there is no telephone service of either kind.
4. Assuming the sample is sufficiently large, construct a 99.9% confidence interval for the proportion of all households in which there is no telephone service of either kind.

Additional Exercises

In a random sample of 900 adults, 42 defined themselves as vegetarians. Of these 42, 29 were women.
1. Give a point estimate of the proportion of all self-described vegetarians who are women.
2. Verify that the sample is sufficiently large to use it to construct a confidence interval for that proportion.
3. Construct a 90% confidence interval for the proportion of all self-described vegetarians who are women.
A random sample of 185 college soccer players who had suffered injuries that resulted in loss of playing time was made with the results shown in the table. Injuries are classified according to severity of the injury and the condition under which it was sustained.

	Minor	Moderate	Serious
Practice	48	20	6
Game	62	32	17
1. Give a point estimate for the proportion p of all injuries to college soccer players that are sustained in practice.
2. Construct a 95% confidence interval for the proportion p of all injuries to college soccer players that are sustained in practice.
3. Give a point estimate for the proportion p of all injuries to college soccer players that are either moderate or serious.
4. Construct a 95% confidence interval for the proportion p of all injuries to college soccer players that are either moderate or serious.
The body mass index (BMI) was measured in 1,200 randomly selected adults, with the results shown in the table.

	BMI
	Under 18.5	18.5–25	Over 25
Men	36	165	315
Women	75	274	335
1. Give a point estimate for the proportion of all men whose BMI is over 25.
2. Assuming the sample is sufficiently large, construct a 99% confidence interval for the proportion of all men whose BMI is over 25.
3. Give a point estimate for the proportion of all adults, regardless of gender, whose BMI is over 25.
4. Assuming the sample is sufficiently large, construct a 99% confidence interval for the proportion of all adults, regardless of gender, whose BMI is over 25.
Confidence intervals constructed using the formula in this section often do not do as well as expected unless n is quite large, especially when the true population proportion is close to either 0 or 1. In such cases a better result is obtained by adding two successes and two failures to the actual data and then computing the confidence interval. This is the same as using the formula
$\begin{matrix} \tilde{p} \pm z_{α ∕ 2} \sqrt{\frac{\tilde{p} (1 - \tilde{p})}{\tilde{n}}} \\ \text{where} \\ \tilde{p} = \frac{x + 2}{n + 4} \text{ and } \tilde{n} = n + 4 \end{matrix}$
Suppose that in a random sample of 600 households, 12 had no telephone service of any kind. Use the adjusted confidence interval procedure just described to form a 99.9% confidence interval for the proportion of all households that have no telephone service of any kind.
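
As a rough illustration of the adjusted procedure, the following Python sketch applies it to the data of Example 7 rather than to the household data above. The function name `adjusted_proportion_ci` is hypothetical, and the critical value comes from the standard library's `statistics.NormalDist` instead of a printed table.

```python
import math
from statistics import NormalDist
from typing import Tuple

def adjusted_proportion_ci(x: int, n: int, confidence: float) -> Tuple[float, float]:
    """Confidence interval after adding two successes and two failures.

    Uses p_tilde = (x + 2) / (n + 4) and n_tilde = n + 4 in place of
    p_hat and n in the large-sample formula.
    """
    n_tilde = n + 4
    p_tilde = (x + 2) / n_tilde
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value z_{alpha/2}
    margin = z * math.sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde - margin, p_tilde + margin

# Data from Example 7: 69 successes in a sample of 120, 90% confidence.
low, high = adjusted_proportion_ci(69, 120, 0.90)
print(f"({low:.4f}, {high:.4f})")
```

For these data the adjusted interval runs from roughly 0.4995 to 0.6457, a small shift toward 0.5 compared with the unadjusted interval (0.501, 0.649).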

Large Data Set Exercises

Large Data Sets 4 and 4A list the results of 500 tosses of a die. Let p denote the proportion of all tosses of this die that would result in a four. Use the sample data to construct a 90% confidence interval for p.

https://www.gone.2012books.lardbucket.org/sites/all/files/data4.xls

https://www.gone.2012books.lardbucket.org/sites/all/files/data4A.xls
Large Data Set 6 records results of a random survey of 200 voters in each of two regions, in which they were asked to express whether they prefer Candidate A for a U.S. Senate seat or prefer some other candidate. Use the full data set (400 observations) to construct a 98% confidence interval for the proportion p of all voters who prefer Candidate A.

https://www.gone.2012books.lardbucket.org/sites/all/files/data6.xls
Lines 2 through 536 in Large Data Set 11 are a sample of 535 real estate sales in a certain region in 2008. Those that were foreclosure sales are identified with a 1 in the second column.

https://www.gone.2012books.lardbucket.org/sites/all/files/data11.xls
1. Use these data to construct a point estimate $\hat{p}$ of the proportion p of all real estate sales in this region in 2008 that were foreclosure sales.
2. Use these data to construct a 90% confidence interval for p.
Lines 537 through 1106 in Large Data Set 11 are a sample of 570 real estate sales in a certain region in 2010. Those that were foreclosure sales are identified with a 1 in the second column.

https://www.gone.2012books.lardbucket.org/sites/all/files/data11.xls
1. Use these data to construct a point estimate $\hat{p}$ of the proportion p of all real estate sales in this region in 2010 that were foreclosure sales.
2. Use these data to construct a 90% confidence interval for p.

Answers

1. (0.5492, 0.8508)
2. (0.5934, 0.8066)
1. (0.2726, 0.5274)
2. (0.3368, 0.4632)
1. 0.3073
2. $\hat{p} \pm 3 \sqrt{\frac{\hat{p} \hat{q}}{n}} = 0.31 \pm 0.04$ and $[0.27, 0.35] \subset [0,1]$
3. (0.2895, 0.3251)
4. (0.2844, 0.3302)
5. Asking for greater confidence requires a longer interval.

1. 0.9956
2. (0.9862, 1.005)
1. 0.244
2. (0.1740, 0.3140)
1. 0.013
2. (0.01, 0.016)
3. (0.011, 0.015)
1. 0.52
2. (0.4038, 0.6362)
1. 0.52
2. (0.4726, 0.5674)
1. 0.5824
2. (0.5333, 0.6315)

1. 0.69
2. $\hat{p} \pm 3 \sqrt{\frac{\hat{p} \hat{q}}{n}} = 0.69 \pm 0.21$ and $[0.48, 0.90] \subset [0,1]$
3. $0.69 \pm 0.12$
1. 0.6105
2. (0.5552, 0.6658)
3. 0.5583
4. (0.5214, 0.5952)

$(0.1368, 0.1912)$
1. $\hat{p} = 0.2280$
2. $(0.1982, 0.2579)$