4. Using data to support an argument; making inferences
4.1 Estimating a population parameter from a sample statistic and estimating the size of a sample.
In Module 1 you learned how it was possible to estimate the population mean from a sample mean. For example, in opinion polls a sample of voters is selected, and, if the sample is representative of the population, then you can be confident that the population mean falls within the range of the sample mean ± a margin of error.
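This range is called a confidence interval and can be expressed as a formula:
$$\mu = \bar{x} \pm \text{CI}$$
where
$\mu$ is the population mean which is being estimated
$\bar{x}$ is a measured sample mean
CI stands for "confidence interval" (strictly, here it is the margin of error that is added to and subtracted from $\bar{x}$)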
Consider the situation where we are estimating the mean of a population based on the mean of a sample (perhaps we want to estimate the average number of cars per family in Sydney).
The process we follow is:
Choose 100 samples, each of 50 families (samples don't always have to be of size 50).
For the first sample, we determine the mean number of cars per family.
When the 50 families have been interviewed, the numbers of cars in the families are totalled and the total is divided by 50 to give the mean number of cars per family for that sample.
On the basis of this sample, we calculate a confidence interval for the mean of the population.
The steps above are repeated another 99 times - once on each remaining sample, and another 99 confidence intervals are formed.
If we want 95% confidence that our intervals capture the true population value, then we expect about 95 of the 100 intervals to contain that true value. This is what a 95% confidence level means.
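To see what "about 95 of the 100 intervals" means in practice, here is a minimal Python simulation sketch. It does not use real Sydney data: the population of 100,000 families and the probabilities of a family owning 0 to 4 cars are made-up assumptions. Each of 100 samples of 50 families gives an interval $\bar{x} \pm 1.960\,\sigma/\sqrt{50}$, using the population standard deviation $\sigma$, and the code counts how many intervals contain the true population mean.

```python
import math
import random
import statistics

# Hypothetical population: 100,000 families, each owning 0 to 4 cars.
# The weights below are invented for illustration only.
random.seed(1)
population = random.choices([0, 1, 2, 3, 4], weights=[10, 35, 35, 15, 5], k=100_000)

true_mean = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population standard deviation

Z = 1.960          # Z value for 95% confidence
SAMPLE_SIZE = 50   # families per sample
NUM_SAMPLES = 100  # number of samples (and intervals) formed


def confidence_interval(sample):
    """95% interval for the mean: x_bar +/- Z * sigma / sqrt(n)."""
    x_bar = statistics.fmean(sample)
    margin = Z * sigma / math.sqrt(len(sample))
    return x_bar - margin, x_bar + margin


covered = 0
for _ in range(NUM_SAMPLES):
    sample = random.sample(population, SAMPLE_SIZE)  # sample without replacement
    low, high = confidence_interval(sample)
    if low <= true_mean <= high:
        covered += 1

print(f"True mean: {true_mean:.3f} cars per family")
print(f"{covered} of {NUM_SAMPLES} intervals contain the true mean")
```

With a different random seed the count changes a little, but it should usually be close to 95.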
We can express the formula for a confidence interval in a more precise way:
$$\mu = \bar{x} \pm Z\frac{\sigma}{\sqrt{n}}$$
which includes a special number Z, the standard deviation of the population ($\sigma$) and the sample size (n).
The special Z number is related to the level of confidence we want for our experiment. Three commonly used levels of confidence and their Z values are tabulated below:
Confidence level    Z
90%                 1.645
95%                 1.960
99%                 2.575
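The Z values come from the standard normal distribution: for a confidence level c, Z is the point that leaves (1 − c)/2 of the distribution in each tail. The sketch below, using Python's standard `statistics.NormalDist`, reproduces them; the helper name `z_value` is our own.

```python
from statistics import NormalDist


def z_value(confidence):
    """Two-sided Z value: leaves (1 - confidence)/2 of the standard
    normal distribution in each tail."""
    return NormalDist().inv_cdf((1 + confidence) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: Z = {z_value(level):.3f}")
# 90% confidence: Z = 1.645
# 95% confidence: Z = 1.960
# 99% confidence: Z = 2.576
```

The 99% value rounds to 2.576; the module uses 2.575, a common textbook truncation, and the difference does not change the rounded answers in this section.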
Here's how it works.
EXAMPLE
From past experience the standard deviation of rod diameters produced by a machine has been found to be 0.135 cm. For a simple random sample of 30 rods the average diameter is found to be 3.560 cm. Calculate:
(a) the 95% confidence interval for the population mean diameter
(b) the 90% confidence interval for the population mean diameter.
(a) Identify
n = 30
$\sigma$ = 0.135 cm
$\bar{x}$ = 3.56 cm
Z = 1.960
$$\mu = 3.56 \pm 1.960 \times \frac{0.135}{\sqrt{30}} = 3.56 \pm 0.048$$
So we have 95% confidence that the true population mean lies between 3.512 cm and 3.608 cm.
(b) Identify
n = 30
$\sigma$ = 0.135 cm
$\bar{x}$ = 3.56 cm
Z = 1.645
$$\mu = 3.56 \pm 1.645 \times \frac{0.135}{\sqrt{30}} = 3.56 \pm 0.041$$
So we have 90% confidence that the true population mean lies between 3.519 cm and 3.601 cm.
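The arithmetic in parts (a) and (b) can be checked with a short Python sketch. The function `mean_interval` is a hypothetical helper of our own that simply evaluates $\bar{x} \pm Z\sigma/\sqrt{n}$ when the population standard deviation is known.

```python
import math


def mean_interval(x_bar, sigma, n, z):
    """Confidence interval x_bar +/- z * sigma / sqrt(n) for a population
    mean when the population standard deviation sigma is known."""
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Rod diameters: sigma = 0.135 cm, n = 30 rods, sample mean 3.560 cm
for level, z in (("95%", 1.960), ("90%", 1.645)):
    low, high = mean_interval(3.560, 0.135, 30, z)
    print(f"{level}: {low:.3f} cm to {high:.3f} cm")
# 95%: 3.512 cm to 3.608 cm
# 90%: 3.519 cm to 3.601 cm
```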
EXAMPLE
Consider the rod diameters example above, but now suppose a simple random sample of 60 rods is chosen to estimate the average diameter of the population using a 95% confidence interval.
Identify
n = 60
$\sigma$ = 0.135 cm
$\bar{x}$ = 3.56 cm
Z = 1.960
$$\mu = 3.56 \pm 1.960 \times \frac{0.135}{\sqrt{60}} = 3.56 \pm 0.034$$
So we have 95% confidence that the true population mean lies between 3.526 cm and 3.594 cm.
From the above examples we note:
Sample size affects the width of the confidence interval: a larger sample gives a narrower interval, because the width is proportional to $1/\sqrt{n}$ (doubling n from 30 to 60 narrowed the 95% interval by a factor of $\sqrt{2}$).
Confidence level affects the width of the confidence interval: higher confidence gives a wider interval, so our estimate of the true mean is less precise. (We have lost precision, but gained confidence.)
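Both effects can be seen by computing the full width $2Z\sigma/\sqrt{n}$ of the interval directly. This sketch reuses the rod example's $\sigma$ = 0.135 cm and the Z values from the table; the 120-rod case is a hypothetical extra added for comparison, and `interval_width` is our own helper name.

```python
import math

SIGMA = 0.135  # cm, rod-diameter standard deviation from the example
Z_VALUES = {"90%": 1.645, "95%": 1.960, "99%": 2.575}


def interval_width(z, sigma, n):
    """Full width of the interval x_bar +/- z * sigma / sqrt(n)."""
    return 2 * z * sigma / math.sqrt(n)


# Effect of sample size at 95% confidence (n = 120 is hypothetical)
for n in (30, 60, 120):
    print(f"n = {n:3d}: width = {interval_width(1.960, SIGMA, n):.4f} cm")

# Effect of confidence level with n = 30
for level, z in Z_VALUES.items():
    print(f"{level}: width = {interval_width(z, SIGMA, 30):.4f} cm")
```

Doubling n shrinks the width by a factor of $\sqrt{2}$ (about 0.097 cm to 0.068 cm here), and quadrupling n halves it, while moving from 90% to 99% confidence widens the interval.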
In fact, we can use the confidence interval technique to determine what size of sample should be taken to achieve a required confidence in our results.
EXAMPLE
A social worker wants to determine, with 95% confidence and a maximum error of $60, the average wage earned by teenagers during vacation employment. Previous studies have suggested that $\sigma$ = $430. What sample size should be used to achieve this?
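Identify
$\sigma$ = $430
Z = 1.960
maximum error E = $60
Since the maximum error is $E = Z\sigma/\sqrt{n}$, solving for n gives
$$n = \left(\frac{Z\sigma}{E}\right)^2 = \left(\frac{1.960 \times 430}{60}\right)^2 \approx 197.3$$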
So we would take a sample size of 198, rounding up to the next whole number (a sample of 197 would give a maximum error slightly above $60).
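The sample-size calculation can be sketched in Python as follows. The helper name `required_sample_size` is our own; it evaluates $(Z\sigma/E)^2$ and rounds up with `math.ceil`, since rounding down would give a maximum error above the target.

```python
import math


def required_sample_size(z, sigma, max_error):
    """Smallest whole n with z * sigma / sqrt(n) <= max_error,
    i.e. (z * sigma / max_error) ** 2 rounded up."""
    return math.ceil((z * sigma / max_error) ** 2)


n = required_sample_size(1.960, 430, 60)
print(n)  # 198
```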
It is also common to estimate a population proportion, that is, the proportion of a population that has some characteristic.
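In this case, the error term is slightly different. It depends upon:
the same confidence levels of 90%, 95% and 99% (and hence the same Z values)
a different standard deviation term based on p, the proportion measured in the sample.
The confidence interval for the population proportion is
$$p \pm Z\sqrt{\frac{p(1-p)}{n}}$$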
EXAMPLE
USA Today carried out a poll for CNN to answer the question "Do you agree that the current system discourages the best candidates from running for president?" From a sample of 1460 people, 320 responded "strongly agree".
REFERENCE: Becker, Jean (1998) Voters Favor a National Primary, USA Today, 5 February 1988, p 8A.
Calculate a 99% confidence interval for the population proportion who would respond "strongly agree".
Identify
n = 1460
p = 320/1460 = 0.219
Z = 2.575
$$0.219 \pm 2.575\sqrt{\frac{0.219 \times 0.781}{1460}} = 0.219 \pm 0.028$$
So we have 99% confidence that the true population proportion lies between 19.1% and 24.7%.
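The proportion calculation can be checked with this Python sketch, which uses the figures from the worked solution (320 out of n = 1460). The helper name `proportion_interval` is our own; it evaluates $p \pm Z\sqrt{p(1-p)/n}$.

```python
import math


def proportion_interval(successes, n, z):
    """Interval p +/- z * sqrt(p * (1 - p) / n), where p = successes / n
    is the proportion measured in the sample."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


low, high = proportion_interval(320, 1460, 2.575)
print(f"{low:.1%} to {high:.1%}")  # 19.1% to 24.7%
```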
Thus, when newspapers conduct a survey on whether Australia should become a republic, they can claim, at a stated level of confidence, that the proportion of the population in favour of the proposal lies within a particular interval.
Questions
1. A quality control officer takes a random sample of 400 hinges from a production line and finds that 44 are defective.
2. A maker of fishing rods produces rods whose breaking strength has a standard deviation of 2.25 kg.
Estimating the mean when $\sigma$, the population standard deviation, is not known
This is a common situation: often we know only the sample standard deviation, s. When this is so, s is used in place of $\sigma$.
In addition, another confidence number, called t, is used in place of Z. Any questions in this module that require t will quote its value for you; you will not have to derive it. Using t is particularly appropriate when the sample size is small.
So the formula becomes
$$\mu = \bar{x} \pm t\frac{s}{\sqrt{n}}$$
EXAMPLE
A test was conducted to determine the length of time required for students to read a fixed number of pages in a book. All students were instructed to read at the fastest speed which allowed them to comprehend the book. A sample of sixteen students took the test, with a mean time of 24.19 minutes and a sample standard deviation of s = 5.29 minutes.
Question: Estimate the mean length of time (in minutes) required for all students to read the book, using a 95% confidence interval with t = 2.131.
(a) 21.37 to 27.01
(b) 21.43 to 26.95
(c) 23.6 to 24.8
(d) 23.8 to 28.4
(e) 22.3 to 26.1
Answer: (a)
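Answer (a) can be confirmed with a short Python sketch of $\bar{x} \pm t\,s/\sqrt{n}$. The helper name `t_interval` is our own, and the t value is passed in as quoted in the question rather than derived.

```python
import math


def t_interval(x_bar, s, n, t):
    """Interval x_bar +/- t * s / sqrt(n), used when only the sample
    standard deviation s is known. The t value is supplied, not derived."""
    margin = t * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Reading test: 16 students, mean 24.19 min, s = 5.29 min, t = 2.131
low, high = t_interval(24.19, 5.29, 16, 2.131)
print(f"{low:.2f} to {high:.2f} minutes")  # 21.37 to 27.01 minutes
```

The same function can be used for the 99% question that follows by passing t = 2.947.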
Question: Estimate the mean length of time (in minutes) required for all students to read the book, using a 99% confidence interval with t = 2.947.