Background: We had previously done some simulations in Mathematica showing what happens when numerous samples of sizes 100, 1,000 and 10,000 were drawn from various populations.
Students were asked to write a response to the following: You are a political consultant who has been asked to predict the winner in what is expected to be a very close race for a Senate seat. There are two candidates: a Democrat and a Republican. A previous poll of a random sample of people who are likely to vote has found that 49% of the sample favor the Democrat. The poll has a reported margin of error of plus or minus 4%, at 95% confidence. Explain how you might use a computer simulation to determine how large a sample you would need to reduce the margin of error to 2%. If the poll were repeated with a sample of this size, would you necessarily get a better basis for predicting a winner?
Here is what they said. Student responses are in black. My remarks are in red. To see how I would have answered, look at the end of this document.
-In order to reduce the margin of error, increase the number of people polled along with the number of samples. More individuals in a sample, or more samples, both will yield more information. But when we speak of "margin of error," we generally mean to refer to a single sample.
-Yes. With each time (averaged w/ the others), the margin of error as well as the confidence would increase. You should note that there is a tradeoff between margin of error and level of confidence. Even with a single sample, your margin of error can be made smaller at the expense of confidence.
-In order to gain a 2% margin of error, you must sample a large enough group of the population. You must sample until less than 5% of the sample group is further away than 2% from the target value. This statement doesn't make any sense in the context. The sampled units are being tested to see if they are Democrats or Republicans. How could an individual be "2% from the target value"? The previous sentence is a misunderstanding of what is meant by level of confidence. The correct idea is: we must choose a sample size so large that when samples of that size are taken over and over again, less than 5% of the samples have a statistic differing from the population parameter by more than 2%. We raised the size of the sample to 10,000 and easily attained a margin of error of less than 2%. It was easy because we already knew the target, or actual, value.
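As an illustration of this correct idea, here is a minimal Mathematica sketch (a hypothetical reconstruction, not the notebook we actually used in class) that checks the claim about samples of size 10,000: it draws 500 simulated samples from a population that is 49% Democrat and reports the fraction of samples whose statistic lands within 2% of the true value.

    (* draw 500 simulated samples of size 10,000 from a population    *)
    (* that is 49% Democrat; each entry is one sample's statistic     *)
    props = RandomVariate[BinomialDistribution[10000, 0.49], 500]/10000.;
    (* fraction of samples whose statistic is within 2% of the parameter *)
    Count[props, p_ /; Abs[p - 0.49] <= 0.02]/500.

The result is essentially 1 on virtually every run, which is why a margin of error of less than 2% was "easily attained" at that sample size.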
In order to use simulations to determine how large a sample would be needed, one must know the percentage of the variable being measured as reflected in the entire simulated population. It would then be necessary to determine what size sample is needed to consistently measure within 2% of the variable as measured in the population at large. Good explanation, except the word "consistently" needs to be quantified. Consistently would mean "95% of the time" in an answer to the question I posed; in other contexts, it could mean 99% of the time or 99.99% of the time. It depends on the level of confidence you wish to attain.
So, taking the first poll at its word, we would input into our simulation that the number of Democrats in the population at large is measured at about 49%. Using this number, it would then be easy (if so, how?) to calculate a sample size in which the variable sampled measures at 49% (+-1/2) with a reasonable confidence rate.
-78% level of confidence in a 2% margin of error (1000=pop.) Seems to be a reference to the results of a particular simulation we did, in which we found that of a large number of samples of size 1000, 78% happened to have a statistic within 2% of the (known) population parameter.
b/t 1000 & 10,000
-(100 samples of) 10,000 would not be affected by size
-sample size 10,000
-took 100 random samples from a population 51% Rep
-statistic was w/in 2% of 51% for all 100 samples
Luck?
    (sample size)   (w/in 2%)
    100             76%
    1,000           8?%
    10,000          100%
Not clear what any of this is intended to mean.
Observation: Even if we did reduce the margin of error to 2%, the Democrat still has a good chance of winning because he could have as much as 51% of the vote. A chance, but I would not say "good." It is possible to calculate the chance of the Democrat winning, given that the poll of 1000 voters yields a statistic of 51% Republican.
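Here is one hedged way to do that calculation (my own sketch, not part of the class materials): treat the poll as a simple random sample and put a uniform prior on the true Democrat share p; the posterior is then a Beta distribution, and the Democrat's chance of winning is the posterior probability that p exceeds 1/2.

    (* hypothetical: 1000 voters polled, 490 Democrat, 510 Republican; *)
    (* with a uniform prior, the posterior on the Democrat share p is  *)
    (* BetaDistribution[490 + 1, 510 + 1]                              *)
    1 - CDF[BetaDistribution[491, 511], 1/2] // N

Under these assumptions the chance comes out to roughly 26%, about one in four: a real chance, but not one I would call "good."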
Using the simulations, increase the sample size until a 2% margin of error is found. Give a more detailed description of how to do this. How do you know if a given sample size will result in a 2% margin of error at a confidence level of 95%? No, there is still a margin of error, so there is a chance that if you take the poll the prediction could still be off just as much as with a 4% margin of error. Good observation; of course, the chance becomes smaller the larger the sample from which the statistic was obtained.
I understand that if you increase the sample size, the statistic you end up with is more accurate. We were trying to get a 2% margin of error, so did we just get lucky by choosing 1000 as our sample size? Good question; this goes to the heart of the matter. See the next student's answer. I am still unclear on how, from the information we gathered, we can calculate the confidence level (which I used to understand, but it seems to have slipped away from me).
From table 1-1 (page 40 of Moore, Statistics: Concepts and Controversies) it is seen that a population percentage of near 50 (49%) and a margin of error of +-4% is acquired from a sample size of either 750 or 1000. That margin of error decreases to +-3 when the population sampled increases to 1500. Thus, the margin of error decreases as the population sampled increases. To get a margin of error of +-2, the population sampled would have to be increased, probably to 2000 (or more). You clearly understand the table very well, though your guess is low.
If the poll is repeated with this sample size, we would have a better basis for predicting the winner b/c the margin of error would have decreased. While we could not be certain that the prediction of a winner would come true, we would be closer to the truth b/c of the decrease in the margin of error. To be even more certain we would need to increase the sampling population again.
This is all supported by the following: "You can get a smaller margin of error by having a larger sample." (pg. 39) This response shows a good understanding of margin of error and level of confidence, but it does not address the question I posed of how to use simulations to determine the sample size needed to achieve a given margin of error with a given level of confidence.
According to the table in the book, when the population percentage is near 50 and the margin of error is +-4, the sample size is 750-1000. So, if you increase the sample size to 2000-2250, you could reduce the margin of error to +-2. But, the confidence level would be less. At first, you seem to be assuming that margin of error is inversely proportional to sample size---so a doubling of sample size, you think, would cut the margin of error in half. This is NOT the case. You generally need to quadruple sample size to halve margin of error and preserve level of confidence (as the last response, below, accurately points out). You are right, therefore, to say that the confidence level would be lower. But I don't know if this is what motivated you to add the last statement.
Use simulations to run sample sizes which over a period of time show an average. This average can be used to determine a margin of error and the larger the sample, the smaller the margin of error. If sampling would be repeated, we would not necessarily get a better basis for predicting a winner because the confidence rate could go down since the margin is so close. I don't have a clear idea of what you're trying to say.
Use simulations to determine the sample size to get +-4% margin of error. Then quadruple the sample size to cut the margin of error by half---the desired 2%. This is a useful rule of thumb. Though not exact, it's generally more than adequate for practical purposes when the parameter you are trying to measure is between 10 and 90 percent, and sample sizes are more than 100.
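The rule of thumb works because margin of error shrinks like the square root of the sample size. A quick check in Mathematica, using the standard 95%-confidence approximation of roughly 2 Sqrt[p (1 - p)/n] for the margin (a textbook formula underlying tables like Moore's, not something we derived in class):

    (* approximate 95% margin of error for a sample of size n *)
    margin[n_, p_: 0.5] := 2 Sqrt[p (1 - p)/n] // N
    {margin[625], margin[2500]}   (* quadrupling n: {0.04, 0.02} *)

Quadrupling the sample size from 625 to 2500 cuts the margin from 4% to 2%, exactly as the rule predicts.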
The teacher's thoughts on the problem
To repeat the problem: You are a political consultant who has been asked to predict the winner in what is expected to be a very close race for a Senate seat. There are two candidates: a Democrat and a Republican. A previous poll of a random sample of people who are likely to vote has found that 49% of the sample favor the Democrat. The poll has a reported margin of error of plus or minus 4%, at 95% confidence. Explain how you might use a computer simulation to determine how large a sample you would need to reduce the margin of error to 2%. If the poll were repeated with a sample of this size, would you necessarily get a better basis for predicting a winner?
Based on the previous poll, we know that the proportion of Democrats is near 50%. I can write a program that simulates choosing a random sample of size n from a population that is 50% Democrat. To accomplish the task set in the problem, I would choose a particular n (say n=500) and run the program several hundred times. Then I would determine what proportion of the computer-generated samples had a proportion of Democrats between 48% and 52%. If I found that 95% of the samples were in this range, then I would be satisfied that 500 was large enough. If not, I would try a larger value for n (say n=1000), and again create several hundred samples. Eventually, I'll find a value for n for which the samples the computer produces are between 48% and 52% Democrat at least 95% of the time. I would assume the same sample size would produce results with the same margin of error and level of confidence for a population that did not have exactly---but very nearly---50% Democrat.
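A sketch of how this search might look in Mathematica (hypothetical code in the spirit of our class simulations; the function name hitRate and the particular candidate sizes are my own choices):

    (* hitRate[n, trials]: the fraction of simulated samples of size n,  *)
    (* drawn from a 50% Democrat population, whose statistic lands       *)
    (* within 2% of the true 50%                                         *)
    hitRate[n_, trials_: 500] :=
      Count[RandomVariate[BinomialDistribution[n, 0.5], trials]/N[n],
        p_ /; Abs[p - 0.5] <= 0.02]/N[trials]

    (* try successively larger n; keep the first that succeeds at least  *)
    (* 95% of the time                                                   *)
    Select[{500, 1000, 1500, 2000, 2500, 3000}, hitRate[#] >= 0.95 &, 1]

On a typical run this settles on a sample size of about 2500 (occasionally 3000, since the hit rate is itself only estimated by simulation).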
If I were to go ahead and do a poll using a sample of this size, it would be entirely possible for me to get a statistic of 50% Democrat, and in this case the poll would not give me any grounds for predicting a winner. Or, I could get a statistic of 49% Democrat. Then, I would have to say that a Democratic loss was more probable than the alternative, but my level of confidence in a Democratic loss would be lower than 95%. Moral: though larger samples are more reliable in determining the percent Democrat, this does not necessarily mean that larger samples will make it easier to predict a winner in a close race. In fact, the race might be so close that no poll of less than the entire population could create reasonable confidence in an outcome.
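To put a rough number on the 49% scenario (the same hedged uniform-prior approach sketched earlier, and assuming the 2%-margin poll used about 2500 voters, per the simulation above): a statistic of 49% Democrat would mean 1225 Democrats and 1275 Republicans.

    (* posterior probability of a Democratic loss,      *)
    (* i.e. that the true Democrat share is below 1/2   *)
    CDF[BetaDistribution[1226, 1276], 1/2] // N

That comes out to roughly 84%: a Democratic loss is indeed the way to bet, but the confidence falls well short of the 95% standard.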