Week 12--13. Statistics.
Inferring population parameters from sample statistics; margin of error and
level of confidence
Basic ideas this week:
Much of statistics is concerned with the problem of obtaining information about
a population from information about a sample. One very vivid application is
currently in the news: polls attempt to determine the way a population will
vote by examining the voting patterns within a sample.
The idea of generalizing from a sample to a population is not hard to grasp
in a loose and informal way, since we do this all the time. After a few visits
to a store, for example, we notice that the produce is not fresh. So we assume
that the store generally has bad produce. This is a generalization from a sample
(the vegetables we have examined) to a population (all the vegetables the store
sells). But there are many ways to go wrong or to misunderstand the meaning
of the data obtained from a sample.
How do statisticians conceive of the process of drawing a conclusion about
a population from a sample? How do they describe the information that is gained
from a sample and quantify how informative it is? How much data do we need in
order to reach a conclusion that is secure enough to print in a newspaper? Or
on which to base medical decisions? These are the questions that we will address
this week.
The simplest example arises when one uses a sample to infer a population proportion.
We can give a fairly complete account of the mathematical ideas that are used
in this situation, based on the binomial distribution. My aim is to enable you
to understand the internal mathematical "clockwork" of how the statistical
theory works.
Assignment:
- Read: Chapter 8, sections 1, 2 and 3. For the time being,
do not worry about passages that contain references to the "normal distribution"
or the "Central Limit Theorem". (Last sentence on page 328, last
paragraph on p. 330, first paragraph on p. 332.) Also, do not worry for the
time being about the examples in section 3.2.
- Review questions: pages 335 and 351.
- Problems: p. 336: 1--8, 11, 12, 13, 14. p. 351: 1--12,
13, 16, 21, 22.
- In-class: p. 337: 20.
- EXTRA CREDIT: Find an article in the New York Times
that describes a poll. The New York Times provides readers with
a very careful explanation of margin of error and level of confidence; find
their explanation either in an issue of the paper or on the paper's web site,
and report on it. Compare with the information provided by other papers.
Vocabulary:
- Parameters and statistics:
- population mean: the average value of a variable, where
the reference class is a population of interest. E.g., the average height
of all persons owning a Louisiana driver's license. This is a parameter.
- sample mean: the average value of a variable, where
the reference class is a sample from the population. This is a statistic.
It is also a variable that has as its reference class all possible samples.
- population proportion: the proportion of a population
with a given property. E.g., the proportion of registered voters in East
Baton Rouge who are Republican. This is a parameter.
- sample proportion: the proportion of a sample with
the property. This is a statistic. It is also a variable that has as its
reference class all possible samples.
- Sampling distribution: the distribution of a variable whose
reference class consists of all samples (of some fixed size) drawn from some
population.
- Example: Consider the population of all LSU students, and consider
drawing samples of size 100. The variable is the average height of the
people in the sample. (Here we are looking at the distribution of the
sample mean.)
- Example: Use the same population and the same sample size,
but now consider the variable "percent male". This is again
something that can be measured in each sample. The sampling distribution
tells us the relative frequency of each possible sample percent in the
reference class of all samples.
- Margin of error: a bound that we can confidently place
on the difference between an estimate of something and the true value.
- Level of confidence: a measure of how confident we are
in a given margin of error.
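One way to make the idea of a sampling distribution concrete is to let a computer draw the samples. The sketch below is only an illustration: the function name and the assumed 50% proportion are mine (hypothetical), not figures for any real student body. It draws many samples of size 100 and tabulates the relative frequency of each sample count.

```python
import random
from collections import Counter

def empirical_sampling_distribution(p, sample_size, num_samples, seed=0):
    """Approximate the sampling distribution of a sample count by simulation.

    p is the ASSUMED population proportion (hypothetical value).
    Returns a dict mapping each observed count k to its relative frequency.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_samples):
        # Each member of the sample has the property with probability p.
        k = sum(1 for _ in range(sample_size) if rng.random() < p)
        counts[k] += 1
    return {k: counts[k] / num_samples for k in sorted(counts)}

if __name__ == "__main__":
    dist = empirical_sampling_distribution(p=0.5, sample_size=100, num_samples=2000)
    for k, freq in dist.items():
        # One '#' per half percent of the samples: a crude text histogram.
        print(f"{k:3d}% {'#' * round(freq * 200)}")
```

With a sample size of 100, the count in each sample is also the sample percent, so the printed bars show how the sample percents cluster around the population value.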
Concepts:
We will concentrate on estimating population
proportions by sampling.
- If we were to take many samples (of a given size) from a population that
was 40% Democratic (say), then few samples would have exactly 40% Democrats.
Most would be close to 40%, but they would differ by varying small amounts.
- If many random samples of size 100 are drawn from a large population (of
Democrats and non-Democrats), then we can expect better than 95% of the samples
to have a statistic (proportion of Democrats in sample) within one tenth of
the true value of the parameter (proportion of Democrats in the population).
(For example, if the true proportion is 40%, then better than 19 out of 20
samples of size 100 will have a proportion between 30% and 50%.)
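The "better than 95%" claim can be checked by simulation. The following is a minimal sketch, not a proof: the function name and the particular true proportions tried are my own choices. It draws 5000 random samples of size 100 and reports what share of them land within one tenth of the true proportion.

```python
import random

def fraction_within(p, sample_size, margin, num_samples, seed=0):
    """Share of simulated samples whose proportion is within `margin` of p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        # Count the members of this sample that have the property.
        k = sum(1 for _ in range(sample_size) if rng.random() < p)
        # Small tolerance so that exactly 30% or 50% counts as "within".
        if abs(k / sample_size - p) <= margin + 1e-9:
            hits += 1
    return hits / num_samples

if __name__ == "__main__":
    for p in (0.2, 0.4, 0.5):
        share = fraction_within(p, sample_size=100, margin=0.10, num_samples=5000)
        print(f"true proportion {p:.0%}: {share:.1%} of samples within 10 points")
```

Trying several true proportions is deliberate: the claim is meant to hold whatever the population proportion happens to be.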
The meaning of margin of error and level of confidence.
- What you know about a population when you have a sample of size 100 is similar
to what you know about the contents of a jar of gum balls if you have the
following information:
- there are only two colors of gum ball in the jar (yellow and purple,
say);
- there are 19 gum balls of one color and 1 of the other;
- you have reached in and drawn a ball at random; it is yellow.
If you make it your policy under such situations to bet that yellow is
the predominant color, in the long run you will be right 19 out of 20 times.
Similarly, when I say that a certain survey method has a margin of error of
plus or minus E at a level of confidence of x%, what I mean is that when
that method is used over and over, in x% of all cases, the true value of
the parameter will be within E of the statistic.
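The gum ball policy can be played out many times on a computer. This is a rough sketch under my own assumptions: the function name is hypothetical, and the majority color of each jar is chosen by a coin flip. It fills a jar with 19 balls of one color and 1 of the other, draws one ball, bets that the drawn color is the predominant one, and records how often the bet wins.

```python
import random

def long_run_success_rate(trials, seed=0):
    """Fraction of trials in which betting on the drawn color is correct."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        # Assumption: which color predominates is decided by a coin flip.
        majority = rng.choice(["yellow", "purple"])
        minority = "purple" if majority == "yellow" else "yellow"
        jar = [majority] * 19 + [minority]
        drawn = rng.choice(jar)
        # The policy: always bet that the drawn color is the predominant one.
        if drawn == majority:
            wins += 1
    return wins / trials

if __name__ == "__main__":
    print(f"bet was right in {long_run_success_rate(20000):.1%} of trials")
```

The rate should settle near 19 out of 20. This is the sense in which a 95% level of confidence describes the method used over and over rather than any single draw.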
The margin of error and level of confidence depend on
the sample size (and NOT on population size):
- The size of the population being studied---provided it is much bigger than
the samples and provided that the sample is truly random---does not matter.
(In practice, one of the greatest challenges to the researcher is to get a
truly random sample.)
- The margin of error at 95% confidence is about equal to or smaller than
the square root of the reciprocal of the sample size. Thus, samples of 400
have a margin of error of less than around 1/20 at 95% confidence.
- To halve the margin of error at a given confidence level, quadruple the
sample size.
- The margin of error and the level of confidence are tied together. A better
(i.e., narrower) margin of error may be traded for a lesser level of confidence,
or a higher level of confidence may be obtained by tolerating a larger margin
of error.
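The square root rule above is easy to turn into a small calculator. The sketch below assumes only the rule of thumb stated here (margin at 95% confidence is at most about 1 divided by the square root of the sample size); the function names are mine.

```python
import math

def margin_of_error_95(sample_size):
    """Rule-of-thumb margin of error at 95% confidence: 1 / sqrt(n)."""
    return 1 / math.sqrt(sample_size)

def sample_size_for_margin(margin):
    """Smallest sample size whose rule-of-thumb margin is at most `margin`."""
    # Solve 1 / sqrt(n) <= margin for n; the tiny tolerance guards
    # against floating-point round-off pushing the result up by one.
    return math.ceil(1 / margin**2 - 1e-9)

if __name__ == "__main__":
    for n in (100, 400, 1600):
        print(f"n = {n:4d}: margin of about {margin_of_error_95(n):.3f}")
    print("sample size for a 3-point margin:", sample_size_for_margin(0.03))
```

Going from 100 to 400 to 1600 quadruples the sample size each time and halves the margin each time, which is the trade described in the list above.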
The underlying idea that explains how we can determine the reliability of statistics
is the notion of sampling distribution. In order to talk about this, I introduce
a new term: by a "p-population",
I mean a very large population that has proportion p of some
characteristic that is of interest, e.g., being a Democrat.
- The binomial distribution tells us EXACTLY how likely it is for a random
sample of size n from a p-population to
have exactly k members with the characteristic of interest.
- Exact values for margin of error and level of confidence of statistics on
population proportions are derived from the binomial distribution.
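For readers who want to see the binomial calculation itself, here is a minimal sketch using the standard binomial formula. The function names are mine, and the numbers (n = 100, p = 0.4) are taken from the example earlier on this page.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability that a sample of size n from a p-population has exactly k."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_count_between(lo, hi, n, p):
    """Probability that the count in the sample is between lo and hi inclusive."""
    return sum(binomial_pmf(k, n, p) for k in range(lo, hi + 1))

if __name__ == "__main__":
    n, p = 100, 0.4
    print(f"P(exactly 40 of {n}) = {binomial_pmf(40, n, p):.4f}")
    print(f"P(between 30 and 50 of {n}) = {prob_count_between(30, 50, n, p):.4f}")
```

The first figure shows how rarely a sample matches the population proportion exactly. The second is the level of confidence that goes with a margin of error of one tenth for samples of size 100 from a 0.4-population.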
Skills:
- Explain the vocabulary above, and illustrate with examples.
- Explain what it means when a reporter or researcher says that a poll has
a margin of error of 3 percentage points (say) at a level of confidence 95%
(say).
- Use a table to determine the levels of confidence and margins of error
that can be obtained with various sample sizes when attempting to determine
population proportions.
- Use the square root law to estimate the sample size needed to get a given
margin of error at better than 95% confidence. (See text, page 350.)
Assessments:
- A jar of colored beads may be an analogy for more meaningful situations
that you might encounter (e.g., in the news). List some examples and draw
the analogy explicitly. For example: someone wants to predict the outcome
of an election by means of an exit poll. All the people who voted are analogous
to all the beads in the jar. The color of bead is analogous to the vote---e.g.,
color : bead :: candidate voted for : voter. The people who are questioned
in the poll are analogous to the sample.
- Suppose a large population is 40% red. Imagine that you have drawn a sample
of size 20 from this population. Describe what you think a typical sample
might be like.
- Suppose that you have drawn a sample of size 20 from a population of unknown
proportion red, and that your sample is 40% red. What do you think you can deduce
about the population?
- A random sample of size 100 from a population of voters is 52% Republican.
What do you think the true proportion of Republicans in the population is?
- Do you know anything more than just that the true proportion is
near 52%?
- Imagine a large bin with pieces of paper---or a jar filled with colored
beads. Describe what we would do in order to estimate the sampling distribution
empirically.
- If we draw 1000 samples, each of size 400, from a population that is 30%
red, then how many samples will have a statistic of exactly 30% (the population
proportion that you decided to work with)? What will the greatest deviation
from p be?