Final Exam

Tuesday December 8, 5:30--7:30PM

The questions in the first three sections were selected and prepared by the students. The class is free to view the final before it is administered.


Instructions. Do 1, 4, 7, 8 and one additional problem from each section, for a total of 7 problems in all. Try the Bonus Problem.


Polls

1. A. List three or more things that can cause the statistic obtained by a poll to differ from the parameter that the poll is attempting to estimate. B. A national television network wants to conduct a poll to find out how many Americans will travel this Christmas. The pollsters have several ideas of how to generate a sample. For each of their ideas, explain whether or not it would generate a fair, random sample that would produce accurate results. Then, label the BEST option:

2. How do margin of error and confidence level relate to each other? What happens to the margin of error if the confidence level increases? Decreases? How do you change the margin of error while keeping the same confidence level?

3. Comment on the article at (location to be provided by student---Kyle???).


Studies and Experiments

4. A. What is the difference between an experiment and an observational study? Illustrate with an example of each, and draw contrasts between the examples. B. Describe a scenario in which two variables are associated and have a causal relationship. Describe a scenario in which two variables are associated and do NOT have a causal relationship.

5. The double blind procedure is used frequently in experiments on human subjects. In a double blind experiment, neither the person who is administering the treatment nor the subject knows the value of the explanatory variable in the treatment. What is the advantage of this form of experimentation? Describe how one might conduct a double blind experiment to test the effectiveness of zinc lozenges as remedies for the common cold.

6. In an experiment designed to see the effect that placement in a zoo has on animals' mental state, 50 American black bears were studied. 25 were observed in their natural environment, while 25 were placed in a section of the zoo that was set up to meet their needs. The bears were studied over a 30 day period. All 50 of the bears were from Montana. In assessing their mental state, reaction time, appetite, sleep patterns and other physical functions were examined.


Probability, Data Analysis, Inference

7. Pascal's Triangle

8. In a certain class, the scores on the final were 100, 97, 93, 90, 87, 82, 76, 71, 70 and 64.

9. Describe a specific scenario in which a correlation coefficient would be useful. In what sort of general contexts would you expect to encounter correlation coefficients? What does a correlation coefficient describe?

10. Here are (fictional) data from a study intended to determine if there is an association between class and the number of times a college student works out per week:

 

Workouts per Week   Freshman   Sophomore   Junior   Senior   Row Totals
0--1                      52          71       69      101          293
2--3                      22          50      120       96          288
4--5                     148         100       37       41          326
6--7                      28          29       24       12           93
Column totals            250         250      250      250         1000

 


Bonus Problem (worth 1 million points)

The following is a summary of the class project in which we studied the packaging claim by Nabisco that every bag of Chips Ahoy cookies contains 1000 chips. The likelihood that a bag contains fewer than 1000 chips depends on the way that chips are distributed among cookies, so we studied this distribution. However, the data that we gathered contained a surprising feature. In this problem, I am asking you to explain the surprise.

Background

In our initial classroom discussion, we examined a bag of Chips Ahoy cookies, and found it contained 50 cookies. We observed that cookies would have to average somewhat more than 20 chips each to ensure that every bag contains 1000 chips, assuming that all bags have about 50 cookies. The cookies in the bag were divided among the students, who took them home to count the chips.
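The "somewhat more than 20" observation can be illustrated with a rough calculation. The following is only a sketch under stated assumptions: it supposes that chip counts in different cookies are independent and Poisson distributed, so that the total in a 50-cookie bag has variance equal to its mean, and it uses a normal approximation for that total. The function name and the candidate averages of 20, 21 and 22 chips per cookie are illustrative choices, not figures from the class project.

```python
from statistics import NormalDist


def prob_fewer_than(target, cookies_per_bag, mean_chips_per_cookie):
    """Approximate chance that a bag holds fewer than `target` chips.

    Assumption: the bag total is Poisson, so its variance equals its mean;
    the Poisson total is approximated by a normal distribution.
    """
    bag_mean = cookies_per_bag * mean_chips_per_cookie
    bag_sd = bag_mean ** 0.5
    # Continuity correction: "fewer than target" means "at most target - 1".
    return NormalDist(bag_mean, bag_sd).cdf(target - 0.5)


for mean_chips in (20, 21, 22):
    p = prob_fewer_than(1000, 50, mean_chips)
    print(f"average {mean_chips} chips per cookie: P(bag < 1000 chips) is about {p:.4f}")
```

Under these assumptions, an average of exactly 20 chips per cookie leaves roughly an even chance that a bag falls short, and the chance drops quickly as the average rises.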

It appears that students used various methods for counting chips. Some counted the chips visible on the surface of the cookie, while others broke the cookie up. Most cookies contained pieces of broken chips and small particles of chocolate. The class devised no consistent system for handling these.

After the class gathered its data, I collected my own data using 20 cookies from another bag. In making my counts, I used the following procedure. Each cookie was soaked in water, then the chocolate pieces in the cookie were recovered by straining and rinsing the disintegrated cookie. The chocolate pieces remained intact during this process. The only chocolate lost was some minute particles that made it through the strainer. When the rinsing was complete, chocolate pieces of various sizes remained. Most of these were recognizably complete chips. I discarded all pieces that appeared smaller than half of a chip, and counted each remaining piece as a single chip.

Data

The following table shows the data obtained by the class, the data obtained by me, and the data obtained from a computer simulation of the cookie manufacturing process based on the assumption that the chips were added to a huge batch of dough and mixed until randomly distributed. (Technically, I assumed that the chips had a so-called Poisson distribution.)

Sample: 50 cookies---counted by class
Chip counts:
12, 13, 14, 14, 14, 15, 15, 15, 16, 17,
17, 17, 17, 17, 18, 18, 18, 18, 19, 19,
19, 19, 19, 19, 19, 20, 20, 20, 20, 20,
20, 21, 21, 21, 21, 22, 22, 22, 22, 22,
22, 23, 23, 23, 23, 23, 24, 24, 24, 26
Mean of chip counts: 19.34
Standard deviation of chip counts: 3.24
Expected standard deviation, as predicted by Poisson model: (4.40)*

Sample: 50 cookies---simulation assuming a parameter value of 19.34 chips per cookie
Chip counts:
10, 11, 12, 12, 13, 14, 15, 15, 15, 15,
15, 16, 16, 16, 17, 17, 17, 17, 17, 18,
18, 18, 18, 18, 18, 18, 18, 18, 19, 19,
20, 20, 20, 21, 21, 21, 21, 21, 21, 22,
22, 22, 23, 23, 24, 24, 25, 26, 27, 30
Mean of chip counts: 18.68
Standard deviation of chip counts: 4.14
Expected standard deviation, as predicted by Poisson model: 4.40

Sample: 50 cookies---simulation assuming a parameter value of 19.34 chips per cookie (repeated)
Chip counts:
11, 11, 12, 13, 13, 13, 14, 15, 15, 15,
15, 15, 16, 16, 16, 16, 17, 17, 17, 17,
17, 18, 18, 18, 18, 18, 19, 19, 19, 19,
20, 20, 20, 20, 20, 20, 20, 21, 21, 21,
22, 22, 23, 23, 23, 24, 24, 25, 25, 32
Mean of chip counts: 18.46
Standard deviation of chip counts: 4.07
Expected standard deviation, as predicted by Poisson model: 4.40

Sample: 20 cookies---counted by me
Chip counts:
10, 13, 13, 14, 14, 15, 15, 16, 17, 18,
18, 18, 19, 19, 20, 20, 20, 23, 24, 25
Mean of chip counts: 17.55
Standard deviation of chip counts: 3.90
Expected standard deviation, as predicted by Poisson model: (4.19)*

Sample: 20 cookies---simulation assuming a parameter value of 17.55 chips per cookie
Chip counts:
11, 13, 14, 15, 15, 16, 17, 17, 17, 18,
19, 19, 19, 19, 20, 20, 20, 22, 22, 30
Mean of chip counts: 18.15
Standard deviation of chip counts: 4.02
Expected standard deviation, as predicted by Poisson model: 4.19

Sample: 20 cookies---simulation assuming a parameter value of 17.55 chips per cookie (repeated)
Chip counts:
12, 12, 13, 15, 16, 16, 16, 16, 17, 17,
18, 18, 19, 20, 20, 20, 21, 23, 23, 23
Mean of chip counts: 17.75
Standard deviation of chip counts: 3.40
Expected standard deviation, as predicted by Poisson model: 4.19

*These were calculated taking 19.34 and 17.55, respectively, as the true mean number of chips per cookie among all manufactured cookies.
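The summary figures for the two hand-counted samples can be rechecked from the raw counts. The sketch below assumes the tabulated standard deviations are sample standard deviations (dividing by n - 1), which is the convention that reproduces them, and it uses the fact that a Poisson distribution has standard deviation equal to the square root of its mean. The variable and function names are illustrative.

```python
import math
import statistics

# Counts copied from the table above.
class_counts = [
    12, 13, 14, 14, 14, 15, 15, 15, 16, 17,
    17, 17, 17, 17, 18, 18, 18, 18, 19, 19,
    19, 19, 19, 19, 19, 20, 20, 20, 20, 20,
    20, 21, 21, 21, 21, 22, 22, 22, 22, 22,
    22, 23, 23, 23, 23, 23, 24, 24, 24, 26,
]
instructor_counts = [
    10, 13, 13, 14, 14, 15, 15, 16, 17, 18,
    18, 18, 19, 19, 20, 20, 20, 23, 24, 25,
]


def summarize(counts):
    """Return (mean, sample SD, SD predicted by a Poisson model with that mean)."""
    mean = statistics.mean(counts)
    sample_sd = statistics.stdev(counts)   # divides by n - 1
    poisson_sd = math.sqrt(mean)           # Poisson: variance equals mean
    return mean, sample_sd, poisson_sd


for label, counts in [("class", class_counts), ("instructor", instructor_counts)]:
    mean, sd, expected = summarize(counts)
    print(f"{label}: mean={mean:.2f}, sd={sd:.2f}, Poisson sd={expected:.2f}")
# class: mean=19.34, sd=3.24, Poisson sd=4.40
# instructor: mean=17.55, sd=3.90, Poisson sd=4.19
```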

Discussion

I simulated the production of 1000 bags of 50 cookies each, using a total of 967,000 chips (resulting in a per-cookie average of 19.34). All 1000 bags had standard deviations for their chip data between 3.27 and 6.14. The standard deviation obtained by the class was only 3.24---smaller than the standard deviation in any simulated bag. Thus, the standard deviation obtained by the class is quite unexpected. This leads us to suspect that either there is something wrong in the data collection, or else the model I used to simulate the cookie production is wrong. (Perhaps the chips are added to the dough in some other way than by mixing them into a large batch.) Note that if students had used a variety of unbiased counting methods, we would expect MORE (not less) variability in the data than the amount predicted by the model. So the low standard deviation is particularly puzzling.

I also simulated the production of 1000 samples of 20 cookies each, using enough chips to average 17.55 chips per cookie. Among these simulated samples, 380 had standard deviations of chip data which were less than 3.90.
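A simulation of this kind might look like the following sketch. It is not the program used for the figures above. It assumes each chip lands in a uniformly random cookie of one large batch, which gives approximately Poisson counts per cookie, and it takes 351,000 chips for the second run (1000 samples x 20 cookies x 17.55). The function name and the seed are illustrative, and the exact numbers printed will differ from those reported in the discussion.

```python
import random
import statistics


def simulate_sample_sds(n_samples, cookies_per_sample, total_chips, seed=0):
    """Mix chips into one big batch, then return the sample SD of each group of cookies."""
    rng = random.Random(seed)
    n_cookies = n_samples * cookies_per_sample
    counts = [0] * n_cookies
    # Each chip ends up in a uniformly random cookie of the batch.
    for _ in range(total_chips):
        counts[rng.randrange(n_cookies)] += 1
    # Consecutive cookies are grouped into bags (or samples).
    return [
        statistics.stdev(counts[start:start + cookies_per_sample])
        for start in range(0, n_cookies, cookies_per_sample)
    ]


# 1000 bags of 50 cookies, 967,000 chips in all (19.34 per cookie on average).
bag_sds = simulate_sample_sds(1000, 50, 967_000)
print("range of bag SDs:", round(min(bag_sds), 2), "to", round(max(bag_sds), 2))
print("bags with SD below 3.24:", sum(sd < 3.24 for sd in bag_sds))

# 1000 samples of 20 cookies, 351,000 chips in all (17.55 per cookie on average).
small_sds = simulate_sample_sds(1000, 20, 351_000)
print("samples with SD below 3.90:", sum(sd < 3.90 for sd in small_sds))
```

Comparing where 3.24 and 3.90 fall among the simulated standard deviations shows how unusual each observed value is under the random-mixing model.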

The Question

How do we account for the surprising standard deviation obtained in the class's data?

PS

I have a pretty good idea of what may have happened. I've included a few clues. You don't need to be a math whiz to reach the same suspicion I have; you just have to have a good grasp of what standard deviation means, and a very good critical grasp of data collection.