**Stat2.2x Probability was taught by the University of California, Berkeley on the edX platform in 2014.**

**ADDITIONAL PRACTICE FOR THE FINAL**

**PROBLEM 1**

A box contains 8 dark chocolates, 8 milk chocolates, and 8 white chocolates. (It’s amazing how this box keeps replenishing itself and reappearing. It’s like the Magic Pudding. Australians will know what I mean, and the rest of you might enjoy finding out. It’s one of the classics of children’s literature.) A simple random sample of 6 chocolates is drawn. Find:

a) the expected number of dark chocolates

b) the SE of the number of dark chocolates

c) the chance that there are fewer than 2 dark chocolates

d) the chance that the second and third chocolates drawn are dark, given that the first and fourth chocolates drawn are not dark

e) the expected number of dark chocolates among the last four draws

**Solution**

This is a hypergeometric distribution (Zeros and Ones: Sum of a sample without replacement), with $n=6, N=24, G=8$.

1a) $$E(\text{dark chocolates})=n\cdot\frac{G}{N}=6\times\frac{8}{24}=2$$

1b) $$SE(\text{dark chocolates})=\sqrt{n\cdot\frac{G}{N}\cdot\frac{N-G}{N}}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\sqrt{6\times\frac{8}{24}\times\frac{16}{24}}\times\sqrt{\frac{24-6}{24-1}}\doteq1.021508$$
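As a rough check on 1a) and 1b), the sketch below recomputes the expectation and SE from the formulas and compares them with the mean and SD obtained directly from the hypergeometric probabilities. The variable names are illustrative only; `dhyper` is base R's hypergeometric probability function.

```r
# Box parameters from the problem: N chocolates, G dark, sample of n
N <- 24; G <- 8; n <- 6

p <- G / N                          # proportion of dark chocolates in the box
expected <- n * p                   # formula for the expected number of dark
fpc <- sqrt((N - n) / (N - 1))      # finite population correction factor
se <- sqrt(n * p * (1 - p)) * fpc   # formula for the SE

# Cross-check against the exact hypergeometric distribution
x <- 0:n
probs <- dhyper(x, G, N - G, n)
exact_mean <- sum(x * probs)
exact_sd <- sqrt(sum((x - exact_mean)^2 * probs))

print(c(expected = expected, se = se,
        exact_mean = exact_mean, exact_sd = exact_sd))
```

The two routes should agree, since the correction factor is exactly what turns the with-replacement SE into the hypergeometric SD.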

1c) $$P(\text{fewer than 2 dark chocolates})=\sum_{x=0}^{1}\frac{C_{G}^{x}\cdot C_{N-G}^{n-x}}{C_{N}^{n}}$$ $$=\sum_{x=0}^{1}\frac{C_{8}^{x}\times C_{16}^{6-x}}{C_{24}^{6}}\doteq0.319118$$ R code:

    sum(dhyper(0:1, 8, 16, 6))
    [1] 0.319118

1d) $$P(\text{2nd and 3rd are dark | 1st and 4th are not dark})$$ $$=\frac{8}{22}\times\frac{7}{21}\doteq0.1212121$$
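The shortcut in 1d) treats the two known non-dark draws as removed, leaving 22 chocolates of which 8 are dark. A minimal sketch of the longer route, using the definition of conditional probability, is given below; the variable names are illustrative.

```r
# Joint chance of the order: not dark, dark, dark, not dark
p_joint <- (16 / 24) * (8 / 23) * (7 / 22) * (15 / 21)

# Chance that the 1st and 4th draws are both not dark; by symmetry this
# equals the chance that the first two draws are both not dark
p_given <- (16 / 24) * (15 / 23)

# Conditional probability by definition
p_conditional <- p_joint / p_given

print(p_conditional)
print((8 / 22) * (7 / 21))   # the shortcut used in the solution
```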

1e) Given no information about any other draw, the last four draws are probabilistically the same as any other four, say the first four. $$E(\text{dark chocolates among the last four draws})=4\times\frac{8}{24}\doteq1.333333$$

**PROBLEM 2**

The casino is offering a “house special” at roulette: there are 8 chances in 38 to win, and the bet pays 3 to 1. Suppose you bet $\$1$ on the house special, 200 times, independently. Find:

a) your expected average net gain per bet (and then pledge that you will never play this game)

b) the chance that you come out ahead

c) the chance that you lose more than $\$20$

**Solution**

2a) Sample mean with replacement: $$E=3\times\frac{8}{38}+(-1)\times\frac{30}{38}\doteq-0.1578947$$
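The same expectation can be read off a two-outcome table of net gains. The sketch below also computes the SD of a single bet and the SE of the average over 200 bets, which the solution does not state; those extra quantities and all variable names are additions for illustration.

```r
p_win <- 8 / 38
gain <- c(3, -1)                  # net gain on a win, net gain on a loss
prob <- c(p_win, 1 - p_win)

ev <- sum(gain * prob)                      # expected net gain per bet
sd_bet <- sqrt(sum((gain - ev)^2 * prob))   # SD of the net gain on one bet

n_bets <- 200
se_avg <- sd_bet / sqrt(n_bets)   # SE of the average net gain over 200 bets

print(c(ev = ev, sd_bet = sd_bet, se_avg = se_avg))
```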

2b) Let $x$ be the number of wins. $$3x+(-1)\cdot(200-x) > 0\Rightarrow x > 50\Rightarrow x\geq51$$ Binomial distribution with $n=200, k=51:200, p=\frac{8}{38}$: $$P(\text{come out ahead})=\sum_{k=51}^{200}C_{200}^{k}\times(\frac{8}{38})^k\times(\frac{30}{38})^{200-k}\doteq0.0750046$$ R code:

    sum(dbinom(51:200, 200, 8/38))
    [1] 0.0750046

2c) $$3x+(-1)\cdot(200-x) < -20\Rightarrow x < 45\Rightarrow x\leq44$$ $$P(\text{lose more than 20})=\sum_{k=0}^{44}C_{200}^{k}\times(\frac{8}{38})^k\times(\frac{30}{38})^{200-k}\doteq0.6660572$$ R code:

    sum(dbinom(0:44, 200, 8/38))
    [1] 0.6660572
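The two sums in 2b) and 2c) can also be written with the cumulative function `pbinom`, which avoids spelling out the range of `k`. This is only an equivalent way to get the same numbers; the variable names are illustrative.

```r
n <- 200
p <- 8 / 38

# P(X >= 51) is the upper tail beyond 50
ahead <- pbinom(50, n, p, lower.tail = FALSE)

# P(X <= 44) is the lower tail up to and including 44
lose20 <- pbinom(44, n, p)

print(c(ahead = ahead, lose20 = lose20))
```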

**PROBLEM 3**

Households in a large city contain an average of 2.2 people, with an $SD$ of 1.2 people. A simple random sample of 625 households is taken.

a) Approximately what is the chance that there are more than 1400 people in the sampled households?

b) How would your answer to a) have been different had the sample been drawn with replacement?

**Solution**

3a) This is a sample sum drawn without replacement, but the correction factor is very close to 1 because the city is large. With $\mu=2.2, \sigma=1.2, n=625$: $$SE=\sqrt{n}\cdot\sigma=\sqrt{625}\times1.2=30$$ $$Z=\frac{1400.5-n\cdot\mu}{SE}$$ Calculating by R:

    n = 625; mu = 2.2
    z = (1400.5 - n * mu) / 30
    1 - pnorm(z)
    [1] 0.1976625

Thus the chance is around $19.77\%$.

3b) It wouldn't. Because the city is large, the correction factor is very close to 1, so the chance is essentially the same whether the sample is drawn with or without replacement.
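To see how close to 1 the correction factor is, the sketch below tries a few city sizes. The problem does not give the number of households in the city, so the values in `city_sizes` are hypothetical and serve only as an illustration.

```r
n <- 625
mu <- 2.2
sigma <- 1.2

# Hypothetical numbers of households in the city (not given in the problem)
city_sizes <- c(1e5, 1e6, 1e7)

fpc <- sqrt((city_sizes - n) / (city_sizes - 1))   # correction factor
se_with <- sqrt(n) * sigma                         # SE with replacement
se_without <- se_with * fpc                        # SE without replacement

# Chance of more than 1400 people, without replacement
chance_without <- 1 - pnorm((1400.5 - n * mu) / se_without)

print(data.frame(city_sizes, fpc, se_without, chance_without))
```

Under these assumed sizes the factor stays above 0.99, so the chance barely moves from the with-replacement value.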

**PROBLEM 4**

There are three boxes. Box I contains one gold coin and one silver coin. Box II contains two silver coins. Box III contains two gold coins. A box is selected at random, and then one coin is selected at random from that box. Given that the coin is gold, what is the chance that the other coin in the box is gold? [No, the answer is not 1/2.]

**Solution**

Bayes' rule: $$P(\text{box III | the first coin is gold})=\frac{P(\text{the first coin is gold and it is from box III})}{P(\text{the first coin is gold})}$$ $$=\frac{\frac{1}{3}\times1}{\frac{1}{3}\times\frac{1}{2}+\frac{1}{3}\times0+\frac{1}{3}\times1}=\frac{2}{3}$$
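Another way to see the answer is to list the six coins, each of which is equally likely to be the one drawn (chance $\frac{1}{3}\times\frac{1}{2}$), and count. The data frame below is just an illustrative encoding of the three boxes.

```r
# One row per coin; each coin is equally likely to be the one drawn
coins <- data.frame(
  box   = c("I", "I", "II", "II", "III", "III"),
  drawn = c("gold", "silver", "silver", "silver", "gold", "gold"),
  other = c("silver", "gold", "silver", "silver", "gold", "gold")
)

# Keep only the cases where the drawn coin is gold
gold_drawn <- coins[coins$drawn == "gold", ]

# Fraction of those cases in which the other coin is also gold
p_other_gold <- mean(gold_drawn$other == "gold")
print(p_other_gold)
```

Three of the six coins are gold, and two of those three sit in Box III, which is why the answer is not 1/2.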

**PROBLEM 5**

A coin is tossed $n$ times. There is about $95\%$ chance that the proportion of heads is in the range $.49$ to $.51$. The number of tosses $n$ is closest to:

a) 1,000

b) 5,000

c) 10,000

d) 50,000

**Solution**

Sample proportion of ones. $p=0.5$ and the interval $.49$ to $.51$ has to be $0.5\pm2SE$, thus $$2SE=0.01\Rightarrow SE=0.005$$ On the other hand $$SE=\sqrt{\frac{p\cdot(1-p)}{n}}=\sqrt{\frac{\frac{1}{4}}{n}}=0.005\Rightarrow n=10000$$
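The four answer choices can be compared directly by computing $2SE$ for each. A minimal sketch, with illustrative variable names:

```r
choices <- c(1000, 5000, 10000, 50000)

se <- sqrt(0.5 * 0.5 / choices)   # SE of the proportion of heads
half_width <- 2 * se              # half-width of the roughly 95% range

print(data.frame(n = choices, se = se,
                 lower = 0.5 - half_width, upper = 0.5 + half_width))

# The choice whose half-width is closest to 0.01
best <- choices[which.min(abs(half_width - 0.01))]
print(best)
```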

**FINAL EXAM**

**PROBLEM 1**

Suppose you are trying to estimate the percent of women in a city. Other things being equal, a simple random sample of 0.1% of the population of a city that has 2,000,000 people is ________ as a simple random sample of 0.1% of the population of a city that has 500,000 people. Fill in the blank with the best of the following choices.

a) about 1/4 times as accurate

b) about 1/2 times as accurate

c) about as accurate

d) about 2 times as accurate

e) about 4 times as accurate

**Solution**

Square Root Law. $$2\times10^6\times0.1\%=2000,\ 5\times10^5\times0.1\%=500$$ $$\Rightarrow\sqrt{\frac{2000}{500}}=2$$ Thus the former is about 2 times as accurate as the latter. d) is correct.
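The ratio can be checked numerically, including the correction factor, which is nearly the same for both cities because both samples are 0.1% of the population. The true percent of women is not given, so `p = 0.5` below is an assumption made only for illustration, and `se_percent` is a hypothetical helper.

```r
# SE of the sample percent for a simple random sample of a given fraction
# p = 0.5 is an assumed value; the ratio does not depend on it
se_percent <- function(N, fraction, p = 0.5) {
  n <- N * fraction
  fpc <- sqrt((N - n) / (N - 1))
  100 * sqrt(p * (1 - p) / n) * fpc
}

se_big <- se_percent(2e6, 0.001)     # city of 2,000,000
se_small <- se_percent(5e5, 0.001)   # city of 500,000

ratio <- se_small / se_big
print(c(se_big = se_big, se_small = se_small, ratio = ratio))
```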

**PROBLEM 2**

A group of 30 people consists of 15 children, 10 men, and 5 women. Tom and Jerry are two of the men in the group. Five people are picked at random without replacement.

2A Find the chance the first person picked is a man, given that the fourth and fifth people picked are children.

2B Find the chance that more than two women are picked.

2C Find the chance that Tom and Jerry both get picked.

**Solution**

2A) $$P(\text{1st person is a man | 4th and 5th are children})=\frac{10}{28}\doteq0.3571429$$

2B) Hypergeometric distribution $$P(\text{more than 2 women})=\sum_{x=3}^{5}\frac{C_{5}^{x}\cdot C_{25}^{5-x}}{C_{30}^{5}}\doteq0.02193592$$ R code:

    sum(dhyper(3:5, 5, 25, 5))
    [1] 0.02193592

2C) If Tom and Jerry are both picked, the other 3 people must be chosen from the remaining 28: $$P(\text{both of Tom and Jerry get selected})=\frac{C_{28}^{3}}{C_{30}^{5}}\doteq0.02298851$$ R code:

    choose(28, 3) / choose(30, 5)
    [1] 0.02298851
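Two other routes give the same number and may serve as a sanity check: a sequential argument (Tom takes one of the 5 sampled places out of 30, then Jerry takes one of the remaining 4 out of 29) and a hypergeometric count with Tom and Jerry as the 2 "good" elements. The variable names are illustrative.

```r
# Counting argument from the solution
p_counting <- choose(28, 3) / choose(30, 5)

# Sequential argument: Tom is in the sample, then Jerry is too
p_sequential <- (5 / 30) * (4 / 29)

# Hypergeometric: 2 "good" (Tom, Jerry), 28 others, sample of 5, both picked
p_hyper <- dhyper(2, 2, 28, 5)

print(c(p_counting, p_sequential, p_hyper))
```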

**PROBLEM 3**

A gambling game pays 4 to 1 and the chance of winning is 1 in 6. Suppose you bet $\$1$ on this game 600 times independently.

3A Find the expected number of times you win.

3B Find the $SE$ of the number of times you win.

3C Find the chance that you lose more than $\$50$ (that is, your net gain in the 600 bets is less than $-\$50$).

**Solution**

Zeros and Ones: Sum of a sample with replacement, $n=600, p=\frac{1}{6}$.

3A) $$E(\text{winning times})=n\cdot p=600\times\frac{1}{6}=100$$

3B) $$SE(\text{winning times})=\sqrt{n\cdot p\cdot(1-p)}\doteq9.128709$$

3C) Let $x$ be the number of wins. $$4x+(-1)\cdot(600-x) < -50\Rightarrow x < 110\Rightarrow x\leq109$$ Binomial distribution with $n=600, k=0:109, p=\frac{1}{6}$: $$P(\text{lose more than 50})=\sum_{k=0}^{109}C_{600}^{k}\times(\frac{1}{6})^k\times(\frac{5}{6})^{600-k}\doteq0.8508149$$ R code:

    sum(dbinom(0:109, 600, 1/6))
    [1] 0.8508149
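Since 3A) and 3B) already give the expected value and SE of the number of wins, a normal approximation with a continuity correction is a natural cross-check on 3C). The sketch below is that approximation, not the method used in the solution; the net gain is $4x-(600-x)=5x-600$.

```r
n <- 600
p <- 1 / 6

expected_wins <- n * p
se_wins <- sqrt(n * p * (1 - p))

# Net gain is 5 * wins - 600, so its expectation and SE scale accordingly
expected_net <- 5 * expected_wins - n
se_net <- 5 * se_wins

# Exact binomial chance of at most 109 wins
exact <- pbinom(109, n, p)

# Normal approximation with continuity correction at 109.5
approx <- pnorm((109.5 - expected_wins) / se_wins)

print(c(expected_net = expected_net, se_net = se_net,
        exact = exact, approx = approx))
```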

**PROBLEM 4**

In a grocery store, butter is sold in “sticks” that are shaped like little bricks. The weights of these sticks are like draws at random with replacement from a population with average 4 ounces and SD 0.2 ounces. The grocery store receives the butter in boxes; each box consists of 100 sticks.

4A Find the chance that the average weight of the sticks in one box is less than 3.999 ounces.

4B The grocery store has received 6 boxes of butter. There is about ___________ chance that in at least one of the boxes, the average weight of sticks is less than 3.999 ounces.

**Solution**

4A) Sample mean with replacement, $$\mu=4, \sigma=0.2, n=100\Rightarrow SE=\frac{\sigma}{\sqrt{n}}=0.02$$ $$Z=\frac{3.999-\mu}{SE}$$ Calculating by R:

    z = (3.999 - 4) / 0.02
    pnorm(z)
    [1] 0.4800612

4B) Following 4A), this is a binomial distribution with $n=6, k=1:6, p=0.4800612$: $$P(\text{at least 1 box is less than 3.999 ounces})$$ $$=\sum_{k=1}^{6}C_{6}^{k}\cdot p^k\cdot(1-p)^{6-k}=0.9802433$$ R code:

    p = pnorm(z)
    sum(dbinom(1:6, 6, p))
    [1] 0.9802433
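The same chance follows more directly from the complement: "at least one box" is the opposite of "none of the 6 boxes", and the boxes are treated as independent. A short sketch:

```r
z <- (3.999 - 4) / 0.02
p <- pnorm(z)                   # chance for a single box, from 4A

# Complement rule: 1 minus the chance that no box falls below 3.999
at_least_one <- 1 - (1 - p)^6

print(at_least_one)
```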

**PROBLEM 5**

In surveys about sensitive topics, respondents are sometimes given ways to “hide” their answers from the surveyor. In a survey of taxpayers, one of the questions is, “Did you cheat on your taxes?” To answer, the respondent is asked to toss a fair coin. If it lands heads, the respondent must answer “yes.” If it lands tails, the respondent must answer the question truthfully, either “yes” or “no” (the answer has to be the one that is true). Assume that all respondents follow this procedure, and that for 10% of the respondents the truthful answer is “yes.” Also assume that the result of a respondent’s coin toss is independent of whether or not the respondent cheated on his / her taxes. One of the respondents is picked at random.

5A Given that the respondent cheated on his / her taxes, what is the chance that he / she answered “yes”?

5B Given that the respondent did not cheat on his / her taxes, what is the chance that he / she answered “yes”?

5C Given that the respondent answered “yes,” what is the chance that the respondent cheated on his / her taxes?

**Solution**

According to the information, we have $$P(\text{cheated on taxes})=0.1,\ P(\text{did not cheat on taxes})=0.9$$

5A) $$P(\text{answered Yes | cheated on taxes})$$ $$=\frac{P(\text{cheated on taxes and answered Yes})}{P(\text{cheated on taxes})}$$ $$=\frac{P(\text{cheated and tossed head})+P(\text{cheated and tossed tail})}{P(\text{cheated on taxes})}$$ $$=\frac{0.1\times0.5+0.1\times0.5}{0.1}=1$$ That is, anyone who cheated on taxes must have answered "Yes", whichever way the coin landed.

5B) $$P(\text{answered Yes | did not cheat on taxes})$$ $$=\frac{P(\text{answered Yes but did not cheat on taxes})}{P(\text{did not cheat on taxes})}$$ $$=\frac{0.9\times0.5}{0.9}=0.5$$

5C) $$P(\text{cheated on taxes | answered Yes})=\frac{P(\text{cheated on taxes and answered Yes})}{P(\text{answered Yes})}$$ $$=\frac{P(\text{cheated on taxes and answered Yes})}{P(\text{cheated on taxes and answered Yes})+P(\text{did not cheat on taxes and answered Yes})}$$ $$=\frac{0.1\times0.5+0.1\times0.5}{(0.1\times0.5+0.1\times0.5)+0.9\times0.5}\doteq0.1818182$$
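The three parts fit together as one application of Bayes' rule, which can be sketched in a few lines. The variable names are illustrative; "honest" below stands for a respondent who did not cheat.

```r
p_cheat <- 0.1   # chance the truthful answer is "yes"
p_heads <- 0.5   # fair coin

# Heads forces "yes"; tails gives the truthful answer
p_yes_given_cheat  <- p_heads * 1 + (1 - p_heads) * 1   # 5A
p_yes_given_honest <- p_heads * 1 + (1 - p_heads) * 0   # 5B

# Total chance of a "yes" answer
p_yes <- p_cheat * p_yes_given_cheat + (1 - p_cheat) * p_yes_given_honest

# Bayes' rule for 5C
p_cheat_given_yes <- p_cheat * p_yes_given_cheat / p_yes

print(c(p_yes_given_cheat, p_yes_given_honest, p_cheat_given_yes))
```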

**PROBLEM 6**

In a population of 10,000 adults, $20\%$ are smokers. A simple random sample of 600 of the adults is drawn.

6A Find the expected number of smokers in the sample.

6B The $SE$ of the number of smokers in the sample is closest to

6C Find the chance that there are fewer than 115 smokers in the sample.

**Solution**

6A) $$E=n\cdot p=600\times0.2=120$$

6B) $$SE=\sqrt{n\cdot p\cdot(1-p)}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\sqrt{600\times0.2\times0.8}\times\sqrt{\frac{10000-600}{10000-1}}\doteq9.499949$$

6C) $$Z=\frac{115-120}{SE}$$ Calculating by R:

    n = 600; N = 10000; p = 0.2
    se = sqrt(n * p * (1 - p)) * sqrt((N - n) / (N - 1))
    z = (115 - n * p) / se
    pnorm(z)
    [1] 0.2993334
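For comparison, the count of smokers in the sample is hypergeometric, so the chance of fewer than 115 can also be computed exactly with `phyper`. The sketch below sets that exact value beside the normal approximation used above and a variant with a continuity correction at 114.5; the variant is an addition, not part of the solution.

```r
n <- 600; N <- 10000
G <- 2000                      # 20% of 10,000 adults are smokers
p <- G / N

se <- sqrt(n * p * (1 - p)) * sqrt((N - n) / (N - 1))

# Exact: fewer than 115 means at most 114
exact <- phyper(114, G, N - G, n)

# Normal approximation as in the solution, and with continuity correction
normal_plain <- pnorm((115 - n * p) / se)
normal_cc <- pnorm((114.5 - n * p) / se)

print(c(exact = exact, normal_plain = normal_plain, normal_cc = normal_cc))
```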

**PROBLEM 7**

When a die is rolled, the face with six spots appears with chance $\frac{1}{6}$, independently of all other rolls. Rank the three events below in increasing order of probability. For example, if you choose “A B C”, you are saying that A has the smallest chance, B has more chance than A but less chance than C, and C has the biggest chance. [If you think that some of the events have the same chance, please think again.]

A: The face with six spots shows up on fewer than $16.7\%$ of the rolls when a die is rolled 60,000 times.

B: The face with six spots shows up on more than $16.7\%$ of the rolls when a die is rolled 30,000 times.

C: The face with six spots shows up on fewer than $16.7\%$ of the rolls when a die is rolled 30,000 times.

**Solution**

This is a binomial distribution. Treating $16.7\%$ as approximately $\frac{1}{6}$, let $m=n\cdot p$, where $n$ is the number of rolls and $p=\frac{1}{6}$: $$P(A)=\sum_{k=0}^{m-1}C_{n}^{k}\cdot p^k\cdot(1-p)^{n-k}$$ where $n=60000$. $$P(B)=\sum_{k=m+1}^{n}C_{n}^{k}\cdot p^k\cdot(1-p)^{n-k}$$ where $n=30000$. $$P(C)=\sum_{k=0}^{m-1}C_{n}^{k}\cdot p^k\cdot(1-p)^{n-k}$$ where $n=30000$. R code:

    dieroll = function(n, p, id){
      # id=0 means fewer than a fixed proportion
      m = n * p
      if(id == 0){
        print(sum(dbinom(0:(m - 1), n, p)))
      } else{
        print(sum(dbinom((m + 1):n, n, p)))
      }
    }
    > dieroll(60000, 1/6, 0)
    [1] 0.4983005
    > dieroll(30000, 1/6, 1)
    [1] 0.4962232
    > dieroll(30000, 1/6, 0)
    [1] 0.4975965

Thus $$P(B) < P(C) < P(A)$$
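The calculation above uses $n\cdot p$ as the cutoff, which amounts to reading $16.7\%$ as $\frac{1}{6}$. Since $16.7\%$ is slightly larger than $\frac{1}{6}$, one may also want to check the ranking with the literal cutoff of 167 per 1000 rolls. The sketch below does that; the helper `cutoff` is hypothetical.

```r
p <- 1 / 6

# Literal cutoff: 16.7% of n rolls, computed with integers to stay exact
cutoff <- function(n) 167 * n / 1000

# A: fewer than 16.7% sixes in 60,000 rolls
p_A <- pbinom(cutoff(60000) - 1, 60000, p)

# B: more than 16.7% sixes in 30,000 rolls
p_B <- pbinom(cutoff(30000), 30000, p, lower.tail = FALSE)

# C: fewer than 16.7% sixes in 30,000 rolls
p_C <- pbinom(cutoff(30000) - 1, 30000, p)

print(c(A = p_A, B = p_B, C = p_C))
```

With the literal cutoff the expected proportion lies below $16.7\%$, so the "fewer than" events gain probability as $n$ grows, and the ordering of B, C, A is unchanged.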

**PROBLEM 8**

A die has 2 red faces, 2 blue faces, and 2 green faces. It is rolled 240 times. Let $R$ be the number of times red faces appear, and $B$ the number of times blue faces appear.

8A The random variable $R$ is the sum of 240 draws at random with replacement from

8B Consider the random variable $D = R - B$. That’s $D$ for “difference.” If all 240 rolls show blue faces, then $D = -240$; if they all show red faces, then $D = 240$; otherwise $D$ is somewhere in between. The random variable $D$ is the sum of 240 draws at random with replacement from

8C Find $E(D)$

8D Find $SE(D)$

**Solution**

8A) $R$ counts the rolls that show red: each roll adds 1 if it is red and 0 otherwise, and red has chance $\frac{2}{6}=\frac{1}{3}$. So the box must contain 1's and 0's with one third of the tickets being 1's, such as $$1,1,0,0,0,0$$ or $$1,0,0$$

8B) Similar to 8A. Each roll adds 1 to $D$ if it is red, $-1$ if it is blue, and 0 if it is green, so an equivalent box is $$1, 0, -1$$ or $$1, 1, -1, -1, 0, 0$$

8C) & 8D) Sample sum with replacement: $$\mu=0, n=240$$ and $$\sigma=\sqrt{(1-0)^2\times\frac{1}{3}+(-1-0)^2\times\frac{1}{3}+(0-0)^2\times\frac{1}{3}}=\sqrt{\frac{2}{3}}$$ Thus $$E(D)=n\cdot\mu=0$$ $$SE(D)=\sqrt{n}\cdot\sigma=\sqrt{240}\times\sqrt{\frac{2}{3}}=\sqrt{160}\doteq12.64911$$
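The box average and SD can be computed from the tickets themselves. Note that base R's `sd()` divides by $n-1$, so the sketch below computes the population SD by hand; the variable names are illustrative.

```r
box <- c(1, -1, 0)   # red, blue, green

mu <- mean(box)                       # average of the box
sigma <- sqrt(mean((box - mu)^2))     # population SD of the box

n <- 240
E_D <- n * mu                # expected value of D
SE_D <- sqrt(n) * sigma      # SE of D

print(c(mu = mu, sigma = sigma, E_D = E_D, SE_D = SE_D))
```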

**Date:** 12-30