ECO 446 Flashcards
chapter 17
Sampling
statistical inference:
involves using the sample to draw conclusions about the characteristics of the population from which the sample came
Population: entire group of items that interests the researcher
Sample: part of the population that we actually observe
u is the population mean, x-bar is the sample mean
What is the difference between SAMPLE STATISTICS and POPULATION PARAMETERS?
Sample statistics are estimates computed from the sample data. Population parameters require knowledge of the entire population.
In general, we rarely know the true values of our population parameters!
Probability Distributions:
•Random variable: a variable X whose outcome is determined by chance, the outcome of a random phenomenon
Discrete: has a countable number of possibilities (coin flips, rolls of a die)
Continuous: can take on any value in any interval (height, temperature)
•Probability distribution: a probability distribution P[Xi] for a discrete random variable assigns probabilities to the possible values X1, X2, …
The Normal Distribution
Real world data often conform to a normal distribution, and many probability distributions converge to a normal distribution when they are cumulated
Central limit theorem: If Z is a standardized sum of N independent, identically distributed (discrete or continuous) random variables with a finite, nonzero standard deviation, then the probability distribution of Z approaches the normal distribution as N increases.
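A small simulation can make the central limit theorem concrete. This is only a sketch under assumptions that are not in the notes: the random variable is a fair six-sided die, N = 30, and 20,000 standardized sums are simulated with a fixed seed. It compares the share of simulated Z values within one standard deviation of 0 with the share the normal distribution predicts.

```python
import random
import statistics

def standardized_sum(n, rng):
    """Z for the sum of n fair-die rolls: (sum - n*mu) / (sigma*sqrt(n))."""
    mu = 3.5                      # mean of one fair six-sided die
    sigma = (35 / 12) ** 0.5      # standard deviation of one die
    total = sum(rng.randint(1, 6) for _ in range(n))
    return (total - n * mu) / (sigma * n ** 0.5)

rng = random.Random(0)            # fixed seed so the run is repeatable
z_values = [standardized_sum(30, rng) for _ in range(20_000)]

# Share of simulated Z values within one standard deviation of 0,
# next to the share implied by the standard normal distribution.
within_one = sum(abs(z) <= 1 for z in z_values) / len(z_values)
normal_share = statistics.NormalDist().cdf(1) - statistics.NormalDist().cdf(-1)

print(round(statistics.mean(z_values), 3), round(statistics.pstdev(z_values), 3))
print(round(within_one, 3), round(normal_share, 3))
```

The two shares should be close but not identical, because a sum of dice is discrete while the normal distribution is continuous.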
Bias
If estimated parameters are UNBIASED, it means the expected value of the sample statistic is equal to the population parameter!
If estimated parameters are BIASED, it is likely due to a biased sample. Some examples of sampling bias:
- selection: when a sample systematically excludes or under-represents certain groups
- self-selection: when respondents choose to be in a particular group (examining physical fitness of joggers)
- survivor: when a sample follows individuals over time, yet only studies those who survive (medical studies, stock market)
- non-response: systematic refusal of some groups to participate in a study
Unbiased: E[x-bar] = u
A sample statistic is an unbiased estimator of a population parameter if the mean of the sampling distribution of this statistic is equal to the value of the population parameter.
However, in practice we never know u.
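To see what E[x-bar] = u means, it may help to list every possible sample from a tiny population and average the resulting sample means. The population values below and the sample size of 2 (drawn with replacement) are made up for illustration; they are not from the textbook.

```python
from itertools import product
from statistics import mean

population = [2, 4, 9]            # hypothetical tiny population
u = mean(population)              # population mean

# Every possible sample of size 2 drawn with replacement; each is equally likely.
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

# Mean of the sampling distribution of x-bar.
expected_x_bar = mean(sample_means)
print(u, expected_x_bar)
```

Individual sample means differ from u, but their average over all possible samples equals u, which is what unbiasedness says.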
chapter 17 exercise from the textbook
- Write the meaning of each of the following terms without referring to the book (or your notes), and compare your definition with the version in the text for each.
a. probability distribution
b. random variable
c. standardized random variable
d. sample
e. sampling distribution
f. population mean
g. sample average
h. population standard deviation
i. sample standard deviation
j. degrees of freedom
k. confidence interval
a. probability distribution
A probability distribution P[Xi] for a discrete random variable X assigns probabilities to the possible values X1, X2, and so on. Probability distributions are scaled so that the total area inside the rectangles is equal to 1.
b. random variable
A random variable X is a variable whose numerical value is determined by chance, the outcome of a random phenomenon.
c. standardized random variable
To standardize a random variable X, we subtract its mean u and then divide by its standard deviation std (or sigma):
Z = (X - u)/std
The standardized variable Z has a mean of 0 and a standard deviation of 1.
The standardized variable Z measures how many standard deviations X is above or below its mean. If X is equal to its mean, Z is equal to 0. If X is one standard deviation above its mean, Z is equal to 1. If X is two standard deviations below its mean, Z is equal to -2
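A quick check of the formula Z = (X - u)/std on a small data set: after standardizing, the values should have mean 0 and standard deviation 1. The heights below are hypothetical, and the sketch uses the data's own mean and population standard deviation in place of u and std.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return Z = (X - u) / std for each value, using the data's own mean and std."""
    u = mean(values)
    std = pstdev(values)          # population standard deviation of the values
    return [(x - u) / std for x in values]

heights = [61.0, 64.5, 66.0, 67.5, 71.0]   # hypothetical data, in inches
z = standardize(heights)

print([round(v, 2) for v in z])
print(round(mean(z), 10), round(pstdev(z), 10))   # mean 0 and std 1, up to rounding
```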
d. sample:
part of the population that we actually observe
e. sampling distribution
The sampling distribution of a statistic, such as x-bar, is the probability distribution or density curve that describes the population of all possible values of this statistic. It can be shown mathematically that if the individual observations are drawn from a normal distribution, then the sampling distribution for x-bar is also normal. Even if the population does not have a normal distribution, the sampling distribution of x-bar will approach a normal distribution as the sample size increases
g. sample average
The sample average (also called the sample mean) is the simple arithmetic average of N observations:
X-bar = (X1 + X2 + ... + XN)/N
h. population standard deviation
The standard deviation of the sampling distribution depends on the value of the population standard deviation sigma, a parameter that is unknown but can be estimated. The most natural estimator of sigma, the standard deviation of the population, is s, the standard deviation of the sample data. The sample variance of N observations is the average squared deviation of these observations about the sample average, dividing by N - 1 rather than N:
s^2 = [(X1 - X-bar)^2 + (X2 - X-bar)^2 + ... + (XN - X-bar)^2]/(N - 1)
The sample standard deviation s is the square root of the variance: s = sqrt(sample variance)
Standard error of X-bar = s/sqrt(N)
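The sample standard deviation and the standard error of X-bar can be computed in a few lines. This sketch assumes the usual N - 1 divisor for the sample variance, and the home prices are invented for illustration.

```python
from math import sqrt

def sample_std(values):
    """Sample standard deviation s, dividing the squared deviations by N - 1."""
    n = len(values)
    x_bar = sum(values) / n
    variance = sum((x - x_bar) ** 2 for x in values) / (n - 1)
    return sqrt(variance)

def standard_error(values):
    """Standard error of X-bar: s / sqrt(N)."""
    return sample_std(values) / sqrt(len(values))

prices = [410, 455, 480, 520, 535]   # hypothetical home prices, in $1,000s
print(round(sample_std(prices), 2), round(standard_error(prices), 2))
```

The standard error is smaller than s because averaging N observations reduces the spread of X-bar by a factor of sqrt(N).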
In 1908, W. S. Gosset figured out how to handle this increased uncertainty. Gosset was a statistician employed by the Irish brewery Guinness, which encouraged statistical research but not publication. Because of the importance of his findings, he was able to persuade Guinness to allow his work to be published under the pseudonym “Student” and his calculations became known as the Student’s t-distribution
Student’s t-distribution. When the mean of a sample from a normal distribution is standardized by subtracting the mean u of its sampling distribution and dividing by the standard deviation sigma/sqrt(N) of its sampling distribution, the resulting Z variable
Z = (X-bar - u)/(sigma/sqrt(N))
has a normal distribution with a mean of 0 and a standard deviation of 1.
Gosset determined the sampling distribution of the variable that is created when the mean of a sample from a normal distribution is standardized by subtracting u and dividing by its standard error:
t = (X-bar - u)/(s/sqrt(N))
There is a family of t-distributions, identified by the number of degrees of freedom:
degrees of freedom = number of observations - number of parameters that must be estimated
Here, we calculate s by using N observations and one estimated parameter, X-bar; therefore, there are N - 1 degrees of freedom.
k. confidence interval page 24
Now we are ready to use the t-distribution and the standard error of X-bar to measure the reliability of our estimate of the population mean price of homes in Diamond Bar. If we specify a probability, such as alpha = 0.05, we can use Table B-1 to find the t-value t* such that there is a probability alpha/2 that the value of t will exceed t*, a probability alpha/2 that the value of t will be less than -t*, and a probability 1 - alpha that the value of t will be in the interval -t* to t*.
- The heights of U.S. females between the age of 25 and 34 are approximately normally distributed with a mean of 66 inches and a standard deviation of 2.5 inches. What fraction of the U.S. female population
in this age bracket is taller than 70 inches, the height of the average adult U.S. male of this age?
Z = [70 - 66]/2.5 = 1.6
Look up the normal distribution table, p. 547:
P[Z > 1.6] = 0.0548
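The table lookup can be cross-checked with Python's standard library. This is a sketch, not the textbook's method: `statistics.NormalDist` stands in for the printed normal table.

```python
from statistics import NormalDist

u, std = 66, 2.5                     # mean and standard deviation from the exercise
z = (70 - u) / std                   # standardized value of 70 inches
p_taller = 1 - NormalDist().cdf(z)   # upper-tail area of the standard normal

print(z, round(p_taller, 4))
```

So roughly 5.5 percent of women in this age bracket are taller than 70 inches.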
A stock’s price-earnings (P/E) ratio is the per-share price of its stock divided by the company’s annual profit per share. The P/E ratio for the stock market as a whole is used by some analysts as a measure of whether stocks are cheap or expensive, in comparison with other historical periods. Here are some annual P/E ratios for the S&P 500:
Year  P/E
1980   7.90
1981   8.36
1982   8.62
1983  12.45
1984   9.98
1985  12.32
1986  16.42
1987  18.25
1988  12.48
1989  13.48
1990  15.46
1991  20.88
1992  23.70
1993  22.42
1994  17.15
1995  16.42
1996  19.08
1997  21.88
1998  28.90
1999  31.55
Calculate the mean and standard deviation. Was the 1999 price-earnings ratio of 31.55 more than one standard deviation above the mean P/E for 1980–1999? Was it more than two standard deviations above the mean?
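mean: 16.885
standard deviation: 6.43 (dividing by N; dividing by N - 1 gives 6.60)
z-score for 1999: (31.55 - 16.885)/6.43 = 2.28
The 1999 P/E was more than one standard deviation above the mean, and it was also more than two standard deviations above the mean.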
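The figures above can be reproduced from the P/E data. The sketch reports both the population standard deviation (dividing by N) and the sample standard deviation (dividing by N - 1), since the exercise does not say which one to use; the conclusion is the same either way.

```python
from statistics import mean, pstdev, stdev

pe = [7.90, 8.36, 8.62, 12.45, 9.98, 12.32, 16.42, 18.25, 12.48, 13.48,
      15.46, 20.88, 23.70, 22.42, 17.15, 16.42, 19.08, 21.88, 28.90, 31.55]

m = mean(pe)
sd_pop = pstdev(pe)               # divides by N
sd_sample = stdev(pe)             # divides by N - 1

# How many standard deviations the 1999 value (last entry) lies above the mean.
z_pop = (pe[-1] - m) / sd_pop
z_sample = (pe[-1] - m) / sd_sample

print(round(m, 3), round(sd_pop, 2), round(sd_sample, 2))
print(round(z_pop, 2), round(z_sample, 2))
```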
- Which has a higher mean and which has a higher standard deviation:
a standard six-sided die or a four-sided die with the numbers 1 through 4 printed on the sides? Explain your reasoning, without doing any calculations
Because the numbers on each side are equally likely, we can reason directly that a six-sided die has an expected value of 3.5 and a four-sided die has an expected value of 2.5, so the six-sided die has the higher mean. Because its possible outcomes are more spread out, the six-sided die also has the larger standard deviation.
- A nationwide test has a mean of 75 and a standard deviation of 10. Convert the following raw scores to standardized Z values: X = 94, 75, and 67. What raw score corresponds to Z = 1.5?
mean = 75, std = 10
X = 94: Z = (94 - 75)/10 = 1.9
X = 75: Z = (75 - 75)/10 = 0
X = 67: Z = (67 - 75)/10 = -0.8
None of these three raw scores corresponds to Z = 1.5. Solving Z = (X - 75)/10 = 1.5 gives X = 75 + 1.5 × 10 = 90.
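Converting in both directions uses the same formula: Z = (X - u)/std, and solving for X gives X = u + Z × std. A minimal sketch with the exercise's numbers (the function names are just illustrative):

```python
def z_score(x, u, std):
    """Standardize a raw score: how many standard deviations x is from the mean."""
    return (x - u) / std

def raw_score(z, u, std):
    """Invert the standardization: the raw score that corresponds to a given Z."""
    return u + z * std

u, std = 75, 10                   # test mean and standard deviation from the exercise
print([z_score(x, u, std) for x in (94, 75, 67)])
print(raw_score(1.5, u, std))
```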
- A woman wrote to Dear Abby, saying that she had been pregnant for 310 days before giving birth. Completed pregnancies are normally distributed with a mean of 266 days and a standard deviation of 16
days. Use Table B-7 to determine the probability that a completed pregnancy lasts at least 310 days.
Table B-5 = B-7
mean(u) = 266
standard deviation (std) = 16
The z values and normal probabilities are:
P[x > 310] = P[(x - u)/std > (310 - 266)/16] = P[Z > 2.75] = 0.003
Therefore, p = 1 - 0.9970 = 0.003. There is a 0.3% chance that a completed pregnancy lasts at least 310 days.
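As a cross-check, the same probability can be computed without the table, either directly from a normal distribution with mean 266 and standard deviation 16 or by standardizing first. This is a sketch using `statistics.NormalDist` in place of the printed table; the two routes should agree.

```python
from statistics import NormalDist

pregnancy = NormalDist(mu=266, sigma=16)   # length of completed pregnancies, in days

# Route 1: upper tail of the original distribution.
p_direct = 1 - pregnancy.cdf(310)

# Route 2: standardize, then use the standard normal distribution.
z = (310 - 266) / 16
p_standardized = 1 - NormalDist().cdf(z)

print(z, round(p_direct, 4), round(p_standardized, 4))
```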
Question: in the answer for ch. 17, what does the 270 stand for? Probably a typo.
- Calculate the mean and standard deviation of this probability distribution for housing prices:
Price X (dollars) Number of Houses Probability P[X]
400,000 15,000 0.30
500,000 20,000 0.40
600,000 15,000 0.30
E(x) = 400000 × 0.3 + 500000 × 0.4 + 600000 × 0.3 = 500000
V(x) = E(x^2) - [E(x)]^2
= 400000^2 × 0.3 + 500000^2 × 0.4 + 600000^2 × 0.3 - 500000^2 = 6,000,000,000
std = sqrt(V(x)) = 77,459.67
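The same calculation as a short script, following the formula V(x) = E(x^2) - [E(x)]^2 used above. The variable names are illustrative only.

```python
from math import sqrt

prices = [400_000, 500_000, 600_000]   # price X, in dollars
probs = [0.30, 0.40, 0.30]             # probability P[X]

# Expected value: probability-weighted average of the prices.
expected = sum(x * p for x, p in zip(prices, probs))

# Variance: E(x^2) minus the square of E(x).
variance = sum(x ** 2 * p for x, p in zip(prices, probs)) - expected ** 2
std = sqrt(variance)

print(round(expected), round(variance), round(std, 2))
```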
- Explain why you think that high-school seniors who take the Scholastic Aptitude Test (SAT) are not a random sample of all high-school seniors. If we were to compare the 50 states, do you think that a state’s
average SAT score tends to increase or decrease as the fraction of the state’s seniors who take the SAT increases?
High-school seniors choose whether to take the SAT (self-selection): those who take it are mostly students aiming for college admission, who tend to be above-average students. As the fraction of a state’s seniors who take the SAT increases, the state’s average SAT score tends to decrease, because weaker students join the test-taking group and pull the average down. When the fraction is small, mostly the strongest students take the test and the average is higher.
- American Express and the French tourist office sponsored a survey that found that most visitors to France do not consider the French to be especially unfriendly. The sample consisted of “1,000 Americans
who have visited France more than once for pleasure over the past two years.” Why is this survey biased?
The survey is biased. The sample includes only Americans who have visited France more than once for pleasure over the past two years. This is self-selection: people who found the French unfriendly on a first visit are unlikely to return, so repeat visitors are not representative of all visitors to France. In addition, some of the visitors in the sample may not have responded to the survey (non-response bias). For these reasons the sample result may differ systematically from the population value, which is why the survey is biased.
- The first American to win the Nobel prize in physics was Albert Michelson (1852–1931), who was given the award in 1907 for developing and using optical precision instruments. His October 12–November
14, 1882 measurements of the speed of light in air (in kilometers per second) were as follows:
299,883 299,796 299,611 299,781 299,774 299,696 299,748 299,809 299,816 299,682 299,599 299,578 299,820 299,573 299,797 299,723 299,778 299,711 300,051 299,796 299,772 299,748 299,851
Assuming that these measurements were a random sample from a normal distribution, does a 99 percent confidence interval include the value 299,710.5 that is now accepted as the speed of light?
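Look up Table B-1 (two-tailed).
sample mean (x-bar): 299,756.2
sample standard deviation (s): 107.114
number of observations (N): 23
degrees of freedom (N - 1): 22
t-value for 99% CI (check Table B-1) (t): 2.819
99% CI: x-bar ± t*s/sqrt(N) = 299,756.2 ± 2.819 × 107.114/sqrt(23) = 299,756.2 ± 63.0, or about 299,693.3 to 299,819.2
Yes, the 99 percent confidence interval includes 299,710.5.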
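The confidence interval can be worked out from the raw measurements. In this sketch the t-value 2.819 is taken from the notes (Table B-1, 22 degrees of freedom, 99 percent) rather than computed, because Python's standard library has no t-distribution; everything else comes from the data.

```python
from math import sqrt
from statistics import mean, stdev

speeds = [299883, 299796, 299611, 299781, 299774, 299696, 299748, 299809,
          299816, 299682, 299599, 299578, 299820, 299573, 299797, 299723,
          299778, 299711, 300051, 299796, 299772, 299748, 299851]

n = len(speeds)                   # number of observations N
x_bar = mean(speeds)              # sample mean
s = stdev(speeds)                 # sample standard deviation (divides by N - 1)
t_value = 2.819                   # from Table B-1: 22 degrees of freedom, 99% (per the notes)

margin = t_value * s / sqrt(n)    # t times the standard error of x-bar
low, high = x_bar - margin, x_bar + margin

accepted = 299_710.5              # currently accepted speed of light in air, km/s
print(n, round(x_bar, 1), round(s, 3))
print(round(low, 1), round(high, 1), low <= accepted <= high)
```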