Stats Flashcards
Continuous variable
can take an infinite number of potential values within a range, such as the average rainfall in a region
Discrete variable
takes a countable number of distinct values (e.g., heads or tails)
requirements for a valid probability distribution
each probability P(x) must be between 0 and 1, and the probabilities of all x values must sum to 1
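A minimal sketch of checking the two requirements above in Python. The function name `is_valid_distribution` and both example tables are made up for illustration.

```python
import math

def is_valid_distribution(probs, tol=1e-9):
    """Return True if every probability is in [0, 1] and they sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in probs)
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)
    return in_range and sums_to_one

# Hypothetical example: a fair coin (heads, tails)
fair_coin = [0.5, 0.5]
# Hypothetical invalid table: each value is in range, but the total is 1.1
bad_table = [0.6, 0.5]

print(is_valid_distribution(fair_coin))  # True
print(is_valid_distribution(bad_table))  # False
```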
population
entire group
sample
specific group you collect data from
statistic
number describing a sample
parameter
number describing the whole population
accuracy
the method measures what it was intended to measure; the statistic correctly estimates the population parameter
precision
if the method is repeated, the estimates are very consistent; every statistic is nearly the same
sampling methods that create bias
convenience sampling
voluntary sampling
preferred method
simple random sampling
what are the properties of the sampling distribution
Mean of the sampling distribution (μ_x̄) = population mean (μ); standard deviation of the sampling distribution (standard error) = σ/√n
shape
central tendency
variability
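A rough simulation of the sampling-distribution properties, using only the standard library. The population (exponential, mean about 50), the sample size `n`, and the number of repeated samples are all assumptions chosen for illustration. The mean of the sample means should land near μ and their standard deviation near σ/√n.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical skewed population of 100,000 values (mean roughly 50)
population = [random.expovariate(1 / 50) for _ in range(100_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 40               # sample size (assumed)
num_samples = 5_000  # number of repeated samples (assumed)

# Draw many simple random samples and record each sample mean
sample_means = [
    statistics.fmean(random.sample(population, n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = sigma / math.sqrt(n)

print(f"population mean          = {mu:.2f}")
print(f"mean of sample means     = {mean_of_means:.2f}")
print(f"sd of sample means       = {sd_of_means:.2f}")
print(f"sigma / sqrt(n)          = {theoretical_se:.2f}")
```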
example of measurement bias (leading question)
Do you believe that Obama’s horrible beliefs deserve another term in order to ruin our lives?
example of measurement bias (confusing question)
Do you not disagree with the not recent slight changes to American culture?
example of a nonresponse bias
Do you currently have an STD? (a sensitive question that many people will decline to answer)
example of voluntary response bias
an internet poll asks its visitors if they prefer cats or dogs
example of a sample bias (nonrandom sample)
someone asks their Twitter followers how they feel about the recent changes to Congress
how do you measure precision
by using standard error
as population size increases, do accuracy and precision change?
no, both are unaffected
as sample size increases, do accuracy and precision change?
accuracy is unaffected, but precision improves (the standard error gets smaller).
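A small sketch of why a larger sample is more precise, using the standard error formula σ/√n. The value of `sigma` is a made-up number; quadrupling n halves the standard error.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 12.0  # hypothetical population standard deviation

for n in (25, 100, 400):
    print(f"n = {n:>3}: standard error = {standard_error(sigma, n):.2f}")
# n =  25: standard error = 2.40
# n = 100: standard error = 1.20
# n = 400: standard error = 0.60
```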
what does it mean to say that p-hat is a random variable
repeated sampling will result in different p-hat values.
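A quick illustration of p-hat varying from sample to sample. The population proportion `true_p` and the helper `draw_p_hat` are hypothetical; in practice the true proportion is unknown.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

true_p = 0.40  # hypothetical population proportion
n = 100        # sample size (assumed)

def draw_p_hat(p, n):
    """Survey n random individuals and return the sample proportion."""
    successes = sum(random.random() < p for _ in range(n))
    return successes / n

# Five separate samples of the same size from the same population
p_hats = [draw_p_hat(true_p, n) for _ in range(5)]
print(p_hats)  # the five values generally differ from one another
```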
Suppose a statistician is interested in determining the percentage of Americans who prefer Burger King to McDonald’s. She surveys 100 randomly chosen Americans and finds that of those surveyed, 37% prefer Burger King.
Identify…
a. the population
b. the sample
c. the parameter
d. the statistic/estimator of the study
a. all Americans.
b. the 100 randomly chosen Americans who were surveyed.
c. the proportion of all Americans who prefer Burger King to McDonald’s.
d. 37%
An analyst wants to know if there is a connection between time spent watching TV per day in hours and fat intake per day in grams. He performs a regression using time spent watching TV as the independent variable and fat intake as the dependent variable and finds that r = 0.5 and the regression line is given by: y = 45.8 + 10.3x
a). Explain what the correlation and regression line mean in the context of the data.
b). predict the fat intake of someone who watches 3 hours of tv a day
c). predict y when x=-2
d). which prediction is more reasonable?
a. The correlation means there is a moderate positive connection between time spent watching TV and fat intake.
The regression line means that for each additional hour of TV someone watches per day, we predict their fat intake will increase by 10.3 grams (slope), and the predicted fat intake of someone who watches no TV is 45.8 grams (intercept).
b. 45.8 + 10.3(3) = 76.7 grams
c. 45.8 + 10.3(-2) = 25.2 grams
d. b is more reasonable because you can’t watch a negative number of hours of tv in a day
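The two predictions can be reproduced with a short sketch. The function names are made up for illustration, and the 0 to 24 hour range check is an assumption based on the length of a day.

```python
def predict_fat_intake(hours_tv):
    """Predicted fat intake (grams/day) from the line y = 45.8 + 10.3x."""
    intercept = 45.8
    slope = 10.3
    return intercept + slope * hours_tv

def is_plausible_input(hours_tv):
    """A day has 24 hours, so x must lie between 0 and 24."""
    return 0 <= hours_tv <= 24

for x in (3, -2):
    y = round(predict_fat_intake(x), 1)
    print(f"x = {x}: predicted y = {y}, plausible input: {is_plausible_input(x)}")
# x = 3: predicted y = 76.7, plausible input: True
# x = -2: predicted y = 25.2, plausible input: False
```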
what are the 4 requirements of the central limit theorem
Random and independent sample; population at least 10x the sample size; np ≥ 10; n(1 − p) ≥ 10. If you don’t know p, use p-hat.
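A sketch of checking the three numeric conditions, applied to the Burger King survey (n = 100, p-hat = 0.37). The function name and the population size are assumptions for illustration. The random and independent sample condition depends on how the data were collected and cannot be checked from the numbers.

```python
def clt_size_conditions(n, p, population_size):
    """Return each numeric CLT condition for a sample proportion and whether it holds."""
    return {
        "population >= 10 * n": population_size >= 10 * n,
        "n * p >= 10": n * p >= 10,
        "n * (1 - p) >= 10": n * (1 - p) >= 10,
    }

# p is unknown, so p-hat = 0.37 stands in for it.
# The population size is an assumed round figure, not a value from the survey.
result = clt_size_conditions(n=100, p=0.37, population_size=300_000_000)

for condition, holds in result.items():
    print(f"{condition}: {holds}")
print("all numeric conditions met:", all(result.values()))
```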