Midterm 2 Flashcards
Population
Parameter
Statistic
Population
–entire group of interest
Parameter (P - Pop.)
–numerical fact about Population
(MU - ave. GPA of all BYU students)
Statistic (S = Sample)
–numerical fact about Sample (x bar = ave. GPA of 170 students in sample)
Sampling distribution of sample mean (x bar) is a theoretical prob. distribution
–it describes the distribution of: …
It describes the distributions of:
- -all sample means
- -from all possible random samples
- -of the same size
- -taken from same pop.
MEAN (x bar) = mean of sampling distribution of x bar
SD (x bar) = st. dev. of sampling distribution of x bar
relationship btwn sample size and st. dev.
can change n, but if you take many samples (ex. 500) the mean of the x bar values comes out the SAME (approx. Mu) for any n - but as n increases, the st. dev. of the x bar values decreases (less variation)
–variation in sample mean values is tied to the size of each sample (n) - NOT the # of samples
Distribution of X bar for ALL possible SRS of size n from a pop. with mean MU and ST. D sigma
- -center: MEAN of sampling distribution of x bar is Mu (pop. mean)
- -Spread: st. dev. of sampling distribution of x bar is (sigma / radical n)
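A quick simulation sketch of the center and spread facts above (not from the course materials). The values Mu = 80 and sigma = 20, the normal population, and the helper name `simulate_sample_means` are assumptions chosen only for illustration.

```python
import random
import statistics

def simulate_sample_means(population_mean, population_sd, n, num_samples=2000, seed=0):
    """Draw num_samples random samples of size n from a normal population
    and return the list of their sample means (x bar values)."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(population_mean, population_sd) for _ in range(n))
        for _ in range(num_samples)
    ]

mu, sigma = 80, 20  # hypothetical population values
for n in (4, 25, 100):
    means = simulate_sample_means(mu, sigma, n)
    print(
        f"n={n:3d}  mean of x bars={statistics.fmean(means):6.2f}  "
        f"sd of x bars={statistics.stdev(means):5.2f}  "
        f"sigma/sqrt(n)={sigma / n ** 0.5:5.2f}"
    )
```

The mean of the x bar values should stay near Mu for every n, while their st. dev. should track sigma / radical n and shrink as n grows.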
In real research…not realistic to estimate the sampling distribution of the sample mean by actually taking multiple random samples from the same pop. - but there is a math. result that shows what it would look like
–mathematical statisticians have figured out how to predict what the sampling distribution will look like w/o actually repeating the study numerous times and having to choose a new sample each time
Central limit theorem
- -if you take a SRS of size n from any pop., then the shape of the sampling distribution of x bar is approx. normal when n is large
- -shape gets more normal as N increases
- n > 30, considered large
- -CLT allows us to use st. normal table to compute approx. probability associated with x bar
- -The sampling distribution of statistic (like a sample mean) often follows a normal distribution if sample sizes are large
center = Mu = pop. mean
spread = st. dev. = sigma / radical n
shape =
–case 1 - pop. is normal, shape distribution of x bar is normal
–case 2 - pop. is not normal - the shape of distribution of x bar is approx. normal when N is large (> 30) CLT
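Case 2 can be illustrated with a small simulation sketch. The exponential population (strongly right skewed) and the helper names are assumptions for illustration, not part of the notes. Skewness near 0 means the shape is close to symmetric, like a normal curve.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: average cubed z-score (0 for a symmetric shape)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def sample_means_from_skewed_pop(n, num_samples=5000, seed=1):
    """x bar values from samples of size n drawn from a right-skewed
    (exponential) population. The population choice is hypothetical."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]

for n in (2, 50):
    means = sample_means_from_skewed_pop(n)
    print(f"n={n:2d}  skewness of x bar values = {skewness(means):.2f}")
```

The skewness of the x bar values should be clearly smaller for n = 50 than for n = 2, which is the "shape gets more normal as n increases" idea.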
why do we care about the sampling distribution?
- -sampling distribution allows us to assess uncertainty of sample results
- -if we knew the spread of the sampling distribution we could know how far our x bar might be from the true pop. mean
if the sampling distribution has a lot of variance - if you took another sample you would be likely to get a very diff. result
- -about 95% of the time, the sample mean will be within 2 * (sigma / radical n) of the pop. mean (z = 1.96, approx. 2)
- -this tells us how “close” the sample statistic should be to the pop. parameter
ex. what is probability that mean of n=75 will exceed $2700? Mu = 2600, st. dev. = 500
(2700 - 2600) / (500/ radical 75) = z = 1.73
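- -on table prob. = .0418 (area to the right of z = 1.73, i.e. 1 - .9582)
- -have to use the st. dev. of x bar (sigma / radical n) for the denominator - not sigma alone
- -if the pop. is normal then the sampling distribution of x bar is also normal
- -so can compute a prob. for x bar (ex. prob. less than 32) - bc it follows a normal curve just like the pop. - even if small sample size, bc pop. is normal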
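The $2700 example above can be checked without a table. This is a sketch using Python's standard-library `statistics.NormalDist`; the function name `prob_sample_mean_exceeds` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_exceeds(threshold, mu, sigma, n):
    """Return (z, P(x bar > threshold)) assuming the sampling
    distribution of x bar is (approximately) normal."""
    standard_error = sigma / sqrt(n)          # st. dev. of x bar
    z = (threshold - mu) / standard_error
    return z, 1 - NormalDist().cdf(z)         # area to the right of z

z, p = prob_sample_mean_exceeds(2700, 2600, 500, 75)
print(f"z = {z:.2f}, P(x bar > 2700) = {p:.4f}")
```

The result should agree with the table value to about three decimals; any small difference comes from the table rounding z to 1.73 before the lookup.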
Fill in blanks:
the sampling distribution of x bar gives —– from all possible samples of same size from same pop.
the sampling distribution of x bar gives ALL X BAR VALUES from all possible samples of same size from same pop.
Suppose we take all possible samples of the same size from a population and for each sample, we compute x̄. The mean of these x̄ values will be exactly equal to the mean of the population (μ) from which the samples were taken - TRUE
Suppose we have a very right skewed population distribution where μ = 80 and σ = 20. For random samples of size n = 100, what is the mean of the sampling distribution of x̄?
EQUAL TO 80
–must be equal to pop. mean - the mean of the sampling distribution of x bar is always Mu, whatever the sample size or pop. shape
Suppose we have a very right skewed population distribution where μ = 80 and σ = 20. For random samples of size n = 100, what is the standard deviation of the sampling distribution of x̄?
Less than 20
–st. dev. of x bar = sigma / radical n = 20 / radical 100 = 2 (larger sample size = smaller st. dev.)
What is the advantage of reporting the average of several measurements rather than the result of a single measurement?
The average of several measurements is MORE LIKELY to be close to the true mean than the result of a single measurement
—NOT always = true mean…but more likely
Take one sample of size n - compute x bar for the sample as an estimate (guess) of Mu
–use knowledge of the SAMPLING DISTRIBUTION of x bar in general to say something about the uncertainty associated with this particular x bar
Center = mean of x bar = Mu
Spread = st. dev. of x bar = sigma / radical n
Shape = 1. pop. is normal = normal
2. large sample size (n > 30) = approx. normal
CHART
if talking about:
- An individual (x)
–Pop. of ind. is normal???
= YES
use z = (x - Mu) / sigma
–then normal table to find p
=NO
can't solve - stop now
- A sample mean (x bar)
–Pop. of ind. is normal???
= YES
use z = (X bar - MU) / (Sigma / radical n)
–then normal table
=NO
check sample size
n > 30 = CLT applies, so use z = (x bar - Mu) / (sigma / radical n)
–then normal table
n < 30 = nothing you can do
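The chart above can be written as two small functions. This is only a sketch of the decision logic; the function names and the choice to return `None` for "can't solve" are assumptions, not course notation.

```python
from math import sqrt

def z_for_individual(x, mu, sigma, population_is_normal):
    """z-score for ONE individual; only usable if the pop. is normal."""
    if not population_is_normal:
        return None  # chart says: can't solve, stop now
    return (x - mu) / sigma

def z_for_sample_mean(x_bar, mu, sigma, n, population_is_normal):
    """z-score for a sample mean; usable if the pop. is normal
    or the sample is large enough (n > 30) for the CLT."""
    if population_is_normal or n > 30:
        return (x_bar - mu) / (sigma / sqrt(n))
    return None  # small sample from a non-normal pop.: nothing you can do

print(z_for_individual(100, 80, 20, population_is_normal=True))
print(z_for_sample_mean(75, 80, 20, n=100, population_is_normal=False))
print(z_for_sample_mean(75, 80, 20, n=10, population_is_normal=False))
```

Once a z value comes back, the normal table (or a normal CDF) gives the probability.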
If all possible samples of size 80 are taken from a population instead of size 20, how would this change the mean and standard deviation of the sampling distribution of x̄?
The mean would stay the same and the standard deviation would decrease
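A short worked comparison of the two spreads, using only the formula sigma / radical n from these notes (sigma itself cancels, so no value for it is needed):

```latex
\[
\frac{\sigma_{\bar{x}}\ \text{for } n = 80}{\sigma_{\bar{x}}\ \text{for } n = 20}
  = \frac{\sigma / \sqrt{80}}{\sigma / \sqrt{20}}
  = \sqrt{\frac{20}{80}}
  = \frac{1}{2}
\]
```

So going from n = 20 to n = 80 (4x the sample size) cuts the st. dev. of x bar in half, while the mean stays at Mu.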
True or False: We can never compute probabilities on x̄ when the population is skewed.
False
The following scenario applies to questions 3-5:
Suppose we have a normal population distribution where μ = 80 and σ = 20. For random samples of size n = 100, what is the probability of getting an x̄ greater than 75?
.9938
–used the x bar formula: z = (75 - 80) / (20 / radical 100) = -2.5; area to the right of z = -2.5 on the normal table = .9938
statistical process control (jelly bean ex.)
series of interconnected steps in producing a product or service (ex. jellybeans)
–rec. raw ingredients - boil to syrup - add flavor - molded into shape - sugar coat - shell - glaze - branded - shipped
statistical Dogma for processes
all processes have NATURAL variation
–raw material, human performance, equipment performance, measurement
all processes susceptible to UNNATURAL variation
- -bad batch of raw material
- broken machine
Stats process control 2
use statistical paradigm to monitor process variables (inputs, outputs, etc.) over time to decide if variability consistent with natural variation
- -if consistent…continue process
- -if inconsistent, stop process - find cause of unnatural variation, fix problem, resume process
x bar control chart
in control process
out of control process
x bar control chart
–tool for monitoring variables of a process, alerting us when unnatural variation seems to have occurred
in control process
–process whose output exhibits only natural variation over time
out of control process
–process exhibits unnatural variation over time
Use central limit theorem to determine if process is in statistical control
- -ex. if jelly belly weight is normally distributed with mean MU = 1.1 grams and sigma = .1, then by CLT:
1. the sampling distribution of x bar is normal with mean 1.1 and st. dev. sigma / radical n = .1 / radical n
2. “natural variation” of x bar is within 3 * (sigma / radical n) of 1.1
constructing control chart for x bar
- draw horizontal centerline at Mu (pop. mean)
- draw horizontal control limits at Mu +/- 3 * (sigma / radical n)
- plot the means (x bar’s) from samples of size n against time
out of control signals
- one point above upper control limit or below lower control limit
- run of 9 POINTS IN a row on SAME SIDE of centerline
as soon as out of control signal is observed, STOP the process and look for a cause
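A sketch of the control-chart rules above in Python, using the jelly bean values Mu = 1.1 and sigma = .1 from these notes. The sample size n = 4, the x bar values, and the function names are hypothetical, chosen only for illustration.

```python
from math import sqrt

def control_limits(mu, sigma, n):
    """Lower and upper control limits: Mu +/- 3 * (sigma / sqrt(n))."""
    margin = 3 * sigma / sqrt(n)
    return mu - margin, mu + margin

def out_of_control_signals(sample_means, mu, sigma, n, run_length=9):
    """Return (index, reason) pairs for the two signals in the notes:
    a point outside the control limits, or a run of run_length points
    in a row on the same side of the centerline."""
    lower, upper = control_limits(mu, sigma, n)
    signals = []
    run_side, run_count = 0, 0
    for i, x_bar in enumerate(sample_means):
        if x_bar > upper or x_bar < lower:
            signals.append((i, "beyond control limit"))
        side = (x_bar > mu) - (x_bar < mu)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == run_side:
            run_count += 1
        else:
            run_side = side
            run_count = 1 if side != 0 else 0
        if run_count == run_length:
            signals.append((i, f"run of {run_length} on same side"))
    return signals

# Hypothetical x bar values from samples of size n = 4
means = [1.12, 1.08, 1.14, 1.05, 1.30, 1.09]
print(control_limits(1.1, 0.1, 4))
print(out_of_control_signals(means, 1.1, 0.1, 4))
```

In practice the process would be stopped at the first signal; the function keeps scanning only so that every rule can be seen on one list.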
What is the purpose of a statistical control chart?
To distinguish between natural and unnatural variation
inference testing
inference - drawing conclusions about a pop. (parameter) based on a sample (Statistic) with a measure of uncertainty
- -everything we’ve learned so far is to ensure valid inference
- -making generalizations about the population based on sample data
Pop.
1. producing data from pop.
2. exploratory data analysis (data from sample)
3. probability
4. inference about pop.
Types of statistical inference 1. POINT ESTIMATION
- POINT ESTIMATION
–quantitative data ex.
based on sample of n = 47 policies, we estimate that the ave. premium is approx. $1800
–categorical data ex.
based on a sample of n = 144 households, we estimate that the proportion of infected bamboo is approx. 10.4%
Types of statistical inference 2. INTERVAL ESTIMATION
- INTERVAL ESTIMATION
–quant. data ex.
based on sample of n = 47 policies, we estimate that the ave. premium is btw $1700 and $1900
–categorical ex.
based on sample of n = 144 households, we estimate that the proportion of infected bamboo is btw 8.4% and 12.4%
Types of statistical inference 3. HYPOTHESIS TESTING
- HYPOTHESIS TESTING
–Quant. ex.
An insurance agent believes that the ave. premium at her agency is $2500. Based on the sample of n = 47 claims, we found that the ave. premium was $1800. This data, therefore, provides evidence that the ave. premium is less than $2500. That is, x bar = $1800 is an outcome that would rarely happen if the ave. was indeed $2500.
–categorical ex.
Researchers believe that the proportion of infected bamboo is 2%. Based on a sample of n = 144 households, we found that the prop. was 10.4%. This data provides evidence that the prop. is > 2%.
Point estimation definition
–estimator and estimate
ESTIMATOR
- -general statistic that estimates the parameter
- -ex. the estimator of the pop. mean MU is the sample mean x bar
- -ex. the estimator of the pop. proportion p is the sample proportion p (with hat)
ESTIMATE
- -a specific value of an estimator
- -ex. the ave. premium for the sample of n = 47 is $1800
- -ex. the prop. of infected boards for n = 144 is 10.4%
how good is the estimator x bar for MU?
A.) sampling needs to be done RANDOMLY
B.) sampling distribution of x bar tells us:
- on ave. x bar will give us the right answer - call this property UNBIASEDNESS
- As sample size n INC. the accuracy of x bar INC. (smaller st. dev.)
Statistical inference chart
- CONFIDENCE INTERVAL: point estimate +/- ME (margin of error)
- TEST OF SIGNIFICANCE: reject or fail to reject at level alpha
both solve for:
- -conclusion about parameter
- -measure of uncertainty
LOOK AT CHART - CH. 17
four steps for confidence intervals
STATE
–Specify parameter of interest
PLAN
–choose procedure, level of confidence
SOLVE
–check conditions, carry out procedure
CONCLUDE
–interpret confidence interval
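A sketch of the SOLVE step for "point estimate +/- ME", assuming a z interval for a mean with a KNOWN pop. st. dev. The notes do not say which procedure is used, so this is only one possibility. The value sigma = 350 is hypothetical, picked so the result lands near the $1700 to $1900 premium example; the function name is also made up.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for Mu: x bar +/- z* * (sigma / sqrt(n)).
    Assumes a random sample, known sigma, and an (approx.) normal
    sampling distribution of x bar."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin_of_error = z_star * sigma / sqrt(n)
    return x_bar - margin_of_error, x_bar + margin_of_error

# x bar = $1800 and n = 47 come from the notes; sigma = 350 is hypothetical
low, high = z_confidence_interval(1800, 350, 47)
print(f"95% CI for the ave. premium: ${low:.0f} to ${high:.0f}")
```

CONCLUDE would then read something like: we are 95% confident that the ave. premium for all policies is between the two endpoints.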
LOOK AT MY WRITING ASSIGNMENT