Stat - Exam #2 Flashcards
What is the Sampling Distribution of the Statistics?
- Statistics calculated from a sample vary from sample to sample;
- They vary because each statistic is a random variable that follows some probability density CURVE with a LOCATION and SPREAD;
What is Sampling Distribution?
A probability density curve of all possible values of a statistic computed from samples of size (n);
-Focus on the Population Means
What is the Law of Large Numbers?
As the sample size gets LARGER, the difference between the sample average and the population mean gets SMALLER
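A minimal simulation sketch of the Law of Large Numbers, assuming a hypothetical population of fair six-sided die rolls (population mean 3.5). The seed, the sample sizes, and the helper name gap are illustrative choices, not part of the course material.

```python
import random
from statistics import mean

random.seed(1)   # fixed seed so the sketch is repeatable (arbitrary choice)
POP_MEAN = 3.5   # population mean of a fair six-sided die

def gap(n):
    """Absolute difference between the average of n rolls and the population mean."""
    rolls = [random.randint(1, 6) for _ in range(n)]
    return abs(mean(rolls) - POP_MEAN)

# The gap tends to shrink as n grows, although any single run can wobble.
for n in (10, 1_000, 100_000):
    print(n, round(gap(n), 4))
```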
What is affected by SAMPLE SIZE with normally distributed data?
-With normally distributed data, the MEAN of the sample average is NOT affected by sample size, but the standard deviation of the sample average IS affected by sample size
What is Sampling Distribution of Sample Average?
If data are distributed normally with mean (u) and standard deviation (sigma), then the average of a sample of size (n) will be distributed normally with mean (u) and standard deviation [sigma/(sq. rt of n)]
IF/THEN of Sampling Distribution of Sample Average
IF: x has shape NOR with location (u) and spread (sigma)
THEN: x-bar has shape NOR with location (u) and spread {sigma/(sq. rt of n)}
What is the Standard Error of the Mean?
The standard deviation of the sample average;
— sigma_x-bar = sigma/(sq. rt of n)
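A small sketch of the standard error formula, assuming a hypothetical population with sigma = 12. The function name standard_error is made up for illustration.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard deviation of the sample average: sigma / sqrt(n)."""
    return sigma / sqrt(n)

# Quadrupling the sample size cuts the standard error in half.
for n in (4, 16, 64):
    print(n, standard_error(12, n))   # 6.0, 3.0, 1.5
```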
How do you find the shape of the sample average for data NOT normally distributed?
-The sample average and the sample standard deviation can still be calculated, but the shape comes from the Central Limit Theorem, which allows use of the z-curve and the z-table
What is the Central Limit Theorem?
- When there are at least 30 data points (any shape, mean u, and standard deviation sigma) the SAMPLE AVERAGE will…
1. follow the NORMAL shape
2. have mean (u) — same as population;
3. and have standard deviation {sigma/(sq. rt. of n)}
LESS than 30 data points
- Unknown shape;
- Mean (u);
- Standard deviation {sigma/(sq. rt. of n)}
Greater than or Equal to 30 data points
- NORMAL shape;
- Mean (u);
- Standard deviation {sigma/(sq. rt. of n)}
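A simulation sketch of the Central Limit Theorem, assuming a skewed exponential population with mean u = 1 and sigma = 1. The seed and the number of repeated samples are arbitrary, and the population choice is an assumption for illustration only.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(2)   # arbitrary fixed seed
N = 30           # sample size at the threshold used in these notes
REPS = 2000      # number of repeated samples (illustrative)

# Each entry is the average of one sample of size N from the skewed population.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(N)]) for _ in range(REPS)
]

print("mean of sample averages:", round(mean(sample_means), 3))    # near u = 1
print("SD of sample averages:  ", round(stdev(sample_means), 3))   # near sigma/sqrt(n)
print("sigma / sqrt(n):        ", round(1 / sqrt(N), 3))
```

A histogram of sample_means would look roughly bell-shaped even though the population itself is skewed.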
Difference in the Sampling Distribution and Central Limit Theorem
Sampling distribution of the mean deals with the location and spread of the sample average;
-The central limit theorem deals only with the SHAPE of the sample average
What if the population standard deviation is UNKNOWN?
- Use the sample standard deviation to calculate confidence interval estimates of a population parameter;
- The sample standard deviation can be calculated ANYTIME there is a sample
What is used to get degrees of freedom and critical values of the sample test?
The t-table
What do you calculate when the population standard deviation is UNKNOWN?
-Replace the population standard deviation with the SAMPLE standard deviation = t-transformation that yields a t-statistic
What is a t-transformation?
Converts a sample average into a t-statistic
t = (x-bar - u) / [s/(sq. rt of n)]
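A short sketch of the t-transformation on made-up data. The sample values and the hypothesized mean of 72 are hypothetical, and t_statistic is an illustrative name.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu):
    """t = (x-bar - u) / (s / sqrt(n)), using the sample standard deviation s."""
    n = len(sample)
    return (mean(sample) - mu) / (stdev(sample) / sqrt(n))

sample = [70, 74, 75, 78, 73]   # hypothetical data: x-bar = 74, s^2 = 8.5
print(round(t_statistic(sample, 72), 3))   # about 1.534
```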
What is t-distribution?
If a simple random sample of size (n) is taken from a population that follows the normal distribution, then the t-statistic follows the t-distribution with (n-1) degrees of freedom
What is a t-statistic?
t_(alpha/2), (n-1) =
- alpha/2 = gives the area in one tail = Column of t-table;
- n-1 = gives the degrees of freedom = Row of t-table
How do you use the t-table?
- Need to know the area under the tail and the degrees of freedom;
- If the exact degrees of freedom are not in the table, follow the practice of always going down to the next lower degrees of freedom in the table;
- NOTE: the LAST row of the t-table is the same as the z-table
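A sketch of the "next lower degrees of freedom" rule, using a few rows from the 0.025 one-tail column of a standard t-table. The partial table and the function name are assumptions for illustration; a full table has many more rows.

```python
# One-tail area 0.025 column of a standard t-table (partial, for illustration)
T_025 = {30: 2.042, 40: 2.021, 60: 2.000, 120: 1.980}
Z_025 = 1.960   # last (infinite df) row of the t-table, same as the z-table

def critical_t_025(df):
    """Look up t_(0.025, df), dropping to the next lower df row when df is not listed."""
    rows = [row for row in T_025 if row <= df]
    if not rows:
        raise ValueError("df is below the rows in this partial table")
    return T_025[max(rows)]

print(critical_t_025(45))   # no df = 45 row, so the df = 40 row is used: 2.021
print(critical_t_025(60))   # exact row: 2.0
```

Going down to a lower row gives a slightly larger critical value, which keeps the result on the cautious side.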
What is a reasonable value for the population mean?
A CONFIDENCE INTERVAL gives a set of values that are reasonable choices for the population mean based on the information in the SAMPLE data
Where does the level of confidence come from?
The NORMAL probability curve
What is Inferential Stats?
Use the information from a sample to make conclusions about the population
What is an Interval Estimate?
- Value of sample stat is very seldom the exact population parameter, but pretty close;
- Calculate a sample stat and an INTERVAL indicating how close the stat is to the population parameter;
- *Central to Inferential Stats
What are the major methods of Inferential Stats?
- Confidence Interval Estimation = give an estimate of the value of the UNKNOWN population parameter;
- Hypothesis Testing = A claim is made about a population, then sample data are collected and used to test this claim
Which standard deviation is ALWAYS known?
- Sample!;
- Population is not usually known in everyday practice
What is required when the population standard deviation is UNKNOWN?
Requires the use of a t-value
Z-scores are only applicable when pop. standard deviation is already known
What is the Point Estimate of a population parameter?
The value of the sample statistic used to estimate the population parameter
What is the Point Estimate of the population MEAN?
The value of the sample average;
-BEST estimator of the population mean
Sample Average = POINT ESTIMATOR
Actual value of Sample Average = POINT ESTIMATE
What is the Point Estimate of the population STANDARD DEVIATION?
The value of the sample standard deviation
What values are estimated?
ONLY the values of the population parameters, NEVER the values of the sample statistics
What are the Estimators for population parameters?
Location = Mean, Median, Mode
Spread = Standard Deviation, Range, IQR
What are the properties of a good estimator?
- Unbiased = expected value of estimator equals value of parameter;
- Consistent = larger sample makes estimator more accurate;
- Efficient = estimator has the smallest standard deviation
What is a Confidence Interval Estimate?
A range of values that forms an interval on the real number line;
-Expresses the natural uncertainty in the estimate
What is a Confidence Level?
The proportion of confidence intervals calculated from a large number of random samples that contain the value of the population parameter
Denoted = CL;
Common Values = 90%, 95%, 99%;
Decided by the researcher
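A simulation sketch of what a 95% confidence level means: build many intervals from random samples and count how often they contain the population mean. The normal population (u = 50, sigma = 10), sample size, seed, and repeat count are all assumed for illustration.

```python
import random
from math import sqrt
from statistics import mean, NormalDist

random.seed(3)               # arbitrary fixed seed
MU, SIGMA, N = 50, 10, 25    # hypothetical population and sample size
CL = 95
z = NormalDist().inv_cdf(1 - (1 - CL / 100) / 2)   # about 1.96
margin = z * SIGMA / sqrt(N)

REPS = 2000
hits = 0
for _ in range(REPS):
    xbar = mean([random.gauss(MU, SIGMA) for _ in range(N)])
    if xbar - margin <= MU <= xbar + margin:
        hits += 1   # this interval contains the population mean

coverage = hits / REPS
print(coverage)   # should land near 0.95
```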
What is the Significance Level?
The area outside of the region of confidence;
Denoted = alpha; Calculated = {1 - (CL/100)}
What is a Critical Value?
The pair of values that bound the region of confidence
Denoted = (+/- z_alpha/2), (+/- t_alpha/2, n-1)
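A sketch that turns a confidence level into a significance level and a critical z-score using Python's statistics.NormalDist in place of the z-table. The function name critical_z is illustrative.

```python
from statistics import NormalDist

def critical_z(cl_percent):
    """Return (alpha, z_alpha/2) for a two-sided confidence level given in percent."""
    alpha = 1 - cl_percent / 100
    z = NormalDist().inv_cdf(1 - alpha / 2)   # area alpha/2 is left in the right tail
    return alpha, z

for cl in (90, 95, 99):
    alpha, z = critical_z(cl)
    print(cl, round(alpha, 2), round(z, 3))   # z is about 1.645, 1.960, 2.576
```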
What is the Confidence Interval Estimate of a Population Parameter when the Population Standard Deviation is KNOWN?
- An estimate of the value of a population parameter consisting of
1. an interval of numbers, and
2. a level of confidence that the interval contains the value of the population parameter
Denoted = CI% = (LCL,UCL)
LCL — Lower Confidence Limit; UCL — Upper Confidence Limit
What determines the WIDTH of the confidence interval?
Comes from the confidence level CHOSEN and the spread of the sample average
What is given by the confidence level?
Gives the area in the RIGHT tail (alpha/2), which gives the critical values bounding the region of confidence (+/-z_alpha/2)
Where does the CENTER of the confidence interval come from?
The CENTER of the confidence interval comes from the value of the SAMPLE AVERAGE (x-bar)
What are the steps of a Confidence Interval?
- First find the critical values — defines the width of the interval (which is centered around the population mean, which is unknown)
- Need to slide the interval over until it is centered on the sample average, which is known;
- Convert the critical values into x-values;
- Two x-values in the proper form make the confidence interval
What is the Margin of Error?
The RIGHT term in a confidence interval estimate;
- Determines the width of the confidence interval;
- Anything that changes the margin of error changes the width of the confidence interval
Margin = {z_(alpha/2)} x {sigma/(sq. rt of n)}
What happens with a REDUCED margin of error?
NARROWS confidence interval
What happens with an INCREASED margin of error?
WIDENED confidence interval
What are the 3 ways to change the Margin of Error?
- Sample size: Increase = Narrow; Decrease = Widen;
- Confidence level: Increase = Widen; Decrease = Narrow;
- Standard deviation of the population: IMPOSSIBLE to change
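A sketch of the margin of error formula showing how sample size and confidence level move it. The values sigma = 15 and the sample sizes are hypothetical, and margin_of_error is an illustrative name.

```python
from math import sqrt

def margin_of_error(z_crit, sigma, n):
    """Margin = z_(alpha/2) * sigma / sqrt(n)."""
    return z_crit * sigma / sqrt(n)

# 95% confidence uses z = 1.96; 99% confidence uses z = 2.576
print(round(margin_of_error(1.96, 15, 36), 2))    # 4.9
print(round(margin_of_error(1.96, 15, 144), 2))   # 2.45, larger n narrows the interval
print(round(margin_of_error(2.576, 15, 36), 2))   # 6.44, higher confidence widens it
```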
What is the method for the Confidence Interval Estimate (z) of the Population Mean?
(Pop. SD is KNOWN)
- Statistics = Sample Average (x-bar) & Population Standard Deviation (sigma);
- Critical Value = Confidence Level (CL) & Critical Z-score (z_alpha/2);
- Compute = CI% = {x-bar +/- (z_alpha/2) (sigma/sqrt. n)}
- State = CI% = (LCL, UCL)
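The z method above can be sketched as follows, with statistics.NormalDist standing in for the z-table. The inputs (x-bar = 100, sigma = 15, n = 36) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, cl_percent):
    """CI = x-bar +/- z_(alpha/2) * sigma / sqrt(n); returns (LCL, UCL)."""
    alpha = 1 - cl_percent / 100
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

lcl, ucl = z_confidence_interval(xbar=100, sigma=15, n=36, cl_percent=95)
print(round(lcl, 2), round(ucl, 2))   # about 95.1 and 104.9
```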
How is calculating confidence intervals estimates different when the population standard deviation is NOT known?
- Use the sample standard deviation (which can always be calculated) to calculate confidence interval estimates of a population parameter;
- Major difference is that the DEGREES of FREEDOM must be known and the t-table must be used to get critical values
What is the method for the Confidence Interval Estimate (t) of the Population Mean?
(Pop. SD is NOT known)
- Statistics = Sample Average (x-bar) & SAMPLE Standard Deviation (s);
- Critical Value = Confidence Level (CL), Degrees of Freedom (n-1) & Critical t-value {t_(alpha/2), (n-1)};
- Compute = CI% = {x-bar +/- {t_(alpha/2), (n-1)} x [s/(sqrt. n)]}
- State = CI% = (LCL, UCL)
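The t method above can be sketched as follows. The data are made up, and the critical value 2.262 is t_(0.025, 9) read from a standard t-table, since the Python standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

def t_confidence_interval(sample, t_crit):
    """CI = x-bar +/- t_(alpha/2, n-1) * s / sqrt(n); t_crit comes from a t-table."""
    n = len(sample)
    margin = t_crit * stdev(sample) / sqrt(n)
    return mean(sample) - margin, mean(sample) + margin

sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]   # hypothetical data, n = 10
T_CRIT = 2.262   # 95% confidence, n - 1 = 9 degrees of freedom
lcl, ucl = t_confidence_interval(sample, T_CRIT)
print(round(lcl, 3), round(ucl, 3))   # about 12.369 and 14.631
```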
What is a Hypothesis Test?
- First write 2 statements about a population parameter;
— Status quo value of the population parameter is given in the first statement;
— Claim made by the researcher is given in the second statement;
- Sample data are collected and analyzed;
- Finally conclude which statement is closer to the truth
How is a hypothesis test different from a confidence interval calculation?
With a hypothesis test, you first must make some claim about a population parameter;
-Then collect data and determine if the claim is reasonable or not
What are the 3 steps of a hypothesis test?
- Hypothesize = a set of hypotheses is written giving the status quo and the researcher's claim;
- Analyze = sample data are collected and analyzed;
- Conclude = a conclusion as to which statement is closer to the truth is made
What is a hypothesis?
A statement about a population parameter;
EX: u=50 (pop mean)
- Can be for one or more populations
- Must be about the value of a population parameter (never a sample stat)
- Made BEFORE data are collected, and the sample must be appropriate to the statement;
**Two Types = Null and Alternative Hypothesis
What is a Null Hypothesis?
A statement that the population parameter has the status quo value;
- Denoted = “H-naught” (H0);
- EX = H0 : u = 72;
- Assumed TRUE in the hypothesis test until the sample evidence proves OTHERWISE;
- ALWAYS contains an EQUAL SIGN
What is an Alternative Hypothesis?
- A statement that a population parameter does NOT have the status quo value;
- *Gives the researcher's CLAIM;
- Denoted = “H-one” (H1);
- EX = H1 : u /= 72, H1 : u < 72, or H1 : u > 72;
- Assumed FALSE in the hypothesis test until the sample evidence proves otherwise;
- NEVER contains an EQUAL SIGN
What is the purpose of analysis of a hypothesis test?
- Decide which hypothesis is closer to the truth, the null hypothesis or the alternative hypothesis;
- 3 scenarios for a hypothesis test =
1. z-Test of the Mean
2. t-Test of the Mean
3. z-Test of Proportion
What are the 3 methods in each scenario to conduct a hypothesis test?
- Critical Value Method = Traditional
- P-value Method = Modern
- Confidence Interval Method = Two-Sided
What is the conclusion to a hypothesis?
- Must choose only one of the two conclusions to end a hypothesis test;
1. REJECT the null hypothesis, or
2. NOT REJECT the null hypothesis
- There is never enough information to prove a hypothesis is true, so a hypothesis can be shown to be false or not false, but never shown true
What is Hypothesis Testing?
A procedure that uses our knowledge of probability with evidence from a sample to test a claim about a characteristic of a population;
— Claim is about the value of a population parameter;
— Can be for one or more populations
What are the Assumptions of Hypothesis Testing?
1. Simple random sample;
2. Sample average is normally distributed
What is the logic behind hypothesis testing?
- Assume status quo value (null) is TRUE (this is NOT confidence intervals);
- Examine sample data;
— Sample evidence CLOSE to the status quo SUPPORTS that value (null)
— Sample evidence FAR from the status quo REFUTES that value and supports the alternative
- Make one of two conclusions
— Status quo is reasonable given the sample
— Status quo is not reasonable given the sample
How do you define “close” and “far” from the status quo?
-Using the properties of the normal curve;
-Determine the critical z-score (remember that z-score of a point is just how many standard deviations away from the mean)
— Any value CLOSER to the mean than the z-score is CLOSE;
— Any value further away from the mean than the z-score is FAR
*Z-scores are then converted to x-values that support or refute the null hypothesis
95% Level of Confidence
- Any value inside Z-scores of +/-1.96 is CLOSE to the mean and supports the null;
- Any value outside Z-scores +/-1.96 is FAR from the mean and refutes the null hypothesis (reject the null)
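A sketch of the CLOSE/FAR rule at the 95% level. The status quo mean of 72, sigma = 8, and n = 64 are hypothetical and chosen so the standard error is exactly 1.

```python
from math import sqrt

Z_CRIT = 1.96   # critical z-score for a 95% level of confidence

def close_or_far(xbar, mu0, sigma, n):
    """Classify a sample average by its z-score relative to the status quo mean mu0."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return "CLOSE" if abs(z) < Z_CRIT else "FAR"

print(close_or_far(73.0, 72, 8, 64))   # CLOSE (z = 1.0), do not reject the null
print(close_or_far(74.5, 72, 8, 64))   # FAR (z = 2.5), reject the null
```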
What are the 3 situation scenarios in hypothesis testing?
- Two-tail situation;
- One-tail situation to the left, and
- One-tail situation to the right
Two-Tail Situations
The null hypothesis can be rejected by sample evidence that is too big OR too small (rejection in both tail regions)
- Two-Tail hypothesis: H0: u = 72; H1: u /= 72
One-Tail Situations
The null hypothesis can be rejected by sample evidence that is ONLY too big or too small (rejection in ONLY ONE tail region)
- Right-Tail hypothesis: H0: u = 72; H1: u > 72
- Left-Tail hypothesis: H0: u = 72; H1: u < 72
What is a Type 1 Error?
The null hypothesis is TRUE; but we REJECT the null hypothesis in the test;
-Denoted “alpha”
What is a Type II Error?
The null hypothesis is FALSE, but we do NOT REJECT the null in the hypothesis test;
- Denoted “beta”
Which type of Error is stats most concerned with?
- Type I;
- Choose the probability of making a Type 1 Error early in the hypothesis test;
- Usually let a Type II error float to whatever it becomes;
- *Type 1 Error = Level of Significance
What is Level of Significance?
The probability of making a Type I Error;
- Denoted “alpha”
- Rejection region in the hypothesis testing
How do you choose the level of significance?
- If the consequences of making a Type I error are severe, choose the level of significance to be SMALL (alpha = 0.01);
- If the consequences are NOT severe, the level of significance should be larger (alpha = 0.05 or 0.10);
- Inverse relationship of Type I and Type II errors;
- Raising the probability of a Type I error (raising alpha) reduces the probability of a Type II error
How do Significance Level and Confidence Level relate?
- Like two sides of a coin;
- A 5% significance level goes with a 95% confidence level: when the null is true, the test gives the correct conclusion 95% of the time
What is Level of Confidence?
- The probability of NOT making a Type I Error;
- Calculation: 1-alpha
What are the 3 methods of conducting a hypothesis test about a population mean using z-scores (population standard deviation KNOWN)?
- Critical Value Method = Traditional;
- P-Value Method = Modern;
- Confidence Interval Method = Two-Sided
What is a Critical Value?
*Critical Value Method (z) =
A z-score which is critical to separate the REJECTION REGION from the ACCEPTANCE REGION;
-Denoted: +/- z_alpha/2, -z_alpha, +z_alpha
What is the Rejection Region of the Critical Value?
The set of all z-scores that are FAR from the mean, such that a NULL hypothesis is REJECTED;
- Denoted: alpha;
- Sometimes called the Critical Region
What is a Test Statistic?
A z-score, calculated from sample data, which is used to test whether the NULL hypothesis is closer to the truth;
-Denoted: z_0;
- Calculation: z_0 = {(sample mean - pop. mean)/(pop. SD/ sqrt of sample (n))}
- SAME calculation for Left, Right, and Two-Tail Critical Values
Left-Tail Critical Value Method
- Hypothesis:
H0: u = u0;
H1: u < u0;
- Critical Value = -z_alpha;
- Calculation
- Reject: z_0 < -z_alpha;
- Conclusion: Do, or do not, reject null
Two-Tail Critical Value Method
- Hypothesis:
H0: u = u0;
H1: u NOT equal u0;
- Critical Value = +/-z_alpha/2;
- Calculation
- Reject: z_0 < -z_alpha/2 OR z_alpha/2 < z_0;
- Conclusion: Do, or do not, reject null
Right Tail Critical Value
- Hypothesis:
H0: u = u0;
H1: u > u0;
- Critical Value = +z_alpha;
- Calculation
- Reject: z_alpha < z_0;
- Conclusion: Do, or do not, reject null
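The left-tail, two-tail, and right-tail critical value methods can be sketched in one function. The function name, the tail labels, and the example numbers are assumptions for illustration; statistics.NormalDist stands in for the z-table.

```python
from math import sqrt
from statistics import NormalDist

def z_test_critical(xbar, mu0, sigma, n, alpha, tail):
    """Critical value method; tail is 'left', 'right', or 'two'. Returns (z_0, reject)."""
    z0 = (xbar - mu0) / (sigma / sqrt(n))   # same test statistic for every tail
    std = NormalDist()
    if tail == "left":
        reject = z0 < -std.inv_cdf(1 - alpha)
    elif tail == "right":
        reject = z0 > std.inv_cdf(1 - alpha)
    elif tail == "two":
        reject = abs(z0) > std.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z0, reject

# Hypothetical test: H0: u = 72, sample average 74.5, sigma = 8, n = 64, alpha = 0.05
print(z_test_critical(74.5, 72, 8, 64, 0.05, "two"))    # (2.5, True): reject the null
print(z_test_critical(74.5, 72, 8, 64, 0.05, "left"))   # (2.5, False): do not reject
```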
When can you use the Critical Value method?
- Method is ROBUST for small deviation from normality (use normal probability plot),
- But NOT robust for data with outliers = use boxplot
What is a P-Value?
The probability, assuming the null hypothesis is true, of repeating the experiment and getting a test statistic as extreme as, or more extreme than, the value observed;
- This is the area under the curve from the TEST STAT out to INFINITY in the direction of the tail (negative infinity for a left tail);
- In one tail if it is a one-tail situation and in both tails if it is a two-tail situation
What is the P-Value method?
- Test Stat for Left-Tail, Two-Tail, and Right-Tail = z_0;
- Extreme Values:
- Left Tail = z < z_0;
- Two Tail = (z < -|z_0|) + (+|z_0| < z);
- Right Tail = z_0 < z;
- Calculate = (Area from Z-table) X (Number of Tails)
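A sketch of the P-value calculation for each tail situation, with statistics.NormalDist standing in for the z-table. The function name and the test statistic of 2.5 are hypothetical.

```python
from statistics import NormalDist

def p_value(z0, tail):
    """P-value for a z test statistic; tail is 'left', 'right', or 'two'."""
    std = NormalDist()
    if tail == "left":
        return std.cdf(z0)
    if tail == "right":
        return 1 - std.cdf(z0)
    if tail == "two":
        return 2 * (1 - std.cdf(abs(z0)))   # one-tail area times two tails
    raise ValueError("tail must be 'left', 'right', or 'two'")

z0 = 2.5   # hypothetical test statistic
print(round(p_value(z0, "right"), 4))   # about 0.0062
print(round(p_value(z0, "two"), 4))     # about 0.0124
```

Either P-value is then compared with alpha to reach the conclusion.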
What are the advantages of using the P-Value OVER the Critical Value Method?
- The decision in the P-value method is made the SAME way every time (compare the P-value to alpha), so there is no need to look up a different critical value every time
- P-value gives info about the STRENGTH of evidence; P-value close to the level of significance means that the evidence for making a conclusion is WEAK; P-value far from the level of significance means that the evidence for making a conclusion is STRONG
How do you decide about the null hypothesis using a P-Value?
P-Value GREATER than alpha = DO NOT Reject the Null;
P-Value LESS than alpha = REJECT the Null
When are Confidence Intervals used?
ONLY when you have a two-tail hypothesis test;
-Because rejection region in a confidence interval is ALWAYS in both tails
What must be used to test claims about a population mean when the population standard deviation is NOT known?
-Use the sample standard deviation, which is always known (or can be calculated), together with the t-table (NOT the z-table)
What is a Critical Value?
- A t-value which is critical to separate the rejection region from the acceptance region;
- Denoted: (+/- t_alpha/2, n-1), (-t_alpha, n-1), (+t_alpha, n-1)
What is a Test Statistic for hypothesis tests when the population standard deviation is NOT known (using t-test)?
- A t-value, calculated from SAMPLE data, which is used to test whether the NULL hypothesis is closer to the truth;
- Denoted: t_0;
-Calculation: t_0 = (sample mean - pop. mean)/(s/ sqrt. n)
(s = sample standard deviation; n = sample size)
What is the J-Method to Find Area (t)?
- Method ONLY for the t-table;
1. Start in the left margin at the degrees-of-freedom;
2. Go across the row until you find the number closest to the value of the test stat;
3. Read area in one tail at the top of the column
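The three steps above can be sketched with one row of a standard t-table (9 degrees of freedom). The test statistic of 2.10 and the name j_method as a function are hypothetical.

```python
# Row of a standard t-table for 9 degrees of freedom: one-tail area -> t-value
ROW_DF9 = {0.10: 1.383, 0.05: 1.833, 0.025: 2.262, 0.01: 2.821, 0.005: 3.250}

def j_method(row, t0):
    """Return the one-tail area whose table entry is closest to |t0|."""
    return min(row, key=lambda area: abs(row[area] - abs(t0)))

t0 = 2.10   # hypothetical test statistic with 9 degrees of freedom
one_tail = j_method(ROW_DF9, t0)
print(one_tail)       # 0.025, since 2.262 is the closest entry to 2.10
print(2 * one_tail)   # approximate two-tail P-value: 0.05
```

The area read this way is only an approximation of the P-value, because the table lists a handful of columns.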
How do you decide the null using the t-table once the P-value is determined?
-Once the P-value is obtained, the decision is made the SAME way as when the population standard deviation is KNOWN