INFERENTIAL STATS Flashcards
consists of generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions.
Inferential statistics
In __________, the statistician tries to make inferences from samples to populations.
Inferential statistics
Inferential statistics uses ___________, i.e., the chance of an event occurring.
probability
A ___________________ consists of all subjects (human or otherwise) that are being studied.
population
Most of the time, due to the expense, time, size of population, medical concerns, etc., it is not possible to use the entire population for a statistical study; therefore, researchers use ________.
samples
A __________ is a group of subjects selected from a population.
sample
An area of ____________ called hypothesis testing is a decision-making process for evaluating claims about a population, based on information obtained from samples.
inferential statistics
__________________ includes making inferences from samples to populations, estimations and hypothesis testing, determining relationships, and making predictions.
Inferential statistics
Inferential statistics is based on _______________
probability theory
One aspect of inferential statistics is __________, which is the process of estimating the value of a parameter from information obtained from a sample.
estimation
Inferential statistical techniques have various ______________ that must be met before valid conclusions can be obtained.
assumptions
Some statistical techniques are called _______. This means that the distribution of the variable can depart somewhat from normality, and valid conclusions can still be obtained
robust
A continuous, symmetric, bell-shaped distribution of a variable
Normal distribution
If a random variable has a probability distribution whose graph is continuous, bell-shaped, and symmetric, it is called a ____________________.
normal distribution
The normal distribution is also known by these names, the second of which honors the German mathematician Carl Friedrich Gauss.
Bell curve or Gaussian Distribution
A normal distribution curve is _____-shaped
bell
In a normal distribution, the ______________________ are equal and are located at the center of the distribution.
mean, median, and mode
A normal distribution curve is ______ (i.e., it has only one mode).
unimodal
In a normal distribution, the curve is symmetric about the _____, which is equivalent to saying that its shape is the same on both sides of a vertical line passing through the center.
mean
In a normal distribution, the curve is _________; that is, there are no gaps or holes. For each value of X, there is a corresponding value of Y.
continuous
In a normal distribution, the curve never touches the _______. Theoretically, no matter how far in either direction the curve extends, it never meets the ______, but it gets increasingly closer.
x-axis
In a normal distribution, the total area under the curve is equal to __________
1.00, or 100%
The area under the part of a normal curve that lies within 1 standard deviation of the mean is approximately _________
0.68, or 68%
The area under the part of a normal curve that lies within 2 standard deviations of the mean is approximately __________
0.95, or 95%
The area under the part of a normal curve that lies within 3 standard deviations of the mean is approximately __________
0.997, or 99.7%
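The three areas quoted in the cards above can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `area_within` is introduced here for illustration only.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def area_within(k: float) -> float:
    """Area under the normal curve within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    # Prints values close to 0.68, 0.95, and 0.997 respectively.
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```

Because the areas are measured in standard deviations, the same three values hold for any normal distribution, not only the one with mean 0 and standard deviation 1.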
3 TYPES OF DISTRIBUTION
Symmetric Distribution
Negatively/Left-Skewed Distribution
Positively/Right-Skewed Distribution
The data values are evenly distributed about the mean.
Symmetric Distribution
The majority of the data values fall to the right of the mean.
Negatively/Left-Skewed Distribution
The majority of the data values fall to the left of the mean.
Positively/Right-Skewed Distribution
The standard normal distribution is a normal distribution with a mean of __ and a standard deviation of __.
0 and 1
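Any normally distributed value can be moved onto the mean-0, standard-deviation-1 scale by subtracting the mean and dividing by the standard deviation. The sketch below illustrates this; the exam-score numbers (mean 70, standard deviation 10, score 85) are made up for the example and are not from the cards.

```python
from statistics import NormalDist


def z_score(x: float, mean: float, sd: float) -> float:
    """Convert a value x into a standard score: z = (x - mean) / sd."""
    return (x - mean) / sd


# Hypothetical example: exam scores with mean 70 and standard deviation 10.
z = z_score(85.0, mean=70.0, sd=10.0)

# The probability of falling below the score is the same whether it is
# computed on the original scale or on the standard normal scale.
p_original = NormalDist(mu=70.0, sigma=10.0).cdf(85.0)
p_standard = NormalDist().cdf(z)

print(f"z = {z}")
print(f"P(X <= 85) = {p_original:.4f}, P(Z <= z) = {p_standard:.4f}")
```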
Suppose a college president wishes to estimate the average age of students attending this semester. The president could select a random sample of 100 students and find the average age of these students, say 22.3 years (this is an example of a ___________)
point estimate
A specific numerical value estimate of a parameter.
Point estimate
The best point estimate of the population mean µ is the _____________.
sample mean X̄
There is no way of knowing how close a particular point estimate is to the population mean. That is why most statisticians prefer another kind of estimate, the ____________
interval estimate
An interval or a range of values used to estimate the parameter.
Interval estimate
This estimate may or may not contain the value of the parameter being estimated
Interval estimate
If the sample size is greater than 30, the distribution of the sample means will be approximately ______
normal
3 properties of a good estimator:
The estimator should be an ___________. That is, the expected value or the mean of the estimates obtained from samples of a given size is equal to the parameter being estimated.
unbiased estimator
The estimator should be _________. For a ___________ estimator, as the sample size increases, the value of the estimator approaches the value of the parameter estimated.
consistent
The estimator should be a ______________________. That is, of all the statistics that can be used to estimate a parameter, the ____________________ has the smallest variance.
relatively efficient estimator
The percentage of times the interval estimate is expected to contain the true parameter value if the experiment were repeated many times.
Confidence level
Three common confidence levels:
90%, 95%, 99%
Critical value of 90%
± 1.65 (more precisely, ± 1.645)
Critical value of 95%
± 1.96
Critical value of 99%
± 2.58
α value of 90%
0.10
α value of 95%
0.05
α value of 99%
0.01
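The α values and two-tailed critical values on the cards above are linked: α is 1 minus the confidence level, and the critical value is the standard normal score that leaves α/2 in each tail. A minimal sketch, assuming the standard-library `statistics.NormalDist`; the function names are illustrative.

```python
from statistics import NormalDist


def alpha_for(confidence: float) -> float:
    """Alpha is the complement of the confidence level."""
    return 1.0 - confidence


def critical_value(confidence: float) -> float:
    """Two-tailed z critical value: leaves alpha / 2 in each tail."""
    alpha = alpha_for(confidence)
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(
        f"{level:.0%}: alpha = {alpha_for(level):.2f}, "
        f"z = ±{critical_value(level):.3f}"
    )
```

The computed values are roughly 1.645, 1.960, and 2.576. Statistical tables usually round these to two decimals, and the 90% value is conventionally listed as 1.65.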
The _____________ is a statistic expressing the amount of allowable random sampling error in results
margin of error
The larger the ___________, the less confidence that the sample results are close to the “true” figures for the whole population
margin of error
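For a mean with a known population standard deviation σ, the margin of error is commonly computed as E = z · σ / √n, and the interval estimate is X̄ ± E. The sketch below applies this to the college-age example from the point-estimate card (n = 100, X̄ = 22.3). The value σ = 2.0 is an assumption made purely for illustration; the cards do not give it.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma: float, n: int, confidence: float = 0.95) -> float:
    """Margin of error for a mean when sigma is known: E = z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return z * sigma / sqrt(n)


x_bar, n = 22.3, 100  # sample mean and sample size from the example card
sigma = 2.0           # ASSUMED population standard deviation (not in the cards)

e = margin_of_error(sigma, n, confidence=0.95)
interval = (x_bar - e, x_bar + e)

print(f"margin of error: {e:.3f}")
print(f"95% interval estimate: {interval[0]:.2f} to {interval[1]:.2f}")
```

Raising the confidence level or shrinking the sample widens the interval; a larger sample narrows it.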
The act of choosing the number of observations or replicates to include in a statistical sample
SAMPLE SIZE DETERMINATION
used in any empirical study to make inferences about a population from a sample
SAMPLE SIZE DETERMINATION
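One common way to determine the sample size for estimating a mean is to solve the margin-of-error formula for n, giving n = (z · σ / E)², rounded up to the next whole number. This is a sketch under the assumption that σ is known; the numbers σ = 2.0 and E = 0.5 are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(
    sigma: float, max_error: float, confidence: float = 0.95
) -> int:
    """Smallest n such that z * sigma / sqrt(n) <= max_error."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    # Always round up: rounding down would exceed the allowed error.
    return ceil((z * sigma / max_error) ** 2)


# Hypothetical inputs: sigma = 2.0 years, estimate within 0.5 years.
print(required_sample_size(sigma=2.0, max_error=0.5, confidence=0.95))  # 62
```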
3 METHODS OF HYPOTHESIS TESTING
The classical approach
The P-value approach
The confidence interval approach
If the observed sample value is too many standard deviations away from the value claimed under H0, then it is too unlikely that H0 is true.
The classical approach
If the probability of the sample value being that far from the value claimed under H0 is small, then it is too unlikely that H0 is true.
The P-value approach
If the value claimed under H0 falls outside the confidence interval built from the sample, then it is too unlikely that H0 is true.
The confidence interval approach
Every hypothesis-testing situation begins with the __________
statement of a hypothesis
A ____________ is a conjecture about a population parameter.
statistical hypothesis
The _____________, symbolized by H0, is a statistical hypothesis that states that there is no difference between a parameter and a specific value, or that there is no difference between two parameters.
null hypothesis
The _______________, symbolized by H1, is a statistical hypothesis that states the existence of a difference between a parameter and a specific value, or states that there is a difference between two parameters.
alternative hypothesis
____________: the critical area is two-sided, and the test checks whether a sample value is greater than or less than a certain range of values
Two-tailed test
____________: used if interest is in an increase only
One-tailed test (right)
__________ (can be rejected based on statistical evidence)
Always stated with an “equals” sign representing a given value
Null hypothesis
____________ (can be used to support a claim)
Sometimes known as the research hypothesis
Alternative hypothesis
Alternative hypothesis is also known as the
research hypothesis
A type __ error occurs if you reject the null hypothesis when it is true
I
A type ___ error occurs if you do not reject the null hypothesis when it is false
II
A type II error occurs if
you do not reject the null hypothesis when it is false
A type I error occurs if
you reject the null hypothesis when it is true
α value - 0.10
Level of Significance
10% chance of rejecting a true null hypothesis
α value - 0.05
Level of Significance
5% chance of rejecting a true null hypothesis
α value - 0.01
Level of Significance
1% chance of rejecting a true null hypothesis
The _______________ is the maximum probability of committing a type I error. This probability is symbolized by α (Greek letter alpha). That is, P(type I error) = α.
level of significance
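The link between α and type I errors can be seen by simulation: when the null hypothesis is actually true, a test run at α = 0.05 should reject it in roughly 5% of repeated samples. The sketch below uses a two-tailed z test; the population values (mean 50, standard deviation 10), the sample size, and the number of trials are all arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_i_error_rate(
    alpha: float = 0.05, n: int = 30, trials: int = 2000, seed: int = 0
) -> float:
    """Fraction of samples that wrongly reject a TRUE null hypothesis."""
    rng = random.Random(seed)
    mu0, sigma = 50.0, 10.0  # hypothetical population; H0: mu = mu0 is true
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)

    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:  # test value falls in the critical region
            rejections += 1
    return rejections / trials


print(f"observed type I error rate: {type_i_error_rate():.3f}")
```

The observed rate fluctuates from run to run with different seeds, but it should stay near the chosen α.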
The ______________ separates the critical region from the noncritical region.
critical value
The ____________________ is the range of values of the test value that indicates that there is a significant difference and that the null hypothesis should be rejected.
critical or rejection region
The ___________________ is the range of values of the test value that indicates that the difference was probably due to chance and that the null hypothesis should not be rejected
noncritical or nonrejection region
A ____________ indicates that the null hypothesis should be rejected when the test value is in the critical region on one side of the mean.
one-tailed test
Conjecture about a population parameter.
This conjecture may or may not be true
STATISTICAL HYPOTHESIS
There is no difference between 2 parameters
Null Hypothesis (H0)
There is a difference between 2 parameters
Alternative Hypothesis (H1)
HYPOTHESIS TESTING AND CRITICAL VALUES
- State the hypothesis
- Select the statistical test
- Choose the level of significance
- Formulate a plan for study
- Analyze results and make a decision
A statistical test for the mean of a population, used when σ is known and either:
* n is greater than or equal to 30, or
* the population is normally distributed
Z TEST FOR MEAN
A statistical test for the mean of a population, used when the population is normally or approximately normally distributed and σ is unknown
T TEST FOR MEAN
If σ is KNOWN and n > 30, use the
z-test.
If σ is KNOWN and n < 30, use the
z-test (provided the population is normally distributed).
If σ is UNKNOWN, but n > 30, use the
t-test.
If σ is UNKNOWN, and n < 30, use the
t-test.
The probability of getting a sample statistic such as the mean, or a more extreme sample statistic in the direction of the alternative hypothesis, when the null hypothesis is true
P-VALUE
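The probability described in the card above, commonly called the P-value, can be computed for a z test on a mean from the test value z = (X̄ − μ0) / (σ / √n). The sketch below is one way to do this with the standard library; the function name, the `tail` parameter, and the example numbers (μ0 = 22.0, σ = 2.0) are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(
    x_bar: float, mu0: float, sigma: float, n: int, tail: str = "two"
) -> tuple[float, float]:
    """Return (z, p) for a z test of H0: mu = mu0 with known sigma."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf
    if tail == "right":    # H1: mu > mu0
        p = 1.0 - cdf(z)
    elif tail == "left":   # H1: mu < mu0
        p = cdf(z)
    elif tail == "two":    # H1: mu != mu0
        p = 2.0 * (1.0 - cdf(abs(z)))
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, p


# Hypothetical claim: mean age is 22.0; sample of 100 has mean 22.3, sigma = 2.0.
z, p = z_test_p_value(x_bar=22.3, mu0=22.0, sigma=2.0, n=100, tail="two")
print(f"z = {z:.2f}, two-tailed P-value = {p:.4f}")
```

For the same data, a one-tailed test in the direction of the sample gives half the two-tailed P-value.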
p > 0.10
weak or no evidence
0.05 < p ≤ 0.10
Moderate evidence
0.01 < p ≤ 0.05
Strong evidence
p ≤ 0.01
Very strong evidence
DECISION RULE USING A P-Value
if p value <= alpha value
reject the null hypothesis
DECISION RULE USING A P-Value
if p value > alpha value
do not reject the null hypothesis
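The basic decision rule compares the P-value with the chosen α. A minimal sketch follows; the function name `decide` and its return strings are illustrative.

```python
def decide(p_value: float, alpha: float) -> str:
    """Apply the P-value decision rule for a given level of significance."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a P-value must lie between 0 and 1")
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "do not reject the null hypothesis"


print(decide(0.03, alpha=0.05))  # reject the null hypothesis
print(decide(0.03, alpha=0.01))  # do not reject the null hypothesis
```

The same P-value can lead to different decisions at different levels of significance, which is why α is chosen before the data are analyzed.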
DECISION RULE USING A P-Value
if p value <= 0.01
reject the null hypothesis. The difference is highly significant.
DECISION RULE USING A P-Value
if 0.01 < p <= 0.05
reject the null hypothesis. The difference is significant.
DECISION RULE USING A P-Value
if 0.05 < p <= 0.10
consider the consequences of a type I error before rejecting the null hypothesis.
DECISION RULE USING A P-Value
if p > 0.10
do not reject the null hypothesis. The result is not significant.