lecture 3. statistical inference Flashcards
does this describe exploratory data analysis or statistical inference?
its purpose is unrestricted exploration of data, searching for interesting patterns; conclusions apply to the subjects and circumstances for which we have data in hand; conclusions are informal based on what we see in the data
exploratory data analysis
does this describe exploratory data analysis or statistical inference?
purpose is to answer a specific question, posed before the data produced; conclusions apply to a larger group of subjects or a broader class of circumstances; conclusions are formal, backed by a statement of our confidence in them
statistical inference
this is a method for drawing conclusions about a population from a sample. uses probability to indicate how trustworthy its conclusions are. assumes a random sample - if this is not the case your conclusions may be faulty
statistical inference
what are the two most common types of statistical inference?
- significance tests
- confidence intervals
this common type of statistical inference estimates a population parameter.
confidence intervals
this common type of statistical inference assesses the evidence in the data for some claim about the population
significance tests
generating statistics from a relatively small sample to provide an indication of a population value (parameter) is the process of ________
estimation
________ are fixed for a given population (the mean for a given population is constant)
parameters
_______ are estimates of parameters and vary from sample to sample. in effect they are random variables.
sample statistics
the ____________ distribution of a statistic is the distribution of all possible values of that statistic in all possible random samples of the same size n from a population of size N
random sampling
if a random variable x has a population mean μ and population variance σ², then the sampling distribution of means (of samples of size n) will have a mean of μ and variance ____
σ²/n
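The σ²/n result can be checked exactly on a tiny example. The sketch below is only an illustration: the four-value population and the sample size are made up, and it enumerates every possible sample drawn with replacement instead of simulating.

```python
from itertools import product
from statistics import mean, pvariance

# Hypothetical toy population (values chosen only for illustration).
population = [2, 4, 6, 8]
n = 2  # sample size

mu = mean(population)           # population mean
sigma2 = pvariance(population)  # population variance

# Every possible ordered sample of size n, drawn with replacement.
sample_means = [mean(s) for s in product(population, repeat=n)]

mean_of_means = mean(sample_means)      # mean of the sampling distribution
var_of_means = pvariance(sample_means)  # variance of the sampling distribution

print(mean_of_means, mu)          # the two values match
print(var_of_means, sigma2 / n)   # the two values match
```

Raising n in this sketch shrinks the variance of the sample means, which is the point of the next few cards.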
this property means variability of the random sampling distribution depends on _______ and ________
sample size and the variability of the population
the ______ (smaller/larger) the sample size, the smaller the σ²/n
larger
_______ (more/less) variance = the more certain we can be
less
since sample means are approximately normally distributed, we can define a ________, which is a range of values used to estimate the true value of the population parameter; its confidence level (usually expressed as a percentage) is the proportion of times, over repeated sampling, that the ____ actually does contain the population parameter
confidence interval (CI)
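A minimal sketch of a confidence interval for a mean, assuming the population standard deviation is known and the sample mean is roughly normal. The sample values, `sigma`, and the 95% level are all hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical sample and an assumed known population standard deviation.
sample = [4.9, 5.3, 5.1, 4.7, 5.4, 5.0, 5.2, 4.8]
sigma = 0.25
confidence = 0.95

n = len(sample)
x_bar = mean(sample)

# z multiplier that leaves (1 - confidence) / 2 in each tail (about 1.96 for 95%).
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

standard_error = sigma / sqrt(n)
lower = x_bar - z * standard_error
upper = x_bar + z * standard_error

print(f"{confidence:.0%} CI for the mean: ({lower:.3f}, {upper:.3f})")
```

When σ is not known it is usually estimated from the sample and a t multiplier replaces z; that case is not shown here.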
states that if you have a population with mean μ and standard deviation σ and take sufficiently large random samples from the population with replacement, then the distribution of the sample means will be approximately normal.
central limit theorem
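A small simulation can make the theorem concrete. This is only a sketch: the skewed exponential population, the sample size of 40, the 5000 repetitions, and the seed are all arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

n = 40             # size of each sample
num_samples = 5000 # number of repeated samples

# Skewed population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0

sample_means = [
    mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = mean(sample_means)
sd_of_means = pstdev(sample_means)
standard_error = sigma / sqrt(n)

# Share of sample means within one standard error of mu;
# for a normal distribution this would be roughly 68%.
within_one_se = mean(abs(m - mu) <= standard_error for m in sample_means)

print(mean_of_means, sd_of_means, standard_error, within_one_se)
```

Even though single observations are strongly skewed, the sample means cluster around μ with a spread close to σ/√n.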
In a sample mean, we use the _______ to describe the variability of individual observations
standard deviation, σ
in a sample mean, we use the ______ to describe the variability of a sampling distribution
standard deviation of the sampling distribution of means (the standard error of the mean), σ/√n
For normal distributions, μ ± σ contains 68.27% of the observations, μ ± 2σ contains 95.45% of the observations and so on based on the empirical rule. How does this statement differ for a sampling distribution?
For normal distributions, μ ± σ contains 68.27% of the observations, μ ± 2σ contains 95.45% of the observations, etc.
For the sampling distribution of the mean, μ ± σ/√n contains 68.27% of the sample means, μ ± 2σ/√n contains 95.45% of the sample means, etc.
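The same coverage figures can be read off a normal curve whose spread is the standard error rather than σ. A minimal sketch, with made-up values for μ, σ, and n:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical population parameters and sample size.
mu, sigma, n = 100.0, 15.0, 25

standard_error = sigma / sqrt(n)  # sigma / sqrt(n)

# Sampling distribution of the mean, treated as normal.
sampling_dist = NormalDist(mu, standard_error)

within_1_se = sampling_dist.cdf(mu + standard_error) - sampling_dist.cdf(mu - standard_error)
within_2_se = sampling_dist.cdf(mu + 2 * standard_error) - sampling_dist.cdf(mu - 2 * standard_error)

print(f"within 1 SE: {within_1_se:.2%}, within 2 SE: {within_2_se:.2%}")
```

The percentages are the familiar roughly 68% and 95%, but the interval they refer to is much narrower than μ ± σ because it describes sample means, not individual observations.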
what are the SIX steps in hypothesis testing
- state the null hypothesis
- state the alternative hypothesis
- set the decision level (alpha)
- choose the test statistic
- calculate P (assumes null hypothesis is true)
- make a decision concerning null hypothesis
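The six steps can be sketched with a two-sided one-sample z test. The choice of a z test, the sample values, `mu0`, and `sigma` are all assumptions made for illustration; the cards themselves do not name a particular test.

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical data and an assumed known population standard deviation.
sample = [5.4, 5.6, 5.2, 5.5, 5.3, 5.7, 5.1, 5.4]
sigma = 0.25

# Step 1: null hypothesis H0: population mean equals mu0.
mu0 = 5.0
# Step 2: alternative hypothesis H1: population mean differs from mu0.
# Step 3: set the decision level.
alpha = 0.05

# Step 4: choose the test statistic (here, a z statistic).
x_bar = mean(sample)
z = (x_bar - mu0) / (sigma / sqrt(len(sample)))

# Step 5: calculate P, assuming the null hypothesis is true (two-sided).
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 6: make a decision concerning the null hypothesis.
decision = "reject H0" if p_value < alpha else "retain H0"

print(f"z = {z:.2f}, P = {p_value:.4g}, decision: {decision}")
```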
this states that any differences in the data are due to chance
null hypothesis
this states that any differences in the data are "real" or significant
alternative hypothesis
if P (calculated assuming the null hypothesis is true) is less than 0.05, do you reject or retain the null hypothesis
reject null hypothesis
if P (calculated assuming the null hypothesis is true) is greater than 0.05, do you reject or retain the null hypothesis
retain the null hypothesis