STA441 Flashcards
Quantitative variable
Represents an amount of something
Categorical variable
Codes represent category membership
Explanatory Variable
Predictor or cause (Contributing factor)
Response Variable
Predicted variable or effect
Statistic
Numbers that can be calculated from sample data
Parameters
Numbers that could be calculated if we knew the whole population
Distribution
Population histogram
Conditional distribution
For each value x of the explanatory variable X, there is a separate distribution of the response variable Y
When are the response and explanatory variables unrelated?
When the conditional distribution of the response variable is identical for each value of the explanatory variable.
When are the response and explanatory variables related?
When the conditional distribution of the response variable depends on the value of the explanatory variable.
Null hypothesis
Explanatory and response variable are unrelated in the population
P-value
The probability of getting results at least as extreme as ours just by chance, i.e. if the null hypothesis were true.
Equivalently, the minimum significance level at which the null hypothesis can be rejected.
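As a rough illustration of the definition, a p-value can be approximated by simulation. The sketch below swaps in a permutation test (not necessarily a method used in the course): if the null hypothesis is true, the group labels are arbitrary, so we shuffle them many times and count how often the shuffled difference in means is at least as extreme as the observed one. The function name `permutation_p_value` and the data are hypothetical.

```python
import random
from statistics import fmean

def permutation_p_value(group_a, group_b, n_perm=10_000, seed=1):
    """Approximate two-sided p-value for a difference in means (illustrative sketch)."""
    rng = random.Random(seed)
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling, which H0 says is harmless
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Share of relabelings giving a result at least as extreme as the observed one
    return at_least_as_extreme / n_perm

# Hypothetical data for two groups of cases
a = [5.1, 4.8, 6.2, 5.9, 5.5]
b = [6.8, 7.1, 6.4, 7.5, 6.9]
print(permutation_p_value(a, b))
```

A small value means results this extreme would rarely arise by chance alone if the variables were unrelated.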
Type 1 error
Null hypothesis is true, but we reject it
Type 2 error
Null hypothesis is false, but we fail to reject it
Power
Probability of correctly rejecting the null hypothesis when it is false.
i.e. 1-P(type 2 error)
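Both error probabilities can be estimated by simulation. This sketch assumes a one-sample z-test of H0: mu = 0 with known sigma = 1, n = 25 and alpha = 0.05 (all hypothetical choices made for simplicity). When the true mean is 0, the rejection rate estimates P(Type 1 error); when the true mean is 0.5, it estimates the power.

```python
import random
from statistics import NormalDist, fmean

def rejection_rate(true_mean, n=25, sigma=1.0, alpha=0.05, n_sims=10_000, seed=2):
    """Fraction of simulated samples for which a two-sided z-test rejects H0: mu = 0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = fmean(sample) / (sigma / n ** 0.5)  # test statistic computed under H0: mu = 0
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims

type_1_rate = rejection_rate(true_mean=0.0)  # H0 true: should be close to alpha
power = rejection_rate(true_mean=0.5)        # H0 false: probability of rejecting
print(type_1_rate, power, 1 - power)         # 1 - power estimates P(Type 2 error)
```

For these settings the theoretical power is about 0.71, so the simulated value should land nearby.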
Confidence interval
Pair of numbers chosen so that the probability they will enclose the parameter is large (e.g. 0.95)
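The "probability" in this definition refers to the procedure, which a simulation makes concrete. The sketch assumes normal data with known sigma, so the interval is xbar +/- z * sigma / sqrt(n); the true mean 10, sigma 2 and n = 30 are hypothetical values. Over many repeated samples, about 95% of the intervals should enclose the true mean.

```python
import random
from statistics import NormalDist, fmean

def ci_coverage(mu=10.0, sigma=2.0, n=30, level=0.95, n_sims=10_000, seed=3):
    """Estimate the share of z-intervals (sigma known) that enclose the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(n_sims):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / n_sims

print(ci_coverage())
```

Any single computed interval either contains the parameter or it does not; the 0.95 describes how often the method succeeds.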
What to say if results are not statistically significant?
The data do not provide enough evidence to conclude that the variables are related.
Independent observations
Simple random sampling.
Cases are not linked.
Assumptions and use of the independent t-test
Independent random samples from 2 normal populations.
Possibly different population means.
Same population variance.
Compares 2 means.
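Under these assumptions the test statistic uses a pooled estimate of the common variance. A minimal standard-library sketch (the function name `pooled_t` and the data are hypothetical); it returns the t statistic and its degrees of freedom, n1 + n2 - 2.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x, y):
    """Independent two-sample t statistic assuming equal population variances."""
    n1, n2 = len(x), len(y)
    # Pooled estimate of the common variance (each sample variance uses n - 1)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

t, df = pooled_t([1, 2, 3], [4, 5, 6])
print(t, df)
```

The p-value then comes from the t distribution with df degrees of freedom (for example via `scipy.stats.ttest_ind`, whose default `equal_var=True` gives this pooled test). The sign of t shows which sample mean is larger.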
One- or two-tailed tests? How to draw a directional conclusion?
Always use two-tailed tests. If the null hypothesis is rejected, draw the directional conclusion from the estimates of the parameters (e.g. which sample mean is larger).
Robustness of the independent t-test
If both samples are large, normality and equal variance do not matter much.
Assumptions and use of the matched (paired) t-test
Pairs are independent of one another.
Random sampling of pairs.
Differences are normally distributed.
Compares the mean response at 2 values of the explanatory variable, measured within cases.
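Because this test works on within-case differences, it reduces to a one-sample t-test of whether the mean difference is zero. A sketch assuming two lists in which position i holds the two measurements from case i; `paired_t` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """Matched-pairs t statistic: one-sample t-test of H0: mean difference = 0."""
    if len(first) != len(second):
        raise ValueError("each case must contribute one value to each list")
    diffs = [b - a for a, b in zip(first, second)]  # one difference per case
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical before/after scores for four cases
t, df = paired_t([10, 12, 9, 11], [12, 13, 11, 14])
print(t, df)
```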
Between cases
A case contributes exactly one explanatory and one response variable value.
Within cases
A case contributes several pairs (explanatory and response), usually one pair for each value of the explanatory variable.
ANOVA
Extension of the independent t-test to more than two values of the explanatory variable.
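One-way ANOVA compares the variation between group means with the variation within groups. A minimal sketch of the F statistic (the function name `one_way_f` and the data are hypothetical). With exactly two groups, F equals the square of the pooled t statistic, which is the sense in which ANOVA extends the independent t-test.

```python
from statistics import mean

def one_way_f(*groups):
    """One-way ANOVA F statistic and its (numerator, denominator) degrees of freedom."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    # Between groups: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: how far each observation sits from its own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, (k - 1, n - k)

print(one_way_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

For example, [1, 2, 3] versus [4, 5, 6] gives F = 13.5, the square of the pooled t = -3 / sqrt(2/3), about -3.674.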
Simple regression and correlation
One explanatory variable.
Random sampling of (X,Y) pairs.
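The conditional distributions of the response variable all have the same variance.
Response variable is quantitative.
Explanatory variable is usually quantitative.
r = 0 indicates no linear relationship (the slope of the least-squares line is 0).
r^2 is the proportion of variation in the response variable explained by the explanatory variable.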
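The quantities on this card can be computed directly from sums of squares. A minimal sketch (the function name `simple_regression` and the data are hypothetical); it also checks numerically that r^2 equals 1 - SS_residual / SS_total, the proportion of variation in Y explained by the least-squares line.

```python
from statistics import mean

def simple_regression(x, y):
    """Least-squares slope and intercept, Pearson r, and proportion of variation explained."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)  # total variation in Y
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # r and the slope share the sign of sxy, so r = 0 exactly when the slope is 0
    r = sxy / (sxx * syy) ** 0.5
    fitted = [intercept + slope * xi for xi in x]
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    explained = 1 - ss_resid / syy  # proportion of variation explained by the line
    return slope, intercept, r, explained

slope, intercept, r, explained = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(slope, intercept, r ** 2, explained)
```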
Chi-squared test of independence
Both variables are categorical.
Large random sample.
Each observation falls into one combination of explanatory and response variable values (one cell of the table); these combinations have a multinomial distribution.
Every expected frequency should be at least 5.
Independence of observations is important.
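The test compares the observed cell counts with the counts expected if the variables were independent, where expected = (row total x column total) / n. A minimal sketch (the function name `chi_squared_independence` and the 2 x 2 table are hypothetical) that also reports the smallest expected frequency, so the expected-frequency condition on this card can be checked.

```python
def chi_squared_independence(table):
    """Pearson chi-squared statistic, degrees of freedom, and smallest expected count."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    # Commonly required to be at least 5 for the chi-squared approximation
    min_expected = min(min(row) for row in expected)
    return chi2, df, min_expected

# Hypothetical 2 x 2 table: rows = explanatory categories, columns = response categories
chi2, df, min_expected = chi_squared_independence([[10, 20], [20, 10]])
print(chi2, df, min_expected)
```

The p-value then comes from the chi-squared distribution with df degrees of freedom.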
Confounding variable
A variable that influences both the explanatory and the response variable, producing a misleading relationship between them.
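A small simulation shows how a confounding variable can produce a relationship on its own. In this hypothetical setup Z affects both X and Y, while Y does not depend on X at all, yet X and Y are clearly correlated. Subtracting Z's known contribution (possible only because we generated the data ourselves) removes the correlation. `pearson_r` and `simulate_confounding` are illustrative names, not from any library.

```python
import random
from statistics import fmean

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate_confounding(n=5_000, seed=4):
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]  # confounding variable
    x = [zi + rng.gauss(0, 1) for zi in z]   # X depends only on Z
    y = [zi + rng.gauss(0, 1) for zi in z]   # Y depends only on Z, not on X
    raw = pearson_r(x, y)
    # Remove Z's contribution (known here only because we generated the data)
    adjusted = pearson_r([xi - zi for xi, zi in zip(x, z)],
                         [yi - zi for yi, zi in zip(y, z)])
    return raw, adjusted

print(simulate_confounding())
```

In this setup the raw correlation is about 0.5 in the population, and the adjusted one is about 0.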
Observational study
Explanatory and response variables are just observed and recorded (no random assignment).
Experimental study
Cases randomly assigned to values of explanatory variable
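Random assignment can be done by shuffling the cases and dealing them out to the groups in turn. A minimal sketch; `randomly_assign` and the group labels are hypothetical. Because the assignment depends only on the shuffle, no characteristic of the cases can systematically influence which value of the explanatory variable a case receives.

```python
import random
from collections import Counter

def randomly_assign(cases, treatments, seed=None):
    """Randomly assign cases to treatments with group sizes as equal as possible."""
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    # Deal the shuffled cases out in turn so group sizes differ by at most one
    return {case: treatments[i % len(treatments)] for i, case in enumerate(shuffled)}

assignment = randomly_assign(range(10), ["treatment", "control"], seed=5)
print(Counter(assignment.values()))
```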
How to write the conclusion of a purely observational study
There is enough evidence to suggest that X is related to Y (not that X causes Y, since confounding variables cannot be ruled out).