Midterm Flashcards
If p-value is less than alpha
we reject H0 and claim significance
If p-value is greater than alpha
we fail to reject H0 and there’s not enough evidence to claim significance
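The two decision cards above can be sketched as a single rule. This is a minimal illustration, not a full hypothesis test: the function name `decide` and the default alpha of 0.05 are assumptions chosen for the example, and the p-value is taken as already computed.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the test decision for an already-computed p-value.

    alpha = 0.05 is only a common default, assumed here for illustration.
    The boundary case p_value == alpha is treated as "fail to reject",
    following the "less than alpha" wording of the card.
    """
    if p_value < alpha:
        return "reject H0"          # result is statistically significant
    return "fail to reject H0"      # not enough evidence to claim significance


print(decide(0.03))               # p below alpha
print(decide(0.20))               # p above alpha
print(decide(0.03, alpha=0.01))   # same p, stricter alpha
```

Note that the same p-value can lead to different decisions depending on the alpha chosen before the test.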
The null hypothesis (H0)
there is no difference between the specified populations (no effect); any difference seen in the sample is due to chance
Nominal data
gender (male, female); hair color; ethnicity
Ordinal data
first, second, third; letter grades; economic status
Discrete data
number of students in class; number of workers in a company; number of home runs in a baseball game
Continuous data
height of children; square footage of house; speed of cars
Pearson’s correlation coefficient
measures the degree to which two outcomes are linearly related
sample statistic = r
population parameter = ρ (rho)
r > 0
variables are positively correlated
r < 0
variables are negatively correlated
r = 0
variables are uncorrelated (no linear relationship)
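A short sketch of how the sample statistic r can be computed and how its sign is read. The function name `pearson_r` and the toy data are assumptions made for illustration; the formula is the standard one (sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations).

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient r for paired lists x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4]
print(pearson_r(x, [2, 4, 6, 8]))   # y rises with x: r > 0
print(pearson_r(x, [8, 6, 4, 2]))   # y falls as x rises: r < 0
print(pearson_r(x, [1, 2, 2, 1]))   # no linear trend: r = 0
```

The third data set shows that r = 0 only rules out a linear relationship; the points still follow a clear curved pattern.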
How to measure correlation
significance, direction, and effect size (strength)
Spearman’s rank correlation
non-parametric alternative to Pearson’s correlation
uses ranks, can be applied to ordinal variables, less sensitive to outliers than Pearson’s r
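One way to see why ranks help with outliers: Spearman's coefficient can be computed as Pearson's coefficient applied to the ranks of the data. The sketch below assumes that approach; the helper names `average_ranks`, `pearson_r`, and `spearman_rho` and the toy data are hypothetical, chosen for illustration.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 1000]     # always increasing, but the last value is an outlier
print(spearman_rho(x, y))   # the ranks agree perfectly
print(pearson_r(x, y))      # pulled below 1 by the outlier
```

Because only the ordering of the values matters, replacing 1000 with any other number larger than 16 leaves the Spearman result unchanged.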
Parametric data
data that meet the assumptions of parametric tests: approximately normal, usually with a large sample (rule of thumb: n > 30)
Nonparametric data
data that do not meet the assumptions of parametric tests: ordinal or non-normal, often with a small sample (rule of thumb: n < 30)
Simple linear regression
explores the nature of the association between two variables
Independent variable
the presumed cause; it appears on the right side of a regression equation and on the x-axis of a scatter plot
also called: explanatory variable, predictor variable
Dependent variable
the presumed effect; it appears on the left side of a regression equation and on the y-axis of a scatter plot
also called: response variable, outcome variable
Coefficient of determination
represented by r² and is the square of Pearson’s correlation coefficient; the proportion of the variation in y explained by x
Residual
vertical distance from an observed point to the regression line (observed y minus predicted y)
Goodness of fit (r²)
r² = 1: all variation in y can be explained by variation of x
r² = 0: x gives no information about y
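The regression cards above can be tied together in one small sketch: fit a least-squares line, take the residuals, and compute the coefficient of determination as 1 minus the residual sum of squares over the total sum of squares. The function names `fit_line` and `r_squared` and the toy data are assumptions made for illustration.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for the line y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    intercept, slope = fit_line(x, y)
    # Residual = observed y minus the y predicted by the fitted line
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    mean_y = sum(y) / len(y)
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


x = [1, 2, 3, 4]
print(fit_line(x, [3, 5, 7, 9]))    # points lie exactly on y = 1 + 2x
print(r_squared(x, [3, 5, 7, 9]))   # every residual is 0, so all variation is explained
print(r_squared(x, [1, 2, 2, 1]))   # fitted slope is 0, so x explains nothing
```

In the last data set the fitted line is flat at the mean of y, so the residual sum of squares equals the total sum of squares and the ratio leaves nothing explained.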
Assumptions of regression
normality, linearity, homoscedasticity, independence
Normality
the y values at each value of x (equivalently, the residuals) are normally distributed
Linearity
the relationship between y and x can be described by a straight line
Homoscedasticity
the variability of y is the same at every value of x (constant variance)
Independence
the observations (y values) are independent of one another