Module 10 Flashcards
What is the goal of a correlation test?
to evaluate whether there is an association between two numerical variables; it asks whether one variable trends up (or down) as the other changes
What is a correlation test?
- a test of the association between two numerical variables, which is measured by the correlation coefficient.
- the correlation coefficient can take on values from ρ=-1, indicating a perfect negative association, through ρ=0, indicating no association, to ρ=1, indicating a perfect positive association.
- no implied causation between the variables
- both variables are assumed to have variation (both have comparable amounts of variation among sampling units)
- not used for prediction; only used to evaluate the association between variables
What is association?
A pattern whereby one variable increases (or decreases) with a change in another variable. There is no implied causation between the variables.
How is the strength of association measured?
by Pearson's correlation coefficient. the correlation coefficient can take on values from ρ=-1 to ρ=1.
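A minimal sketch of how the coefficient can be computed for a sample, assuming the standard Pearson formula (covariation of the two variables divided by the square root of the product of their individual variations). The function name `pearson_r` and the example data are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient for paired numerical values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must have nonzero variation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: each pair is measured on the same sampling unit.
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # y rises with x: perfect positive
print(pearson_r(x, [10, 8, 6, 4, 2]))  # y falls with x: perfect negative
print(pearson_r(x, [2, 1, 0, 1, 2]))   # no linear trend: approximately zero
```

The last example shows that a coefficient near zero only rules out a straight-line trend; the values still follow a clear V-shaped pattern.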
What does a correlation coefficient of ρ=-1 mean?
indicates a perfect negative correlation
What does a correlation coefficient of ρ=0 mean?
indicates no association
What does a correlation coefficient of ρ=1 mean?
indicates a perfect positive correlation
What is the correlation coefficient used for in a correlation test?
the sample correlation coefficient is the statistic that the test evaluates against a null hypothesis
What are the assumptions behind a correlation test?
- each pair of numerical values is measured on the same sampling unit
- numerical values come from continuous numerical distributions with nonzero variation
- if there is an association between the variables, it is linear (follows a straight line)
What is the bivariate normal distribution?
an extension of the normal distribution for two numerical variables that allows for an association between them. the contour lines are slices through the bivariate normal distribution.
What are the null and alternative hypotheses for the correlation coefficient?
null = the correlation coefficient is zero
alternative = the correlation coefficient is not zero
What is the null distribution for a correlation test?
the sampling distribution of correlation coefficients from a statistical population where there is no association between the variables (i.e., ρ=0)
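One way to picture this null distribution is to simulate it. The sketch below assumes two independent standard normal variables (a bivariate normal population with ρ=0), draws many samples, and records the sample coefficient from each. The names `pearson_r` and `simulate_null_r`, the sample size, and the seed are illustrative choices, not part of the course material.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation coefficient for paired numerical values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_null_r(n, n_samples, seed=0):
    """Sample coefficients from a population with no association (rho = 0)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    coefficients = []
    for _ in range(n_samples):
        # x and y are drawn independently, so the population correlation is 0.
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        coefficients.append(pearson_r(x, y))
    return coefficients


null_rs = simulate_null_r(n=10, n_samples=2000)
mean_r = sum(null_rs) / len(null_rs)
print("mean of simulated coefficients:", mean_r)
print("most extreme simulated coefficients:", min(null_rs), max(null_rs))
```

The simulated coefficients center near zero but still scatter on both sides of it because of sampling error, which is why an observed coefficient has to be compared against this distribution rather than against zero itself.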
What is the correlation test based on?
t-distribution
How do you conduct the hypothesis test for a correlation test?
- locate the critical t score that corresponds to the Type I error rate on the t-distribution
- compare that to the observed t score
- the statistical decision is made either by comparing the observed and critical t scores or by comparing the corresponding p-value and the Type I error rate
if the observed t score is greater than the critical t score (in absolute value for a non-directional test), then we reject the null hypothesis.
if the observed t score is less than or equal to the critical t score, we fail to reject the null hypothesis
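The decision rule above can be sketched for a non-directional test. This assumes the standard conversion of a sample coefficient r to a t score, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom, which the cards do not state explicitly. The critical value 2.306 is the usual t-table entry for 8 degrees of freedom at a two-tailed Type I error rate of 0.05; the sample values of r and n are hypothetical.

```python
import math


def correlation_t_score(r, n):
    """Observed t score for a sample correlation r from n paired values."""
    if n < 3:
        raise ValueError("need at least 3 pairs (n - 2 degrees of freedom)")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def decide(t_observed, t_critical):
    """Non-directional decision: compare |observed t| with the critical t."""
    if abs(t_observed) > t_critical:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


# Assumed t-table value: df = 10 - 2 = 8, two-tailed, Type I error rate 0.05.
T_CRITICAL = 2.306

for r in (0.7, 0.5):  # hypothetical sample coefficients, n = 10 pairs each
    t_obs = correlation_t_score(r, n=10)
    print(f"r = {r}: t = {t_obs:.3f} -> {decide(t_obs, T_CRITICAL)}")
```

With 10 pairs, r = 0.7 gives a t score of about 2.77, which exceeds the critical value, while r = 0.5 gives about 1.63, which does not.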
What are the scientific conclusions for directional and non-directional correlation tests?
non-directional:
* reject the null hypothesis and conclude there is evidence of an association between the two numerical variables
* fail to reject the null hypothesis and conclude there is no evidence of an association between the two numerical variables
directional:
* reject the null hypothesis and conclude there is evidence of a positive/negative association between the two numerical variables
* fail to reject the null hypothesis and conclude that there is no evidence of a positive/negative association between the two numerical variables
What is the linear regression test designed for?
to evaluate whether changes in one numerical variable can predict changes in a second numerical variable
What is the focus of linear regression?
prediction
one variable is designated as the predictor variable and the other one as the response variable
What is a key distinction between linear regression tests and correlation tests?
in linear regression, sampling error is considered to occur only in the response variable and not in the predictor variable
What are the predictor and response variables for a linear regression?
predictor variable
* often called the independent variable
* variable that was manipulated by the researcher
response variable
* the dependent variable
* the measured response following the manipulation
What is the linear equation for a linear regression test?
- linear regression assumes that the relationship between the numerical variables is described by a linear equation
- response variable: y
- predictor variable: x
- two parameters: slope (b) and intercept (a)
What is the linear equation, and what are its slope and intercept?
y=a+bx
slope (b)
* the slope describes the relationship between the numerical variables
* it is the amount that the response variable (y) increases/decreases for every one-unit increase in the predictor variable (x)
* positive values represent an increasing relationship, zero no relationship, and negative values a decreasing relationship
intercept (a)
* the value of the response variable (y) when the predictor variable (x) is at zero
* changing the intercept raises or lowers the line, but does not change the relationship between the variables
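A small sketch of the linear equation y = a + bx, using made-up parameter values, to show what the slope and intercept each control. The function name `predict` is illustrative.

```python
def predict(a, b, x):
    """Predicted response y = a + b*x for intercept a and slope b."""
    return a + b * x


a, b = 2.0, 0.5  # hypothetical intercept and slope

# The intercept is the predicted response when the predictor is zero.
print(predict(a, b, 0))

# The slope is the change in y for a one-unit increase in x.
print(predict(a, b, 4) - predict(a, b, 3))

# Raising the intercept shifts every prediction by the same amount,
# so the change in y per unit of x stays the same.
raised = a + 3.0
print(predict(raised, b, 4) - predict(raised, b, 3))
```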
What are the three components of a statistical model for linear regression?
- systematic component: describes the mathematical function used for predictions. for linear regression this is the linear equation
- random component: describes the probability distribution for sampling error. for linear regression this is a normal distribution for the response variable
- link function: connects the systematic component to the random component. for linear regression this is the identity function
What does it mean to fit the statistical model to the data for linear regression development?
- fitting the model means estimating the intercept and slope that best explain the data
- for linear regression this is done by minimizing the residual variance.
- a residual is the difference between the observed data point and the predicted value (r=Y-y, where Y is the observed value and y=a+bx is the predicted value)
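A minimal sketch of fitting, assuming the ordinary least-squares formulas for the slope and intercept, which minimize the sum of squared residuals. The function names and the data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("the predictor must have nonzero variation")
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return a, b


def residuals(x, y, a, b):
    """Observed value minus predicted value (r = Y - y) for each data point."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def sum_squared_residuals(x, y, a, b):
    return sum(r ** 2 for r in residuals(x, y, a, b))


# Hypothetical data: predictor x and measured response y.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(x, y)
print("intercept:", a, "slope:", b)
print("residuals:", residuals(x, y, a, b))

# Any other line, here one with a slightly different slope, fits worse.
print(sum_squared_residuals(x, y, a, b) < sum_squared_residuals(x, y, a, 2.0))
```

The residuals from the fitted line sum to approximately zero, and nudging the slope away from the fitted value increases the sum of squared residuals.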