module 10 Flashcards
Correlation tests determine whether there is an ______ between two numerical variables
association
t or f: correlation implies causation
false
what is the strength of association between two numerical variables measured by?
Pearson’s correlation coefficient (r for a sample, ρ for the population); a correlation test evaluates the sample r against a null hypothesis about ρ.
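A minimal sketch of how Pearson's r is computed from paired sample data: the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` is hypothetical and only the Python standard library is used.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient r for paired data x, y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and have at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative)
print(pearson_r([-1, 0, 1], [1, 0, 1]))       # 0.0  (U-shaped, not linear)
```

The last example shows why r = 0 means no *linear* association: y clearly depends on x, but not along a straight line.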
for the purposes of hypothesis testing, the population parameter is given the Greek letter ____
ρ (rho)
The correlation coefficient can take on values anywhere from ρ=____ to ρ=_____
ρ=-1 to ρ=1
ρ=-1 means a perfect ______, a value of ρ=0 means ____, and a value of ρ=1 means a perfect _____
negative correlation, no linear association, positive correlation
Bivariate normal distribution
extension of the normal distribution for two numerical variables that allows for an association between them
what are the steps for conducting a correlation test
- define null and alt hypotheses
- establish null distribution
- conduct stat test
- draw scientific conclusion
the null and alt hypotheses for a non-directional correlation test
Ho: ρ=____
Ha: ρ≠____
HO: ρ=0
HA: ρ≠0
the null and alt hypotheses for a directional (positive or negative) correlation test
Ho: ρ____0 (or ρ____0)
Ha: ρ____0 (or ρ____0)
Ho: ρ≤0 (or ρ≥0)
Ha: ρ>0 (or ρ<0)
What is the null distribution for a correlation test?
- t-distribution with df=n-2 (n = number of pairs of observations)
- the observed score is t = r√(n-2)/√(1-r²), so a correlation test is carried out as a t test
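The observed t score for a correlation test comes from the sample r and the number of pairs n, using t = r√(n−2)/√(1−r²) with df = n − 2. A minimal sketch; the function name is hypothetical.

```python
import math

def correlation_t_statistic(r, n):
    """Observed t score for testing H0: rho = 0, with df = n - 2."""
    if n < 3:
        raise ValueError("need at least 3 pairs")
    if abs(r) >= 1:
        raise ValueError("r must be strictly between -1 and 1")
    df = n - 2
    t_obs = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t_obs, df

t_obs, df = correlation_t_statistic(r=0.5, n=20)
print(round(t_obs, 3), df)  # 2.449 18
```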
______ the null hypothesis if the observed score is greater than the critical score (i.e., tO>tC; for a non-directional test, compare |tO| with tC) or, equivalently, if the p-value is smaller than the Type I error rate (i.e., p<α).
reject
______ the null hypothesis if the observed score is less than or equal to the critical score (i.e., tO≤tC) or, equivalently, if the p-value is larger than or equal to the Type I error rate (i.e., p≥α).
fail to reject
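The two decision rules give the same decision, because the critical score is the value of t at which the p-value equals α. A sketch of both rules as functions (names hypothetical); the critical value 2.101 in the example is the two-sided 5% critical t for df = 18 from a standard t table, and the one-sided branch assumes HA points in the positive direction.

```python
def decide_by_score(t_obs, t_crit, two_sided=True):
    """Compare the observed score with the critical score."""
    # Non-directional test: only the size of t_obs matters, not its sign.
    # One-sided branch assumes HA: parameter > reference value.
    score = abs(t_obs) if two_sided else t_obs
    return "reject" if score > t_crit else "fail to reject"

def decide_by_p_value(p, alpha=0.05):
    """Compare the p-value with the Type I error rate alpha."""
    return "reject" if p < alpha else "fail to reject"

print(decide_by_score(t_obs=2.449, t_crit=2.101))  # reject
print(decide_by_p_value(p=0.03, alpha=0.05))       # reject
print(decide_by_p_value(p=0.30, alpha=0.05))       # fail to reject
```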
For a correlation test, the scientific conclusion depends on _______ in the hypotheses.
directionality
t or f
For non-directional hypotheses, the conclusion is either:
Reject the null hypothesis and conclude that there is evidence of an association between the two numerical variables.
Fail to reject the null hypothesis and conclude that there is no evidence of an association between the two numerical variables.
true
For directional hypotheses, the conclusion is either:
Reject the null hypothesis and conclude that there is evidence of a positive (or negative) association between the two numerical variables.
Fail to reject the null hypothesis and conclude that there is no evidence of a positive (or negative) association between the two numerical variables.
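The four steps of a correlation test, including the directional and non-directional conclusions, can be sketched end to end in standard-library Python. The t-distribution tail probability is obtained here by integrating the t density numerically with Simpson's rule, an assumption made only to avoid third-party dependencies; in practice a library routine such as SciPy's `scipy.stats.pearsonr` would normally supply the p-value. All function names are hypothetical and the data are made up for illustration.

```python
import math

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for Student's t with df degrees of freedom (steps must be even)."""
    # Normalising constant of the t density, computed on the log scale
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c) * (1 + u * u / df) ** (-(df + 1) / 2)

    # Simpson's rule for the area P(0 <= T <= |t|)
    a = abs(t)
    h = a / steps
    area = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    # The t density is symmetric about 0, so each half has probability 0.5
    return 0.5 - area if t >= 0 else 0.5 + area


def correlation_test(x, y, alternative="two-sided", alpha=0.05):
    """Correlation test; alternative is 'two-sided', 'greater' or 'less'."""
    # Step 1: hypotheses are set by `alternative` (boundary value rho = 0)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1:
        raise ValueError("perfect correlation: the t score is undefined")
    # Step 2: null distribution is a t-distribution with df = n - 2
    df = n - 2
    # Step 3: observed score and p-value
    t_obs = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    if alternative == "two-sided":      # HA: rho != 0
        p = 2 * t_upper_tail(abs(t_obs), df)
        effect = "an association"
    elif alternative == "greater":      # HA: rho > 0
        p = t_upper_tail(t_obs, df)
        effect = "a positive association"
    elif alternative == "less":         # HA: rho < 0; P(T <= t) = P(T >= -t)
        p = t_upper_tail(-t_obs, df)
        effect = "a negative association"
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    # Step 4: scientific conclusion, worded according to directionality
    if p < alpha:
        conclusion = f"Reject H0: there is evidence of {effect}."
    else:
        conclusion = f"Fail to reject H0: there is no evidence of {effect}."
    return r, t_obs, df, p, conclusion


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
r, t_obs, df, p, conclusion = correlation_test(x, y)
print(round(r, 3), df)  # 0.939 8
print(conclusion)       # Reject H0: there is evidence of an association.
```

Running the same data with `alternative="less"` fails to reject, because the observed association points in the positive direction.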
what does r refer to in relation to correlation tests
the sample correlation coefficient (Pearson’s r), which the correlation test evaluates against a null hypothesis about ρ
linear regression test
determines if changes in one variable can predict changes in another variable
in linear regression, one variable is the ____ and the other is the _____
predictor (independent), response (dependent)
Sampling error is assumed to occur only in the _____ variable and not in the ______ variable
response, predictor
the _____ variable is always the variable you want to make predictions about
response
Linear regression assumes that the relationship between the numerical variables is described by ______
- y=a+bx
- b=slope
- a=intercept
3 components to the statistical model
- systematic component (makes predictions, the equation)
- random component (probability distribution for sampling error, normal distribution for the response variable)
- link function (connects systematic component to the random component)
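The three components can be seen by simulating data from the model: the systematic component gives the predicted mean a + bx, and the random component adds normally distributed sampling error to the response. This sketch assumes the identity link (the link ordinary linear regression uses), and the function name and parameter values are hypothetical.

```python
import random

def simulate_regression(a, b, sigma, xs, seed=None):
    """Simulate responses from y = a + b*x + error, error ~ Normal(0, sigma)."""
    rng = random.Random(seed)
    ys = []
    for x in xs:
        mean_y = a + b * x              # systematic component (identity link)
        y = rng.gauss(mean_y, sigma)    # random component: normal sampling error
        ys.append(y)
    return ys

xs = [0, 1, 2, 3, 4, 5]
ys = simulate_regression(a=2.0, b=0.5, sigma=0.3, xs=xs, seed=42)
print([round(y, 2) for y in ys])
```

Setting `sigma=0.0` removes the random component, leaving only the points on the line.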
what do you need to estimate in order to fit the statistical model to data
the intercept and slope that best explain the data (a and b in the equation)
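For a straight line, the intercept and slope that minimise the sum of squared residuals have a standard closed form: b = Sxy/Sxx and a = ȳ − b·x̄, where Sxy and Sxx are sums of cross-products and squared deviations about the means. A minimal sketch with a hypothetical function name and made-up data:

```python
def fit_least_squares(xs, ys):
    """Return (a, b) minimising the sum of squared residuals for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx            # slope
    a = mean_y - b * mean_x    # intercept: the line passes through (mean_x, mean_y)
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_least_squares(xs, ys)
print(round(a, 2), round(b, 2))  # 0.05 1.99
```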
residual difference
- (r)
- difference between the observed value and the predicted value
- the vertical distance between an actual data point and the fitted regression line
residual variance
- sum of squared residuals (the sum of squares, SSQ) divided by the degrees of freedom
- roughly the average squared residual across all data points
how is residual variance calculated
- Calculate the residual for each data point
- Take the square of each residual
- Sum the squared residuals across all data points
- Divide by the degrees of freedom, which are df=n-2
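The four steps above map directly onto code. The divisor is df = n − 2 because two parameters (a and b) are estimated from the data. A minimal sketch with a hypothetical function name; the line a = 1, b = 2 and the data are made up so the arithmetic is easy to follow.

```python
def residual_variance(xs, ys, a, b):
    """Residual variance for the line y = a + b*x, using df = n - 2."""
    # 1. residual for each data point: observed minus predicted
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    # 2. square each residual
    squared = [r ** 2 for r in residuals]
    # 3. sum of squares (SSQ)
    ssq = sum(squared)
    # 4. divide by the degrees of freedom
    df = len(xs) - 2
    return ssq / df

# Predictions are 3, 5, 7, 9; only the third point misses (by -1), so SSQ = 1
print(residual_variance([1, 2, 3, 4], [3, 5, 6, 9], a=1.0, b=2.0))  # 0.5
```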
n
number of sampling units
how to minimize the sum of squares/residual variance
vary the slope and intercept parameters until the sum of squared residuals is as small as possible (least squares)
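The idea of varying the slope and intercept can be shown with a brute-force grid search: try many (a, b) pairs and keep the one with the smallest sum of squares. This is purely illustrative and is not how regression software fits models (the closed-form least-squares solution is used instead); function names and data are hypothetical. For these data the exact least-squares estimates are a = 1.05 and b = 1.99, and the best grid point lands next to them.

```python
def ssq(xs, ys, a, b):
    """Sum of squared residuals for the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def grid_search(xs, ys, a_values, b_values):
    """Try every (a, b) pair on a grid and keep the one with the smallest SSQ."""
    best = None
    for a in a_values:
        for b in b_values:
            s = ssq(xs, ys, a, b)
            if best is None or s < best[0]:
                best = (s, a, b)
    return best

xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.1]
grid = [i / 10 for i in range(0, 51)]   # candidate values 0.0, 0.1, ..., 5.0
best_ssq, best_a, best_b = grid_search(xs, ys, grid, grid)
print(best_a, best_b, round(best_ssq, 3))  # 1.0 2.0 0.11
```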
systematic component
mathematical relationship that connects the predictor variable to the response variable
how many hypotheses can be tested for linear regression
two, one for each parameter (intercept and slope)
varying the intercept changes the _____ and varying the slope changes the ____
height of the line (its value at x=0), steepness and direction of the relationship between variables
directional null and alt hypotheses for intercept
HO: a≤βa
HA: a>βa
non directional null and alt hypotheses for intercept
HO: Intercept is not different from a reference value, or a=βa in symbols
HA: Intercept is different from a reference value, or a≠βa in symbols
directional null and alt hypotheses for slope
HO: b≤βb
HA: b>βb
non directional null and alt hypotheses for slope
HO: Slope is not different from a reference value, or b=βb in symbols
HA: Slope is different from a reference value, or b≠βb in symbols
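A sketch of the slope test, assuming the usual t statistic t = (b − βb)/SE(b), where SE(b) = √(residual variance / Sxx) and df = n − 2. The tail probability is again computed by Simpson's-rule integration of the t density to stay within the standard library (an assumption; a statistics package would normally provide it, e.g. SciPy's `scipy.stats.linregress` for βb = 0). Only the slope is shown, and names and data are hypothetical.

```python
import math

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for Student's t, via Simpson's rule on the density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c) * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    area = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3                        # P(0 <= T <= |t|)
    return 0.5 - area if t >= 0 else 0.5 + area

def slope_test(xs, ys, beta_b=0.0, alternative="two-sided"):
    """t test of the slope b against a reference value beta_b (df = n - 2)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / s_xx
    a = mean_y - b * mean_x
    df = n - 2
    resid_var = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / df
    se_b = math.sqrt(resid_var / s_xx)   # standard error of the slope
    t_obs = (b - beta_b) / se_b
    if alternative == "two-sided":       # HA: b != beta_b
        p = 2 * t_upper_tail(abs(t_obs), df)
    elif alternative == "greater":       # HA: b > beta_b
        p = t_upper_tail(t_obs, df)
    else:
        raise ValueError("alternative must be 'two-sided' or 'greater'")
    return b, t_obs, df, p

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.3, 2.9, 4.1, 4.8, 6.2, 6.8, 8.1, 8.7]
b, t_obs, df, p = slope_test(xs, ys, beta_b=0.0)
print(round(b, 3), df, p < 0.05)  # 0.956 6 True
```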
four assumptions of linear regression
- linearity (the response can be described by a linear combination of the predictor variable)
- independence (observations, and therefore their residuals, should be independent of each other)
- normality (residuals should be normally distributed; checked with the Shapiro-Wilk test)
- homoscedasticity (the variance of the residuals should be similar across the range of the predictor variable)
what is the Shapiro-Wilk test used for
- to evaluate the normality assumption quantitatively, i.e., whether the residuals are normally distributed
t or f: There is no 1-to-1 map between violations of assumptions and trustworthiness of the statistical conclusions.
true