Module 10 Flashcards

1
Q

What is the goal of a correlation test?

A

to evaluate whether there is an association between two numerical variables; it asks whether one variable trends up (or down) as the other changes

2
Q

What is correlation?

A
  • the measure of association between two numerical variables.
  • The correlation coefficient can take on values from ρ=-1, which indicates a perfect negative association, to ρ=0, indicating no association, to ρ=1, indicating a perfect positive association.
  • no implied causation between the variables
  • both variables are assumed to have variation (both have comparable amounts of variation among sampling units)
  • not used for prediction; only used to evaluate the association between variables
2
Q

What is association?

A

A pattern whereby one variable increases (or decreases) with a change in another variable. There is no implied causation between the variables.

3
Q

How is the strength of association measured?

A

by Pearson's correlation coefficient. the correlation coefficient can take on values from ρ=-1 to ρ=1.
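
As a rough illustration of the coefficient described above, the sketch below computes a sample Pearson correlation from paired values using the standard textbook formula (cross-products of deviations divided by the square root of the product of the sums of squares). The function name `pearson_r` and the data are made up for this example and are not from the course material.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient for paired values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-products and sums of squares of deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements (invented for illustration)
x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])  # points on an increasing straight line
r_neg = pearson_r(x, [10, 8, 6, 4, 2])  # points on a decreasing straight line
print(r_pos, r_neg)
```

With points that fall exactly on a line, the coefficient sits at the ends of its range: close to 1 for the increasing line and close to -1 for the decreasing one.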

4
Q

What does a correlation coefficient of ρ=-1 mean?

A

indicates a perfect negative correlation

4
Q

What does a correlation coefficient of ρ=0 mean?

A

indicates no association

5
Q

What does a correlation coefficient of ρ=1 mean?

A

indicates a perfect positive correlation

6
Q

What is the correlation test?

A

the statistical test used to evaluate a sample correlation coefficient against a null hypothesis

7
Q

What are the assumptions behind a correlation test?

A
  • each pair of numerical values is measured on the same sampling unit
  • numerical values come from continuous numerical distributions with nonzero variation
  • if there is an association between the variables, it is linear (described by a straight line)
8
Q

what is the bivariate normal distribution?

A

an extension of the normal distribution for two numerical variables that allows for an association between them. the contour lines are slices through the bivariate normal distribution.

9
Q
A
9
Q

What are the null and alternative hypotheses for the correlation coefficient?

A

null=correlation coefficient is zero
alternative=the correlation coefficient is not zero

10
Q

What is the null distribution for a correlation test?

A

the sampling distribution of correlation coefficients from a statistical population where there is no association between the variables (i.e., ρ=0)
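
One way to picture this null distribution is to simulate it. The sketch below repeatedly draws two variables independently of each other (so the true correlation is zero) and records the sample correlation each time. The sample size, number of repeats, seed, and the helper `pearson_r` are all arbitrary choices made for illustration.

```python
import math
import random

def pearson_r(x, y):
    """Sample Pearson correlation coefficient for paired values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)  # fixed seed so the sketch is repeatable
n = 10          # hypothetical sample size
null_rs = []
for _ in range(2000):
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [random.gauss(0, 1) for _ in range(n)]  # drawn independently of x
    null_rs.append(pearson_r(x, y))

# The simulated coefficients scatter around zero; their spread is sampling error
mean_r = sum(null_rs) / len(null_rs)
print(round(mean_r, 3), round(min(null_rs), 3), round(max(null_rs), 3))
```

Individual samples can show sizeable correlations by chance even though the population has none, which is why an observed coefficient is compared against this distribution.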

10
Q

What is the correlation test based on?

A

t-distribution

11
Q

How do you conduct the hypothesis test for a correlation test?

A
  • locate the critical t score that corresponds to the type I error rate on the t-distribution
  • compare that to the observed t score
  • the statistical decision is made either by comparing the observed and critical t scores or by comparing the corresponding p value and the type I error rate

if the observed t score is more extreme than the critical t score (or, equivalently, the p value is less than the type I error rate), then we reject the null hypothesis.

otherwise, we fail to reject the null hypothesis.
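
The decision rule above can be sketched in a few lines. This example assumes the usual t statistic for a sample correlation, t = r·sqrt(n-2)/sqrt(1-r²) with n-2 degrees of freedom, which the cards do not spell out. The values of `r` and `n` are invented, and the critical value is hard-coded from a standard t table rather than computed.

```python
import math

def correlation_t(r, n):
    """Observed t score for a sample correlation r from n pairs (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical sample: r = 0.75 from n = 10 sampling units
r = 0.75
n = 10
t_obs = correlation_t(r, n)  # about 3.21

# Two-tailed critical t for a type I error rate of 0.05 with df = 8,
# taken from a standard t table (an assumption of this sketch)
t_crit = 2.306

# Non-directional test: compare the absolute observed score to the critical score
reject_null = abs(t_obs) > t_crit
print(round(t_obs, 3), reject_null)
```

Here the observed score is beyond the critical score, so the null hypothesis of zero correlation would be rejected for this made-up sample.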

12
Q

What are the scientific conclusions for directional and non-directional correlation tests?

A

non-directional:
* reject the null hypothesis and conclude there is evidence of an association between the two numerical variables
* fail to reject the null and conclude there is no evidence of an association between the two numerical variables

directional:
* reject the null hypothesis and conclude there is evidence of a positive/negative association between the two numerical variables
* fail to reject the null hypothesis and conclude that there is no evidence of a positive/negative association between the two numerical variables

13
Q

What is the linear regression test designed for?

A

to evaluate whether changes in one numerical variable can predict changes in a second numerical variable

14
Q

What is the focus of linear regression?

A

prediction
one variable is designated as the predictor variable and the other one as the response variable

15
Q

what is a key distinction for linear regression tests from correlation tests?

A

sampling error is considered to occur only in the response variable and not in the predictor variable for linear regression tests

16
Q

What are the predictor and response variables for a linear regression?

A

predictor variable
* often called the independent variable
* variable that was manipulated by the researcher

response variable
* the dependent variable
* the measured response following the manipulation

17
Q

What is the linear equation for a linear regression test?

A
  • linear regression assumes that the relationship between the numerical variables is described by a linear equation
  • response variable: y
  • predictor variable: x
  • and the two parameters which are slope (b) and intercept (a)
18
Q

what are the slope and intercept for the linear equation? also what is the linear equation?

A

y=a+bx

slope (b)
* the slope describes the relationship between the numerical variables
* it is the amount that the response variable (y) increases/decreases for every unit change in the predictor variable
* positive values represent an increasing relationship, zero no relationship, and negative values a decreasing relationship

intercept (a)
* the value of the response variable (y) when the predictor variable (x) is at zero
* changing the intercept raises or lowers the line, but does not change the relationship between the variables
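
A small sketch of the linear equation y=a+bx may help make the roles of the two parameters concrete. The intercept and slope values below are arbitrary, and `predict` is a name chosen only for this example.

```python
def predict(a, b, x):
    """Predicted response from the linear equation y = a + b*x."""
    return a + b * x

a = 2.0  # hypothetical intercept
b = 0.5  # hypothetical slope

# At x = 0 the prediction equals the intercept
at_zero = predict(a, b, 0)

# Each one-unit step in x changes the prediction by the slope
step = predict(a, b, 3) - predict(a, b, 2)

# Raising the intercept shifts every prediction by the same amount
# without changing the slope of the line
shift = predict(a + 1.0, b, 3) - predict(a, b, 3)
print(at_zero, step, shift)
```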

19
Q

What are the three components of a statistical model for linear regression?

A
  1. systematic component: describes the mathematical function used for predictions. the linear equation for linear regression
  2. random component: describes the probability distribution for sampling error. for linear regression this is a normal distribution for the response variable
  3. link function: the link function connects the systematic component to the random component. for linear regression this is the identity function
20
Q

What does it mean to fit the statistical model to the data for linear regression development?

A
  • fitting the model means to estimate the intercept and slope that best explains the data
  • for linear regression this is done by minimizing the residual variance.
  • a residual is the difference between the observed data point and the predicted value (r=Y-y, where Y is the observed value and y is the predicted value)
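
Fitting can be sketched with the standard closed-form least-squares estimates, which minimize the sum of squared residuals. These formulas are not given in the cards, and the function name `fit_line` and the data are made up for illustration.

```python
def fit_line(x, y):
    """Least-squares intercept (a) and slope (b) for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return a, b

# Hypothetical predictor and response values
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(x, y)  # intercept is about 0.05, slope about 1.99

# Residual = observed value minus predicted value
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(round(a, 3), round(b, 3), [round(r, 3) for r in residuals])
```

For a least-squares line with an intercept, the residuals sum to zero (up to rounding), which is a quick sanity check on a fit.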
21
Q

What is the residual variance and how is it calculated?

A

residual variance is the average squared residual across all data points (the sum of squared residuals divided by the degrees of freedom)

  1. calculate the residual for each data point
  2. take the square of each residual
  3. sum the squared residuals across all data points
  4. divide by the degrees of freedom, which are df=n-2 (two parameters, the slope and the intercept, are estimated)
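
The four steps can be followed in code. This sketch assumes n-2 degrees of freedom, the usual choice for a simple linear regression with an estimated slope and intercept. The data are invented, and the intercept and slope shown are the least-squares estimates for these particular values.

```python
# Hypothetical data and the least-squares line fitted to them
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
a = 0.5  # intercept estimate for these data
b = 0.8  # slope estimate for these data

# Step 1: residual for each data point (observed minus predicted)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Step 2: square each residual
squared = [r ** 2 for r in residuals]

# Step 3: sum the squared residuals across all data points
sum_of_squares = sum(squared)

# Step 4: divide by the degrees of freedom (n - 2 in this sketch)
df = len(x) - 2
residual_variance = sum_of_squares / df
print(round(sum_of_squares, 3), df, round(residual_variance, 3))
```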
22
Q

what are the four steps for conducting a hypothesis test?

A
  1. define the null and alternative hypotheses
  2. establish the null distribution
  3. conduct the statistical test
  4. draw scientific conclusions
23
Q

What is the intercept in terms of the predicted value for the hypothesis test? What are the null and alternative hypotheses?

A

it is the predicted value of the response variable when the predictor variable is zero

  • null=intercept is not different from the reference value
  • alt=intercept is different from the reference value
24
Q

Where does the null distribution for the slope and intercept come from?

A
  • they come from repeatedly sampling an imaginary statistical population where the null hypothesis is true for each parameter (slope and intercept)
  • the null distributions for a linear regression are t distributions
25
Q

What are the four main assumptions for a linear regression?

A
  • linearity: the response variable should be well described by a linear function of the predictor variable
  • independence: the residuals along the predictor variable should be independent of each other
  • normality: the residual variation should be normally distributed
  • homoscedasticity: the residual variation should be similar across the range of the predictor variable
26
Q

How can linearity be evaluated?

A

by looking at a plot of residuals against the predictor variable
* if the relationship is well described by a straight line, then the residuals from a simple linear regression will not have any overall trend to them
* if it is not well described by a straight line, then the residuals will show a trend
* violations of linearity often look like a smiling or frowning face
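
To see what a "smiling face" pattern looks like in numbers, the sketch below fits a straight line to deliberately curved data (y = x squared) and lists the residuals in order of the predictor. The data and the helper `fit_line` (standard least-squares formulas) are assumptions made for this illustration.

```python
def fit_line(x, y):
    """Least-squares intercept (a) and slope (b) for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Curved (non-linear) relationship used on purpose
x = [0, 1, 2, 3, 4]
y = [xi ** 2 for xi in x]

a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Signs along the predictor: positive at both ends, negative in the middle
signs = ["+" if r > 0 else "-" for r in residuals]
print(residuals, signs)
```

The residuals are positive at the low and high ends of the predictor and negative in between, which is the U-shaped trend that signals a violation of linearity.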

27
Q

What is independence?

A
  • the assumption that the residuals are independent of each other across the predictor variable
  • violations of independence can occur when there is repeated sampling of the same sampling unit or when there is a spatial or temporal relationship among the sampling units
  • one way to prevent violations is to ensure that sampling units are selected at random and independently of each other
  • violations of independence can be evaluated qualitatively by looking at a plot of residuals against the predictor variable
28
Q

How does it look when residuals are dependent or independent?

A

independent: residuals that are close together will vary between positive and negative numbers seemingly at random

not independent: there will be adjacent runs of positive and then negative residual values
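
The idea of "runs" can be made concrete by counting how many stretches of same-signed residuals occur when the residuals are ordered by the predictor. This is only an informal sketch, not a formal test, and the function name and residual values are invented.

```python
def count_sign_runs(residuals):
    """Count runs of consecutive residuals sharing the same sign.

    Residuals are assumed to be ordered by the predictor variable;
    exact zeros are ignored.
    """
    signs = [r > 0 for r in residuals if r != 0]
    if not signs:
        return 0
    runs = 1
    for prev, cur in zip(signs, signs[1:]):
        if cur != prev:
            runs += 1
    return runs

# Hypothetical residuals, ordered by the predictor variable
alternating = [0.4, -0.3, 0.5, -0.6, 0.2, -0.1]  # signs flip back and forth
clustered = [0.4, 0.3, 0.5, -0.6, -0.2, -0.1]    # one positive run, then one negative run

print(count_sign_runs(alternating), count_sign_runs(clustered))
```

Few long runs relative to the number of points, as in the second list, match the "not independent" pattern described above.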

29
Q

What is normality?

A
  • assumption that the residuals are normally distributed
  • it is about the residuals, not the data
30
Q

What can violations of normality look like?

A
  • can happen if the statistical population has a skewed or unusual distribution (since the residuals may then not be normally distributed)
  • if your data violate the assumption of linearity, this can also appear as a violation of normality
  • can be qualitatively evaluated by looking at a histogram of the residuals with a normal distribution overlaid on top
  • if the assumption of normality is met, then the histogram of residuals will look similar to the reference normal distributions
31
Q

What is the Shapiro-Wilk test?

A
  • a statistical test that evaluates the null hypothesis that the residuals are normally distributed
  • less subjective than a qualitative exploration of the histograms
32
Q

What are the steps for conducting a Shapiro-Wilk test?

A
  • null=residuals normally distributed
  • alt=residuals are not normally distributed
33
Q

What is homoscedasticity?

A
  • the assumption that the residuals have the same variance across the predictor variable
  • if the residuals have little variation along some parts of the predictor variable and large amounts at others, then the data are said to be heteroscedastic.
  • can be caused by the residuals not being well described by a normal distribution
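
A crude way to look for unequal spread is to compare the size of the residuals in the lower and upper halves of the predictor range. The sketch below does only that; it is an informal check under made-up data, not a formal test of homoscedasticity, and the helper name is hypothetical.

```python
def mean_square(values):
    """Average squared value, used here as a simple measure of spread."""
    return sum(v ** 2 for v in values) / len(values)

# Hypothetical residuals, ordered by the predictor variable:
# small at low predictor values, large at high predictor values
residuals = [0.1, -0.2, 0.1, -0.1, 1.5, -2.0, 1.8, -1.6]

half = len(residuals) // 2
low_spread = mean_square(residuals[:half])   # lower half of the predictor range
high_spread = mean_square(residuals[half:])  # upper half of the predictor range

# A ratio far from 1 suggests the spread changes along the predictor
spread_ratio = high_spread / low_spread
print(round(low_spread, 4), round(high_spread, 4), round(spread_ratio, 1))
```

For these invented residuals the spread in the upper half is far larger than in the lower half, which is the pattern described above as heteroscedastic.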