W4: RQ for Predictions 1 Flashcards
Predictions: What is it, and what is the keyword? What does it not do?
Using knowledge about one/more constructs to indicate standing on another construct.
- Indicate / Account for / Explain (if used descriptively)
- Not cause
- E.g. a barometer indicates / accounts for / explains (if used descriptively) the weather, but it does not cause the weather
What is in a good RQ involving prediction?
- Phrased as a question ending with ?
- Include all relevant constructs
- Indicate the relevant population
- Use predict as “driving word” (key)
DV/IV in a prediction RQ. X/Y.
Does the meaning and focus of RQ change if X and Y swapped?
- DV: Y-Axis
- Being Predicted (Predicted)
- IV: X-Axis
- Doing the predicting (Predictor)
- Meaning and focus of RQ change depending on which variable is defined as the DV and which as the IV
What is “Variation”. How is it measured
Variation
- Total amount of variability in a distribution of scores from the mean
- Sum of squared deviation scores; or
- Sum of squares
- Hence, gets larger as n increases
What is “Variance”. How is it measured and expressed
Variance
- Average Sum of Squares in a distribution of scores
- Expressed in a squared metric, relative to the scores on which it is calculated
- Hence, it is independent of sample size
What is “Standard Deviation”. How is it expressed
Standard Deviation
- Square Root of Variance
- Expressed in same metric as scores on which it is calculated
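The three measures in the cards above can be checked on a small example. This is a minimal sketch: the scores are made up for illustration, and dividing by n - 1 (rather than n) for the variance is an assumption, chosen to match the n - 1 used in the correlation and covariance formulas later in these cards.

```python
import math

# Hypothetical scores, invented purely for illustration.
scores = [2.0, 4.0, 4.0, 6.0, 9.0]


def variation(values):
    """Sum of squared deviations from the mean (sum of squares)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def variance(values):
    """Average sum of squares; n - 1 is used here as an assumption."""
    return variation(values) / (len(values) - 1)


def standard_deviation(values):
    """Square root of the variance, back in the metric of the scores."""
    return math.sqrt(variance(values))


ss = variation(scores)
var = variance(scores)
sd = standard_deviation(scores)

# Duplicating every score doubles n: variation doubles, variance barely moves.
doubled = scores * 2
ss_doubled = variation(doubled)
var_doubled = variance(doubled)

print(ss, var, sd)
print(ss_doubled, var_doubled)
```

Duplicating the data shows why variation grows with n while variance stays on roughly the same scale.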
What is the geometrical interpretation of deviations
- SD
- Length
- Variance
- Area (It is a square)
- Variation
- Sum of all the Areas/Squares
- (That’s why it’s called sum of squares)
What is the key distinguishing feature between correlation and regression?
- Correlation
- Symmetric Relationship
- Regression
- Asymmetric Relationship
What is a Symmetric Relationship. In terms of correlation; IV/DV; scatterplot
Variables have the same role and function in the characteristic of scores being summarised.
- Cor (A,B) = Cor (B,A)
- No IV/DV
- Scatterplot: Variable on X/Y axis does not matter
What is an Asymmetric Relationship. In terms of correlation; IV/DV; scatterplot
Variables have different roles and functions in the characteristic of scores being summarised.
- Regression of A on B /=/ Regression of B on A (even though Cor (A,B) = Cor (B,A) always holds)
- IV/DV declared a priori
- Scatterplot: Variable on X/Y axis fundamentally important
What is the formula and conceptual formulation of a correlation
Correlation
- Sum of cross products of deviation z-scores / (n-1)
- rxy = Σ (Zxi x Zyi) / (n - 1)
- Standardised (z) measure of strength and direction of association
- (X and Y deviations in z will not be the same!)
- We can look at the size of correlation to determine the strength of the association directly (-1 to 1)
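The z-score formulation of the correlation above can be sketched in a few lines. The data are hypothetical, and the helper name `z_scores` is introduced here only for illustration.

```python
import math

# Hypothetical paired scores, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def z_scores(values):
    """Convert raw scores to z-scores using the sample SD (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def correlation(a, b):
    """Sum of cross products of z-scores, divided by n - 1."""
    za, zb = z_scores(a), z_scores(b)
    return sum(p * q for p, q in zip(za, zb)) / (len(a) - 1)


r_xy = correlation(x, y)
r_yx = correlation(y, x)

print(r_xy, r_yx)  # both about 0.77: swapping the variables changes nothing
```

Because the cross products are the same in either order, the result illustrates the symmetric nature of correlation.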
What is the conceptual formulation of a covariance
Covariance
- Sum of cross product of deviation scores / (n-1)
- Sxy = Σ [(Xi - X̄) x (Yi - Ȳ)] / (n - 1), where X̄ and Ȳ are the sample means
- Unstandardised (using deviation) measure of strength and direction of association
- (X and Y deviations will not be the same!)
- We cannot look at the size of covariance to determine the strength of the association directly (-∞ to +∞)
Can we calculate correlation from covariance, vice versa? In what circumstances can we do that
Yes.
But we have to know the standard deviations of variable x and y
- rxy = Sxy / (sdx x sdy)
- Sxy = rxy x sdx x sdy
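A short sketch of moving between covariance and correlation, using hypothetical data. It also rescales X by 10 to show that the covariance changes with the metric of the scores while the correlation does not.

```python
import math

# Hypothetical paired scores, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def sample_sd(values):
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def covariance(a, b):
    """Sum of cross products of deviation scores, divided by n - 1."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (n - 1)


s_xy = covariance(x, y)
r_xy = s_xy / (sample_sd(x) * sample_sd(y))      # covariance -> correlation
s_xy_back = r_xy * sample_sd(x) * sample_sd(y)   # correlation -> covariance

# Rescale X (e.g. a change of measurement unit): covariance changes, r does not.
x_scaled = [10.0 * v for v in x]
s_scaled = covariance(x_scaled, y)
r_scaled = s_scaled / (sample_sd(x_scaled) * sample_sd(y))

print(s_xy, r_xy, s_xy_back)
print(s_scaled, r_scaled)
```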
What is the alternative name for the line of best fit? What does it do?
Linear Regression Line
- Summarise relationship between 2 variables
Formula to calculate the slope of the regression line
bx = rxy x (SDy / SDx)
- Sample correlation x ratio of the respective standard deviations (SDy over SDx).
- Know value of sample correlation
- Know respective standard deviations
- Therefore, the slope equals the correlation only if the two SDs are the same, which is unlikely in practice.
Alternatively, using any two points (X1, Y1) and (X2, Y2) on the regression line,
b = (Y2 - Y1) / (X2 - X1)
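Both routes to the slope can be compared on hypothetical data. In this sketch the intercept formula a = mean(Y) - b x mean(X) is a standard result that is assumed here (it is not stated in the cards), and it is only used to generate two points on the line.

```python
import math

# Hypothetical paired scores, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / (n - 1))
sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / (n - 1))
s_xy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y)) / (n - 1)
r_xy = s_xy / (sd_x * sd_y)

# Route 1: slope from the correlation and the ratio of SDs.
b = r_xy * (sd_y / sd_x)

# Route 2: rise over run between two points on the regression line.
a = mean_y - b * mean_x          # assumed standard intercept formula
x1, x2 = 1.0, 4.0
y1, y2 = a + b * x1, a + b * x2  # predicted values, so both lie on the line
b_rise_over_run = (y2 - y1) / (x2 - x1)

# Swapping roles (predicting X from Y) gives a different slope.
b_x_on_y = r_xy * (sd_x / sd_y)

print(b, b_rise_over_run, b_x_on_y)
```

The last line of the calculation illustrates the asymmetry of regression: the slope for predicting Y from X is not the slope for predicting X from Y, even though the correlation is the same.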
What is bx? A sample statistic (SS) or a population parameter (PP)?
- bx
- Sample statistic
- An estimate of the population parameter βx (the population regression slope)
- Don’t forget, population parameter can never be “Calculated”
How can the slope value be interpreted?
- For any 1 unit increase on the X variable
- The value of the Y variable is expected to change by ____ units (the value of the slope)
What is the full regression equation
- Yi = a + bXi + ei
- Yi
- Observed scores on DV
- Xi
- Observed scores on IV
- ei
- Residual scores
- Difference between observed and predicted scores on DV
- a
- Intercept
- b
- Regression coefficient (slope)
What is the regression model equation
Y^i = a + bXi
- Y^i
- Predicted scores on DV
- Xi
- Observed scores on IV
- a
- Intercept
- b
- Regression coefficient
- Expected change in scores on DV for each unit change of IV
What is the SS in linear regression?
SStotal = SSreg + SSres
- SSreg
- Sum of squared deviations of the predicted scores (on the linear regression line) from the mean of the DV
- i.e. variation explained by linear regression model
- SSres
- Sum of squared deviations of the observed scores from the predicted scores (on the linear regression line)
- i.e. variation not explained by linear regression model
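The partition of the total sum of squares can be verified numerically. This is a sketch on hypothetical data; the closed-form slope (sum of cross products over the sum of squares of X) and intercept are the usual least-squares results and are assumed here rather than taken from the cards.

```python
# Hypothetical paired scores, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares slope and intercept (assumed standard closed forms).
b = (sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
     / sum((p - mean_x) ** 2 for p in x))
a = mean_y - b * mean_x

predicted = [a + b * p for p in x]

ss_total = sum((q - mean_y) ** 2 for q in y)                # observed vs mean
ss_reg = sum((yhat - mean_y) ** 2 for yhat in predicted)    # predicted vs mean
ss_res = sum((q - yhat) ** 2 for q, yhat in zip(y, predicted))  # observed vs predicted

print(ss_total, ss_reg, ss_res)  # roughly 6.0, 3.6 and 2.4
```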
In a regression equation, what do a^ and b^ aim to do? What method is this? Is it biased?
Ordinary least squares (OLS) estimator
- Find values of a^ and b^ that minimise the sum of squared residuals
- i.e. minimise the squared differences between observed and predicted values of the DV
- Therefore, maximise strength of prediction
- OLS is unbiased :)
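The least-squares idea can be illustrated by nudging the estimates away from their OLS values and watching the sum of squared residuals grow. A minimal sketch on hypothetical data; the closed-form estimates and the helper `ss_res` are assumptions introduced for the example.

```python
# Hypothetical paired scores, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Closed-form OLS estimates for simple regression (assumed standard results).
b_hat = (sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
         / sum((p - mean_x) ** 2 for p in x))
a_hat = mean_y - b_hat * mean_x


def ss_res(a, b):
    """Sum of squared residuals for a candidate intercept a and slope b."""
    return sum((q - (a + b * p)) ** 2 for p, q in zip(x, y))


best = ss_res(a_hat, b_hat)

# Try every combination of small nudges to a and b (excluding no nudge at all).
steps = [-0.5, -0.1, 0.0, 0.1, 0.5]
nudged = [
    ss_res(a_hat + da, b_hat + db)
    for da in steps
    for db in steps
    if (da, db) != (0.0, 0.0)
]

print(best, min(nudged))  # every nudged line fits worse than the OLS line
```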
What is the difference between simple and multiple linear regression model
Simple
- One intercept and One regression coefficient
- Yi = a + bXi + ei
Multiple
- One intercept and p partial regression coefficients (where p >= 2)
- Yi = a + b1X1i + … + bpXpi + ei
What is the aim in research using linear regression
Use sample regression estimates to make an inference about corresponding unknown population parameter values
Are the coefficient values for each IV in simple regression the same as the coefficient values in multiple regression? Explain.
Different
- Multiple regression
- Correlation among IVs in their relationship to DV and with each other is partialled out/removed.
- i.e. if there are no correlations among IVs, simple regression coefficients = multiple regression coefficients
- Slope of each edge (partial regression coefficient) is an effect that is independent of the other IVs
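The difference between simple and partial coefficients can be seen with two IVs. This sketch uses hypothetical data and the standard two-predictor least-squares solution written in terms of sums of squares and cross products of deviation scores; the function names are introduced here for illustration only.

```python
def deviations(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def cross_products(u, v):
    return sum(p * q for p, q in zip(u, v))


def slopes(x1, x2, y):
    """Return ((simple b1, simple b2), (partial b1, partial b2))."""
    d1, d2, dy = deviations(x1), deviations(x2), deviations(y)
    s11, s22, s12 = cross_products(d1, d1), cross_products(d2, d2), cross_products(d1, d2)
    s1y, s2y = cross_products(d1, dy), cross_products(d2, dy)
    simple = (s1y / s11, s2y / s22)          # each IV on its own
    det = s11 * s22 - s12 ** 2               # zero only if the IVs are collinear
    partial = ((s22 * s1y - s12 * s2y) / det,
               (s11 * s2y - s12 * s1y) / det)
    return simple, partial


y = [1.0, 3.0, 4.0, 8.0]  # hypothetical DV scores

# Case 1: the two IVs are correlated with each other.
simple_c, partial_c = slopes([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0], y)

# Case 2: the two IVs are uncorrelated with each other.
simple_u, partial_u = slopes([-1.0, -1.0, 1.0, 1.0], [-1.0, 1.0, -1.0, 1.0], y)

print(simple_c, partial_c)  # coefficients differ
print(simple_u, partial_u)  # coefficients match
```

With correlated IVs the partial coefficients differ from the simple ones; with uncorrelated IVs they coincide.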
What is the interpretation of the intercept in a regression model
Predicted value on the DV when people have a score of zero on all independent variables in the model
What are the lower and upper bounds in a 95% confidence interval?
Lower bound: 2.5%; upper bound: 97.5% (i.e. the middle 95%, with 2.5% cut off in each tail)
How do we interpret a 95% confidence interval of 0.10 and 0.86 in a regression?
We can be 95% confident that the population coefficient value for the regression of ____ on ___ is between 0.10 and 0.86
What is an unbiased 95% interval estimator? What is it NOT?
- Over a large number of repeated samples drawn from the population, the CIs calculated in each sample will contain the true population parameter value 95% of the time on average
- (i.e. actual coverage rate will be 95% over the long run)
NOT a 95% chance that the population parameter is captured in any one particular interval
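The long-run coverage idea can be simulated. As a simplifying assumption, this sketch builds intervals for a population mean with a known standard deviation (using the normal critical value 1.96) rather than for a regression coefficient; all population values are hypothetical.

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population values, chosen only for the simulation.
TRUE_MEAN = 50.0
SIGMA = 10.0       # treated as known (simplifying assumption)
N = 30             # size of each sample
REPS = 5000        # number of repeated samples
Z_95 = 1.96        # normal critical value for a 95% interval

half_width = Z_95 * SIGMA / math.sqrt(N)

hits = 0
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    sample_mean = sum(sample) / N
    # Does this sample's interval capture the true population mean?
    if sample_mean - half_width <= TRUE_MEAN <= sample_mean + half_width:
        hits += 1

coverage = hits / REPS
print(coverage)  # proportion of intervals that captured the true mean
```

Each individual interval either contains the true value or it does not; the 95% describes the proportion of intervals that do so across repeated samples.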
What if the interval estimator is biased?
Actual coverage rate will be smaller/larger than the nominal rate over the long run (e.g. 89%/98%)
What if the interval estimator is consistent?
Actual coverage rate will get increasingly closer to 95% as sample size increases.
(If it is 98% and goes to 99% as n increases, it is still not consistent!)