W4: RQ for Predictions 1 Flashcards
What is a prediction
Using knowledge about one/more constructs to indicate people’s standing on another construct.
Used in the sense of indication (not explanation or cause)
e.g. a barometer predicts, but does not explain/cause the weather
What is in a good RQ involving prediction
- Phrased as a question, ending with a question mark
- Includes all relevant constructs
- Indicates the relevant population
- Uses “predict” as the key “driving word”
DV/IV in a prediction RQ. X/Y. Does the meaning and focus of RQ change if X and Y swapped?
DV: Being Predicted - Y Variable
IV: Predictor - X Variable.
Yes. The variable placed in the DV role is the one being predicted, so swapping X and Y changes both the meaning and the focus of the RQ.
What is “Variation”. How is it measured
Variation:
Total amount of variability of the scores in a distribution around their mean.
Measured:
Sum of squared deviation scores (or Sum of Squares). Gets larger as n increases
What is “Variance”. How is it measured
Variance:
AVERAGE Sum of Squares in a distribution of scores: Sum of Squares / N for a population, Sum of Squares / (n - 1) for a sample
Measured:
Expressed in a squared metric, relative to the scores on which it is calculated
What is “Standard Deviation”. How is it measured
Standard Deviation:
Square Root of Variance (both population and samples)
Measured:
Expressed in same metric as scores on which it is calculated
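The three measures above build on each other, which a short calculation can make concrete. This is a minimal sketch using made-up scores (the data and variable names are illustrative assumptions, not from the course material).

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example scores
n = len(scores)
mean = sum(scores) / n

# Variation: sum of squared deviations from the mean (Sum of Squares)
sum_of_squares = sum((x - mean) ** 2 for x in scores)

# Variance: average Sum of Squares (divide by N for a population, n - 1 for a sample)
population_variance = sum_of_squares / n
sample_variance = sum_of_squares / (n - 1)

# Standard deviation: square root of the variance, back in the metric of the scores
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(sum_of_squares, population_variance, population_sd)
```

Adding more scores would keep increasing the Sum of Squares, whereas the variance and standard deviation stay on a per-score scale.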
What is the key distinguishing feature between correlation and regression
Correlation: Symmetric Relationship
Regression: Asymmetric Relationship
What is a Symmetric Relationship. In terms of correlation; IV/DV; scatterplot
Variables have the SAME role and function in the characteristic of scores being summarised.
Cor (A,B) = Cor (B,A)
No IV/DV
Scatterplot: Variable on X/Y axis does not matter
What is an Asymmetric Relationship. In terms of regression; IV/DV; scatterplot
Variables have DIFFERENT roles and functions in the characteristic of scores being summarised.
Reg (A on B) ≠ Reg (B on A), even though Cor (A,B) = Cor (B,A)
IV/DV declared a priori
Scatterplot: Which variable is on the X/Y axis is fundamentally important
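The symmetric/asymmetric contrast can be checked numerically: the correlation comes out the same whichever variable is listed first, while the regression slope depends on which variable is treated as the DV. A minimal sketch with made-up scores; the helper names `correlation` and `slope` are illustrative assumptions.

```python
import statistics as st

a = [1, 2, 3, 4, 5]  # made-up scores on variable A
b = [2, 1, 4, 3, 6]  # made-up scores on variable B

def correlation(x, y):
    """Sum of cross-products of z-scores divided by df (n - 1)."""
    mx, my = st.mean(x), st.mean(y)
    sx, sy = st.stdev(x), st.stdev(y)
    df = len(x) - 1
    return sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)) / df

def slope(x, y):
    """Slope for predicting y (DV) from x (IV): r * (SDy / SDx)."""
    return correlation(x, y) * st.stdev(y) / st.stdev(x)

print(correlation(a, b), correlation(b, a))  # same value in both orders
print(slope(a, b), slope(b, a))              # different values
```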
What is the conceptual formulation of a correlation
Sum of the cross-products of z-scores / df
> STANDARDIZED (conversion to z) measure of strength and direction of association
What is the conceptual formulation of a covariance
Sum of the cross-products of deviation scores / df
> UNSTANDARDIZED (using deviation) measure of strength and direction of association
Can we calculate correlation from covariance, and vice versa?
Yes
Correlation = Covariance / (SDx × SDy)
Covariance = Correlation × SDx × SDy
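A quick numerical check of the conversion in both directions, as a minimal sketch on made-up scores (all names here are illustrative assumptions). Covariance is built from deviation scores, correlation from z-scores, and the standard deviations link the two.

```python
import math

x = [1, 2, 3, 4, 5]  # made-up scores on one variable
y = [2, 1, 4, 3, 6]  # made-up scores on another variable
n = len(x)
df = n - 1

mean_x = sum(x) / n
mean_y = sum(y) / n
sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / df)
sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / df)

# Unstandardized: cross-products of deviation scores / df
covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / df

# Standardized: cross-products of z-scores / df
correlation = sum(
    ((xi - mean_x) / sd_x) * ((yi - mean_y) / sd_y) for xi, yi in zip(x, y)
) / df

# Converting in each direction
correlation_from_cov = covariance / (sd_x * sd_y)
covariance_from_cor = correlation * sd_x * sd_y

print(covariance, correlation)
```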
What is the line of best fit
Linear Regression Line
What is the slope value of a regression line: Formulation
b = Correlation (xy) × (SDy / SDx)
What is b (the slope value of a regression line), and what is it an estimate of
It is a sample statistic and also an estimate of the corresponding population parameter
How can the slope value be interpreted
For every 1-unit increase on the X variable, the predicted value of the Y variable changes by b units (an increase if b is positive, a decrease if b is negative)
What is the full regression equation
Yi = a + bXi + ei
Yi = observed scores on DV
Xi = observed scores on IV
a = intercept
b = regression coefficient (slope)
ei = residual scores (difference between observed and predicted scores on the DV)
What is the regression model equation
Y^i = a + bXi
Y^i = predicted scores on DV
Xi = observed scores on IV
a = intercept
b = regression coefficient (expected change in scores on DV for each unit change of IV)
In a regression equation, what do a^ and b^ aim to do.
Using the Ordinary Least Squares (OLS) estimator,
a^ and b^ aim to MINIMIZE the sum of squared residuals (i.e. maximize the strength of prediction)
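To see the least-squares idea at work, the sketch below computes the slope from the correlation and standard deviations, then checks that nudging either estimate makes the sum of squared residuals larger. The data are made up, and the intercept formula a = mean(Y) - b × mean(X) is the standard OLS result for simple regression, added here as an assumption since the cards do not state it.

```python
import math

x = [1, 2, 3, 4, 5]  # made-up IV scores
y = [2, 1, 4, 3, 6]  # made-up DV scores
n = len(x)
df = n - 1

mean_x, mean_y = sum(x) / n, sum(y) / n
sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / df)
sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / df)
r = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / df / (sd_x * sd_y)

b = r * sd_y / sd_x      # slope: correlation x (SDy / SDx)
a = mean_y - b * mean_x  # intercept (standard OLS result, see note above)

def sum_squared_residuals(intercept, slope):
    """Sum of squared differences between observed and predicted DV scores."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

predicted = [a + b * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

best = sum_squared_residuals(a, b)
print(a, b, best)
# Any other line does worse than the OLS line
print(sum_squared_residuals(a + 0.1, b) > best, sum_squared_residuals(a, b + 0.1) > best)
```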
What is the difference between simple and multiple linear regression model
Simple:
- One intercept + One regression coefficient
Yi = a + bXi + ei
Multiple:
- One intercept + p partial regression coefficients, one for each independent variable (where p >= 2)
Yi = a + b1X1i + … + bpXpi + ei
What is the aim in research using linear regression
Use sample regression estimates to make an inference about corresponding unknown population parameter values
In RStudio, in a linear regression formula, where do the IV and DV go, and what are their properties
DV is always on the left of the ~. Must be numeric
IV is always on the right of the ~. Either numeric or factor
Is the coefficient value for each IV in a simple regression the same as its coefficient value in a multiple regression. Explain.
Different.
In a multiple regression, the correlation AMONG IVs in their relationship to DV is partialled out/removed.
> Slope of each IV is an effect that is INDEPENDENT of the other IVs
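The difference between simple and partial coefficients shows up clearly when two IVs are correlated. In this minimal sketch the DV is constructed, with no noise, as 1 + 2×X1 + 3×X2, so the partial coefficients should come back as 2 and 3. The data are made up, and the two-predictor formulas are the standard closed-form OLS solution, used here as an assumption because the cards do not give them.

```python
x1 = [1, 2, 3, 4, 5, 6]         # made-up scores on IV 1
x2 = [2, 1, 4, 3, 6, 5]         # made-up scores on IV 2 (correlated with IV 1)
y = [9, 8, 19, 18, 29, 28]      # DV built as 1 + 2*x1 + 3*x2, no noise

def mean(v):
    return sum(v) / len(v)

def cross(u, v):
    """Sum of cross-products of deviation scores for two variables."""
    mu, mv = mean(u), mean(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))

s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
s1y, s2y = cross(x1, y), cross(x2, y)

# Simple regressions: each IV on its own
simple_b1 = s1y / s11
simple_b2 = s2y / s22

# Multiple regression: overlap between the IVs is partialled out
denominator = s11 * s22 - s12 ** 2
partial_b1 = (s1y * s22 - s2y * s12) / denominator
partial_b2 = (s2y * s11 - s1y * s12) / denominator
intercept = mean(y) - partial_b1 * mean(x1) - partial_b2 * mean(x2)

print(simple_b1, simple_b2)               # inflated by the shared overlap
print(intercept, partial_b1, partial_b2)  # recovers the constructed values
```

If the two IVs were uncorrelated (s12 equal to zero), the simple and partial slopes would coincide.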
What is the interpretation of the intercept in a regression model
Predicted value on the DV when people have a score of zero on all independent variables in the model
What are the lower and upper bounds in a 95% confidence interval
Lower bound: 2.5%. Upper bound: 97.5%
How do we interpret a 95% confidence interval of 0.10 and 0.86
We can be 95% confident that the population coefficient value for the regression of ____ on ___ is between 0.10 and 0.86
What is an unbiased 95% confidence interval
Over a large number of repeated samples drawn from the population, the confidence interval calculated in each sample will contain the true population parameter value 95% of the time on average
i.e. actual coverage rate will be 95% over the long run
What if the interval estimator is biased
Actual coverage rate will be smaller/larger than the nominal rate OVER THE LONG RUN (e.g. 89%/98%)
What if the interval estimator is consistent
Actual coverage rate will get INCREASINGLY closer to 95% over the long run as sample size increases
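The long-run idea behind coverage can be simulated. The sketch below is a simplification under stated assumptions: it builds 95% intervals for a population mean with a known standard deviation (not for a regression coefficient), draws normally distributed samples, and counts how often the interval contains the true value. The function name `coverage_rate` and all parameter values are illustrative.

```python
import random
from statistics import NormalDist

def coverage_rate(n, reps=4000, mu=10.0, sigma=2.0, seed=1):
    """Proportion of repeated-sample 95% intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)  # cut-off leaving 2.5% in each tail
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        # Does this sample's interval capture the true population mean?
        if sample_mean - half_width <= mu <= sample_mean + half_width:
            hits += 1
    return hits / reps

print(coverage_rate(25))
```

Because this interval estimator is unbiased, the printed proportion should sit close to the nominal 0.95; a biased estimator would settle at some other value no matter how many samples were drawn.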