Prediction Flashcards
What is statistical prediction
Using scores on one variable to predict scores on another variable
X = independent or predictor variable
Y = dependent or criterion variable
Regression
A statistical method used to model and predict the relationship between a dependent variable and one or more independent variables
Simple regression: one X and Y
E.g. Y = global temperature X = CO2
Multiple regression: more than one X
E.g. Y = Global temperature X1 = CO2, X2 = Total deforestation, X3 = Carbon offsets
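A quick sketch of the difference, using made-up numbers (the climate values below are invented for illustration, not real data); it fits a simple and a multiple regression with NumPy least squares.

```python
import numpy as np

# Invented numbers, purely for illustration.
co2 = np.array([350., 370., 390., 410., 430.])           # X1
deforestation = np.array([1.0, 1.4, 1.9, 2.3, 2.8])      # X2
temperature = np.array([14.1, 14.3, 14.6, 14.8, 15.1])   # Y

# Simple regression: one predictor (CO2) -> temperature.
X_simple = np.column_stack([np.ones_like(co2), co2])      # intercept column + X
coef_simple, *_ = np.linalg.lstsq(X_simple, temperature, rcond=None)
print("simple   a, b:", coef_simple)

# Multiple regression: two predictors -> temperature.
X_multi = np.column_stack([np.ones_like(co2), co2, deforestation])
coef_multi, *_ = np.linalg.lstsq(X_multi, temperature, rcond=None)
print("multiple a, b1, b2:", coef_multi)
```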
What is the regression equation
- Method for predicting Y from X using information about their relationship
- Have to define a regression equation/prediction equation
Simple regression: equation describes a straight line best fitting the data points
Straight line: Y = a + bX
- Y (Dependent Variable): variable being predicted or explained.
- X (Independent Variable): variable that is used to predict or explain the value of Y.
- a (Y-intercept): the value of Y when X is equal to zero; the starting point of the line on the Y-axis.
- b (Slope): the rate of change of Y for every one-unit change in X.
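A minimal sketch of the prediction equation with toy numbers (the x and y values are assumptions for illustration): b comes from the correlation and the two SDs, a from the means.

```python
import numpy as np

# Toy data, assumed for illustration only.
x = np.array([1., 2., 3., 4., 5.])
y = np.array([2.1, 2.9, 3.7, 4.2, 5.1])

r = np.corrcoef(x, y)[0, 1]             # correlation between X and Y
b = r * y.std(ddof=1) / x.std(ddof=1)   # slope
a = y.mean() - b * x.mean()             # Y-intercept
y_pred = a + b * x                      # Y' = a + bX
print("a =", a, "b =", b)
```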
What is the standardised score for Y prime
The standardised predicted score equals the correlation between X and Y multiplied by the standardised X score: z(Y′) = r × z(X)
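A small check of that rule with the same toy numbers (illustrative only): the predicted score in z-units equals r times the standardised X score.

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])
y = np.array([2.1, 2.9, 3.7, 4.2, 5.1])

r = np.corrcoef(x, y)[0, 1]
zx = (x - x.mean()) / x.std(ddof=1)     # standardised X scores

zy_pred = r * zx                        # z(Y') = r * z(X)

# Same thing via the raw-score equation Y' = a + bX, then standardising Y'.
b = r * y.std(ddof=1) / x.std(ddof=1)
a = y.mean() - b * x.mean()
y_pred = a + b * x
print(np.allclose(zy_pred, (y_pred - y.mean()) / y.std(ddof=1)))  # True
```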
What is r^2
R^2 tells us how well we are making predictions
Used to understand how good our model is at making predictions about Y; it is important to know the proportion of the variance in Y that the model accounts for
- Then you also know how much you aren't accounting for = how much variance there is in Y that is beyond the power of the prediction (1 − R^2)
If that number is larger than the proportion of variance we can predict, question whether the model is useful
*the proportion of variance of Y accounted for by the model
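A sketch of that proportion with the same toy data (assumed values): r² = 1 − SS_res / SS_tot, which for simple regression equals r squared.

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])
y = np.array([2.1, 2.9, 3.7, 4.2, 5.1])

r = np.corrcoef(x, y)[0, 1]
b = r * y.std(ddof=1) / x.std(ddof=1)
a = y.mean() - b * x.mean()
y_pred = a + b * x

ss_res = np.sum((y - y_pred) ** 2)      # variability the model misses
ss_tot = np.sum((y - y.mean()) ** 2)    # total variability in Y
r_squared = 1 - ss_res / ss_tot         # proportion accounted for
print(r_squared, r ** 2)                # the two match in simple regression
```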
What does r mean
r: strength of the relationship between the predictor(s) and Y
- When there is only one predictor, this strength is equal to the correlation between X and Y
What is meant by the line of best fit
An equation defining a line that, when drawn through the data points, makes the sum of squared residuals (errors) as small as it can be (see the sketch below)
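To see "smallest sum of squared residuals" concretely, here is a sketch (toy data again) comparing the least-squares line with slightly nudged lines; any nudge makes the sum of squared residuals bigger.

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])
y = np.array([2.1, 2.9, 3.7, 4.2, 5.1])

def sse(a, b):
    """Sum of squared residuals for the line Y' = a + bX."""
    return np.sum((y - (a + b * x)) ** 2)

r = np.corrcoef(x, y)[0, 1]
b_best = r * y.std(ddof=1) / x.std(ddof=1)
a_best = y.mean() - b_best * x.mean()

print(sse(a_best, b_best))                                   # smallest
print(sse(a_best, b_best + 0.1), sse(a_best + 0.1, b_best))  # both larger
```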
when r is large
Y values will cluster closer to Y(prime)
a larger proportion of the SD of Y is accounted for by the prediction
when r is small
Y values will vary more from Y(prime)
a smaller proportion of the SD of Y is accounted for by the prediction
Assumptions of linear regressions
- Both X and Y are normally distributed
- For each X, Y′ is what you expect Y to be on average (the mean of the distribution of Y at that X is reflected in Y′)
- Linear relationship
- Homoscedasticity: the variance of the distribution of Y scores for each X score is the same (Y should have the same SD regardless of the X value)
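One rough way to eyeball the homoscedasticity assumption (my own sketch, not from the cards): fit the line, then compare the spread of residuals for low vs. high X values; they should be similar.

```python
import numpy as np

# Toy data, assumed for illustration.
x = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
y = np.array([2.1, 2.9, 3.7, 4.2, 5.1, 5.8, 6.9, 7.6])

r = np.corrcoef(x, y)[0, 1]
b = r * y.std(ddof=1) / x.std(ddof=1)
a = y.mean() - b * x.mean()
residuals = y - (a + b * x)

low, high = x <= np.median(x), x > np.median(x)
# Similar SDs in both halves is consistent with homoscedasticity.
print(residuals[low].std(ddof=1), residuals[high].std(ddof=1))
```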
What is the standard error of the estimate used for
Needed for working out how often a prediction error of a given size could occur (it is the SD of the error distribution)
○ standard deviation of the distribution of observed scores around the corresponding predicted score
○ measures predictive error (how dispersed your values are above or below the prediction line)
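A sketch of the calculation (toy data; using the common n − 2 denominator, which your course may define slightly differently): the standard error of the estimate is the SD of the observed scores around the predicted scores.

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])
y = np.array([2.1, 2.9, 3.7, 4.2, 5.1])

r = np.corrcoef(x, y)[0, 1]
b = r * y.std(ddof=1) / x.std(ddof=1)
a = y.mean() - b * x.mean()
y_pred = a + b * x

n = len(y)
ss_res = np.sum((y - y_pred) ** 2)
see = np.sqrt(ss_res / (n - 2))   # standard error of the estimate
print(see)
```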
Standardised residual
The difference between what actually happened and what your model predicted, divided by the standard error of the estimate. "How weird is this residual compared to others?"
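A sketch of the idea (toy data; here each residual is simply divided by the standard error of the estimate, which is one common classroom definition): values far from 0 are the "weird" ones.

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])
y = np.array([2.1, 2.9, 3.7, 4.2, 5.1])

r = np.corrcoef(x, y)[0, 1]
b = r * y.std(ddof=1) / x.std(ddof=1)
a = y.mean() - b * x.mean()
residuals = y - (a + b * x)                            # observed - predicted

see = np.sqrt(np.sum(residuals ** 2) / (len(y) - 2))   # standard error of the estimate
standardised = residuals / see                         # how unusual is each residual?
print(standardised)
```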
What is a Decision-wise error rate
The probability of making a Type I error
If you set your alpha = 0.05, then there's a 5% chance that you'll reject H0 when it's actually true, for each individual test.
What is the collective error rate
The probability that at least one test (e.g., 1 test, even if there are 36) rejects the null even though it is true
1 − (1 − alpha)^k, where k = number of tests (e.g., number of labs)
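A quick check of the formula (alpha = .05 is an assumption matching the card above): with 36 tests the chance of at least one false positive is roughly 0.84.

```python
# Collective (family-wise) error rate: 1 - (1 - alpha)^k
alpha = 0.05
for k in (1, 5, 36):                 # k = number of tests (e.g., labs)
    print(k, round(1 - (1 - alpha) ** k, 3))
# k=36 gives about 0.842: an ~84% chance of at least one Type I error.
```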
Types of replication
Direct Replication
Conceptual Replication
Direct Replication
Try to repeat the study exactly, same method, same analysis.
- Confirmatory study!
May differ slightly (e.g., different sample sizes, locations).
Helps confirm if the original result was just a fluke.
Conceptual Replication
Test the same idea in different ways (e.g., new measures, new tasks).
- exploratory study!
Checks if the effect generalises to different contexts.
BUT: You now run into multiple comparisons — more tests = higher chance of false positives again