RS4 Flashcards
What is the basic comparative design?
Comparing the scores of one group with another.
Usually the means are compared
What is the basic correlational design?
The researcher measures two or more different variables at the same time in a single group of cases
To see whether there is a correlation between those variables
What is a correlation coefficient?
Standard statistical index measuring (i) the direction and (ii) the strength of a relationship between two variables
Ranges from -1 to +1
It is an effect size itself:
.1 is small
.3 is medium
.5 is large
What is the coefficient of determination?
r2 - r squared.
The proportion of variance in one variable shared by the other
An r of .6 indicates how much variance is shared?
36%
0.6 × 0.6 = 0.36; × 100 = 36%
What is covariance? what is the standard measure of covariance?
The relationship between how much scores on two variables deviate from their respective means. If X and Y deviate from their means in the same direction the covariance is positive; if they deviate in opposite directions it is negative.
The correlation coefficient is the standard measure: standardising the covariance makes it unitless
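A minimal Python sketch, with made-up values for x and y, showing how dividing the covariance by both standard deviations gives the unitless correlation coefficient:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])      # e.g. hours studied (invented values)
y = np.array([50.0, 55.0, 65.0, 70.0, 85.0])  # e.g. exam score (invented values)

# Covariance: average product of deviations from the respective means (n - 1 denominator)
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

# Dividing by both standard deviations standardises it into the unitless correlation r
r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))

print(cov_xy)                   # depends on the units of x and y
print(r)                        # always between -1 and +1
print(np.corrcoef(x, y)[0, 1])  # the same value straight from numpy
```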
What are the four types of correlation coefficients?
Spearman’s rho
Kendall’s Tau
Phi-coefficient
Point-biserial correlation
When would you use spearman’s rho and Kendall’s Tau-b?
Smaller data set, with a large number of tied ranks (the same value)
When would you use point biserial correlation?
When one variable is dichotomous (categorical with 2 categories, e.g. dead or alive) and the other variable is continuous
NOT when the dichotomy has an underlying continuum (e.g. pass/fail)
When would you use the phi coefficient (2x2)?
Correlating between two categorical (nominal) but dichotomous variables
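A sketch of how these coefficients might be computed with SciPy/NumPy; the data values are invented, and phi is obtained here as Pearson's r on 0/1-coded dichotomous variables (which is equivalent to the phi coefficient):

```python
import numpy as np
from scipy import stats

ranks_a = [1, 2, 2, 4, 5, 6]   # small ordinal data set with tied ranks (invented)
ranks_b = [2, 1, 3, 3, 5, 6]
rho, _ = stats.spearmanr(ranks_a, ranks_b)   # Spearman's rho
tau, _ = stats.kendalltau(ranks_a, ranks_b)  # Kendall's tau (copes with ties)

alive = np.array([0, 0, 1, 1, 1, 0])              # true dichotomy (0/1)
score = np.array([3.1, 2.8, 5.0, 4.6, 5.2, 3.0])  # continuous variable
r_pb, _ = stats.pointbiserialr(alive, score)      # point-biserial

smoker = np.array([0, 1, 1, 0, 1, 0])    # two dichotomous (nominal) variables
disease = np.array([0, 1, 0, 0, 1, 1])
phi = np.corrcoef(smoker, disease)[0, 1]  # phi coefficient for a 2x2 table

print(rho, tau, r_pb, phi)
```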
What does reliability refer to in terms of a test? Some ways it is measured…
The consistency over time:
- Test-retest
Consistency of the items within the measure:
- Split half
- Cronbach’s alpha
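A small illustrative function for Cronbach's alpha; the function name and the response matrix are made up for the example:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores)."""
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of each respondent's total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Rows = respondents, columns = items (invented ratings on a 1-5 scale)
responses = np.array([
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [2, 2, 1, 2],
    [5, 4, 5, 5],
    [3, 4, 3, 3],
])
print(cronbach_alpha(responses))
```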
What does validity refer to in terms of a test?
The extent to which a measure measures the underlying construct
Criterion related validity:
- Predictive and concurrent validity
Minimum correlation of test-retest reliability expected?
Minimum correlation of 0.6 expected
Problems with split half reliability?
Only estimates the reliability of half the test
Not obvious which way to split the test
How should reliability coefficients be interpreted?
< 0.6 is suspect
0.6 to 0.7 is satisfactory
> 0.8 is excellent
Methods of estimating criterion related validity?
Predictive validity
- See how well your test predicts some later obtained criterion scores
Concurrent validity
- See how well your test scores predict criterion scores obtained at the same time
What is linear regression?
Technique used to predict an outcome score. Examines the amount of variance that can be explained by one predictor (simple) or more than one (multiple).
In regression what are the dependent and independent variables called?
Dependent is an outcome
Independent is a predictor
What equation is used in regression?
The equation of a straight line:
Y = b0 + b1X
Y = outcome variable (expected value of Y given a value of X)
X = predictor variable
b0 = intercept (value of Y when X is 0)
b1 = regression coefficient (gradient: strength/direction of the relationship)
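A short sketch, with invented bill/tip style numbers, of fitting this straight line using scipy.stats.linregress and then using it to predict ŷ:

```python
import numpy as np
from scipy import stats

x = np.array([10.0, 15.0, 22.0, 30.0, 45.0])  # predictor, e.g. bill amount (invented)
y = np.array([1.0, 1.5, 2.5, 3.5, 5.5])       # outcome, e.g. tip (invented)

result = stats.linregress(x, y)
b1 = result.slope      # gradient: strength/direction of the relationship
b0 = result.intercept  # value of Y when X is 0

print(b0, b1)
print(b0 + b1 * 25.0)  # predicted Y (y-hat) for X = 25
```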
How do we test the significance of the predictive model in simple regression?
F ratio test and R2.
The F ratio is expressed as a ratio of Mean Squares: MS for the regression model divided by MS for the residuals
General aim of the least squares method for simple regression?
Goal is to minimise the sum of the squared differences (error) between the observed value of the dependent (outcome) variable and the predicted value (provided by regression line)
Sum of the squared residuals.
In this way it is trying to get the best fit possible.
Does the intercept have to make ‘sense’
No - the intercept may or may not actually make sense in real life
How does the simple linear regression equation change when you have to estimate population parameters?
The Y becomes ŷ and the βs (population parameters) become bs (sample estimates)
ŷ = mean value of y for a given value of x
What are residuals in regression?
The distance of each observed point from the best-fit line, also called the error.
They always add up to 0
Why do you square residuals?
1. Makes them all positive
2. Emphasises the larger deviations (exaggerates them)
What is the sum of squared residuals?
The squared residuals (errors) all added together.
How do we determine how good the linear regression model is?
We compare it to the model without the predictor variable (a horizontal line at the mean of y). If our model is good it should reduce the distance, i.e. the SSE (sum of squared errors)
Is the expected value of y an exact value?
No, it is the mean of a distribution of possible y values for that value of x
What is the least squares criterion?
min Σ(yi - ŷi)²
yi = actual value
ŷi = predicted
Minimise the sum of the squared residuals (the differences between the actual and predicted y values)
This formula is used to calculate the SST, SSR and SSE
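A minimal sketch, with made-up data, of the least-squares estimates that minimise this sum of squared residuals, including a check that the residuals add up to (roughly) zero:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # invented predictor values
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])   # invented outcome values

# b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2); b0 = ybar - b1 * xbar
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x
sse = np.sum((y - y_hat) ** 2)   # the sum of squared residuals being minimised
print(b0, b1, sse)
print(np.sum(y - y_hat))         # residuals sum to (approximately) zero
```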
What is the centroid?
The point at the mean x value and mean y value; the line of best fit must pass through it
Interpreting the regression equation ŷi = 0.1462x - 0.8188?
ŷi = 0.1462x - 0.8188
b1 = 0.1462 (gradient)
b0 = -0.8188 (intercept)
So for every £1 increase in the bill amount (x), the tip (ŷi) increases by 0.1462; this part makes sense
If the bill amount (x) is zero then the expected tip is -0.8188; this obviously doesn't make sense, but it doesn't have to
What is the SSR in simple linear regression model?
It's the sum of squares for the regression (sometimes called the model).
The SST (total) is the SSE (error) of the baseline model with no predictor (a horizontal line at the mean of y), which has maximum error, so there SST = SSE.
In our model we want less error. SSR (regression/model) = SST - SSE. The better the line fits the data, the smaller the SSE will be and the bigger the SSR (model) will be. We are always comparing against that bad baseline model where SST = SSE
In the simple linear regression equation what is the SSE?
The difference between the predicted value of y (from the model) and the actual value at each point. The predicted value has been calculated by plugging in the x values to the simple linear regression equation.
What is the coefficient of determination in regression?
Quantifies the ratio between SSR (model) and SST (total).
Shows us how well the model fits
It is an r2 value
R² = SSR/SST; multiply the r² by 100 to get a percentage
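Continuing the made-up data from the least-squares sketch above, a short illustration of the SST/SSE/SSR decomposition and R² = SSR/SST:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)  # total: error of the 'bad' model (just the mean of y)
sse = np.sum((y - y_hat) ** 2)     # error left over after using the predictor
ssr = sst - sse                    # improvement due to the regression model

r_squared = ssr / sst
print(r_squared, r_squared * 100)  # proportion of variance explained, and as a percentage
```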
What is multicollinearity in multiple regression?
When the independent (predictor) variables are related to each other as well as to the outcome (dependent) variable
This is bad because it becomes harder to discern which predictor is really affecting the dependent variable; highly related predictors are redundant, so you would take out some of the related variables
Disadvantages to having lots of predictor variables in multiple regression?
It means there are many more relationships to consider: you have to account for each predictor's relationship to the outcome as well as to the other predictors.
What is the (estimated) multiple regression equation?
ŷ = b0 + b1x1 + b2x2 + … (however many predictors)
Each coefficient in multiple regression should be interpreted as what, for example the b1 in b1x1?
Each coefficient is the estimated change in y (outcome) corresponding to a 1 unit increase in that predictor, when all other variables are held constant.
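A sketch, with invented data, of fitting an estimated multiple regression using NumPy's least-squares solver; each coefficient is then read as the change in y per 1-unit increase in that predictor with the other predictor held constant:

```python
import numpy as np

x1 = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])   # predictor 1 (invented)
x2 = np.array([1.0, 1.0, 2.0, 3.0, 3.0, 4.0])     # predictor 2 (invented)
y = np.array([5.0, 8.0, 12.0, 16.0, 19.0, 24.0])  # outcome (invented)

X = np.column_stack([np.ones_like(x1), x1, x2])   # the column of 1s gives the intercept b0
b, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = b

# b1: estimated change in y for a 1-unit increase in x1, holding x2 constant
# b2: estimated change in y for a 1-unit increase in x2, holding x1 constant
print(b0, b1, b2)
```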
What is the adjusted R squared?
The R squared adjusted for the number of independent variables (lower than R squared)
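A small illustration using the usual adjustment formula, adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of cases and k the number of predictors (the formula is assumed from standard texts, not stated on the card):

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """n = number of cases, k = number of predictors (independent variables)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(adjusted_r_squared(0.60, n=50, k=4))  # comes out slightly lower than 0.60
```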
How can you use Standard Error (SE) in multiple regression?
Tells us how wide the 'band' is around the regression line, in the units of the outcome (dependent) variable
What VIF and tolerance values indicate that the variables are not multicollinear?
VIF should be below 10
Tolerance should be higher than 0.2
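A sketch, with made-up predictors, of computing the VIF for one predictor by regressing it on the other predictors, with tolerance as its reciprocal:

```python
import numpy as np

x1 = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])   # invented predictor values
x2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
x3 = np.array([7.0, 6.0, 9.0, 8.0, 12.0, 11.0])

def vif(target, others):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the rest."""
    X = np.column_stack([np.ones_like(target)] + list(others))
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    pred = X @ coef
    r2 = 1 - np.sum((target - pred) ** 2) / np.sum((target - target.mean()) ** 2)
    return 1 / (1 - r2)

v = vif(x1, [x2, x3])
tolerance = 1 / v
print(v, tolerance)  # want VIF below 10 and tolerance above 0.2
```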
How many participants should there be in regression analysis?
Liberal: 15 ptps per predictor
Conservative: Tabachnick and Fidell (1996) suggest 50 + 8 × (number of IVs)
Stepwise suggests 40 per IV
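A tiny helper (function names invented) applying the two rules of thumb from this card:

```python
def n_liberal(n_predictors: int) -> int:
    return 15 * n_predictors        # 15 participants per predictor

def n_conservative(n_predictors: int) -> int:
    return 50 + 8 * n_predictors    # Tabachnick and Fidell (1996)

print(n_liberal(4), n_conservative(4))  # e.g. 4 predictors: 60 vs 82 participants
```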