Research Methods D Flashcards
1) Weak, 2) moderate and 3) strong bivariate correlations
1) .1-.3
2) .4-.6
3) .7-.9
(sign doesn’t influence strength)
When to use Pearson’s r and Spearman’s rho
Pearson’s when parametric, Spearman’s when non-parametric
Complications in correlations (5)
Small samples (under 10) are unreliable
Non-normal distributions
Outliers (are they omitted?)
Non-linear relationships
Heterogeneous samples
Example of heterogeneous sample
If r=.5 overall, but r=.12 for men and r=.14 for women in the sample…
the separate correlations are much weaker, yet together the groups form a moderate correlation
How correlation explains the variance
Can be used to see how much of the variation in scores is explained by the study… shown as the overlap on a Venn diagram. Use R^2 for this
If r=.7, how much of the variance is explained?
.7 x .7 = .49, so R^2 = .49, so 49% of variance is explained
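The arithmetic can be checked with a minimal sketch in pure Python (the scores here are hypothetical):

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of the SDs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up scores: r ~ .77, so R^2 ~ .6 (60% of variance explained)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
r_squared = r * r  # proportion of variance explained
```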
How to make simple correlations more in line with reality’s complexity
Partial out (control for) other theoretically driven causal or confounding variables when analysing
How to decide when to use 1) more or 2) fewer questions in a questionnaire
Use more when dealing with: complex concepts, attitudes/beliefs or psychometric factors
Use fewer when: only a few dimensions, concepts are well defined, and for attributes / behaviours
key principles of questionnaires (3)
test-retest reliability
Addresses intended concept (validity)
Can be meaningfully and quantitatively analysed
How to solve problem of acquiescent or socially desirable respondents of questionnaires (3)
Invert some questions (back-to-front coding)
Include contradictory statements to see if they answer the same
Include dummy / masker questions to make the topic of questionnaire subtler (social desirability reduced)
How to do questionnaire data entry (2)
Each participant gets a row, each item getting a column
Data must be typed in raw (no alterations)
Why negative items must be reverse coded
So all items point the same direction, the top score conceptually meaning the same thing
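A minimal sketch of reverse coding, assuming a hypothetical 5-point Likert scale:

```python
# Reverse-code negatively worded items so all items point the same way.
# Rule: new_score = (scale_max + scale_min) - old_score
SCALE_MIN, SCALE_MAX = 1, 5  # assumed 5-point scale

def reverse_code(score):
    return (SCALE_MAX + SCALE_MIN) - score

responses = [1, 2, 3, 4, 5]
print([reverse_code(s) for s in responses])  # [5, 4, 3, 2, 1]
```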
Reliability
Extent that the measure is stable / consistent, and produces similar results when administered repeatedly
How to test questionnaire reliability (3)
Test-retest
Split-half - correlate scores on one half of the items with scores on the other half
Item analysis (the best) - sorts useful from non-useful questions (tests internal consistency)
Describe Cronbach’s alpha in item analysis
If the items on a questionnaire fit together coherently, Cronbach’s alpha will be closer to 1. If alpha = 1, all items have been answered in perfect agreement
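Cronbach's alpha can be computed by hand from the item variances and the variance of the totals; a minimal sketch with made-up scores:

```python
from statistics import pvariance

def cronbach_alpha(data):
    """data: one list of item scores per participant (equal lengths).
    alpha = (k / (k - 1)) * (1 - sum(item variances) / variance of totals)
    """
    k = len(data[0])                       # number of items
    items = list(zip(*data))               # transpose: one tuple per item
    item_vars = [pvariance(col) for col in items]
    totals = [sum(row) for row in data]    # each participant's total score
    return (k / (k - 1)) * (1 - sum(item_vars) / pvariance(totals))

# Hypothetical, highly consistent responses: alpha ~ .95
scores = [[4, 5, 4], [2, 2, 3], [5, 4, 5], [1, 2, 1]]
alpha = cronbach_alpha(scores)
```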
Describe correlations in item analysis
If an item makes a useful contribution to a questionnaire, its score will correlate with the questionnaire total.
If it does not, it reduces the alpha and may be a candidate for removal
How to conduct an item analysis with the correlations and Cronbach’s alpha
Use the item-total statistic (how much each item correlates with the questionnaire total) to decide which item to recode or delete; repeat the item analysis after each change until Cronbach’s alpha is between .7 and .8… but preferably closer to .7.
Start by recoding the most negative items, then delete the items with the smallest correlations - also refer to the ‘alpha if deleted’ column
(NEVER recode an item twice)
What the item total statistic can tell you in item analysis if negative (and moderate-strong)
It is measuring the conceptual opposite of what was intended
What the item total statistic can tell you in item analysis if low
The item does not differentiate between people (everybody gives the same answer)… the question could express too extreme a view, or too common a belief
What the item total statistic can tell you in item analysis if low and the alpha increases if deleted
The question does not measure the intended construct and lacks relevance… its answers look random when plotted against the overall questionnaire score
Known groups validity and how to test
Differing scores found for groups already known to differ
Test with a t-test
Concurrent validity
The new scale’s scores correlate with an established ‘gold standard’ measure (already reliably tested) - about its predictive power against the other questionnaire
Construct validity
Appears consistent with theories of the construct the questionnaire is interested in
Content validity
If all aspects of the content appear reflected (and proportionally reflected) in the questionnaire
Criterion validity and e.g.
Results are consistent with other measures, matching theory. e.g. IQ tests designed to correlate with child’s age
Face validity
If experts, participants (etc) agree that the construct is being accurately measured
Relationship between reliability and validity
Without reliability, there can be no validity… if results do not show a consistent pattern, the concept cannot have been measured
Describe factor analysis conceptually (3)
Correlates all items on the questionnaire in all possible combinations
Can then see which items’ correlations cluster together; these have something in common… meaning they explain the same part of the variance
Qualitatively decide on a label to give these clusters based on features of the questions
Why having factors in a questionnaire is useful (2) and not useful (1)
Makes the overall topic more subtle
Gives a greater understanding of scores
Not useful as makes the analysis more complicated
Steps of factor analysis from the output
Use scree plot to decide how many factors are needed
Construct the basic factors
Rotate factors, so they make more ‘sense’
Label the factors (ideally unrelated to each other)
Factor loadings
How much each item relates to a factor; if it correlates with the factor, it loads on it
Scree plot
Graphical representation of the effectiveness of each factor, showing how much variance each explains. Generally, retain factors with an eigenvalue greater than 1.
What is the factor loading threshold for an item belonging to a factor
It is arbitrary, but around .4 - .5
What to do if item belongs to more than one factor
Make a judgement about which factor to include it in, first by the size of the factor loading, then by its qualitative relatedness to each factor (it can belong to both factors!)
Rotation in factor analysis
Graphically, rotate the factors’ x and y axes together, keeping them perpendicular… until items score near zero on one factor and more strongly on the other (increasing simplicity)
If done for all factors… the loadings remain proportionally the same
What to do with negative factor loadings
They are still members of the factor as if they were positive.
What to do with upside-down factors
If majority / all of its factor loadings are negative, be aware of what this is saying when creating a label
How to get the best factor analysis
Re-do it with differing numbers of factors, guided by the scree plot (especially when eigenvalues are borderline close to 1), until satisfied with the labels, loadings, etc.
How to label factors (2)
Use the size and direction of loadings to determine label, they show significance of each item
Look at meaning of items together, label going beyond just one item
Steps to creating a questionnaire (4)
Item pool - defining concept and creating questions
Pilot testing - sees if it asks what we intend
Reliability check - test-retest / item analysis
Validity check - e.g. discriminant
Purpose of factor analysis (2)
Informs about the underlying structure of the construct being measured
Shows how participants conceptualise items
Benefit of a 4-point likert scale
Forces choice to either agree or disagree
Convergent validity
Correlation of results with an existing questionnaire
Incremental validity
How the questionnaire is distinguished from other measures
Why we use correlations (4)
To show reliability (e.g. test-retest)
Predict the outcome of one variable from another
To show validity
Theoretical verification
What R^2 is diagrammatically equivalent to
the overlap on a Venn diagram
What does the ‘variance explained’ tell us
R^2 shows how accurate the model is based on its predictive power
How correlation can be closer to being causal evidence
If supported by theoretical or observational input explaining that few or no other variables could affect the association
What is the criterion
The dependent variable, what we predict (y-axis)
What is the predictor
The independent variable, what we are predicting from (x-axis)
What non-linear regression looks like
Curved line on the graph
Equation of a linear regression line, and what each bit means
y = b0 + b1(x1) + … + bn(xn) + error
b0 is the intercept on the y-axis
b1 is the slope of the line (deciding its steepness)
error is the extent of the residuals from the line
Why is the equation of a linear regression line useful
Can be used to predict y when x is known, or to decide what x should be… (or vice versa, when rearranged)
How to see extent of a residual using equation of a linear regression
By inputting a subject’s x into the equation, giving the y value on the regression line… how far this is from the subject’s actual y value is the residual
What does a regression line do in terms of residuals / error
Tries to make them as small as possible overall, being the line of best fit… minimising the sum of squared errors (squared to cancel the direction of each error)
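The fitting and the residuals can be sketched in a few lines of pure Python (the data points are hypothetical):

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor:
    b1 = cov(x, y) / var(x);  b0 = mean(y) - b1 * mean(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    return b0, b1

xs = [1, 2, 3, 4]
ys = [2.1, 3.9, 6.2, 7.8]
b0, b1 = fit_line(xs, ys)

# Residual = actual y minus the y the line predicts for that subject's x
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
sse = sum(r ** 2 for r in residuals)  # the quantity the line minimises
```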
On SPSS what to do before carrying out linear regression (2)
Check for normal distribution with the Explore function, then create a scatter plot in Chart Builder
What does linear regression assume (4)
A linear relationship (other functions exist for non-linear ones)
Homoscedasticity: y is distributed the same at all values of x (cannot be heteroscedastic)
Y’s spread is the same at all values of x
The criterion is normally distributed (not needed for the predictor)
Different types of outlier (2)
Leverage - distance from the mean of the plot
Error - distance from the regression line
Which type of outlier affects the strength of the model more
The error: you can still have a strong model with a large leverage but a small error… not so with a large error and a small leverage
What the linear regression equation graphically represents if there are two predictors
A flat plane (surface), but a residual plot is a better visualisation
Why can there never be more than one b0
It represents the intercept on the y-axis, and there is only ever one y-axis
Why does the sum of each predictor’s R^2 (x100) almost never equal the total variance explained by the predictors
Because predictors almost always overlap in the variance they explain; they correlate with one another (called collinearity)
What happens if collinearity between predictors is too high
Some variables may be dropped from the analysis because they do not predict any unique variance… it is explained fully by the other predictors
Why is a Venn diagram good for multiple linear regression
Can visually see how much variance each predictor uniquely explains and how much they overlap
Part correlation
Amount of unique variance explained by the predictor, as a proportion of the total variance in the criterion (the whole DV circle in the Venn diagram)
Partial correlation
Unique variance explained by the predictor (once the other relationships have been ‘partialled out’, i.e. controlled for), as a proportion of the variance of the criterion that is not explained by anything else (the non-overlapping region in the Venn diagram)
When is multiple regression analysis most effective
When predictors are not too strongly inter-correlated… otherwise there won’t be much unique variance explained
Describe linear regression in psychological terms
How much extra of the differences in peoples scores can be explained by each additional relationship
Is the best-fitting linear regression equation always the most sensible?
Not necessarily: it can have a negative regression coefficient even when the correlation coefficient is positive
Simultaneous multiple regression
Takes all the predictors wanted in the model and sees if they make a significant contribution to the model
Benefit of comparing models
Can work out the additional variance explained by making the 2nd model identical to the first except for one added predictor (IV)… can then see what this predictor contributes individually
(Compare model with predictor against one without it!)
How to decide what models to choose to compare
Theoretically motivated from previous literature or logical thinking, e.g. which variable is likely mediating between other variable and DV
Why the order predictors are entered into a model matters
It only matters if the predictors correlate (overlap) at all… which is usually the case.
The first predictor entered takes all the variance it explains; if the second correlates with the first, the output won’t credit the second with the overlapping variance (it shows a smaller variance than if it had been entered first)
Reporting a regression analysis (4)
Descriptive statistics (can be in table)
Correlation matrix
Description of analysis carried out (e.g. order of predictors)
Table of the model along with interpretation (e.g. of Beta scales)
Logistic regression
Prediction of group membership; a regression with a categorical criterion variable (the DV is in distinct categories)
Predictors in logistic regression
Similar to multiple regression, can have multiple predictors of continuous or categorical nature (we are only doing continuous)
Types of criterion variable in logistic regression (2), and methodological change depending on the type
Can be binomial (2 groups), or multinomial
need bigger sample for more groups
How to categorise in logistic regression
Split the data into arbitrary sections depending on DV score (see example in ppt)
Odds for an occurrence = …
Probability of an event occurring / probability of an event not occurring
(odds against an occurrence = the reverse)
Types of log used in logistic regression odds
The natural log (log to base e), where e is a mathematical constant (≈ 2.718), like pi
What values mean in log odds
They range from negative to positive infinity; negative log odds mean the event is less likely, positive mean more likely, and 0 means the odds are even (50/50)
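The probability-to-log-odds conversion and its inverse (the logistic function) can be sketched directly:

```python
import math

def log_odds(p):
    """Natural-log odds of an event with probability p."""
    return math.log(p / (1 - p))

def prob_from_log_odds(lo):
    """Inverse: the logistic (sigmoid) function."""
    return 1 / (1 + math.exp(-lo))

print(log_odds(0.5))   # 0.0 - even odds
print(log_odds(0.25))  # negative - less likely
print(log_odds(0.75))  # positive - more likely
```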
How log odds are used in logistic regression
Use the continuous predictors to determine the likelihood of someone belonging to a particular category / group
Limitations of logistic regression (4)
Relationship between probabilities on DV is assumed sigmoidal (S-shaped)
Is very sensitive to outliers
Ratio of the sample size to variables needs to be high
SPSS assumes we are describing relationships, not making predictions… so no population info
What is logistic regression’s equivalent to R^2
Cox and Snell methods estimate the variance explained, but R^2 does not exist in logistic regression
What is equivalent to an ANOVA, explain…
Regression with categorical predictors; the information is just reported differently… an ANOVA tests the significance of the regression model as a whole, while regression also reports the coefficients and their significance
How categorical predictors work
Dummy variables created whereby IVs are coded 1 (yes) or 0 (no) into dummy predictors so each IV has a unique binary code
Equation when using categorical predictors, and what this would mean if IV1 = 1 , 0
y = b0 + b1(T1) + b2(T2)…
If IV1 = 1: y = b0 + b1; if IV1 = 0, the b1 term drops out
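The dummy-variable equation can be sketched with made-up coefficients (b0, b1, b2 here are hypothetical, not from any real analysis):

```python
# Three conditions coded with two dummy predictors T1, T2.
# The reference group gets 0 on both dummies, so its mean is just b0.
b0, b1, b2 = 10.0, 2.5, -1.0  # hypothetical coefficients

def predict(t1, t2):
    """y = b0 + b1(T1) + b2(T2)"""
    return b0 + b1 * t1 + b2 * t2

reference = predict(0, 0)  # b0: the reference group's mean   -> 10.0
group1 = predict(1, 0)     # b0 + b1                          -> 12.5
group2 = predict(0, 1)     # b0 + b2                          -> 9.0
```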
Where dummy variables are added to during regression analysis using categorical predictors
In the Fixed Factors box (where the IVs go), so the conditions are coded into the analysis
How ANOVAs and regression analysis differ in information presented
Regression gives coefficients and a constant; an ANOVA gives the same coefficient information but with the constant added in, displayed as the mean of each group
ANCOVA
A combination of ANOVA and linear regression, it shows the effect of an IV on a DV whilst partialling out the effect of a co-variate
Why is an ANCOVA advantageous
It reduces error variance by removing a confounding variable (co-variate)
Example of a co-variate
If looking at whether different styles of lecturing help learning, a co-variate could be each group’s interest in learning… if some groups have much higher interest (which helps learning), interest will confound the results
How ANCOVA is seen graphically
Shows each group’s DV on the y-axis, with the x-axis showing their scores on the co-variate
What an ANCOVA output will show
Means of each group on the DV, like an ANOVA… also the means of each group on the DV once the co-variate has been partialled out (can compare to the original)
Applications of ANCOVA (3)
Removing the influence of confounding factors
Calculate the difference between pre- and post-test scores, in order to remove the influence of pre-test scores
For non equivalent, intact group designs (naturally occurring)
Problem and solution with using ANCOVA for pre and post test scores
Simply analysing the difference between pre- and post-test scores does not partial out the pre-test, because difference scores are usually correlated with pre-test scores
Solved by entering the pre-test score as the co-variate in the ANCOVA
Example of covariate in intact (existing) groups
Comparing women’s testosterone between different occupation groups, a co-variate would be age: testosterone differs with age, and different occupations are likely to have differing age profiles (e.g. bar staff vs scientists)… so age would confound
Graphically creating adjusted means for co-variates
On a scatter plot with the DV (y-axis), the co-variate (x-axis) and the different groups plotted: draw each group’s mean on the DV (horizontal line) to its intersection with that group’s regression line (initial mean). Then adjust to where the co-variate’s grand mean (vertical line) intersects each group’s regression line (adjusted mean).
ANCOVA assumptions (4)
Normal stuff: ND, HoV and int / rat data
Linear relationship between DV and co-variate
Homogeneity of regression (each group has parallel(ish) regression lines)
Covariate is reliably measured
In item analysis, if the next item to be deleted has the same ‘alpha if deleted’ as another, how do I decide which to delete first?
Delete the item with worse wording, or the one with lower item total statistic.
Types of closed question (3)
Likert scale (agree to disagree)
Self-efficacy scale (cannot do to can do)
Semantic differentials (e.g. dominant to submissive)
Valence in item analysis
If the item is positively or negatively weighted towards the construct under investigation
Getting to item analysis in SPSS
Analyse - Scale - Reliability analysis
Reporting an item analysis
Report the negatively correlated items removed, how many weakly correlated items were removed, and the final alpha value
Discriminant validity and how to test
If believed unrelated constructs are actually unrelated
Test with a t-test
Eigenvalue in factor analysis
Is the y-axis on a scree plot, is a measure of the amount of variance explained by a factor
Communalities in factor analysis
How much of an item’s variance is explained by the extracted factors
Factor analysis in SPSS
Analyse - Dimension reduction - Factor
Subscale totals in factor analysis
Add together item scores (from questionnaire) for all items loading onto a factor
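A minimal sketch of subscale totals, assuming a hypothetical 6-item questionnaire whose factor loadings assign items 0, 2, 4 to factor A and items 1, 3, 5 to factor B:

```python
# Which items load onto which factor (hypothetical, from a prior analysis)
factor_items = {"A": [0, 2, 4], "B": [1, 3, 5]}

def subscale_totals(responses, factor_items):
    """responses: one participant's item scores (already reverse-coded).
    Returns the summed score per factor."""
    return {label: sum(responses[i] for i in idx)
            for label, idx in factor_items.items()}

print(subscale_totals([3, 1, 4, 2, 5, 1], factor_items))  # {'A': 12, 'B': 4}
```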
Reporting factor analysis (4)
How many factors are extracted
That rotation was used
The factor labels with a brief definition
Amount factor accounts for overall variance
What is R^2 adjusted
Predictive power of the regression equation for the population, not the sample
Writing up linear regression (4)
Include R^2, relate to the hypothesis, an ANOVA report (F, df, p) and report coefficients table (beta, t and p)
Writing up multiple linear regression (4)
Table of the descriptives
Table of correlations
Describe the analysis done
Show output in table form
When to use a hierarchical regression rather than simultaneous
When there are theoretical considerations that can help achieve the best model
Why use hierarchical over a stepwise regression
It has theoretical input: you search for the best model more manually, so it cannot be confounded by chance variations in the data
What method does simultaneous multiple regression use
The Enter method
Stepwise regression
Uses statistical criteria to determine which predictors produce the best regression model
How stepwise regression works
All variables are initially entered into the model. Then, in a series of steps, possible variables are removed and unused ones added until the optimum model is found
Types of stepwise regression (2)
Forward selection - variables only ever added, cannot be removed
Backward elimination - All entered at start, then some removed and cannot be re-entered
Caution about stepwise regression
It is not theoretically motivated, meaning it can be influenced by chance variations in the data. We should not make strong claims about the regression equation
Writing up a hierarchical regression (3)
Descriptives, correlations and hierarchical output… all in table form
Difference between logistic regression and multiple regression
Logistic regression applies a non-linear (logistic) function to the values predicted by the regression equation, in order to predict group membership (the DV)
Writing up logistic regression (3)
Chi squared reported
Estimates of variance explained (by the Cox and Snell method)
Whether each predictor was significant, with values
Why logistic regression is sensitive to outliers
In logistic regression, participants cannot be ‘a bit’ part of a group, so small variations have a big impact on group membership
Writing up ANCOVA (2)
ANOVA report and table of means, after the covariate is partialled out
ANCOVA equation
y = b0 +b1 (T1) + b2 (T2) + b3 (X)
X = covariate
What the varimax rotation method tries to do
Get factors with the least amount of overlap by getting items to load only onto one factor