Standardised methods Flashcards
What does the reliability of a measure refer to?
Is it as free as possible from random error?
- accurate and consistent
- free from random error
What does the validity of a measure refer to?
Does it measure what it says it measures?
- free from random AND systematic error
What does unidimensionality of a measure refer to?
Are we measuring just the one thing we want to measure or have we ended up measuring other things too?
What does discrimination of a measure refer to?
How well do our items distinguish between levels of the thing we’re measuring?
What does equivalence of a measure refer to?
Does the measure perform the same way for different groups of people?
What is norm-referencing?
How are scores distributed in the population?
How are measures standardised?
- rigorously tested for validity and reliability
- Norm-referenced (compare scores against population norms)
- Often delivered in tightly controlled ways
What is the equation for observed scores?
Observed score = true score +/- error
What are random errors?
Usually small deviations above or below true score
E.g., you measure a table three times using the same tape measure and get slightly different values: 174.6 cm, 174.2 cm, 174.4 cm
If we take a number of measurements, the sum and mean of random errors should tend towards zero.
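A quick simulation can illustrate this cancelling-out. The sketch below (true length, noise level, and number of measurements are all invented for illustration) adds a small symmetric random error to each measurement and shows the mean drifting back towards the true score:

```python
import random

random.seed(0)

TRUE_LENGTH = 174.4  # hypothetical true table length in cm

# Each measurement adds a small random error drawn from a
# symmetric distribution centred on zero.
measurements = [TRUE_LENGTH + random.gauss(0, 0.3) for _ in range(10000)]

mean_measurement = sum(measurements) / len(measurements)
# With many measurements, the random errors above and below the
# true score largely cancel, so the mean approaches TRUE_LENGTH.
print(round(mean_measurement, 1))
```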
What are systematic errors?
Unlike random errors, systematic errors do not cancel each other out with multiple measurements: they accumulate
E.g. The plastic tape measure that you use to measure the table has been stretched out from years of use. It consistently underestimates the true length of the table
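The stretched-tape example can be simulated the same way (again with invented numbers): a constant bias is added to every measurement, and averaging removes only the random part of the error, not the bias.

```python
import random

random.seed(1)

TRUE_LENGTH = 174.4   # hypothetical true table length in cm
STRETCH_BIAS = -2.0   # stretched tape under-reads by 2 cm every time

# Every measurement carries the same systematic bias
# plus a small random error.
measurements = [TRUE_LENGTH + STRETCH_BIAS + random.gauss(0, 0.3)
                for _ in range(10000)]

mean_measurement = sum(measurements) / len(measurements)
# Averaging removes the random error but NOT the systematic bias:
# the mean settles near TRUE_LENGTH + STRETCH_BIAS, not TRUE_LENGTH.
print(round(mean_measurement, 1))
```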
What is an example of random error and systematic error in a questionnaire?
Random error: answering 5 (strongly agree) today but 4 (mostly agree) next week
Systematic error: administering the questionnaire during the Covid-19 pandemic, when very few people are socialising regularly
When do systematic errors occur?
When items are supposed to measure just one dimension of a construct (unidimensionality) but in fact measure more than one.
eg. intended dimension: extraversion
Unintended dimension: testing environment (during pandemic)
How could random error be reduced?
- repeat measurements and average them (not as simple for psychological variables)
How can systematic errors be reduced?
- Use multiple measures, each with different downsides (nuisance factors) - the variable of interest is measured consistently, but the nuisance factors are not
What does it mean to be consistent/dependable? (reliability)
- across time and context
What is test-retest reliability?
If you measure something at one point in time, will it remain consistent at another point in time?
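Test-retest reliability is typically quantified by correlating scores from the two testing sessions. A minimal sketch (the participant scores and the two-week gap are invented for illustration):

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical total scores for 6 participants, two weeks apart:
time_1 = [22, 35, 28, 40, 18, 31]
time_2 = [24, 33, 27, 41, 20, 30]

# A high correlation suggests the measure is stable over time.
test_retest_r = pearson(time_1, time_2)
```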
What is parallel form reliability?
Will the measured characteristic be the same when using multiple versions of a measure?
What is internal consistency?
Are all items doing just as good a job as one another in measuring the psychological construct of interest?
- operationalization
If items measure the same thing, their scores will be highly correlated
What is a strength of test-retest reliability?
Demonstrates that the measure is temporally stable (consistent over time)
What are 3 weaknesses of test-retest reliability?
- Based on total score
- What about emotion or motivation?
- How long between testing sessions?
What is a strength of parallel forms reliability?
Reduces risk of learning effects when evaluating reliability over time
What is a measure of internal consistency?
Cronbach’s Alpha
What is Cronbach’s alpha based on?
- Mean correlation between items in a subscale
- Number of items in a subscale
What is the maximum value of Cronbach’s alpha?
1 (higher=more reliable)
What Cronbach’s alpha value indicates reliability for research purposes?
> 0.7
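Cronbach’s alpha can be computed as α = k/(k−1) × (1 − Σ item variances / variance of total scores). A minimal Python sketch, with invented item scores for a 4-item subscale answered by 6 participants:

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """item_scores: one list per item, each holding one score
    per participant (all lists the same length)."""
    k = len(item_scores)
    sum_item_vars = sum(pvariance(item) for item in item_scores)
    totals = [sum(scores) for scores in zip(*item_scores)]  # per-participant totals
    total_var = pvariance(totals)
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)

# Hypothetical 4-item subscale, 6 participants: items that rise
# and fall together across participants give a high alpha.
items = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 4],
    [5, 5, 2, 4, 1, 5],
    [3, 4, 3, 4, 2, 4],
]
alpha = cronbach_alpha(items)
```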
What is the split-half technique?
Another way of quantifying internal consistency
- compare scores across 2 halves of a measure
eg. questionnaire has 20 items – does total score of first 10 items correlate with total score of second 10 items?
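The 20-item example above can be sketched directly: total the first 10 items and the last 10 items per participant, then correlate the two halves. The response data below are invented for illustration:

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical responses: each row is one participant's 20 item scores.
responses = [
    [4, 5, 4, 3, 5, 4, 4, 5, 3, 4,   5, 4, 4, 3, 5, 4, 5, 4, 3, 4],
    [2, 1, 2, 3, 1, 2, 2, 1, 3, 2,   1, 2, 2, 3, 1, 2, 1, 2, 3, 2],
    [3, 3, 4, 3, 3, 4, 3, 3, 4, 3,   3, 4, 3, 3, 4, 3, 3, 3, 4, 4],
    [5, 5, 4, 5, 5, 4, 5, 5, 4, 5,   5, 4, 5, 5, 4, 5, 5, 5, 4, 5],
    [1, 2, 1, 2, 2, 1, 1, 2, 2, 1,   2, 1, 1, 2, 2, 1, 2, 1, 2, 1],
]

first_half = [sum(r[:10]) for r in responses]
second_half = [sum(r[10:]) for r in responses]
# A strong correlation between the two halves suggests the items
# are measuring the same underlying construct.
split_half_r = pearson(first_half, second_half)
```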
What are 2 strengths of internal consistency?
- It’s essential! Poor internal consistency can only be due to items measuring different things
- Rubbish in, rubbish out…
What are 2 weaknesses of internal consistency?
- If you increase number of items, Cronbach’s alpha increases
- Extremely high Cronbach’s alpha values might be bloated - too narrow a range of questions were asked
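The first weakness can be seen from the standardised form of alpha, which depends only on the number of items k and the mean inter-item correlation: holding the mean correlation fixed, alpha climbs as items are added. The item counts and correlation below are illustrative:

```python
def standardised_alpha(k, mean_r):
    """Standardised Cronbach's alpha for k items with mean
    inter-item correlation mean_r."""
    return (k * mean_r) / (1 + (k - 1) * mean_r)

# Mean inter-item correlation held fixed at 0.3, yet alpha
# increases purely because more items are added:
alphas = {k: round(standardised_alpha(k, 0.3), 2) for k in (5, 10, 20, 40)}
```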
What is inter-rater reliability largely used for?
Coding of observational data
- could be subjective - hard criteria to interpret
- could be objective - clearer criteria
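A common statistic for inter-rater reliability with categorical codes is Cohen’s kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch (the behaviour codes and two raters’ judgements are invented for illustration):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical codes
    to the same set of observations."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same code
    # if each assigned codes at their own base rates.
    expected = sum(counts_a[c] * counts_b[c]
                   for c in set(rater_a) | set(rater_b)) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical codes two raters assigned to 10 observed behaviours:
rater_a = ["play", "play", "talk", "idle", "play",
           "talk", "talk", "idle", "play", "talk"]
rater_b = ["play", "play", "talk", "idle", "play",
           "talk", "idle", "idle", "play", "talk"]
kappa = cohens_kappa(rater_a, rater_b)
```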
What is internal validity?
Can the causal relationship be explained by factors other than the variables studied (confounds)?
What is external validity?
Can we generalise to other situations/populations?
What is construct validity?
How well we are measuring what we want to measure
What is translation validity?
Is the operationalisation a good reflection of the construct?
What is criterion validity?
How well does the measure agree with some external standard?
What is face validity?
Does the instrument appear to measure the construct?
- not based on theoretical concepts
What is content validity?
To what extent do the items actually represent the whole of the construct dimension that we are trying to measure?
What are the 4 sub-categories of criterion validity?
- predictive validity
- concurrent validity
- convergent validity
- discriminant validity
What is predictive validity?
Does a score on the measure predict the value of another variable in the future?
What is concurrent validity?
Does the measure now correlate with info from a related measure?
What is convergent validity?
Does the measure correlate with another variable that it should theoretically be related to?
What is discriminant validity?
Does the measure correlate with a conceptually unrelated construct? (BAD)
What are 2 ways to score subscales?
- sum scores across items in a subscale (comparable only if subscales have equal numbers of items)
- mean scores across subscale (unequal number of items in subscales but equal weighting)
How do you get standardised scores?
- collect large amounts of data from sample that are representative of population
- Convert raw scores to standard scores (such as z-scores- how far a score is from the mean using standard deviations)
- 50% of participants will score below the mean and 50% above it
- an ongoing process
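The z-score conversion in the steps above can be sketched directly: subtract the mean and divide by the standard deviation, so each score becomes a distance from the mean in SD units. The raw scores below are invented for illustration:

```python
from statistics import mean, pstdev

def z_scores(raw_scores):
    """Convert raw scores to z-scores: how far each score lies
    from the mean, in standard-deviation units."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)
    return [(x - m) / sd for x in raw_scores]

# Hypothetical raw scores from a normative sample:
raw = [12, 15, 9, 18, 21, 15, 12, 18]
zs = z_scores(raw)
# By construction, z-scores have mean 0 and SD 1.
```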
Why do you want to use standardised scores?
- Gives a reference point for where a score on a measure lies compared to population
- can assess people against these norms
What are extreme scores (ref. standardised scores)?
A certain number of SDs below (or above) the mean (usually 1.5 or 2)
What are advantages of using standardised measures?
Rigorous design process
- start with hundreds of possible measures (items)
- Often initial item list evaluated by expert panel
- often subject to factor analysis
Validity and reliability repeatedly tested
Tests have descriptive statistics for population norms which you can use to compare with your own
Why is adapting an existing measure risky?
- Even a slight alteration of wording can impact how people answer
- can’t lay claim to the original questionnaire’s reliability or validity after adaptation
- Adapted questionnaires should undergo some pre-testing to evaluate reliability and validity - must report these findings