Psychometrics; Lec 9 & 11 (no 10 due to bank hol); Lab 5 Flashcards
What is a psychometric test?
A psychometric test is a standardised procedure for sampling behaviour and describing it using scores or categories
Most tests are ‘norm-referenced’ what does this mean?
They describe the behaviour in terms of norms: test results gathered from a large, representative group of subjects.
While most tests are norm-referenced, some are ‘criterion-referenced’, what does this mean?
The objective is to see if the subject can attain some pre-specified criterion
What are 5 things one should consider in writing a test?
- Ensure that all aspects of the construct are dealt with e.g. anxiety - all aspects
- Need to be long enough to be reliable - start with 30 questions and reduce to 20
- Should assess only one trait
- Should be culturally neutral
- Should not be the same item rephrased (mentioned during FA)
In terms of establishing item suitability; there should not be too many items which are…?
There should not be too many items which are either very easy or very hard.
e.g., a test where 10% of items have mean scores above .8 (very easy) would be questionable.
In terms of establishing item suitability; items should have an acceptable standard deviation, what does this mean?
If the SD is too low then it is not tapping into individual differences.
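The difficulty and SD checks above can be sketched in Python. The data matrix and the exact cutoffs (.2/.8 for difficulty, .1 for SD) are illustrative assumptions, not values from the cards:

```python
from statistics import mean, pstdev

def screen_items(responses, lo=0.2, hi=0.8, min_sd=0.1):
    """Return indices of items that are too easy, too hard, or too uniform.
    Cutoffs are illustrative, not prescriptive."""
    flagged = []
    n_items = len(responses[0])
    for i in range(n_items):
        scores = [row[i] for row in responses]
        p = mean(scores)       # item difficulty: proportion endorsing/passing
        sd = pstdev(scores)    # low SD means the item is not tapping individual differences
        if p < lo or p > hi or sd < min_sd:
            flagged.append(i)
    return flagged

# Four respondents x five 0/1 items (made-up data)
data = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0],
]
print(screen_items(data))  # items 0 and 3 are too easy, item 4 too hard
```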
In terms of establishing item suitability; if there are different constructs then…
… it is important that an equal number of items refers to each construct.
What is criterion keying and how is it used to establish item suitability?
Criterion keying - items are chosen based on their ability to differentiate the population in general from a specific group (e.g. surgeons, pilots).
Criterion keying is atheoretical
Groups must be well defined
Why should you interpret measures that have been established using criterion keying liberally?
Because there will be overlap in response distribution.
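Criterion keying can be sketched as a simple filter: keep items whose endorsement rate differs between the well-defined group and the general population. The data and the `min_diff` threshold are illustrative assumptions:

```python
from statistics import mean

def criterion_key(general, group, min_diff=0.3):
    """Keep items whose endorsement rate differs between the specific
    group (e.g. pilots) and the general population by at least min_diff.
    The threshold is an illustrative assumption."""
    kept = []
    for i in range(len(general[0])):
        g = mean(row[i] for row in general)   # general-population endorsement rate
        s = mean(row[i] for row in group)     # specific-group endorsement rate
        if abs(s - g) >= min_diff:
            kept.append(i)
    return kept

# Made-up 0/1 responses to three items
general_sample = [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 0, 1]]
pilot_sample   = [[1, 1, 0], [1, 1, 1], [1, 1, 0], [1, 1, 0]]
print(criterion_key(general_sample, pilot_sample))  # items 0 and 1 discriminate
```

Note how the selection is purely empirical (atheoretical): an item is kept because it discriminates, not because of what it appears to measure.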
How is factor analysis used to establish item suitability?
Based on FA, items that have a low loading on the intended factor are removed.
Classical item analysis is used to establish item suitability and improve reliability, how does this work?
Based on classical item analysis, the correlation of an item’s score with the score on the whole test (excluding that item) is calculated.
Removing items with low correlations improves reliability. However, because reliability also depends on the number of items, there is a balance to strike.
Each time an item is removed the correlation of each item to the main score must be recalculated since this will change as items are removed.
How many psychological constructs should each scale measure?
One
What does ‘measurement error’ mean?
That for any one item, the psychological construct accounts for only a low percentage of the respondent's variation (most of the variation is caused by other factors, e.g. age, religious beliefs, sociability, peer-group pressure).
How do you get rid of random variation (e.g. age, religious beliefs) when building a scale?
Use several items; this variation should cancel itself out, so that the measured variance is due to the underlying construct.
One important measure of reliability for psychometric instruments is that of temporal stability - what is this? What is it often referred to as?
Temporal stability is often referred to as ‘test-retest reliability’.
Temporal stability involves administering the same test to people over a time span. It measures whether the measure produces the same outcomes over time.
e.g. If a respondent scores strongly as an extrovert on a particular day and then, 2 weeks later, scores strongly as an introvert, we may begin to question whether the instrument is measuring anything useful.
What are two ways of measuring the extent to which a scale measures one construct only? (a form of reliability testing)
- Split-half reliability
- Cronbach's Alpha
What is split half reliability?
Split half testing measures internal consistency reliability.
Steps:
- Administer the test to a large group of students
- Randomly divide the test questions into two parts. For example, separate even questions from odd questions.
- Score each half of the test for each student
- Find the correlation coefficient (Pearson’s) for the two halves - a reliable test will have high correlation.
Split-half reliability and Cronbach’s Alpha are measures of internal consistency, what does this mean?
Internal consistency reliability is a way to gauge how well a test or survey is actually measuring what you want it to measure.
What is Cronbach’s Alpha?
Cronbach’s Alpha measures internal consistency reliability (how well a test or survey is actually measuring what you want it to measure) for multi-item Likert scales.
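Cronbach's Alpha has a simple closed form, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), which can be sketched directly (illustrative data):

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(responses[0])
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Six respondents x four 0/1 items (made-up data)
data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]
print(f"alpha = {cronbach_alpha(data):.3f}")
```

When items covary strongly, the total-score variance dwarfs the summed item variances and alpha approaches 1; this toy scale falls below the conventional .7 cutoff.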
What are some problems with Cronbach’s Alpha?
- It is influenced by the average correlation between the items and the number of items in the test
- It can be artificially boosted by asking the same question twice
- Test should not be used if alpha is below .7