Test theory and psychometrics Flashcards
What must you know about a test to use it?
- Name of test
- Type of distribution
- Definition of construct
- Standardised or raw scores?
- Norms / cut offs / population
- Evidence used to support measures - validity
- M and SD
- Maximum possible score (what it is out of)
- Experience of Tester
What is a distribution?
A set of test scores
What are raw scores?
unmodified scores of test performance
What is frequency distribution?
a tally of the number of score occurrences
What is a grouped frequency distribution?
Scores that fall within intervals are counted (95-99 etc.)
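A minimal sketch of both kinds of tally, using made-up scores. The interval width of 5 (giving bands such as 95-99) and the helper name grouped_frequency are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical raw scores, made up for illustration.
scores = [95, 97, 97, 99, 100, 102, 102, 102, 104, 108]

# Frequency distribution: a tally of how often each score occurs.
frequency = Counter(scores)

def grouped_frequency(values, width=5):
    """Count the scores that fall inside each interval of the given width."""
    counts = Counter()
    for value in values:
        lower = (value // width) * width       # e.g. 97 -> 95
        counts[(lower, lower + width - 1)] += 1
    return dict(sorted(counts.items()))

grouped = grouped_frequency(scores)
for (low, high), count in grouped.items():
    print(f"{low}-{high}: {count}")
```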
How can we describe the distribution of test scores?
Using:
measures of central tendency
measures of variability/ dispersion
What are the measures of central tendency?
mean, median and mode
What are the measures of variability/ dispersion?
range, standard deviation
What does SD mean?
Roughly, the average distance of scores from the mean (strictly, the square root of the average squared distance)
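A small sketch of these descriptive statistics using Python's standard statistics module. The scores are hypothetical, and the population SD (pstdev) is an assumption; a sample SD (stdev) would divide by n - 1 instead.

```python
import statistics

# Hypothetical test scores, made up for illustration.
scores = [4, 6, 6, 7, 9, 10]

# Measures of central tendency.
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of variability / dispersion.
score_range = max(scores) - min(scores)
sd = statistics.pstdev(scores)  # population SD: sqrt of mean squared deviation

# The literal "average distance from the mean" is a slightly different number.
mean_abs_deviation = sum(abs(x - mean) for x in scores) / len(scores)

print(mean, median, mode, score_range, sd, round(mean_abs_deviation, 2))
```

For these scores the SD is 2 while the mean absolute deviation is about 1.67, which is why "average distance" is best read as an approximation.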
What is used to develop norms?
Standard scores
What is a normative sample?
A group of people whose scores are used as a reference
Norms are?
test performance data for reference
What is a criterion?
A standard on which a judgement or decision is based: e.g. GPA x to get into a course
How many scores in a normal distribution fall above and below the mean?
50:50
How many scores in a normal distribution occur between the mean and 1 SD?
34%
How many scores in a normal distribution occur between +-1 SD? e.g. -1 SD to +1 SD
68% (34% on each side)
How many scores in a normal distribution occur between +-2 SD? e.g. -2 SD to +2 SD
96% (48% on each side)
How many scores in a normal distribution fall above 2 SD?
2% (on each side)
How many scores in a normal distribution fall between 1 SD and 2 SD?
14% (on each side)
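The percentages on these cards are rounded. A quick sketch to check them against the standard normal curve, using only math.erf from the standard library (the helper names normal_cdf and area_between are hypothetical):

```python
import math

def normal_cdf(z):
    """Proportion of a standard normal distribution falling below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_between(z_low, z_high):
    """Proportion of scores between two z values."""
    return normal_cdf(z_high) - normal_cdf(z_low)

areas = {
    "mean to +1 SD": area_between(0, 1),
    "-1 SD to +1 SD": area_between(-1, 1),
    "-2 SD to +2 SD": area_between(-2, 2),
    "+1 SD to +2 SD": area_between(1, 2),
    "above +2 SD": 1.0 - normal_cdf(2),
}

for label, proportion in areas.items():
    print(f"{label}: {proportion * 100:.1f}%")
```

The more exact values are about 34.1%, 68.3%, 95.4%, 13.6% and 2.3%, so 34 / 68 / 96 / 14 / 2 are convenient approximations.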
What does 75th percentile mean?
higher than 75% of people (in the top 25%)
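A minimal sketch of a percentile rank, assuming the simple definition "percentage of the reference group scoring below this score" (other definitions also count half of the tied scores). The reference group here is hypothetical.

```python
def percentile_rank(score, reference_scores):
    """Percentage of the reference group scoring strictly below `score`."""
    below = sum(1 for s in reference_scores if s < score)
    return 100.0 * below / len(reference_scores)

# Hypothetical reference group: one person at each score from 1 to 100.
reference = list(range(1, 101))

rank = percentile_rank(76, reference)
print(rank)  # 75 of the 100 people scored below 76
```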
What are the limitations of using percentiles?
changes all of the time
may not give an accurate reading
if ordinal data, it loses precision
Why don’t we just give raw scores?
For broader meaning/ comparison
What is the simplest example of a standard score?
z score
What is the formula for calculating a z score?
z = (x - mean) / SD
What is the mean for a set of z scores?
0
What is the SD for a set of z scores?
1
What does a z score tell us?
How many SD the raw score is below or above the mean
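A short sketch of the z score calculation on hypothetical raw scores. It also checks that the resulting z scores have a mean of 0 and an SD of 1. Using the population SD (pstdev) is an assumption; with norms you would normally use the mean and SD of the normative sample.

```python
import statistics

def z_scores(raw_scores):
    """Convert raw scores to z scores: z = (x - mean) / SD."""
    mean = statistics.mean(raw_scores)
    sd = statistics.pstdev(raw_scores)
    return [(x - mean) / sd for x in raw_scores]

# Hypothetical raw scores (mean 7, SD 2).
raw = [4, 6, 6, 7, 9, 10]
z = z_scores(raw)

print(z)
print(statistics.mean(z), statistics.pstdev(z))
```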
Other than a z score what is another type of standardised score?
t score
What is the mean of a t score?
50 (half way between 0 and 100)
What is the SD of a t score?
10
What does a t score range between?
5 SD above and below the mean
e.g. 0 - 100
IQ mean and SD?
M = 100 SD = 15
What does FSIQ stand for?
Full scale IQ
Why, if reporting scores to parents is it better to use a t score than a z score?
No negatives (0-100 vs. -2 to 2)
How do you calculate a t score?
T = (Z x 10) + 50
How would you calculate a standardised score on the IQ test?
x = (z x SD) + Mean
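The two conversions above are the same formula with different means and SDs. A minimal sketch (function names are hypothetical; the IQ values M = 100, SD = 15 come from the card above):

```python
def z_to_standard(z, mean, sd):
    """General conversion: standard score = (z x SD) + mean."""
    return z * sd + mean

def z_to_t(z):
    """T score: mean 50, SD 10."""
    return z_to_standard(z, mean=50, sd=10)

def z_to_iq(z):
    """IQ-style standard score: mean 100, SD 15."""
    return z_to_standard(z, mean=100, sd=15)

for z in (-2, -1.5, 0, 1, 2):
    print(f"z = {z}: T = {z_to_t(z)}, IQ = {z_to_iq(z)}")
```

For example, z = -1.5 corresponds to T = 35 and IQ = 77.5, neither of which is negative.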
What does a true score assume?
That there is an ideal score that captures a degree of a construct
What does test theory say about true scores?
They are not possible to obtain (measurement error)
Test theory states someone's observed score (X) =
True score + error
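A toy simulation of X = true score + error. All the numbers are assumptions: in practice the true score is unknown, and error is modelled here as random noise with a mean of 0.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_SCORE = 100  # hypothetical true score (unknowable in practice)
ERROR_SD = 5      # hypothetical spread of the measurement error

def observed_score(true_score, error_sd):
    """One administration: observed score X = true score + random error."""
    return true_score + random.gauss(0, error_sd)

# Many imaginary administrations of the same test to the same person.
observed = [observed_score(TRUE_SCORE, ERROR_SD) for _ in range(10_000)]
mean_observed = statistics.mean(observed)

# With no error, every administration returns the true score itself.
error_free = [observed_score(TRUE_SCORE, 0) for _ in range(3)]

print(round(mean_observed, 2), error_free)
```

Individual observed scores vary around the true score, but their average sits close to it because the random errors cancel out.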
What are some sources of error in test scores?
- Test construction (validity, sample)
- Test Administration (conditions, followed manual?, professionalism, computer used?, emotional state, relationship)
- Scoring and interpretation (subjective responses)
What could we assume if there was no error in test scores?
- We’d get the same score twice
- X would be the true score
- Two parallel tests would get the same score
- Wouldn’t need a range
What is psychometrics?
The area of psychology interested in the quality of tests
What makes a good test?
Validity and reliability
What is validity?
supporting evidence for interpretation of scores
e.g. construct validity: measures what it is operationalised to measure
Why is a measure of IQ iffy when it comes to validity?
Can you really use a number to represent how much of that construct (IQ) you have?
What is a nomological network?
a collection of research surrounding and supporting the validity of a construct
What are two types of construct validity?
Convergent: correlates with other tests measuring the same construct
Discriminant: does not correlate with measures of unrelated constructs
How many factors in the Wechsler?
4 cognitive domains
What types of factor analysis are there?
Exploratory
Confirmatory
What are 4 other types of validity (other than construct validity)?
face
content
population
criterion: predictive and concurrent
What is predictive (criterion) validity?
How well a test predicts a future outcome, e.g. using a test to select the most suitable person (air force pilots cost a lot to train, so you want to predict whether they will stay)
What is varimax rotation?
an orthogonal rotation in factor analysis: a way to find factors that are completely unrelated to each other (but not that useful)
What type of analysis is usually used for confirmatory analysis?
SEM (structural equation modelling)
How is reliability assessed?
Correlation
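A minimal sketch of reliability as a correlation, here a test-retest check on hypothetical scores from the same five people tested twice. Pearson's r is written out by hand to keep the example self-contained.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariation / (spread_x * spread_y)

# Hypothetical scores for the same five people at two time points.
time_1 = [10, 12, 15, 18, 20]
time_2 = [11, 12, 14, 19, 21]

r = pearson_r(time_1, time_2)
print(round(r, 2))
```

The closer r is to 1, the more consistently the test orders people across the two occasions.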
What are the 4 types of reliability?
- Internal consistency (alpha: how well they hang together)
- Test-retest (acceptable = .8)
- Inter-rater
- Parallel forms
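A sketch of internal consistency, assuming "alpha" on the card means Cronbach's alpha: alpha = k / (k - 1) x (1 - sum of item variances / variance of total scores). The item responses are made up for illustration.

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of items, each a list of
    scores with one entry per respondent (same respondent order)."""
    k = len(items)
    sum_item_variances = sum(statistics.variance(item) for item in items)
    totals = [sum(responses) for responses in zip(*items)]
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)

# Hypothetical responses: 3 items answered by 5 people.
items = [
    [2, 3, 4, 5, 4],
    [3, 3, 4, 5, 5],
    [2, 4, 4, 4, 5],
]

alpha = cronbach_alpha(items)
print(round(alpha, 2))
```

Items that rise and fall together across respondents push alpha toward 1; unrelated items pull it toward 0.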
What is an example of something that could get good reliability but poor validity?
Using foot size to measure intelligence
• always get the same score
• not a good reflection of the construct IQ
What is an example of something that could get good validity but poor reliability?
Not really possible
a test needs to be reliable to be considered valid
What is an example of something that could get good validity but poor utility?
Something that is too expensive
What is an example of something that could get good utility but poor validity?
Myers-Briggs - questionable validity but useful for team building
SEM stands for
standard error of measurement
What is used because of the standard error of measurement?
Confidence intervals: e.g. 95% of the time, the true score will fall between x and y
How do you calculate confidence intervals?
CI = Score +-(1.96 x SEM)
The larger the standard error of measurement?
The lower the reliability of the test
How do you calculate the SEMeasurement?
SEM = SD x sqrt(1 - r)
If SD was 12 and r was .88 what would the SEMeasurement be?
4.16
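A short sketch that reproduces this worked example and then builds a 95% confidence interval from it. The observed score of 100 is a hypothetical value chosen for illustration, as are the function names.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD x sqrt(1 - r), where r is the reliability coefficient."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(score, sem, z_crit=1.96):
    """CI = score +- (z_crit x SEM); z_crit = 1.96 gives a 95% interval."""
    margin = z_crit * sem
    return score - margin, score + margin

sem = standard_error_of_measurement(sd=12, reliability=0.88)
low, high = confidence_interval(score=100, sem=sem)  # hypothetical score

print(round(sem, 2))
print(round(low, 2), round(high, 2))
```

With SD = 12 and r = .88 the SEM is about 4.16, so an observed score of 100 gives an interval of roughly 91.85 to 108.15. A more reliable test (larger r) gives a smaller SEM and a narrower interval.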
What statistics do you use in a meta-analysis?
effect size and SE
Why is height so easy to report?
- Don’t have to rely on self report
- Can observe directly
- Have agreed definitions
- Have agreed ways of measuring
What are some different types of distributions?
normal, bimodal, positively skewed, negatively skewed, J-shaped, rectangular