Week 3 - Psychometrics I Flashcards

1
Q

What is standardisation?

A

Ensuring different measurements are in equivalent units before comparing them. When we standardise a raw score, we do it relative to a sample of people (see norms).

2
Q

What are norms?

A

The population to which a standardised score is compared. Who is in the standardisation sample depends on the context.

Usually, it would be a representative sample of people from a specific population or age group.

3
Q

What should the standardisation sample be like?

A

Stable and representative.

4
Q

How do you establish stability in a standardisation sample?

A

Stability is established by using a large sample.

The mean and standard deviation of a large sample are more accurate and stable, and therefore a better representation of the population mean and SD.

5
Q

How do you establish representativeness in a standardisation sample?

A

Ensuring the sample ‘looks like’ the particular population in question.

Examples include:

  • The same ratio of males to females as population
  • The same age distribution
  • The same socio-economic distribution
  • The same educational distribution
  • The same pattern of geographic origins as the population, etc.
6
Q

How can you try to get a representative sample?

A

Stratified cluster sampling, i.e. deliberately recruiting to get particular ratios of subgroups.

If this fails (e.g. the sample is 40% female but the population is 50% female) you can also try weighting, e.g. counting each female as 1.25 people and each male as 0.83 of a person.
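The weighting arithmetic can be sketched as follows (a minimal illustration using the card's own numbers; the function name is my own):

```python
# Post-stratification weight for a subgroup = population share / sample share.
def weight(pop_share, sample_share):
    return pop_share / sample_share

female_w = weight(0.50, 0.40)  # 1.25 -> each female counts as 1.25 people
male_w = weight(0.50, 0.60)    # ~0.83 -> each male counts as ~0.83 of a person

# After weighting, each subgroup's effective share matches the population (0.50).
weighted_female_share = 0.40 * female_w
weighted_male_share = 0.60 * male_w
```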

7
Q

Perks of a normally distributed variable.

A

From just the mean and standard deviation, we can tell how someone's score compares with everyone else's.

We can do more sensitive (parametric) statistical tests on it.

8
Q

What statistical inferences can we make if a variable is normally distributed?

A
  1. Mean = median = mode, therefore 50% of people score below (and 50% above) the mean.
  2. ~68% of scores fall within ±1 SD of the mean.
  3. ~95% of scores fall within ±2 SD of the mean.
  4. The tails of the distribution lie roughly 2 to 3 SD from the mean (about 99.7% of scores fall within ±3 SD).
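A quick standard-library check of those proportions, using the identity P(|Z| < k) = erf(k/√2) (the function name is my own):

```python
import math

def prop_within(k_sd):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return math.erf(k_sd / math.sqrt(2))

# prop_within(1) ~ 0.68, prop_within(2) ~ 0.95, prop_within(3) ~ 0.997
```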
9
Q

All about Z scores…

A

Mean of 0 and SD of 1.

To calculate a z score:

  1. Raw score minus average
  2. Divided by standard deviation

Think of it as establishing the absolute amount that a particular person deviates (1) and then accounting for how extreme that is given the usual amount people deviate (2).

You need to memorise this for the exam!!!
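The two steps can be sketched in code (a minimal illustration; the function name is my own):

```python
def z_score(raw, mean, sd):
    # (1) how far this person deviates from the mean,
    # (2) scaled by the usual amount people deviate (the SD)
    return (raw - mean) / sd

# e.g. a raw score of 115 on a test with mean 100 and SD 15 gives z = 1.0
```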

10
Q

All about T scores…

A

Mean of 50 and SD of 10.

They are used by the MMPI, and by people who dislike decimals.

To calculate a T score:

  1. First calculate the Z score
  2. Then multiply by 10 (i.e. the T distribution’s SD)
  3. Then add 50 (i.e. the T distribution’s mean)

This is a linear transformation.

11
Q

All about IQ scores…

A

Mean of 100 and SD of 15.

They are used by some intelligence tests.

To calculate an IQ score:

  1. First calculate the Z score
  2. Then multiply by 15 (i.e. the IQ distribution’s SD)
  3. Then add 100 (i.e. the IQ distribution’s mean)

This is a linear transformation.

12
Q

How do you make your own standard scale?

A
  • Choose a mean (any number you like).
  • Choose a standard deviation (any number you like).
  • Take someone’s raw score on something, together with the mean and SD of some reference sample of that thing in raw score units.
  • Calculate the z score.
  • Multiply the z score by the SD of your scale.
  • Add the mean of your scale.
  • The number you’re left with is the person’s score on your scale.
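The recipe above, which also covers the T and IQ conversions from the previous two cards, can be sketched as a single linear transformation (function names are my own):

```python
def z_score(raw, mean, sd):
    return (raw - mean) / sd

def to_scale(raw, ref_mean, ref_sd, scale_mean, scale_sd):
    """Multiply z by the new scale's SD, then add its mean (a linear transformation)."""
    return z_score(raw, ref_mean, ref_sd) * scale_sd + scale_mean

# T (mean 50, SD 10) and IQ (mean 100, SD 15) are just special cases.
# A raw score of 30 on a test with mean 25, SD 5 (z = 1):
t = to_scale(30, 25, 5, 50, 10)    # T = 60
iq = to_scale(30, 25, 5, 100, 15)  # IQ = 115
```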
13
Q

What is a percentile rank?

A

The percentage of people in the norm group falling BELOW a certain raw score.

To determine a percentile rank:

  • Calculate a z score
  • Look up a table of the standard normal distribution

This is NOT a linear transformation.
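The two steps can be sketched using the standard library's error function, which gives the standard normal CDF (the function name is my own):

```python
import math

def percentile_rank(raw, mean, sd):
    """% of the norm group falling below the raw score:
    the standard-normal CDF evaluated at the z score."""
    z = (raw - mean) / sd
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

# percentile_rank(115, 100, 15) ~ 84 (z = 1), percentile_rank(100, 100, 15) = 50
```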

14
Q

What are the disadvantages of percentile ranks?

A

– Confusion between percentile rank and percentage correct (e.g., in an exam).

– Because of the normal distribution, percentile ranks close to 50 include a smaller range of raw scores (they are “bunched up”), whereas percentile ranks close to 0 or 100 include a much wider range of raw scores (they are more spread out).

That is, a percentile rank means different things depending on where you are on the scale.

15
Q

What are Stanines?

A
  • Often used in school tests
  • It has 9 divisions
  • Each division is .5 standard deviation wide
  • The middle band (5) is from -.25 to +.25 standard deviations
  • 20% of people are in this middle band (moving outward, the bands contain 4, 7, 12, 17, 20, 17, 12, 7, and 4% of people, from band 1 to band 9)

Note: They don’t state this but this is not a linear transformation.
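A hedged sketch of the z-to-stanine mapping described above (the function name and the treatment of the open-ended tails and exact band boundaries are my own assumptions):

```python
def stanine(z):
    """Band 5 spans -0.25 to +0.25 SDs; each inner band is 0.5 SD wide."""
    band = round(z * 2) + 5   # 0.5-SD-wide bands centred on z = 0
    return min(9, max(1, band))  # clamp the open-ended tails to 1 and 9
```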

16
Q

What is reliability?

A

The extent to which measurements are consistent or repeatable.

  • Lower measurement error = higher reliability.
  • In Classical Test Theory, reliability is true variance (hypothetical variation of scores in a sample if no measurement error) divided by total variance (actual variation in data - including error)
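The Classical Test Theory ratio can be illustrated in a couple of lines (the function name is my own):

```python
def reliability(true_var, error_var):
    """CTT: true variance divided by total (true + error) variance."""
    return true_var / (true_var + error_var)

# No measurement error -> reliability 1.0;
# error variance as large as true variance -> reliability 0.5
```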
17
Q

What are the four ways we can estimate the reliability of a test?

A

By calculating:

  1. Internal consistency (Cronbach’s, KR-20)
  2. Inter-rater reliability (corr between raters)
  3. Alternate forms reliability (corr between forms)
  4. Test-retest reliability (corr between times)
18
Q

What is test-retest reliability?

A

Test-retest reliability is the correlation between scores on the same test by the same people done at two different times.

19
Q

What is alternate/parallel forms reliability?

A

Parallel/Alternate Form reliability is the correlation between scores on two versions of the same test by the same people done at the same time.

Note: Parallel forms have stricter criteria (same SD) than alternate forms but for the sake of establishing reliability, they are used the same way.

20
Q

What is inter-rater reliability?

A

Inter-rater reliability is the correlation between scores on the same test by the same people provided by two different examiners.

21
Q

How do you calculate internal consistency?

A

If there are more than two outcomes (e.g. likert scales), use Cronbach’s alpha.

If there are only two outcomes (e.g. yes/no), use KR-20.

Note: When there is a correct answer, you always use KR-20. Even if there are four options (A, B, C, D), only one option is correct. Therefore there are only two outcomes (false, false, true, false).
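A minimal sketch of Cronbach's alpha, which with 0/1 (right/wrong) items gives the same result as KR-20 (the helper names are my own):

```python
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(items):
    """items: one list of scores per item (columns = people).
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).
    With dichotomous 0/1 items this reduces to KR-20."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(i) for i in items) / variance(totals))
```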

22
Q

What is validity?

A

Validity is the extent to which the measure actually measures what it’s supposed to measure.

23
Q

Which are the only two types of validity that are opinion-based i.e. not based in empirical evidence?

A
  • Face validity (does it appear valid?)
  • Content validity (items in test appear to cover whole domain?)
24
Q

What is face validity?

A
  • Face validity is how valid a test appears to be (usually from the perspective of the person taking the test).
  • Tests may have good face validity without any actual validity.
  • Tests may have poor face validity while still having good actual validity.
25
Q

Despite its lack of empirical basis, why can face validity be useful?

A

So that laypeople:

– Accept or choose the test because they think it’s appropriate.

– Take the test seriously.

– Don’t leave missing values, etc.

26
Q

What is content validity?

A

How adequately a test samples behaviour representative of the universe of behaviour it was designed to sample.

Often based on the opinions of experts in whatever you’re measuring - therefore on one level it’s nothing more than face validity.

Think of exams having an equal proportion of content from each lecture OR Mark’s hazard test representing typical traffic hazards encountered by Brisbane drivers.

27
Q

What is criterion validity?

A

A criterion is the standard against which the test is evaluated (e.g., actual driving speed was one of the criteria used to validate the speed questionnaire).

The criterion itself needs to be reliable, valid, relevant, and not subject to criterion contamination.

The correlation between the participants’ test scores and their actual job performance ratings is an example of the criterion-related validity coefficient.

28
Q

What is the method of contrasted groups?

A
  • One approach to criterion-related validity is to determine whether test scores of groups of people vary as expected.
  • Clinical group vs. non-clinical controls (e.g., dental phobia patients vs. non-dental phobic controls).
29
Q

What is criterion contamination?

A

The criterion used to assess the validity of our test is pre-determined by the test.

Our so-called validity test is circular – all we can really claim is that people who score high on our test will score high on our test.

e.g. using the method of contrasted groups to see if our test can tell apart schizophrenic and non-schizophrenic patients…who were diagnosed using the same test (i.e. silly).

30
Q

What are convergent and discriminant validity?

A
  • Convergent validity: Do test scores correlate with other measures of the same (or a similar) thing?
  • Discriminant/Divergent validity: Do test scores NOT correlate very highly with measures that you’d expect them NOT to correlate with?

i.e. essentially the same procedure, but you are looking for either the presence or the absence of a correlation, depending on what the other measure is supposed to capture.

31
Q

What is construct validity?

A
  • Construct validity is how well the scores on your test reflect the construct (i.e., the trait or characteristic) that your test is supposed to be measuring.
  • It’s been argued that construct validity is essentially an umbrella term for all other types of validity evidence (except face validity).
32
Q

What is the Standard Error of Measurement (SEM)?

A

We assume the client’s true score (their score if the test was perfectly reliable) is at the middle of the distribution.

If your client has only taken the test once, then our best guess of where the middle of this distribution is would be the actual score they obtained on the test.

On that assumption, we can estimate the likely margin of error (confidence interval) in someone’s score, by adding and subtracting the SEM from their actual score.

It is calculated using a special formula but we don’t need to do this calculation.
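For reference, the standard CTT formula (not required by the course) is SEM = SD × √(1 − reliability); a hedged sketch with the confidence-interval step from the card (function names are my own):

```python
import math

def sem(sd, reliability):
    # Standard CTT formula: SD * sqrt(1 - reliability)
    return sd * math.sqrt(1 - reliability)

def score_interval(observed, sd, reliability, k=1.96):
    """Approximate 95% confidence interval: observed score +/- 1.96 * SEM."""
    margin = k * sem(sd, reliability)
    return observed - margin, observed + margin

# e.g. an IQ test (SD 15) with reliability .91 has SEM = 4.5
```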

33
Q

What is Standard error of the difference (SEdiff)?

A

A formula (which we don’t need to memorise) used to work out whether someone’s score is significantly different from:

  1. Their own score on the same test at a different time.
  2. Their score on another test of the same thing.
  3. Someone else’s score on the same test.
  4. Someone else’s score on another test.
34
Q

What is the reliable change index?

A
  • In clinical contexts, people often use the Reliable Change Index in place of the Standard Error of the Difference.
  • It’s just a slightly alternative way of doing the same job, where you work out the difference between two scores (e.g., how much someone changed due to some intervention) and divide by the Standard Error of the Difference.
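Both quantities can be sketched together (function names are my own; the 1.96 cut-off is the usual convention, and SEdiff is computed from the two scores' SEMs):

```python
import math

def se_diff(sem_1, sem_2):
    """Standard Error of the Difference: sqrt(SEM1^2 + SEM2^2)."""
    return math.sqrt(sem_1 ** 2 + sem_2 ** 2)

def reliable_change_index(score_1, score_2, sem_1, sem_2):
    """Change between two scores divided by the SEdiff;
    |RCI| > 1.96 is conventionally taken as reliable change."""
    return (score_2 - score_1) / se_diff(sem_1, sem_2)
```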
35
Q

What do you do if you expect scores to increase on repeated administrations of a test? (new non-3020 content)

A
  • One way to address this is to re-standardise people’s scores for their 2nd attempt against a sample of 2nd attempt scores (i.e. you correct for the expected improvement).
  • Another method is to add/subtract a constant to an individual’s second score that reflects the average change in a standardisation sample.
  • Remember that any changes need to be significantly different to be considered meaningful - you’ll always get some score fluctuation due to measurement error.
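Both correction methods above can be sketched as follows (function names and example numbers are my own illustrations):

```python
def restandardised_z(second_score, retest_norm_mean, retest_norm_sd):
    """z score of a 2nd attempt against norms collected on 2nd attempts,
    which builds the expected practice gain into the norms."""
    return (second_score - retest_norm_mean) / retest_norm_sd

def constant_corrected(second_score, mean_practice_gain):
    """Alternative: subtract the average gain seen in the standardisation sample."""
    return second_score - mean_practice_gain

# e.g. a 2nd-attempt score of 110 against 2nd-attempt norms (mean 105, SD 10),
# or corrected by an average practice gain of 5 points
```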