Psychometrics; Lectures 9 & 11 (no Lecture 10 due to bank holiday); Lab 5 Flashcards

1
Q

What is a psychometric test?

A

A psychometric test is a standardised procedure for sampling behaviour and describing it using scores or categories

2
Q

Most tests are ‘norm-referenced’. What does this mean?

A

They describe the behaviour in terms of norms: test results gathered from a large group of subjects.

3
Q

While most tests are norm-referenced, some are ‘criterion-referenced’. What does this mean?

A

The objective is to see whether the subject can attain some pre-specified criterion.

4
Q

What are 5 things one should consider in writing a test?

A
  1. Ensure that all aspects of the construct are dealt with (e.g., for anxiety, cover every facet of it)
  2. The test needs to be long enough to be reliable (e.g., start with 30 questions and reduce to 20)
  3. It should assess only one trait
  4. It should be culturally neutral
  5. Items should not be the same item rephrased (mentioned during the factor analysis material)
5
Q

In terms of establishing item suitability, there should not be too many items which are…?

A

There should not be too many items which are either very easy or very hard.

e.g., 10% of items with scores < .2 or > .8 is questionable

6
Q

In terms of establishing item suitability, items should have an acceptable standard deviation. What does this mean?

A

If the SD is too low, the item is not tapping into individual differences.
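
A quick way to screen items on both counts (extreme mean scores and low spread) is to inspect each item’s mean and standard deviation. This is a minimal sketch assuming 0/1-scored items in a pandas DataFrame; the data are made up, and the .2/.8 and .1 cutoffs are illustrative rather than values from the lecture.

```python
import pandas as pd

# Hypothetical 0/1-scored item responses: rows = respondents, columns = items
df = pd.DataFrame({
    "item1": [1, 0, 1, 1, 0, 1],
    "item2": [1, 1, 1, 1, 1, 1],   # everyone answers the same way: too easy
    "item3": [0, 1, 0, 1, 1, 0],
})

means = df.mean()
sds = df.std()

# Items that are very easy or very hard (illustrative .2/.8 cutoffs)
print(means[(means < .2) | (means > .8)])
# Items whose SD is too low to tap individual differences (illustrative cutoff)
print(sds[sds < .1])
```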

7
Q

In terms of establishing item suitability; if there are different constructs then…

A

… it is important that an equal number of items refers to each construct.

8
Q

What is criterion keying and how is it used to establish item suitability?

A

Criterion keying - items are chosen based on their ability to differentiate the population in general from a specific group (e.g. surgeons, pilots).

Criterion keying is atheoretical: items are kept purely because they discriminate, not because theory says they should.

The criterion groups must be well defined.
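
As a sketch of the idea (not a procedure given in the lecture), criterion keying can be illustrated by comparing item endorsement rates between a well-defined criterion group and the general population, keeping the items that separate the groups most strongly. The data, group labels, and the 0.2 cutoff below are all hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
items = [f"item{i}" for i in range(1, 11)]

# Hypothetical binary endorsement data (1 = endorses the item)
general = pd.DataFrame(rng.binomial(1, 0.5, size=(500, 10)), columns=items)
pilots = pd.DataFrame(rng.binomial(1, 0.5, size=(80, 10)), columns=items)
pilots["item3"] = rng.binomial(1, 0.9, size=80)  # pilots endorse item3 far more

# Keep items whose endorsement rates differ most between the groups;
# the selection is purely empirical (atheoretical)
diff = (pilots.mean() - general.mean()).abs().sort_values(ascending=False)
print(diff[diff > 0.2].index.tolist())
```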

9
Q

Why should you interpret measures that have been established using criterion keying liberally?

A

Because there will be overlap in the response distributions of the criterion group and the general population.

10
Q

How is factor analysis used to establish item suitability?

A

Based on FA, items that have a low loading on the intended factor (commonly below about .3) are dropped.

11
Q

Classical item analysis is used to establish item suitability and improve reliability, how does this work?

A

Based on classical item analysis, the correlation of an item’s score with the score on the whole test (excluding that item) is calculated.

Removing items with low correlations improves reliability, although, because reliability also depends on the number of items, there is a balance to strike.

Each time an item is removed, the correlation of each remaining item with the total score must be recalculated, since it changes as items are removed.
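
The corrected item-total correlation described on this card can be computed directly. A minimal sketch with made-up Likert responses; each item is correlated with the sum of the other items, excluding itself.

```python
import pandas as pd

# Hypothetical Likert responses: rows = respondents, columns = scale items
df = pd.DataFrame({
    "q1": [4, 5, 3, 4, 2, 5, 4],
    "q2": [3, 5, 2, 4, 1, 5, 4],
    "q3": [2, 1, 4, 3, 5, 1, 2],   # likely a problem item
    "q4": [4, 4, 3, 5, 2, 4, 5],
})

# Corrected item-total correlation: each item vs the sum of the *other* items
for col in df.columns:
    rest = df.drop(columns=col).sum(axis=1)
    print(col, round(df[col].corr(rest), 2))
```

After dropping a low-correlating item such as q3, the loop would be re-run on the remaining items, as the card notes.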

12
Q

How many psychological constructs should each scale measure?

A

One

13
Q

What does ‘measurement error’ mean?

A

That for any one item, the psychological construct accounts for only a low percentage of the variation in responses; other factors (e.g., age, religious beliefs, sociability, peer-group pressure) cause most of the variation.

14
Q

How do you get rid of random variation (e.g. age, religious beliefs) when building a scale?

A

Use several items; this variation should then cancel itself out, such that the measured variance is due to the underlying construct.
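
The cancelling-out effect can be seen in a small simulation. A minimal sketch with made-up numbers: each item is the latent trait plus independent noise, and the mean of 20 such items tracks the trait far better than any single item does.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_items = 1000, 20

trait = rng.normal(size=n_people)              # latent construct
noise = rng.normal(size=(n_people, n_items))   # item-specific random error
items = trait[:, None] + noise                 # each item = trait + noise

print(round(np.corrcoef(trait, items[:, 0])[0, 1], 2))        # single item
print(round(np.corrcoef(trait, items.mean(axis=1))[0, 1], 2)) # 20-item scale
```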

15
Q

One important measure of reliability for psychometric instruments is that of temporal stability - what is this? What is it often referred to as?

A

Temporal stability is often referred to as ‘test-retest reliability’.

Temporal stability involves administering the same test to the same people across a span of time; it assesses whether the instrument produces the same outcomes over time.

e.g. If a respondent scores strongly as an extrovert on a particular day and then, 2 weeks later, scores strongly as an introvert, we may begin to question whether the instrument is measuring anything useful.
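
In practice, test-retest reliability is usually quantified as the correlation between the two administrations. A minimal sketch with hypothetical extraversion scores taken two weeks apart:

```python
import pandas as pd

# Hypothetical scores from two administrations of the same test
scores = pd.DataFrame({
    "time1": [30, 25, 40, 22, 35, 28, 33],
    "time2": [29, 27, 38, 20, 36, 26, 34],
})

# A high Pearson correlation suggests good temporal stability
print(round(scores["time1"].corr(scores["time2"]), 2))
```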

16
Q

What are two ways of measuring the extent to which a scale measures one construct only? (a form of reliability testing)

A
  1. Split-half reliability
  2. Cronbach’s Alpha

17
Q

What is split half reliability?

A

Split half testing measures internal consistency reliability.

Steps:

  1. Administer the test to a large group of students
  2. Randomly divide the test questions into two parts. For example, separate even questions from odd questions.
  3. Score each half of the test for each student
  4. Find the correlation coefficient (Pearson’s) for the two halves - a reliable test will have a high correlation (see the sketch below).
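These steps translate directly into code. A minimal sketch with made-up 0/1-scored data; the Spearman-Brown step-up at the end is a common addition (it corrects for each half being only half the test’s length) and is not part of the card’s four steps.

```python
import pandas as pd

# Hypothetical 0/1-scored test data: rows = students, columns = questions
df = pd.DataFrame(
    [[1, 0, 1, 1, 0, 1],
     [1, 1, 1, 1, 1, 0],
     [0, 0, 1, 0, 0, 1],
     [1, 1, 0, 1, 1, 1],
     [0, 1, 0, 0, 1, 0]],
    columns=[f"q{i}" for i in range(1, 7)],
)

# Split into odd- and even-numbered questions and score each half
odd = df[["q1", "q3", "q5"]].sum(axis=1)
even = df[["q2", "q4", "q6"]].sum(axis=1)

r = odd.corr(even)               # Pearson correlation between the halves
stepped_up = 2 * r / (1 + r)     # Spearman-Brown full-length estimate
print(round(r, 2), round(stepped_up, 2))
```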
18
Q

Split-half reliability and Cronbach’s Alpha are measures of internal consistency, what does this mean?

A

Internal consistency reliability gauges the extent to which the items of a test or survey consistently measure the same underlying construct.

19
Q

What is Cronbach’s Alpha?

A

Cronbach’s Alpha measures internal consistency reliability (how consistently a set of items measures the same construct) for multi-item Likert scales.

20
Q

What are some problems with Cronbach’s Alpha?

A
  1. It is influenced by the average correlation between the items and the number of items in the test
  2. It can be artificially boosted by asking the same question twice
  3. Test should not be used if alpha is below .7
21
Q

What are 3 types of validity?

A
  1. Logical
  2. Statistical
  3. Construct
22
Q

What are two types of logical validity?

A
  1. Face
  2. Content

23
Q

What are two types of statistical validity?

A
  1. Concurrent
  2. Predictive

24
Q

When first trying to decide upon a research methodology and a suitable instrument, what are 3 very practical issues that need to be considered?

A
  1. Is the research question well established or more exploratory in nature?
  2. Should the questions conform to an open or closed format?
  3. Are the instruments of interest licensed or freely available?
25
Q

With regards to psychometrics, what are the practical implications of the research question being well established?

A

Concepts will mostly already be defined, meaning you can point to an existing psychometric measure.

Questions can build on previous research, meaning they can be more precise than they would be in exploratory research.

26
Q

With regards to psychometrics, what are the practical implications of the research question being more exploratory in nature?

A

If the latent constructs you are measuring are less well-studied you may need to adapt measures, or use multiple measures to try to reach the construct you are looking for.

27
Q

With regards to psychometrics, what are the practical implications of using questions that are open ended?

A

It is probably more appropriate to use an interview style, giving participants time and space to speak at length.

The design is likely to be qualitative (e.g., content analysis).

If factor analysis is used, it would be exploratory (EFA).

28
Q

With regards to psychometrics, what are the practical implications of using questions that are closed format?

A

Closed questions are more quantitative; this numerical data is likely to be analysed through correlations or EFA.

29
Q

With regards to psychometrics, what are the practical implications of using questionnaires that are licensed as opposed to freely available?

A

If something is licensed it can cost a lot of money; this can be prohibitive (e.g., for students).

30
Q

When we have decided what kind of scale we want to use, how do we choose from a range of options?

A

We rate potential scales based on their psychometric properties with respect to validity and reliability.

31
Q

What are the 4 different types of validity?

A
  1. Face validity
  2. Content validity
  3. Construct validity
  4. Criterion validity
32
Q

What is face validity?

A

Considering what the test is supposed to be measuring, do the questions seem to be addressing the issues that they should be? (a somewhat superficial form of validity, but usually quite obvious)

33
Q

What is content validity?

A

Does the scale measure the full breadth of the concept?

34
Q

What is construct validity?

A

The extent to which the test measures the theoretical construct that it is supposed to measure.

35
Q

What is criterion validity? What are two types of criterion validity?

A

A more quantitative form of validity - it compares the scale against a direct measure of the construct.

Can be divided into:

  1. Convergent
  2. Divergent
36
Q

Convergent validity is a sub-type of criterion validity (comparing the scale against a direct measure of the concept) - what question does it attempt to answer?

A

Does the scale give similar results to those of other measures of the same concept?

37
Q

Divergent validity is a sub-type of criterion validity (comparing the scale against a direct measure of the concept) - what question does it attempt to answer?

A

Does the scale give results different from a scale that is supposed to measure a different concept?

38
Q

Why can good face validity be conceived of as both a good and bad thing?

A

It is good because it likely won’t be confusing to participants - i.e. they will understand what is being asked of them.

It is bad because it makes it potentially easier to manipulate by providing socially desirable answers.

39
Q

How could you measure content validity for the GHQ-12?

A

You could have it reviewed by mental health experts and compared to the DSM to see if it measures all the symptoms of dysphoria.

40
Q

How could you measure the construct validity of the GHQ-12?

A

A commonly used method to measure construct validity is confirmatory factor analysis (Atkinson et al., 2011). You can use CFA to see whether the measure actually just measures dysphoria as it is supposed to, or whether it measures any other latent variables.
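
A hedged sketch of what that might look like in code, using the third-party semopy package (the choice of tool is an assumption; the lecture does not name one, and R’s lavaan or SPSS AMOS are common alternatives). The file name ghq12_responses.csv and the column names ghq1…ghq12 are hypothetical.

```python
import pandas as pd
import semopy  # third-party SEM package; choice of tool is an assumption

# Hypothetical GHQ-12 responses, one column per item (ghq1..ghq12)
df = pd.read_csv("ghq12_responses.csv")

# One-factor CFA: all 12 items load on a single 'dysphoria' factor
desc = "dysphoria =~ " + " + ".join(f"ghq{i}" for i in range(1, 13))

model = semopy.Model(desc)
model.fit(df)
print(model.inspect())           # factor loadings
print(semopy.calc_stats(model))  # fit indices such as CFI and RMSEA
```

If the one-factor model fits poorly, that would suggest the GHQ-12 is measuring more than just dysphoria.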

41
Q

How could you measure the convergent criterion validity of the GHQ-12?

A

You could compare results to a similar, validated measure of depression, e.g. the Beck Depression Inventory (BDI).

People who score high in depression on the GHQ should also score high in depression on the BDI.

42
Q

How could you measure the divergent criterion validity of the GHQ-12?

A

You could compare results to a different, validated measure of a similar but distinct construct, e.g. anxiety via the Beck Anxiety Inventory (BAI).

Assuming the individual does not have comorbid anxiety (something that should be assessed), someone who scores high on the GHQ should not have a similarly high score on the BAI.
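
Both checks (this card and the previous one) reduce to correlating total scores across instruments. A minimal sketch with fabricated totals for the same respondents; a high GHQ-BDI correlation and a low GHQ-BAI correlation would support convergent and divergent validity respectively.

```python
import pandas as pd

# Hypothetical total scores for the same respondents on three measures
scores = pd.DataFrame({
    "ghq12": [15, 22, 8, 30, 12, 25],
    "bdi":   [14, 20, 9, 28, 10, 26],  # similar construct: expect high r
    "bai":   [5, 30, 12, 8, 25, 10],   # different construct: expect low r
})

print(round(scores["ghq12"].corr(scores["bdi"]), 2))  # convergent
print(round(scores["ghq12"].corr(scores["bai"]), 2))  # divergent
```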

43
Q

What is the most basic form of reliability?

A

Test-retest reliability - it involves determining whether, if you repeat a test, you get similar results across both sets of trials.

44
Q

What is the internal consistency of a scale?

A

It measures whether several items that propose to measure the same general construct produce similar scores.

45
Q

Using reliability analysis you can do 3 things, what are they?

A
  1. Determine the extent to which items in your questionnaire are related to each other
  2. Get an overall measure of the internal consistency of the scale as a whole
  3. Identify problem items that should be excluded from the scale
46
Q

What 5 models of reliability are available in SPSS?

A
  1. Alpha (Cronbach)
  2. Split-half
  3. Guttman
  4. Parallel
  5. Strict parallel
47
Q

What is Cronbach’s Alpha? How does it work?

A

It is a model of internal consistency, based on the correlation between the sums of items taken from the two halves of the questionnaire. The Cronbach method produces all possible split-half combinations, finds the correlation for each, and quotes the average of these correlations.
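
Outside SPSS, alpha is usually computed from the equivalent variance formula, alpha = k/(k-1) x (1 - sum of item variances / variance of the total score). A minimal sketch with made-up Likert data:

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total)."""
    k = items.shape[1]
    item_vars = items.var(ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical Likert-scale responses
df = pd.DataFrame({
    "q1": [4, 5, 3, 4, 2, 5],
    "q2": [3, 5, 2, 4, 1, 5],
    "q3": [4, 4, 3, 5, 2, 4],
    "q4": [5, 5, 2, 4, 2, 5],
})
print(round(cronbach_alpha(df), 2))  # compare against the .7 cutoff
```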

48
Q

Why is it important to compute a reliability measure every time a questionnaire is used?

A

Because reliability is sample dependent (for example, hungover university students might not care as much about giving accurate answers as someone who is paying a mental health professional to deliver the same test).

49
Q

How do you compute Cronbach’s Alpha (1951) in SPSS?

A

Analyze –> Scale –> Reliability Analysis –> move all items of the scale to be analysed into the ‘item’ dialogue box –> OK

50
Q

What is the acceptable score for Cronbach’s Alpha?

A

> 0.7

51
Q

What is reliability?

A

The reliability of any given measurement refers to the extent to which it is a consistent measure of a concept.