Lecture 3/4 Flashcards

1
Q

Casual def of validity?

A

Validity is the degree to which a psychological test measures what it
purports to measure

2
Q

APA def of validity?

A

The degree to which evidence and theory support the interpretation
of test scores for proposed uses

3
Q

What does validity depend on?

A

The interpretation of the test scores (for a proposed use), not the test itself

4
Q

What are five types of validity (no expl)?

A

internal structure, associations with other variables, test content (content validity), consequence of use, and response process

5
Q

What is content validity (test content)?

A

The degree to which the content of a measure truly reflects the full domain of the construct for which it is
being used, based on expert judgement

6
Q

What is construct underrepresentation, and what is it a part of?

A

The test lacks (or has too few of) the items that are needed to cover the construct;
part of content validity

7
Q

What is construct irrelevant content and what is it a part of?

A

The test contains content that is irrelevant to the construct; part of content validity

8
Q

What is face validity?

A

validity in the eyes of the test user (NOT by experts)

9
Q

What is response process?

A

For the test score to have a valid interpretation, respondents should use the intended response process to answer the items

10
Q

What are ways to find out if respondents used the desired response process?

A

Direct evidence = think-aloud protocols and interviews
Indirect evidence = process data (mouse movements, etc.), statistical analysis (item-total correlations, reliability, etc.), and experimentally manipulating response processes

11
Q

What are general threats to response processes?

A

Poorly designed items and respondent behavior (e.g., guessing, social desirability, etc.)

12
Q

What is the internal structure of a test?

A

The theoretical internal structure (uni- or multidimensional); essentially, does the empirically observed structure match the theoretical one?

13
Q

How is internal structure tested? What constitutes good evidence for the structure?

A

Factor analysis. A good structure means the number of factors found matches theory, further supported by the factor loadings; the factor correlations should also comply with theory (if theory says moderate correlation, this should show in the factor correlation matrix)

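A minimal sketch of the measurement model behind factor analysis (standard unidimensional factor-model notation, not taken from the slides):

$$x_{ij} = \lambda_j \xi_i + \varepsilon_{ij}$$

where \lambda_j is the loading of item j, \xi_i is person i's score on the latent factor, and \varepsilon_{ij} is the residual. In a good structure, items load strongly on the factor(s) theory assigns them to.
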
14
Q

What is associations with other variables (in the context of validity)?

A

Do the test scores relate to other tests and variables in a theoretically
meaningful way? For instance, height and weight are theoretically moderately correlated; this should also hold for the observed scores

Note that a larger than expected correlation is also not a good thing

15
Q

What is a nomological network?

A

Summarizes all theoretical relations between the construct of
interest and other constructs and variables (and possibly test items)

so once again, theory should match the observed relations for good validity

see slide 28, lect 3 for visualization

16
Q

What are discriminant and convergent evidence (within validity!)?

A

Discriminant = unrelated in theory (and the test scores discriminate as such); convergent = related in theory (and the test scores show this as such)

17
Q

What else can a nomological network include beside constructs?

A

observed variables (e.g., age, grades, etc.)

18
Q

What is criterion validity? What is it a part of?

A

Part of associations w/ other variables; it is the association btwn the construct and an observed variable (the criterion) it should theoretically be related to
(for instance, if age is related to critical thinking in theory, good criterion validity means it is also related in observation)

19
Q

There are two subtypes of criterion validity; what are they? Explain.

A

Concurrent = association btwn construct and an observed variable measured at the same time
> correlation btwn age and intelligence
Predictive = association btwn construct and an observed variable measured in the future
> primary school grades and salary in first job

20
Q

What can be two causes of finding a correlation btwn two self-report social skill tests (validity wise)?

A

trait variance (shared variance bc same trait) or method variance (shared variance bc same method)

21
Q

Within multitrait-multimethod matrices, what are convergent and discriminant validities, and what do you want them to look like?

A

Convergent should be high(est) and is due to measuring the same trait w/ different tests, aka due to trait variance (monotrait-heteromethod)

Discriminant validity should be lower than convergent. This can be either heterotrait-heteromethod or heterotrait-monomethod.
It should not be high because the traits differ (and perhaps even the methods), but it may still be somewhat high due to related traits or, in the heterotrait-monomethod case, shared method variance

22
Q

What is consequence of use (validity)?

A

Does the use of the test scores have its intended consequence?

23
Q

What are factors that contribute to consequence of use?

A

Evidence of intended effects (does it actually help, i.e., is it more efficient or better, etc.)

Evidence regarding unintended differential impact on groups (basically, does it unfairly disadvantage or benefit a certain group)

Evidence regarding unintended systematic effects (use of the test impacts organizational systems; e.g., teachers focus more on topics that come up in the test)

24
Q

What are the advantages of classical test theory?

A

It's intuitive and easy to apply (works in SPSS and is easy to do in Excel; no large sample sizes or many items needed)

25
Q

What are the disadvantages of CTT?

A

It focuses on the test and not the items; test properties depend on the population (reliability/difficulty), and person properties depend on the test (i.e., the sum score is higher if the test is easy)

26
Q

What is the Modern Test Theory (MTT)?

A

Specify a measurement model in which we mathematically link the item scores to the construct (also called: latent variable/trait or factor in MTT)

27
Q

What is an assumption of MTT?

A

Unidimensionality

28
Q

What is the item response theory (IRT)?

A

It is a psychometric approach emphasizing that an individual's response is influenced by qualities of both the individual and the item.

There are several forms of IRT, but all include some type of item characteristic curve (see slide 5, lecture 4) and an individual's trait level (theta)

29
Q

What is theta in the IRT?

A

It means an individual's trait level, or where on the latent scale a person is (e.g., very low stats skills).

0 usually denotes the average, and a trait level of 1.5 then means 1.5 SDs above the average

30
Q

What is the Rasch model?

A

A type of model within IRT; it adds the parameter beta, the item difficulty, on top of theta

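A common way to write the Rasch model (standard IRT notation; the slides may use a different parameterization):

$$P(X_{ij} = 1 \mid \theta_i, \beta_j) = \frac{\exp(\theta_i - \beta_j)}{1 + \exp(\theta_i - \beta_j)}$$

so the probability of a correct response grows with the gap between person i's trait level \theta_i and item j's difficulty \beta_j; when \theta_i = \beta_j the probability is exactly 0.5.
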
31
Q

What is beta in IRT? How is it related to theta?

A

The item difficulty, or how easy or difficult the item is to answer (this also applies to things like personality tests).

Theta and beta are interconnected, as the difficulty is expressed on the same scale as the trait level. An easy item can be answered by almost everyone (including people with a low trait level), but a difficult item cannot.

32
Q

What is the 2PL?

A

The 2PL is a type of model within IRT; it has the parameters theta, beta, and alpha

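In the same standard notation as above (a sketch, not necessarily the slides' exact notation), the 2PL multiplies the gap by the discrimination parameter:

$$P(X_{ij} = 1 \mid \theta_i) = \frac{\exp(\alpha_j(\theta_i - \beta_j))}{1 + \exp(\alpha_j(\theta_i - \beta_j))}$$

a larger \alpha_j gives a steeper item characteristic curve, i.e., sharper discrimination around \beta_j.
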
33
Q

What is alpha in IRT?

A

The alpha is the item discrimination; it denotes the slope of the item characteristic curve and can take on any value. It indicates the relevance of the item to the trait being measured by the test (p. 514); in other words, how well the item discriminates between high and low trait levels

This corresponds to the (corrected) item-total correlation in CTT

Note: the relevance component is important for discrimination because if an item contains content that is irrelevant to the latent skill, it will not be able to differentiate between high and low trait levels

34
Q

What is the 3PL?

A

A type of model within IRT; it has the parameters theta, beta, alpha, and c

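In the same hedged notation, the 3PL adds a lower asymptote for guessing:

$$P(X_{ij} = 1 \mid \theta_i) = c_j + (1 - c_j)\,\frac{\exp(\alpha_j(\theta_i - \beta_j))}{1 + \exp(\alpha_j(\theta_i - \beta_j))}$$

so even a person with a very low \theta_i still has probability c_j of a correct response.
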
35
Q

What is c in IRT?

A

The guessing parameter; it accounts for the fact that people without knowledge can guess an answer and still have a chance of getting the correct one

36
Q

What does a beta of 0 usually mean (IRT)?

A

If an item has a difficulty of 0, this usually means that a person with a trait level of 0 has a 50% chance of answering correctly

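A quick worked check using the Rasch formula above (my own arithmetic, not from the slides): plugging in \theta = 0 and \beta = 0 gives

$$P = \frac{\exp(0 - 0)}{1 + \exp(0 - 0)} = \frac{1}{1 + 1} = 0.5$$

(In the 3PL, the guessing floor shifts this to (1 + c)/2 at \theta = \beta.)
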
37
Q

Why can the 1PL (Rasch), 2PL and 3PL not be used for all tests?

A

They only apply to dichotomous items (aka two possible item scores)
Note: this is about scoring, not the number of response options; a multiple-choice question can have 7 options but is still dichotomous (because it's scored either right or wrong)

38
Q

What is a Polytomous item/model?

A

Where the item has more than two possible item scores (like a Likert scale or an essay item)

39
Q

What is the graded response model (GRM)?

A

It is a model within IRT that can be used when an item is polytomous. It includes the parameters difficulty and discrimination (and of course trait level lol)

see: p.531 for visualization

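A minimal sketch of the GRM's cumulative form (standard notation; check p. 531 for the book's exact parameterization):

$$P(X_{ij} \ge k \mid \theta_i) = \frac{\exp(\alpha_j(\theta_i - \beta_{jk}))}{1 + \exp(\alpha_j(\theta_i - \beta_{jk}))}$$

each threshold \beta_{jk} is the trait level at which a person has a 50% chance of responding in category k or higher, which is exactly how a GRM beta is read (see card 41).
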
40
Q

The GRM and the PL models of IRT parameterize their items in different ways (i.e., how many parameters for a 16-item test?)

A

PL models estimate only one difficulty parameter per item (so in a 16-item test, 16 difficulty parameters, not counting the other possible parameters)

The GRM estimates multiple difficulty parameters, depending on the number of response options per item. For an item with 5 response options, m - 1 = 4 difficulty parameters will be estimated (and one discrimination parameter)

41
Q

What does it mean within a GRM to have a beta1 of -1.78?

A

It means that a person with a trait level of -1.78 has a 50% chance of responding above the first response option

42
Q

What is the difference between the item characteristic curve and the item information function?

A

The ICC shows the characteristics of the item (the probability of a response as a function of trait level), whereas the IIF/IIC tells you at which trait levels the item gives the most information (aka discriminates best).

The ICC and the IIF look different (an S-shaped curve vs. a roughly bell-shaped one), but they are directly related: the steepest point of the ICC (at beta, with slope alpha) lies exactly at the peak of the bell

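For the 2PL, the textbook expression for item information ties the two curves together (standard IRT result, not from the slides):

$$I_j(\theta) = \alpha_j^2\, P_j(\theta)\,(1 - P_j(\theta))$$

which is maximal where P_j(\theta) = 0.5, i.e., at \theta = \beta_j: the steepest point of the ICC is the peak of the IIF.
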
43
Q

In practice we often look at the scale information function/curve, not the item one. Why, and what is the difference?

A

We are usually interested in the test as a whole, not a part of it, so the SIF/SIC shows the total amount of information for the test

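The scale information is simply the sum of the item information functions (standard IRT result):

$$I_{\text{scale}}(\theta) = \sum_{j=1}^{J} I_j(\theta)$$

so adding items that are informative around a given trait level raises the test's precision there.
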
44
Q

What is the difference between a norm-referenced and a criterion-referenced test on the scale information function?

A

Norm-referenced looks like a really wide, flattened distribution, where the information is high for every trait level; criterion-referenced looks like a normal distribution that provides the most information around the cut-off point

see slide 28, lecture 4 for visualization

45
Q

What is Differential Item Functioning (DIF)?

A

An item is labeled as having DIF when people with the same latent ability but from different groups have an unequal probability of giving a certain response (e.g., answering correctly)

Part of IRT

46
Q

What are the two causes of DIF?

A

The item difficulty and/or discrimination depend on the group (so you would, for instance, need specific cultural knowledge to answer the item)

47
Q

What is Computerized Adaptive Testing (CAT)?

A

It is IRT administered by a computer. The computer has a large item bank, starts you off with items of average difficulty, and adjusts after each correct or incorrect answer (aka raises or lowers the trait estimate and gives questions that fit accordingly). This continues until the estimate of the latent trait no longer changes

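A minimal sketch of that loop in Python, with a hypothetical 2PL item bank and a deliberately crude step-halving update for theta (real CATs use maximum-likelihood or Bayesian estimation):

import math
import random

# Hypothetical item bank: (difficulty beta, discrimination alpha) per item.
bank = [(-2.0, 1.2), (-1.0, 0.9), (0.0, 1.5), (0.5, 1.1), (1.0, 1.3), (2.0, 0.8)]

def p_correct(theta, beta, alpha):
    # 2PL probability of a correct response.
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))

def item_information(theta, beta, alpha):
    # 2PL item information: alpha^2 * P * (1 - P).
    p = p_correct(theta, beta, alpha)
    return alpha ** 2 * p * (1.0 - p)

def run_cat(answer_item, max_items=20, tol=0.01):
    theta, step = 0.0, 1.0                 # start at the average trait level
    unused = list(range(len(bank)))
    for _ in range(max_items):
        if not unused or step < tol:       # bank exhausted or estimate stabilized
            break
        # Select the unused item with maximal information at the current theta.
        j = max(unused, key=lambda i: item_information(theta, *bank[i]))
        unused.remove(j)
        theta += step if answer_item(j) else -step  # raise/lower the estimate
        step /= 2                          # smaller corrections as we learn more
    return theta

# Example: simulate a respondent whose true trait level is 0.8.
random.seed(1)
true_theta = 0.8
estimate = run_cat(lambda j: random.random() < p_correct(true_theta, *bank[j]))
print(f"estimated theta: {estimate:.2f}")
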
48
Q

What are advantages and disadvantages of IRT?

A

Advantages: population- and test-independent, focuses on the items

Disadvantages: statistically complex and needs large samples