3rd exam Flashcards

1
Q

What is the formula for classical testing theory?

A

X = T + E, where X = observed score, T = true score, and E = error (systematic and random)

2
Q

What creates a problem for classical testing theory?

A

Guessing on an achievement test: a correct guess inflates the observed score, so it no longer reflects the true score

3
Q

Do we know when people guess?

A

We never know when someone is guessing

4
Q

Abbott’s formula

A

Corrects the observed score for blind guessing, so you can estimate the true score

5
Q

If you are guessing wrong what happens within classical testing theory?

A

the observed score is not reflective of their true score

6
Q

Abbott’s actual math formula

A

Corrected score = R - W / (K - 1), where R = number of correct responses, W = number of wrong responses, and K = number of alternatives per item
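The formula as a small Python function. The rationale: on items with K alternatives, a blind guesser expects one lucky right answer for every K - 1 wrong ones, so W / (K - 1) estimates the right answers gained by guessing. The function name and the examinee counts are hypothetical; omitted items are assumed not to count as wrong.

```python
def corrected_score(right: int, wrong: int, k: int) -> float:
    """Correction for blind guessing: R - W / (K - 1).

    right: number of correct responses (R)
    wrong: number of wrong responses (W); omitted items are not counted
    k: number of alternatives per item (K)
    """
    if k < 2:
        raise ValueError("each item needs at least 2 alternatives")
    return right - wrong / (k - 1)


# Hypothetical examinee: 40 right, 12 wrong, 8 omitted on 4-option items
print(corrected_score(40, 12, 4))  # 40 - 12/3 = 36.0
```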

7
Q

To overcome the influence of blind guessing

A

Advise examinees to attempt every question, since not all guessing is blind: examinees can often narrow down the options and guess correctly, and truly blind guessing tends to be infrequent

8
Q

What is an error in multiple choice questions?

A

The error lies not in the question itself but in the response options you choose from

9
Q

What is the error within short-answer questions?

A

The issue is interpreting what the question is asking and how to answer it; this affects reliability

10
Q

Ebel’s idea of reliability and response options

A

Reliability studies have examined the number of response options; a better way to increase test reliability is to add more items (response options should number around 5)

11
Q

Speed tests

A

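The best way to calculate reliability for speeded tests is a split-half reliability in which the two halves are timed separately (splitting a single timed administration, e.g. odd vs. even items, makes reliability look spuriously high)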

12
Q

With speed tests, how should you assess reliability?

A

Administer each half of the test separately, allowing half the total time for each half; the halves can also be administered about 2 weeks apart. This gives a better indicator of reliability
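The split-half idea can be sketched in Python. This assumes hypothetical scores from two separately timed halves and adds the Spearman-Brown correction, which is not on the card but is the standard way to step a half-test correlation up to full-test length; the function names are illustrative.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def spearman_brown(r_half):
    """Step a half-test correlation up to the reliability of the full-length test."""
    return 2 * r_half / (1 + r_half)


# Hypothetical scores of the same 6 examinees on two separately timed halves
half_a = [12, 15, 9, 20, 17, 11]
half_b = [13, 14, 10, 19, 18, 10]

r_half = pearson_r(half_a, half_b)
print(round(r_half, 3), round(spearman_brown(r_half), 3))
```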

13
Q

Halo Effect

A

A rater’s tendency to perceive an individual who is high (or low) in one area as also high (or low) in other areas

14
Q

2 kinds of halo effects

A

general impression model and salient dimension model

15
Q

General impression model

A

A rater’s tendency to allow an overall impression of an individual to influence judgment of that person’s performance (ex: someone may see a reporter as “impressive” and thus also rate his or her speech as strong)

16
Q

Salient dimension model

A

One salient quality of a person affects the rating of another quality of that person (ex: people rated as attractive are also rated as more honest); raters make inferences about an individual based on one salient trait or quality

17
Q

Simpson paradox

A

Aggregating data can change the apparent meaning of the data: a pattern that holds within each group can weaken, vanish, or reverse when the groups are combined, because of a third variable

18
Q

Percentages are at the heart of Simpson’s paradox; why are they bad?

A

Because they obscure the numerator and denominator (ex: 8/10 and 80/100 are both 80%, but the number of people who reviewed each restaurant is very different)

19
Q

What is important in knowing the percentage?

A

You need to know what the numerator and denominator are, or you risk misinterpreting the percentages

20
Q

What happens when you disaggregate the data?

A

You can see whether the phenomenon is actually occurring, that is, whether the pattern in the aggregated data holds within each subgroup (Simpson’s paradox)
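A minimal sketch of disaggregating, using hypothetical treatment-success counts split by a third variable (severity). Fractions keep the numerators and denominators visible instead of hiding them inside percentages; the treatment labels and the helper name are illustrative.

```python
from fractions import Fraction

# Hypothetical (successes, patients) for two treatments, split by severity
data = {
    "A": {"mild": (9, 10), "severe": (30, 90)},
    "B": {"mild": (80, 90), "severe": (3, 10)},
}


def overall_rate(groups):
    """Aggregate across subgroups: total successes / total patients."""
    successes = sum(s for s, _ in groups.values())
    patients = sum(n for _, n in groups.values())
    return Fraction(successes, patients)


for treatment, groups in data.items():
    for severity, (s, n) in groups.items():
        print(treatment, severity, f"{s}/{n} = {float(Fraction(s, n)):.0%}")
    print(treatment, "overall", f"{float(overall_rate(groups)):.0%}")
```

Treatment A does better within each severity group (90% vs 89% for mild, 33% vs 30% for severe), yet B looks far better overall (83% vs 39%), because A was given mostly to severe cases.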

21
Q

Clinical Decision-Making

A

Making decisions based on one’s own clinical experience

22
Q

Mechanical decision-making

A

make decisions based on data or statistics

23
Q

Clinical psychologists often feel that their decision making is

A

Absolute, but it is flawed because the biases we bring affect our decisions

24
Q

Robyn Dawes

A

asserts that mechanical prediction is better than clinical prediction

25
Q

Dawes example

A

Dawes asked faculty to rate students in the graduate program from 1964 to 1967 on a 5-point scale. There was a very low correlation between the current faculty ratings and the ratings made by the admissions committee, but the faculty ratings were correlated with GRE scores and undergraduate GPA

26
Q

quantitative data (mechanical decisions) were

A

more predictive than clinical judgment

27
Q

When can mechanical or quantitative prediction work?

A

When people specify which variables to examine; people are still needed to choose the predictor variables
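A minimal sketch of "people choose the variables, a formula combines them", assuming a unit-weighted composite: standardize each predictor, then add them with equal weights. This is one simple kind of mechanical rule of the sort Dawes discussed, not his exact procedure; the applicant data and function names are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of scores (mean 0, SD 1)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def unit_weighted_composite(*predictors):
    """Sum of z-scores with equal (unit) weights; a person chooses the predictors."""
    standardized = [zscores(p) for p in predictors]
    return [sum(person) for person in zip(*standardized)]


# Hypothetical applicants: GRE scores and undergraduate GPA
gre = [310, 325, 300, 318, 330]
gpa = [3.2, 3.8, 3.0, 3.5, 3.6]

scores = unit_weighted_composite(gre, gpa)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)  # applicant indices, best predicted first
```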

28
Q

Dawes’s crude mechanical decision making

A

Ex: marital satisfaction was predicted from how often couples had sex relative to how often they argued; people tend to rate their relationships higher if they have more sex and fewer fights

29
Q

People are not good at what with the data according to Dawes?

A

integrating the data in unbiased ways

30
Q

There is resistance to what kind of prediction?

A

Mechanical prediction; our belief in our own prediction is reinforced by isolated incidents we can readily recall (even though we rely on testing, which is quantitative data)

31
Q

Why do we always need to know the base rate?

A

To avoid clinical judgment errors; ignoring base rates is one of the judgment errors Dawes identifies
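Why the base rate matters can be shown with Bayes' theorem. This sketch assumes a hypothetical screening test with 90% sensitivity and 90% specificity (terms not on the card); only the base rate changes between runs, and the function name is illustrative.

```python
def positive_predictive_value(base_rate, sensitivity, specificity):
    """P(condition | positive result) via Bayes' theorem."""
    true_pos = base_rate * sensitivity
    false_pos = (1 - base_rate) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


# Hypothetical screening test: 90% sensitivity, 90% specificity
for base_rate in (0.50, 0.10, 0.01):
    ppv = positive_predictive_value(base_rate, 0.90, 0.90)
    print(f"base rate {base_rate:.0%}: P(condition | positive) = {ppv:.2f}")
```

With these numbers, a positive result means a 90% chance of the condition when the base rate is 50%, 50% when it is 10%, and only about 8% when it is 1%.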

32
Q

Clinical decision making always has to be balanced by

A

Mechanical decision making

33
Q

When people seek out treatment, they seek it out when they are most

A

Severe, or when something is really impacting them

34
Q

When you are severe, you generally don’t get more severe, which relates to the

A

Regression to the mean: extreme scores tend to move back toward the middle (the mean) when measured again
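A small simulation, assuming each score is a stable true severity plus independent random error (the X = T + E model) and that no treatment is given. People selected because their intake scores are extreme still score closer to the mean at the second measurement. All parameters are hypothetical.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical model: each observed score = stable true severity + random error
true_severity = [random.gauss(50, 10) for _ in range(10_000)]
time1 = [t + random.gauss(0, 10) for t in true_severity]
time2 = [t + random.gauss(0, 10) for t in true_severity]  # no treatment given

# "Seek treatment" only when the time-1 score is extreme (top 10%)
cutoff = sorted(time1)[int(0.9 * len(time1))]
selected = [i for i, score in enumerate(time1) if score >= cutoff]

intake_mean = mean([time1[i] for i in selected])
followup_mean = mean([time2[i] for i in selected])
print(round(intake_mean, 1), round(followup_mean, 1))  # follow-up is closer to 50
```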

35
Q

Why is mechanical better than clinical prediction?

A

Dawes says that humans make errors in judgment because they ignore base rates, third variables, and regression to the mean

36
Q

Third variable examples

A

Ice cream sales and crime both go up in the summer; the third variable is heat

37
Q

Representative thinking

A

We tend to make decisions based on the information we have ready access to. We use these as shortcuts in daily life, but diagnosis requires more.

38
Q

Using representative thinking

A

can sometimes cause errors in thinking.

39
Q

Heuristic

A

simple rule to make decisions

40
Q

Factor analysis goes under

A

Nondichotomous scoring systems

41
Q

Item response theory goes under both

A

Item analysis for both dichotomous and nondichotomous

42
Q

Generalizability theory goes under the

A

Overall test

43
Q

Factor analysis

A

Mathematically determines which items are associated with latent constructs (constructs that cannot be measured directly); this allows us to look at item quality.

44
Q

Anxiety as a latent construct

A

3 buckets (overarching constructs): physical, emotional/psychological and cognitive (every disorder has buckets)

45
Q

Within the latent construct of anxiety, what would the 3 overarching constructs contain?

A

Physical (heart rate, sweating, shaking, GI distress), Emotional/psychological (irritability, worry, nervousness), Cognitive (poor concentration, rumination)

46
Q

3 necessary conditions for a factor analysis

A
  1. factor structure represents what we know about the construct
  2. factor structure can be replicated
  3. factor structure is clearly interpretable with precise scaling
47
Q

what type of sample does a factor analysis require?

A

An over-inclusive, larger sample of roughly 200-500 participants

48
Q

facets

A

Well-defined, homogeneous item clusters that map directly onto the higher-order factors

49
Q

What happens when there are more items in a factor analysis?

A

It creates the ability to tap into constructs you may not have anticipated, and it can also produce facets (sub-constructs)

50
Q

Which item format can you not use?

A

Dichotomous item response formats cannot be used, because they can seriously distort the correlation matrix

51
Q

Why do authors suggest rating scales or Likert scales with 5 to 7 points?

A

With more response options, a greater amount of variance can be captured

52
Q

Who should you sample for factor analysis?

A

Heterogeneity is needed, researchers should get a sample that can represent all trait dimensions

53
Q

one of the reasons for conducting a factor analysis

A

develop and identify a hierarchical factor structure

54
Q

Hierarchical factor structure

A

Allows us to statistically identify the items that appear relevant to the construct; it may also reveal another area or construct that was not considered when the items were put together

55
Q

Major criticism of factor analysis

A

Items are developed for constructs that may or may not have a measurable criterion

56
Q

the second reason for conducting factor analysis

A

improving psychometric properties of a test

57
Q

how to improve psychometric properties of a test?

A

Factor analysis can help developers determine which items to remove or revise, or whether to add items, to improve the internal consistency reliability of the items
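One common index of internal consistency is Cronbach's alpha (not named on the card). This sketch assumes hypothetical 1-5 ratings and recomputes alpha with each item dropped in turn, a simple way to spot items to remove or revise; item 4 is deliberately written to run against the others. Function names are illustrative.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """items: list of item-score columns, each a list over the same examinees."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))


def alpha_if_deleted(items):
    """Alpha recomputed with each item removed in turn."""
    return [cronbach_alpha(items[:j] + items[j + 1:]) for j in range(len(items))]


# Hypothetical 1-5 ratings: 4 items x 6 examinees; item 4 runs opposite to the rest
items = [
    [4, 5, 2, 3, 5, 1],
    [4, 4, 2, 3, 5, 2],
    [5, 5, 1, 3, 4, 1],
    [2, 1, 4, 3, 2, 5],
]
print(round(cronbach_alpha(items), 2))
print([round(a, 2) for a in alpha_if_deleted(items)])
```

Here alpha for all four items is near zero, and dropping item 4 raises it sharply, which flags item 4 for removal or revision.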

58
Q

all tests with sound items should have a strong?

A

Internal consistency

59
Q

with the sample size if the factors are well defined you can use a

A

smaller sample of between 100-200

60
Q

The third reason for conducting a factor analysis is developing items that discriminate between samples

A

Some items may be endorsed mainly by certain groups, and you may then need to revise those same items so they are more discriminating for another group

61
Q

The fourth reason for conducting factor analysis, developing more unique items- decreasing redundancy

A

Having identical items is inefficient: whatever error is present will be associated with both items

62
Q

Why are short forms good?

A

more efficient, less time consuming, easier for examinee and assessor

63
Q

2 primary objections to short form development

A

1) can the short form give the appropriate information for an appropriate assessment
2) is the short form accurate and valid

64
Q

General problems for short forms

A

1) There is an assumption that the reliability and validity of the long form automatically apply to the abbreviated short form
(because of the reduced content coverage, similar reliability and validity cannot be assumed)
2) There is an assumption that the new shorter measure requires less validity evidence (the primary problem: with fewer items and less content coverage, the validity of the measure is also compromised)

65
Q

Empirical evidence of short forms (Smith, McCarthy & Anderson)

A

Examined 12 short forms for equivalence to the longer original forms:
-if the long measure does not have good validity, how can a short one?
-reducing the number of items may compromise content coverage
-significant reduction in reliability coefficients
-many researchers do not run another factor analysis on short forms
-need to administer short form to an independent sample to determine validity
-need to use short form to classify clinical populations and compare to long form
-need to establish genuine time and money savings with a short form

66
Q

Item response theory 2 types

A

difficulty and discriminability

67
Q

Item Response Theory

A

A mathematical and statistical tool for determining item quality and for seeing how items function differently for specific groups or for individuals who are part of a group

68
Q

Classical testing theory is limited because

A

all error is lumped together in one term E (in formula), we can’t determine error at the individual item level

69
Q

Item Response theory relating to error from Classical Testing theory

A

Allows us to examine error at the item level, using hierarchical mathematical modeling to observe scoring patterns.

70
Q

Two types of item analysis

A

item difficulty and discriminability

71
Q

How do we know what a good item is on a test?

A

First we used factor analysis, but it sometimes has problems; according to IRT, we examine item difficulty and discriminability

72
Q

Item difficulty Dichotomous

A

Defined by the proportion of people who get a particular item correct (ex: if 84% of people get item #24 correct, then its difficulty level is .84)
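Computing item difficulty (p) is just a column proportion. A minimal sketch with hypothetical 0/1 responses, flagging items outside the .30-.70 range given on the card about the best level of difficulty; the function name is illustrative.

```python
# Hypothetical 0/1 scored responses: rows = examinees, columns = items
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
]


def item_difficulty(responses):
    """Proportion of examinees answering each item correctly (the p value)."""
    n = len(responses)
    return [sum(col) / n for col in zip(*responses)]


p_values = item_difficulty(responses)
for item, p in enumerate(p_values, start=1):
    flag = "" if 0.30 <= p <= 0.70 else "  <- outside .30-.70"
    print(f"item {item}: p = {p:.2f}{flag}")
```

Item 1 (p = 1.00) is answered correctly by everyone, so it cannot separate examinees; item 3 (p = .20) is hard; only item 2 (p = .60) falls inside the .30-.70 band.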

73
Q

Item difficulty levels based on higher or lower Dichotomous

A

the higher the number the easier the item, the lower the number the harder the item

74
Q

Item difficulty is based on

A

Chance

75
Q

What should item difficulty be set at?

A

Items should be set at a moderate level of difficulty, with an average difficulty of about .50

76
Q

When deciding difficulty levels need to consider what

A

It depends on who you are testing (ex: around .2 for medical students vs. .7-.9 for students with disabilities, whose skill set is more limited)

77
Q

What is the best level of difficulty?

A

best tests choose items that are between .3-.7 in difficulty

78
Q

Test floor

A

Include a sufficient number of easy items (e.g., for examinees with disabilities) to test the floor

79
Q

Test ceiling

A

A sufficient number of hard items (for doctoral-level students, medical students)

80
Q

item discriminability Dichotomous

A

determines whether people who have done well on a particular item have also done well on the entire test

81
Q

extreme group method

A

compares people who have done very well with those who have done very poorly on a test

82
Q

How is discrimination found? Dichotomous

A

An item that discriminates between the upper group and the lower group is a very good item, because it can tell the groups apart

83
Q

difference between higher and lower numbers for discrimination Dichotomous

A

the higher the number the more discrimination, the lower the number the less discrimination

84
Q

overthinking the problem

A

A negative discrimination index: the lower group gets the item right more often than the upper group, as when high scorers overthink the item

85
Q

D= index of discrimination

A

The numbers of persons passing the item in the upper and lower groups are expressed as percentages, and the difference between those percentages is the index of discrimination (D)
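A sketch of the extreme group method, assuming each extreme group is the top and bottom 27% of total scores (a common convention, not stated on the card) and using proportions rather than percentages (multiply by 100 to get percentages). The data and function name are hypothetical.

```python
def discrimination_index(item_scores, total_scores, fraction=0.27):
    """D = proportion passing in the upper group - proportion passing in the lower group.

    item_scores: 0/1 score on one item for each examinee
    total_scores: total test score for the same examinees
    fraction: share of examinees in each extreme group (assumed 27% convention)
    """
    n = len(total_scores)
    size = max(1, round(fraction * n))
    order = sorted(range(n), key=lambda i: total_scores[i])  # lowest total first
    lower, upper = order[:size], order[-size:]
    p_upper = sum(item_scores[i] for i in upper) / size
    p_lower = sum(item_scores[i] for i in lower) / size
    return p_upper - p_lower


# Hypothetical data for 10 examinees
totals = [35, 12, 28, 40, 15, 22, 31, 9, 38, 18]
item_a = [1, 0, 1, 1, 1, 1, 0, 0, 0, 0]
item_b = [0, 1, 0, 0, 1, 1, 1, 1, 0, 1]
print(round(discrimination_index(item_a, totals), 2))
print(round(discrimination_index(item_b, totals), 2))
```

Here item_a gives D = 0.33 (2/3 of the upper group vs 1/3 of the lower group pass), while item_b gives D = -1.0, the negative pattern described on the overthinking card.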

86
Q

how do we know it is dichotomous?

A

Whenever we see the word “correct”, because dichotomous scoring is right or wrong

87
Q

Point Biserial Method

A

Find the correlation between performance on the item and performance on the entire test

88
Q

Point Biserial positive meaning

A

Ranges from -1 to +1; a positive value (closer to +1) tells us the item discriminates: those who scored higher on the test also tended to get this item correct

89
Q

Point Biserial negative meaning

A

Ranges from -1 to +1; a negative point biserial indicates that there may be a problem with the item
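Because a 0/1 item is numeric, the point biserial is simply the Pearson correlation between item scores and total scores. A sketch with hypothetical data; some programs instead correlate the item with the total minus that item (a "corrected" item-total correlation), which this sketch does not do.

```python
from statistics import mean, pstdev


def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item and the total test score."""
    mi, mt = mean(item_scores), mean(total_scores)
    cov = sum((a - mi) * (b - mt) for a, b in zip(item_scores, total_scores)) / len(item_scores)
    return cov / (pstdev(item_scores) * pstdev(total_scores))


# Hypothetical data: high scorers get good_item right; bad_item is the reverse
totals = [35, 12, 28, 40, 15, 22, 31, 9, 38, 18]
good_item = [1, 0, 1, 1, 0, 1, 1, 0, 1, 0]
bad_item = [1 - x for x in good_item]

print(round(point_biserial(good_item, totals), 2))  # positive: item discriminates
print(round(point_biserial(bad_item, totals), 2))   # negative: item may be a problem
```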

90
Q

Point Biserial chart explanation

A

The chart shows how a higher value relates to the difficulty of the question, i.e., how many individuals get it correct

91
Q

item characteristic curves are

A

Dichotomous, and they let you know whether the item is good

92
Q

overthinking representation

A

When the curve goes up and then comes back down on an item characteristic curve

ex: the curve for the upper group rises and then falls

93
Q

we focus on which group?

A

the upper group for dichotomous

94
Q

item response function

A

a mathematical function describing the relation between where an individual falls on the continuum of a given construct such as depression and the probability that he/she will give a particular response to a scale item designed to measure that construct
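One common form of the item response function for dichotomous (right/wrong or endorse/not endorse) items is the two-parameter logistic (2PL) model; it is an assumption here, since the card names no specific model. In it, b shifts the curve away from the y-axis (harder or more severe) and a sets the steepness (discrimination), which is how the later cards read difficulty and discrimination curves. The item parameters are hypothetical.

```python
import math


def irf_2pl(theta, a, b):
    """Two-parameter logistic item response function.

    theta: person's location on the construct (e.g., depression severity)
    a: discrimination (slope); b: difficulty/severity (location)
    """
    return 1 / (1 + math.exp(-a * (theta - b)))


# Hypothetical items: name -> (a, b)
items = {"mild item": (1.0, -1.0), "severe item": (1.0, 1.5), "steep item": (2.5, 0.0)}
for theta in (-2, 0, 2):
    probs = {name: round(irf_2pl(theta, a, b), 2) for name, (a, b) in items.items()}
    print(theta, probs)
```

At theta = b the probability is exactly .50. Polytomous models such as the graded response model extend the same idea to rating-scale items, with one curve per response category.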

95
Q

difficulty for non-dichotomous

A

Difficulty is symptom severity (L = mild, M = moderate, U = severe); the farther a curve is from the y-axis, the more severe

96
Q

Discriminability nondichotomous

A

means that the item discriminates between individuals that have severe symptoms and mild symptoms

97
Q

item difficulty curve non-dichotomous

A

The curve that is farthest from the y-axis is considered the most difficult

98
Q

item difficulty and discrimination for non-dichotomous will always provide

A

mathematical model will always provide a curve to show these

99
Q

item discrimination curve for non-dichotomous

A

The item whose curve has the steepest slope is the most discriminating item

100
Q

advantages of IRT over CTT

A

IRT can estimate the probability of getting an item correct based on the test taker’s ability and other qualities. It can be adapted to computer administration to present specific items matched to ability level, it lets us better test those at higher and lower ability levels, and it lets us compare different groups (e.g., ethnicity, gender) on the same items to examine patterns of responding. This allows us to move away from biased questions and gives greater accuracy at the item level

101
Q

generalizability theory is based on what aspects of the test?

A

the overall test, a new understanding of reliability

102
Q

Why is generalizability theory moving away from Classical Testing Theory?

A

to understand how reliability is affected by various sources of error

103
Q

Classical Testing theory only assumes 2 sources of error

A

random and systematic error

104
Q

measurement error

A

Error associated with trying to quantify a specific construct or concept

105
Q

measurement error is associated with 3 errors

A

procedural error, instrumental error and evaluator error

106
Q

procedural error

A

A non-standardized administration; this is not chance-based, because the more you practice, the less you will commit this error

107
Q

instrumental error

A

error associated with the instrument or the items on the test

108
Q

evaluator error

A

Any error committed by the assessor, such as making problematic interpretations of the data or scoring incorrectly

109
Q

measurement error is similar to circumscribed error

A

accounting for all possibilities

110
Q

2 components of generalizability theory

A

generalizability and dependability

111
Q

generalizability

A

Can we generalize this observed test score to all the possible scores for that person (the universe score)?
ex: a husband and wife test-drove a Prius one time and said it was great; they are generalizing that all Priuses are good
-when testing someone one time, does their observed score represent their true score?

112
Q

dependability

A

Will the observed score remain constant even if we change the testing parameters?
ex: their new Prius does great in good weather but doesn’t work well when it’s raining; will it drive the same if road conditions change?

113
Q

generalizability closer to 1 means

A

The closer it is to 1, the more confident we are that the observed score can be generalized to all the possible scores for that particular person.

114
Q

dependability closer to 1 means

A

the closer to 1, the observed score will remain constant irrespective of the testing parameters

115
Q

Generalizability theory allows us to look at measurement error, which could come from

A

items on test, raters, setting, assessment, time
ex: testing in a prison setting could produce different responses

116
Q

problems with classical testing theory is that they only recognize

A

Two sources of variance (test-retest and internal consistency)

117
Q

variance and error in classical testing theory according to generalizability theory is that these are

A

Synonymous words

118
Q

How does generalizability theory extend the true score model?

A

By acknowledging that multiple factors may affect the error associated with measuring one’s true score

119
Q

rater is another way of saying

A

assessor

120
Q

Sources of error

A

A noisy room, specific items, examinee fatigue, the test administrator (some have minimal experience, some have a lot); CTT could not address any of these separately

121
Q

Fundamental equation

A

reliability = σ²T / σ²X, where σ²X = σ²T + σ²E
the larger the variance of T relative to the variance of X, the higher the reliability
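A quick simulation of the fundamental equation, assuming normally distributed true scores (SD 15) and independent random error (SD 5); all values are hypothetical. The ratio of true-score variance to observed-score variance should come out close to the population value 15² / (15² + 5²) = .90.

```python
import random
from statistics import pvariance

random.seed(7)  # fixed seed so the sketch is reproducible

n = 10_000
true = [random.gauss(100, 15) for _ in range(n)]   # T
error = [random.gauss(0, 5) for _ in range(n)]     # E (random, independent of T)
observed = [t + e for t, e in zip(true, error)]    # X = T + E

reliability = pvariance(true) / pvariance(observed)
print(round(reliability, 2))  # population value: 225 / (225 + 25) = 0.90
```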

122
Q

sources of variance

A

p= person taking the test, i= items on the test, e= random error, pi= interaction b/w person taking the test and the items on the test

123
Q

What does the bigger circle on the Venn diagram say about error?

A

There is more error (variance) from that source

124
Q

Adding another source of variance to the Venn diagram

A

j= judge (evaluator)
pj= person interacting with the judge
ij= item and judge interaction (some judges might favor certain items vs. other items)
pij= interaction with the person taking the test, the items on the test and the judge

125
Q

Norm oriented perspective

A

Tends to be associated with generalizability coefficients; only uses indices that involve p (the person)

126
Q

Domain-oriented perspective

A

associated with the dependability coefficient, and they look at all the indices
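A sketch of the norm- vs domain-oriented distinction for the simplest design, persons crossed with items (p × i) with one score per cell, assuming the standard ANOVA estimates of the variance components. With one observation per cell, pi and e cannot be separated, so they are estimated together as pi,e. The relative (generalizability) error uses only the pi,e term, which involves p; the absolute (dependability) error also adds the i term. The ratings and function names are hypothetical.

```python
def variance_components(scores):
    """One-facet crossed p x i design, one score per cell (rows = persons).

    Returns estimated variance components for p, i, and pi,e (ANOVA method).
    """
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_i)
    p_means = [sum(row) / n_i for row in scores]
    i_means = [sum(col) / n_p for col in zip(*scores)]

    ss_p = n_i * sum((m - grand) ** 2 for m in p_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in i_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_tot - ss_p - ss_i

    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Negative estimates are conventionally set to 0
    var_p = max(0.0, (ms_p - ms_res) / n_i)
    var_i = max(0.0, (ms_i - ms_res) / n_p)
    return var_p, var_i, ms_res


def g_and_phi(var_p, var_i, var_res, n_items):
    """Generalizability (norm-oriented) and dependability (domain-oriented) coefficients."""
    rel_error = var_res / n_items            # only the term involving p (pi,e)
    abs_error = (var_i + var_res) / n_items  # all terms except p itself
    return var_p / (var_p + rel_error), var_p / (var_p + abs_error)


scores = [  # hypothetical ratings: 5 persons x 3 items
    [7, 6, 8],
    [4, 3, 5],
    [6, 6, 7],
    [3, 2, 5],
    [8, 7, 9],
]
var_p, var_i, var_res = variance_components(scores)
g, phi = g_and_phi(var_p, var_i, var_res, n_items=3)
print(round(g, 2), round(phi, 2))
```

Because the dependability error term also includes item variance, the dependability coefficient can never exceed the generalizability coefficient; for these ratings they come out at about .99 and .91.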

127
Q

whenever you see a T in the formula, what is it equal to?

A

T is equivalent to P
true score is equivalent to person

128
Q

What do we use to understand item discriminability with dichotomous scoring

A

Extreme group & Point Biserial