3rd exam Flashcards

1
Q

What is the formula for classical testing theory?

A

X = T + E, where X = observed score, T = true score, and E = error (both systematic and random)
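
A minimal Python sketch can make the decomposition concrete (all numbers here are hypothetical): draw true scores and random error, add them, and note that with uncorrelated components the observed-score variance is approximately the sum of the true-score and error variances.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values: 1,000 examinees, true scores ~ N(70, 10),
# random error ~ N(0, 5).
T = rng.normal(70, 10, size=1000)  # true scores
E = rng.normal(0, 5, size=1000)    # random error
X = T + E                          # observed scores

# With T and E uncorrelated, Var(X) ~= Var(T) + Var(E).
print(round(X.var(), 1), round(T.var() + E.var(), 1))
```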

2
Q

What creates a problem for classical testing theory?

A

Guessing on an achievement test can cause the observed score to misrepresent the true score

3
Q

Do we know when people guess?

A

We never know when someone is guessing

4
Q

Abbott’s formula

A

allows you to understand and calculate the true score when there is blind guessing

5
Q

If an examinee is guessing, what happens within classical testing theory?

A

the observed score does not reflect the examinee's true score

6
Q

Abbott's actual math formula

A

Corrected score = R - W / (K - 1), where R = number of correct responses, W = number of wrong responses, and K = number of response alternatives
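
As a quick sketch, the correction is easy to compute directly (the numbers below are hypothetical):

```python
def abbott_corrected_score(right: int, wrong: int, k: int) -> float:
    """Correction for blind guessing: R - W / (K - 1)."""
    return right - wrong / (k - 1)

# Hypothetical example: 4 alternatives per item, 32 right, 8 wrong.
print(abbott_corrected_score(right=32, wrong=8, k=4))  # 32 - 8/3 = 29.33...
```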

7
Q

To overcome the influence of blind guessing

A

one should advise examinees to attempt every question, since not all guessing is blind: examinees can often narrow down the alternatives and answer correctly, and truly blind guessing tends to be infrequent

8
Q

Where is the error in multiple-choice questions?

A

not in the question itself but in the responses you choose from

9
Q

What is the error within short-answer questions?

A

the issue is interpreting what the question is asking and how to answer it; this affects reliability

10
Q

Ebel's idea of reliability and response options

A

reliability studies have been done on the number of response options; a better way to increase test reliability is to add more items (around 5 response options is sufficient)

11
Q

Speed tests

A

the best way to calculate reliability for a speeded test is a split-half reliability in which each half is separately timed (see the next card)

12
Q

With speed tests how should you do reliability

A

administer each half of the test separately, allowing half the time for each, about 2 weeks apart; this is a better indicator of reliability
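
A minimal sketch of the computation (hypothetical score vectors): correlate the two separately timed half-test scores, then apply the Spearman-Brown correction to estimate full-length reliability.

```python
import numpy as np

def split_half_reliability(half1, half2):
    """Correlate the two half-test scores, then apply the
    Spearman-Brown correction for full test length."""
    r = np.corrcoef(half1, half2)[0, 1]
    return 2 * r / (1 + r)

# Hypothetical scores for 6 examinees on the two timed halves.
half1 = np.array([12, 15, 9, 20, 14, 17])
half2 = np.array([11, 16, 10, 19, 13, 18])
print(round(split_half_reliability(half1, half2), 2))
```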

13
Q

Halo Effect

A

a rater's tendency to perceive an individual who is high (or low) in one area as also high (or low) in other areas

14
Q

2 kinds of halo effects

A

general impression model and salient dimension model

15
Q

General impression model

A

the tendency of a rater to allow an overall impression of an individual to influence judgments of that person's performance (ex: a rater may find a reporter "impressive" and thus also rate his/her speech as strong)

16
Q

Salient dimension model

A

one salient quality of a person affects the rating of other qualities (ex: people rated as attractive are also rated as more honest); the rater makes inferences about an individual based on one salient trait or quality

17
Q

Simpson's paradox

A

aggregating data can change the meaning of the data and obscure conclusions because of a third variable

18
Q

Percentages are at the heart of Simpson's paradox; why can they mislead?

A

because they obscure the relationship between the numerator and denominator (ex: 8/10 is 80% and 80/100 is also 80%; the percentage is the same, but the number of people who reviewed the restaurant is very different)

19
Q

What is important in knowing the percentage?

A

you need to know what the numerator and denominator are, or you are misinterpreting the percentages

20
Q

What happens when you disaggregate the data?

A

you can truly see whether the phenomenon is actually occurring; this is how Simpson's paradox is detected
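
A toy illustration (made-up counts, patterned after a classic example): treatment B looks better in the aggregate, but treatment A is better within each severity group, because severity (the third variable) is distributed unevenly across treatments.

```python
# (recovered, treated) counts by treatment and case severity.
data = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}

for treatment, groups in data.items():
    recovered = sum(r for r, n in groups.values())
    treated = sum(n for r, n in groups.values())
    rates = {g: round(r / n, 2) for g, (r, n) in groups.items()}
    print(treatment, "overall:", round(recovered / treated, 2), rates)

# A wins in every severity group, yet B has the higher overall rate:
# the aggregate percentages obscure the very different group sizes.
```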

21
Q

Clinical Decision-Making

A

making decisions based on one's own clinical experience

22
Q

Mechanical decision-making

A

make decisions based on data or statistics

23
Q

Clinical psychologists often feel that their decision making is

A

absolute, but it is flawed because the biases we draw on affect our decisions

24
Q

Robin Dawes

A

asserts that mechanical prediction is better than clinical prediction

25
Q

Dawes example

A

Dawes asked faculty to rate students in a graduate program from 1964-1967 on a 5-point scale. There was a very low correlation between the current faculty ratings and the ratings by the admissions committee, but the faculty ratings were correlated with GRE scores and undergraduate GPA

26
Q

quantitative data (mechanical decisions) were

A

more predictive than clinical judgment

27
Q

When can mechanical or quantitative prediction work?

A

when people highlight which variables to examine to determine the prediction; people are necessary to choose the variables

28
Q

Dawes's crude mechanical decision making

A

ex: marital relationship satisfaction was predicted from the ratio of sex to arguments; people tend to rate relationships higher if they have more sex and fewer fights

29
Q

People are not good at what with the data according to Dawes?

A

integrating the data in unbiased ways

30
Q

There is resistance to what kind of prediction?

A

mechanical prediction; our belief in clinical prediction is reinforced by isolated incidents we can readily access (even though we rely on testing, which is quantitative data)

31
Q

Across 136 studies, mechanical decision making was better in

A

33 out of 34 studies

32
Q

Why do you always need to know the base rate?

A

to avoid making clinical judgment errors

33
Q

Clinical decision making always has to be balanced by

A

Mechanical decision making

34
Q

When people seek out treatment, they seek it out when they are most

A

most severe, or when something is really impacting them

35
Q

When you start off severe, you generally don't get more severe, which relates to

A

regression to the mean: extreme scores tend to move back toward the middle

36
Q

Why is mechanical better than clinical prediction?

A

Dawes says that humans make errors in judgment because they ignore base rates, third variables, and regression to the mean

37
Q

Third variable example

A

ice cream sales and crime both go up in the summer; the third variable is heat

38
Q

Representative thinking

A

we tend to make decisions based on the information we readily have access to. We use these shortcuts to live our lives, but with diagnosis we need to do more.

39
Q

Using representative thinking

A

can sometimes cause errors in thinking.

40
Q

Heuristic

A

simple rule to make decisions

41
Q

Factor analysis goes under

A

Nondichotomous scoring systems

42
Q

Item response theory goes under both

A

Item analysis for both dichotomous and nondichotomous

43
Q

Generalizability theory goes under the

A

Overall test

44
Q

Factor analysis

A

allows us to determine mathematically which items are associated with latent constructs, i.e., constructs that cannot be measured directly (this lets us look at item quality).
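
A minimal sketch of an exploratory factor analysis in Python, using scikit-learn's FactorAnalysis (one of several tools that could be used). The data here are random placeholders, so the loadings will be near zero; real item responses would reveal the hypothesized structure.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
# Placeholder data: 300 respondents x 12 Likert-type items.
X = rng.normal(size=(300, 12))

fa = FactorAnalysis(n_components=3, random_state=0).fit(X)

# Loadings: how strongly each item is associated with each latent factor.
print(fa.components_.round(2))
```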

45
Q

Anxiety as a latent construct

A

3 buckets (overarching constructs): physical, emotional/psychological and cognitive (every disorder has buckets)

46
Q

Within anxiety the latent construct, what would the 3 overarching constructs contain?

A

Physical (heart rate, sweating, shaking, GI distress), Emotional/psychological (irritability, worry, nervousness), Cognitive (poor concentration, rumination)

47
Q

3 necessary conditions for a sound factor analysis

A
  1. factor structure represents what we know about the construct
  2. factor structure can be replicated
  3. factor structure is clearly interpretable with precise scaling
48
Q

what type of sample does a factor analysis require?

A

need an over-inclusive, larger sample of between 200 and 500 subjects

49
Q

facets

A

defined, homogeneous item clusters that map directly onto the higher-order factors

50
Q

What happens when there are more items in a factor analysis?

A

more items create the ability to tap into constructs you may not have anticipated; they can also produce facets or sub-constructs

51
Q

With item format, what can you not use?

A

you cannot use dichotomous item response formats, because they can cause a serious disturbance in the correlation matrix

52
Q

why do authors suggest having rating scales or Likert scales from 5 to 7 points?

A

the more response options, the greater the amount of variance that can be captured

53
Q

Who should you sample for factor analysis?

A

heterogeneity is needed; researchers should get a sample that represents all trait dimensions

54
Q

one of the reasons for conducting a factor analysis

A

develop and identify a hierarchical factor structure

55
Q

Hierarchical factor structure

A

allows us to statistically identify the items that appear relevant to the construct; it may also identify an area or construct that was not thought of before the items were put together

56
Q

Major criticism of factor analysis

A

items are developed on constructs that may or may not have a measurable criterion

57
Q

the second reason for conducting factor analysis

A

improving psychometric properties of a test

58
Q

how to improve psychometric properties of a test?

A

factor analysis can help developers determine which items to remove, revise, or add in order to improve the internal consistency reliability of the test
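
Internal consistency is commonly summarized with Cronbach's alpha; a minimal sketch of the standard formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), with hypothetical Likert responses:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: a respondents-by-items matrix of scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical Likert responses (5 respondents x 4 items).
scores = np.array([[3, 4, 3, 4],
                   [2, 2, 1, 2],
                   [4, 4, 4, 3],
                   [1, 2, 2, 1],
                   [3, 3, 4, 4]])
print(round(cronbach_alpha(scores), 2))
```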

59
Q

all tests with sound items should have strong

A

Internal consistency

60
Q

With respect to sample size, if the factors are well defined you can use a

A

smaller sample, between 100 and 200 subjects

61
Q

The third reason for conducting a factor analysis is developing items that discriminate between samples

A

some items may be endorsed by certain groups, and you may then need to revise those same items so they are more discriminating for another group

62
Q

The fourth reason for conducting factor analysis, developing more unique items- decreasing redundancy

A

identical items are inefficient: whatever error is present will be associated with both items

63
Q

Why are short forms good?

A

more efficient, less time consuming, easier for examinee and assessor

64
Q

2 primary objections to short form development

A

1) can the short form give the information needed for an appropriate assessment
2) is the short form accurate and valid

65
Q

General problems for short forms

A

1) there is an assumption that all the reliability and validity of the long form automatically applies to the abbreviated short form
(due to the reduced coverage, one cannot assume similar reliability and validity)
2) there is an assumption that the new, shorter measure requires less validity evidence (the primary problem: with fewer items and less content coverage, the validity of the measure is compromised as well)

66
Q

Empirical evidence of short forms (Smith, McCarthy & Andersen)

A

Examined 12 short forms for equivalence to the longer original forms:
-if the large measure does not have good validity, how can a short one?
-by reducing the items, the content coverage may be compromised
-significant reduction in reliability coefficients
-many researchers do not run another factor analysis on short forms
-need to administer the short form to an independent sample to determine validity
-need to use the short form to classify clinical populations and compare it to the long form
-need to establish genuine time and money savings with a short form

67
Q

Item response theory: 2 types

A

difficulty and discriminability

68
Q

Item Response Theory

A

a mathematical and statistical tool for determining item quality, and for seeing how items behave differently for specific groups or for individuals who are part of a group

69
Q

Classical testing theory is limited because

A

all error is lumped together in one term, E (in the formula); we can't determine error at the individual item level

70
Q

Item Response Theory's relation to error in Classical Testing Theory

A

allows us to examine error at the item level, using hierarchical mathematical modeling to observe scoring patterns

71
Q

Two types of item analysis

A

item difficulty and discriminability

72
Q

How do we know what a good item is on a test?

A

first we use factor analysis, but this sometimes has problems; under IRT we examine item difficulty and discriminability

73
Q

Item difficulty (dichotomous)

A

defined by the proportion of people who get a particular item correct (ex: if 84% of people get item #24 correct, then the difficulty level for that item is .84)
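
A minimal sketch (hypothetical 0/1 response matrix): each item's difficulty is just the proportion of examinees answering it correctly.

```python
import numpy as np

# Hypothetical responses: 6 examinees x 4 items; 1 = correct, 0 = wrong.
responses = np.array([[1, 1, 0, 1],
                      [1, 0, 0, 1],
                      [1, 1, 1, 1],
                      [0, 1, 0, 1],
                      [1, 1, 0, 0],
                      [1, 0, 1, 1]])

difficulty = responses.mean(axis=0)  # proportion correct per item
print(difficulty)  # higher value = easier item
```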

74
Q

Item difficulty levels: higher vs. lower (dichotomous)

A

the higher the number the easier the item, the lower the number the harder the item

75
Q

Item difficulty is based on

A

Chance

76
Q

What should item difficulty be set at?

A

it should be set at a moderate level, with the average difficulty of the items equal to about .50
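
One common textbook adjustment (an assumption here, not stated on these cards) moves the .50 target upward to account for chance success on multiple-choice items: halfway between the chance rate (1/K) and 1.0.

```python
def optimal_difficulty(k: int) -> float:
    """Halfway between chance success (1/k) and 1.0 (an assumed
    textbook rule of thumb, not a universal standard)."""
    return (1 + 1 / k) / 2

# Hypothetical 4-alternative item: (1 + .25) / 2 = .625
print(optimal_difficulty(4))
```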

77
Q

When deciding difficulty levels, what do you need to consider?

A

it depends on whom you are testing (ex: item difficulty for medical students might be .2, vs. .7-.9 for disabled students, whose level of skill is limited)

78
Q

What is the best level of difficulty?

A

the best tests choose items that are between .3 and .7 in difficulty

79
Q

Test floor

A

you should have a sufficient number of easy items (e.g., for disabled examinees), testing the floor

80
Q

Test ceiling

A

a sufficient number of hard items (for doctoral-level students, medical students)

81
Q

Item discriminability (dichotomous)

A

determines whether people who have done well on a particular item have also done well on the entire test

82
Q

extreme group method

A

compares people who have done very well with those who have done very poorly on a test

83
Q

How is discrimination found? (dichotomous)

A

an item that discriminates between the upper group and the lower group is a very good item, because it is able to discriminate between groups

84
Q

Difference between higher and lower numbers for discrimination (dichotomous)

A

the higher the number the more discrimination, the lower the number the less discrimination

85
Q

overthinking the problem

A

indicated when there is a negative number in the discrimination index

86
Q

D = index of discrimination

A

the numbers of persons passing the item in the Upper and Lower groups are expressed as percentages, and the difference between those percentages is the index of discrimination: D = % passing in Upper - % passing in Lower
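
A minimal sketch of the extreme-group computation (hypothetical 0/1 item scores and test totals; the 27% split used here is a common convention, assumed for illustration):

```python
import numpy as np

def discrimination_index(item, total, frac=0.27):
    """D = proportion passing in the upper group minus the lower group."""
    n = max(1, int(len(total) * frac))
    order = np.argsort(total)            # examinees sorted by total score
    lower, upper = order[:n], order[-n:]
    return item[upper].mean() - item[lower].mean()

# Hypothetical data: 10 examinees' scores on one item, plus test totals.
item = np.array([1, 0, 1, 1, 0, 0, 1, 1, 0, 1])
total = np.array([38, 12, 30, 35, 10, 15, 28, 40, 9, 33])
print(discrimination_index(item, total))  # 1.0: perfectly discriminating here
```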

87
Q

how do we know it is dichotomous?

A

whenever we see the word "correct," because dichotomous means right or wrong

88
Q

Point Biserial Method

A

find the correlation between performance on the item and performance on the entire test
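
A minimal sketch using SciPy's point-biserial function (hypothetical data; note the total should exclude the item itself so the correlation is not inflated, which is assumed in the numbers below):

```python
import numpy as np
from scipy.stats import pointbiserialr

# Hypothetical data: 0/1 scores on one item, and total test scores
# computed without that item.
item = np.array([1, 0, 1, 1, 0, 0, 1, 1, 0, 1])
total = np.array([37, 12, 29, 34, 10, 15, 27, 39, 9, 32])

r, p = pointbiserialr(item, total)
print(round(r, 2))  # positive r: high scorers tend to get this item correct
```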

89
Q

Point Biserial positive meaning

A

ranges from -1 to +1; if the number is positive (closer to +1), the item discriminates: those who scored higher on the test also got this particular item correct

90
Q

Point Biserial negative meaning

A

ranges from -1 to +1; a negative point biserial indicates that there may be a problem with the item

91
Q

Point Biserial chart explanation

A

shows the relationship between the point-biserial value and item difficulty, i.e., what proportion of individuals are getting the item correct

92
Q

item characteristic curves are

A

dichotomous, and they let you know whether the item is good

93
Q

overthinking representation

A

when the curve rises and then falls on an item characteristic curve

ex: the upper group's probability of success goes up and then comes back down

94
Q

we focus on which group?

A

the upper group for dichotomous

95
Q

item response function

A

a mathematical function describing the relation between where an individual falls on the continuum of a given construct such as depression and the probability that he/she will give a particular response to a scale item designed to measure that construct
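
A minimal sketch of one common item response function, the two-parameter logistic (2PL) model; the model choice and parameter values are illustrative assumptions, with b as item difficulty (location) and a as discrimination (slope).

```python
import numpy as np

def irf_2pl(theta, a=1.0, b=0.0):
    """P(particular response | theta) under a 2PL model:
    a = discrimination (slope), b = difficulty (location)."""
    return 1 / (1 + np.exp(-a * (theta - b)))

# Probability of the keyed response across the trait continuum.
for theta in [-2, -1, 0, 1, 2]:
    print(theta, round(irf_2pl(theta, a=1.5, b=0.5), 2))
```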

96
Q

difficulty for non-dichotomous

A

difficulty corresponds to symptom severity, looking at L = mild, M = moderate, and U = severe; the farther a curve is from the y-axis, the more severe

97
Q

Discriminability (non-dichotomous)

A

means that the item discriminates between individuals that have severe symptoms and mild symptoms

98
Q

item difficulty curve (non-dichotomous)

A

the curve that is farthest from the y-axis is considered the most difficult

99
Q

item difficulty and discrimination for non-dichotomous will always provide

A

the mathematical model will always provide a curve to show these

100
Q

item discrimination curve for non-dichotomous

A

the curve with the steepest slope is the most discriminating item

101
Q

advantages of IRT over CTT

A

IRT can model the probability of getting an item correct based on the test taker's ability and qualities. It can be adapted to computer administration so that items are matched to ability level; IRT lets us better test those at higher and lower abilities; and it lets us compare different groups (ethnicities, genders) on the same items to examine patterns of responding, allowing us to move away from biased questions and achieve greater accuracy at the item level