3rd exam Flashcards
What is the formula for classical testing theory?
X = T + E, where X = observed score, T = true score, and E = error (systematic and random)
What creates a problem for classical testing theory?
Guessing on an achievement test can make the observed score a poor reflection of the true score
Do we know when people guess?
We never know when someone is guessing
Abbott's formula
allows you to estimate the true score by correcting for blind guessing
If you are guessing wrong what happens within classical testing theory?
the observed score is not reflective of their true score
Abbott's actual math formula
corrected score = R - W / (k - 1), where R = correct responses, W = wrong responses, and k = number of alternatives (only W is divided by k - 1)
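The correction above can be sketched in Python (a minimal illustration; the function name and example numbers are hypothetical):

```python
# A minimal sketch of Abbott's correction for blind guessing:
# corrected score = R - W / (k - 1).

def corrected_score(right: int, wrong: int, alternatives: int) -> float:
    """Estimate the true score after removing credit gained by blind guessing."""
    return right - wrong / (alternatives - 1)

# 60 right and 20 wrong on a 5-alternative multiple-choice test:
print(corrected_score(60, 20, 5))  # -> 55.0
```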
To overcome the influence of blind guessing
one should advise examinees to attempt every question, since not all guessing is blind: examinees can often narrow down the alternatives and answer correctly, and truly blind guessing tends to be infrequent
What is an error in multiple choice questions?
not the question itself but the response options you choose from
What is the error within short-answer questions?
the issue is interpreting what the question is asking and how to answer it; this ambiguity affects reliability
Ebel's idea of reliability and response options
reliability studies have been done on the number of response options; a better way to increase test reliability is to add more items (response options should be around 5)
Speed tests
the best way to calculate reliability for speeded tests is a split-half reliability with separately timed halves (an ordinary split-half can overestimate reliability on a speeded test)
With speed tests how should you do reliability
administer each half of the test with half the time limit, about two weeks apart; this is a better indicator of reliability
Halo Effect
a rater's tendency to perceive an individual who is high (or low) in one area as also high (or low) in other areas
2 kinds of halo effects
general impression model and salient dimension model
General impression model
the tendency of a rater to allow an overall impression of an individual to influence judgments of that person's specific performance (ex: a rater who finds a reporter "impressive" overall may also rate the reporter's speech as strong)
Salient dimension model
one salient quality of a person affects the rating of other qualities (ex: people rated as attractive are also rated as more honest); the rater makes inferences about an individual based on one salient trait or quality
Simpson paradox
aggregating data can change its apparent meaning and can obscure conclusions because of a third variable
Percentages are at the heart of the simpson paradox, why are they bad?
because they obscure the relationship between the numerator and denominator (ex: 8/10 and 80/100 are both 80%, but the number of people who reviewed each restaurant is very different)
What is important in knowing the percentage?
you need to know what the numerator and denominator are; otherwise you are misinterpreting the percentages
What happens when you disaggregate the data?
you can truly see whether the phenomenon is actually occurring (Simpson's paradox)
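Simpson's paradox from the cards above can be demonstrated with a small sketch (all numbers are hypothetical, chosen so each subgroup and the aggregate point in opposite directions):

```python
# Hypothetical illustration of Simpson's paradox: within each subgroup
# treatment A has the higher success rate, but after aggregating, B wins.

groups = {
    "mild":   {"A": (81, 87),   "B": (234, 270)},
    "severe": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, total):
    return successes / total

for name, g in groups.items():
    a, b = rate(*g["A"]), rate(*g["B"])
    print(f"{name}: A={a:.0%}  B={b:.0%}")   # A is higher in both subgroups

# Aggregated (pooled) data reverses the conclusion:
tot_a = (81 + 192, 87 + 263)   # 273 / 350 = 78%
tot_b = (234 + 55, 270 + 80)   # 289 / 350 = 83%
print(f"overall: A={rate(*tot_a):.0%}  B={rate(*tot_b):.0%}")  # B now looks higher
```

Disaggregating (looking within each severity group) recovers the true pattern that the pooled percentages obscure.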
Clinical Decision-Making
making decisions based on one's own clinical experience
Mechanical decision-making
make decisions based on data or statistics
Clinical psychologists often feel that their decision making is
accurate, but it is flawed because biases we are unaware of affect our decisions
Robin Dawes
asserts that mechanical prediction is better than clinical prediction
Dawes example
asked faculty to rate students in a graduate program from 1964-1967, each on a 5-point scale. There was a very low correlation between current faculty ratings and the ratings by the admissions committee, but the ratings were correlated with GRE scores and undergraduate GPA
quantitative data (mechanical decisions) were
more predictive than clinical judgment
When can mechanical or quantitative prediction work?
when people identify which variables to examine to make the prediction; people are necessary to choose the variables
Dawes's crude mechanical decision making
ex: marital relationship satisfaction was predicted from the ratio of sex to arguments; people tend to rate relationships higher if they have more sex and fewer fights
People are not good at what with the data according to Dawes?
integrating the data in unbiased ways
There is resistance to what prediction
mechanical prediction; our belief in clinical prediction is reinforced by isolated incidents we can easily recall (we rely on testing, which is quantitative data)
Why do we always need to know the base rate?
to make sure we do not make clinical judgment errors
Clinical decision making always has to be balanced by
Mechanical decision making
When people seek out treatment, they seek it out when they are most
severe, or when something is really impacting them
When you are at your most severe, you generally don't get more severe, which relates to the
Regression to the mean: extreme scores tend to move back toward the middle
Why is mechanical better than clinical prediction?
Dawes says that humans make errors in judgment because they ignore base rates, ignore third variables, and ignore regression to the mean
Third variable examples
ice cream sales and crime both go up in the summer; the third variable is heat
Representative thinking
we tend to make decisions based on the information we readily have access to; we use these shortcuts to live our lives, but with diagnosis we need to do more
Using representative thinking
can sometimes cause errors in thinking.
Heuristic
simple rule to make decisions
Factor analysis goes under
Nondichotomous scoring systems
Item response theory goes under both
Item analysis for both dichotomous and nondichotomous
Generalizability theory goes under the
Overall test
Factor analysis
determines which items are associated with latent constructs, i.e., constructs that cannot be measured directly; we do this mathematically (it allows us to look at item quality)
Anxiety as a latent construct
3 buckets (overarching constructs): physical, emotional/psychological and cognitive (every disorder has buckets)
Within anxiety the latent construct, what would the 3 overarching constructs contain?
Physical (heart rate, sweating, shaking, GI distress), Emotional/psychological (irritability, worry, nervousness), Cognitive (poor concentration, rumination)
3 necessary conditions for a sound factor analysis
- factor structure represents what we know about the construct
- factor structure can be replicated
- factor structure is clearly interpretable with precise scaling
what type of sample does a factor analysis require?
need an over-inclusive, larger sample of between 200-500 subjects
facets
well-defined, homogeneous item clusters that map directly onto the higher-order factors
What happens when there are more items in a factor analysis?
creates the ability to tap into aspects of the construct you may not have anticipated; it can also produce facets or sub-constructs
What item response format can you not use in factor analysis?
you cannot use dichotomous item response formats, because they can cause a serious disturbance in the correlation matrix
why do authors suggest having rating scales or likert scales from 5 to 7 points?
with more response options, a greater amount of variance can be captured
Who should you sample for factor analysis?
Heterogeneity is needed, researchers should get a sample that can represent all trait dimensions
one of the reasons for conducting a factor analysis
develop and identify a hierarchical factor structure
Hierarchical factor structure
allows us to statistically identify the items that appear relevant to the construct; it may also identify another area or construct that was not thought of before the items were put together
Major criticism of factor analysis
develop these items on constructs that may or may not have a measurable criterion
the second reason for conducting factor analysis
improving psychometric properties of a test
how to improve psychometric properties of a test?
factor analysis can help developers determine which items to remove, revise, or add in order to improve the internal consistency reliability of the items
all tests with sound items should have a strong?
Internal consistency
with the sample size if the factors are well defined you can use a
smaller sample of between 100-200
The third reason for conducting a factor analysis is developing items that discriminate between samples
some items may be endorsed by certain groups, and then you may need to revise those same items so they are more discriminating for another group
The fourth reason for conducting factor analysis, developing more unique items- decreasing redundancy
having identical items is inefficient; whatever error is present will be associated with both items
Why are short forms good?
more efficient, less time consuming, easier for examinee and assessor
2 primary objections to short form development
1) can the short form give the appropriate information for an appropriate assessment
2) is the short form accurate and valid
General problems for short forms
1) there is an assumption that all the reliability and validity of the long form automatically applies to the abbreviated short form
(due to the reduced coverage, one cannot assume similar reliability and validity)
2) there is an assumption that the new shorter measure requires less validity evidence (the primary problem: when you have fewer items and less content coverage, you compromise the validity of the measure as well)
Empirical evidence of short forms (Smith, McCarthy & Andersen)
Examined 12 short forms to examine equivalence to longer original form,
-found that if large measure does not have good validity, how can a short one?
-by reducing the items, the content coverage may be compromised
-significant reduction in reliability coefficients
-many researchers do not run another factor analysis on short forms
-need to administer short form to an independent sample to determine validity
-need to use short form to classify clinical populations and compare to long form
-need to establish genuine time and money savings with a short form
Item response theory 2 types
difficulty and discriminability
Item Response Theory
a mathematical and statistical tool to determine item quality, to see how items look different based on specific groups or individuals who are a part of a group
Classical testing theory is limited because
all error is lumped together in one term, E (in the formula); we can't determine error at the individual item level
Item Response theory relating to error from Classical Testing theory
allows us to examine error at the item level, using hierarchical mathematical modeling to observe scoring patterns
Two types of item analysis
item difficulty and discriminability
How do we know what a good item is on a test
First we did factor analysis, but there are sometimes problems with this; according to IRT, we examine item difficulty and discriminability
Item difficulty Dichotomous
defined by the proportion of people who get a particular item correct, ex: if 84% of people get item #24 correct, then the difficulty level for that item is .84
Item difficulty levels based on higher or lower Dichotomous
the higher the number the easier the item, the lower the number the harder the item
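The difficulty index for a dichotomous item can be sketched as a proportion (hypothetical response vector, scored 1 = correct, 0 = incorrect):

```python
# Minimal sketch: dichotomous item difficulty is the proportion of
# examinees answering the item correctly.

responses = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]  # hypothetical item responses

difficulty = sum(responses) / len(responses)
print(difficulty)  # -> 0.8 (higher value = easier item)
```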
Item difficulty is based on
Chance
What should item difficulty be set at?
it should be set at a moderate level, with the average item difficulty equal to about .50
When deciding difficulty levels need to consider what
depends on who you are testing, ex: items for medical students should be around .2 vs. .7-.9 for students with disabilities (whose level of skill is limited)
What is the best level of difficulty?
the best tests choose items that are between .3 and .7 in difficulty
Test floor
you should have a sufficient number of easy items (ex: for examinees with disabilities); this tests the floor
Test ceiling
sufficient amount of hard items (for doctoral level students, medical students)
item discriminability Dichotomous
determines whether people who have done well on a particular item have also done well on the entire test
extreme group method
compares people who have done very well with those who have done very poorly on a test
How is discrimination found? Dichotomous
an item that discriminates between the upper group and the lower group is a very good item, because it is able to discriminate between groups
difference between higher and lower numbers for discrimination Dichotomous
the higher the number the more discrimination, the lower the number the less discrimination
overthinking the problem
when there is a negative number in discrimination (high scorers missing an item that low scorers get right)
D= index of discrimination
the number of persons passing the item in the upper and lower groups is expressed as percentages; the difference between those percentages is the index of discrimination
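The index of discrimination can be sketched as follows (hypothetical counts; the function name is invented for illustration):

```python
# Sketch of the extreme-group index of discrimination:
# D = proportion passing in the upper group - proportion passing in the lower group.

def discrimination_index(upper_pass, upper_n, lower_pass, lower_n):
    return upper_pass / upper_n - lower_pass / lower_n

# 15 of the top 20 scorers pass the item, but only 5 of the bottom 20:
print(discrimination_index(15, 20, 5, 20))  # -> 0.5 (item discriminates well)
```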
how do we know it is dichotomous?
whenever we see the word "correct," because dichotomous means right or wrong
Point Biserial Method
find the correlation between performance on the item and performance on the entire test
Point Biserial positive meaning
ranges from -1 to +1; if the number is positive, or closer to +1, it tells us that the item discriminates: those who scored higher on the test also tended to get this particular item correct
Point Biserial negative meaning
ranges from -1 to +1; a negative point biserial indicates that there may be a problem with the item
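The point-biserial method can be sketched as an ordinary Pearson correlation between a 0/1 item vector and total scores, which is equivalent for dichotomous data (function name and data are hypothetical):

```python
from math import sqrt

def point_biserial(item, totals):
    """Pearson correlation between a 0/1 item vector and total test scores."""
    n = len(item)
    mx, my = sum(item) / n, sum(totals) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(item, totals))
    sx = sqrt(sum((x - mx) ** 2 for x in item))
    sy = sqrt(sum((y - my) ** 2 for y in totals))
    return cov / (sx * sy)

# Hypothetical data: the three highest scorers got the item right.
item   = [1, 1, 1, 0, 0]
totals = [48, 45, 40, 30, 25]
print(point_biserial(item, totals))  # positive and near +1: the item discriminates
```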
Point Biserial chart explanation
the chart shows the relationship between an item's difficulty and the proportion of individuals getting it correct
item characteristic curves are
dichotomous and let you know if the item is good
overthinking representation
when the curve rises and then falls on an item characteristic curve
ex: the upper group's curve goes up and then comes back down
we focus on which group?
the upper group for dichotomous
item response function
a mathematical function describing the relation between where an individual falls on the continuum of a given construct such as depression and the probability that he/she will give a particular response to a scale item designed to measure that construct
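The item response function can be illustrated with a common two-parameter logistic form (a sketch only; the discrimination parameter a and difficulty parameter b follow common IRT convention, but exact parameterizations vary by source):

```python
from math import exp

def irf(theta, a=1.0, b=0.0):
    """Probability of a given response at trait level theta (2PL logistic)."""
    return 1 / (1 + exp(-a * (theta - b)))

# At theta == b the probability is exactly .50; a steeper a discriminates more.
print(irf(0.0, a=1.5, b=0.0))   # -> 0.5
print(irf(2.0, a=1.5, b=0.0) > irf(2.0, a=0.5, b=0.0))  # steeper curve separates more
```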
difficulty for non-dichotomous
is symptom severity: L = mild, M = moderate, and U = severe; the farther a curve is from the y-axis, the more severe
Discriminability nondichotomous
means that the item discriminates between individuals that have severe symptoms and mild symptoms
item difficulty curve non-dichotomous
the curve that is furthest from the y-axis is considered the most difficult
item difficulty and discrimination for non-dichotomous will always provide
the mathematical model will always provide a curve to show these
item discrimination curve for non-dichotomous
the curve that has the steepest slope is most discriminating item
advantages of IRT over CTT
IRT can model the probability of getting an item correct based on the test taker's ability and qualities. It can be adapted to computerized administration that presents specific items matched to ability level; IRT lets us better test those at higher and lower abilities; it lets us compare different groups (ethnicities, genders) on the same items to examine patterns of responding; and it allows us to move away from biased questions, with greater accuracy at the item level
generalizability theory is based on what aspects of the test?
the overall test, a new understanding of reliability
Why is generalizability theory moving away from Classical Testing Theory?
to understand how reliability is affected by various sources of error
Classical Testing theory only assumes 2 sources of error
random and systematic error
measurement error
error that is associated with trying to quantify a specific construct or concept
measurement error is associated with 3 errors
procedural error, instrumental error and evaluator error
procedural error
a non-standardized administration; this is not chance based, because the more you practice, the less you will commit this error
instrumental error
error associated with the instrument or the items on the test
evaluator error
any error that is committed by the assessor, one could be making problematic interpretations about the data, or not scoring correctly
measurement error is similar to circumscribed error
in that it tries to account for all possible sources of error
2 components of generalizability theory
generalizability and dependability
generalizability
can we generalize this observed test score to all the possible scores in the universe of scores for that person?
ex: a husband and wife test drove a Prius one time and said it was great; they are generalizing, saying that all Priuses are good
-when testing someone one time, does their observed score represent their true score?
dependability
will the observed score remain constant even if we change the testing parameters
ex: they have a new Prius; it does great in normal weather, but it doesn't work well when it's raining. Will it remain constant in how it drives if the condition of the road changes?
generalizability closer to 1 means
the closer it is to 1, the more confident we are that the observed score can be generalized to all the possible scores for that particular person
dependability closer to 1 means
the closer to 1, the more the observed score will remain constant irrespective of the testing parameters
generalizability theory allows us to look at measurement error, which could include
items on test, raters, setting, assessment, time
ex: setting in a prison, could give different responses
problems with classical testing theory is that they only recognize
two sources of variance (test-retest and internal consistency)
according to generalizability theory, "variance" and "error" in classical testing theory are
synonymous words
how does the generalizability theory extends the true score model?
by acknowledging that multiple factors may affect the error associated with the measurement of one's true score
rater is another way of saying
assessor
Sources of error
noisy room, specific items, examinee fatigue, administrator of the test (some people will have minimal experience, some will have a lot); none of these could be addressed in CTT
Fundamental equation
reliability = Var(T) / Var(X), where Var(X) = Var(T) + Var(E)
the larger the variance of T in relation to X, the higher the reliability
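The fundamental equation can be sketched numerically (the variance values below are hypothetical):

```python
# Numerical sketch of the fundamental equation:
# reliability = Var(T) / Var(X), with Var(X) = Var(T) + Var(E).

def reliability(var_true, var_error):
    return var_true / (var_true + var_error)

print(reliability(80, 20))  # -> 0.8
print(reliability(80, 5))   # smaller error variance, so reliability rises
```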
sources of variance
p = person taking the test, i = items on the test, e = random error, pi = interaction between the person taking the test and the items on the test
the bigger circle on the Venn diagram says what about error
that source contributes more error
adding another source of variance to the Venn diagram
j= judge (evaluator)
pj= person interacting with the judge
ij= item and judge interaction (some judges might favor certain items vs. other items)
pij= interaction with the person taking the test, the items on the test and the judge
Norm oriented perspective
tends to be associated with generalizability coefficients; only uses indices that have p (person) involved
Domain-oriented perspective
associated with the dependability coefficient, and they look at all the indices
whenever you see a T in the formula, what is it equal to?
T is equivalent to P
true score is equivalent to person
What do we use to understand item discriminability with dichotomous scoring
Extreme group & Point Biserial