Psychometrics Flashcards
What determines choice of format in item writing? (2 marks)
Objectives and purposes of test (eg do we want to measure extent/amount of interaction, or quality of interaction)
Difference between objective and purpose of a study?
Purpose - broad goal of research
Objective - how are we practically going to achieve that
List 4 of the 9 item writing guidelines
- clearly define what you want to measure
- generate an item pool (best items are selected after analysis)
- Avoid long items
- Keep the reading difficulty appropriate
- use clear and concise wording (avoid double-barrelled items and double negatives)
- Use both pos & neg worded items
- use culturally neutral items
- (for MCQS) - make all distractors plausible & vary position of correct answer
- (for true/false Qs) - equal numbers of both and make both statements the same length
List the 5 categories of item formats
- Dichotomous
- Polytomous
- The Likert format
- The Category format
- Checklists and Q-sorts
Advantages of the dichotomous format (3 marks)
- easy to administer
- quick to score
- requires absolute judgement
Disadvantages of the dichotomous format (3 marks)
- less reliable (50% chance of correct answer)
- encourages memorization instead of understanding
- often the truth is not black and white (true false is an oversimplification)
Minimum number of options for a polytomous format?
3 (but 4 is commonly used, and considered more reliable)
3 guidelines in writing distractors in the polytomous format
- distractors must be clearly written
- distractors must be plausible as correct answers
- avoid “cute” distractors
Advantages of polytomous questions (4 marks)
- easy to administer
- easy to score
- requires absolute judgement
- more reliable than dichotomous (less chance of guessing correctly)
Formula for correcting guessing
Corrected score = R − W/(n − 1), where R = number right, W = number wrong, and n = number of options per item
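A minimal Python sketch of the correction (function and variable names are illustrative, not from the course):

```python
def corrected_score(num_right: int, num_wrong: int, num_options: int) -> float:
    """Correct a raw test score for guessing: R - W / (n - 1).

    Omitted items count as neither right nor wrong.
    """
    return num_right - num_wrong / (num_options - 1)

# E.g. 60 right, 20 wrong on 4-option MCQs -> 60 - 20/3 ≈ 53.3
print(corrected_score(60, 20, 4))
```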
Fields in which Likert scales are predominantly used (2 marks)
Attitude and Personality questionnaires
How can one avoid the neutral response bias in Likert Scales
have an even number of options
How does one score negatively worded items from a Likert scale
Reverse score
Suggested best no. of options in a category format question?
7
Disadvantages of the category format (2 marks)
- tendency to spread answers across all categories
- susceptible to the grouping of the things being rated (an item may be rated lower if the other items in the group are really good - i.e. not objective)
When best to use category format questions? (2 marks)
- when people are highly involved in a subject (more motivated to make a finer discrimination)
- when you want to measure the amount of something (eg levels of road rage)
Two tips when using the category format
- make sure your endpoints are clearly defined
- use a visual analogue scale (ideal with kids, e.g. a smiley face on one side of the scale and a frowny face on the other to describe how they're feeling)
Where are Checklists format questions commonly found?
Personality measures (e.g a list of adjectives, tick those that describe you)
Describe the process of Q-sort format questions
Place statements into piles, piles indicate the degree to which you think a statement describes a person/yourself
In terms of Item analysis, describe item difficulty and give another name for it
The proportion of people who get a particular item correct (higher value = easier item)
AKA facility index
p = no of correct answers/no of participants
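As a sketch, p can be computed per item from a 0/1 response matrix (the data below are made up for illustration):

```python
import numpy as np

# Rows = participants, columns = items; 1 = correct, 0 = incorrect (toy data)
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
])

# p per item = number of correct answers / number of participants (higher = easier)
p = responses.mean(axis=0)
print(p)  # [0.75 0.75 0.25 0.75]
```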
Ideal range for optimum difficulty level
0.3 - 0.7
How to calculate ODL (optimum difficulty level) for an item
Halfway between 100% and the chance of guessing the answer correctly: ODL = (1 + chance)/2
E.g. for an item with 4 options, ODL = (1 + 0.25)/2 = 0.625
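The same calculation as a small sketch:

```python
def optimum_difficulty(num_options: int) -> float:
    """ODL = halfway between 100% and the chance of guessing correctly."""
    chance = 1 / num_options
    return (1 + chance) / 2

print(optimum_difficulty(4))  # 0.625 (4-option MCQ)
print(optimum_difficulty(2))  # 0.75  (true/false item)
```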
How should difficulty levels range across items in a questionnaire
You want most items around the ODL and a few at the extremes. The distribution of p-values (difficulty levels) should be approximately normal
Why does one need a range of item difficulty levels?
To discriminate between ability of test-takers
List 3 exceptions to having optimum difficulty levels
- need for difficult items (e.g selection process)
- need easier items (e.g special education)
- need to consider other factors (e.g boost confidence/morale at start of test)
p (an item difficulty level) tells us nothing about…
…the intrinsic characteristics of an item. Its value is related to a given sample
Item discriminability is good when…
people who did well on the test overall get the item correct (and vice versa)
Describe the extreme groups method when calculating item discriminability
calculated by looking at proportion of people in the upper quartile who got the item correct minus the proportion of people in the lower quartile who got the item correct
{in other words, the difference in item difficulty when comparing the top and bottom 25%}
Di = U/NU − L/NL (U, L = number correct in the upper/lower quartile; NU, NL = number of people in each quartile)
*Should be a positive number if the item has good discriminability
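A sketch of the extreme-groups index in Python, assuming a 0/1 vector for one item plus the matching total test scores:

```python
import numpy as np

def discrimination_index(item: np.ndarray, totals: np.ndarray) -> float:
    """Di = (proportion correct in top 25%) - (proportion correct in bottom 25%)."""
    upper = item[totals >= np.percentile(totals, 75)]  # upper quartile on the test
    lower = item[totals <= np.percentile(totals, 25)]  # lower quartile on the test
    return upper.mean() - lower.mean()

item = np.array([1, 1, 1, 0, 0, 1, 0, 0])
totals = np.array([40, 38, 35, 30, 28, 36, 20, 18])
print(discrimination_index(item, totals))  # positive = good discriminability
```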
A red flag in item discriminability?
A negative number
Describe the point biserial method when calculating item discriminability
Calculate an item-total correlation
(if test-taker fails the item but does well on the overall test, i-tc will be negative)
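A sketch of an item-total correlation; this uses the common "corrected" variant that removes the item from the total so it is not correlated with itself (a standard refinement, though not specified on the card):

```python
import numpy as np

def item_total_correlation(responses: np.ndarray, item_index: int) -> float:
    """Point-biserial correlation between one item (0/1) and the rest-of-test total."""
    item = responses[:, item_index]
    rest_total = responses.sum(axis=1) - item  # total score excluding this item
    return np.corrcoef(item, rest_total)[0, 1]
```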
Can item-total correlations be used for Likert-type scales and other formats such as category and polytomous formats?
yes
Results from item-total correlations can be used to decide….
which items to remove from the questionnaire
Item characteristic curves (ICCs) are visual depictions of…
the relationship between performance on an item and performance on the overall test
Give the x- and y-axes of an ICC
x-axis = total score on test
y-axis = proportion {of test takers who got the item} correct
3 steps to drawing ICCs
- Define categories of test performance (eg specific total scores/percentages)
- Determine what proportion of people w/in each category got the item correct
- Plot your ICC
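A sketch of the three steps with matplotlib (toy data; the score bins are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

totals = np.array([5, 12, 18, 22, 27, 33, 38])  # total test scores (toy data)
item = np.array([0, 0, 1, 0, 1, 1, 1])          # 1 = got the item correct

# Step 1: define categories of test performance (bins of total scores)
categories = np.digitize(totals, bins=[0, 10, 20, 30, 40])

# Step 2: proportion of people within each category who got the item correct
xs = [totals[categories == c].mean() for c in np.unique(categories)]
ys = [item[categories == c].mean() for c in np.unique(categories)]

# Step 3: plot the ICC
plt.plot(xs, ys, marker="o")
plt.xlabel("Total score on test")
plt.ylabel("Proportion correct on item")
plt.show()
```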
Briefly explain Item Response Theory (IRT)
Test difficulty is tailored to the individual - wrong answer = decrease difficulty, right answer = increase difficulty. Test performance is defined by the level of difficulty of items answered correctly
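The up/down logic can be sketched as a toy loop (purely illustrative; operational IRT systems select items via probabilistic item models, which these cards don't cover):

```python
def run_adaptive_test(answer_item, num_items: int = 10,
                      difficulty: float = 0.5, step: float = 0.1) -> float:
    """Toy adaptive loop: right answer -> harder item, wrong answer -> easier item.

    `answer_item(difficulty)` is a hypothetical callable returning True/False.
    The final difficulty reached stands in for the performance estimate.
    """
    for _ in range(num_items):
        if answer_item(difficulty):
            difficulty = min(1.0, difficulty + step)  # correct: increase difficulty
        else:
            difficulty = max(0.0, difficulty - step)  # wrong: decrease difficulty
    return difficulty

# E.g. a test-taker who can handle anything up to difficulty 0.7:
print(run_adaptive_test(lambda d: d <= 0.7))  # settles around 0.7
```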
Name the program through which Item Response Theory is often administered
Computerized adaptive testing (CAT)
Advantages of Item Response Theory (3 marks)
- increase morale
- quicker tests
- decrease chance of cheating
In terms of measurement precision, name the three types of tests
- Peaked conventional
- Rectangular conventional
- Adaptive
Describe Peaked Conventional tests (3 points)
- Test individuals at average ability.
- Doesn’t assess high or low levels well
- high precision for average ability levels, low precision at either end
Describe Rectangular Conventional tests (2 points)
- equal number of items assessing all ability levels
- relatively low precision across the board
Describe Adaptive tests (2 points)
- test focuses on the range that challenges each individual test-taker
- precision is high at every ability level
Describe criterion-referenced tests
The test is developed based on learning outcomes - compares performance with some objectively defined criterion (What should the test-taker be able to do?)
How does one evaluate items in criterion-referenced tests? And how should the score/frequency graph look
2 Groups - one given the learning unit and one not given the learning unit. Graph should look like a V
List 3 limitations of criterion-referenced tests
- tell you you got something wrong, but not why
- Emphasis on ranking students rather than identifying gaps in knowledge
- Teaching to the test - not to education
What is referred to as the “test blueprint”
The test specifications
List 4 of the 7 things that test specifications should describe
1 Test (response) format
2 Item format
3 Total number of test items (test length)
4 Content areas of the construct(s) tested
5 Whether items or prompts will contain visual stimuli
6 How test scores will be interpreted
7 Time limits
In terms of response format, list 3 ways in which participants can demonstrate their skills
- Selected response (eg Likert scale/MCQ/dichotomous)
- Constructed response (eg essay/fill-in-the-blank)
- Performance response (eg block design task)
In terms of response format, give an example of objective vs subjective formats
Obj - MCQ or Likert
Subj - Essays, projective tests
List 5 types of item response format
- Open-ended - eg open ended essay q (no limitations on the test taker)
- Forced-choice items - MCQS, true/false qs.
- Ipsative forced choice (leads the test-taker into a certain direction, but still somewhat open. e.g I find work from home….)
- Sentence completion
- Performance based items
List the two determinants of test length
- Amount of administration time available
- Purpose of the measure (eg screening vs comprehensive)
When test length increases compliance ….. because people get ….. and …..
decreases; fatigued and bored
How many more items should be in the initial version of the test than the final one?
50%
Having good ….. ensures that all domains of a construct are tested
Content areas
….. refers to the ways in which knowledge or symptoms are demonstrated (and these are therefore tested for)
manifestations
Reliability is the desired ….. or ….. of test scores and tells us about the amount of ….. in a measurement tool
consistency or reproducibility; error
Why is test-retest not always a good measure of reliability?
Participants learn skills from the first administration of the test
Normally we perform roughly around our true score, and so our scores are…..distributed
normally
……is something we can use to increase reliability
internal consistency
Name the 4 classical test theory assumptions (NB to know these)
- Each person has a true score we could obtain if there was no measurement error
- There is measurement error - but this error is random
- The true score of an individual doesn’t change with repeated applications of the same test, even though their observed score does
- The distribution of random errors and thus observed test scores will be the same for all people
List the 2 assumptions of Classical test theory: the domain sampling model
- If we construct a test on something, we can’t ask all possible questions - So we only use a few test items (sample)
- Using fewer items can lead to the introduction of error
Reliability = variance of true score / X
X = ?
X = variance of the observed scores on the test
* this is a logical estimate - not a calculation we can actually do (true scores are unknowable)
An individual's true score is unknowable - but we can calculate the range in which it should fall by taking into account the reliability of the measurement tool, otherwise known as the….
….Standard Error of Measurement (SEM)
SEM = SD × √(1 − r) (r = reliability of the test)
Formula for creating confidence intervals with the SEM
The z-score for a 95% confidence interval = 1.96
Therefore:
Lower bound = x-1.96(SEM)
Upper bound = x+1.96(SEM)
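A sketch combining the two formulas (SD and reliability would come from the test's norms/manual):

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard Error of Measurement: SEM = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval_95(observed: float, sd: float, reliability: float):
    """95% CI for the true score: observed ± 1.96 * SEM."""
    margin = 1.96 * sem(sd, reliability)
    return observed - margin, observed + margin

# E.g. an IQ-style scale with SD = 15 and reliability .91: SEM = 4.5,
# so an observed score of 100 gives a CI of roughly (91.2, 108.8)
print(confidence_interval_95(100, 15, 0.91))
```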
List the 4 types of reliability (and two sub-types of type 4)
- Test-retest rel
- Parallel forms rel
- Inter-rater rel
- Internal consistency
  - split-half
  - coefficient/Cronbach's alpha
Give the name of the correlation between the 2 scores in test/re-test reliability and the source of error in t-rt rel
- the coefficient of stability
- source of error = time sampling
Issues with test-retest rel (3 marks)
- Carry-over effects (attitude or performance at T2 influenced by performance at T1)
- Practice effects
- Time between testing (too little time = remember responses, too much time = maturation)
In Parallel forms reliability, name the correlation between the 2 scores and give the source of error
Name = coefficient of equivalence
Source of error = item sampling
In terms of Parallel forms reliability, list four ways to create a parallel test to give the participant
- response alternatives can be reworded
- order of questions changed
- change wording of question
- different items altogether
Explain Inter-rater reliability (1 mark). Give the names of the correlation between raters' scores (2 marks) and give acceptable ranges of correlation scores
- IRR = how consistently multiple raters agree (more raters = more reliability)
- correlation between 2 raters = Cohen's Kappa; between more than 2 raters = Fleiss' Kappa
- > .75 = excellent agreement
- .50 - .75 = satisfactory
- < .40 = poor
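For two raters, Cohen's Kappa can be computed with scikit-learn; the labels below follow the card's bands (the card leaves .40 - .50 unlabelled):

```python
from sklearn.metrics import cohen_kappa_score

rater_a = [1, 0, 1, 1, 0, 1, 0, 0]
rater_b = [1, 0, 1, 0, 0, 1, 0, 1]

kappa = cohen_kappa_score(rater_a, rater_b)
if kappa > 0.75:
    label = "excellent agreement"
elif kappa >= 0.50:
    label = "satisfactory"
else:
    label = "poor (below .40) or marginal"
print(f"kappa = {kappa:.2f} ({label})")  # kappa = 0.50 (satisfactory)
```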
Describe internal consistency and give the source of error
IC = the extent to which different items within a test measure the same thing
Source of error = item sampling (how homogeneous the items are)
Give one advantage and one disadvantage of split-half reliability
ADV = only need 1 test
DISADV = how do we divide the test into equivalent halves? (the correlation will change each time depending on which items go to each half)
What problem is created by splitting a test in half for split-half reliability?
halving the length of the test also decreases the reliability (domain sampling model says fewer items = lower reliability)
Name the correction used to adjust for the number of items in each half of the test when calculating split-half reliability
Spearman-Brown correction
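The card doesn't give the formula; the standard Spearman-Brown "prophecy" formula is r_new = k·r / (1 + (k − 1)·r), where k is the factor by which the test is lengthened (k = 2 estimates full-test reliability from a half-test correlation):

```python
def spearman_brown(r: float, length_factor: float = 2.0) -> float:
    """Spearman-Brown prophecy: estimated reliability of a test k times as long."""
    return (length_factor * r) / (1 + (length_factor - 1) * r)

# A split-half correlation of .70 implies full-test reliability of about .82
print(spearman_brown(0.70))
```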
What does Cronbach’s/ Coefficient Alpha measure
the error associated with each test item as well as error associated with how well the test items fit together
What level of reliability is satisfactory for Cronbach’s alpha?
≥ 0.70 = exploratory research
≥ 0.80 = basic research
≥ 0.90 = applied scenarios
When does Cronbach’s alpha become unhelpful?
When there are too many items - as this artificially inflates your CA scores
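A sketch of the usual computation, alpha = (k/(k − 1)) × (1 − sum of item variances / variance of total scores), on made-up Likert data:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (participants x items) score matrix."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

scores = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
])
print(cronbach_alpha(scores))  # ≈ .94; compare against the .70/.80/.90 benchmarks above
```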
List 3 factors that influence Reliability
- Number of items in a test
- Variability of the sample
- Extraneous variables (testing situation, ambiguous items, unstandardized procedures, demand effects, etc.)
List 5 ways to improve reliability
- Increase/decrease the number of items
- Item analysis
- Inter-rater training
- Pilot-testing
- Clear conceptualisation
List three things that can affect your Cronbach’s Alpha score
- Number of test items
- Bad test items (too broad, ambiguous, easy etc)
- Multi-dimensionality
Explain the difference between internal consistency and homogeneity
IC = how inter-related the items are
HG = unidimensionality, the extent to which it is made up of only one thing
Name 3 of the 5 ways that Cronbach’s alpha is often described
- The mean of all split-half reliabilities (not accurate)
- A measure of first-factor saturation
- The lower bound of the reliability of a test
- It is equal to reliability in conditions of essential tau-equivalence
- A more general version of the KR coefficient of equivalence
Describe the difference between Cronbach’s Alpha and Standardized Item Alpha
CA -> deals with variance (how much scores vary) and covariance (the amount by which items vary together, i.e. co-vary)
SIA -> deals with inter-item correlation (the correlation of each item with every other item); SIA is derived from CA
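Standardized item alpha can be sketched from the mean inter-item correlation r̄, using alpha_std = k·r̄ / (1 + (k − 1)·r̄):

```python
import numpy as np

def standardized_alpha(scores: np.ndarray) -> float:
    """Standardized item alpha from the mean inter-item correlation."""
    k = scores.shape[1]
    corr = np.corrcoef(scores, rowvar=False)       # k x k inter-item correlations
    mean_r = corr[np.triu_indices(k, k=1)].mean()  # mean of the unique item pairs
    return (k * mean_r) / (1 + (k - 1) * mean_r)
```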
Give an example to illustrate the difference between variance and co-variance
Think about your group of friends – all of you probably ‘fit together’ pretty well
You co-vary a lot: You have a lot of shared variance and little unshared variance
As a group, you are internally consistent
Now think about the PSY3007S class as a whole
There is a fair amount of shared variance between people in the class, but the class probably has a lot more varied people in it than your group of friends
The PSY3007S class therefore has more variance and less covariance than your group of friends
As a class, you are less internally consistent than your group of friends
Why is Cronbach’s Alpha a better measure of reliability than split-half?
SH reliability relies on inter-item covariance, but doesn’t take variance into account. CA takes variance into account, which accounts for error of measurement. CA will therefore be smaller than split-half rel, and is a better estimate
If a test measures only one factor =
If a test measures more than one factor =
unidimensional
multi-dimensional
True or false: the higher the level of Cronbach’s Alpha, the more likely it is that the test is made up of one factor
FALSE
Multi-dimensional tests can have high CA values too. (EG the WAIS-III measures 2 factors - Verbal IQ and Performance IQ, yet it has very good reliability and CA scores)
People assume the above to be true because they confuse the terms INTERNAL CONSISTENCY and HOMOGENEITY
Why do people assume high CA values indicate unidimensionality?
CA measures how well items fit together (the covariance of items).
It makes sense that some people assume that the more covariance items have, the more they should fit together to make up one thing (i.e the more they should measure one factor only).
BUT, internal consistency and unidimensionality are not the same thing!
The question behind Validity is….
is the test measuring what it claims to measure?
Why is validity important? (2 marks)
- Gives meaning to a test score
- Indication of the usefulness of a test
If a test is not valid, then reliability is….
If a test is not reliable then it is…
moot
also not valid
Name the four broad types of validity, and two sub-types of two of the broad types
- Face validity
- Content validity
- Criterion validity - Concurrent & Predictive
- Construct validity - convergent & divergent
Briefly describe face validity and how it is determined
On the surface (its face) the measure seems to measure what it claims to measure.
Determined through a review of the items, not through a statistical analysis
Content validity is the….
…degree to which a test measures an intended content area
How is content validity established?
It is established through judgement by expert judges and statistical analysis such as factor analysis
Name and briefly describe 2 potential errors in content validity
- Construct under-representation: A test does not capture important components of the construct
- Construct-irrelevant variance:
When test scores are influenced by things other than the construct the test is supposed to measure (e.g. test score influenced by reading ability or performance anxiety)
Which needs to be established first; reliability or validity?
Reliability
Criterion validity is…
how well a test score estimates or predicts a criterion behaviour or outcome, now or in future
Name and briefly describe 2 types of criterion validity
Concurrent criterion validity: The extent to which test scores can correctly identify the current state of individuals
Predictive validity: How well does performance on one test predict future performance on some other measure?
In construct validity we look at…
the relationship between the construct we want to measure and other constructs (to what other constructs is it similar or different?)
A construct is …
A hypothetical attribute
Something we think exists, but is not directly measurable or observable (e.g., anxiety)
Name and briefly describe the 2 sub-types of construct validity.
- Convergent validity: scores on a test have high correlations with other tests that measure similar constructs
- Divergent/discriminant validity: scores on a test have low correlations with other tests that measure different constructs
Name and briefly describe 2 factors affecting validity
- Reliability: any form of measurement error can reduce validity (you can have reliability without validity, but the test would then be useless)
- Social diversity: tests may not be equally valid for different social/cultural groups (e.g., a test of superstition in one culture might be a test of religiosity in another)
How does one establish construct validity properly? Briefly describe this method
Multitrait-Multimethod (MTMM) matrix: A correlation matrix which shows correlations between tests measuring different traits/factors, measured according to different methods
List the 4 rules of MTMM
Rule 1: The values in the validity diagonal should be more than 0, and large enough to encourage further exploration of validity (evidence of convergent validity)
Rule 2: A value in the validity diagonal should be higher than the values lying in its column and row, in the heterotrait-heteromethod triangles (HTHM triangles are divergent validity values, validity diagonal values are convergent validity values - conv val must be > div val)
Rule 3: A value in the validity diagonal should be higher than the values lying in its column and row, in the heterotrait-monomethod triangles (HTMM triangles also = divergent val values)
Rule 4: There should be more or less the same pattern of correlations in all the different triangles
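A sketch of assembling and reading an MTMM matrix with pandas (traits, methods, and data are all made up for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Two traits (anxiety, extraversion), each measured by two methods
anxiety = rng.normal(size=n)
extraversion = rng.normal(size=n)
data = pd.DataFrame({
    "anxiety_self": anxiety + rng.normal(scale=0.5, size=n),
    "extraversion_self": extraversion + rng.normal(scale=0.5, size=n),
    "anxiety_observer": anxiety + rng.normal(scale=0.7, size=n),
    "extraversion_observer": extraversion + rng.normal(scale=0.7, size=n),
})

mtmm = data.corr()

# Validity diagonal value (monotrait-heteromethod): same trait, different method
# Rule 1: should be well above 0
print(mtmm.loc["anxiety_self", "anxiety_observer"])

# Heterotrait-heteromethod value: different trait AND different method
# Rule 2: should be lower than the validity diagonal value above
print(mtmm.loc["anxiety_self", "extraversion_observer"])
```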
To establish validity of a new scale, one could…..
correlate it with an already established scale via a MTMM matrix
In the MTMM, the reliability diagonals are:
A) the intercepts of different traits within the same method
B) the intercepts of the same traits across different measures
C) the intercepts of different traits across different methods
D) the intercepts of the same traits within the same method
D
Reliability diagonals are also called…
monotrait-monomethod values
In the MTMM, the reliability diagonals are:
A) the intercepts of different traits measured by the same method
B) the intercepts of the same traits measured by different measures
C) the intercepts of different traits measured by different methods
D) The intercepts of the same traits measured by the same measure
D
Validity diagonals are also called..
monotrait-heteromethod values
In the MTMM matrix, the heteromethod block is made up by the….
Validity diagonal and the triangles (HTHM values)
In the MTMM matrix, the monomethod blocks are made up by the….
Reliability diagonals and the triangles (HTMM values)
MTMM matrix rule 4 interpretation
This allows us to see if the pattern of convergent and divergent validity is about the same
- run through lecture 8/9 slides and see analysis of MTMM matrix
do it
Name 3 approaches to intelligence testing and what they are concerned with respectively
- The psychometric approach (structure of a test, its correlates and underlying dimensions)
- The information processing approach (how we learn and solve problems)
- The cognitive approach (how we adapt to real-world demands)
What are the four common definitions of intelligence
Ability to adapt to new situations
Ability to learn new things
Ability to solve problems
Ability for abstraction
Today, intellectual ability is conceptualized as ….
multiple intelligences
Name two NB intelligence concepts from Binet
- Age differentiation - older children have greater ability than younger children, and mental age can be differentiated from chronological age
- General mental ability - which is the total product of different and distinct elements of intelligence
Name two of Wechsler's contributions to the field of intelligence testing
Intelligence has certain specific functions
Intelligence is related to separate abilities
Name the 4 critiques of Binet's work by Wechsler
- Binet scale was not appropriate for adults
- Non-intellective factors were not emphasized (e.g social skills and motivation)
- Binet did not take into account the decline of performance that should be expected with aging
- Mental age norms do not apply to adults
Briefly explain the difference between fluid intelligence and crystallized intelligence
Fluid intelligence (gf)
Abilities that allow us to think, reason, problem-solve, and acquire new knowledge
Crystallized intelligence (gc)
The knowledge and understanding already acquired
What is the purpose of intelligence testing in children ( 1 mark) and adults (4 marks)?
Children - school placement
Adults - Neuropsychological assessment
Forensic assessment
Disability grants
Work placement
The ….. measure is the gold standard of intelligence testing and provides a measure of ….
Wechsler intelligence tests
Full scale IQ (FSIQ)
List the 2 subscales of FSIQ and the 2 index sections of each of these subscales. Additionally, give a test used to assess each of these 4 categories
Verbal IQ (VIQ):
- Verbal comprehension index (VCI); test = vocabulary
- Working memory index (WMI); test = arithmetic
Performance IQ (PIQ):
- Perceptual organization index (POI); test= picture completion
- Processing speed index (PSI); test = digit-symbol coding
- breakdown of this on lecture 10 slide 10/11
What is one of the most stable measures of intelligence, and the last to be affected by brain deterioration
Vocabulary
Which intelligence test assesses concentration, motivation and memory, and is the most sensitive to educationally deprived/intellectually disabled individuals?
Arithmetic
Which intelligence test measures ability to comprehend instructions, follow directions and provide a response
Information
Which intelligence test measures judgement in everyday situations? Also list 3 types of questions used in it.
Comprehension
- Situational action
- Logical explanation
- Proverb definition
For FSIQ scoring, do raw scores carry meaning?
No. Different subtests have different ranges, and the same raw scores for people of different ages are not comparable. Raw scores are converted to scale scores with set means and SDs
Briefly describe picture completion intelligence tests
A picture in which an important detail is missing
Missing details become smaller and harder to spot
Which intelligence test tests New learning, Visuo-motor dexterity, Degree of persistence and speed of processing information
Digit-symbol coding
In which intelligence test must participants find some sort of relationship between figures?
Matrix reasoning
This measures:
Fluid intelligence
Information processing
Abstract reasoning
Describe the implications of possible relationships between PIQ and VIQ
VIQ = PIQ
If both are low, can provide evidence for intellectual disability
PIQ > VIQ
Cultural, language, and/or educational factors
Possible language deficits (e.g., dyslexia)
VIQ > PIQ
Common for Caucasians and African-Americans
List the 4 personality types according to Hippocrates
- Sanguine
- Phlegmatic
- Choleric
- Melancholic
List the 3 tenets of personality theory today
- Stable characteristics (personality traits, basic behavioural/emotional tendencies)
- Personal projects and concerns - what a person is doing and wants to achieve
- Life story/narrative - construction of integrated identity
What are traits? (3 marks)
- Basic tendencies/predispositions to act in a certain way
- Consistencies in behaviour
- Influence behaviour across a variety of situations
Traits are measured via….
structured personality measures
What are the BIG 5 in terms of traits
Openness
Conscientiousness
Extraversion
Agreeableness
Neuroticism
List 4 structured personality tests
- The Big 5 Test
- The 16 personality factor test
- Myers-Briggs type indicator
- Minnesota multiphasic personality inventory (MMPI)
The Big 5 traits provide a framework for understanding ….
…personality disorders
What is the most widely used objective personality test?
The Minnesota multiphasic personality inventory (MMPI)
List 3 of the 5 ways in which the MMPI is used
- Help develop treatment plans
- Help with diagnosis
- Help answer legal questions
- Screen job candidates
- Part of therapeutic assessment
….. Personality tests assess tenet two of modern personality theory (personal projects and concerns). These tests measure….
Unstructured
…motives that underlie behaviour
3 examples of unstructured personality tests are….
- Thematic Apperception Test (TAT)
- The Rorschach test
- Draw a person test
Describe the process of the Thematic Apperception Test (TAT).
Then list and briefly describe the three major motives put forward by the TAT.
One must make up a dramatic story about ambiguous black and white pictures, describing the feelings and thoughts of characters
- Achievement motive: the need to do better
- Power motive: the need to make an impact on people
- Intimacy motive: the need to feel close to people
List 2 precautions when using personality test cross-culturally
- Constructs must have the same meaning across cultures
- Bias analysis must be done
List 2 solutions to culturally biased tests
- Caution in interpretation
- Cross-cultural adaption of test
In the MTMM, the validity diagonals are:
A) the intercepts of different traits measured by the same method
B) the intercepts of the same traits measured by different measures
C) the intercepts of different traits measured by different methods
D) The intercepts of the same traits measured by the same measure
B