GIGA PRACTICE Flashcards

1
Q

2 main categories of tests

A

Ability tests vs Personality tests

2
Q

Ability test def

A

Measure skills in terms of speed, accuracy, or both.
=> The faster or the more accurate your responses, the better your scores on a particular characteristic.

3
Q

What are the 3 types of ability tests?

A

Achievement, Aptitude and Intelligence tests

4
Q

Achievement test def

A

Measures previous learning.
- E.g. A test that measures or evaluates how many words you can spell correctly is called a spelling achievement test.

5
Q

Aptitude test def

A

Measures potential for acquiring a specific skill.
- A spelling aptitude test measures how many words you might be able to spell given a certain amount of training, education, and experience.

6
Q

Intelligence test def

A

Measures potential to solve problems, adapt to changing circumstances, and profit from experience.

7
Q

Types of personality tests

A

Structured (objective) and Projective tests

8
Q

Structured personality tests def

A

Provide self-report statements that require the subject to choose between two or more alternative responses, such as “True”/“False” or “Yes”/“No”.

9
Q

Reliability def

A

Degree to which test scores are FREE OF MEASUREMENT ERRORS.
-> There are many ways a test can be reliable (e.g., test results may be reliable over time).

10
Q

A psychological test must be (3)

A

(1) Objective: reflect reality - not what we want reality to be
(2) Reliable: provides the same reading each time the instrument is used under the same conditions
(3) Valid: measure what we want to measure

11
Q

How do Psychological Tests differ from Other Measurement Tools? (2)

A

(1) Focus on intangible, theoretical CONSTRUCTS (e.g. psychological attributes), unlike tools measuring physical properties (e.g. rulers, scales).
(2) For most of them, SPECIALIZED KNOWLEDGE is needed for proper interpretation, unlike physical measurements (e.g. with a ruler).

12
Q

Construct def

A

An unobservable, abstract theoretical concept, measured indirectly through behaviours, responses, or test results.
E.g. intelligence, anxiety, self-esteem

13
Q

Defining Characteristics of Psychological Tests (5)

A

(1) Representative SAMPLE of behaviors
(2) OBSERVABLE and MEASURABLE actions
(3) Thought to measure a PSYCHOLOGICAL ATTRIBUTE
(4) Behavioral samples obtained under STANDARDIZED conditions
(5) Have results for SCORING.

14
Q

A construct is hypothesized to explain _________________________________

A

the covariation between observed behaviors

15
Q

Kinds of Purposes for Testing (4)

A

(1) Classification
(2) Promoting Self-Understanding and Self-Improvement
(3) Planning, Evaluation and Modification of Treatments and Programs
(4) Scientific Inquiry (Quantification, Hypothesis testing)

16
Q

Types of scales (4)

A

Nominal, Ordinal, Interval, Ratio

17
Q

Types of Norms (3)

A

(1) DEVELOPMENTAL Norms
(2) WITHIN-GROUP Norms
(3) CRITERION-REFERENCED Norms (norms without a norm sample)

18
Q

Developmental Norms def

A

Typical level of performance in each of the AGE group or grade levels that the test’s target population comprises.
-> Age-equivalent or grade-equivalent scores are assigned based on the MEDIAN RAW SCORE for that chronological age or grade level.
-> Median = TYPICAL score = norm

19
Q

Within-Group Norms (3)

A

(1) Percentiles
(2) Z-scores
(3) Transformed standard scores

20
Q

Standard Deviation def

A

A measure of the average distance of scores from the mean.

21
Q

Transformed Standard Score formula

A

Bz + A
B = desired SD
A = desired Mean
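
To make the formula concrete, here is a minimal Python sketch (the raw scores are invented; the defaults B = 10, A = 50 give T-scores):

```python
import statistics

def transformed_score(raw, mean, sd, desired_sd=10, desired_mean=50):
    """Transformed standard score: B*z + A."""
    z = (raw - mean) / sd          # standard (z) score
    return desired_sd * z + desired_mean

# Invented raw scores; the defaults yield T-scores (M=50, SD=10)
scores = [12, 15, 18, 21, 24]
m, s = statistics.mean(scores), statistics.pstdev(scores)
print([round(transformed_score(x, m, s), 1) for x in scores])
```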

22
Q

Percentiles disadvantages (2)

A

(1) Magnifies differences near mean; minimizes differences at extremes
(2) Some common statistical analyses are NOT possible with percentiles

23
Q

Standard score disadvantages (2)

A

(1) Unfamiliar to many non-specialists
(2) Interpretation difficult when distribution not normal

24
Q

Criterion-Referenced Norms def

A

Evaluate performance relative to an absolute criterion or standard rather than performance of other individuals.
-> An absolute vs relative evaluation

25
Q

Within-Group Norms: Criticisms (2)

A

(1) Only meaningful if the standardization (norm) sample is representative
(2) Within-group comparisons encourage competition

26
Q

Requirement for Criterion-Referenced Norms

A

Define content of domain narrowly and specifically.
E.g. Driving skills, 8th grade math curriculum

27
Q

Criterion-Referenced Norms: Issues (3)

A

(1) Can elements of performance be specifically defined?
-> Hard to clearly define what “good” or “bad” performance looks like.
-> Criterion-referenced norms require a clear standard (e.g., scoring 80% on a test to pass), but creating these standards can be challenging because it’s hard to decide what knowledge or skills are essential.
(2) Focus on minimum standards
-> e.g., “Did you pass?”
-> Ignore how much better one person is compared to others.
(3) Absence of relative knowledge
-> You don’t know how someone performs compared to others.

28
Q

Developmental norms cons

A

Often interpreted inappropriately
-> Overgeneralization, misinterpreting median…

29
Q

What is an elevated score?

A

A score 2 z-scores (i.e., 2 SDs) above the mean

30
Q

Properties of scales (3)

A

(1) Magnitude
(2) Equal Intervals
(3) Absolute 0

31
Q

McCall’s T/T-score

A

Same as standard scores (z-scores), except that M = 50 and SD = 10.

32
Q

Interquartile range

A

Interval of scores bounded by the 25th and 75th percentiles.
-> bounded by the range of scores that represents the middle 50% of the distribution.

33
Q

Stanine system

A

Converts any set of scores into a transformed scale, which ranges from 1 to 9.
M = 5, SD = 2
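
One common way to approximate stanines is the linear conversion 2z + 5, rounded and clipped to 1-9 (a sketch; exact stanines use fixed percentile bands):

```python
def stanine(z):
    """Approximate stanine: 2z + 5, rounded, clipped to 1-9 (M=5, SD=2)."""
    return max(1, min(9, round(2 * z + 5)))

for z in (-2.5, -1.0, 0.0, 0.5, 2.5):
    print(z, "->", stanine(z))   # extreme z-scores are clipped to 1 or 9
```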

34
Q

Overselection

A

Selecting a higher percentage from a particular group than would be expected on the basis of the representation of that group in the applicant pool.

35
Q

Tracking

A

Developmental norms. Tendency to stay at about the same level relative to one’s peers.

36
Q

Big Data

A

Revolution in social science research.
= Data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time.

37
Q

Pearson Correlation Coefficient def

A

QUANTITATIVE description of the DIRECTION and STRENGTH of a straight-line relationship between 2 variables.
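
A quick illustration with invented paired scores, using numpy's corrcoef for Pearson's r:

```python
import numpy as np

# Invented paired observations (e.g., test score and criterion score)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = np.corrcoef(x, y)[0, 1]   # Pearson correlation coefficient
print(round(r, 3))            # close to +1: strong positive linear relationship
```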

38
Q

Correlation Coefficient Range

A

-1 to 1

39
Q

We cannot use Pearson’s r for _____

A

Non-linear relationships
-> Non-linear relationships cannot be described, regardless of their strength.

40
Q

Classical Test Theory (CTT): Assumptions (4)

A

(1) Each person has a true score that would be obtained if there were no errors in measurement. Observed test score (X) = True test score (T) + Error (E)
(2) Measurement errors are random
(3) Measurement error is normally distributed
(4) Variance of OBSERVED scores = Variance of true scores + Error variance
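
A small simulation sketch of assumptions (1), (2), and (4), with invented parameters (true scores ~ N(100, 15), errors ~ N(0, 5)); it also shows the reliability ratio defined in card 47:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

true = rng.normal(100, 15, n)   # T: true scores
error = rng.normal(0, 5, n)     # E: random, mean 0, uncorrelated with T
observed = true + error         # X = T + E

# Variance of observed scores ~= variance of true scores + error variance
print(observed.var(), true.var() + error.var())

# Reliability = true-score variance / observed-score variance (~ 225/250 = .90)
print(true.var() / observed.var())
```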

41
Q

A person’s true score def

A

The hypothetical or ideal measure of a person’s attribute we aim to capture with a psychological test.
=> FREE FROM ERROR
Expected score over an INFINITE number of independent administrations of the test

42
Q

Mean error of measurement = ____
Errors are ____ with each other
True scores and errors are _______

A

0; UNcorrelated; UNcorrelated

43
Q

Two tests are parallel if: (3)

A

(1) EQUAL observed score MEANS
-> Comes from the assumption that True scores would be the same
(2) EQUAL ERROR VARIANCE
(3) SAME CORRELATIONS with other tests

44
Q

Random error characteristics (3)

A

(1) Random
(2) Cancels itself out
(3) Lowers reliability of the test

45
Q

Systematic error characteristic

A

Occurs when source of error always increases or decreases a true score
-> DOESN’T LOWER RELIABILITY of a test since the test is RELIABLY INACCURATE by the same amount each time

46
Q

Sources of Measurement Error (3)

A

(1) CONTENT Sampling Error
(2) TIME Sampling Error
(3) Other Sources of Error (e.g. observer differences)

47
Q

Reliability Coefficient def

A

Proportion of OBSERVED test scores accounted for by variability in TRUE scores.

48
Q

Standard Error of Measurement (SEM) def

A

Amount of uncertainty/error expected in an individual’s observed test score.
=> Corresponds to the SD of the distribution of scores one would obtain by repeatedly testing a person.
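
A minimal sketch of the standard SEM formula, SEM = SD * sqrt(1 - reliability), with invented values:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# E.g., an IQ-style scale (SD = 15) with reliability .90:
print(round(sem(15, 0.90), 2))   # ~4.74 points of expected error
```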

49
Q

Spearman-Brown formula def

A

Predicts the effect of lengthening or shortening a test on reliability.
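
A minimal sketch of the formula, r' = nr / (1 + (n - 1)r), where n is the factor by which test length changes:

```python
def spearman_brown(r, n):
    """Predicted reliability when test length changes by factor n."""
    return n * r / (1 + (n - 1) * r)

print(round(spearman_brown(0.70, 2), 2))    # doubling a test with r = .70 -> .82
print(round(spearman_brown(0.70, 0.5), 2))  # halving it -> .54
```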

50
Q

Test reliability is usually estimated with what methods? (4)

A

(1) Test-retest
(2) Alternate (Parallel) Forms
(3) Internal consistency
(4) Interrater agreement

51
Q

Test-Retest method is an example of ____ sampling

A

time
-> Higher when construct being measured is expected to be STABLE than when construct expected to CHANGE

52
Q

Alternate (Parallel) Forms method is an example of ____ sampling

A

item

53
Q

How High Should INTERNAL CONSISTENCY Coefficients Be? (*don’t confuse with other coefficients)

A

Higher for “narrow” constructs
Lower for “broad” constructs
-> Very high may indicate insufficient sampling in the domain
E.g. Medium internal consistency is bad for a narrow construct (panic disorder), but not so bad for a broad construct (Neuroticism)

54
Q

What’s the older approach used to estimate the internal consistency of a test?

A

Split-half method

55
Q

What’s the contemporary approach used to estimate the internal consistency of a test?

A

CRONBACH’S ALPHA = AVERAGE OF ALL POSSIBLE SPLIT-HALF RELIABILITIES
Unaffected by how items are arranged in the test
-> Most general method of finding estimates of reliability through internal consistency.
(Kuder-Richardson also a possibility)

56
Q

Kappa formula

A

Interrater Agreement
Proportion of the potential agreement following CORRECTION FOR CHANCE.
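
A minimal sketch of the computation, kappa = (observed agreement - chance agreement) / (1 - chance agreement), with invented proportions:

```python
def cohens_kappa(p_observed, p_chance):
    """Kappa: proportion of potential agreement after correcting for chance."""
    return (p_observed - p_chance) / (1 - p_chance)

# E.g., two raters agree on 80% of cases, 50% agreement expected by chance:
print(cohens_kappa(0.80, 0.50))   # 0.6
```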

57
Q

Domain Sampling Model conceptualizes reliability as the ratio of the variance of the observed score on the _____ test and the variance of the _______.

A

shorter, long-run true score

58
Q

Test-Retest Method: Problems

A

CARRYOVER EFFECTS: Occurs when the first testing session influences scores from the second session.

59
Q

When there are carryover effects, the test-retest correlation usually ________ the true reliability.

A

OVERESTIMATES
-> This can happen because the participant REMEMBERS items or patterns from the first test, so their performance on the second test is less independent than it should be.

60
Q

What method provides one of the most rigorous assessments of reliability commonly in use?

A

Parallel Forms Method

61
Q

Problems with Split-Half method (2)

A

(1) The two halves may have different variances.
(2) The split-half method also requires that each half be scored separately, possibly creating additional work.

62
Q

KR20 Formula

A

Equivalent of alpha for dichotomous tests (e.g. right/wrong items)
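
A minimal sketch of KR-20 on invented right/wrong (1/0) data; it mirrors alpha but uses p*q as each item's variance:

```python
import numpy as np

def kr20(items):
    """KR-20: alpha for dichotomous (0/1) items, using p*q as item variance."""
    items = np.asarray(items, dtype=float)   # rows = persons, cols = items
    k = items.shape[1]
    p = items.mean(axis=0)                   # proportion scoring 1 on each item
    pq_sum = (p * (1 - p)).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - pq_sum / total_var)

data = [[1, 1, 0, 1],
        [0, 0, 0, 1],
        [1, 1, 1, 1],
        [0, 1, 0, 0],
        [1, 0, 1, 1]]
print(round(kr20(data), 2))
```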

63
Q

Sources of measurement error: (3)

A

(1) Time sampling: The same test given at different points in time may produce different scores, even if given to the same test takers.
(2) Item sampling: The same construct or attribute may be assessed using a wide pool of items.
(3) When different observers record the same behavior: Different judges observing the same event may record different numbers.

64
Q

How do we assess measurement error associated with item sampling?

A

Parallel forms, Internal consistency

65
Q

What to Do about Low Reliability? (3)

A

(1) Increase the # of Items
(2) Throw out items that run down the reliability (by running a factor/discriminability analysis)
(3) Estimate what the true correlation would have been (CORRECTION FOR ATTENUATION)
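
A minimal sketch of the standard correction-for-attenuation formula, r_true = r_xy / sqrt(r_xx * r_yy), with invented coefficients (this is also the correction used in cards 84-85):

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy=1.0):
    """Estimated true correlation: r_xy / sqrt(r_xx * r_yy).
    Pass r_yy=1.0 to correct for unreliability in the test only."""
    return r_xy / math.sqrt(r_xx * r_yy)

print(round(correct_for_attenuation(0.40, 0.70), 2))        # test only -> .48
print(round(correct_for_attenuation(0.40, 0.70, 0.60), 2))  # test + criterion -> .62
```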

66
Q

Kappa stat range

A

0-1.
Kappa = 0 is considered poor -> means the agreement is basically by chance.
Kappa = 1 represents perfect, complete agreement.

67
Q

When random error is HIGH on both tests, the correlation between the scores will be _____ compared to when the random error is ___.

A

lower; small

68
Q

Difference Score def

A

Subtracting one test score from another
-> two different attributes

69
Q

Why are difference score unreliable?

A

Difference scores are unreliable because the random error from both scores is compounded and the true score is cancelled out.

70
Q

What do we mean when we say that “Validity is NOT a yes/no decision”

A
  • It comes in degrees and applies to a particular USE and a particular POPULATION
  • It is a process: An ongoing, dynamic effort to accumulate evidence for a sound scientific basis for proposed test score interpretations
71
Q

3 Types of Validity

A

Content, Criterion, Construct

72
Q

Subtypes of Criterion validity

A

Concurrent, Predictive

73
Q

Subtypes of Construct validity

A

Convergent, Divergent

74
Q

A test with high face validity may: (3)

A

(1) Induce cooperation and positive motivation before and during test administration
(2) Reduce dissatisfaction and feelings of injustice among low scorers
(3) Convince policymakers, employers, and administrators to implement the test
-> but sometimes a test with low face validity elicits more honest responses

75
Q

Types of criteria

A

Objective & Subjective criteria

76
Q

Objective criterion

A

Observable and Measurable
E.g., Number of accidents, days of absence

77
Q

Subjective criterion

A

Based on a person’s judgement
E.g., Supervisor ratings, peer ratings

78
Q

What happens if the criterion measures FEWER dimensions than those measured by the test?

A

This decreases the evidence of validity based on content because some important characteristics have been underrepresented

79
Q

Criterion contamination def

A

Occurs when the criterion measures MORE dimensions than those measured by the test

80
Q

Validity coefficient def

A

Relationship between a test and a criterion.
Correlation between test and criterion
-> Tells the extent to which the test is valid for making statements about the criterion.

81
Q

Validity coefficient: range

A

Correlation. So between -1 and 1

82
Q

Validity coefficients are rarely greater than ____

A

r = .60

-> If higher than that, the test may simply be an alternative measure of the criterion

83
Q

Factors Limiting Validity Coefficients (3)

A

(1) Restricted Range of Scores (diminishes the test score & criterion score correlation)
(2) Unreliability of Test Scores
(3) Unreliability in Criterion

84
Q

How we deal with a test that’s not reliable (validity wise)

A

Correction for attenuation - the validity coefficient we would obtain if we had perfect reliability of test scores

85
Q

How we deal with a test that’s not reliable AND a criterion that’s not reliable (validity wise)

A

Correction for attenuation - Correcting for unreliability in test (predictor) & criterion

86
Q

How to gather Evidence of Construct Validity (2)

A

(1) Gather Theoretical evidence
(2) Gather Psychometric evidence

87
Q

Explain how we gather THEORETICAL evidence of construct validity (2)

A

(1) Establish nomological network - identifying all possible relationships
(2) Based on this theoretical work → Propose experimental hypotheses
-> If what we think is true, what would be the evidence to support this relationship

88
Q

Nomological Network consists of (3)

A

(1) Constructs (e.g. job satisfaction)
(2) Their observable manifestations (e.g. smiles, productivity, positive feedback)
(3) The relations within and between constructs and their observable manifestations (e.g. positive feedback related to productivity)

89
Q

Explain how we gather PSYCHOMETRIC evidence (6)

A

(1) Content validity
(2) Criterion validity
(3) Reliability of the test
(4) Experimental interventions
(5) Convergent evidence of validity
(6) Discriminant evidence of validity

90
Q

Evidence of validity based on content (2)

A

(1) No construct underrepresentation: Does the test sample adequately from the construct domain?
(2) No irrelevant construct representation: Does the test properly exclude content that is unrelated to the construct?

91
Q

Evidence of validity based on reliability of the test

A

E.g. test-retest/internal consistency not too low or too high given the construct

92
Q

Gathering psychometric evidence: Convergent Validity (2)

A

Extent to which two measures that are supposed to be related are actually correlated
When test scores correlate with:
(1) Other measures of the SAME construct, or
(2) Measures of constructs to which the test should be related based on theory (think nomologic net)

93
Q

Problems with Content validity (3)

A

(1) Educational setting: content validity has been of greatest concern in educational testing (a score on the test should represent comprehension of the subject), BUT many factors can limit performance on a test
(2) Unclear boundaries: hard to separate types of validity
-> It’s often hard to separate “content coverage” (content validity) from whether the test actually measures the underlying concept (construct validity), leading to blurred boundaries.
(3) Doesn’t consider the relationship of the construct with external variables/constructs

94
Q

Construct-irrelevant variance

A

CONTENT validity. Occurs when scores are influenced by factors irrelevant to the construct.

95
Q

Several issues of concern when interpreting validity coefficients (9)

A

(1) Not all validity coefficients have the same meaning
(2) The conditions of a validity study are never exactly reproduced. E.g. If you take the GRE to gain admission to graduate school, the conditions under which you take the test may not be exactly the same as those in the studies that established the validity of the GRE.
(3) Criterion-related validity studies mean nothing UNLESS the criterion is valid and reliable.
(4) Validity study might have been done on a population that does not represent the group to which inferences will be made.
(5) Be sure the sample size was adequate
(6) Never Confuse the Criterion with the Predictor (GRE & success in grad school example)
(7) Check for Restricted Range on Both Predictor and Criterion: Correlation requires that there be variability in both the predictor and the criterion.
(8) Review Evidence for Validity Generalization (may not be generalized to other similar situations)
(9) Consider Differential Prediction: Predictive relationships may not be the same for all demographic groups.

96
Q

Differential Prediction

A

Predictive relationships may NOT be the same for all demographic groups.
-> The validity for men could differ in some circumstances from the validity for women.
-> Under these circumstances, separate validity studies for different groups may be necessary.

97
Q

MTMM acronym

A

Multitrait-Multimethod Matrix

98
Q

What does Method variance represent?

A

SYSTEMATIC error
Characteristics of the method that influence how respondents answer, over and above the attribute we want to measure

99
Q

Test score variance is composed of ____ (3)

A

True score variance + Method variance + Random error

100
Q

TYPES OF VARIANCE IN MTMM

A

TRAIT, METHOD, IRRELEVANT

101
Q

If everything is good, we’re looking for: (3) variance

A

(1) High “trait variance”
(2) Low “method variance”
(3) Low “irrelevant variance”

102
Q

Irrelevant variance def

A

Variance shared with theoretically unrelated measures

103
Q

MTMM: Regions (6)

A

(1) Monomethod block
(2) Monotrait-monomethod values
(3) Heterotrait-monomethod triangle
(4) Heteromethod block
(5) Monotrait-heteromethod values
(6) 2 Heterotrait-heteromethod triangles

104
Q

What is the Reliability diagonal?

A

MONOtrait-MONOmethod values (in the monomethod block).
Tell how reliably each construct (A, B, C) can be measured with each method.

105
Q

What is the Validity diagonal?

A

Monotrait-heteromethod values
Tell how well a construct is measured using different methods.
-> CONVERGENT validity coefficients

106
Q

What represents the discriminant validity in the MTMM matrix?

A

Heterotrait-heteromethod & Heterotrait-monomethod

107
Q

IRT’s desirable objectives (2)

A

(1) Administer SHORTER measures
(2) Compare scores across: DIFF measures of the SAME constructs in DISTINCT groups

108
Q

Limitations of CTT (3)

A

(1) Adding/deleting items changes the true score (the true score is TEST-DEPENDENT, so comparison is not possible across different test forms)
(2) True score is interpretable ONLY in reference to the NORM sample’s distribution of scores: SAMPLE-DEPENDENT
(3) Reliability of the true score is a function of the items used: assumes all items are EQUALLY reliable, measure the SAME RANGE of scores, and that reliability is CONSTANT across scores

109
Q

Item Response Theory (IRT) Assumptions (4)

A

(1) True score defined on the LATENT trait dimension rather than observed score
(2) Knowing the PROPERTIES OF the ITEMS a person endorses tells us the TRAIT LEVEL the person possesses
(3) Properties of an item do NOT change if we were to administer the item using different samples
(4) True score of the person does NOT change regardless of which sets of items we administer.

110
Q

Variables in Item Response Function

A

Y = Probability of item endorsement (“yes”), reflecting HOW MUCH OF THE TRAIT YOU POSSESS => bounded by 0 and 1 (a probability)
X = Theta (latent trait) - e.g. entire range of math level
Theta is a CONTINUUM (from -infinity to +infinity)
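
A minimal sketch of a 2-parameter logistic item response function, one common IRT form (the item parameters below are invented):

```python
import math

def item_response_probability(theta, a, b):
    """2PL item response function: P(endorse) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# Invented item: difficulty b = 1.0, discrimination a = 1.2
for theta in (-2, 0, 1, 2):
    print(theta, round(item_response_probability(theta, a=1.2, b=1.0), 2))
# At theta == b the probability is exactly .50 (the definition of difficulty)
```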

111
Q

Theta def + values

A

Entire range of latent trait.
=> CONTINUUM (from -infinity to +infinity)
=> Negative values = LOW levels
=> Positive values = HIGH levels

112
Q

What are item characteristics/parameters?

A

Item DIFFICULTY & Item DISCRIMINATION

113
Q

ICC: In the middle of the curve, ____ changes in theta correspond with ___ changes in probability

A

small; large

114
Q

Item Difficulty def

A

b
The point in theta (X axis) where probability of endorsing an item is 50%.
=> To find it, start by checking 0.5 in the Y axis
=> Then find the level of theta (X) that corresponds to the item difficulty

115
Q

Item difficulty typically range between ______

A

-2 and +2
(-/+2 = arbitrary z-score)

116
Q

Item difficulty:
=> NEGATIVE difficulties = _____
=> POSITIVE difficulties = ______

A

Items are “EASIER”, more frequently endorsed (doesn’t take much of the trait level to endorse);
Items are more “DIFFICULT”, less frequently endorsed

117
Q

Item difficulty: What does it mean if Theta > b

A

Items more likely to be endorsed
=> When theta level is HIGHER than difficulty of the item

118
Q

Item difficulty: What does it mean if Theta < b

A

Items less likely to be endorsed
=> When level of underlying trait LOWER than item difficulty

119
Q

Theta = b

A

= 50%; item difficulty

120
Q

Item Discrimination

A

a
Value of the slope at the STEEPEST point of the curve, i.e., at b (where P = 50%)
-> Point on the curve where the increases in Y are the highest.
To find it: find theta at the item difficulty; this is the point where the slope is most elevated
=> The steeper the line, the closer it is to VERTICAL.

121
Q

Item Discrimination tells us ________

A

at which levels of theta the item is most likely to differentiate best
=> Discriminates levels of theta

122
Q

Discrimination typically ranges between _____

A

.5 and 1.5

123
Q

Items would be most effective in measuring underlying trait at the level that correspond with _______.

A

item difficulty
→ Hard questions are more effective at measuring high levels of the trait.

124
Q

Item difficulty → Location on the latent trait where information is _____
Item discrimination → _____ an item provides

A

MAXIMIZED; HOW MUCH INFO

125
Q

When talking about Test Information Curve (TIC), we’re talking about Validity or Reliability? Why?

A

We’re talking about RELIABILITY (NOT VALIDITY)
Because it focuses on how precisely a test measures the latent trait ACROSS DIFFERENT LEVELS OF THAT TRAIT.
=> THE HIGHER THE CURVE, THE BETTER YOUR ASSESSMENT OF THE TRAIT (mountain)

126
Q

In IRT, SEM is different for different latent trait values; how is that different from CTT?

A

CTT: 1 reliability score for the entire set of items
IRT: 1 item = 1 reliability coefficient; measurement error is NOT equal across the entire range of theta

127
Q

How does IRT Help us Improve Psychological Tests? (4)

A

(1) IDENTIFY item characteristics (i.e., difficulty, discrimination)
(2) CHOOSE items with higher discrimination covering the entire range of the latent continuum
(3) INCREASE RELIABILITY with fewer items
(4) COMPARE items across DIFF MEASURES of the SAME CONSTRUCT + compare group differences

128
Q

Differential Item Functioning (DIF) examines ______

A

Whether scales and items function differently across different discrete groups.
-> Occurs when groups (such as defined by gender, ethnicity, age, or education) have different probabilities of endorsing a given item (controlling for overall score)

129
Q

Differential Item Functioning (DIF) occurs when _________________

A

individuals from diff groups who have EQUAL levels of the UNDERLYING TRAIT, have diff probabilities of endorsing or agreeing with an item.

130
Q

DIF analysis helps determine if items are ____ by _____________.

A

fair; examining group differences in responses while controlling for the trait level

131
Q

Questionnaire characteristics (3)

A

(1) Written series of questions
(2) Structured stimuli (i.e. questions)
(3) Structured responses (i.e. response format)

132
Q

Questionnaires Advantages (4)

A

(1) Presentation of stimuli is well controlled
(2) Scoring highly reliable
(3) Efficient to administer to large numbers
(4) Inexpensive

133
Q

Dichotomous formats are often seen in ______ tests

A

Personality, e.g. MMPI

134
Q

Compared to other formats, the dichotomous format is ________

A

less reliable

135
Q

Polytomous Format examples (4)

A

(1) Likert Format (e.g. agree/disagree)
(2) Category Format (e.g., rating pain on a scale of 1–10); Visual Analogue Scale, Rating scale
(3) Checklists (select multiple items from a list that apply to them)
(4) Q-Sorts (rank or categorize items into multiple predefined groups)

136
Q

What’s the problem with rating scales formats?

A

Number of points? How many options?
More options = more variability
At what point is it too much?
Middle point? Acknowledges that people might NOT have an opinion, but can be an easy way out
Often 10 points are used; between 4 and 7 is good.

137
Q

Forced Choice Formats

A

Person is presented with 2 to 4 stimuli and asked to choose among them.

138
Q

Q-sort Formats

A

Forced distribution of items into categories
E.g. Give the person a list of 100 characteristics and ask them to group the characteristics according to how much each characteristic is like the person.

139
Q

Four Steps in the Question-Answer Process

A

(1) Comprehension: Attending to questions and instructions
(2) Retrieval: Retrieval of relevant information
(3) Judgment: Integration of retrieved information
(4) Response: Mapping the judgment on the response category

140
Q

Issues for Questionnaires (3)

A

RESPONSE SET: Tendency for people to respond to questions in a way that paints a certain picture of themselves instead of providing honest answers
(1) Acquiescence = Tendency to agree, say true, say often.
(2) Social desirability = Tendency to present self in a socially favorable manner
(3) Random responding = Ignoring or paying insufficient attention to item content

141
Q

How can we combat Acquiescence bias?

A

Use reverse-score items

142
Q

What are the 2 components of Social desirability?

A

Impression management & Self deception

143
Q

How can we combat social desirability? (3)

A

(1) Measure influence: assess discriminant validity
(2) Marlowe-Crowne social desirability scale
(3) Change response format (forced choice; Q-sort)

144
Q

How can we detect random responding? (4)

A

(1) INSTRUCTED response items: Ask for a specific answer (e.g. “choose strongly disagree”)
(2) BOGUS items: Ask about impossible or improbable scenarios (e.g. “I was born before 1920”)
(3) SELF-REPORT items: Ask participants about their care and engagement DURING the survey
(4) RESPONSE TIME: Computed after data collection but must be considered before starting

145
Q

Writing Good items (12)

A
  • Single idea per item stem
  • Write each item in a clear and direct manner
  • Avoid long items
  • Avoid double negatives
  • Reading level appropriate for intended test-takers
  • Avoid slang or colloquial language
  • Make all items independent
  • Ask someone else to review items to reduce ambiguity and inaccuracies
  • Make all responses similar in length and detail
  • Make sure the item has only one best answer
  • Avoid words such as “always” and “never”
  • Avoid overlapping responses
146
Q

Correspondence Strategy was dominant in the _______

A

early 20th century

147
Q

Assumptions in Correspondence Strategy (4)

A

(1) Each item corresponds to a specific construct
(2) Item has COMMON MEANING for all test-takers
(3) A test-taker is able to accurately assess the requested information
(4) A test-taker will honestly report requested information

148
Q

Empirical Strategy def

A

Items selected on basis of relations to external criteria. (e.g., contrasted groups)

149
Q

In the empirical strategy, the meaning of an item is NOT equal to _________. It is determined by ______

A

the verbal content of the item; groups who endorse the item.
=> Interpretation of scores is by ‘cookbook’: EMPIRICALLY KNOWN CORRELATES of high and low scores

150
Q

Concerns in Empirical strategy (3)

A

(1) Unintended group differences
(2) Problem of generalization
(3) Item overlap

151
Q

Construct (Theoretical) Strategy originated in the ________

A

1950s; prominent from the 1960s to the present.

152
Q

Assumptions in Construct strategy (3)

A

(1) A person possesses some degree of a construct (e.g., sociability)
(2) Nontest behaviours can be identified which are referents (indicators) for the construct
(3) Test responses are referents (indicators) for the construct.

153
Q

How does the construct strategy evaluates the adequacy of a test?

A

Evaluates adequacy of test by how well test fits in with theoretical (nomologic) net for the construct

154
Q

7 Typical Steps in Theoretical Scale Construction Approach

A

(1) Define construct: Consider the literature for definition + theoretical relations with other constructs
(2) Gather/Write items
(3) Evaluate content validity: expert judgment to know if items are relevant
(4) Pre-testing of items: Administer initial pool to a small sample and conduct cognitive interviews
(5) Item reduction: Consider endorsement rate for items.
(6) Factor analysis: Determine the optimal number of factors underlying item response patterns
(7) Scale evaluation: test of dimensionality, reliability and validity

155
Q

Content of personality: (3)

A

Behavior, Affect, Cognition

156
Q

Lexical tradition of personality

A

Most important traits are represented by single words. The origin of the NEO-PI relies on the lexical tradition.

157
Q

Lexical hypothesis

A

If an idea is important to people, they’ll have a word that expresses this concept.
-> The more important the concept, the more words exist for it

158
Q

NEO: Construction (history)

A
  • 1978: Included only 3 factors - N, E, O (no scales for A and C) and 18 facets
  • 1985: A and C added: First NEO-PI
  • 1992 manual: Facet scales available for all factors + Included the short version (NEO-FFI) + Rational Scale Construction (supported by factor analysis)
159
Q

NEO-PI-R - Psychometric properties: Internal consistency (traits vs facets)

A

Traits: .86-.92
Facets: .56-.81 -> Because fewer items measure each facet (normal range)

160
Q

NEO-PI-R - Psychometric properties: Test-retest reliability

A

High, but a bit weaker as the time interval extends.
- 3 month → .75-.83
- 6-year N,E,O → .68-.83
- 3-year A,C → .63 & .79

161
Q

NEO-PI-R - Psychometric properties: Convergent Validity

A

Self-spouse agreement (2 forms of the NEO-PI-R: Self and Other rated)
N,E,O,A,C → .60, .73, .65, .62, .63: Moderate to large convergent validity

162
Q

NEO-PI-R - Psychometric properties: Discriminant Validity

A

Sometimes scales are NOT independent (C & A)!

163
Q

Latest version of the NEO? Date & characteristics

A

NEO-PI-3. Published in 2005.
- 240 items, description of behaviors rated on 5-point scale (strongly disagree to strongly agree)
- Age range: 14-99 (norms for adolescents)

164
Q

In NEO-PI, raw scores converted to ______

A

T-scores (M=50, SD=10)

165
Q

Application of NEO-PI (2)

A

(1) Mostly research on basic personality
(2) Limited usefulness in clinical or other applied settings

166
Q

NEO-PI issues (2)

A

(1) Acquiescence: Tendency to agree with statements
-> Half the items are reverse-keyed
(2) Social desirability: Tendency to portray self in a socially desirable way - Construct validity problem

167
Q

What’s NEO-PI’s recommendation to combat acquiescence bias?

A

If more than 150 items are ‘agree’ or ‘strongly agree’, the profile must be interpreted with caution.

168
Q

Current MMPI version = ___

A

MMPI-3 (2020)

169
Q

General steps in Development of most clinical scales

A

Empirical approach
Choose 2 groups.
- Administer item pool to large group: psychiatric & normative
- Select a diagnostic group
- Compare endorsement for each item of selected group to normative group

170
Q

Scales: MMPI (10)

A

Scale 0: Social introversion
Scale 1: Hypochondriasis
Scale 2: Depression
Scale 3: Hysteria
Scale 4: Psychopathic Deviate
Scale 5: Masculinity-Femininity
Scale 6: Paranoia
Scale 7: Psychasthenia
Scale 8: Schizophrenia
Scale 9: Hypomania

171
Q

Validity scales in MMPI (6)

A

(1) L(Lie) Scale: Endorse too few items which express common frailties - SOCIAL DESIRABILITY BIAS (denying common human weaknesses)
(2) F Scale (Infrequency scale): Endorse items which few people endorse
(3) K Scale (Defensiveness Scale): Denial of more subtle, personal, or psychological difficulties that may be less obvious but still significant.
-> More about defensiveness—the person may be hiding or minimizing psychological problems or discomfort.
-> defensively hiding emotional or personal struggles.
(4) new - FB (Back F / infrequency back): infrequent responding in the 2nd half of the test
(5) new - VRIN (Variable Response Inconsistency): Assesses random responding (if they don’t answer similar questions in a similar way → random responding)
(6) new - TRIN (True Response Inconsistency): Acquiescence bias (pairs of items with opposite content => should receive different responses)

172
Q

Interpretation MMPI scores before vs after

A
  • Before: 70+ = may have clinical significance; Today: 65
  • Before: look at any elevation in any of the scale (1 most elevated scale); now: Interpret scores in multiple subscales
173
Q

MMPI-2 (4)

A

(1) Re-standardization: More appropriate normative sample
(2) Updated item content (567 items)
(3) Same clinical scales but 5 and 0 not psychopathology
(4) 3 new validity scales (FB, VRIN, TRIN)

174
Q

MMPI also have ________ (~60) based on _______

A

Content scales (measuring particular constructs); Rational test construction (Based on judgment of what items seem to be measuring)
=> E.g. anxiety, alcoholism scale, Obsessiveness, Family problems, Negative treatment indicators

175
Q

Sample methods of projective personality tests (3)

A
  • Perceptions of inkblots
  • Telling stories about pictures
  • Completing sentence stems
176
Q

Projective test Assumptions (3)

A
  • Responses to ambiguous stimuli are determined by personality characteristics
  • Reveal characteristics beneath the surface (bypass defences, unaffected by social desirability/context)
  • Provide broad coverage of personality characteristics
177
Q

Rorschach - Administration

A

1st phase = Free association phase
- Presents 10 cards one by one; “what might this be?”
2nd phase = Inquiry phase
- Examiner inquires about the responses

178
Q

What are the 2 categories of scoring for the Rorschach test?

A
  • Informal: Interpretation of content, e.g. odd uses of words; thematic patterns - examiner searches for anything that stands out
  • Formal scoring: 5 dimensions
179
Q

What are the formal scoring dimensions of Rorschach test? (5)

A

(1) Location: part of the inkblot the individual focuses on when giving their response (e.g., a whole blot, a specific detail)
(2) Determinant: specific feature or characteristic of the inkblot that influenced the person’s response
(3) Form quality: how well the respondent’s perception matches the actual shape or structure of the inkblot (the more closely the response fits the inkblot’s form, the better the form quality).
(4) Content: What the person sees in the inkblot
(5) Frequency of occurrence

180
Q

(Formal testing of Rorschach): What’s in the “determinant” dimension? (4)

A
  • Form
  • Colour
  • Texture
  • Movement
181
Q

Rorschach Indices (6)
-> scores

A
  • Perceptual Thinking Index (disturbed thinking and perceptions)
  • Depression Index
  • Coping Deficit Index (interpersonal and/or emotional deficits)
  • Suicide Constellation (risk)
  • Hypervigilance index
  • Obsessive style index (obsessive info processing)
182
Q

Rorschach reliability: Interrater reliability (Exner)

A

For determinants: 88-97%

183
Q

Rorschach reliability: Test-retest

A

Depends on studies:
- Meyer & Archer: 1 month= .50-.77
- Exner: 1y=.74-.91; 3y = .70-.87

184
Q

Rorschach Validity: Criterion validity (compared to other tests)

A

Meta analyses:
- Rorschach: .27-.30
- MMPI: .23-.28
- WAIS: .32-.36

185
Q

Some successful forms of Rorschach predicts (2)

A
  • Psychotherapy outcome
  • Differentiate psychotic and non-psychotic patients
186
Q

Biggest “cons” of Rorschach (5)

A
  • Serious problem of norms
  • Absence of a standardized method of administration
  • Limits on validity evidence
  • Time intensive
  • Does the test give useful information?
187
Q

Assumptions of TAT (2)

A

(1) Respondents interpret stimuli in accord with their personality and life experiences
(2) Respondents identify with the “hero” of the story

188
Q

TAT: Big Three motives/needs

A

(1) Achievement
(2) Power
(3) Affiliation

189
Q

TAT Scoring

A

(1) Informal interpretation (themes, patterns, sequences)
(2) Formal scoring using manual

190
Q

TAT Norms

A

No good normative sample
-> Implication: Cannot interpret individual’s score

191
Q

Why improvement in predicting job performance for structured vs unstructured interviews? (4)

A
  • Standardized questions
  • Analysis of job domain
  • Well-defined rating scales
  • Mechanical combination of ratings
192
Q

How can we assess Reliability of SCID (3)

A
  • Joint interviews: Participant is interviewed by one clinician, others observe and make independent ratings
  • Test-retest design: Interrater agreement (Kappa stat = .50-.70): Participant interviewed at 2 diff times by 2 diff interviewers
    Note: Fair to good reliability for many disorders
193
Q

How should validity of the SCID be evaluated?

A

Best estimate diagnosis = “LEAD” standard (Longitudinal evaluation by Experts using All Data)

194
Q

What Validity Evidence is there for the SCID? (3)

A
  • Content: Close correspondence between SCID questions and DSM criteria
  • Criterion: Meh - LEAD criterion may be possible but not comprehensively studied
  • Construct: Problems with discriminant validity; high co-occurrence of diagnoses
    Conclusion: Excellent content validity, but limited other forms of validity
195
Q

Major Characteristics of Interviewer Bias (4)

A
  1. A priori beliefs about occurrence
  2. Collection of confirmatory evidence ONLY
  3. Failure to test alternative hypotheses
  4. Ignore discrepant evidence
196
Q

Components of Suggestive Interviews (7)

A
  • Information introduced by the interviewer that has NOT been mentioned by the interviewee
  • Few open-ended questions
  • Leading/misleading questions
  • Repeating questions
  • Emotional tone of the interview
    • Selective reinforcement
    • Bribes, threats, rewards
  • Aggrandizement of interviewer status
  • Visualization procedures/pretending (what could it be?)
197
Q

NICHD Protocol Kinds of questions - 3 steps

A
  • Main invitation (“Tell me everything that happened from the beginning to the end.”)
  • Follow-up invitations (“Tell me more about that.”; “Then what happened?”)
  • Follow-up and cued invitations (“Earlier you mentioned ___. Tell me everything about that.”)
198
Q

Stages of the NICHD protocol (9)

A
  • Introduction: Establish ground rules for truthfulness and control transfer.
  • Build rapport with the interviewee.
  • Conduct a practice interview for memory training.
    Transition to substantive phase:
  • Investigate incidents using open-ended prompts and separating incidents.
  • Ask focused questions about undisclosed information followed by open-ended prompts.
  • Discuss disclosure information (initial disclosures and who else knows).
  • Conclude the interview, inviting additional information or questions.
  • End with a neutral topic for closure.
199
Q

NICHD Protocol - Types of questions (2)

A
  1. Directive questions (“Wh” questions about previously mentioned details)
    • ”When did it happen?” or “What color was his car?”
  2. Option-posing questions (yes/no questions referencing new issues)
    • “Did he touch any part of your body when he was talking to you?”
      => Suggestive utterances are strongly discouraged: “At that time he was laying on top of you, wasn’t he?”