GIGA PRACTICE Flashcards

1
Q

2 main categories of tests

A

Ability tests vs Personality tests

2
Q

Ability test def

A

Measure skills in terms of speed, accuracy, or both.
=> The faster or the more accurate your responses, the better your scores on a particular characteristic.

3
Q

What are the 3 types of ability tests?

A

Achievement, Aptitude and Intelligence tests

4
Q

Achievement test def

A

Measures previous learning.
- E.g. A test that measures or evaluates how many words you can spell correctly is called a spelling achievement test.

5
Q

Aptitude test def

A

Measures potential for acquiring a specific skill.
- A spelling aptitude test measures how many words you might be able to spell given a certain amount of training, education, and experience.

6
Q

Intelligence test def

A

Measures potential to solve problems, adapt to changing circumstances, and profit from experience.

7
Q

Types of personality tests

A

Structured (objective) and Projective tests

8
Q

Structured personality tests def

A

Provide self-report statements that require the subject to choose between two or more alternative responses, such as “True”/“False” or “Yes”/“No”.

9
Q

Reliability def

A

Degree to which test scores are FREE OF MEASUREMENT ERRORS.
-> There are many ways a test can be reliable (e.g., test results may be reliable over time).

10
Q

A psychological test must be (3)

A

(1) Objective: reflect reality - not what we want reality to be
(2) Reliable: provide the same reading each time the instrument is used under the same conditions
(3) Valid: measure what we want to measure

11
Q

How do Psychological Tests differ from Other Measurement Tools? (2)

A

(1) Focus on intangible, theoretical CONSTRUCTS (e.g. psychological attributes), unlike tools measuring physical properties (e.g. rulers, scales).
(2) For most of them, you need some SPECIALIZED KNOWLEDGE for proper interpretation, unlike physical measurements (e.g. ruler).

12
Q

Construct def

A

Unobservable, theoretical abstract concept. Measured indirectly through behaviours, responses or test results
E.g. intelligence, anxiety, self-esteem

13
Q

Defining Characteristics of Psychological Tests (5)

A

(1) Representative SAMPLE of behaviors
(2) OBSERVABLE and MEASURABLE actions
(3) Thought to measure a PSYCHOLOGICAL ATTRIBUTE
(4) Behavioral samples obtained under STANDARDIZED conditions
(5) Have results for SCORING.

14
Q

A construct is hypothesized to explain _________________________________

A

the covariation between observed behaviors

15
Q

Kinds of Purposes for Testing (4)

A

(1) Classification
(2) Promoting Self-Understanding and Self-Improvement
(3) Planning, Evaluation and Modification of Treatments and Programs
(4) Scientific Inquiry (Quantification, Hypothesis testing)

16
Q

Types of scales (4)

A

Nominal, Ordinal, Interval, Ratio

17
Q

Types of Norms (3)

A

(1) DEVELOPMENTAL Norms
(2) WITHIN-GROUP Norms
(3) CRITERION-REFERENCED Norms (norms without a norm sample)

18
Q

Developmental Norms def

A

Typical level of performance in each of the AGE group or grade levels that the test’s target population comprises.
-> Age-equivalent or grade-equivalent scores are assigned based on the MEDIAN RAW SCORE for that chronological age or grade level.
-> Median = TYPICAL score = norm

19
Q

Within-Group Norms (3)

A

(1) Percentiles
(2) Z-scores
(3) Transformed standard scores

20
Q

Standard Deviation def

A

A measure of the average distance of scores from the mean.
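For reference, the usual computational form of this definition (strictly, the SD is the square root of the mean squared deviation from the mean):

$$SD = \sqrt{\frac{\sum_i (X_i - M)^2}{N}}$$

E.g. for scores 2, 4, 6: M = 4, so SD = √((4 + 0 + 4)/3) ≈ 1.63.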

21
Q

Transformed Standard Score formula

A

Bz + A
B = desired SD
A = desired Mean
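A minimal Python sketch of this transformation (function and variable names are illustrative, not from the course): standardize raw scores to z-scores, then rescale to any desired mean and SD — e.g. McCall T-scores with A = 50, B = 10 (card 31).

```python
import statistics

def transformed_scores(raw_scores, desired_mean, desired_sd):
    """Convert raw scores to z-scores, then apply B*z + A (A = desired mean, B = desired SD)."""
    m = statistics.mean(raw_scores)
    sd = statistics.pstdev(raw_scores)  # population SD of the raw scores
    return [desired_sd * (x - m) / sd + desired_mean for x in raw_scores]

raw = [12, 15, 18, 21, 24]
print(transformed_scores(raw, desired_mean=50, desired_sd=10))  # McCall T-scores
```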

22
Q

Percentiles disadvantages (2)

A

(1) Magnifies differences near mean; minimizes differences at extremes
(2) Some common statistical analyses are NOT possible with percentiles

23
Q

Standard score disadvantages (2)

A

(1) Unfamiliar to many non-specialists
(2) Interpretation difficult when distribution not normal

24
Q

Criterion-Referenced Norms def

A

Evaluate performance relative to an absolute criterion or standard rather than performance of other individuals.
-> An absolute vs relative evaluation

25
Within-Group Norms: Criticisms (2)
(1) Only meaningful if the standardization (norm) sample is representative
(2) Within-group comparisons encourage competition
26
Requirement for Criterion-Referenced Norms
Define content of domain narrowly and specifically. E.g. Driving skills, 8th grade math curriculum
27
Criterion-Referenced Norms: Issues (3)
(1) Can elements of performance be specifically defined? -> Hard to clearly define what “good” or “bad” performance looks like. Criterion-referenced norms require a clear standard (e.g., scoring 80% on a test to pass), but creating these standards can be challenging because it’s hard to decide what knowledge or skills are essential.
(2) Focus on minimum standards (e.g., “Did you pass?”) -> Ignores how much better one person is compared to others.
(3) Absence of relative knowledge -> You don’t know how someone performs compared to others.
28
Developmental norms cons
Often interpreted inappropriately -> Overgeneralization, misinterpreting median…
29
What is an elevated score?
2 z-scores (i.e., 2 SDs) above the mean
30
Properties of scales (3)
(1) Magnitude
(2) Equal Intervals
(3) Absolute 0
31
McCall’s T/T-score
Same as standard scores (z-scores), except that M = 50 and SD = 10.
32
Interquartile range
Interval of scores bounded by the 25th and 75th percentiles, i.e., the range that contains the middle 50% of the distribution.
33
Stanine system
Converts any set of scores into a transformed scale, which ranges from 1 to 9. M = 5, SD = 2
34
Overselection
Selecting a higher percentage from a particular group than would be expected on the basis of the representation of that group in the applicant pool.
35
Tracking
Developmental norms. Tendency to stay at about the same level relative to one’s peers.
36
Big Data
A revolution in social science research. = Data sets whose size is beyond the ability of commonly used software tools to capture, curate, manage, and process within a tolerable elapsed time.
37
Pearson Correlation Coefficient def
QUANTITATIVE description of the DIRECTION and STRENGTH of a straight-line relationship between 2 variables.
38
Correlation Coefficient Range
-1 to 1
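A small sketch with hypothetical data; numpy's corrcoef implements the Pearson formula r = cov(x, y) / (SD_x · SD_y), which is always bounded by −1 and 1.

```python
import numpy as np

test = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # hypothetical test scores
criterion = np.array([1.2, 1.9, 3.2, 3.8, 5.1])  # hypothetical criterion scores

r = np.corrcoef(test, criterion)[0, 1]  # Pearson r
print(round(r, 3))
```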
39
We cannot use Pearson's r for _____
Non-linear relationships -> these cannot be described by r, regardless of their strength.
40
Classical Test Theory (CTT): Assumptions (4)
(1) Each person has a true score that would be obtained if there were no errors in measurement: Observed test score (X) = True test score (T) + Error (E)
(2) Measurement errors are random
(3) Measurement error is normally distributed
(4) Variance of OBSERVED scores = Variance of true scores + Error variance
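A hedged simulation of these assumptions (all numbers hypothetical): observed scores are true scores plus random, mean-zero error, and the correlation between two parallel administrations approximates true-score variance over observed-score variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true = rng.normal(100, 15, n)      # true scores T
e1 = rng.normal(0, 5, n)           # random, mean-zero error on form 1
e2 = rng.normal(0, 5, n)           # random, mean-zero error on form 2
x1, x2 = true + e1, true + e2      # observed scores: X = T + E

# Parallel-forms correlation ~ reliability = var(T) / (var(T) + var(E))
print(np.corrcoef(x1, x2)[0, 1])   # ~ 225 / (225 + 25) = 0.90
```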
41
A person's true score def
The hypothetical or ideal measure of a person's attribute that we aim to capture with a psychological test. => FREE FROM ERROR
The expected score over an INFINITE number of independent administrations of the test.
42
Mean error of measurement = ____
Errors are ____ with each other
True scores and errors are _______
0; UNcorrelated; UNcorrelated
43
Two tests are parallel if: (3)
(1) EQUAL observed score MEANS -> comes from the assumption that true scores would be the same
(2) EQUAL ERROR VARIANCE
(3) SAME CORRELATIONS with other tests
44
Random error characteristics (3)
(1) Random
(2) Cancels itself out
(3) Lowers the reliability of the test
45
Systematic error characteristic
Occurs when a source of error always increases or decreases the true score -> DOESN'T LOWER RELIABILITY of the test, since the test is RELIABLY INACCURATE by the same amount each time
46
Sources of Measurement Error (3)
(1) CONTENT Sampling Error
(2) TIME Sampling Error
(3) Other Sources of Error (e.g. observer differences)
47
Reliability Coefficient def
Proportion of variance in OBSERVED test scores accounted for by variability in TRUE scores.
48
Standard Error of Measurement (SEM) def
Amount of uncertainty/error expected in an individual's observed test score. **=> Corresponds to the SD of the distribution of scores one would obtain by repeatedly testing the same person.**
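The standard CTT formula consistent with this definition, where s_X is the SD of observed scores and r_xx the reliability coefficient (card 47):

$$SEM = s_X\sqrt{1 - r_{xx}}$$

E.g. s_X = 15 and r_xx = .91 give SEM = 15 × √.09 = 4.5.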
49
Spearman-Brown formula def
Predicts the effect of lengthening or shortening a test on reliability.
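The formula itself, with r the current reliability and n the factor by which the test is lengthened (n < 1 for shortening):

$$r_{new} = \frac{n\,r}{1 + (n - 1)\,r}$$

E.g. doubling a test (n = 2) with r = .60: r_new = 2(.60)/(1 + .60) = .75.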
50
Test reliability is usually estimated with what methods? (4)
(1) Test-retest
(2) Alternate (Parallel) Forms
(3) Internal consistency
(4) Interrater/Raters
51
Test-Retest method is an example of ____ sampling
time -> Higher when the construct being measured is expected to be STABLE than when it is expected to CHANGE
52
Alternate (Parallel) Forms method is an example of ____ sampling
item
53
How High Should INTERNAL CONSISTENCY Coefficients Be? (*don't confuse with other coefficients)
Higher for "narrow" constructs; lower for "broader" constructs. -> Very high values may indicate insufficient sampling of the domain. E.g. Medium internal consistency is bad for a narrow construct (panic disorder), but not so bad for a broad construct (Neuroticism).
54
What's the older approach used to estimate the internal consistency of a test?
Split-half method
55
What's the contemporary approach used to estimate the internal consistency of a test?
CRONBACH'S ALPHA = AVERAGE OF ALL POSSIBLE SPLIT-HALF RELIABILITIES. Unaffected by how items are arranged in the test. -> The most general method of estimating reliability through internal consistency. (Kuder-Richardson is also a possibility.)
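A minimal Python sketch of coefficient alpha (hypothetical data; rows = persons, columns = items):

```python
import numpy as np

def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

data = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 2], [3, 4, 3]]  # hypothetical responses
print(round(cronbach_alpha(data), 3))
```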
56
Kappa formula
Interrater agreement: the proportion of potential agreement following **CORRECTION FOR CHANCE**.
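In formula form, kappa = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e the agreement expected by chance. A short Python sketch from a hypothetical 2×2 rater-agreement table:

```python
import numpy as np

def cohens_kappa(table):
    """Kappa from a square rater-agreement table of counts (rows = rater 1, cols = rater 2)."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    p_observed = np.trace(table) / n                       # proportion of actual agreement
    p_chance = (table.sum(0) * table.sum(1)).sum() / n**2  # agreement expected by chance
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical counts (e.g., diagnosis present/absent): p_o = .70, p_e = .50, kappa = .40
print(round(cohens_kappa([[20, 5], [10, 15]]), 3))
```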
57
Domain Sampling Model conceptualizes reliability as the ratio of the variance of the observed score on the _____ test and the variance of the _______.
shorter, long-run true score
58
Test-Retest Method: Problems
CARRYOVER EFFECTS: occur when the first testing session influences scores from the second session.
59
When there are carryover effects, the test-retest correlation usually ________ the true reliability.
OVERESTIMATES -> This can happen because the participant REMEMBERS items or patterns from the first test, so their performance on the second test is less independent than it should be.
60
What method provides one of the most rigorous assessments of reliability commonly in use?
Parallel Forms Method
61
Problems with Split-Half method (2)
(1) The two halves may have different variances.
(2) The split-half method also requires that each half be scored separately, possibly creating additional work.
62
KR20 Formula
Equivalent of alpha for dichotomous items (e.g. right/wrong)
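For reference, the KR-20 formula, where k is the number of items, p_j the proportion passing item j, q_j = 1 − p_j, and s_X² the variance of total scores:

$$KR_{20} = \frac{k}{k-1}\left(1 - \frac{\sum_j p_j q_j}{s_X^2}\right)$$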
63
Sources of measurement error: (3)
(1) Time sampling: The same test given at different points in time may produce different scores, even if given to the same test takers.
(2) Item sampling: The same construct or attribute may be assessed using a wide pool of items.
(3) When different observers record the same behavior: Different judges observing the same event may record different numbers.
64
How do we assess measurement error associated with item sampling?
Parallel forms, Internal consistency
65
What to Do about Low Reliability? (3)
(1) Increase the # of items
(2) Throw out items that run down the reliability (by running a factor/discriminability analysis)
(3) Estimate what the true correlation would have been (CORRECTION FOR ATTENUATION)
66
Kappa stat range
In principle −1 to 1, though in practice usually 0–1. Kappa = 0 is considered poor -> agreement is basically at chance level. Kappa = 1 represents perfect, complete agreement.
67
When random error is HIGH on both tests, the correlation between the scores will be _____ compared to when the random error is ___.
lower; small
68
Difference Score def
Subtracting one test score from another -> typically scores on two different attributes
69
Why are difference score unreliable?
Difference scores are unreliable because the random error from both scores is compounded, while much of the true-score variance cancels out.
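A standard CTT expression for this, assuming the two scores have equal variances (r_xx, r_yy = the two reliabilities; r_xy = correlation between the tests); this is a common textbook result rather than something stated on the card:

$$r_{DD} = \frac{\tfrac{1}{2}(r_{xx} + r_{yy}) - r_{xy}}{1 - r_{xy}}$$

E.g. r_xx = r_yy = .80 and r_xy = .70 give r_DD = .10/.30 ≈ .33: two reliable tests can still yield a very unreliable difference.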
70
What do we mean when we say that "Validity is NOT a yes/no decision"
- It comes in degrees and applies to a particular USE and a particular POPULATION
- It is a process: an ongoing, dynamic effort to accumulate evidence for a sound scientific basis for proposed test score interpretations
71
3 Types of Validity
Content, Criterion, Construct
72
Subtypes of Criterion validity
Concurrent, Predictive
73
Subtypes of Construct validity
Convergent, Divergent
74
A test with high face validity may: (3)
(1) Induce cooperation and positive motivation before and during test administration
(2) Reduce dissatisfaction and feelings of injustice among low scorers
(3) Convince policymakers, employers, and administrators to implement the test
-> but sometimes a test with low face validity elicits more honest responses
75
Types of criterion
Objective & Subjective criteria
76
Objective criterion
Observable and Measurable E.g., Number of accidents, days of absence
77
Subjective criterion
Based on a person's judgement E.g., Supervisor ratings, peer ratings
78
What happens if the criterion measures FEWER dimensions than those measured by the test?
This decreases the evidence of validity based on its content because it has underrepresented some important characteristics
79
Criterion contamination def
If the criterion measures MORE dimensions than those measured by the test
80
Validity coefficient def
Relationship between a test and a **criterion**. **Correlation** between test and criterion -> Tells the extent to which the test is valid for making statements about the criterion.
81
Validity coefficient: range
Correlation. So between -1 and 1
82
Validity coefficients are rarely greater than ____
r = .60 -> If higher than that, the test may simply be an alternative measure of the criterion.
83
Factors Limiting Validity Coefficients (3)
(1) Restricted RANGE of scores (diminishes the test score & criterion score correlation)
(2) Unreliability of test scores
(3) Unreliability in the criterion
84
How we deal with a test that's not reliable (validity wise)
**Correction for attenuation**: the validity coefficient we would obtain if we had **perfect reliability** of test scores
85
How we deal with a test that's not reliable AND a criterion that's not reliable (validity wise)
Correction for attenuation - Correcting for unreliability in **test** (**predictor**) & **criterion**
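The two corrections in formula form (r_xy = observed validity coefficient, r_xx = test reliability, r_yy = criterion reliability):

$$\hat{r} = \frac{r_{xy}}{\sqrt{r_{xx}}} \ \text{(unreliable test only)}, \qquad \hat{r} = \frac{r_{xy}}{\sqrt{r_{xx}\,r_{yy}}} \ \text{(unreliable test and criterion)}$$

E.g. r_xy = .30, r_xx = .64, r_yy = .81 give .30/√(.64 × .81) = .30/.72 ≈ .42.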
86
How to gather Evidence of Construct Validity (2)
(1) Gathering THEORETICAL evidence
(2) Gathering PSYCHOMETRIC evidence
87
Explain how we gather THEORETICAL evidence of construct validity (2)
(1) Establish a nomological network - identifying all possible relationships
(2) Based on this theoretical work → propose experimental hypotheses -> if what we think is true, what evidence would support this relationship?
88
Nomological Network consists of (3)
(1) **Constructs** (e.g. job satisfaction)
(2) Their **observable manifestations** (e.g. smiles, productivity, positive feedback)
(3) The **relations** within and between **constructs** and their **observable manifestations** (e.g. positive feedback related to productivity)
89
Explain how we gather PSYCHOMETRIC evidence (6)
(1) **Content** validity
(2) **Criterion** validity
(3) **Reliability** of the test
(4) Experimental interventions
(5) **Convergent** evidence of validity
(6) **Discriminant** evidence of validity
90
Evidence of validity based on content (2)
(1) No **construct underrepresentation**: Does the test sample adequately from the construct domain?
(2) No **irrelevant construct representation**: Does the test properly exclude content that is unrelated to the construct?
91
Evidence of validity based on reliability of the test
E.g. test-retest/internal consistency not too low or too high given the construct
92
Gathering psychometric evidence: Convergent Validity (2)
Extent to which two measures that are supposed to be related are actually correlated. Test scores should correlate with:
(1) **Other measures of the SAME construct**, or
(2) Measures of **constructs to which the test should be related based on theory** (think nomological net)
93
Problems with Content validity (3)
(1) Educational setting: content validity has been of greatest concern in educational testing (scores on a test should represent comprehension of the subject), BUT **many factors can limit performance on a test**
(2) Unclear boundaries: **hard to separate types of validity** -> It’s often hard to separate "content coverage" (content validity) from whether the test actually measures the underlying concept (construct validity), leading to blurred boundaries.
(3) Doesn't consider the relationship of the construct with **external** variables/constructs
94
Construct-irrelevant variance
CONTENT validity. Occurs when scores are **influenced by factors irrelevant to the construct**.
95
Several issues of concern when interpreting validity coefficients (9)
(1) **All validity coefficients don't have the same meaning**
(2) The **conditions** of a validity study are **never exactly reproduced**. E.g. If you take the GRE to gain admission to graduate school, the conditions under which you take the test may not be exactly the same as those in the studies that established the validity of the GRE.
(3) Criterion-related validity studies mean nothing UNLESS the **criterion** is **valid** and reliable.
(4) The validity study might have been done on a **population** that **does not represent the group to which inferences will be made**.
(5) Be sure the **sample size** was adequate.
(6) Never confuse the **Criterion** with the **Predictor** (GRE & success in grad school example).
(7) Check for **Restricted Range** on both predictor and criterion: correlation requires that there be variability in **both** the predictor and the criterion.
(8) Review evidence for **Validity Generalization** (results may not generalize to other similar **situations**).
(9) Consider **Differential Prediction**: predictive relationships may not be the same for all demographic groups.
96
Differential Prediction
Predictive relationships may NOT be the same for all demographic groups.
-> The validity for men could differ in some circumstances from the validity for women.
-> Under these circumstances, separate validity studies for different groups may be necessary.
97
MTMM acronym
Multitrait-Multimethod Matrix
98
What does Method variance represent?
**SYSTEMATIC error**: characteristics of the measurement method that influence how respondents answer, over and above the attribute we want to measure.
99
Test score variance is composed of ____ (3)
True score variance + Method variance + Random error
100
TYPES OF VARIANCE IN MTMM
TRAIT, METHOD, IRRELEVANT
101
If everything is good, we're looking for: (3) variance
(1) High "trait variance"
(2) Low "method variance"
(3) Low "irrelevant variance"
102
Irrelevant variance def
Variance shared with theoretically unrelated measures
103
MTMM: Regions (6)
(1) Monomethod block
(2) Monotrait-monomethod values
(3) Heterotrait-monomethod triangle
(4) Heteromethod block
(5) Monotrait-heteromethod values
(6) 2 Heterotrait-heteromethod triangles
104
What is the Reliability diagonal?
MONOtrait-MONOmethod values (in the monomethod block). Tell how reliably each construct (A, B, C) can be measured with each method.
105
What is the Validity diagonal?
**Monotrait-heteromethod** values. Tell how well a construct is measured using different methods. -> **CONVERGENT** validity coefficients
106
What represents the discriminant validity in the MTMM matrix?
**Heterotrait**-heteromethod & **Heterotrait**-monomethod
107
IRT's desirable objectives (2)
(1) Administer SHORTER measures
(2) Compare scores across DIFFERENT measures of the SAME constructs in DISTINCT groups
108
Limitations of CTT (3)
(1) **Adding/deleting items changes the true score** (because the true score is **TEST-DEPENDENT**, so comparison is not possible across different test forms)
(2) The true score is interpretable ONLY in reference to the **NORM** sample's distribution of scores: SAMPLE-DEPENDENT
(3) Reliability of the true score is a function of the items used: **all items are EQUALLY reliable, measure the SAME RANGE of scores, reliability is CONSTANT across scores**
109
Item Response Theory (IRT) Assumptions (4)
(1) The true score is defined on the **LATENT trait dimension** rather than the observed score
(2) Knowing the **PROPERTIES OF THE ITEMS** a person endorses tells us the **TRAIT LEVEL** the person possesses
(3) Properties of an item do **NOT** change if we were to administer the item using different samples
(4) The true score of the person does **NOT** change regardless of which sets of items we administer.
110
Variables in Item Response Function
Y = probability of item endorsement ("yes"), which reflects how much of the trait you possess => **bounded by 0 and 1** (a probability)
X = theta (latent trait) - e.g. the entire range of math ability
Theta is a CONTINUUM (from -infinity to +infinity)
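A minimal sketch of one common item response function, the 2PL logistic model (the course's exact parameterization may differ); a = discrimination and b = difficulty, as defined in the following cards.

```python
import numpy as np

def icc_2pl(theta, a, b):
    """P(endorse = 'yes') under the 2PL model: bounded by 0 and 1, increasing in theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-4, 4, 9)         # points along the latent-trait continuum
print(icc_2pl(theta, a=1.0, b=0.0))   # P = .50 exactly where theta = b
```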
111
Theta def + values
The entire range of the latent trait.
=> A CONTINUUM (from -infinity to +infinity)
=> Negative values = LOW levels
=> Positive values = HIGH levels
112
What are the item characteristics/parameters?
Item DIFFICULTY & Item DISCRIMINATION
113
ICC: In the middle of the curve, ____ changes in theta correspond with ___ changes in probability
small; large
114
Item Difficulty def
**b**: the point on theta (x-axis) where the probability of endorsing the item is 50%.
=> To find it, start by locating 0.5 on the y-axis
=> Then find the level of theta (x) that corresponds to it
115
Item difficulty typically range between ______
−2 and +2 (±2 on an arbitrary z-score-like scale)
116
Item difficulty: => NEGATIVE difficulties = _____ => POSITIVE difficulties = ______
NEGATIVE difficulties = items are "EASIER", more frequently endorsed (it doesn't take much of the trait to endorse them)
POSITIVE difficulties = items are more "DIFFICULT", less frequently endorsed
117
Item difficulty: What does it mean if Theta > b
Items more likely to be endorsed => When theta level is HIGHER than difficulty of the item
118
Item difficulty: What does it mean if Theta < b
Items less likely to be endorsed => When level of underlying trait LOWER than item difficulty
119
Theta = b
Probability of endorsement = 50%; theta equals the item difficulty
120
Item Discrimination
**a**: the value of the slope at the STEEPEST point of the curve, i.e., at theta = b (where P = 50%).
-> The point on the curve where increases in Y are the largest.
To find it: find the theta corresponding to the item difficulty -> this is where the slope is most elevated.
=> The steeper the slope, the closer the curve is to VERTICAL.
121
Item Discrimination tells us ________
at which levels of theta the item differentiates best => discriminates between levels of theta
122
Discrimination typically ranges between _____
.5 and 1.5
123
Items would be most effective in measuring underlying trait at the level that correspond with _______.
item difficulty → Hard questions are more effective at measuring high levels of the trait.
124
Item difficulty → Location on the latent trait where information is _____
Item discrimination → _____ an item provides
MAXIMIZED; HOW MUCH INFO
125
When talking about Test Information Curve (TIC), we're talking about Validity or Reliability? Why?
We're talking about RELIABILITY (NOT validity), because it focuses on how precisely a test measures the latent trait ACROSS DIFFERENT LEVELS OF THAT TRAIT.
=> *THE HIGHER THE CURVE, THE BETTER YOUR ASSESSMENT OF THE TRAIT (picture a mountain)
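Under the 2PL model sketched earlier, each item's information is a function of its discrimination and the endorsement probability; the TIC sums these over items, and precision follows directly, which is why SEM varies with theta (next card):

$$I(\theta) = \sum_i a_i^2\,P_i(\theta)\,[1 - P_i(\theta)], \qquad SEM(\theta) = \frac{1}{\sqrt{I(\theta)}}$$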
126
In IRT, SEM is different for different latent trait values; how is that different from CTT?
CTT: one reliability coefficient for the entire set of items.
IRT: each item carries its own reliability (information); measurement error is NOT equal across the entire range of theta.
127
How does IRT Help us Improve Psychological Tests? (4)
(1) IDENTIFY item characteristics (i.e., difficulty, discrimination)
(2) CHOOSE items with higher discrimination covering the entire range of the latent continuum
(3) INCREASE RELIABILITY with fewer items
(4) COMPARE items across DIFFERENT MEASURES of the SAME CONSTRUCT + compare group differences
128
Differential Item Functioning (DIF) examines ______
Whether scales and items function differently across different discrete groups.
-> Occurs when groups (such as defined by gender, ethnicity, age, or education) have different probabilities of endorsing a given item (controlling for overall score)
129
Differential Item Functioning (DIF) occurs when _________________
individuals from different groups who have EQUAL levels of the UNDERLYING TRAIT have different probabilities of endorsing or agreeing with an item.
130
DIF analysis helps determine if items are ____ by _____________.
fair; examining group differences in responses while controlling for the trait level
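One common way to operationalize this check is logistic-regression DIF (a sketch with simulated, hypothetical data; the course may use another procedure): predict item endorsement from a trait proxy (e.g., total score), group membership, and their interaction. A non-zero group effect after controlling for the trait signals uniform DIF; a non-zero interaction signals non-uniform DIF.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
trait = rng.normal(0.0, 1.0, n)               # proxy for trait level (e.g., total/rest score)
group = rng.integers(0, 2, n).astype(float)   # 0/1 group membership (hypothetical)

# Simulate uniform DIF: group 1 endorses more at EQUAL trait levels
p = 1.0 / (1.0 + np.exp(-(1.2 * trait + 0.8 * group)))
item = rng.binomial(1, p)

# Logistic regression: item ~ trait + group + trait:group
X = sm.add_constant(np.column_stack([trait, group, trait * group]))
fit = sm.Logit(item, X).fit(disp=0)
print(fit.params)  # coefficients: intercept, trait, group (uniform DIF), interaction (non-uniform DIF)
```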
131
Questionnaire characteristics (3)
(1) Written series of questions
(2) Structured stimuli (i.e. questions)
(3) Structured responses (i.e. response format)
132
Questionnaires Advantages (4)
(1) Presentation of stimuli is well controlled
(2) **Scoring highly reliable**
(3) Efficient to administer to large numbers
(4) Inexpensive
133
Dichotomous formats are often seen in ______ tests
Personality, e.g. MMPI
134
Compared to other formats, the dichotomous format is ________
less reliable
135
Polytomous Format examples (4)
(1) Likert format (e.g. agree/disagree)
(2) Category format (e.g., rating pain on a scale of 1–10); Visual Analogue Scale, rating scales
(3) Checklists (select multiple items from a list that apply)
(4) Q-sorts (rank or categorize items into multiple predefined groups)
136
What's the problem with rating scales formats?
Choosing the number of points: more options = more variability, but at what point is it too much?
Include a middle point? It acknowledges that people might NOT have an opinion, but it can be an easy way out.
Often up to 10 points; between 4 and 7 is good.
137
Forced Choice Formats
Person is presented with 2 to 4 stimuli and asked to choose among them.
138
Q-sort Formats
Forced distribution of items into categories. E.g. Give a person a list of 100 characteristics and have them group the characteristics according to how much like the person each characteristic is.
139
Four Steps in the Question-Answer Process
(1) Comprehension: attending to questions and instructions
(2) Retrieval: retrieval of relevant information
(3) Judgment: integration of retrieved information
(4) Response: mapping the judgment on the response category
140
Issues for Questionnaires (3)
RESPONSE SET: tendency for people to respond to questions in a way that paints a certain picture of themselves instead of providing honest answers
(1) Acquiescence = tendency to agree, say true, say often
(2) Social desirability = tendency to present oneself in a socially favorable manner
(3) Random responding = ignoring or paying insufficient attention to item content
141
How can we combat Acquiescence bias?
Use reverse-scored items
142
What are the 2 components of Social desirability?
Impression management & Self-deception
143
How can we combat social desirability? (3)
(1) Measure its influence: assess discriminant validity
(2) Marlowe-Crowne Social Desirability Scale
(3) Change the response format (forced choice; Q-sort)
144
How can we detect random responding? (4)
(1) **INSTRUCTED** response items: ask for a specific answer (e.g. "choose strongly disagree")
(2) **BOGUS** items: ask about impossible or improbable scenarios (e.g. "I was born before 1920")
(3) **SELF-REPORT** items: ask participants about their care and engagement DURING the survey
(4) **RESPONSE TIME**: computed after data collection but must be considered before starting
145
Writing Good items (12)
- Single idea per item stem
- Write each item in a clear and direct manner
- Avoid long items
- Avoid double negatives
- Reading level appropriate for intended test-takers
- Avoid slang or colloquial language
- Make all items independent
- Ask someone else to review items to reduce ambiguity and inaccuracies
- Make all responses similar in length and detail
- Make sure the item has only one best answer
- Avoid words such as "always" and "never"
- Avoid overlapping responses
146
Correspondence Strategy was dominant in the _______
early 20th century
147
Assumptions in Correspondence Strategy (4)
(1) Each item corresponds to a specific construct
(2) The item has a COMMON MEANING for all test-takers
(3) A test-taker is able to accurately assess the requested information
(4) A test-taker will honestly report the requested information
148
Empirical Strategy def
Items selected on basis of relations to external criteria. (e.g., contrasted groups)
149
In the empirical strategy, the meaning of an item is NOT equal to _________. It is determined by ______
the verbal content of the item; groups who endorse the item. => Interpretation of scores is by "cookbook": **EMPIRICALLY KNOWN CORRELATES of high and low scores**
150
Concerns in Empirical strategy (3)
(1) Unintended group differences
(2) Problem of generalization
(3) Item overlap
151
Construct (Theoretical) Strategy originated in the ________
1950s; prominent from the 1960s to the present.
152
Assumptions in Construct strategy (3)
(1) A person possesses some degree of a construct (e.g., sociability)
(2) Nontest behaviours can be identified which are referents (indicators) for the construct
(3) Test responses are referents (indicators) for the construct.
153
How does the construct strategy evaluate the adequacy of a test?
Evaluates adequacy of test by how well test fits in with theoretical (nomologic) net for the construct
154
7 Typical Steps in Theoretical Scale Construction Approach
(1) Define construct: consider the literature for a definition + theoretical relations with other constructs
(2) Gather/write items
(3) Evaluate content validity: expert judgment on whether items are relevant
(4) Pre-testing of items: administer the initial pool to a small sample and conduct cognitive interviews
(5) Item reduction: consider the endorsement rate for items
(6) Factor analysis: determine the optimal number of factors underlying item response patterns
(7) Scale evaluation: tests of dimensionality, reliability and validity
155
Content of personality: (3)
Behavior, Affect, Cognition
156
Lexical tradition of personality
The most important traits are represented by single words. The origin of the NEO-PI relies on the lexical tradition.
157
Lexical hypothesis
If an idea is important to people, they will have a word that expresses this concept. -> The more important the concept, the more words exist for it.
158
NEO: Construction (history)
- 1978: included only 3 factors - N, E, O (no scales for A and C) and 18 facets
- 1985: A and C added: the first NEO-PI
- 1992 manual: facet scales available for all factors + included the short version (NEO-FFI) + rational scale construction (supported by factor analysis)
159
NEO-PI-R - Psychometric properties: Internal consistency (traits vs facets)
Traits: .86-.92; Facets: .56-.81 -> because there are fewer items measuring each facet (a normal range)
160
NEO-PI-R - Psychometric properties: Test-retest reliability
High, but a bit weaker as the time interval extends.
- 3 months → .75-.83
- 6-year N,E,O → .68-.83
- 3-year A,C → .63 & .79
161
NEO-PI-R - Psychometric properties: Convergent Validity
Self-spouse agreement (2 forms of the NEO-PI: self- and other-rated).
N,E,O,A,C → .60, .73, .65, .62, .63: moderate to large convergent validity
162
NEO-PI-R - Psychometric properties: Discriminant Validity
Sometimes scales are NOT independent (C & A)!
163
Latest version of the NEO? Date & characteristics
NEO-PI-**3**. Published in 2005.
- 240 items; descriptions of behaviors rated on a **5-point scale** (strongly disagree to strongly agree)
- Age range: **14-99** (norms for adolescents)
164
In NEO-PI, raw scores converted to ______
T-scores (M=50, SD=10)
165
Application of NEO-PI (2)
(1) Mostly research on basic personality
(2) Limited usefulness in clinical or other applied settings
166
NEO-PI issues (2)
(1) **Acquiescence**: tendency to agree with statements -> half the items are reverse-keyed
(2) **Social desirability**: tendency to portray oneself in a socially desirable way - a construct validity problem
167
What's NEO-PI's recommendation to combat acquiescence bias?
If more than 150 items are answered 'agree' or 'strongly agree', the profile must be interpreted with caution.
168
Current MMPI version = ___
MMPI-3 (2020)
169
General steps in Development of most clinical scales
Empirical approach: choose 2 groups.
- Administer the item pool to a large group: psychiatric & normative
- Select a diagnostic group
- Compare endorsement of each item between the selected group and the normative group
170
Scales: MMPI (10)
Scale 0: Social introversion
Scale 1: Hypochondriasis
Scale 2: Depression
Scale 3: Hysteria
Scale 4: Psychopathic Deviate
Scale 5: Masculinity-Femininity
Scale 6: Paranoia
Scale 7: Psychasthenia
Scale 8: Schizophrenia
Scale 9: Hypomania
171
Validity scales in MMPI (6)
(1) L (Lie) Scale: endorsing too few items which express common frailties - SOCIAL DESIRABILITY BIAS (denying common human weaknesses)
(2) F Scale (Infrequency Scale): endorsing items which few people endorse
(3) K Scale (Defensiveness Scale): denial of more subtle, personal, or psychological difficulties that may be less obvious but still significant -> more about defensiveness: the person may be hiding or minimizing psychological problems or discomfort
(4) new - FB (Back F): infrequent responding in the 2nd half of the test
(5) new - VRIN: assesses random responding (if respondents don't answer similar questions in a similar way, random responding)
(6) new - TRIN: acquiescence bias (pairs of items with opposite content => should receive different responses)
172
Interpretation MMPI scores before vs after
- Before: 70+ = may have clinical significance; today: **65**
- Before: look at the single most elevated scale; now: interpret scores across **multiple subscales**
173
MMPI-2 (4)
(1) Re-standardization: more appropriate normative sample
(2) Updated item content (567 items)
(3) Same clinical scales, but **5 and 0 are not psychopathology**
(4) 3 new validity scales (FB, VRIN, TRIN)
174
MMPI also have ________ (~60) based on _______
Content scales (measuring particular constructs); Rational test construction (Based on judgment of what items seem to be measuring) => E.g. anxiety, alcoholism scale, Obsessiveness, Family problems, Negative treatment indicators
175
Sample methods of projective personality tests (3)
- Perceptions of inkblots
- Telling stories about pictures
- Completing sentence stems
176
Projective test Assumptions (3)
- Responses to ambiguous stimuli are determined by personality characteristics
- Reveal characteristics beneath the surface (bypass defences, unaffected by social desirability/context)
- Provide broad coverage of personality characteristics
177
Rorschach - Administration
1st phase = **Free association phase**: present the 10 cards one by one; "What might this be?"
2nd phase = **Inquiry phase**: the examiner reviews the responses
178
What are the 2 categories of scoring for the Rorschach test?
- Informal: **interpretation of content**, e.g. odd uses of words, thematic patterns - the examiner searches for anything that stands out
- Formal scoring: 5 dimensions
179
What are the formal scoring dimensions of Rorschach test? (5)
(1) **Location**: part of the inkblot the individual focuses on when giving their response (e.g., the whole blot, a specific detail)
(2) **Determinant**: specific feature or characteristic of the inkblot that influenced the person's response
(3) **Form quality**: how well the respondent's perception matches the actual shape or structure of the inkblot (the more closely the response fits the inkblot's form, the better the form quality)
(4) **Content**: what the person sees in the inkblot
(5) **Frequency** of occurrence
180
(Formal testing of Rorschach): What's in the "determinant" dimension? (4)
- Form
- Colour
- Texture
- Movement
181
Rorschach Indices (6) -> scores
- **Perceptual Thinking** Index (disturbed thinking and perceptions)
- **Depression** Index
- **Coping Deficit** Index (interpersonal and/or emotional deficits)
- **Suicide Constellation** (risk)
- **Hypervigilance** Index
- **Obsessive Style** Index (obsessive info processing)
182
Rorschach reliability: Interrater reliability (Exner)
For determinants: 88-97%
183
Rorschach reliability: Test-retest
Depends on the study:
- Meyer & Archer: 1 month = .50-.77
- Exner: 1 year = .74-.91; 3 years = .70-.87
184
Rorschach Validity: Criterion validity (compared to other tests)
Meta-analyses:
- Rorschach: .27-.30
- MMPI: .23-.28
- WAIS: .32-.36
185
Some successful forms of the Rorschach predict (2)
- Psychotherapy outcome
- Differentiating psychotic and non-psychotic patients
186
Biggest "cons" of Rorschach (5)
- Serious problem of norms
- Absence of a standardized method of administration
- Limits on **validity** evidence
- Time intensive
- Does the test give useful information?
187
Assumptions of TAT (2)
(1) Respondents interpret stimuli in accord with their personality and life experiences
(2) Respondents identify with the "hero" of the story
188
TAT: Big Three motives/needs
(1) Achievement
(2) Power
(3) Affiliation
189
TAT Scoring
(1) Informal interpretation (themes, patterns, sequences)
(2) Formal scoring using a manual
190
TAT Norms
No good normative sample -> Implication: Cannot interpret individual's score
191
Why improvement in predicting job performance for structured vs unstructured interviews? (4)
- Standardized questions
- Analysis of the job domain
- Well-defined rating scales
- Mechanical combination of ratings
192
How can we assess Reliability of SCID (3)
- Joint interviews: the participant is interviewed by one clinician while others observe and make independent ratings
- Test-retest design: the participant is interviewed at 2 different times by 2 different interviewers
- Interrater agreement (Kappa ≈ .50-.70)
Note: fair to good reliability for many disorders
193
How should validity of the SCID be evaluated?
Best estimate diagnosis = "LEAD" standard (Longitudinal, Expert, All Data)
194
What Validity Evidence is there for the SCID? (3)
- Content: close correspondence between SCID questions and DSM criteria
- Criterion: meh - a LEAD criterion may be possible but has not been comprehensively studied
- Construct: problems with **discriminant** validity; **high co-occurrence of diagnoses**
Conclusion: excellent content validity, but limited other forms of validity
195
Major Characteristics of Interviewer Bias (4)
1. A priori beliefs about occurrence
2. Collection of confirmatory evidence ONLY
3. Failure to test alternative hypotheses
4. Ignoring discrepant evidence
196
Components of Suggestive Interviews (7)
- Information introduced by the interviewer that has NOT been mentioned by the interviewee
- Few open-ended questions
- Leading/misleading questions
- Repeating questions
- Emotional tone of the interview
- Selective reinforcement - bribes, threats, rewards
- Aggrandizement of interviewer status
- Visualization procedures/pretending ("what could it be?")
197
NICHD Protocol Kinds of questions - 3 steps
- Main invitation ("Tell me everything that happened from the beginning to the end.")
- Follow-up invitations ("Tell me more about that."; "Then what happened?")
- Follow-up and cued invitations ("Earlier you mentioned ___. Tell me everything about that.")
198
Stages of the NICHD protocol (9)
- Introduction: establish ground rules for truthfulness and transfer of control
- Build rapport with the interviewee
- Conduct a practice interview for memory training
Transition to substantive phase:
- Investigate incidents using open-ended prompts, separating incidents
- Ask focused questions about undisclosed information, followed by open-ended prompts
- Discuss disclosure information (initial disclosures and who else knows)
- Conclude the interview, **inviting additional information** or questions
- End with a **neutral topic** for closure
199
NICHD Protocol - Types of questions (2)
1. **Directive** questions ("**Wh**" questions about previously mentioned details) - "When did it happen?" or "What color was his car?"
2. **Option-posing** questions (yes/no questions referencing new issues) - "Did he touch any part of your body when he was talking to you?"
=> Suggestive utterances are strongly discouraged: "At that time he was laying on top of you, wasn't he?"