Psych Testing Flashcards
What is validity?
-
Test Standards Definition: “Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests”
- Test standards = current framework/operational guidelines
- Validation is the joint responsibility of the test developer and the test user
- Test developer should present a rationale for recommended use and interpretation, accompanied by evidence and theory.
What are some past definitions of validity?
-
1954 Criterion-based view: A test is valid for anything it correlates with. Validity was treated as a static property of the test.
- Problems: the same test may be used for different purposes in different groups.
-
1966 Tripartite view: Criterion validity (concurrent and predictive), Content validity (relevant and representative of domain) and Construct validity (convergent/discriminant)
- Problems: based on the nomological network (if the test isn't valid, is the theory or the test wrong?). Overemphasis on supposedly distinct forms of validity (often indistinct) and on correlations as proof.
- Updated in 1985 to include consequences of testing.
What is the current 5 source view of validity?
-
Unchanged from 1999: a unitary view of validity based on evidence from multiple sources to support an argument about what test scores mean.
- No different types of validity; validity is a property of the interpretation, not the test
- Evidence from 5 sources:
- Content: Relevance and representativeness of content
- Response Processes: if the test is intended to measure a process, this should be demonstrable, i.e. responses behave as expected under manipulation
- Internal Structure: Factor analysis should match theory
- Relationship to other variables: Convergent/discriminant, test-criterion, and generalisation across situations (populations, conditions) and purposes (types of jobs)
- Consequences of testing: consider intended and unintended consequences of testing (e.g. NAPLAN: funding vs. competition)
What are some of the major purposes of psychological testing?
- Classification: Selection (education and employment), Screening, Certification, Placement
- Diagnosis/Treatment planning: Clinical, Educational (giftedness/learning difficulties), neuropsychological deficits
- Coaching/training: Insight (self-knowledge), career-counselling, coaching
- Legal Application: Diminished responsibility, special dispensations, compensation claims
- Research
- Program Evaluation
What are some of the main tests used to measure intelligence and aptitude?
- Aptitude tests differ somewhat from intelligence tests in that they relate to trainable outcomes
-
Individually administered tests: Primarily for children, diagnosis, emphasis on rapport
- Stanford-Binet: verbal and non-verbal factors across 5 areas.
- Wechsler scales: 3 tests for different ages; subscales and tests vary.
- Woodcock-Johnson: achievement and intelligence test batteries.
-
Group administered tests:
- ASVAB: Armed Services Vocational Aptitude Battery
- GAMSAT: Graduate Australian Medical School Admissions Test
- Raven's Progressive Matrices
What is Holland's vocational interest model?
- Holland's vocational interest model tests 6 domains (RIASEC)
- Realistic: practical, hands-on, tool-oriented
- Investigative: analytical, intellectual, scientific, explorative
- Artistic: creative, independent, chaotic
- Social: cooperative, supporting, helping, healing
- Enterprising: competitive, leadership, persuading
- Conventional: detail-oriented, organising, clerical.
- These domains are arranged in a hexagon, ordered by the correlations between them (the least correlated domains sit opposite each other)
What are some applications of psychological tests?
-
Neuropsychology: Checklists for frontal lobe dysfunction
- Luria-Nebraska Neuropsychological Battery (attention, language, memory, spatial, executive function), the Mini-Mental State Exam (MMSE)
-
Health Psychology: McGill Pain Questionnaire, Beck Depression Inventory
- Alcoholism: TWEAK (tolerance, worry, eye-opener, amnesia, cut-down)
- Forensic Assessment: malingering, assessment for insanity plea, child custody
How can psychometric tests be used for selection and training in the workplace?
-
Selection: Important to match selection criteria with job requirements. Steps:
- Job analysis: What tasks are required?
- Write job description: What qualities does the person need?
- Test the candidate pool
- Select best candidate
-
Score Feedback for training: Focus on profile not raw scores, focus on developmental planning
- Compensatory strategies: reshape problem, externalise
- Developmental activities: 1. deliberate practice, 2. training, 3. mentoring, 4. goal setting, 5. plan/monitor/evaluate
What four factors affect reliability?
-
People taking the test: Reliability is based on variability between people: larger SD = higher reliability
- Match person to test to avoid floor/ceiling effects
-
Test Characteristics: Bandwidth vs fidelity (more specific test = higher reliability)
- Don’t sacrifice content coverage for reliability
-
Item Characteristics: internal consistency affected by # of items and correlation between them.
- A reliable test either has many items with weak inter-item correlations, or fewer items with strong ones.
- Method used to estimate reliability: test-retest vs internal consistency etc. Consider the appropriateness of the method (e.g. whether the construct is expected to change over time).
What is reliability?
-
Reliability = the ratio of true score variance to observed score variance (true plus error). Aim for:
- .9 for high-stakes decisions, .7 for research, .6 if multiple measures are combined
- Validity depends on reliability: the maximum observable correlation between 2 variables is limited by the measurement error in each test (see the formulas below).
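In classical test theory notation (a standard formulation, not spelled out on this card): reliability is the proportion of observed variance that is true-score variance, and it caps the correlation two measures can show (the attenuation ceiling).

```latex
r_{xx} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}
\qquad\qquad
r_{xy}^{\max} = \sqrt{r_{xx}\, r_{yy}}
% e.g. tests with reliabilities .7 and .8 can correlate at most
% sqrt(.7 x .8) = sqrt(.56) ~ .75, however strong the true relationship.
```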
How does reliability relate to test length?
-
Reliability increases as the number of items increases:
- Spearman-Brown formula: predicted reliability is a function of test length and existing reliability (assuming the added items are parallel to the existing ones).
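A standard statement of the formula, with a worked example (the numbers are illustrative):

```latex
r_{kk} = \frac{k\, r_{11}}{1 + (k - 1)\, r_{11}}
% k = factor by which test length changes, r_11 = current reliability.
% Doubling (k = 2) a test with r_11 = .60 predicts
% (2)(.60) / (1 + .60) = .75; halving it (k = .5) predicts .43.
```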
-
Balancing test length:
- Too long a test causes boredom, exhaustion, loss of motivation
- Problems with short tests: previous item exposure, inadequate domain sampling.
- Solution: Adaptive testing and Computerised adaptive testing
What is adaptive testing? What are the advantages and disadvantages?
-
Testing is adapted to the person's level of ability: previous responses determine the next question.
- Used in major batteries like the Stanford-Binet
-
Computerised adaptive testing (CAT): a computer algorithm selects further items according to a rule (see the sketch after this card).
- Used in large scale testing where security is important (ASVAB, TOEFL)
-
CAT Advantages: Tests are shorter but just as reliable:
- economic advantage, fewer problems with motivation, easier to maintain test security
- CAT Disadvantages: substantial preparation and outlay needed (very large item pool, analysis of item difficulty, algorithms); requires computers.
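A minimal sketch of the adaptive logic, assuming a simple difficulty-matching rule with a shrinking step size (operational CATs use IRT-based estimation instead; all names and numbers here are illustrative):

```python
def run_cat(items, answer, n_questions=5, ability=0.0, step=1.0):
    """items: dict of item id -> difficulty (same scale as ability).
    answer: callable taking an item id, returning True if correct."""
    unused = dict(items)
    for _ in range(min(n_questions, len(unused))):
        # Select the unused item closest to the current ability estimate.
        item = min(unused, key=lambda i: abs(unused[i] - ability))
        del unused[item]
        # Move the estimate toward harder items if correct, easier if not,
        # halving the step each round so the estimate settles.
        ability += step if answer(item) else -step
        step /= 2
    return ability

# Example: a test taker who can answer items up to difficulty 1.0.
bank = {"q1": -2.0, "q2": -1.0, "q3": 0.0, "q4": 1.0, "q5": 2.0}
print(run_cat(bank, answer=lambda q: bank[q] <= 1.0, n_questions=4))
```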
What are anchoring vignettes?
-
Problems with self-rating scales: there are significant variations in response styles across individuals and cultures
- extreme vs conservative responders, tendency to ‘agree’ with statements
-
Anchoring vignettes: respondents rate vignettes describing hypothetical people.
- The mean vignette rating is then subtracted from the self-rating (see the sketch below)
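A minimal sketch of the subtraction adjustment described above (the ratings are invented for illustration):

```python
def adjusted_self_rating(self_rating, vignette_ratings):
    # Express the self-rating relative to the respondent's own
    # average rating of the hypothetical vignette people.
    anchor = sum(vignette_ratings) / len(vignette_ratings)
    return self_rating - anchor

# An extreme and a conservative responder rating the same vignettes
# land on a comparable adjusted scale despite different raw scores.
print(adjusted_self_rating(6, [5, 6, 7]))  # 0.0
print(adjusted_self_rating(4, [3, 4, 5]))  # 0.0
```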
-
Examples: significant cross-cultural discrepancies have been resolved using anchoring vignettes, such as:
- relationship between teacher helpfulness and achievement
- relationship between conscientiousness and life expectancy
What are situational judgement tests?
-
SJTs give situations and require the respondent to choose the best response option
- can be typical or maximal performance (would/should)
- Seen as more engaging (higher face validity)
- Show lower adverse impact than IQ tests
-
Development of SJTs: use subject matter experts (SMEs)
- Collect critical situations from SMEs and summarise these into items
- Collect responses from everyday people and SMEs
- Score answers based on SME opinions (see the sketch below)
- Pilot-test items and select the most reliable ones
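A minimal sketch of SME-consensus scoring, one common scheme (the item, options, and counts are hypothetical):

```python
# Each response option earns the proportion of SMEs who endorsed it.
sme_endorsements = {  # item -> option -> how many of 10 SMEs chose it
    "angry_customer": {"apologise": 7, "escalate": 2, "ignore": 1},
}

def score_response(item, option, n_smes=10):
    return sme_endorsements[item].get(option, 0) / n_smes

print(score_response("angry_customer", "apologise"))  # 0.7
print(score_response("angry_customer", "ignore"))     # 0.1
```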
What are the different motivations and situations that influence response distortion?
-
High stakes situations are prone to faking:
- Faking Good: employment selection, internet dating and educational selection. NEO-PI-R example “I strive for excellence”
-
Faking Bad: legal (benefits/diminished responsibility), education (special consideration), military (discharge, special duties, conscription)
- Faking is estimated in ~30% of personal injury cases; in WWII, instructions on faking were dropped to opposing troops (by both sides)
-
Types of faking: Conscious and unconscious biases
- Self-deceptive enhancement: linked to egoistic bias. Values agency (strong, competent); exaggeration of status (social, physical etc.)
- Self-deceptive denial: linked to moralistic bias. Values communion (good, kind); denies socially deviant impulses/behaviours.
- Impression management: conscious bias
What are some methods for detecting faking?
-
Lie Scales: Paulhus' Balanced Inventory of Desirable Responding (BIDR). Ask about socially aversive but universal behaviours, e.g. “I’ve never wanted to swear”
- Problem: may be measuring personality. Neuroticism, conscientiousness and agreeableness all correlate strongly.
-
Response time rubrics: longer response times suggest faking
- But for some people, outright faking can be quicker than honest responding
-
Over-claiming technique: Paulhus - rate familiarity with concepts, some of which don't exist (foils). Compare claimed familiarity with real terms to foils (see the sketch after this card).
- Works well but limited in the concepts you can test
- Bayesian truth serum: for each item, respondents estimate the proportion of people who would give the same answer. Via the false consensus effect, honest answers over-estimate how many others share the belief.
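A rough sketch of an over-claiming index (the published technique uses signal detection analysis; the terms and ratings below are invented):

```python
real_terms = ["cognitive dissonance", "working memory"]
foil_terms = ["retroactive salience", "bimodal priming"]  # do not exist

def overclaiming_index(ratings):
    # Mean claimed familiarity with foils minus real terms;
    # values closer to zero suggest exaggerated responding.
    mean = lambda terms: sum(ratings[t] for t in terms) / len(terms)
    return mean(foil_terms) - mean(real_terms)

honest = {"cognitive dissonance": 5, "working memory": 4,
          "retroactive salience": 1, "bimodal priming": 1}
faker = {"cognitive dissonance": 5, "working memory": 5,
         "retroactive salience": 4, "bimodal priming": 5}

print(overclaiming_index(honest))  # -3.5: claims little about foils
print(overclaiming_index(faker))   # -0.5: claims familiarity with foils
```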
What are some methods for reducing faking?
- Warnings: Best warnings are consequence based but can also be based on detection, reasoning (best interest), educational (validity of test) or moral.
-
Forced Choice: test takers must choose between 2 equally desirable alternatives (“which is more like you?”).
- Results are only relative (ipsative): actual levels cannot be compared between people (see the sketch after this card)
- Verifiable statements: people are less likely to fake information that is easily verified, e.g. “I work more than needed” vs “How many hours of overtime did you work?”
- Other reports: ratings from referees or friends show lower levels of faking (but it is still present)
- Implicit measurement techniques: eg implicit associations test
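A minimal sketch of why forced-choice scoring is ipsative (traits and choices are hypothetical): every pair awards exactly one point, so each respondent's scores sum to the same constant and only within-person comparisons are meaningful.

```python
pairs = [("organised", "outgoing"), ("outgoing", "calm"),
         ("calm", "organised")]
choices = ["organised", "outgoing", "calm"]  # option picked per pair

scores = {trait: 0 for pair in pairs for trait in pair}
for picked in choices:
    scores[picked] += 1

print(scores)                # {'organised': 1, 'outgoing': 1, 'calm': 1}
print(sum(scores.values()))  # always equals the number of pairs (3)
```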
What are three paradigms in faking research? What has been found about faking levels?
-
Group Comparison: compare job applicants to other samples
- Measures the lower limit of faking (not everyone will fake)
- There may be real group differences
- Findings: changes in OCEAN levels found, largest variation for N and C
-
Instructed faking: compare scores under “answer honestly” vs “maximise your scores” instructions. Instruction wording varies, e.g. “imagine you’re applying to X” (and can affect results)
- Honest answers still have self-deceptive biases.
- Findings: Huge changes found in OCEAN, particularly for N and C
-
Incentive manipulation: Compare no stakes conditions to reward conditions. eg top 10% get $10.
- But hard to mimic real life reward levels.
- Conclusion: people fake but not maximally
What are the recommendations for dealing with faking in high stakes situations?
- Social desirability scales should not be used as indicators of faking:
- can indicate real personality factors and exclude good candidates
- If faking is detected, re-test or interpret with caution:
- risk of false positives
- Try to minimise rather than detect faking
- Use personality to screen out the lowest scorers rather than screen in the best
- Neutralise evaluative content of items
What are the reasons for measuring job performance?
- Decision making about individuals: high performance (promotions, bonuses, probational periods) as well as low performance (retention, termination, layoffs)
- Organisational Planning: Benchmarking performance, identifying developmental needs, assisting in goal identification
- Legal requirements for the profession: legal requirements for certain levels of performance (e.g. doctors), legal defence of hiring/firing decisions
- Feedback: individual, team and organisational
- Evaluation of procedures or changes: did selection processes work? did training work? other changes
What are some examples of subjective measurements of job performance?
-
Subjective measures: rating scales filled out by employee or supervisor
- Graphic rating scales: tick along a physical scale
- Behaviourally anchored rating scales (BARS): developed for a specific job dimension within a specific job. Each scale point lists example behaviours. Can the employee do X?
- Behavioural observation scale: Developed for specific job, have you observed the worker perform these behaviours?
- Checklists: list of behaviours, tick the ones that are observed
What are some objective measures of job performance? What are some problems with them?
-
Objective measures of job performance:
- Production Counts: eg number of bricks laid
- Biodata: eg absenteeism
-
Problems with objective data:
- Production counts sometimes not possible (eg a nanny)
- Doesn’t always take quality into account
- Production is dependent on situational variables as well as the worker (e.g. # of customers served)
What are some issues with rating measures of job performance?
-
Correlation between raters: meta-analyses have shown variations
- Harris and Schaubroeck: self/peer = .36, self/supervisor = .35, but peer/supervisor = .62 (reasonable)
-
Conway and Huffcutt: both reliability and agreement were higher for low-complexity, non-managerial jobs.
- Reliability was highest for supervisors (lowest for subordinates)
- Correlations between sources were lower than Harris and Schaubroeck's but showed the same pattern
-
Sources of error in rating scales
- Social desirability (faking)
- Leniency/severity errors (response styles: personal thresholds for high/low ratings)
- “Halo” or “horns” effect: impression based on one quality
- Recency effects
- Causal attribution errors: effort>ability, actor/observer bias
- Personal bias (pregnancy, race, age)
What is the difference between task and contextual performance?
-
Task Performance: activities that contribute to an organisation's technical core
- tasks required by formal job role
- Lower correlations with personality
-
Contextual performance: Activities that contribute to the social and psychological core of the organisation
- tasks are discretionary and not explicitly stated
- Higher correlations with personality
What is job satisfaction and how can it be improved?
- Job satisfaction is the positive and negative feelings and attitudes about one's job.
- Shows a modest correlation with job performance (r = .30)
-
Measurements of job satisfaction: Global or Specific measures
- Job Descriptive Index: 5 facets (work itself, supervision, pay, promotions, coworkers)
-
Ways to increase job satisfaction:
- Work factors: 1. job rotation, 2. job enlargement (add more tasks), 3. job enrichment (add responsibility)
- Pay factors: 1. perception of fairness, 2. skill/knowledge-based pay, 3. merit-based pay (bonuses, commission), 4. profit sharing
- Hours/flexibility: 1. compressed work weeks (3x12hr days), 2. flexitime.
What is the best way to conduct a performance review?
- Two parts: 1. Performance Assessment 2. Performance Feedback
-
8 Feedback Principles:
- Descriptive (not evaluative)
- Specific (not general)
- Appropriate (considers needs of employer, worker and situation)
- Directed toward changeable behaviours
- Well-timed (immediate is better)
- Honest (not manipulative, self-serving)
- Understood by both parties
- Pro-active (specific directions for change)
How do personality factors relate to work performance? What 4 factors influence this relationship?
-
Job proficiency vs training proficiency:
- Only C predicts job proficiency at a non-trivial level
- E,C and O all predict training proficiency (still small effect)
-
Productivity vs subjective ratings:
- Only C shows non-trivial relationship with productivity
- C and E show non-trivial relationships for subjective ratings
- Inverted (negative) relationships for A and N
-
Task vs contextual performance:
- All facets predict contextual more than task (except for O)
- C is the strongest predictor (particularly achievement striving and dutifulness)
-
Cultural factors: Salgado compared meta-analytic data from Europe and America
- Similarity: C is strongest predictor
- Differences: A rather than E predicted training proficiency, Low N predicted job performance and proficiency
How do EI and intelligence relate to job performance? What factors moderate this?
- Correlations: intelligence is highly related to job performance (>.50). All EI streams correlate lower: ability EI is lowest, then self-efficacy, with trait EI strongest.
- The EI-performance link is moderated by the degree of emotional labour: trait EI in high emotional-labour jobs predicts more strongly than IQ
-
Job performance facets: task performance vs organisational citizenship behaviours (OCB) vs counterproductive workplace behaviours (CWB):
- Stream 1: lowest for all 3, slightly higher for task
- Stream 2: Strongest for OCB, still strong for other 2
- Stream 3: Strongest for OCB, but stronger than 2 for CWB
What is the affective pathways model?
- A proposed mechanism through which EI relates to work performance
- Correlations between facets are linked to the emotions felt at work (.39)
-
EI predicts regulation of emotion:
- Pathway through positive affect leads to OCB (moderate effect sizes; significant for streams 2 and 3)
- Pathway through negative affect leads to reduced CWB (significant for all streams, though effect sizes vary)
What did Schmidt and Hunter find to be the best predictors of workplace performance?