Unit 4 Flashcards
Personality assessment
the measurement of the individual characteristics of a person.
What makes a personality test good? (or what is the difference b/w a test online and a legitimate test?):
Reliability
Validity
Specified conditions, populations, and cultures the test applies to
Proof that the test is related to certain outcomes
Findings published and peer reviewed in a scientific journal
Results can be replicated
Not readily available online
Reliability
estimate of the consistency of a test. It describes the extent to which test scores are consistent and reproducible with repeated measurements (across time, items, and raters). Reliability is a prerequisite to validity; a measure must be consistent in order to be valid.
Testing for reliability:
Temporal consistency (time)
Internal consistency (items)
Rater consistency (raters)
Temporal consistency (time)
demonstrates test-retest reliability: respondents take the test twice to see if scores are similar. Need to make sure they aren’t just remembering previous answers (memory effect) or performing better because they’ve taken the test before (practice effect). To eliminate those problems, the two sittings must be far enough apart.
Internal consistency (items)
demonstrates if the different items of the test give similar results. Earlier tests for this consistency were: parallel-forms reliability (compare two versions of a test and check scores for similarity) and split-half reliability (split the test in half to see if scores on one half correlate with scores on the other). Now a statistic called Cronbach’s alpha (α) is used instead: conceptually, take the correlation b/w the scores of two halves of a test, then average that correlation across all possible halves of the test. It estimates the generalizability of the score from one set of items to another. An alpha of 0.70-0.80 is good; even higher is expected for IQ tests (0.90-0.95).
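A rough sketch of how alpha can be computed, assuming a small made-up data set (five respondents, four items on a 1-5 scale). It uses the standard variance-based formula, α = k/(k-1) × (1 - sum of item variances / variance of total scores), rather than literally averaging every split half. The function name `cronbach_alpha` is only illustrative.

```python
from statistics import variance

def cronbach_alpha(responses):
    """responses: one list of item scores per respondent (all the same length)."""
    k = len(responses[0])                          # number of items
    items = list(zip(*responses))                  # regroup scores item by item
    sum_item_vars = sum(variance(item) for item in items)
    total_var = variance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)

# Hypothetical answers: 5 respondents x 4 items.
responses = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

With items that rise and fall together like these, alpha comes out well above the 0.70-0.80 range; unrelated items would pull it down.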
Rater consistency (raters)
demonstrates interrater reliability by having two separate judges rate the personality or behaviour of a third person, then finding the average correlation or percentage of agreement. If agreement is too low, the test could be too ambiguous or the judges may not understand what’s being rated.
Validity
the extent to which a test measures what it is supposed to measure.
Testing for validity:
Construct validity
Face validity
Criterion validity
Convergent validity
Discriminant validity
Predictive validity
Construct validity:
Every test aims to measure an underlying concept called a construct, derived from a theory. This is the extent to which the test successfully measures the theoretical concept it was designed to be measuring.
Face validity
if the test appears to measure the construct of interest. Useful in two situations: when the cooperation/motivation of the test-taker can affect results (they see the test as relevant/useful, so they take it seriously), or when developing new measures (researchers give the test to see which items are actually related to the concepts they want to measure). Face validity alone is not enough to establish that a test is valid; other types are needed.
Criterion validity
determines how good a test is by comparing the results of a test to an external standard like another personality test or behavioural outcome.
Convergent validity
if the test is similar to other tests of the same or related constructs
Discriminant validity
if the test is distinct from measures of unrelated concepts. To establish construct validity, neither convergent nor discriminant validity alone is sufficient; need to show that the test does BOTH: converges with similar concepts and discriminates b/w dissimilar ones.
Predictive Validity
if the test gives specific feedback to a person or group who share a characteristic
Barnum Effect
people will readily believe superficial, general, ambiguous results that apply to all people (i.e. results that lack predictive validity).
Generalizability
For what purpose is this test valid (eg. uses, settings, population groups)? It establishes the boundaries/limitations of the test. For example: a test could be valid only for university students. “One size does not fit all.”
NEO-PI-R – a good personality test? (Yes, because it is valid and reliable.)
- Construct validity: Measures the Big 5 personality factors as tested with factor analysis
- Has acceptable Cronbach’s alpha of 0.56-0.81 for the facet scales (with even higher internal consistency for the five factor scales)
- Test-retest reliability: was taken 3 months apart and had high correlations
- Convergent and discriminant validity shown by correlating scores on other personality tests
- Generalizability: valid for adults, elders, all races, genders, and education levels, as well as in clinical settings, and it has been translated for use in other cultures. Not for use with people under 18.
Two types of personality tests
Self-report and performance-based
Self-report (objective):
Respondents answer questions about themselves. May use a dichotomous two-choice scale (e.g. true/false, yes/no) or a Likert scale that uses degrees of agreement (strongly disagree to strongly agree), similarity (very characteristic of me to not at all characteristic of me), or frequency (always to never), on a 5- or 7-point scale. Other formats: checklists, forced-choice scales (a limited number of choices rather than a rating; an example is the Machiavellianism Scale, which measures the extent to which a person thinks others are easily manipulated), visual analogue scales.
Performance-based tests (projective):
More often used in clinical settings; however validity is harder to prove. Five categories of projective techniques:
- Association techniques (e.g. Word Association Test, Rorschach inkblots)
- Construction techniques (e.g. Draw a Person Test (DAP), Thematic Apperception Test (TAT))
- Completion Techniques (e.g. sentence-completion tests)
- Arrangement or selection of stimuli (e.g. pick favourite colour, picture, etc.)
- Expression Technique (creative doll/puppet play, artwork)
Examples
- TAT: Respondents write a story in response to a picture and the themes are analysed, such as achievement motivation, need for power, etc.
- DAP: Drawings are analysed; for example, large eyes could indicate paranoia.
Pros and Cons - Self-report tests:
- easy to administer and score
- Large amount of info, may be the only way to measure certain things
- people aren’t the best judges of their own skills/knowledge/etc. (often overestimating)
- people want to present themselves in a good way, so they may lie, jeopardizing validity (faking good, or socially desirable responding, a type of response set), or present badly on purpose to get special treatment (faking bad)
- Carelessness: forgetting to answer one, mis-circling, etc.
Response sets/noncontent responding: when people have a set way of responding to self-report tests:
- Acquiescent responding: may always agree with what the question is asking (inflating scores)
- Reactant responding: disagree with everything (depressing scores)
- Extreme responding: avoiding the middle and only picking from both extreme ends (only 1 or 7)
- Moderate responding: only choosing the middle answers
- Patterned responding: marking answers in a pattern (all 3’s, 123-321-123, etc.)
- More extreme responding associated with individualistic cultures
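A rough sketch of how a scoring program might flag some of these response sets for one respondent on a 7-point scale. The function name `flag_response_sets` and every cut-off in it are illustrative assumptions, not published criteria. A genuinely high scorer would also be flagged as acquiescent unless the scale mixes in reverse-keyed items.

```python
def flag_response_sets(answers, scale_max=7):
    """Label the response sets that fit one respondent's answers.

    The cut-offs below are illustrative assumptions only.
    """
    midpoint = (scale_max + 1) / 2
    flags = []
    if len(set(answers)) == 1:
        flags.append("patterned")      # same answer to every item
    if all(a in (1, scale_max) for a in answers):
        flags.append("extreme")        # only the end points are used
    if all(abs(a - midpoint) <= 1 for a in answers):
        flags.append("moderate")       # never strays far from the middle
    if all(a > midpoint for a in answers):
        flags.append("acquiescent")    # agrees with every statement
    if all(a < midpoint for a in answers):
        flags.append("reactant")       # disagrees with every statement
    return flags

print(flag_response_sets([1, 7, 7, 1, 7, 1]))
print(flag_response_sets([6, 7, 5, 6, 7]))
print(flag_response_sets([4, 4, 4, 4, 4]))
print(flag_response_sets([2, 5, 3, 6, 4, 1]))
```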
Strategies to deal with response sets:
- Include both a statement and its opposite (“I am happy” / “I am not happy”) to counter acquiescent or reactant responding
- Write half of the statements so that a high rating = has the trait and the other half so that a low rating = has the trait (i.e. reverse scoring; prevents artificially high scores)
- Set up computer programs to catch patterns, or include rare questions to check for random responses (e.g. “Were you born on the moon?”). Scales built into a personality test to catch response sets/lies are called infrequency scales.
- Social desirability: use forced-choice scales where both answers make the respondent look good, include non-existent items to catch people claiming knowledge of things that don’t exist (overclaiming), and structure testing to minimise pressure to look good (anonymity).
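A minimal sketch of reverse scoring, assuming a hypothetical four-item happiness scale rated 1-5 in which the items at positions 1 and 3 are worded in the opposite direction. The helper names are illustrative. A respondent who agrees with everything ends up at the middle of the scale instead of the top.

```python
def reverse_score(rating, scale_min=1, scale_max=5):
    """Flip a rating so that 5 becomes 1, 4 becomes 2, and so on."""
    return scale_max + scale_min - rating

def scale_total(answers, reverse_keyed):
    """Sum the answers after flipping the reverse-keyed items."""
    return sum(
        reverse_score(a) if i in reverse_keyed else a
        for i, a in enumerate(answers)
    )

reverse_keyed = {1, 3}           # hypothetical positions of the reversed items

happy_person = [5, 1, 4, 2]      # agrees with "happy", disagrees with "not happy"
yea_sayer    = [5, 5, 5, 5]      # agrees with everything

happy_total = scale_total(happy_person, reverse_keyed)
yea_sayer_total = scale_total(yea_sayer, reverse_keyed)
print(happy_total, yea_sayer_total)
```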
Personality Tests and Employment:
- Can help predict how people with certain characteristics will perform on average (e.g. openness is correlated with good training experiences, agreeableness with good customer service skills)
- 1/3 of employers use some form of psychological testing, including personality tests, in hiring
- 20% of Fortune 1000 companies do personality testing
- Some businesses report that turnover rate decreased by 20-70% after personality testing
- Valid/reliable tests can predict job performance and enhance fairness in hiring
- Businesses with many applicants streamline by screening for counterproductive work behaviours (accidents, absences, turnover) to reduce marginal applicants
- Businesses with few applicants streamline by using personality tests to find exceptional applicants
- Employee theft costs $100 billion worldwide each year; integrity tests can test the honesty of job candidates to see if they are likely to steal or cheat. Two types: overt/clear-purpose and disguised-purpose.
Overt/clear-purpose integrity tests:
responders understand the purpose. These tests often have two parts: one that directly assesses attitudes towards dishonest behaviours (“Do you think taking a pen from work is ok?”) and a second that asks about drug/illegal activities (“Have you done drugs before work?”).
Disguised-purpose integrity tests:
like personality tests, these assess characteristics related to behaviours, such as using trustworthiness or sociability to predict drug use or thieving.
- Outback Steakhouse = success story, low turnover rate with customised personality testing
- Legal issues: employers must prove that the tests don’t discriminate on the basis of race/gender/etc. Recent research suggests that pre-employment integrity testing is legal, valid, and useful.
- Using just personality assessment in hiring is misguided; for example, high school students might not understand proper workplace etiquette yet and thus be unfairly eliminated
Matchmaking:
- In 1939, a Marital Rating Scale rated wives on the basis of their positive/negative qualities
- Psychological testing for matchmaking is even more popular today with online dating
- eHarmony matches couples on similarity of beliefs, interests, etc. Uses a model to predict couple compatibility that includes advanced statistics (factor analysis, regression analysis, etc.)
- eHarmony test is considered valid, but the extent of that validity is questioned.