Unit 2 Flashcards
Correlation Coefficient
r ranges from -1.0 to +1.0
-Sign indicates direction of association
-1.0 = Perfect, 0 = none
-.66 is considered high
-WK: Doesn’t imply causation
-STR: Predicts bhvr
Scatterplot
Graphic rep. of corr.
-Axes = units of the 2 Vs
-Visual that shows direction & magnitude of corr.
-Helps find outliers
Pearson’s Corr (P’s r)
Most used corr. measure
-Used w/ linear corr. & cont. data
Spearman’s Rho
Corr. calculated w/ rank-order data
-If rank between variables is similar, STR + corr.
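The two coefficients above can be sketched in plain Python; the study-hours/score data below is invented for illustration:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r: for linear relationships between continuous variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on ranks (assumes no ties)."""
    rank = lambda v: [sorted(v).index(a) + 1 for a in v]
    return pearson_r(rank(x), rank(y))

hours = [1, 2, 3, 4, 5]       # hypothetical study hours
score = [52, 58, 63, 70, 71]  # hypothetical test scores
```

Because the ranks rise together perfectly here, rho comes out at exactly 1.0 even though r is slightly below 1.0.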
Restricted Rng
Scores are tightly clustered
-DEC corr. b/c of the ease of moving btwn ranks
-DEC variability
Regression
“Line of best fit” (regression line) makes predictions using corr. btwn 2 Vs
-Residual (Diff. btwn observed & predicted scores) stays at MIN
-STR of corr. = INC accuracy in predictions, DEC in error
-Observe 1 group and make predictions for another
-Tendency to overestimate relationship
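A minimal least-squares sketch of the "line of best fit" described above (all data invented); the slope and intercept are the ones that keep the squared residuals at a minimum:

```python
def fit_line(x, y):
    """Least-squares regression line y_hat = a + b*x (minimizes squared residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

x = [1, 2, 3, 4]   # hypothetical predictor scores
y = [2, 4, 5, 8]   # hypothetical criterion scores
a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # observed - predicted
```

The residuals always sum to (essentially) zero; the stronger the corr., the smaller each individual residual.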
Standard Error of Est.
Gives margin-of-error info when estimating
Coefficient of Determination (CoD)
Correlation coefficient^2 (r^2)
Coefficient of Alienation
Measures non-association btwn Vs
-Square root of 1 - CoD
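Both quantities come straight from r; a toy calculation (the r = 0.80 is arbitrary):

```python
from math import sqrt

r = 0.80                    # hypothetical correlation
cod = r ** 2                # coefficient of determination: shared variance (~0.64)
alienation = sqrt(1 - cod)  # coefficient of alienation: non-association (~0.60)
```

So even an r of .80 leaves 36% of the variance unexplained.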
Shrinkage
Amount of DEC in predictive accuracy observed when the regression is applied to another group
Reliability
Consistency of scores
-Degree matters
-Key psychometric feat.
Rel. Coefficient
Corr. coefficient indicates rel.
Rel. (R) Tests
-Test-retest R.
-Alternate / Parallel forms R.
-Split-half / INT Consistency R.
-Inter-rater R.
Classical Test Theory
x = T + e
-x = Obt. score
-T = true score
-e = Random error
i.e., true score plus random error = obtained score
Systematic (NOT random) Error
Error that consistently affects scores in 1 direction (unlike random e)
Variance
Difference in scores from error & differences in ability
Sources of e (random error) in test scores
Test administration
-Test-taker Vs, ENVI, & administrator-related factors
Test construction
-Items used, item sampling/selection
Test scoring & interpretation
Different test = diff. e
Measurement of e & R
e DEC R & repeatability of psych test
Classical Theory
Unsystematic measurement e randomly influences
-Measurement e is random
-Mean e = 0
-True scores and e are not corr., rTe = 0
- e on different tests are not corr., r 12 = 0
Reliability & Classical T
Rel. = True score variance / Total variance
-Total variance = True score variance + e variance
-0 (coefficient) = Diff. due to e or chance
-1 = true difference
-0 or 1 = improbable
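The variance ratio above, with made-up variance components:

```python
true_var = 8.0   # variance from real differences in ability (hypothetical)
error_var = 2.0  # variance from random error (hypothetical)
reliability = true_var / (true_var + error_var)  # 0.8: 80% true difference
```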
Test-retest Method
1 test given to the same pple on 2 different occasions
-Shows corr. btwn 1st & 2nd scores for the same test
-H corr = stable test / has test-retest R.
-L corr. = Random e OR No rel., no stability
Things to Consider w/ Test-retest R.
Time btwn testing
-Usually days-wks depending on whether the V is expected to change
-Rapidly changing V = needs small time btwn tests, otherwise the person has changed
How much time is appropriate?
-Ability tests: Time needed to wear off practice & mem. effects
-practice effect = problem for academic & neuropsych settings
Test-retest = best when practice effects are absent/minimal
Alt./Parallel Reliability (Coefficient of Equivalence) Method
Two versions (V.a, V.b) of test given to 1 group of pple
-Group takes both tests
-V.b can be given immediately or w/ delay after V.a
Alt./Parallel Rel. Cont.
HIGH corr. btwn V.a & V.b = versions tap similar qualities/concepts
LOW corr. btwn V.a & V.b = concepts differ
-Maybe caused by item sampling or wording
-Use blueprint to prevent L corr.
STR & WK of Alt./Parallel R.
STR:
-DEC cheating/memory effect b/c subjects get different items w/ different form
*Practice effect possible b/c items on different forms are similar
WK:
-INC time, money, & effort to create new version of same test
Internal Consistency Reliability
Inter-corr. of items in same test
-Tests w/ heterogeneous items usually have low INT Consistency Rel.
INT consistency Rel. Method
1 test given to 1 group on 1 occasion
-Test is split in half, subtotals for each half are corr.
-Best: Odd-even, randomly, matched items
-Worst: 1st & 2nd half
Split-half
Yields corr. btwn two half tests, NOT rel. of full test
-A longer test w/ good-quality items is more rel. than a shorter test b/c it more fully samples the bhvr
*Is 1 pic representative of someone's appearance VS many pics?
Spearman-Brown Correction Formula
Estimates rel. of the full test by adjusting the split-half corr. UP
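A sketch of the standard Spearman-Brown formula (the 0.60 half-test corr. below is invented):

```python
def spearman_brown(r_half, n=2):
    """Projects rel. of a test n times as long; n=2 gives the full test
    from a split-half correlation."""
    return n * r_half / (1 + (n - 1) * r_half)
```

For example, `spearman_brown(0.60)` adjusts a .60 split-half corr. UP to .75 for the full-length test.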
Split-half Cont.
Many ways to do a split-half; how it's done affects the outcome
Coefficient Alpha
Mean of all possible split-half corr. of a test
-No correction needed
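In practice alpha is computed from item variances rather than by literally averaging every split; a minimal sketch of the standard formula (all scores invented):

```python
def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores).
    items: one list of scores per item, same people in the same order."""
    k = len(items)
    def var(v):  # population variance
        m = sum(v) / len(v)
        return sum((a - m) ** 2 for a in v) / len(v)
    totals = [sum(person) for person in zip(*items)]
    return k / (k - 1) * (1 - sum(var(i) for i in items) / var(totals))
```

For two perfectly correlated items the scores are fully consistent, so alpha comes out at 1.0.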
STR of HIGH INT Consistency
Items usually homogeneous, making scores easy to interpret
-W/ LOW INT Consistency, there is ambiguity & is harder to interpret
-Combine scores from many homogeneous subtests to measure complex variables (ex. INT)
Inter-rater (I-R) Rel. Method
2 Raters/scorers observe & assign scores to 1 group & calculate corr.
HIGH I-R rel. needs:
-Good operational def. of bhvr measured
-In depth training w/ feedback for rater
-Occasional refresher training
I-R Reliability Cont.
Low I-R Rel. may be from unstable characteristics
-Difference in sampling can DEC alt./parallel forms R
Factors that Affect Rel.
-Unstable characteristics affect test-retest R.
-Differences in item sampling of V.a & V.b affect Alt./Parallel Forms Rel.
-Heterogeneous items affect INT Consistency Rel.
-Restriction of Rng
*Scores clustered
*Too easy or hard tests
Understanding Rel. Coefficients
0.8-0.9+ is considered acceptable rel.
-0.8 R = 80% true difference, 0.2 = 20% random error
Item Response Theory (IRT)
Item characteristic curves (ICC), Relation of a personal trait w/ prob. of correctly scoring on a measure for said trait
-EX: Verbal ability & prob. of passing vocab test
-Look to notebook to understand how to read graph
*A is easiest, D is hardest, B&C moderately hard
Info Function
To what extent does an item discriminate among people?
-Certain items discriminate among those LOW on a trait
-Some items made to discriminate among those HIGH on a trait
*A in IRT tests those low
*D in IRT tests those high
Standard Error of Measurement (SEM)
e possible in a test
-Confidence intervals
-e is assumed random
-Rel DEC SEM
-68% chance the true score is within ±1 SEM of the obt score (and vice versa)
-SEM applied to score to interpret it
*SEM = SD * sqrt(1 - r)
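Plugging the formula into the 68% band above (the SD and reliability are hypothetical, IQ-style numbers):

```python
from math import sqrt

sd, r = 15.0, 0.91     # hypothetical scale SD & reliability coefficient
sem = sd * sqrt(1 - r) # standard error of measurement, ~4.5 here
obtained = 110
band_68 = (obtained - sem, obtained + sem)  # ~68% interval for the true score
```

Notice that as r rises toward 1.0, SEM shrinks toward 0 and the band tightens.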
Standard Error of the Difference
Error btwn 2 scores helps understand profile of results
Validity
How well a test measures what it's intended to
-How trustworthy is the conclusion drawn from test results?
-Info accumulates over time w/ clinical & rsch observations
The relationship btwn Rel. & Val.
R. DEC → Val. DEC
-More e
R. INC ≠ V. INC (high R. alone doesn't guarantee V.)
Cannot est. V w/out R.
Are there Valid tests?
No
-V. is population specific
-V needs to be documented for:
*Certain pop.
*Certain purpose
*Certain setting
-Name doesn’t matter, evidence does
Categories of Validity
Each has some overlap
-Content V.
-Criterion V.
-Construct V.
Evidence should be gathered at multiple points
Face V.
How relevant items are to laypersons
-Client understands why Q is being asked
*Type of socks you wear isn’t relevant to job interview at pizza place
-EX: Block design & Rorschach inkblot
Content V.
How well a test samples what it's trying to assess
-Relation btwn sample & Qs to be asked
-Hard to determine w/ poorly defined psych Vs
-Est. w/ agreement btwn 2 experts' ratings
-Watch for construct underrep. & Construct Irrelevant Diff.
Construct Underrep.
Test neglects to include key topics
-EX: If Uber driver didn’t have license
Construct Irrelevant Diff.
When test measures something irrelevant
-EX: Math test Qs reading comprehension
Criterion-Related-V (C-R-V)
How well a test measures IRL qualities/bhvr
-Criterion: Standard for eval. obt scores
-Has 2 subtypes: Concurrent & Predictive
False Positive & False Negative
F+: Test shows person has a quality they don’t
F-: Test shows person doesn’t have a quality they do have
ConCURRENT C-R-V
How well results reflect someone’s standing on a current IRL dimension
-EX: Dr.’s opinion & depression score
Predictive C-R-V
How well results predict someone's standing on an IRL dimension in the future
-EX: GPA
Using a Valid Test
Shortcut to gather info & save time, $, & effort
Validity Coefficient
Corr. coefficient indicating STR of relationships btwn test scores & criterion measure
-Rarely above 0.6
-EX: Corr. btwn depression score & Dr.'s rating
Criterion Contamination
Scorer for criterion also knows test scores
-Artificial corr. elevation
-EX: Prof. knows student’s GRE score & assigns grades
-Confirmation bias
Standard e of Estimate (SEE)
Stat indicating the degree of e for estimated scores
-Confidence interval of e
-High corr. btwn test & criterion, DEC SEE
Decision Theory
% of “hits” / true + & true - AND % of “misses” false + & false -
-Acceptable ratio dependent on nature of decision to be made
Construct
Unobservable, underlying, hypothesized trait of interest
-A lot of psych subjects involve these
Construct Validity
The extent to which a test adequately measures a theoretical construct/trait
-EX: What does watermelon sugar high?
Sources for evidence Construct Val.
Seven:
-Test homogeneity
-Appropriate Developmental Changes
-Theory-consistent Intervention Effects
-Theory-Consistent Group Differences
-Convergence Evidence
-Discriminant (Divergent) Evidence
-Factor Analysis
Test Homogeneity
INT consistency - How well a single trait is measured w/ test
Appropriate Dev. Changes
Expected changes in scores w/ age
-Principle of conservation
-EX: Grade lvl & reading score
Theory-Consistent Intervention Effects
Changes in scores in pre/post-test following known effective intervention
-EX: Known therapeutic intervention OR Bhvr changing w/ experience
Convergence Evidence
STR corr. btwn scores on new & older est. test on similar construct
-“og” log, hog BUT “int” hint, pint
Discriminant (Divergent) Evidence
Scores on measures of theoretically unrelated constructs should show no/low corr.
-EX: Dr. Skelly & Hawley bringing subs to the “all cookie” party
Factor Analysis
Data reduction tech. grouping items that have something in common
-Clusters of items (factors) are determined statistically; interpreting the factors is more subjective
-EX: Crockpot separates meats & veggies
Sensitivity
Accurately IDing patients w/ a particular disorder
Specificity
Accurately IDing patients w/out the disorder OR w/ a different one
-EX: Mini-mental exam screening for dementia in the elderly; cut-off set to ID those w/ the disorder and exclude those w/out it or w/ another
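Both rates fall out of a 2x2 table of test decisions vs. true status (all counts invented):

```python
# Screening outcomes vs. true diagnosis (hypothetical counts)
tp, fn = 45, 5   # w/ the disorder: correctly flagged vs. missed (false -)
tn, fp = 90, 10  # w/out it: correctly cleared vs. wrongly flagged (false +)

sensitivity = tp / (tp + fn)  # 0.9: hit rate among those WITH the disorder
specificity = tn / (tn + fp)  # 0.9: hit rate among those WITHOUT it
```

Moving the cut-off score trades one rate against the other, which is the hit/miss trade-off from the Decision Theory card.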
Extra Val. Concerns
Side effects & unintended consequences of testing
-EX: Going to therapy & having a reaction to being given the “batshit crazy” exam
*Value judgement & SOC consequences of tests
A Functional Perspective
Are actions of testing beneficial?
-Giving reading tests but not having a way to help
Test Utility
Practical concern for use of test - Is it useful?