Week 3 (Reliability) Flashcards
Psychological test definition
A systematic procedure for obtaining samples of behavior, relevant to cognitive, affective, or interpersonal functioning, and for scoring and evaluating those samples according to standards
Why is validity/reliability important
Minimizing measurement error
Why is standardization important
Ability to generalize findings
Frequency distribution
A way to organize a set of data by displaying the frequency of various outcomes in a sample
Grouping scores by intervals
Central tendency
Where does the bulk of the data lie? The average (mean, median, and mode)
Variability
Way of understanding how scores are different, or how much scatter there is in a set of scores
Ways to measure are range, variance, and standard deviation
If scores vary greatly from the mean, variance is big and vice versa.
Standard deviation
Numerical value used to indicate how widely individuals in a group vary. If scores vary greatly from the mean, standard deviation is big and vice versa
Square root of variance.
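The variance/standard deviation relationship above can be checked with a short sketch (the scores here are hypothetical, not from the slides):

```python
import statistics

# Hypothetical set of test scores
scores = [85, 90, 100, 110, 115]

mean = statistics.mean(scores)           # central tendency
variance = statistics.pvariance(scores)  # average squared deviation from the mean
sd = statistics.pstdev(scores)           # standard deviation = square root of variance

print(mean, variance, sd)
```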
Continuous variables
Can take any value along a continuum; example: IQ
Categorical variables
Take on discrete categories; examples: true/false items, gender
Confidence intervals
A range placed around an obtained score; the standard error of measurement accounts for the level of unavoidable testing error
Levels of measurement (4 scales)
Nominal (numbers serve only as labels with no quantitative meaning; goes with categorical variables)
Ordinal (a ranking of variables)
Interval (differences between numbers are equal, but there is no true zero point)
Ratio (has natural 0 point on scale)
Correlation
How strongly two variables are related to each other, NOT CAUSATION
Normal curve
Bell shaped
Limits extend to infinity
Happens when mean=median=mode
Unimodal and symmetrical
A particular shape of a frequency distribution
IQ curve
Standard deviation for IQ is 15 (this is 1 standard deviation point)
95% of population is within two standard deviations of the mean
68% of population within one standard deviation of the mean
LOOK AT SLIDE 16
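The 68%/95% figures on this card follow from the normal curve itself; a minimal check using the normal error function (the IQ values in the comments assume mean = 100, SD = 15 as stated above):

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

print(proportion_within(1))  # ~0.68 -> about 68% of IQ scores fall between 85 and 115
print(proportion_within(2))  # ~0.95 -> about 95% of IQ scores fall between 70 and 130
```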
Standardization
Uniformity of procedures and uniformity in how tests are scored
Components are administration, content, and scores
Makes sure test is given and scored the same
Standardization of content
Inter item consistency (do items meant to measure a similar thing yield similar responses?)
Item characteristics (level of difficulty, ability of items to discriminate)
Standardization of scores
The meaning of the responses, how to convert raw score to something clinically meaningful, how to reflect performance
Norm-referenced testing (a group of representative individuals takes the test; their scores represent typical behavior)
Interpreting scores means comparing an individual's score to the norm sample
Norm reference testing score interpretation
Within-group norms are expressed as percentiles and standard scores (deviation of scores from the mean)
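A standard score (z-score) is just the deviation from the norm-group mean in SD units; a small sketch with hypothetical IQ norms:

```python
def z_score(raw, norm_mean, norm_sd):
    """Standard score: how many SDs a raw score falls from the norm-group mean."""
    return (raw - norm_mean) / norm_sd

# Hypothetical: obtained IQ of 115 against norms with mean 100, SD 15
print(z_score(115, 100, 15))  # 1.0 -> one SD above the mean (~84th percentile)
```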
Flynn effect
Gradual improvement in intelligence test scores over last several decades
Criterion reference testing
Comparing a person's performance to a predetermined criterion or standard
Goal: evaluate competence in terms of a pre-established standard
Psychological test
Objective and standardized measure of a sample of behavior
Reliability
Way to document a test's standardization and absence of error
Refers to consistency of the data or results
Refers to the variation among scores that has to do with the trueness of the scores (how close is the obtained score to the true score?)
Testing reliability, classical measurement (test) theory
Theory of testing based on the idea that a person's observed (obtained) score on a test is the sum of a true score (error-free score) and an error score
X_observed = X_true + X_error
Reliability coefficient (r) is the ratio of true score variance to total test score variance
If a test had no error, its reliability would be 1
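The observed = true + error model and the reliability ratio can be illustrated with a simulation (the true-score SD of 15 and error SD of 5 are hypothetical choices, not from the slides):

```python
import random
import statistics

random.seed(0)

# Simulate classical test theory: observed score = true score + error score
true_scores = [random.gauss(100, 15) for _ in range(10000)]
errors = [random.gauss(0, 5) for _ in range(10000)]
observed = [t + e for t, e in zip(true_scores, errors)]

# Reliability coefficient r = true-score variance / total observed-score variance
r = statistics.pvariance(true_scores) / statistics.pvariance(observed)
print(r)  # close to 225 / (225 + 25) = 0.90
```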
Sources of error (3)
Context (location, distractions, tech issues, what is the person there for)
Test taker (language barrier, disability, sleep, hunger, health, cultural barrier)
Test itself (unclear language, consistency of administration, scoring, items themselves, group norms)
Sources of error within test itself
Time sampling error (variability due to time/natural changes) ex time of day
Content sampling error (selection of items does not adequately cover the content that it is supposed to be testing) ex cultural barrier
Inter item inconsistency (error in scores that results from fluctuations in items across the test) ex consistency of testing the same thing
Content heterogeneity (results from inclusion of items or sets of items that tap content knowledge that differ from those tapped by other items in the same test)
How to test for time sampling error
Test retest reliability
How to test for content sampling error
Alternate form reliability
How to test for inter item inconsistency and heterogeneity error
Internal consistency (split half reliability, coefficient alpha)
Split half reliability
Splitting the same test in half and comparing the two halves
A measure of content-sampling error and internal consistency
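A sketch of split-half reliability on hypothetical 0/1 item responses, correlating odd-numbered against even-numbered items. The Spearman-Brown correction at the end is the standard companion adjustment for the halved test length; it is not on this card, so treat it as an added assumption:

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation between two lists of scores."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (statistics.pstdev(x) * statistics.pstdev(y))

# Hypothetical 0/1 item responses: rows = test takers, columns = items
responses = [
    [1, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

odd_half = [sum(row[0::2]) for row in responses]   # items 1, 3, 5
even_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6
r_half = pearson_r(odd_half, even_half)

# Spearman-Brown correction: estimates reliability of the full-length test
r_full = 2 * r_half / (1 + r_half)
print(r_half, r_full)
```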
K-R-20 used …
To measure internal consistency; only used when items are dichotomous (two choices, e.g., true/false)
Measuring internal consistency with kr20 and coefficient alpha
The correlation between performance on all the items within the test
Looks at individual items in test to see correlation between all items. Correlations then get averaged for the score
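A minimal KR-20 sketch, assuming the standard formula (k/(k-1)) * (1 - Σ item variance / total-score variance), where each 0/1 item's variance is p*q; the response data are hypothetical:

```python
import statistics

def kr20(responses):
    """KR-20: internal consistency for dichotomous (0/1) items.
    rows = test takers, columns = items."""
    k = len(responses[0])
    n = len(responses)
    totals = [sum(row) for row in responses]
    # for a 0/1 item, variance = p * (1 - p), p = proportion answering 1
    item_var = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        item_var += p * (1 - p)
    return (k / (k - 1)) * (1 - item_var / statistics.pvariance(totals))

# Hypothetical true/false responses
responses = [
    [1, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
print(kr20(responses))  # ≈ 0.77
```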
Standard error of measurement
Way to apply what we know about the reliability of a test to someone’s test score
Creates confidence interval around test score
As standard error of measurement goes down, reliability goes up
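The confidence interval above can be sketched using the standard SEM formula, SEM = SD * sqrt(1 - reliability) (the formula is standard but not stated on the card; the score, SD, and reliability values are hypothetical):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(score, sd, reliability, z=1.96):
    """~95% confidence interval around an obtained test score."""
    e = z * sem(sd, reliability)
    return (score - e, score + e)

# Hypothetical: obtained IQ of 110, test SD 15, reliability 0.90
print(sem(15, 0.90))                       # ≈ 4.74
print(confidence_interval(110, 15, 0.90))  # ≈ (100.7, 119.3)
```

Note that a perfectly reliable test (reliability = 1) gives SEM = 0, matching the card: as SEM goes down, reliability goes up.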