Psychological Testing & Assessment Flashcards
What are the 7 assumptions about psychological testing and assessment?
Psychological Trait & States Exist.
These can be quantified and measured.
Test behavior predicts non-test behavior.
Tests have strengths and weaknesses
Sources of error are part of the process.
Testing & assessment can be done in a fair and unbiased way.
Testing & assessment benefit society.
Any distinguishable, relatively enduring way in which individuals vary from one another.
Trait
Distinguishes one person from another but is relatively less enduring.
State
The key to insight.
Item content
Long standing assumption that factors other than what a test attempts to measure will influence performance on the test.
Error
Refers to the component of test score attributable to sources other than the trait or ability measured.
Error variance
The stability or consistency of measurement.
Reliability
Elements of error
Observer’s error
Environmental changes
Participant’s changes
Reliability coefficient should not go beyond _.
+/- 1
The __ the coefficient alpha, the higher the reliability.
Higher
Test scores gain reliability as the number of _ increases.
Items
It means dependable, consistency or stability.
Reliability
Defined as one on which test takers fall in the same positions relative to each other across administrations.
Reliable test
Reliability assumes that test scores reflect 2 factors which are:
True characteristics
Random measurement error
Stable characteristics of the individual.
True characteristics
Chance features of the individual or the situation.
Random measurement error
Tools used to estimate or infer the extent to which an observed score deviates from a true score.
Standard error of estimate/measurement
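The card names the statistic but not its formula. The standard formula is SEM = s·√(1 − r), where s is the test's standard deviation and r its reliability coefficient; the values below are hypothetical, for illustration only.

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: sd * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical test: SD = 15, reliability coefficient = .91
print(round(sem(15, 0.91), 2))  # 4.5
```

Note that when reliability is perfect (r = 1), the SEM is 0: observed scores deviate from true scores only to the extent that the test is unreliable.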
Mathematical representation of random measurement error
X = T + E
In a reliable test, the value of E should be close to _ and the value of T should be close to the _.
0
Actual test score X.
Formula for the proportion of test score reflecting a person’s true characteristics.
T/X
Formula for the proportion of test score reflecting random error.
E/X
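The model X = T + E and the two proportions above can be sketched with hypothetical numbers (the scores here are invented for illustration):

```python
# Classical true-score model: X = T + E (hypothetical values)
T = 80     # true score: stable characteristics of the test taker
E = 5      # random measurement error from chance factors
X = T + E  # observed test score

true_proportion = T / X   # share of the observed score reflecting true characteristics
error_proportion = E / X  # share of the observed score reflecting random error

print(X)                           # 85
print(round(true_proportion, 3))   # 0.941
print(round(error_proportion, 3))  # 0.059
```

In a reliable test the error share is small, so T/X approaches 1 and E/X approaches 0, as the cards above state.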
The reliability of the test is actually based on _.
Performance of people
Differences between people in test scores reflect differences between them in _ plus differences in the effect of _ factors.
Actual knowledge/characteristics
Chance factors
What are the sources of error variance?
Test construction
Test administration
Test scoring and interpretation
Other sources of error
It correlates performance on two interval scale measures and uses that correlation to indicate the extent of true score differences.
Reliability Analysis
Used to evaluate the error associated with administering a test at two different times.
Test-retest method
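Test-retest reliability, like the other correlational methods in this deck, is estimated as a Pearson correlation between two sets of scores. A minimal sketch with hypothetical scores for five test takers:

```python
def pearson_r(xs, ys):
    """Pearson correlation between two score lists (e.g., time 1 vs. time 2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores at two administrations of the same test
time1 = [10, 12, 14, 16, 18]
time2 = [11, 13, 13, 17, 19]
print(round(pearson_r(time1, time2), 3))  # 0.962
```

A coefficient this close to 1 would indicate that test takers kept nearly the same relative positions across the two administrations.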
The test retest method is of value only when we measure traits or characteristics that _.
Do not change over time.
Ideally, test-retest method should have _ months or more interval.
6
When the interval between testing is greater than 6 months.
Coefficient of stability
It compares two equivalent forms of a test that measures the same attribute.
Parallel/Alternative forms method
The 2 forms in parallel/ alternative forms method use _ items while the rules used to select items of a particular difficulty level are _. (Same/Different)
Different
Same
A test is given once and divided into halves that are scored separately.
Split half method
One subscore is obtained for the odd number items in the test and another for the even-numbered items. Used in split half method.
Odd-even system
Allows you to estimate what the correlation between the 2 halves would have been if each half had been the length of the whole test.
Spearman-Brown Formula
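The Spearman-Brown formula is n·r / (1 + (n − 1)·r); with n = 2 it projects the correlation between the two halves to the full test length. A sketch with a hypothetical half-test correlation:

```python
def spearman_brown(r_half: float, n: float = 2.0) -> float:
    """Estimate reliability of a lengthened test from a shorter one.

    n is the factor by which the test is lengthened; n = 2 projects the
    split-half correlation to the length of the whole test.
    """
    return n * r_half / (1 + (n - 1) * r_half)

# Hypothetical correlation of .70 between the odd and even halves
print(round(spearman_brown(0.70), 3))  # 0.824
```

This illustrates the earlier card that test scores gain reliability as the number of items increases: the full-length estimate (.824) exceeds the half-test correlation (.70).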
Refers to the degree of correlation among all the items on a scale.
Inter-item consistency
Inter-item consistency is calculated from a _ administration of a single test form. (frequency)
Single
Inter item consistency is useful in assessing the _ of a test.
Homogeneity
What are the methods used to obtain estimates of internal consistency?
KR-20
Cronbach alpha
Cronbach developed a more general reliability estimate which he called _.
Cronbach alpha
Cronbach alpha values typically range from _ to _; _ values are theoretically impossible.
0-1
Negative values
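Coefficient alpha is k/(k − 1) × (1 − Σ item variances / total score variance). A self-contained sketch; the 5 × 4 score matrix is hypothetical:

```python
def cronbach_alpha(item_scores):
    """Coefficient alpha from rows of item scores (one row per test taker)."""
    k = len(item_scores[0])  # number of items

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [var([row[i] for row in item_scores]) for i in range(k)]
    total_var = var([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 4-item test answered by 5 test takers (0/1 scoring)
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(scores), 3))  # 0.8
```

For dichotomous (right/wrong) items like these, alpha reduces to KR-20, which is why the deck lists both as internal-consistency estimates.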
The degree of agreement or consistency between two or more scorers with regard to a particular measure.
Inter-scorer reliability
Inter-scorer reliability is for what kinds of test?
Creativity or projective test
Judgment or estimate of how well a test measures what it purports to measure in a particular context.
Validity
A judgment based on evidence about the appropriateness of inferences drawn from test scores.
Validity
Process of gathering and evaluating evidence about validity.
Validation
The validation process undertaken when test users plan to alter the format, instructions, language, or content of a test. An example is adapting a national standardized test into braille.
Local validation.
As reliability of the test increases, the highest possible value of _ increases.
Validity coefficient
What are the 5 measures of reliability?
Test retest method
Parallel/Alternative forms method
Split-half method
Inter-item consistency
Inter-scorer Reliability
The first theory behind validity analysis is that the true score component, T, reflects factors producing _ in test scores, while the error component, E, reflects factors producing _ in test scores.
Stability
Instability
The second theory behind validity analysis is focused specifically on the variables producing _ differences.
True score differences.
What are the 2 components of true score?
Stable characteristics of the individual relevant to the purpose of the test.
Stable characteristics of individual irrelevant to the purpose of the test.
Formula of systematic measurement error.
T = R + I
Formula of both error components (systematic and random error)
X = R + I + E
When the items look like they measure what they are supposed to measure.
Face Validity
The judgment about the item appropriateness in face validity is made by _.
Test takers
It is important whenever a test is used to make inferences about the broader domain of knowledge or skills represented by a sample of items.
Content validity
Content validity is important to what kinds of performance test?
Maximal and Typical Performance Test
The ability of a test to predict performance in another measure.
Criterion Validity
In criterion Validity, the test is referred to as the _ labeled X and the validation measure as the _ labeled Y.
Predictor;
Criterion
It is important whenever a test is used to make decisions by predicting future performance.
Criterion Validity
A judgment of how adequately a test score can be used to infer an individual's most probable standing on some measure of interest, that measure being the criterion.
Criterion- related Validity
What are the 2 types of validity evidence?
Predictive validity
Concurrent validity
An index of the degree to which a test score predicts some criterion measure. Type of validity evidence.
Predictive validity
An index of the degree to which a test score is related to some criterion measure obtained at the same time (concurrently)
Concurrent validity
Used to determine whether a test measures what it is intended to measure; the construct is often a theoretical dimension, such as a personality trait.
Construct validity
What are the 2 construct validation techniques?
Congruent validity
Discriminant or divergent validity
Construct validation technique that correlates the test with other measures of the same construct.
Congruent validity
Construct validation technique that correlates the test with measures of constructs it should be unrelated to.
Discriminant or divergent validity
An informed, scientific idea developed or hypothesized to describe or explain behavior.
Construct
What are the 5 types of validity?
Face validity
Content validity
Criterion Validity
Criterion- related Validity
Construct validity
What are the types of statistical test bias?
Intercept bias
Slope bias
A judgment resulting from the intentional or unintentional misuse of a rating scale.
Rating error
Error in rating that arises from the tendency on the part of the rater to be lenient in scoring.
Leniency Error
Systematic reluctance to give ratings at either the positive or negative extreme.
Central Tendency Error
Tendency to give a particular ratee a higher rating than he or she objectively deserves because of the rater's failure to discriminate among conceptually distinct aspects of the ratee's behavior.
Halo effect
The extent to which a test is used in an impartial, just and equitable way.
Fairness
Refers to a group of statistics that can be calculated for individual test items.
Item analysis
What are the 2 commonly used techniques of item analysis?
Item Difficulty
Item Discrimination
A commonly used technique of item analysis that is appropriate for maximal performance tests (achievement and aptitude tests). It requires that test items be scored as correct or incorrect.
Item Difficulty
Percentage of the pupils who got the item right. It can also be interpreted as how easy or how difficult an item is.
Difficulty Index/ Item Difficulty Index
Formula for Item Difficulty index
p = number of persons answering the item correctly / N (total number of people taking the test)
What is the value for optimal item Difficulty?
Between .5 and .7
What does a high p value mean in item difficulty index?
Most people are correct on the item.
What does a low p value mean in item difficulty index?
Most people answered the item incorrectly.
Table of equivalent in interpreting the difficulty index:
_= Very difficult
.21-.80= moderately difficult
.81-1.00= _.
.00-.20
Very easy
The presence of many items at the _ level limits the potential reliability and validity of the test.
0
The presence of many items at _ level reduces variability and limits the test’s potential reliability and validity.
1
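The difficulty index and its interpretation table can be sketched directly; the item counts below are hypothetical:

```python
def difficulty_index(correct_count: int, n_takers: int) -> float:
    """p = number answering the item correctly / total number of test takers."""
    return correct_count / n_takers

def interpret(p: float) -> str:
    """Band labels from the table of equivalents above."""
    if p <= 0.20:
        return "Very difficult"
    if p <= 0.80:
        return "Moderately difficult"
    return "Very easy"

# Hypothetical item: 30 of 50 pupils answered correctly
p = difficulty_index(30, 50)
print(p)             # 0.6
print(interpret(p))  # Moderately difficult
```

Items at p = 0 or p = 1 give every test taker the same result, so they add no variability, which is why the cards above say they limit potential reliability and validity.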
A commonly used technique of item analysis that is appropriate for almost any type of test. It indicates the extent to which different types of people answer an item in different ways.
Item Discrimination
It separates the bright test takers from the poor ones.
Discrimination Index
What are the 2 approaches for measures of item Discrimination?
Item discrimination index
Item-total correlation
The proportion obtained by comparing the performance of 2 subgroups of test takers, used in maximal performance testing. Also known as D.
Extreme Group Method
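In the extreme group method, D is the proportion answering the item correctly in the upper-scoring group minus the proportion in the lower-scoring group; the proportions below are hypothetical:

```python
def discrimination_index(p_upper: float, p_lower: float) -> float:
    """D = proportion correct in the upper group minus the lower group."""
    return p_upper - p_lower

# Hypothetical item: 80% of the top scorers and 30% of the bottom
# scorers answered it correctly
D = discrimination_index(0.80, 0.30)
print(round(D, 2))  # 0.5
```

A large positive D means the item separates the bright test takers from the poor ones; a D near zero or negative flags an item that fails to discriminate.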