Assessment and Testing Flashcards
Measurement
general process of determining the dimensions of an attribute or trait
Assessment
processes and procedures for collecting info about human behavior
- assessment tools include tests/inventories, rating scales, observation, interview data, etc.
Appraisal
implies going beyond measurement to making judgments about human attributes and behaviors; used interchangeably with evaluation
Measures of Central Tendency
a distribution of scores (measures on a number of individuals) can be examined using:
mean
median
mode
!! All three of these fall in the same place (are identical) when the distribution of scores is normally distributed (not skewed) !!
Interpretation
making a statement about the meaning or usefulness of measurement data according to the professional counselor’s knowledge and judgment
Mean
the arithmetic average (M)
Median
the middle score in a distribution of scores
1, 2, (3), 4, 5
Mode
the most frequent score in a distribution of scores
1, (2, 2), 3, 4, 5
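Worked example: a minimal sketch of the three measures using Python's standard `statistics` module. The score list is hypothetical; it reuses the numbers from the Mode card.

```python
from statistics import mean, median, mode

# Hypothetical distribution of scores (same numbers as the Mode card).
scores = [1, 2, 2, 3, 4, 5]

m = mean(scores)      # arithmetic average
md = median(scores)   # middle score; with an even count, the average of the two middle scores
mo = mode(scores)     # most frequent score

print(m, md, mo)
```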
Skew
the degree to which a distribution of scores is not normally distributed
Positive Skew
The bulk of the scores falls on the left (positive skew = the tail goes out to the more positive values)
(figure: the curve peaks on the left, with a long tail to the right)
Order from left to right: mode, median, mean
Negative Skew
The bulk of the scores falls on the right (negative skew = the tail goes out to the left)
(figure: the curve peaks on the right, with a long tail to the left)
Order from left to right: mean, median, mode
Relationship between mean, median, mode in skewed distributions
- the mode is the top of the curve (most frequent scores)
- the mean is pulled in the direction of the extreme scores represented by the tail of a skewed distribution
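Worked example: a small sketch showing the mean being pulled toward the tail. Both score lists are made up for illustration; the second is a mirror image of the first.

```python
from statistics import mean, median, mode

# Hypothetical positively skewed scores: most are low, a few are very high.
positive_skew = [1, 2, 2, 2, 3, 3, 4, 10, 15]

# Mirror image of the list above: most scores are high, a few are very low.
negative_skew = [16 - s for s in positive_skew]

for label, scores in (("positive", positive_skew), ("negative", negative_skew)):
    print(label, "mode:", mode(scores), "median:", median(scores),
          "mean:", round(mean(scores), 2))
```

In the positively skewed list the order is mode, median, mean; in the negatively skewed list it is reversed.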
Measures of Variability
Range
the highest score minus the lowest score
Measures of Variability
Inclusive range
the high score minus the low score, adding one (1)
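Worked example: a short sketch of both kinds of range. The scores are hypothetical.

```python
# Hypothetical test scores.
scores = [70, 82, 85, 91, 98]

score_range = max(scores) - min(scores)   # highest score minus lowest score
inclusive_range = score_range + 1         # same difference, plus one

print(score_range, inclusive_range)
```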
Measures of Variability
Standard Deviation (SD)
describes the variability within a distribution of scores
the square root of the mean of the squared deviations from the mean (the unsquared deviations from the mean always average to zero)
Excellent measure of the dispersion of scores
- SD = standard deviation of a sample
- sigma = standard deviation of the population
!! It is NOT equal to variance!! SD is the square root of variance!!!
Measures of Variability
Variance
the square of the standard deviation (SD^2)
does not describe the dispersion of scores as well as SD because it is expressed in squared score units
- see analysis of variance
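Worked example: a minimal sketch of variance and SD computed by hand. The scores are hypothetical, and the sketch treats them as a whole population (dividing by N); a sample estimate would usually divide by N - 1 instead.

```python
import math

# Hypothetical scores, treated here as an entire population.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(scores)
mean_score = sum(scores) / n

deviations = [s - mean_score for s in scores]
mean_deviation = sum(deviations) / n                 # always zero, so not useful by itself

variance = sum(d ** 2 for d in deviations) / n       # mean of the squared deviations
sd = math.sqrt(variance)                             # SD is the square root of variance

print(mean_deviation, variance, sd)
```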
Normal Curve
Normal curve
essentially divides the scores (individuals) into six parts of equal width (one SD each) - three above the mean and three below the mean; the parts do not contain equal percentages of the scores
Normal Curve
Normal curve distributions
2%, 13.5%, 34%, 34%, 13.5%, 2%
- about 68% of scores fall within 1 SD of the mean
- about 95% fall within 2 SD
- over 99% (about 99.7%) fall within 3 SD
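Worked example: a sketch that checks these shares against a standard normal curve using `statistics.NormalDist` (Python 3.8 or later). The helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, SD 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of cases falling within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_1, within_2, within_3 = (share_within(k) for k in (1, 2, 3))
print(f"{within_1:.1%}  {within_2:.1%}  {within_3:.1%}")
```

The computed shares are close to the rounded figures on the card; the card's percentages are approximations.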
Percentile
a value below which a specified percentage of cases falls
- for a score at the 75th percentile: 75% of the scores fall below this score; the remaining 25% are at or above it
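Worked example: a sketch of a percentile rank using the definition above (percentage of cases falling below a value). The function name and the scores are hypothetical, and other conventions exist (for example, counting half of the tied scores).

```python
def percentile_rank(score, scores):
    """Percentage of scores that fall below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

# Hypothetical group of eight scores.
group = [55, 60, 65, 70, 75, 80, 85, 90]

rank_of_85 = percentile_rank(85, group)   # six of the eight scores are below 85
print(rank_of_85)
```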
Stanine
from standard nine
converts a distribution of scores into nine parts (1 to 9) with five in the middle and a SD of about 2
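Worked example: a rough sketch that converts a z-score (distance from the mean in SD units) to a stanine, assuming a mean of 5, an SD of 2, and bands half an SD wide. This is a common approximation rather than the only way stanines are assigned; published tests typically assign them from percentile bands. The function name is made up.

```python
import math

def stanine_from_z(z):
    """Approximate stanine: scale to mean 5 / SD 2, round half up, clip to 1..9."""
    raw = math.floor(2 * z + 5.5)
    return max(1, min(9, raw))

print([stanine_from_z(z) for z in (-2.0, 0.0, 1.0, 3.0)])
```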
Standardized Scores
creates a common language of scores to compare several different test scores for the same individual
- occur by converting raw score distributions
- these derived scores provide for constant normative/relative meaning allowing for comparisons between individuals
- express the person’s distance from the mean in terms of the standard deviation of that standard score distribution
- are continuous and have equality of units
- two most commonly used standardized scores: z-scores, t-scores
Standardized Scores
Z-score
mean is 0, SD is 1.0
- z-scores typically range from -3.0 to +3.0 (the number of SDs a score lies from the mean)
Study tip: Z-score, Zero is the mean of the distribution
Standardized Scores
T-score
mean of this standardized score is 50 and SD is 10
by Transforming this standard score, negative scores are eliminated (unlike z-score)
Study tip: T-score, Ten is SD
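Worked example: a minimal sketch converting a raw score to a z-score and then to a T-score. The raw score, mean, and SD are hypothetical, and the function names are made up.

```python
def z_score(raw, mean, sd):
    """Distance from the mean in SD units (mean 0, SD 1)."""
    return (raw - mean) / sd

def t_score(z):
    """Rescale a z-score to a mean of 50 and an SD of 10."""
    return 50 + 10 * z

# Hypothetical test: mean 70, SD 10.
z_high = z_score(85, mean=70, sd=10)   # 1.5 SDs above the mean
z_low = z_score(60, mean=70, sd=10)    # 1 SD below the mean: a negative z-score

print(z_high, t_score(z_high))
print(z_low, t_score(z_low))           # the T-score stays positive
```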
Correlation coefficient
Pearson Product-Moment Correlation Coefficient (r) is most common
Correlation coefficient
- ranges from -1.0 (perfect negative correlation) to +1.0 (perfect positive correlation)
- statistical index which shows the relationships between two sets of numbers
- when a very strong correlation exists, if you know one score of an individual, you can predict (to a large degree) the other score of that person
- tells nothing about cause and effect!!! Only degree of relationship!!!
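Worked example: a sketch of the Pearson r computed from its definition (the sum of cross-products of deviations divided by the product of the root sums of squared deviations). The function name and both data sets are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

hours = [1, 2, 3, 4, 5]
score_up = [2, 4, 6, 8, 10]      # rises in step with hours
score_down = [10, 8, 6, 4, 2]    # falls in step with hours

r_up = pearson_r(hours, score_up)
r_down = pearson_r(hours, score_down)
print(r_up, r_down)
```

The first pair gives a perfect positive correlation and the second a perfect negative one; neither says anything about which variable causes the other.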
Bivariate
correlation between two variables
Multivariate
correlation between three or more variables
Reliability
the consistency of a test or measure
- the degree to which the test can be expected to provide similar results for the same subjects on repeated administrations
- can be viewed as the extent to which a measure is free from error (if instrument has little error, it is reliable)
- correlation coefficient is used to determine reliability
- if reliability coefficient is high (about .70 or higher), test scores have little error and the instrument is reliable
- RELIABILITY IS A NECESSARY PSYCHOMETRIC PROPERTY OF TESTS AND MEASURES
- a test can have high reliability but low validity… reliability places a ceiling on validity, but validity does not set limits on reliability
- ex. a scale could read 20 lbs every time you weigh a box, but the box actually weighs 40 lbs.
Types of Reliability
Test-Retest reliability
(AKA stability reliability)
- obtained using the same instrument on both occasions
- same group tested twice
- results of the two administrations are correlated
- length of time and intervening experiences may influence test-retest reliability
- two weeks is a good time between test administrations
Types of Reliability
Alternate-Forms reliability
(AKA Equivalence reliability)
- alternate forms of the same test are administered to the same group and the correlation between them is calculated
- how comparable the forms of the tests are will influence this reliability
- intervening events/experiences may also influence reliability
Types of Reliability
Split-half reliability
(AKA Internal consistency)
- test is divided into two halves
- The correlation between these two halves is calculated
- because splitting shortens the test (each half is only 1/2 the length), you may apply the Spearman-Brown formula (also called the prophecy formula) to estimate how reliable the full-length test would be
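Worked example: a sketch of the Spearman-Brown correction for a split-half correlation, using the standard doubling form of the formula, 2r / (1 + r). The half-test correlation of .70 is hypothetical.

```python
def spearman_brown(r_half):
    """Estimated reliability of the full-length test from a split-half correlation."""
    return (2 * r_half) / (1 + r_half)

r_half = 0.70                      # hypothetical correlation between the two halves
r_full = spearman_brown(r_half)    # estimate for the whole test
print(round(r_full, 2))
```

The estimate for the full test is higher than the correlation between the halves, reflecting that longer tests tend to be more reliable.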
True and error variance
tests measure true and error variance
- you want to measure TRUE variance, the actual psychological trait or characteristic that the test is measuring
Types of Reliability
Internal consistency (split-half)
can also be determined by measuring interitem consistency
- the more homogeneous the items, the more reliable the test
- Kuder-Richardson formulas (two formulas, KR-20 and KR-21) are used if test contains dichotomous items (T/F, Y/N)
- Cronbach alpha coefficient is used if instrument contains nondichotomous items (multiple choice, essays)
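Worked example: a sketch of Cronbach's alpha using its usual formula, k / (k - 1) * (1 - sum of item variances / variance of total scores). The response table is made up (rows are respondents, columns are items), and the function name is hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for a table of responses (rows = people, columns = items)."""
    k = len(responses[0])                                  # number of items
    items = list(zip(*responses))                          # one tuple of scores per item
    item_variances = sum(pvariance(item) for item in items)
    total_variance = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical ratings from five people on three items (1 to 5 scale).
ratings = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 5],
    [1, 2, 1],
    [3, 3, 4],
]

alpha = cronbach_alpha(ratings)
print(round(alpha, 2))
```

Items that rise and fall together across respondents (homogeneous items) push alpha toward 1.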
Coefficient of Determination
the degree of common variance
- the index (81%) that results from squaring the correlation (.90)
True/Error variance example
Venn Diagram:
(E1 ( T1 T2) E2)
Two tests are administered. Each one measures true variance (T1 and T2) and error variance (E1 and E2)
- if the correlation between two tests or two forms of the same test is .90, then the amount of true variance measured in common is the correlation squared (.90^2 = .81, or 81%)
Coefficient of Nondetermination
the unique variance, not common
- for above example, it would be 19% and represents error variance
- calculated as 100% minus the coefficient of determination
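Worked example: a short sketch of both coefficients for the correlation of .90 used on the cards, expressed as proportions rather than percentages.

```python
r = 0.90                                 # correlation between the two tests

determination = r ** 2                   # common (shared) variance
nondetermination = 1 - determination     # unique variance, not shared

print(f"{determination:.0%} common, {nondetermination:.0%} unique")
```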
Standard error of measurement (SEM)
another measure of reliability; useful in interpreting the test scores of an individual
- may be referred to as the Confidence Band/Limits, which it is used to construct
- helps determine the range within which an individual’s test score probably falls
Standard error of measurement (SEM) Example
a person scores a 92 on a test. The test’s SEM = 5.0. Chances are about 2 in 3 (68%) that the person’s score falls between 87 and 97 (refer to the normal curve: 34% and 34% of the cases fall within one standard deviation (+/-) of the mean, for a total of 68%)
- for the same test with the same SEM of 5.0, you can say that 95% of the time, the person’s score would fall between 82 and 102 (+/- 2 SEM)
- every test has its own unique SEM, which is calculated in advance and may be reported on the test’s score profile
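Worked example: a sketch of the confidence bands for the score of 92 with an SEM of 5.0. The second function is an addition not stated on the cards: it uses the standard formula SEM = SD * sqrt(1 - reliability), and the SD and reliability passed to it are hypothetical. Both function names are made up.

```python
import math

def confidence_band(score, sem, n_sem=1):
    """Range of score minus/plus n_sem standard errors of measurement."""
    return (score - n_sem * sem, score + n_sem * sem)

def sem_from_reliability(sd, reliability):
    """Standard formula for SEM (assumption: not given on the cards)."""
    return sd * math.sqrt(1 - reliability)

band_68 = confidence_band(92, 5.0)            # about 68% band: +/- 1 SEM
band_95 = confidence_band(92, 5.0, n_sem=2)   # about 95% band: +/- 2 SEM
print(band_68, band_95)

# Hypothetical test with SD 10 and reliability .75.
example_sem = sem_from_reliability(sd=10, reliability=0.75)
print(example_sem)
```

The more reliable the test, the smaller the SEM and the narrower the band around an observed score.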
Validity
the degree to which a test measures what it purports to measure for the specific purpose for which it is used
- situation specific - depending on purpose and population
- an instrument may be valid for some purposes and not others
Validity is considered to be more important than reliability
Types of Validity
Face validity
instrument looks valid
- ex. a math test has math items. This validity could be important from the test-taker’s perspective
Types of Validity
Content validity
the instrument contains items drawn from the domain of items which could be included
- ex. two professors of Psyc 101 devise a final exam which covers the important content that they both teach
Types of Validity
Predictive validity
also called empirical validity
the predictions made by the test are confirmed by later behavior (criterion)
- Ex. the scores on the GRE predict later grade point average
Types of Validity
Concurrent validity
the results of the test are compared with other tests’ results or behaviors (criteria) at or about the same time
- ex. scores of an art aptitude test may be compared to grades already assigned to students in an art class
Types of Validity
Construct Validity
measures some hypothetical construct such as anxiety, creativity, etc.
- usually several tests/instruments are used to measure different components of the construct or of the hypothesized relationships between the construct and other constructs
- best when multiple traits are being measured using a variety of methods
Types of Validity
Convergent validity
a type of construct validity
- occurs when there is high correlation between the construct under investigation and other measures of the same or related constructs
Types of Validity
Discriminant validity
a type of construct validity
- occurs when there is no significant correlation between the construct under investigation and measures of unrelated constructs