Appraisal or assessment techniques Flashcards
Reliability
Reliability is commonly defined as the consistency of a test or the degree to which it yields the same results in repeated administrations to the same test group.
Statistically, a test’s reliability is expressed as a correlation or reliability coefficient.
Reliability coefficients range from 0.0 to +1.0, with coefficients of .75 or higher considered strong in most situations.
Types of Reliability
Test-Retest
The same subjects are given the same test twice to see if scores are consistent over a specific amount of time (e.g. two weeks).
This type of reliability controls for the content of the items (makes them identical by giving exactly the same test each time) and can directly measure temporal effects.
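As an illustration only, a test-retest coefficient could be computed as the Pearson correlation between the two administrations. The scores and the names pearson_r, first_scores, and second_scores are hypothetical and do not come from any published test.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for the same five examinees, tested two weeks apart.
first_scores = [10, 12, 15, 18, 20]
second_scores = [11, 12, 14, 19, 20]

retest_r = pearson_r(first_scores, second_scores)
print(f"test-retest reliability: {retest_r:.2f}")
```

A coefficient near +1.0 means examinees kept roughly the same rank order across the two administrations.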
Types of Reliability
Parallel or Alternate forms
By varying the items on the instrument, the researcher can directly assess the influence of item content on the consistency of test results.
In the parallel or alternate forms approach, two nearly identical versions of the instrument are constructed, and the same subjects complete both versions as close together in time as possible.
Types of Reliability
Split-Half Procedure
The examiner can artificially create parallel forms of a test by splitting the items in half and measuring the consistency of the two halves.
This approach also directly measures variability due to item content and is less susceptible to the effects of time than the parallel forms method.
The Spearman-Brown (Prophecy) Formula is used to adjust the correlation between the two halves upward to an estimate of the reliability of the full-length test, because each half contains only half as many items as the test itself.
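A minimal sketch of a split-half estimate, assuming an odd-even split of the items (one common choice, not the only one) and the standard Spearman-Brown step-up r_full = 2 * r_half / (1 + r_half). The item data and function names are made up for illustration.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(item_scores):
    """Return (half-test correlation, Spearman-Brown full-length estimate).

    item_scores holds one row per examinee and one column per item.
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    r_full = 2 * r_half / (1 + r_half)  # Spearman-Brown step-up for two halves
    return r_half, r_full


# Hypothetical right/wrong (1/0) responses: six examinees, six items.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
]

r_half, r_full = split_half_reliability(responses)
print(f"half-test r = {r_half:.2f}, full-length estimate = {r_full:.2f}")
```

The stepped-up value is higher than the half-test correlation because longer tests, other things being equal, tend to be more reliable.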
Types of Reliability
Standard Error of Measurement (SEM)
Standard Error of Measurement is an alternative way to express reliability. Statistically, an examinee’s true score is expected to fall within +1 and -1 SEM of his or her obtained score about 68 percent of the time.
A test with a small SEM has a high reliability coefficient (.75 to 1.0). A test with a large SEM has a low reliability coefficient (0.0 to .24).
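The link between SEM and the reliability coefficient can be sketched with the standard formula SEM = SD * sqrt(1 - r). The formula is not stated on the card, and the function names and example values below are illustrative assumptions.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r), where r is the reliability coefficient."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0.0 and 1.0")
    return sd * sqrt(1.0 - reliability)


def score_band(obtained, sem, n_sem=1):
    """Range of obtained score plus or minus n_sem SEMs (about 68% for n_sem=1)."""
    return obtained - n_sem * sem, obtained + n_sem * sem


# Hypothetical test with SD = 15 and reliability = .96.
sem = standard_error_of_measurement(15, 0.96)
low, high = score_band(110, sem)
print(f"SEM = {sem:.1f}; 68% band for a score of 110: {low:.0f} to {high:.0f}")
```

With the SD held constant, raising the reliability shrinks the SEM, which is why a small SEM goes with a high reliability coefficient.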
Types of Tests
- Speed Tests include so many items that no one can complete them in the allotted time (Example: arithmetic portion of the WRAT; Coding or Symbol Search on the WISC and WAIS).
- Power Tests include some very difficult items which few if any subjects are able to answer correctly (Example: the final items on almost any subtest on intelligence and achievement measures).
Factors which Affect Reliability
The factors which are known to directly affect the estimated reliability of a test are as follows:
- Test length
- Homogeneity
- Test-retest interval
- Range constriction
- Other systematic and unsystematic factors
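To illustrate the first factor, test length, the general Spearman-Brown prophecy formula predicts the reliability of a test lengthened or shortened by a factor k: r_k = k * r / (1 + (k - 1) * r). This is a sketch that assumes the added items are comparable to the existing ones; the function name and values are hypothetical.

```python
def predicted_reliability(reliability, length_factor):
    """General Spearman-Brown prophecy: reliability after changing test length.

    length_factor is new length divided by old length (2 doubles the test,
    0.5 halves it).
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    k, r = length_factor, reliability
    return k * r / (1 + (k - 1) * r)


# Hypothetical test with a reliability of .60.
doubled = predicted_reliability(0.60, 2)    # about .75
halved = predicted_reliability(0.60, 0.5)   # about .43
print(f"doubled: {doubled:.2f}, halved: {halved:.2f}")
```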
Validity
The validity of a test is generally defined as the degree to which a test measures what it is supposed to measure. Does the test actually perform acceptably its stated purpose?
Validity is measured by calculating a validity coefficient, which is the correlation between the test scores and the criterion score. The square of this coefficient indicates the proportion of variance in the criterion that the test accounts for.
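A short sketch of how a validity coefficient relates to its square, assuming the coefficient is the Pearson correlation r between test scores and criterion scores. Squaring r gives the proportion of criterion variance the test accounts for. The function name and the value .50 are illustrative.

```python
def variance_accounted_for(validity_coefficient):
    """Square a validity coefficient (r) to get the shared variance (r squared)."""
    if not -1.0 <= validity_coefficient <= 1.0:
        raise ValueError("a correlation must lie between -1.0 and +1.0")
    return validity_coefficient ** 2


# A hypothetical validity coefficient of .50 between test and criterion.
shared = variance_accounted_for(0.50)
print(f"r = .50 accounts for {shared:.0%} of the criterion variance")
```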
There are four types of test validity:
Face validity – Although not technically a type of validity as there is no validity coefficient to calculate, it is a useful idea as it speaks to the degree to which the items on the test appear to measure or to tap the construct of interest.
Content validity refers to the degree to which a sample of test items adequately represents or covers the content area the test is supposed to measure.
Criterion-related validity determines the extent to which a test can predict, diagnose, or classify an individual’s behavior in specific situations.
Construct validity is the extent to which a test measures a concept, construct, or trait of interest. However, here there exists no “real-world,” external criterion of this construct.
Therefore, construct validity is a more conceptual or theoretical issue and is difficult to establish quickly.
There are three types of criterion-related validity:
1. Predictive: Predicts future outcomes.
2. Diagnostic: Tries to diagnose or identify an existing state. The results are compared to a known criterion.
3. Concurrent: Determines how well a test measures what it was designed to measure by comparing performance on the test to an external criterion. This approach is frequently seen in employee selection and appraisal settings where the test has previously demonstrated a strong, positive relationship with the criterion.
Criterion Referenced versus Normative Referenced Tests
A. Criterion Referenced
The criterion referenced assessment compares group or individual performance with a predetermined set of criteria believed to be important or essential (assessment of handwriting, creativity, art, honesty, etc.).
B. Normative Referenced
The normative referenced assessment compares individuals with each other and/or groups who took the same test previously. Most current standardized tests and most teacher-made tests are normative referenced.
Objective versus Subjective Tests
A. Subjective Tests
Subjective tests require the formation of an answer from a limited or ambiguous stimulus and are, therefore, more difficult to take and to score/grade. However, in many cases, this may produce greater test validity.
Example (classroom setting): essay or fill-in-the-blank items
B. Objective Tests
Objective tests require subjects to select a single answer out of already provided answers.
Example (classroom setting): multiple-choice and True/False items
Individual versus Group tests
A. Individually-Administered Tests
Individually-administered tests have certain advantages and disadvantages.
1. Advantages to Individual Testing
In individual testing, the clinician can establish greater rapport with and gain a greater understanding of each client.
2. Disadvantages to Individual Testing
Individual testing is time consuming and expensive.
B. Group Tests
Group tests generally present opposing strengths and weaknesses.
- Advantages to Group Testing
Group testing tends toward more objective scoring.
Norms are better established.
Group testing is more economical.
- Disadvantages to Group Testing
Group testing allows no chance to establish individual rapport.
The examiner has no way to know what factors may be influencing an individual’s answers.
Group testing is dependent on the reading skills of the examinee.
Types of Standardized Tests
I. Achievement Tests – measure the level of acquisition of information (CAT, MET, Iowa).
II. Aptitude Tests – measure the ability to learn (skills) in a specific area or to predict future behaviors (DAT, ACT, SAT, GRE).
III. Intelligence Tests – measure the ability to function in the world and to apply reasoning and verbal ability.
IV. Interests, Attitudes, and Values Tests – measure preferences or interests for a variety of activities or topics.
V. Psychopathology Tests – measure the symptoms presented or reported by a patient or rated by an interviewer.
VI. Personality Tests – measure qualities, traits, or behaviors that characterize a person’s individuality.
Wechsler Adult Intelligence Scale - III (WAIS-III)
- The WAIS-III is an individually administered measure of a person’s capacity for intelligent behavior as part of a cognitive assessment or a general psychological or neuropsychological assessment.
- WAIS-III Details:
a. Ages: 16 - 89 years
b. Norms: Deviation IQs — M = 100, SD = 15
c. SEM = 3 on full scale
d. Working Time: Approximately 75 minutes
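A brief sketch of what a deviation IQ with M = 100 and SD = 15 means in practice. It assumes scores are normally distributed in the norm group, and it is not the publisher's scoring procedure, which relies on norm tables. Function names are hypothetical.

```python
from statistics import NormalDist

IQ_MEAN = 100
IQ_SD = 15


def deviation_iq(z_score):
    """Convert a z-score (SDs above or below the norm-group mean) to a deviation IQ."""
    return IQ_MEAN + IQ_SD * z_score


def percentile_rank(iq):
    """Percent of the norm group scoring at or below this IQ, assuming normality."""
    return 100 * NormalDist(IQ_MEAN, IQ_SD).cdf(iq)


one_sd_above = deviation_iq(1.0)
print(f"z = +1.0 -> IQ {one_sd_above:.0f}, percentile {percentile_rank(one_sd_above):.0f}")
```

An examinee one standard deviation above the mean obtains an IQ of 115, which falls at roughly the 84th percentile under the normality assumption.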
Comparison of the WAIS-III and the Stanford-Binet
The WAIS has achieved much wider acceptance than the Stanford-Binet. However, the Stanford-Binet is more sensitive to lower levels of mental ability (mental age) and is preferred in many State Schools and rehabilitation settings.
Wechsler Intelligence Scale for Children-Fourth Edition (WISC-IV)
WISC-IV Details:
a. Ages: 6 years through 16 years, 11 months
b. Norms: Scaled scores and IQs by age; M = 100; SD = 15
c. Working Time: 50 - 85 minutes depending upon use of supplemental subtests
Wechsler Preschool and Primary Scale of Intelligence – Third Edition (WPPSI-III)
a. Ages: 3 years to 7 years, 3 months
b. Norms: Scaled scores and deviation IQ.
c. Scores by age are provided for 17 age groups divided into 3-month intervals.
d. M = 100; SD = 15
Otis-Lennon School Ability Test (OLSAT)
The OLSAT is an intelligence test which can be administered to a group. It can be used to identify students who might qualify for gifted and talented programs.
Adult Psychopathology
Minnesota Multiphasic Personality Inventory-2 (MMPI-2)
Validity and clinical scales were empirically derived by comparing responses from nonpatient samples with those from patient (pathological) groups.
Interpretation is primarily based on T-score elevations (T = 65 or above; M = 50, SD = 10).
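For illustration, a linear T-score can be computed as T = 50 + 10 * z and compared with the cutoff of 65. This is a simplified sketch: actual MMPI-2 scoring relies on the publisher's norm tables, and the raw score, norm mean, and norm SD below are made up.

```python
T_MEAN = 50
T_SD = 10
ELEVATION_CUTOFF = 65


def t_score(raw, norm_mean, norm_sd):
    """Linear T-score: rescale a raw score to M = 50, SD = 10."""
    z = (raw - norm_mean) / norm_sd
    return T_MEAN + T_SD * z


def is_elevated(t, cutoff=ELEVATION_CUTOFF):
    """True when the T-score reaches the interpretive cutoff (65 or above)."""
    return t >= cutoff


# Hypothetical scale with a norm-group mean of 20 and SD of 4.
t = t_score(raw=26, norm_mean=20, norm_sd=4)
print(f"T = {t:.0f}, elevated: {is_elevated(t)}")
```

A T-score of 65 sits 1.5 standard deviations above the norm-group mean.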
Sixteen Personality Factor Questionnaire (16 PF)
The 16 PF was developed by Cattell using factor analysis to classify personality traits.
It is designed for the normal population 16 years old and older.
A low-literacy edition is available for examinees with limited reading skills.
Career and Vocational Testing
The SII (Strong Interest Inventory) uses Holland’s scheme to derive six general occupational themes that are always listed in this order: R I A S E C
Following are the general occupational themes and the characteristics each indicates:
Realistic – Deals with the environment in a concrete, objective, and physically manipulative manner; uses minimal interpersonal skills.
Investigative – Uses intelligence, ideas, words, and symbols to deal with the environment.
Artistic – Creates art forms; prefers musical, artistic, literary, and dramatic vocations.
Social – Handles the environment by using people skills and has much concern for human welfare.
Enterprising – Has adventurous, dominant, enthusiastic, and impulsive qualities; usually prefers sales, supervisory, and leadership vocations.
Conventional – Chooses goals and activities that have high social approval; prefers clerical and computational tasks; identifies with business and values economic matters.
Important test reference books by Oscar Buros
Mental Measurements Yearbook and Tests in Print