Appraisal or Assessment Techniques
Reliability
Reliability is commonly defined as the consistency of a test: the degree to which it yields the same results in repeated administrations to the same group of examinees.
Statistically, a test’s reliability is expressed as a correlation or reliability coefficient.
Reliability coefficients range from 0.0 to +1.0, with scores of .75 or higher considered strong in most situations.
Types of Reliability
Test-Retest
The same subjects are given the same test twice to see whether scores are consistent over a specified interval (e.g., two weeks).
This type of reliability controls for the content of the items (makes them identical by giving exactly the same test each time) and can directly measure temporal effects.
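As a minimal sketch (Python, with hypothetical scores), the test-retest coefficient is simply the Pearson correlation between the two administrations:

from scipy.stats import pearsonr

time1 = [85, 92, 78, 99, 88, 74, 91, 83]  # scores at the first administration (hypothetical)
time2 = [88, 90, 80, 97, 85, 76, 93, 81]  # the same examinees two weeks later (hypothetical)
r, _ = pearsonr(time1, time2)
print(f"Test-retest reliability coefficient: {r:.2f}")  # .75 or higher is considered strong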
Types of Reliability
Parallel or Alternate Forms
By varying the items on the instrument, the researcher can directly assess the influence of item content on the consistency of test results.
In the parallel or alternate forms approach, two nearly identical versions of the instrument are constructed, and the same subjects complete both versions as close together in time as possible.
Types of Reliability
Split-Half Procedure
The examiner can artificially create parallel forms of a test by splitting the items in half and measuring the consistency of the two halves.
This approach also directly measures variability due to item content and is less susceptible to the effects of time than the parallel forms method.
The Spearman-Brown (Prophecy) Formula is used to calculate a general estimate from split-half procedures, one less dependent on the particular split (functioning like an average of the estimates obtained by splitting the items multiple times).
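A rough sketch of the split-half procedure with the Spearman-Brown correction, using hypothetical item scores and an odd-even split; r_full = 2·r_half / (1 + r_half) is the standard two-half case of the Prophecy Formula:

from scipy.stats import pearsonr

# Each row is one examinee's item scores (1 = correct, 0 = incorrect); all hypothetical.
items = [
    [1, 1, 0, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 0, 0],
]
odd_totals = [sum(row[0::2]) for row in items]   # totals on the odd-numbered items
even_totals = [sum(row[1::2]) for row in items]  # totals on the even-numbered items
r_half, _ = pearsonr(odd_totals, even_totals)
r_full = (2 * r_half) / (1 + r_half)  # Spearman-Brown correction to full test length
print(f"Half-test correlation: {r_half:.2f}, corrected estimate: {r_full:.2f}")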
Types of Reliability
Standard Error of Measurement (SEM)
The Standard Error of Measurement is an alternative way of expressing reliability. Statistically, an examinee’s true score would fall within ±1 SEM of his/her obtained score 68 percent of the time.
A test with a small SEM has a high reliability coefficient (.75 to 1.0). A test with a large SEM has a low reliability coefficient (0.0 to .24).
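The card does not give the formula, but SEM is conventionally computed as SEM = SD × √(1 − r); a minimal sketch with hypothetical values:

import math

sd = 15          # standard deviation of the test scores (hypothetical)
r_xx = 0.91      # reliability coefficient (hypothetical)
obtained = 110   # an examinee's obtained score (hypothetical)
sem = sd * math.sqrt(1 - r_xx)  # SEM = SD * sqrt(1 - reliability)
low, high = obtained - sem, obtained + sem
print(f"SEM = {sem:.1f}; true score falls within {low:.1f}-{high:.1f} about 68% of the time")

Note the inverse relationship: as the reliability coefficient approaches 1.0, the SEM shrinks toward zero, which is why a small SEM signals high reliability.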
Types of Tests
- Speed Tests include so many items that no one can complete them in the allotted time (Examples: the arithmetic portion of the WRAT; Coding or Symbol Search on the WISC and WAIS).
- Power Tests include some very difficult items which few if any subjects are able to answer correctly (Example: the final items on almost any subtest on intelligence and achievement measures).
Factors which Affect Reliability
The factors known to directly affect the estimated reliability of a test are as follows:
- Test length
- Homogeneity
- Test-retest interval
- Range restriction
- Other systematic and unsystematic factors
Validity
The validity of a test is generally defined as the degree to which a test measures what it is supposed to measure. Does the test actually perform its stated purpose acceptably?
Validity is measured by calculating a validity coefficient: the correlation between the test scores and the criterion score. The square of this coefficient gives the proportion of variance the test shares with the criterion.
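A minimal sketch of that calculation, with hypothetical test scores and criterion ratings:

from scipy.stats import pearsonr

test_scores = [72, 85, 91, 60, 78, 88, 95, 70]        # hypothetical test scores
criterion = [3.1, 3.8, 4.2, 2.5, 3.4, 4.0, 4.5, 3.0]  # hypothetical criterion ratings
r, _ = pearsonr(test_scores, criterion)
print(f"Validity coefficient: {r:.2f}")                    # the correlation itself
print(f"Variance shared with the criterion: {r**2:.2f}")   # its square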
There are four types of test validity:
Face validity – Although not technically a type of validity (there is no validity coefficient to calculate), it is a useful idea because it speaks to the degree to which the items on the test appear to measure or tap the construct of interest.
Content validity refers to the degree to which a sample of test items adequately represents or covers the content area the test is supposed to measure.
Criterion-related validity – determines the extent to which a test can predict, diagnose, or classify an individual’s behavior in specific situations.
Construct validity is the extent to which a test measures a concept, construct, or trait of interest. However, here there exists no “real-world,” external criterion of this construct. Therefore, construct validity is a more conceptual or theoretical issue and is difficult to establish quickly.
There are three types of criterion-related validity:
1. Predictive: Predicts future outcomes
2. Diagnostic: Tries to diagnose or identify an existing state. The results are compared to a known criterion.
3. Concurrent: determines how well a test measures what it was designed to measure by comparing performance on the test to an external criterion. This approach is frequently seen in employee selection and appraisal settings where the test has previously demonstrated a strong, positive relationship with the criterion.
Criterion Referenced versus Normative Referenced Tests
A. Criterion Referenced
The criterion-referenced assessment compares group or individual performance with a predetermined set of criteria believed to be important or essential (assessment of handwriting, creativity, art, honesty, etc.).
B. Normative Referenced
The normative-referenced assessment compares individuals with each other and/or with groups who took the same test previously. Most current standardized tests and most teacher-made tests are normative referenced.
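A minimal sketch of a normative-referenced interpretation, locating one hypothetical raw score within a hypothetical norm group via a z-score and the deviation-IQ scale (M = 100, SD = 15) used by tests such as the WAIS-III below:

norm_mean, norm_sd = 42.0, 6.0  # raw-score mean and SD of the norm group (hypothetical)
raw = 51                        # one examinee's raw score (hypothetical)
z = (raw - norm_mean) / norm_sd  # standing relative to the norm group
deviation_iq = 100 + 15 * z      # rescaled to M = 100, SD = 15
print(f"z = {z:.2f}, deviation IQ = {deviation_iq:.0f}")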
Objective versus Subjective Tests
A. Subjective Tests
Subjective tests require the formation of an answer from a limited or ambiguous stimulus and are, therefore, more difficult to take and to score/grade. However, in many cases, this may produce greater test validity.
Ex. (classroom setting): essay or fill-in-the-blank questions
B. Objective Tests
Objective tests require subjects to select an answer from options that are already provided.
Ex. (classroom setting): multiple choice and True/False questions
Individual versus Group tests
A. Individually-Administered Tests
Individually-administered tests have certain advantages and disadvantages.
1. Advantages to Individual Testing
In individual testing, the clinician can establish greater rapport with and gain a greater understanding of each client.
2. Disadvantages to Individual Testing
Individual testing is time consuming and expensive.
B. Group Tests
Group tests generally present opposing strengths and weaknesses.
1. Advantages to Group Testing
Group testing tends toward more objective scoring.
Norms are better established.
Group testing is more economical.
2. Disadvantages to Group Testing
Group testing allows no chance to establish individual rapport.
The examiner has no way to know what factors may be influencing an individual’s answers.
Group testing is dependent on the reading skills of the examinee.
Types of Standardized Tests
I. Achievement Tests – measure the level of acquisition of information (CAT, MET, Iowa).
II. Aptitude Tests – measure the ability to learn (skills) in a specific area or to predict future behaviors (DAT, ACT, SAT, GRE).
III. Intelligence Tests – measure the ability to function in the world and to apply reasoning and verbal ability.
IV. Interests, Attitudes, and Values Tests – measure preferences or interests for a variety of activities or topics.
V. Psychopathology Tests – measure the symptoms presented or reported by a patient or rated by an interviewer.
VI. Personality Tests – measure qualities, traits, or behaviors that characterize a person’s individuality.
Wechsler Adult Intelligence Scale - III (WAIS-III)
- The WAIS-III is an individually administered measure of a person’s capacity for intelligent behavior, used as part of a cognitive assessment or a general psychological or neuropsychological assessment.
- WAIS-III Details:
a. Ages: 16 - 89 years
b. Norms: Deviation IQs — M = 100, SD = 15
c. SEM = 3 on full scale
d. Working Time: Approximately 75 minutes
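Combining details (b) and (c) with the SEM discussion above: an examinee who obtains a Full Scale IQ of 103, for example, would have a true score between 100 and 106 about 68 percent of the time (±1 SEM).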