Lecture 3: Psychological Assessments Flashcards
True or false: a predictor needs to be both reliable and valid for it to be useful
True
Define Reliability
Reliability refers to the consistency, stability, and equivalence of test scores, i.e., the dependability of the measure. The measure should yield the same estimate on repeated uses if the measured trait has not changed, even if that estimate is inaccurate.
NOT NEEDED SKIP What are the types of reliability often used in IO Psychology?
- Test-retest reliability
- Equivalence-form reliability
- Internal-consistency reliability:
  - split-half reliability
  - computing one of two coefficients: Cronbach's alpha coefficient / KR-20 (Kuder-Richardson 20)
- Inter-rater reliability
SKIP Explain the 4 types of reliability used in I-O Psychology
- Test-retest reliability: a test is administered at one time and then re-administered after some interval, and the similarity of the two sets of scores is measured by the coefficient of stability. The higher the coefficient of stability, the better; 0.70 is acceptable, but 0.80 and above is preferred. The longer the time interval between administrations, the lower the coefficient of stability tends to be, because the measured trait has more opportunity to change. Considered the simplest estimate of reliability. (Computational sketches for each reliability type appear after this list.)
- Parallel/equivalent-forms reliability: two forms of a test are given to the same group of people to measure the same attribute. The resulting correlation is called the coefficient of equivalence, and it reflects the extent to which the two forms are sufficiently comparable measures of the same concept/attribute; if the coefficient is high, the forms are viewed as reliable measures. As with test-retest reliability, 0.70 is good but 0.80 and above is better. This is the least-used type because it is difficult to create one good test, let alone two, and many tests have no parallel form, although equivalent forms of the same test are sometimes available in intelligence and achievement testing.
- Internal-consistency reliability: measures the extent to which a test has homogeneous content. Two types are typically computed:
a) split-half reliability: the test is divided into two halves, and each person's scores on the two halves are correlated. Splitting into first and second halves is often considered unfair because fatigue tends to depress performance on the second half, so researchers usually split the items using an odd/even scheme instead. If the test has internal-consistency reliability, there will be a high degree of similarity between responses to the items in the two halves. Note that the longer a test, the greater its reliability, so the correlation between two half-length tests understates the reliability of the full test.
b) compute one of two coefficients: Cronbach's alpha or the Kuder-Richardson 20 (KR-20), the latter used for tests with dichotomous answers. Each item is, in essence, a mini-test: the response to each item is correlated with the response to every other item, and the average of those inter-item correlations is related to the homogeneity of the test. A homogeneous test will have higher internal consistency than a heterogeneous test. In I-O psychology an acceptable coefficient lies in the .70–.80 range, with higher coefficients being better (more reliable).
- Inter-rater reliability (conspect reliability): two different raters observe the same behavior, and the resulting coefficient reflects their degree of agreement. High inter-rater reliability establishes that the observations are reliable and accurate.
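All of the reliability coefficients above are correlations, so they are straightforward to compute. Below are minimal Python sketches using hypothetical data: the scores, sample sizes, and category labels are illustrative assumptions, not values from the lecture.

```python
# Test-retest (or parallel-forms) reliability: the Pearson correlation
# between two sets of scores from the same people. Hypothetical scores.
import numpy as np

time1 = np.array([78, 85, 62, 90, 71, 88, 67, 74])  # first administration
time2 = np.array([80, 83, 65, 92, 70, 85, 70, 72])  # retest (or parallel form)

stability = np.corrcoef(time1, time2)[0, 1]
print(f"coefficient of stability/equivalence: {stability:.2f}")  # .70+ acceptable
```

```python
# Split-half reliability (odd/even split) and Cronbach's alpha on a
# hypothetical people-by-items matrix of dichotomously scored (0/1)
# responses. For dichotomous items, alpha coincides with KR-20.
import numpy as np

rng = np.random.default_rng(0)
ability = rng.normal(size=(30, 1))                           # 30 simulated test-takers
items = (rng.normal(size=(30, 10)) < ability).astype(float)  # 10 items, 0/1 scored

# Split-half: correlate odd-item totals with even-item totals
odd = items[:, 0::2].sum(axis=1)
even = items[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd, even)[0, 1]

# Each half is only half the test's length, so the raw correlation
# understates full-test reliability; the Spearman-Brown formula corrects it.
r_full = (2 * r_half) / (1 + r_half)

# Cronbach's alpha from item variances and total-score variance
k = items.shape[1]
alpha = (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum()
                         / items.sum(axis=1).var(ddof=1))

print(f"split-half r = {r_half:.2f}, corrected = {r_full:.2f}, alpha = {alpha:.2f}")
```

```python
# Inter-rater reliability: two raters categorize the same ten behaviors
# (hypothetical labels). Cohen's kappa, one common agreement index,
# corrects raw agreement for agreement expected by chance.
from collections import Counter

rater_a = ["on", "off", "on", "on", "off", "on", "on", "off", "on", "on"]
rater_b = ["on", "off", "on", "off", "off", "on", "on", "on", "on", "on"]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement: sum over categories of the product of marginal proportions
ca, cb = Counter(rater_a), Counter(rater_b)
expected = sum((ca[c] / n) * (cb[c] / n) for c in ca)

kappa = (observed - expected) / (1 - expected)
print(f"raw agreement = {observed:.2f}, kappa = {kappa:.2f}")
```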
Define validity
whether a test correctly measures what we intend it to measure; the accuracy or appropriateness of predicting/drawing inferences from test scores. Validity depends on the use of a test, whereas reliability focuses on the measuring device itself. Scores from a given test may be highly valid for predicting employee productivity but completely invalid for predicting employee absenteeism.
Rather than having types (as reliability does), validity has sources of evidence that are used to support interpretations or inferences about whatever it is we are measuring.
What is construct vs. operationalization? (validity)
A construct refers to the theoretical concepts we use to explain behavior (intelligence, motivation, work ethic, etc.).
An operationalization refers to a way of measuring a construct (e.g., a test is a way to measure the construct of intelligence).
Reiterate what construct vs operationalization is and accordingly explain construct validity
Construct validity refers to the degree to which an actual measure is an accurate and faithful representation of its underlying construct.
To establish the construct validity of a test, you compare the scores on your test with other known measures of the construct. If the test is a faithful representation, its scores should converge with those other measures: a high correlation between the new measure and existing measures yields a high convergent validity coefficient, reflecting the degree to which scores converge when measuring the same construct.
There should also be very low correlation between scores on the new test and measures of unrelated constructs. These correlations are referred to as divergent validity coefficients and reflect the degree to which the scores diverge; they are also known as discriminant validity coefficients because they show the constructs are distinguishable. (See the sketch below.)
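As a rough illustration, here is a minimal Python sketch of estimating convergent and divergent validity coefficients. All scores are simulated, and the "extraversion" measure is a hypothetical stand-in for an unrelated construct.

```python
# Convergent vs. divergent validity: a new intelligence test should
# correlate highly with an established measure of the same construct
# and near zero with an unrelated construct. All data are simulated.
import numpy as np

rng = np.random.default_rng(1)
true_intel = rng.normal(size=100)  # the unobservable construct

new_test = true_intel + rng.normal(scale=0.5, size=100)     # our new measure
established = true_intel + rng.normal(scale=0.5, size=100)  # known measure
extraversion = rng.normal(size=100)                         # unrelated construct

convergent = np.corrcoef(new_test, established)[0, 1]  # expected: high
divergent = np.corrcoef(new_test, extraversion)[0, 1]  # expected: near zero
print(f"convergent r = {convergent:.2f}, divergent r = {divergent:.2f}")
```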
SKIP Inferential linkages in construct validation
the process of construct validation involves examining the linkages among multiple concepts of interest to us.
We would want to know that the empirical measures of X and Y are faithful and accurate assessments of the constructs (1 and 2) they purport to measure. Because our empirical measures are never perfect indicators of the constructs we seek to understand, it has been suggested that researchers should devote more attention to assessing linkages.
pg 124 TB
SKIP Criterion-related validity
what are the variations in criterion-related validity
the degree to which a test/measure forecasts, or is statistically related to, the criterion it is supposed to predict.
two major variations of criterion-related validity are:
**concurrent**: used to diagnose the existing status of some criterion
**predictive**: used to predict the future status of the criterion.
SKIP explain validity coefficient
When predictor scores are correlated with criterion data, the resulting correlation is called a validity coefficient. Whereas an acceptable reliability coefficient is in the .70–.80 range, the desired range for a validity coefficient is .30–.40. Validity coefficients less than .30 are not uncommon, but those greater than .50 are rare. Just as a predictor cannot be too reliable, it also cannot be too valid. The greater the correlation between the predictor and the criterion, the more we know about the criterion based on the predictor.
By squaring the correlation coefficient (r), we can calculate how much variance in the criterion we can account for by using the predictor. For example, if a predictor correlates .40 with a criterion, we can explain 16% (r²) of the variance in the criterion by knowing the predictor. This level of predictability (16%) would be considered satisfactory by most psychologists, given all the possible causes of performance variation. A correlation of 1.0 indicates perfect prediction (and complete knowledge). However, tests with moderate validity coefficients are not necessarily flawed or inadequate; the results attest to the complexity of human behavior. Our behavior is influenced by factors not measured by tests, such as motivation and luck. We should thus have realistic expectations regarding the validity of our tests.
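The same arithmetic as a tiny Python sketch:

```python
# Squaring a validity coefficient gives the proportion of criterion
# variance the predictor accounts for (values from the example above).
r = 0.40
print(f"variance explained: {r**2:.0%}")  # 16%
```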
SKIP Content Validity
The degree to which a predictor covers a representative sample of the behavior being assessed.
To assess content validity, we use subject matter experts in the field the test covers. For example, Civil War historians would first define the domain of the Civil War and then write test questions about it. These experts would then decide how content valid the test is, with judgments ranging from "not at all" to "highly valid." Presumably, the test would be revised until it showed a high degree of content validity.
SKIP Face Validity
This is concerned with the appearance of the test items: do they look appropriate for such a test, at face value?
In contrast to content validity, which is estimated by test developers/experts, face validity is estimated by test-takers.
It is possible for a test to be content valid but not face valid, and vice versa; such tests can still be used for employee testing.
SKIP **Predictor development:**
1. construct vs. behavioral sampling
2. past vs. present characteristics
Predictor development refers to the creation of tools or assessments that help psychologists understand certain characteristics or qualities of an individual. These predictors can be classified along two dimensions:
1. **construct vs. behavioral sampling**: whether a predictor seeks to measure an underlying psychological construct or a sample of the behavior that will be exhibited on the job.
2. **past vs. present characteristics**: whether a predictor seeks to measure something about the individual right now or in the past.
Ability Tests:
OK DO THIS **cognitive**
physical
psychomotor
sensory/perceptual
Cognitive ability/general mental ability (g)
pros and cons
explanation
many researchers believe cognitive ability is the single best predictor of future job performance.
CONS OF COGNITIVE ABILITY:
* Some researchers believe that conceptualizing intelligence merely as g encourages oversimplification of the inherent complexity of intelligence. The idea is that intelligence is not a unitary phenomenon, and other dimensions of intelligence are also worthy of our consideration.
* biases that exist for certain demographic groups: a great deal of research has revealed that members of some minority groups have lower average scores on general mental ability tests compared to majority group members, which can result in adverse impact.
how to address the issue of adverse impact?
* supplement the use of cognitive ability tests with other non-cognitive predictors (such as personality inventories), which helps increase diversity and reduce adverse impact.
* you could also adjust scores based on group membership (e.g., giving bonus points to members of certain groups or setting different cutoff scores); this is not advisable from a legal perspective, though I-O psychologists tend to disagree. Namely, within-group norming, a practice in which individual scores are converted to standard scores or percentile scores within one's own group, has been advocated as the most scientific solution to the issue of adverse impact for cognitive ability tests, yet it has been deemed unlawful for non-scientific reasons (sketched below).
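A minimal sketch of the within-group norming idea, using made-up scores and group labels; the exact procedure and data are illustrative assumptions, not the lecture's.

```python
# Within-group norming: convert each raw score to a percentile computed
# within the person's own group rather than across all test-takers.
from bisect import bisect_left

scores = {"group_a": [55, 62, 70, 78, 84],   # hypothetical score distributions
          "group_b": [48, 53, 60, 66, 72]}

def within_group_percentile(raw: float, group: str) -> float:
    group_scores = sorted(scores[group])
    below = bisect_left(group_scores, raw)   # number of scores below `raw`
    return 100 * below / len(group_scores)

# The same raw score of 70 lands at different percentiles in each group
print(within_group_percentile(70, "group_a"))  # 40.0
print(within_group_percentile(70, "group_b"))  # 80.0
```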
PROS:
* even though conceptualizing intelligence as g may oversimplify its complexity, data have shown that g accounts for approximately 80% of the variability in criterion-related validity estimates, while the remaining 20% is accounted for by other mental abilities.
HIGH VALIDITY
HIGH GENERALIZABILITY
LOW COST
Ability tests
SKIP cognitive ability
**physical ability**
psychomotor
sensory/perceptual
Physical Ability
pros and cons
explanation
there are 4 critical physical abilities that relate to work/job performance:
1. static strength: ability to use muscle to lift, push, pull, carry objects.
2. explosive strength: ability to use short bursts of muscle force to propel oneself or an object
3. gross body coordination: ability to coordinate the movement of the arms, legs, and torso in activities where the whole body is in motion
4. stamina: ability of the lungs and circulatory system of the body to perform efficiently over time
men exhibit greater static and explosive strength than women.
pros:
* a work analysis should be conducted to determine how important a physical ability test is in the overall assessment of a candidate. Physical ability testing can be used in ways that reduce adverse impact: if a job permits employees to perform a wide variety of tasks, it may be possible to assign women to those tasks that involve less static and explosive strength.
cons:
* difficult from an assessment perspective, as the test or assessment method may not be predictive of the sustained job behavior actually required. For example, manual labor requires endurance over long hours, but the test may only check whether the candidate can lift the heavy material, not whether they can sustain the effort.
* can cause adverse impact if used improperly.