L3: Individual Differences & Validation Flashcards

1
Q

Interpersonal skills

A

This refers to skills related to social sensitivity, relationship building, working with others, listening, and communication

2
Q

Generalizability theory

A

Conceptualizes the reliability of a test score as the precision with which that score represents a more generalized universe value of that score

3
Q

Admissible observations

A

The conditions of observation or testing under which examinees produce results that are equivalent to some specified degree

4
Q

Universe score

A

The expected value of observed scores over all admissible observations

5
Q

What are some issues and innovations in measurement?

A
  1. response styles like acquiescence, extreme responding or faking
  2. reliability estimation
  3. improving items by using cognitive pre-tests
  4. selecting items using iterative structural equation modelling
  5. developing items with anchoring vignettes
6
Q

Situational judgment tests

A

A step removed from direct observation; they are measures of procedural knowledge in a specific domain. They confront applicants with written or video-based scenarios and ask how they would react by choosing an alternative from a list of responses

7
Q

Theory of knowledge determinants underlying performance

A

SJT is a measure of procedural knowledge, which is made up of job-specific procedural knowledge and general/non-job-specific procedural knowledge. The latter is based on experience in situations, which is what is tested in admissions tests

8
Q

What are the main perspectives on the role of tests in admissions?

A
  • predicting academic outcomes alone is an insufficient basis for using admission tests; the tests should also link to later employment performance
  • identifying the students who will develop the highest levels of knowledge and skill
9
Q

How stable are interpersonal skills as a construct?

A

Abilities and personality traits are found to have high rank-order stability. A training program can have 4 possible outcomes: (1) it improves everyone’s skills by a similar amount, (2) it mainly improves those who already have good skills, (3) it trains everyone up to a common level, reducing variance, or (4) it is differentially effective, resulting in substantial rank-order change. The last 2 outcomes are more likely to pose a threat to validity, but the paper rejects them

10
Q

What are the hypotheses of this research?

A
  • procedural knowledge about interpersonal behaviour is a valid predictor of internship performance
  • also a valid predictor of job performance
  • the relationship between procedural knowledge about interpersonal behaviour and job performance will be mediated by internship performance
  • procedural knowledge will have incremental validity over cognitive factors for predicting internship and job performance
11
Q

Fidelity

A

The extent to which the assessment and context mirror those on the job. SJT has a low degree of fidelity but internship performance has a high-fidelity assessment

12
Q

Saturation

A

The extent to which a construct influences (is represented in) a complex, multidimensional measure

13
Q

What design was used?

A

A predictive validation design: interpersonal skills assessed at admission were used to predict internship performance and, later, job performance in interactions with patients

14
Q

What were the measures used?

A

Students’ scores were gathered during the admission exam. Cognitive composites were formed by combining scores on different subjects, general abilities, and a medical text. For the SJT, critical incidents were collected from professionals and turned into vignettes, which were videotaped with hired actors; participants answered questions based on these. Internship performance and job performance were rated by supervisors.

15
Q

What were the predictors and criteria?

A

Predictors:
- SJT (videotaped vignettes of interpersonal situations that a doctor is likely to encounter; 30 multiple-choice items with 4 options each)
- Cognitive ability test (verbal + numerical)
- Medical text (multiple-choice questions)
Criteria:
- Internship performance (global rating 0-20)
- Job performance (global rating 0-20 by the supervising GP)

16
Q

What were the results?

A

Procedural knowledge showed incremental validity over cognitive factors for predicting both internship and job performance, and internship performance mediated the relationship between procedural knowledge and job performance

17
Q

Strengths and limitations?

A
  • conceptual arguments were tested
  • established evidence of long-term predictive power
  • has practical implications for school admissions, etc.
  • but: a single testing program in a single setting (limited generalizability) and a small sample size
18
Q

Why are measures important in HR?

A
  • for decision making, such as personnel decisions and the evaluation of employees
  • selecting and using psychological measurements (their application affects the careers of others)
  • interpreting the results
  • communicating the results to others
19
Q

Test

A

Any psychological measurement instrument, technique, or procedure that systematically measures a sample of behaviour, such as interviews or rating scales

20
Q

In what ways is a test systematic?

A
  • content (chosen systematically from the behavioural domain to be measured)
  • administration (directions for taking the test and recording answers are identical, with distractions minimized)
  • scoring (objective, as rules for evaluating responses are specified in advance)
21
Q

How to generate items as part of test development?

A
  1. determine the purpose (why is the measure relevant?)
  2. define the attribute and content (which constructs should be included?)
  3. develop a measurement plan
  4. write items, including some reverse-scored items (rule of thumb: write double the number of items you need; avoid negations; be specific and concrete)
22
Q

How to develop pilot tests in test development?

A
  1. pilot test with a representative sample
  2. feedback from the pilot test sample on test perceptions and item clarity (content validity)
  3. item analysis
23
Q

How to analyze items?

A
  • distractor analysis: the frequency with which each incorrect option (distractor) is chosen -> should be roughly equal across all distractors for each item
  • item difficulty: the number who answer correctly divided by the total -> the p value should be around .5
  • item discrimination: how well the item distinguishes between better and worse performers -> the index d compares the proportion of correct answers in the high-scoring group with that in the low-scoring group (see the sketch below)
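
A minimal sketch (the response data and the median-split rule for forming the high/low groups are my own illustration, not from the card) of how the difficulty and discrimination statistics above can be computed:

import numpy as np

# Hypothetical scored responses: rows = respondents, columns = items (1 = correct)
scores = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
])

# Item difficulty p: proportion answering each item correctly (target around .5)
p = scores.mean(axis=0)

# Item discrimination d: difficulty in the high-scoring group minus difficulty
# in the low-scoring group, using a simple median split on total scores
totals = scores.sum(axis=1)
high = scores[totals >= np.median(totals)]
low = scores[totals < np.median(totals)]
d = high.mean(axis=0) - low.mean(axis=0)

print("difficulty p:", np.round(p, 2))
print("discrimination d:", np.round(d, 2))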
24
Q

What are the post-pilot activities?

A
  1. select items which should follow a normal distribution
  2. determine reliability and validity
  3. revise and update items as some items can change due to external factors
25
Q

How is testing systematic?

A
  1. content: items are systematically chosen from the behavioural domain to be measured
  2. administration: standardized procedures, so that each time the test is given there are the same directions, the same recording of answers, the same time limits, and as few distractions as possible
  3. scoring: rules are specified in advance
  All of these minimize contamination of test scores
26
Q

What is content made up of?

A

Tasks: performance tests involve manipulating objects, e.g. arranging blocks or tracing a particular pattern. Tasks can also be verbal or non-verbal.
Process: cognitive tests measure the products of mental ability and are made up of achievement tests (learning that occurred during standardized sets of experiences) and aptitude tests (the effects of learning from cumulative and varied experiences in daily living). Content also includes affective tests, which measure aspects of personality and are referred to as inventories

27
Q

What is administration made up of?

A

Efficiency: can be divided into individual tests (less efficient) and group tests (fewer opportunities to establish rapport).
Time limits: speed tests consist of easy items but are difficult to finish within the time limit; power tests let everyone finish within the time limit but contain more difficult items (to avoid a ceiling effect). To avoid cheating, computer adaptive testing is used, selecting items from a large pool based on a candidate’s responses.
Standardized testing: allows scores obtained by different people to be compared against established norms; non-standardized testing is more common and usually consists of informal classroom tests

28
Q

Scoring

A

Objective scoring is typically used for employment testing. Subjective measures can also be used, which introduces rater variance

29
Q

What are additional issues when designing a test?

A
  • cost
  • face validity is if the test looks like it is measuring the trait, which affects motivation and reaction to the procedure by applicants
  • interpretation of the results by the examiner
30
Q

How to choose predictors?

A
  • job analysis gives clues about the kinds of variables that are likely to be related to job success
  • understand performance domain
  • develop a new measure only when an existing one is unavailable, then run a pilot and assess the items
31
Q

Reliability

A

Looks at whether the measure is dependable, stable, and consistent over time, which gives the truest picture of someone’s abilities/characteristics. This allows us to minimize unsystematic errors

32
Q

Validity

A

Looks at whether the measure is measuring what it is supposed to and whether the decisions based on that measure are correct

33
Q

Correlation coefficient

A

degree of consistency/agreement between two sets of independently
derived scores (r)

34
Q

Coefficient of determination

A

The reliability coefficient may be interpreted directly as the percentage of total variance attributable to different sources

35
Q

Classical test theory

A

X = T + e
X is the observed score, T is the true score, and e is the error. High reliability is needed because reliability places an upper bound on validity
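
Not stated on the card, but the standard decomposition that follows from X = T + e (assuming the error is uncorrelated with the true score) is:

$$\sigma_X^2 = \sigma_T^2 + \sigma_e^2, \qquad r_{XX} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_e^2}{\sigma_X^2}$$

so the reliability coefficient is the proportion of observed-score variance that is true-score variance, which is why low reliability caps the validity a measure can show.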

36
Q

How to test reliability?

A
  1. Test-retest -> coefficient of stability; related to random fluctuations in performance across occasions
  2. Parallel/alternate forms -> coefficient of equivalence
  3. Stability and equivalence -> combines different sources of error; most conservative
  4. Internal consistency (e.g. Cronbach’s alpha) -> the degree to which the items in one test are intercorrelated
  5. Interrater reliability -> agreement between raters on their rating of some dimension
  Using different reliability estimates can lead to different conclusions
37
Q

What is stability and equivalence?

A

The correlation between two sets of scores (parallel forms administered on different occasions). Counterbalancing should be used to avoid order effects. Three other sources of error are taken into account:
- random response errors which are caused by variations in attention, mental efficiency and distractions
- specific factor errors are caused by different interpretations of wording
- transient errors are produced by variations in mood or feelings which influences info-processing

38
Q

What are different approaches to parallel forms?

A

This is constructing a number of parallel forms of the same procedure.
Random-assignment approach: create a large pool of items in the same domain and assign them at random to the forms.
Incidental-isomorphism approach: change surface characteristics of items that do not determine item difficulty, rather than structural item features.
Item-isomorphism approach: create pairs of items that reflect the same domain but differ in wording and grammar.
The relationship between two forms is the coefficient of equivalence. Order effects should be controlled by having half the sample receive one form first and the other half the other form first

39
Q

How to estimate internal consistency?

A
  • split-half: the test is split into 2 equivalent halves, which should be consistent -> error variance is attributed to inconsistency in content sampling (it is difficult to split a test in half, so the split should be done randomly); interpreted as a coefficient of equivalence
  • Cronbach’s alpha: the mean of all possible split-half coefficients and the most commonly used estimate; it is higher with more items and higher item intercorrelations, and depends on dimensionality (alpha will be lower when more domains are measured) (see the sketch below)
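
A minimal sketch of Cronbach's alpha via its variance form, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the item data here are invented for illustration:

import numpy as np

# Hypothetical Likert-type responses: rows = respondents, columns = items
items = np.array([
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [4, 4, 5, 4],
])

k = items.shape[1]
sum_item_var = items.var(axis=0, ddof=1).sum()   # sum of the item variances
total_var = items.sum(axis=1).var(ddof=1)        # variance of the total scores
alpha = k / (k - 1) * (1 - sum_item_var / total_var)
print(round(alpha, 2))
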
40
Q

Inter-rater reliability

A
  • the source of unreliability is the raters, who can be seen as different forms of the instrument
  • made up of interrater consensus and interrater consistency
41
Q

What is a good reliability?

A

Depends on the use of the scores: the more important the decision, the more precise the measure needs to be. For social science research, reliability should be above .70; for selection decisions, around .90

42
Q

How is validity a unitary concept?

A

There are not different kinds of validity, only different kinds of evidence for analysing validity. Validity always means the degree to which the evidence supports the inferences made from the scores -> it is important to know how the scores are intended to be used

43
Q

Content validity

A

Whether the test items cover the intended performance domain. Involves logical validity and a rational judgement process: the extent to which SMEs judge there to be overlap between the test and the job-performance domain. This is done by looking at the degree of agreement among raters or judges about how essential an item/test is; if more than half agree, the item has some content validity. This is more difficult for abstract constructs. Content-related evidence is appropriate if the focus is on work results

44
Q

Content-validity strategy

A
  1. conduct a job analysis to cover the KSAOs
  2. share the list of KSAOs with subject matter experts
  3. consider alternative items
  4. minimum qualifications should be straightforward and in the same format
  5. SMEs should rate the list of potential items independently
  6. link each item back to the KSAOs
  7. group items in a thoughtful way
45
Q

How to assess content validity?

A
  • content-validity index: indicates whether the KSAOs measured are essential -> a content-validity ratio is then determined for each item
  • substantive-validity index: panel members assign each item to its construct, and the proportion assigned to the intended construct is analysed
  • content-adequacy procedure: panel members rate the extent to which each item corresponds to each construct
  • analysis-of-variance approach: compares an item’s mean rating on the intended construct with its ratings on the other constructs
46
Q

How has content validity made a positive contribution?

A
  • improved domain sampling and job analysis procedures
  • better behaviour measurement
  • expert judgement in confirming the fairness of sampling and scoring procedures
47
Q

Criterion validity

A

Whether test scores relate to criterion measures, i.e. the empirical relationship between predictor and criterion. Includes predictive validity, where predictor scores are collected before criterion data are available, and concurrent validity, where the criterion measure is available at the same time as the predictor

48
Q

How to carry out predictive studies?

A
  1. Measure candidates on predictor during selection
  2. Select candidates without using the results (need to validate first)
  3. Obtain measurement of criterion performance later-> time period depends on type of job & how much training is needed (approx. 6 months)
  4. Assess the strength of the relationship (statistics)
49
Q

What is a significant issue with criterion-related validity?

A

Statistical power is frequently overestimated because of a failure to consider the combined effects of range restriction, criterion unreliability, and other artifacts that reduce the observed effect size. Increasing the alpha level and making sample-size estimations can help achieve adequate statistical power. The length of the time interval between the test and the collection of the criterion data should also be considered

50
Q

What are concurrent studies?

A

When predictor and criterion data are gathered from the same employees at the same time, so the design is cross-sectional. Issues include ignoring the effects of motivation and of job experience on ability scores.

51
Q

Job-related construct

A

Representation of performance or behaviour on the job that is valued by an employing organization

52
Q

Construct-related criterion

A

Chosen because of its theoretical relationship, or lack of one, to the construct to be measured

53
Q

What is important to remember about criterion validity?

A

Performance domain needs to be defined before developing tests to use for predicting future performance. Criterion data should be gathered independently of predictor data. There needs to be adequate power, length of interval between measuring should be considered and the sample should be representative

54
Q

What are some factors affecting validity?

A

Range enhancement: when a predictor is validated on a group that is more heterogeneous than the group for whom the predictor was intended -> the predictor appears to discriminate better than it really does, so validity is overestimated.
Restriction of range: when only a limited range of scores is available -> the size of the validity coefficient is underestimated.

55
Q

Types of range restriction

A
  1. Direct range restriction -> the test being validated is used for selection purposes before its validity has been established
  2. Indirect range restriction -> selection is made on the basis of some other variable that is correlated with the predictor test being validated
  3. Natural attrition -> good or bad performers leave before criteria are measured, which further restricts the available range
56
Q

How does the relationship between criterion data and performance change over time?

A

The relationship between interview scores and criterion data is linear and follows a bivariate normal distribution, with a positive correlation of about 0.50. Scores cover the full range without censorship. However, when selection is applied (e.g., only considering scores above a certain threshold), the correlation drops significantly, and the data no longer follow an elliptical shape.
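
A small simulation (my own illustration; the cut score, sample size, and variable names are arbitrary assumptions) showing how selecting only high interview scores shrinks an underlying correlation of about .50:

import numpy as np

rng = np.random.default_rng(0)
n, rho = 100_000, 0.50
interview, criterion = rng.multivariate_normal(
    [0, 0], [[1, rho], [rho, 1]], size=n
).T

full_r = np.corrcoef(interview, criterion)[0, 1]

# Direct range restriction: keep only applicants above a cut score on the interview
selected = interview > 1.0
restricted_r = np.corrcoef(interview[selected], criterion[selected])[0, 1]

print(f"full-range r = {full_r:.2f}, restricted r = {restricted_r:.2f}")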

57
Q

Preselection

A

When predictive validity is undertaken after a group of individuals have been hired but before criterion data has become available for them

58
Q

Incremental validity

A

Whether the test adds predictive value over and above other tests, and provides unique and non-overlapping info about performance compared to other methods

59
Q

Construct validity

A

Whether a test measures the psychological construct it intends to measure. Requires an accumulation of evidence and includes hypotheses about the characteristics of people who are high vs low on the construct

60
Q

Nomological network

A

a system of interrelated concepts, propositions and laws that relates observable characteristics to other observables and theoretical constructs

61
Q

What are the sources of evidence?

A
  1. questions asked of test takers about their performance strategies or responses to items
  2. analyses of internal consistency
  3. expert judgement of content and behavioural domains
  4. correlations of a new procedure with established measures of the same construct
  5. factor analyses of a group of procedures
  6. structural equation modelling to test a measurement model that links observed variables to constructs
  7. ability of scores to separate naturally occurring or experimentally contrived groups
  8. demonstrations of systematic relationships between scores from a procedure and measures of behaviour
  9. convergent validity (scores that measure the same construct are related to scores on other measures of the same) and discriminant validity (unrelated to scores that are not measures of that construct)
62
Q

Multitrait-multimethod matrix

A

Table displaying correlations among: the same trait measured by the same method (a, the reliabilities), different traits measured by the same method (b), the same trait measured by different methods (c, the convergent validities), and different traits measured by different methods (d). The c correlations should be larger than zero, high enough to warrant further study, and higher than the b and d correlations. The matrix does not account for reliability or for the assumptions underlying the procedure

63
Q

How are weights assigned to predictors so that differences between observed and predicted criterion scores are minimized?

A

Regression weights from one sample often do not generalize well to another due to sample-specific factors and chance variations, leading to a decrease in predictive accuracy known as shrinkage. Shrinkage is especially large when (a) the initial sample is small, increasing sampling error, (b) a “shotgun” approach is used, selecting predictors without theoretical relevance, and (c) the number of predictors increases, capturing chance relationships. To reduce shrinkage, predictors should be chosen based on psychological theory or past research that shows a clear link to the outcome.
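
An illustrative sketch of shrinkage (the data-generating setup is invented): regression weights fitted on a small derivation sample with many "shotgun" predictors describe that sample well but predict a fresh sample from the same population noticeably worse:

import numpy as np

rng = np.random.default_rng(1)

def multiple_r(X, y, weights):
    # Correlation between predicted and observed criterion scores
    return np.corrcoef(X @ weights, y)[0, 1]

n, k = 40, 8                                     # small sample, many predictors
X_dev = rng.normal(size=(n, k))
y_dev = 0.4 * X_dev[:, 0] + rng.normal(size=n)   # only the first predictor matters

# Ordinary least-squares weights from the derivation sample
w, *_ = np.linalg.lstsq(X_dev, y_dev, rcond=None)

# A fresh sample from the same population (cross-validation sample)
X_new = rng.normal(size=(n, k))
y_new = 0.4 * X_new[:, 0] + rng.normal(size=n)

print(f"derivation R: {multiple_r(X_dev, y_dev, w):.2f}")
print(f"cross-validated R: {multiple_r(X_new, y_new, w):.2f}")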

64
Q

Cross-validity

A

Whether weights derived from one sample can predict outcomes to the same degree in the population as a whole. If low, then assessment tools and prediction systems may not be appropriate in other samples from the same population

65
Q

Synthetic validity

A

It is the process of inferring validity in a specific situation. It assumes that different jobs involving the same kinds of behaviour require the same KSAOs, so if a test is valid for one job containing a given element, it will be valid for any job involving that element. It is important to identify the elements of a job and then select tests that predict performance on those elements in order to infer validity

66
Q

Confirmatory factor analysis

A

Define models that propose trait or method factors a priori and test the ability of models to fit the data

67
Q

For tests to be able to be transported, what are important criteria?

A
  • the results of a criterion-related validity study conducted from another location
  • the results of a test fairness analysis at another location
  • the degree of similarity between locations
  • the degree of similarity between applicants
68
Q

Face validity

A

Whether the tests look like they are measuring the trait/construct; this is not a technical form of validity but may affect applicants’ reactions to the test

69
Q

What is the self-monitoring scale?

A

Self-monitoring is a personality trait that captures differences in the extent to which people control the image they present to others in social situations. High self-monitors are motivated and skilled at altering their behaviour to influence the impressions others have of them; low self-monitors focus on remaining true to their inner attitudes by presenting a consistent image of themselves to others

70
Q

How to see whether self-monitoring is being measured?

A
  • should be related to scores on other measures of the same construct-> convergent validity
  • should be unrelated to scores of instruments that are not supposed to be measures of that construct -> discriminant validity
71
Q

What is a meta-analysis?

A
  • provides an overview of the field by bringing together data from all relevant studies and resolving inconsistencies, highlights moderators and limitations
  • so generalizes findings through the mean of validity coefficients
  • needed in personnel psychology as there is large variability from study to study
72
Q

Validity generalization

A

Validity coefficients vary from employer to employer, region to region, etc. -> the situation-specificity hypothesis, which makes it difficult to develop general principles. Studies testing this hypothesis are known as psychometric meta-analyses or validity generalization studies (obtain a mean validity coefficient, compare it to a standard, and see how the coefficients are distributed)

73
Q

General mental ability

A

The ability to grasp and reason correctly with abstractions and to solve problems; includes facets such as verbal and spatial ability. Intelligence is the ability to learn, so it leads to more rapid learning and to being able to handle more complexity. It is a strong predictor of performance in learning settings

74
Q

Why is intelligence used as a predictor?

A

Higher GMA leads to faster learning and therefore more job knowledge. Workers with higher intelligence also show higher job performance because they can deal with problems on the job that are not covered by prior job knowledge

75
Q

How can personality be used to predict job performance?

A

Personality can be more job-content relevant: extraversion and conscientiousness have predictive validity for managerial performance, and agreeableness is important for collaborative teamwork. Conscientiousness is the most consistent personality predictor across jobs and also predicts organizational citizenship behaviour

76
Q

How does conscientiousness not have a linear relationship with job performance?

A

Some conscientiousness results in more motivation to perform so there is better performance through planning, goal setting and persistence. Too much can result in rigid, inflexible, compulsive perfectionists. This relationship disappears depending on job complexity.

77
Q

How does emotional stability not have a linear relationship with job performance?

A

Some emotional control needed to overcome distracting emotions which can take away resources for a job task. But too much wastes cognitive resources and too little is detrimental to performance. This relationship becomes negative based on job complexity

78
Q

What is the relationship between job complexity and personality?

A

The more complex a job, the higher the threshold of the personality trait level. Being high in conscientiousness is not beneficial in low-complexity jobs but can be in high-complexity ones. GMA is also highly correlated with job performance in high-complexity jobs

79
Q

How can unreliability be adjusted for statistically?

A

r_xt = r_xy / sqrt(r_yy)
where r_xt is the estimated correlation between the predictor and a perfectly reliable criterion, r_xy is the observed validity coefficient, and r_yy is the reliability of the criterion. This is the correction for attenuation in the criterion variable only. It evaluates the validity coefficient relative to the unreliability of the criterion and allows us to see whether there is enough unexplained systematic variance in the criterion to find more and better predictors
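
A worked example with made-up numbers: if the observed validity is r_xy = .30 and the criterion reliability is r_yy = .64, then

$$\hat{r} = \frac{.30}{\sqrt{.64}} = \frac{.30}{.80} \approx .38$$

so the estimated validity against a perfectly reliable criterion is about .38.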

80
Q

Validation

A

Thorough knowledge is needed of the interrelationships between scores from a particular procedure and other variables, which requires examination. Validation is the investigative process of gathering and evaluating the necessary data. There are two issues involved in validation: what a test or other procedure measures, and how well it measures it

81
Q

Measurement

A

Assigning numerals to objects or events according to rules. Psychological measurement is concerned with individual differences in psychological traits.

82
Q

Different types of measurement

A

- nominal: categorical; classes are mutually exclusive (equality only)
- ordinal: rank-ordering; it is unclear how large the differences between scores are (equality and transitivity)
- interval: has equality of units (equality, transitivity, additivity, i.e. equal-sized units); in practice usually built from an ordinal scale with normally distributed scores
- ratio: additionally has an absolute zero point

83
Q

Types of variation?

A

Qualitative is in terms of kinds, which results in classification. Quantitative is in terms of frequency, amount or degree

84
Q

Trait

A

group of interrelated behaviours that may be inherited or acquired (e.g. dominance, locus of control, agreeableness, social value orientation)

85
Q

Item response theory

A

Explains how individual differences on an attribute affect the behaviour of an individual when responding to an item -> can be assessed through the item-characteristic curve. Includes a difficulty parameter, a discrimination parameter, and a parameter describing the probability of a correct response at low levels of ability. Can assess bias at the item level by checking whether an item is more difficult for examinees from one group than from another
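
The card does not give the equation; a common parameterization of the item-characteristic curve with exactly these three parameters is the three-parameter logistic model:

$$P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}}$$

where b_i is the difficulty parameter, a_i the discrimination parameter, and c_i the lower asymptote (the probability of a correct response for examinees with very low ability theta).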

86
Q

Rater/scorer variance

A

Errors attributable to the examiner or rater

87
Q

How can interrater reliability be estimated?

A

- interrater agreement
- interclass correlations: when two raters rate multiple objects or individuals (see the sketch below)
- intraclass correlations: estimate how much of the variation among ratings is due to true differences among the individuals on the attribute measured
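
A minimal sketch (ratings invented; treating the interclass correlation as the Pearson correlation between two raters' scores is my own simplification of the card):

import numpy as np

# Hypothetical ratings of six individuals by two raters
rater_a = np.array([4, 3, 5, 2, 4, 3])
rater_b = np.array([5, 3, 4, 2, 4, 2])

# Interrater consistency: do the raters rank-order the individuals similarly?
print(round(float(np.corrcoef(rater_a, rater_b)[0, 1]), 2))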