L3: Individual Differences & Validation Flashcards

1
Q

Interpersonal skills

A

This refers to skills related to social sensitivity, relationship building, working with others, listening, and communication

2
Q

Generalizability theory

A

Conceptualizes the reliability of a test score as the precision with which that score represents a more generalized universe value of that score

3
Q

Admissible observations

A

The conditions of observation or testing under which examinees produce results that are equivalent to some specified degree

4
Q

Universe score

A

The expected value of observed scores over all admissible observations

5
Q

What are some issues and innovations in measurement?

A
  1. response styles like acquiescence, extreme responding or faking
  2. reliability estimation
  3. improving items by using cognitive pre-tests
  4. selecting items using iterative structural equation modelling
  5. developing items with anchoring vignettes
6
Q

Situational judgment tests

A

A step removed from direct observation; they are measures of procedural knowledge in a specific domain. They confront applicants with written or video-based scenarios and ask how they would react by choosing an alternative from a list of responses

7
Q

Theory of knowledge determinants underlying performance

A

SJT is a measure of procedural knowledge, which is made up of job-specific procedural knowledge and general/non-job-specific procedural knowledge. The latter is based on experience in situations, which is what is tested in admissions tests

8
Q

What are the main perspectives on the role of tests in admissions?

A
  • predicting academic outcomes alone is an insufficient basis for using admission tests; the tests should also link to later employment performance
  • identifying the students who will develop the highest levels of knowledge and skill
9
Q

How stable are interpersonal skills as a construct?

A

Abilities and personality traits are found to have high rank-order stability. A training program can have 4 possible outcomes: (1) it improves everyone’s skills by a similar amount, (2) it mainly improves those who already have good skills, (3) it trains everyone up to a common level, reducing variance, or (4) it is differentially effective, resulting in substantial rank-order change. The last 2 outcomes are more likely to pose a threat to validity, but the paper rejects them

10
Q

What are the hypotheses of this research?

A
  • procedural knowledge about interpersonal behaviour is a valid predictor of internship performance
  • also a valid predictor of job performance
  • the relationship between procedural knowledge about interpersonal behaviour and job performance will be mediated by internship performance
  • procedural knowledge will have incremental validity over cognitive factors for predicting internship and job performance
11
Q

Fidelity

A

The extent to which the assessment and context mirror those on the job. SJT has a low degree of fidelity but internship performance has a high-fidelity assessment

12
Q

Saturation

A

The extent to which a construct influences (is represented in) a complex, multidimensional measure

13
Q

What design was used?

A

A predictive validation design: interpersonal skills assessed at admission were used to predict internship performance and, later, job performance in interactions with patients

14
Q

What were the measures used?

A

Students’ scores were gathered during the admission exam. Cognitive composites were formed by combining scores on different subjects, general abilities, and a medical text. For the SJT, critical incidents were collected from professionals and turned into vignettes, which were videotaped with hired actors; participants answered questions based on these. Internship performance and job performance were rated by supervisors.

15
Q

What were the predictors and criteria?

A

Predictors:
- SJT (videotaped vignettes of interpersonal situations that a doctor is likely to encounter; 30 multiple-choice items with 4 options each)
- Cognitive ability test (verbal + numerical)
- Medical text (multiple-choice questions)
Criteria:
- Internship performance (global rating 0-20)
- Job performance (global rating 0-20 by the supervising GP)

16
Q

What were the results?

A

Procedural knowledge showed incremental validity over cognitive factors for predicting both internship and job performance, and internship performance mediated the relationship between procedural knowledge and job performance

17
Q

Strengths and limitations?

A
  • conceptual arguments were tested
  • established evidence of long-term predictive power
  • has practical implications for school admissions, etc.
  • but: a single testing program in a single setting (limited generalizability) and a small sample size
18
Q

Why are measures important in HR?

A
  • for decision making, such as personnel decisions and the evaluation of employees
  • selecting and using psychological measurements (their application affects the careers of others)
  • interpreting the results
  • communicating the results to others
19
Q

Test

A

Any psychological measurement instrument, technique, or procedure that systematically measures a sample of behaviour, such as interviews or rating scales

20
Q

In what ways is a test systematic?

A
  • content (chosen systematically from the behavioural domain to be measured)
  • administration (directions for taking the test and recording answers are identical, with distractions minimized)
  • scoring (objective, as rules for evaluating responses are specified in advance)
21
Q

How to generate items as part of test development?

A
  1. determine the purpose (why is the measure relevant?)
  2. define the attribute and content (which constructs should be included?)
  3. develop a measurement plan
  4. write items, including some reverse-scored items (rule of thumb: write double the number of items you need; avoid negations; be specific and concrete)
22
Q

How to develop pilot tests in test development?

A
  1. pilot test with a representative sample
  2. feedback from the pilot test sample on test perceptions and item clarity (content validity)
  3. item analysis
23
Q

How to analyze items?

A
  • distractor analysis: the frequency with which each incorrect option (distractor) is chosen -> should be roughly equal across all distractors for each item
  • item difficulty: the number who answer correctly divided by the total -> the p value should be around .5
  • item discrimination: how well the item distinguishes between better and worse performers -> the index d compares the proportion of correct answers in the high-scoring group with that in the low-scoring group (see the sketch below)
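
A minimal sketch (the response data and the median-split rule for forming the high/low groups are my own illustration, not from the card) of how the difficulty and discrimination statistics above can be computed:

import numpy as np

# Hypothetical scored responses: rows = respondents, columns = items (1 = correct)
scores = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
])

# Item difficulty p: proportion answering each item correctly (target around .5)
p = scores.mean(axis=0)

# Item discrimination d: difficulty in the high-scoring group minus difficulty
# in the low-scoring group, using a simple median split on total scores
totals = scores.sum(axis=1)
high = scores[totals >= np.median(totals)]
low = scores[totals < np.median(totals)]
d = high.mean(axis=0) - low.mean(axis=0)

print("difficulty p:", np.round(p, 2))
print("discrimination d:", np.round(d, 2))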
24
Q

What are the post-pilot activities?

A
  1. select items which should follow a normal distribution
  2. determine reliability and validity
  3. revise and update items as some items can change due to external factors
25
Q

How is testing systematic?

A
  1. content: items are systematically chosen from the behavioural domain to be measured
  2. administration: standardized procedures, so that each time the test is given there are the same directions, the same recording of answers, the same time limits, and as few distractions as possible
  3. scoring: rules are specified in advance
  All of these minimize contamination of test scores
26
Q

What is content made up of?

A

Tasks: performance tests involve manipulating objects, e.g. arranging blocks or tracing a particular pattern. Tasks can also be verbal or non-verbal.
Process: cognitive tests measure the products of mental ability and are made up of achievement tests (learning that occurred during standardized sets of experiences) and aptitude tests (the effects of learning from cumulative and varied experiences in daily living). Content also includes affective tests, which measure aspects of personality and are referred to as inventories

27
Q

What is administration made up of?

A

Efficiency: can be divided into individual tests (less efficient) and group tests (fewer opportunities to establish rapport).
Time limits: speed tests consist of easy items but are difficult to finish within the time limit; power tests let everyone finish within the time limit but contain more difficult items (to avoid a ceiling effect). To avoid cheating, computer adaptive testing is used, selecting items from a large pool based on a candidate’s responses.
Standardized testing: allows scores obtained by different people to be compared against established norms; non-standardized testing is more common and usually consists of informal classroom tests

28
Q

Scoring

A

Objective scoring is typically used for employment testing. Subjective measures can also be used, which introduces rater variance

29
Q

What are additional issues when designing a test?

A
  • cost
  • face validity is if the test looks like it is measuring the trait, which affects motivation and reaction to the procedure by applicants
  • interpretation of the results by the examiner
30
Q

How to choose predictors?

A
  • job analysis gives clues about the kinds of variables that are likely to be related to job success
  • understand performance domain
  • develop a new measure only when an existing one is unavailable, then run a pilot and assess the items
31
Q

Reliability

A

Looks at whether the measure is dependable, stable, and consistent over time, which gives the truest picture of someone’s abilities/characteristics. This allows us to minimize unsystematic errors

32
Q

Validity

A

Looks at whether the measure is measuring what it is supposed to and whether the decisions based on that measure are correct

33
Q

Correlation coefficient

A

degree of consistency/agreement between two sets of independently
derived scores (r)

34
Q

Coefficient of determination

A

The reliability coefficient may be interpreted directly as the percentage of total variance attributable to different sources

35
Q

Classical test theory

A

X = T + e
X is the observed score, T is the true score, and e is the error. High reliability is needed because reliability places an upper bound on validity
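
Not stated on the card, but the standard decomposition that follows from X = T + e (assuming the error is uncorrelated with the true score) is:

$$\sigma_X^2 = \sigma_T^2 + \sigma_e^2, \qquad r_{XX} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_e^2}{\sigma_X^2}$$

so the reliability coefficient is the proportion of observed-score variance that is true-score variance, which is why low reliability caps the validity a measure can show.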

36
Q

How to test reliability?

A
  1. Test-retest -> coefficient of stability; related to random fluctuations in performance across occasions
  2. Parallel/alternate forms -> coefficient of equivalence
  3. Stability and equivalence -> combines different sources of error; most conservative
  4. Internal consistency (e.g. Cronbach’s alpha) -> the degree to which the items in one test are intercorrelated
  5. Interrater reliability -> agreement between raters on their rating of some dimension
  Using different reliability estimates can lead to different conclusions
37
Q

What is stability and equivalence?

A

The correlation between two sets of scores (parallel forms administered on different occasions). Counterbalancing should be used to avoid order effects. Three other sources of error are taken into account:
- random response errors which are caused by variations in attention, mental efficiency and distractions
- specific factor errors are caused by different interpretations of wording
- transient errors are produced by variations in mood or feelings which influences info-processing

38
Q

What are different approaches to parallel forms?

A

This is constructing a number of parallel forms of the same procedure.
Random-assignment approach: create a large pool of items in the same domain and assign them at random to the forms.
Incidental-isomorphism approach: change surface characteristics of items that do not determine item difficulty, rather than structural item features.
Item-isomorphism approach: create pairs of items that reflect the same domain but differ in wording and grammar.
The relationship between two forms is the coefficient of equivalence. Order effects should be controlled by having half the sample receive one form first and the other half the other form first

39
Q

How to estimate internal consistency?

A
  • split-half: the test is split into 2 equivalent halves, which should be consistent -> error variance is attributed to inconsistency in content sampling (it is difficult to split a test in half, so the split should be done randomly); interpreted as a coefficient of equivalence
  • Cronbach’s alpha: the mean of all possible split-half coefficients and the most commonly used estimate; it is higher with more items and higher item intercorrelations, and depends on dimensionality (alpha will be lower when more domains are measured) (see the sketch below)
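
A minimal sketch of Cronbach's alpha via its variance form, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the item data here are invented for illustration:

import numpy as np

# Hypothetical Likert-type responses: rows = respondents, columns = items
items = np.array([
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [4, 4, 5, 4],
])

k = items.shape[1]
sum_item_var = items.var(axis=0, ddof=1).sum()   # sum of the item variances
total_var = items.sum(axis=1).var(ddof=1)        # variance of the total scores
alpha = k / (k - 1) * (1 - sum_item_var / total_var)
print(round(alpha, 2))
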
40
Q

Inter-rater reliability

A
  • the source of unreliability is the raters, who can be seen as different forms of the instrument
  • made up of interrater consensus and interrater consistency
41
Q

What is a good reliability?

A

Depends on the use of the scores: the more important the decision, the more precise the measure needs to be. For social science research, reliability should be above .70; for selection decisions, around .90

42
Q

How is validity a unitary concept?

A

There are not different kinds of validity, only different kinds of evidence for analysing validity. Validity always means the degree to which the evidence supports the inferences made from the scores -> it is important to know how the scores are intended to be used

43
Q

Content validity

A

Whether the test items cover the intended performance domain. Involves logical validity and a rational judgement process: the extent to which SMEs judge there to be overlap between the test and the job-performance domain. This is done by looking at the degree of agreement among raters or judges about how essential an item/test is; if more than half agree, the item has some content validity. This is more difficult for abstract constructs. Content-related evidence is appropriate if the focus is on work results

44
Q

Content-validity strategy

A
  1. conduct a job analysis to cover the KSAOs
  2. share the list of KSAOs with subject matter experts
  3. consider alternative items
  4. minimum qualifications should be straightforward and in the same format
  5. SMEs should rate the list of potential items independently
  6. link each item back to the KSAOs
  7. group items in a thoughtful way
45
Q

How to assess content validity?

A
  • content-validity index: indicates whether the KSAOs measured are essential -> a content-validity ratio is then determined for each item
  • substantive-validity index: panel members assign each item to its construct, and the proportion assigned to the intended construct is analysed
  • content-adequacy procedure: panel members rate the extent to which each item corresponds to each construct
  • analysis-of-variance approach: compares an item’s mean rating on the intended construct with its ratings on the other constructs
46
Q

How has content validity made a positive contribution?

A
  • improved domain sampling and job analysis procedures
  • better behaviour measurement
  • expert judgement in confirming the fairness of sampling and scoring procedures
47
Q

Criterion validity

A

Whether test scores relate to criterion measures, i.e. the empirical relationship between predictor and criterion. Includes predictive validity, where predictor scores are collected before criterion data are available, and concurrent validity, where the criterion measure is available at the same time as the predictor

48
Q

How to carry out predictive studies?

A
  1. Measure candidates on predictor during selection
  2. Select candidates without using the results (need to validate first)
  3. Obtain measurement of criterion performance later-> time period depends on type of job & how much training is needed (approx. 6 months)
  4. Assess the strength of the relationship (statistics)
49
Q

What is a significant issue with criterion-related validity?

A

Statistical power is frequently overestimated because of a failure to consider the combined effects of range restriction, criterion unreliability, and other artifacts that reduce the observed effect size. Increasing the alpha level and making sample-size estimations can help achieve adequate statistical power. The length of the time interval between the test and the collection of the criterion data should also be considered

50
Q

What are concurrent studies?

A

When predictor and criterion data are gathered from the same employees at the same time, so the design is cross-sectional. Issues include ignoring the effects of motivation and of job experience on ability scores.

51
Q

Job-related construct

A

Representation of performance or behaviour on the job that is valued by an employing organization

52
Q

Construct-related criterion

A

Chosen because of its theoretical relationship, or lack of one, to the construct to be measured

53
Q

What is important to remember about criterion validity?

A

Performance domain needs to be defined before developing tests to use for predicting future performance. Criterion data should be gathered independently of predictor data. There needs to be adequate power, length of interval between measuring should be considered and the sample should be representative

54
Q

What are some factors affecting validity?

A

Range enhancement: when a predictor is validated on a group that is more heterogeneous than the group for whom the predictor was intended -> the predictor appears to discriminate better than it really does, so validity is overestimated.
Restriction of range: when only a limited range of scores is available -> the size of the validity coefficient is underestimated.

55
Q

Types of range restriction

A
  1. Direct range restriction -> the test being validated is used for selection purposes before its validity has been established
  2. Indirect range restriction -> selection is made on the basis of some other variable that is correlated with the predictor test being validated
  3. Natural attrition -> good or bad performers leave before criteria are measured, which further restricts the available range
56
Q

How does the relationship between criterion data and performance change over time?

A

The relationship between interview scores and criterion data is linear and follows a bivariate normal distribution, with a positive correlation of about 0.50. Scores cover the full range without censorship. However, when selection is applied (e.g., only considering scores above a certain threshold), the correlation drops significantly, and the data no longer follow an elliptical shape.
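
A small simulation (my own illustration; the cut score, sample size, and variable names are arbitrary assumptions) showing how selecting only high interview scores shrinks an underlying correlation of about .50:

import numpy as np

rng = np.random.default_rng(0)
n, rho = 100_000, 0.50
interview, criterion = rng.multivariate_normal(
    [0, 0], [[1, rho], [rho, 1]], size=n
).T

full_r = np.corrcoef(interview, criterion)[0, 1]

# Direct range restriction: keep only applicants above a cut score on the interview
selected = interview > 1.0
restricted_r = np.corrcoef(interview[selected], criterion[selected])[0, 1]

print(f"full-range r = {full_r:.2f}, restricted r = {restricted_r:.2f}")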

57
Q

Preselection

A

When predictive validity is undertaken after a group of individuals have been hired but before criterion data has become available for them

58
Q

Incremental validity

A

Whether the test adds predictive value over and above other tests, and provides unique and non-overlapping info about performance compared to other methods

59
Q

Construct validity

A

Whether a test measures the psychological construct it intends to measure. Requires an accumulation of evidence and includes hypotheses about the characteristics of people who are high vs low on the construct

60
Q

Nomological network

A

a system of interrelated concepts, propositions and laws that relates observable characteristics to other observables and theoretical constructs

61
Q

What are the sources of evidence?

A
  1. questions asked of test takers about their performance strategies or responses to items
  2. analyses of internal consistency
  3. expert judgement of content and behavioural domains
  4. correlations of a new procedure with established measures of the same construct
  5. factor analyses of a group of procedures
  6. structural equation modelling to test a measurement model that links observed variables to constructs
  7. ability of scores to separate naturally occurring or experimentally contrived groups
  8. demonstrations of systematic relationships between scores from a procedure and measures of behaviour
  9. convergent validity (scores that measure the same construct are related to scores on other measures of the same) and discriminant validity (unrelated to scores that are not measures of that construct)
62
Q

Multitrait-multimethod matrix

A

Table displaying correlations among: the same trait measured by the same method (a, the reliabilities), different traits measured by the same method (b), the same trait measured by different methods (c, the convergent validities), and different traits measured by different methods (d). The c correlations should be larger than zero, high enough to warrant further study, and higher than the b and d correlations. The matrix does not account for reliability or for the assumptions underlying the procedure

63
Q

How are weights assigned to predictors so that differences between observed and predicted criterion scores are minimized?

A

Regression weights from one sample often do not generalize well to another due to sample-specific factors and chance variations, leading to a decrease in predictive accuracy known as shrinkage. Shrinkage is especially large when (a) the initial sample is small, increasing sampling error, (b) a “shotgun” approach is used, selecting predictors without theoretical relevance, and (c) the number of predictors increases, capturing chance relationships. To reduce shrinkage, predictors should be chosen based on psychological theory or past research that shows a clear link to the outcome.
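
An illustrative sketch of shrinkage (the data-generating setup is invented): regression weights fitted on a small derivation sample with many "shotgun" predictors describe that sample well but predict a fresh sample from the same population noticeably worse:

import numpy as np

rng = np.random.default_rng(1)

def multiple_r(X, y, weights):
    # Correlation between predicted and observed criterion scores
    return np.corrcoef(X @ weights, y)[0, 1]

n, k = 40, 8                                     # small sample, many predictors
X_dev = rng.normal(size=(n, k))
y_dev = 0.4 * X_dev[:, 0] + rng.normal(size=n)   # only the first predictor matters

# Ordinary least-squares weights from the derivation sample
w, *_ = np.linalg.lstsq(X_dev, y_dev, rcond=None)

# A fresh sample from the same population (cross-validation sample)
X_new = rng.normal(size=(n, k))
y_new = 0.4 * X_new[:, 0] + rng.normal(size=n)

print(f"derivation R: {multiple_r(X_dev, y_dev, w):.2f}")
print(f"cross-validated R: {multiple_r(X_new, y_new, w):.2f}")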

64
Q

Cross-validity

A

Whether weights derived from one sample can predict outcomes to the same degree in the population as a whole. If low, then assessment tools and prediction systems may not be appropriate in other samples from the same population

65
Q

Synthetic validity

A

It is the process of inferring validity in a specific situation. It assumes that different jobs involving the same kinds of behaviour require the same KSAOs, so if a test is valid for one job containing a given element, it will be valid for any job involving that element. It is important to identify the elements of a job and then select tests that predict performance on those elements in order to infer validity

66
Q

Confirmatory factor analysis

A

Define models that propose trait or method factors a priori and test the ability of models to fit the data

67
Q

For tests to be able to be transported, what are important criteria?

A
  • the results of a criterion-related validity study conducted from another location
  • the results of a test fairness analysis at another location
  • the degree of similarity between locations
  • the degree of similarity between applicants
68
Q

Face validity

A

Whether the tests look like they are measuring the trait/construct; this is not a technical form of validity but may affect applicants’ reactions to the test

69
Q

What is the self-monitoring scale?

A

Self-monitoring is a personality trait that captures differences in the extent to which people control the image they present to others in social situations. High self-monitors are motivated and skilled at altering their behaviour to influence the impressions others have of them; low self-monitors focus on remaining true to their inner attitudes by presenting a consistent image of themselves to others

70
Q

How to see whether self-monitoring is being measured?

A
  • should be related to scores on other measures of the same construct-> convergent validity
  • should be unrelated to scores of instruments that are not supposed to be measures of that construct -> discriminant validity
71
Q

What is a meta-analysis?

A
  • provides an overview of the field by bringing together data from all relevant studies and resolving inconsistencies, highlights moderators and limitations
  • so generalizes findings through the mean of validity coefficients
  • needed in personnel psychology as there is large variability from study to study
72
Q

Validity generalization

A

Validity coefficients vary from employer to employer, region to region, etc. -> the situation-specificity hypothesis, which makes it difficult to develop general principles. Studies testing this hypothesis are known as psychometric meta-analyses or validity generalization studies (obtain a mean validity coefficient, compare it to a standard, and see how the coefficients are distributed)

73
Q

General mental ability

A

The ability to grasp and reason correctly with abstractions and to solve problems; includes facets such as verbal and spatial ability. Intelligence is the ability to learn, so it leads to more rapid learning and to being able to handle more complexity. It is a strong predictor of performance in learning settings

74
Q

Why is intelligence used as a predictor?

A

Higher GMA leads to faster learning and therefore more job knowledge. Workers with higher intelligence also show higher job performance because they can deal with problems on the job that are not covered by prior job knowledge

75
Q

How can personality be used to predict job performance?

A

Personality can be more job-content relevant: extraversion and conscientiousness have predictive validity for managerial performance, and agreeableness is important for collaborative teamwork. Conscientiousness is the most consistent personality predictor across jobs and also predicts organizational citizenship behaviour

76
Q

How does conscientiousness not have a linear relationship with job performance?

A

Some conscientiousness results in more motivation to perform so there is better performance through planning, goal setting and persistence. Too much can result in rigid, inflexible, compulsive perfectionists. This relationship disappears depending on job complexity.

77
Q

How does emotional stability not have a linear relationship with job performance?

A

Some emotional control needed to overcome distracting emotions which can take away resources for a job task. But too much wastes cognitive resources and too little is detrimental to performance. This relationship becomes negative based on job complexity

78
Q

What is the relationship between job complexity and personality?

A

The more complex a job, the higher the threshold of the personality trait level. Being high in conscientiousness is not beneficial in low-complexity jobs but can be in high-complexity ones. GMA is also highly correlated with job performance in high-complexity jobs

79
Q

How can unreliability be adjusted for statistically?

A

r_xt = r_xy / sqrt(r_yy)
where r_xt is the estimated correlation between the predictor and a perfectly reliable criterion, r_xy is the observed validity coefficient, and r_yy is the reliability of the criterion. This is the correction for attenuation in the criterion variable only. It evaluates the validity coefficient relative to the unreliability of the criterion and allows us to see whether there is enough unexplained systematic variance in the criterion to find more and better predictors
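
A worked example with made-up numbers: if the observed validity is r_xy = .30 and the criterion reliability is r_yy = .64, then

$$\hat{r} = \frac{.30}{\sqrt{.64}} = \frac{.30}{.80} \approx .38$$

so the estimated validity against a perfectly reliable criterion is about .38.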

80
Q

Validation

A

Thorough knowledge is needed of the interrelationships between scores from a particular procedure and other variables, which requires examination. Validation is the investigative process of gathering and evaluating the necessary data. There are two issues involved in validation: what a test or other procedure measures, and how well it measures it

81
Q

Measurement

A

Assigning numerals to objects or events according to rules. Psychological measurement is concerned with individual differences in psychological traits.

82
Q

Different types of measurement

A

- nominal: categorical; classes are mutually exclusive (equality only)
- ordinal: rank-ordering; it is unclear how large the differences between scores are (equality and transitivity)
- interval: has equality of units (equality, transitivity, additivity, i.e. equal-sized units); in practice usually built from an ordinal scale with normally distributed scores
- ratio: additionally has an absolute zero point

83
Q

Types of variation?

A

Qualitative is in terms of kinds, which results in classification. Quantitative is in terms of frequency, amount or degree

84
Q

Trait

A

group of interrelated behaviours that may be inherited or acquired (e.g. dominance, locus of control, agreeableness, social value orientation)

85
Q

Item response theory

A

Explains how individual differences on an attribute affect the behaviour of an individual when responding to an item -> can be assessed through the item-characteristic curve. Includes a difficulty parameter, a discrimination parameter, and a parameter describing the probability of a correct response at low levels of ability. Can assess bias at the item level by checking whether an item is more difficult for examinees from one group than from another
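
The card does not give the equation; a common parameterization of the item-characteristic curve with exactly these three parameters is the three-parameter logistic model:

$$P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}}$$

where b_i is the difficulty parameter, a_i the discrimination parameter, and c_i the lower asymptote (the probability of a correct response for examinees with very low ability theta).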

86
Q

Rater/scorer variance

A

Errors attributable to the examiner or rater

87
Q

How can interrater reliability be estimated?

A

- interrater agreement
- interclass correlations: when two raters rate multiple objects or individuals (see the sketch below)
- intraclass correlations: estimate how much of the variation among ratings is due to true differences among the individuals on the attribute measured
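
A minimal sketch (ratings invented; treating the interclass correlation as the Pearson correlation between two raters' scores is my own simplification of the card):

import numpy as np

# Hypothetical ratings of six individuals by two raters
rater_a = np.array([4, 3, 5, 2, 4, 3])
rater_b = np.array([5, 3, 4, 2, 4, 2])

# Interrater consistency: do the raters rank-order the individuals similarly?
print(round(float(np.corrcoef(rater_a, rater_b)[0, 1]), 2))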