Exam 2 Flashcards
cross-validation
process of administering a test to another sample of test takers representative of the target population; can also simply gather a large enough data set and randomly split it into 2 samples; used to evaluate whether a regression equation developed in one sample holds up in another, since results are influenced by the particular sample
calibration sample (aka Training Set)
sample on which the regression parameters (slope and intercept) are estimated
validation sample (aka Test Set)
sample on which the calibration-sample equation is used to predict criterion scores, to check how well the regression generalizes
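A minimal sketch of the split-sample approach, using simulated scores (all numbers made up) and SciPy's linregress and pearsonr:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(50, 10, 200)            # predictor (test scores)
y = 0.6 * x + rng.normal(0, 8, 200)    # criterion (simulated)

idx = rng.permutation(len(x))
calib, valid = idx[:100], idx[100:]    # calibration / validation halves

fit = stats.linregress(x[calib], y[calib])    # parameters set on calibration sample
y_pred = fit.intercept + fit.slope * x[valid]

# cross-validated validity: correlation of predicted with actual criterion
r_cv, _ = stats.pearsonr(y_pred, y[valid])
print(round(r_cv, 2))
```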
differential validity
when a test yields significantly different validity coefficients for subgroups
single-group validity
valid for one group, but not for another
measurement bias
when scores on a test taken by different subgroups in the population (ex. men & women) need to be interpreted differently because of some characteristic of the test that is not related to the construct being measured
differential prediction
an outcome in which there is a significant difference between regression equations for 2 groups as indicated by differences in slopes, intercepts, or both.
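A minimal sketch of how differential prediction might be checked, fitting separate regressions to simulated data for two hypothetical groups; a real analysis would formally test the slope and intercept differences (e.g., with moderated multiple regression):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x_a = rng.normal(50, 10, 100); y_a = 0.6 * x_a + rng.normal(0, 8, 100)
x_b = rng.normal(50, 10, 100); y_b = 0.4 * x_b + 12 + rng.normal(0, 8, 100)

fit_a = stats.linregress(x_a, y_a)   # regression for group A
fit_b = stats.linregress(x_b, y_b)   # regression for group B

# differential prediction is suggested when slopes and/or intercepts differ
print(round(fit_a.slope, 2), round(fit_a.intercept, 2))
print(round(fit_b.slope, 2), round(fit_b.intercept, 2))
```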
criterion-related validity
the extent to which scores on a test correlate with scores on a measure of PERFORMANCE or behavior; the extent to which test scores correlate with or predict independent behaviors, attitudes, or events
2 Methods for evidence of Criterion-Related Validity
1. predictive 2. concurrent
Predictive Method
used to show a relationship between test scores and a future behavior
validity coeff.= a statistic used to infer the strength of the evidence of validity that the test scores might demonstrate in predicting job performance
restriction of range= only applicants who score well on the predictor are hired, so the range of scores available for the validity study is truncated
Concurrent Method
test administration and criterion measurement happen at the same time. does NOT involve prediction; provides information about the present & status quo.
reliability/precision vs. validity
reliability/precision: the CONSISTENCY of test results that derives from 2 factors (internal consistency and test-retest reliability)
validity: depends on the INFERENCES that are going to be made from scores
objective criterion
observable and measurable; verifiable with facts, leaving little room for doubt
subjective criterion
based on a person's judgment (e.g., peer ratings); prone to more error and narrower in scope than well-defined objective criteria
criterion contamination
when the criterion measures MORE DIMENSIONS than those measured by the test; when unreliable/inappropriate criteria are used for validation, the true validity coefficient might be under- or overestimated
Tests of Significance
“how likely is it that the correlation between the test & the criterion resulted from chance or from sampling error?”
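A minimal sketch of this test, using simulated scores: pearsonr returns both the validity coefficient and its p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
test = rng.normal(size=60)
criterion = 0.5 * test + rng.normal(size=60)

r, p = stats.pearsonr(test, criterion)
print(f"r = {r:.2f}, p = {p:.4f}")   # small p -> unlikely to be chance/sampling error alone
```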
coefficient of determination
indicates the amount of variance that the test and criterion share; obtained by SQUARING the validity coefficient to get r sq.; e.g., r = .50 gives r sq. = .25, so the test and criterion share 25% of their variance
Linear Regression
uses one set of test scores (X) to predict one set of criterion scores (Y); in linear regression, we refer to the best-fitting line as the regression line; we calculate the slope or b weight of the regression line, which is the expected change in Y for every one-unit change in X
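A tiny worked example of the regression equation Y' = a + bX with made-up intercept (a) and slope (b) values, showing the one-unit-change idea:

```python
a, b = 10.0, 0.8
for x in (50, 51):
    print(x, a + b * x)   # Y rises by b (= 0.8) for each one-unit rise in X
```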
Range Restriction
the reduction in the range of scores that results when some people are dropped from a validity study, such as when low performers are not hired, causing the validity coefficient to be lower than it would be if all persons were included in the study; corrections for range restriction are available
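A sketch of the standard correction for direct range restriction (Thorndike's Case 2); the numbers are made up, with r the restricted correlation, s the restricted SD, and S the unrestricted SD of the predictor:

```python
import math

def correct_for_restriction(r, s, S):
    k = S / s
    return (r * k) / math.sqrt(1 - r**2 + (r**2) * (k**2))

print(round(correct_for_restriction(0.30, 5.0, 10.0), 2))  # corrected upward to ~.53
```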
construct validity
evidence that a test relates to other tests and behaviors as the underlying construct predicts; a construct = an underlying attribute inferred from behaviors and actions that are observable and measurable
Nomological network
method for defining a construct by illustrating its relation to as many other constructs and behaviors as possible
states vs. traits
states are a TEMPORARY condition, perhaps brought on by situational circumstances
traits are LONG-LASTING individual qualities that have become an enduring part of a person
Jingle-Jangle Fallacy- JINGLE
Jingle is 2 measures labeled w/ the SAME construct, but uncorrelated (they actually measure different things)
Jingle-Jangle Fallacy- JANGLE
Jangle is 2 measures labeled w/ DIFFERENT constructs, but CORRELATED (they actually measure the same thing)
Convergent Validity
test scores correlated with measures of the same or similar constructs
Discriminant Validity
test scores are NOT related to unrelated constructs
Heterotrait-heteromethod Correlations
(evidence of discriminant) different things measured differently should not be correlated
Heterotrait-Monomethod Correlations
(evidence of common method variance) different things measured the same way should NOT be correlated, but they typically are to some degree
Monotrait-heteromethod correlations
(evidence of convergent validity) the same thing measured in different ways should be correlated
Monotrait-Monomethod
(evidence of reliability) same thing measured in the same way correlated with itself should be HIGHLY correlated
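A made-up MTMM correlation matrix for two traits (T1, T2) each measured by two methods (M1, M2) illustrates the four kinds of correlations above:

```python
import numpy as np

labels = ["T1M1", "T2M1", "T1M2", "T2M2"]
r = np.array([
    [1.00, 0.30, 0.65, 0.10],
    [0.30, 1.00, 0.12, 0.60],
    [0.65, 0.12, 1.00, 0.28],
    [0.10, 0.60, 0.28, 1.00],
])
# monotrait-heteromethod (convergent):      r[0,2]=.65, r[1,3]=.60 -> high
# heterotrait-monomethod (method variance): r[0,1]=.30, r[2,3]=.28 -> modest
# heterotrait-heteromethod (discriminant):  r[0,3]=.10, r[1,2]=.12 -> low
```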
Exploratory Factor Analysis (EFA)
no formal hypothesis about factors, “how many underlying factors are there?”
Confirmatory Factor Analysis (CFA)
factor structure specified in advance based on theory; “how well does my data replicate the theoretical structure?”
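A minimal EFA sketch using scikit-learn's FactorAnalysis on simulated item responses (all values made up); CFA is typically run in dedicated SEM software instead:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
f = rng.normal(size=(300, 2))                    # two latent factors
load = rng.uniform(0.4, 0.9, size=(2, 8))        # loadings on 8 items
X = f @ load + rng.normal(0, 0.5, size=(300, 8)) # simulated responses

efa = FactorAnalysis(n_components=2).fit(X)
print(np.round(efa.components_, 2))              # estimated factor loadings
```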
validity
refers to whether there is evidence supporting the interpretation of the resulting test scores for their proposed use; a test must be reliable before it can be valid
test-retest method
test developer gives the same test to the same group on 2 different occasions; the 2 sets of scores are correlated; examines the stability of test scores over time and provides an estimate of the test's reliability/precision; a limitation is practice effects
alternate-forms method
2 forms of the same test, as much alike as possible, given to the same people; the 2 forms are the alternate/parallel forms; administered close in time, typically the same day, to guard against order effects
internal consistency method
how related the items on 1 test are to one another; measuring the same attribute
split-half method
divide the test into 2 halves and then compare the set of individual scores on the first half with those on the second; the halves must be equal in length and content; use random assignment of items to halves
homogeneous tests
measure 1 trait/characteristic
scorer reliability/ interscorer agreement
the amount of consistency among scorers' judgments
intrascorer reliability
whether each clinician was consistent in the way he or she assigned scores from test to test
Classical Test Theory
X = T + E (observed score = true score + error)
True Score (T)
cannot be truly known or determined; represents the score a person would obtain if they took a test an infinite number of times and then averaged the scores; that averaging cancels out random error
Random Error (E)
difference between actual score (obtained) and true score
Systematic Error
obscures the true score, when a single source of error always increases or decreases the score by the same amount
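A quick simulation of X = T + E with made-up numbers shows why averaging cancels random (but not systematic) error:

```python
import numpy as np

rng = np.random.default_rng(4)
true_score = 80.0
observed = true_score + rng.normal(0, 5, size=10_000)  # X = T + random E
print(observed.mean())   # close to 80; a systematic error would NOT cancel
```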
Spearman-Brown Formula
used to estimate the reliability of the full-length test from a split-half correlation (halving a test lowers reliability, so the estimate is corrected upward)
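The formula itself, sketched with a made-up split-half correlation: r_full = 2 * r_half / (1 + r_half).

```python
def spearman_brown(r_half):
    return 2 * r_half / (1 + r_half)

print(round(spearman_brown(0.70), 2))  # 0.82 estimated for the full-length test
```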
KR-20
used for internal consistency estimates when items are scored dichotomously (right vs. wrong), such as true/false or multiple-choice items
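A minimal KR-20 sketch over a simulated 0/1 score matrix (rows = test takers, columns = items):

```python
import numpy as np

def kr20(scores):
    k = scores.shape[1]
    p = scores.mean(axis=0)                     # proportion correct per item
    q = 1 - p
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

rng = np.random.default_rng(5)
ability = rng.normal(size=(50, 1))
demo = (ability + rng.normal(0, 1, (50, 10)) > 0).astype(int)  # correlated 0/1 items
print(round(kr20(demo), 2))
```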
Coefficient Alpha
used for tests whose items have a range of possible answers to choose from (e.g., Likert-type scales)
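A minimal coefficient alpha sketch over simulated multi-point item scores (all values made up):

```python
import numpy as np

def cronbach_alpha(scores):
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

rng = np.random.default_rng(7)
trait = rng.normal(size=(60, 1))
items = np.clip(np.rint(3 + trait + rng.normal(0, 1, (60, 5))), 1, 5)  # 1-5 scale
print(round(cronbach_alpha(items), 2))
```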
Pilot Testing
when developing a test, we cannot simply assume it will be as reliable as expected; pilot testing conditions should be representative of the intended use, and participants should be representative of the target population
quantitative item analysis (2)
2 main components:
item discrimination & item difficulty
item difficulty
percentage of people who answer the item correctly (p); can be compared with the probability of answering correctly by chance (e.g., .25 for a 4-option multiple-choice item)
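As a tiny example with made-up 0/1 item scores (1 = correct, 0 = incorrect):

```python
import numpy as np

item = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
p = item.mean()
print(p)   # 0.7 -> a fairly easy item
```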
item discrimination
how well an item separates high and low performers (upper vs. lower scoring groups);
negative numbers mean people w/ LOW ability answered correctly more often than those w/ high ability;
low positive numbers mean poor discrimination;
high positive numbers mean good discrimination
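A minimal sketch of the discrimination index D = p(upper) - p(lower), using simulated scores and the common top/bottom-27% split:

```python
import numpy as np

rng = np.random.default_rng(6)
total = rng.normal(size=100)                            # total test scores
item = (total + rng.normal(0, 1, 100) > 0).astype(int)  # item tracks ability

order = np.argsort(total)
lower, upper = order[:27], order[-27:]      # bottom and top 27% of scorers
d = item[upper].mean() - item[lower].mean()
print(round(d, 2))   # positive -> high scorers pass the item more often
```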