Test Construction and Interpretation Flashcards
Define Psychological Test
An objective and standardized measure of a sample of behaviour
Norm-Referenced Scores: Pros and Cons
Pros:
* allows for comparison of an individual's performance on different tests
* E.g. one raw score may look better than another, but we can only tell by comparing each to the scores of similar others
Cons:
* don’t provide an absolute or universal standard of good or bad performance
What is meant by a ‘Sample of Behaviour’ in tests?
A measure can't test ALL of a behaviour; it tests only a sample that should be representative of the entire concept being measured
Reliability: define
Consistency of results between testings
Validity: define
The degree to which a test measures what it is designed to measure
Test Characteristics
Maximum vs Typical Performance
Maximum: examinee's best possible performance
Typical: what an examinee typically does or feels
Test Characteristics
Speed
Power
Mastery
Speed: response rate measured
Power: assesses the level of difficulty a person can attain. No time limit
Mastery: determine if a person can attain pre-established level of acceptable performance (e.g. the EPPP)
Ceiling Effects
If a test doesn't include an adequate range of items at the hard end, it limits the information the test can provide about high-scoring examinees
E.g. if there aren’t enough challenging questions, everyone may get the max score
Threatens internal validity
Floor Effects
Not enough items on the easy end, so all low achieving test takers are likely to score similarly
Threatens internal validity
Ipsative Measure
Define
The individual is the frame of reference in score reporting, not a norm group
Questions involve expressing preference for one thing over another
e.g. a personal preference inventory
Normative Measure
Define
Measures the absolute strength of each attribute assessed by the test
Every item is answered on its own scale, rather than being chosen over other options
Classical Test Theory
Reliability
People’s test scores consist of 2 things:
1. Truth
2. Error
True Score: the score that reflects the examinee's actual level of whatever is being measured
Error: factors irrelevant to what is being measured that impact score (e.g. noise, luck, mood)
Reliability Coefficient
Correlation: 0.0 to +1.0
0.0 = entirely unreliable
0.90 = 90% of observed variability is due to true score differences; 10% due to measurement error
Test-Retest Reliability
AKA coefficient of stability
Need to get the timing right; too soon (practice effects, memory), too long (true changes in the attribute and intervening events add error)
Not good for unstable attributes (e.g. mood)
Alternate Forms Reliability
AKA coefficient of equivalence
Give 2 different forms of a test to the same group
Error comes from content differences between the two forms and from the time between administrations; time-related error is reduced by giving the forms in immediate succession
Don’t use w/ unstable traits
How to measure Internal Consistency Reliability?
- Split-half reliability
- Cronbach’s coefficient alpha
- Kuder-Richardson Formula 20
Split-Half Reliability
Internal Consistency
Divide the test in two and correlate scores on the two halves
Shorter tests are inherently less reliable; the Spearman-Brown formula can mitigate this by estimating the effect of test length on reliability (see the sketch below)
Not the most recommended
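A minimal sketch of the Spearman-Brown prophecy formula referenced above (the function name is just illustrative):

```python
def spearman_brown(r_old: float, n: float) -> float:
    """Estimate reliability when test length is multiplied by a factor of n.

    r_old: reliability of the original-length test (e.g. one half of a split test)
    n: factor by which the number of items changes (2 = doubled)
    """
    return (n * r_old) / (1 + (n - 1) * r_old)

# Split-half example: the two halves correlate .70, so the estimated
# reliability of the full-length test is about .82
print(round(spearman_brown(0.70, 2), 2))  # 0.82
```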
Coefficient Alpha
Internal Consistency
Single administration, measure average degree of inter-item consistency
Used for tests w/ multiple scored items
Kuder-Richardson Formula 20
Internal Consistency
Single administration, inter-item consistency
Used on dichotomously scored tests
How to measure Internal Consistency of speed tests?
Test-retest or alternate forms
Inter-item methods would yield spuriously high (near-perfect) coefficients
Interscorer Reliability
What increases it?
- Raters well trained
- Raters know they are being observed
- Scoring categories should be mutually exclusive and exhaustive
What does Mutually Exclusive mean?
A behaviour belongs to one and only one category
Duration Recording
Interscorer Reliability
Rater records elapsed time during which target behaviour occurs
Frequency Recording
Interscorer Reliability
Observer keeps count of no. of times the target behaviour occurs
Interval Recording
Interscorer Reliability
Observing subject at given intervals and noting whether the target behaviour occurs
Good for behaviours with no fixed beginning or end
Continuous Recording
Interscorer Reliability
Record all behaviour of the subject during the observation session
Standard Error of Measurement
How much error an individual test score can be expected to have
Used to construct a confidence interval, which is the range within which someone's true score is likely to fall
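A minimal sketch of the standard SEM formula (the SD and reliability values below are hypothetical):

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical IQ-style test: SD = 15, reliability = .91
error = sem(15, 0.91)  # 4.5
# 95% confidence interval around an observed score of 100
ci_95 = (100 - 1.96 * error, 100 + 1.96 * error)  # roughly 91 to 109
```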
What factors affect reliability?
- Length of test
- Homogeneity of testing group
- Floor/ceiling effects
- Guessing correct answers
Content Validity
The extent to which the test items adequately and representatively sample the content area to be measured
Demonstrated primarily by judging whether the items adequately cover the domain; supported by correlations with other tests that assess the same content
Criterion Related Validity: Define
Is it useful for predicting an individual's behaviour in specified situations?
Criterion = ‘that which is being predicted’
E.g. correlating the SAT with university GPA to establish the relationship and determine criterion validity
Used in applied situations (selecting employees, college admissions, special classes)
Criterion-Related Validity Coefficient
rxy (x = predictor; y = criterion)
Range: -1.0 to +1.0
Few exceed .60
What is the Coefficient of Determination?
Criterion Validity
The square of a correlation coefficient, which shows the variability in criterion that is explained by variability in the predictor
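A one-line worked example, using a hypothetical validity coefficient:

```python
r_xy = 0.60            # hypothetical predictor-criterion correlation
r_squared = r_xy ** 2  # 0.36: 36% of criterion variability is explained by the predictor
```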
Concurrent Validation
Criterion Validity
The predictor and criterion data are collected at same time
It predicts a current behaviour
E.g. job selection test for therapists given to current therapists, and it is correlated with their current performance ratings from supervisors
When would you use Concurrent Validation?
Criterion Validity
When you need the current status of a criterion
May be used over predictive for cost and convenience
Predictive Validity
Criterion Validity
Predictor scores are collected first, criterion data collected later
E.g. does the GRE predict grad school performance?
Standard Error of Estimate
Criterion Validity
Used when interpreting an individual's predicted score on a criterion measure
There will be a difference between the predicted criterion score and the actual score; the standard error of estimate reflects the typical size of that prediction error
E.g. using SAT score to predict GPA via a regression equation
Equation for the Standard Error of Estimate
Criterion Validity
SE_est = SD_y x √(1 - r_xy²)
* SE_est = standard error of estimate
* SD_y = standard deviation of criterion scores
* r_xy = validity coefficient
This can be used to make a confidence interval
*Likely won’t need to remember equation for exam. But do need to for SEM
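A minimal sketch applying the formula above (the SD, validity coefficient, and predicted GPA are hypothetical):

```python
import math

def se_estimate(sd_y: float, r_xy: float) -> float:
    """Standard error of estimate: SD_y * sqrt(1 - r_xy**2)."""
    return sd_y * math.sqrt(1 - r_xy ** 2)

# Hypothetical: criterion SD = 0.5 GPA points, validity coefficient = .40
err = se_estimate(0.5, 0.40)                  # ~0.46
# 95% confidence interval around a predicted GPA of 3.0
ci_95 = (3.0 - 1.96 * err, 3.0 + 1.96 * err)  # roughly 2.1 to 3.9
```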
How can you use Criterion Validity to make decisions?
Criterion Cut off Point
Predict if someone is likely to make it above the cut off and be selected (e.g. all students w/ GPA of 3.0+)
What is a predictor's Functional Utility?
Criterion Validity
Determine the increase in correct decision making that would result from using the predictor as a selection tool
Calculated once predictor and criterion cut off points are made
4 possibilities for Criterion Cut Off Point Scores
- True Positives: scored above cut off, and were successful
- False Positives: scored above cut off, not successful
- True Negatives: scored below cutoff, unsuccessful
- False Negatives: scored below cutoff, successful
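A minimal sketch (cut off values hypothetical) showing how each case falls into one of the four outcomes:

```python
def classify(predictor: float, criterion: float, pred_cut: float, crit_cut: float) -> str:
    """Assign a case to one of the four decision outcomes."""
    selected = predictor >= pred_cut    # scored above the predictor cut off
    successful = criterion >= crit_cut  # met the criterion cut off
    if selected and successful:
        return "true positive"
    if selected and not successful:
        return "false positive"
    if not selected and not successful:
        return "true negative"
    return "false negative"

# e.g. SAT cut off of 1200, GPA cut off of 3.0
print(classify(1250, 3.4, 1200, 3.0))  # true positive
print(classify(1150, 3.2, 1200, 3.0))  # false negative
```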
Heterogeneity of Examinees
Factors that Affect Validity Coefficient
A restricted range of scores will lower the validity coefficient
Homogenous groups = lower validity coefficient
Reliability of Predictor and Criterion
Factors that Affect Validity Coefficient
They must both be reliable for a predictor to be valid
High reliability does not guarantee good validity
Moderator Variables
Factors that Affect Validity Coefficient
Differential Validity
What are they? a variable (e.g. age, gender) that affects the strength of the relationship between the predictor and the criterion
Differential Validity: a test has this if there are different validity coefficients for different groups
Cross-Validation
Factors that Affect Validity Coefficient
Shrinkage
After a test is validated, it's re-validated with a different group of people
Shrinkage: when the validity coefficient drops after cross-validation, because the predictor ended up being 'tailor-made' to the original validation sample
Criterion Contamination
Factors that Affect Validity Coefficient
What is it? knowledge of someone's predictor score impacts their criterion score
Prevention: people involved in assigning criterion ratings should not know the person's predictor score
Construct Validity
What is it? the degree to which a test measures the construct it is intended to
How Measured? over time, based on accumulation of evidence
Convergent Validity
Construct Validity
What is it? different ways of measuring the same trait yield similar results
Discriminant Validity
Construct Validity
What is it? when a test does NOT correlate with another test that measures something different
What is a Multi-Trait Multi-Method Matrix?
Construct Validity
Assessment of 2 or more traits by 2 or more methods.
Convergent Validity if tests that measure same traits have a high correlation, even when different methods used
Discriminant Validity when two tests that measure different traits have a low correlation, even when they use the same method
4 types of correlation coefficients in the Multitrait-Multimethod Matrix
Construct Validity
- Monotrait-monomethod: correlation between a measure and itself. RELIABILITY
- Monotrait-heteromethod: correlation between two measures of same trait w/ different methods
- Heterotrait-monomethod: correlation between two measures of different traits using same method
- Heterotrait-heteromethod: correlation between two measures of different traits using different methods
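A toy illustration of the four cell types (traits, methods, and all correlations are hypothetical):

```python
# Two traits (anxiety A, depression D) measured by two methods (1 = self-report, 2 = clinician rating)
mtmm_cells = {
    ("A1", "A1"): 0.90,  # monotrait-monomethod: reliability
    ("A1", "A2"): 0.65,  # monotrait-heteromethod: convergent validity (should be high)
    ("A1", "D1"): 0.25,  # heterotrait-monomethod: discriminant validity (should be low)
    ("A1", "D2"): 0.10,  # heterotrait-heteromethod: should be the lowest of all
}
```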
What is Factor Analysis?
Construct Validity
A stats procedure that reduces a set of many variables to fewer ‘themed’ variables (underlying constructs/latent variables)
Factor Analysis: Factor Loading
Construct Validity
Correlation between a given test and a given factor
+1 to -1
Can be squared to determine the proportion of variability in the test accounted for by the factor
Factor Analysis: Communality
Common Variance
Unique Variance
Construct Validity
Measures: The proportion of variance of a test that is attributable to the factors
How Measured? factor loadings are squared and added
Symbol: h²
Common Variance: the variance a test shares with the common factors
Unique Variance: variance specific to the test, unrelated to the factors
* Subtract communality from 1.00
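A worked example with hypothetical loadings on two orthogonal factors:

```python
# Hypothetical test loading .60 on Factor I and .50 on Factor II
loadings = [0.60, 0.50]
communality = sum(l ** 2 for l in loadings)  # h^2 = 0.36 + 0.25 = 0.61 (common variance)
unique_variance = 1.0 - communality          # 0.39 (specificity plus error)
```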
Explained Variance (Eigenvalues)
Construct Validity
What are they? measure of the amount of variance in all the tests accounted for by the factor
Convert to percentage: (eigenvalue x 100) / (# of tests)
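A quick worked example with hypothetical numbers:

```python
eigenvalue, n_tests = 2.5, 10
pct_explained = eigenvalue * 100 / n_tests  # 25% of the total variance across the 10 tests
```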
Interpreting & Naming the Factors
Rotation
Construct Validity
You must make inferences based on theory about what the factors are measuring (e.g. based on the contents of items that load highly on that factor)
Rotation: a procedure that places factors in a new position relative to the tests. Aids in interpretation
2 Types of Rotation
Interpreting & Naming Factors
Construct Validity
- Orthogonal: factors are independent of each other
- Oblique: factors that are correlated w/ each other to some degree
Notes: communality only exists for orthogonal
Post-rotation, eigenvalues may have changed. Eigenvalue only used for unrotated factors.
Factorial Validity
Construct Validity
What is it? when a test correlates highly with a factor it would be expected to
Principal Components Analysis
Construct Validity
Similar to Factor Analysis:
* reduce large set of variables to underlying constructs
* Factor matrix
* Eigenvalues: square & sum factor loadings
* Underlying factors ordered in terms of explanatory power
Differences to Factor Analysis:
* Factor = principal component/eigenvector
* no distinction between communality and specificity (variance is only divided into explained and error variance)
* Components are always uncorrelated, i.e. no such thing as oblique rotation
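A hedged sketch using scikit-learn (not part of the cards) to contrast the two analyses on hypothetical data:

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)
scores = rng.normal(size=(200, 6))    # 200 examinees x 6 hypothetical test scores

pca = PCA(n_components=2).fit(scores)
print(pca.explained_variance_ratio_)  # proportion of total variance each component explains

fa = FactorAnalysis(n_components=2).fit(scores)
print(fa.components_)                 # factor loadings
print(fa.noise_variance_)             # unique variance per test, which PCA does not model
```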
Cluster Analysis
Construct Validity
Purpose: develop a taxonomy/classification
Used to divide a group into similar subtypes (e.g. types of criminals)
Differences to Factor Analysis:
* Any type of data can be used for CA, whereas only interval or ratio for FA
* Clusters are just categories, not latent variables
* Not used when there is a pre-existing hypothesis, whereas FA has one
Relationship between Reliability and Validity
A test can be reliable but not valid
For a test to be valid, it must be reliable (if it doesn’t have consistent results, it’s only measuring random error)
The validity coefficient is less than or equal to the square root of the reliability coefficient
Correction for Attenuation
Validity
This equation can show you what would happen to the validity of a test if both the criterion and predictor had higher reliability
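A sketch of the classic correction-for-attenuation formula, which estimates validity if both measures were perfectly reliable (values hypothetical):

```python
import math

def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Estimated validity if predictor and criterion were perfectly reliable."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical: observed validity .40, predictor reliability .80, criterion reliability .70
print(round(correct_for_attenuation(0.40, 0.80, 0.70), 2))  # ~0.53
```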
How can Item Analysis help Reliability and Validity?
It can have them built into the test, item by item
Item Difficulty
- The percentage of examinees who answer it correctly (item difficulty index, p)
- Moderate difficulty items are most common; increase score variability which increases reliability & validity
- Change based on purpose of the test
- Avg difficulty should be halfway between 1.0 and level of success expected by chance
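A worked example of the 'halfway between 1.0 and chance' rule, assuming a 4-option multiple-choice item:

```python
chance = 0.25                   # probability of guessing a 4-option item correctly
optimal_p = (1.0 + chance) / 2  # 0.625: the target average item difficulty
```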
What scale is associated with the p level according to Anne Anastasi?
Item Difficulty
Ordinal scale
Why? equivalent differences in p value do not indicate equivalent differences in difficulty
e.g. we can conclude which items are easier than others, but that doesn’t mean the difference in difficulty between items is equal to the difference between other items
Item Discrimination
Degree to which an item differentiates among examinees in terms of the behaviour it is designed to measure
e.g. depressed people consistently answer an item differently than non-depressed people
How to measure Item Discrimination?
Correlate Item Response with Total Score: those w/ highest correlation are kept. Useful when test only measures one thing
Correlate Item with Criterion Measure: choose items that correlate with criterion but not w/ each other
Item Discrimination Index: D
Divide the group into the top and bottom 27%. For each item, subtract the % of examinees in the low-scoring group who answered correctly from the % in the high-scoring group who answered correctly (D = U - L)
Range: -1.0 to +1.0 (-100 to +100 if expressed as percentages)
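A worked example with hypothetical proportions:

```python
# 80% of the top-scoring 27% answer the item correctly vs 30% of the bottom 27%
U, L = 0.80, 0.30
D = U - L  # 0.50; values near +1 discriminate well, 0 = none, negative = reversed
```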
Relationship between Item Difficulty and Item Discrimination
Difficulty level places a ceiling on discrimination index (if everybody or nobody answers it correctly, there is no discrimination)
Moderate difficulty items have best discrimination
Item Response Theory: Define
How is it displayed?
What does it show?
Based on Item Characteristic Curves, which depict, for each item, the probability of a correct response for examinees at different ability levels
Slope on the graph shows discrimination (steeper curve = greater discrimination)
Difficulty, discrimination, and probability of answering correctly
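A minimal sketch of a two-parameter logistic item characteristic curve, one common IRT model (parameter values hypothetical):

```python
import math

def icc_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under a 2-parameter logistic model.

    theta: examinee ability; a: discrimination (slope); b: difficulty (location).
    """
    return 1 / (1 + math.exp(-a * (theta - b)))

# Hypothetical item with difficulty b = 0.5 and discrimination a = 1.5
print(round(icc_2pl(0.5, 1.5, 0.5), 2))  # 0.50 when ability equals difficulty
print(round(icc_2pl(1.5, 1.5, 0.5), 2))  # ~0.82: higher ability, higher probability
```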
2 Assumptions of Item Response Theory
- Performance on item is related to estimated amount of a latent trait being measured by item
- Results of testing are sample free (invariance of item parameters)
An item should have same difficulty & discrimination across all random samples of a population
Why do we need to compare peoples test scores to a norm?
Test Interpretation
Because without a reference point, test results mean nothing
2 types of Developmental Norms
Test Interpretation
Mental Age
* Compare score to the avg performance of others at different age levels
* Used to calculate ratio IQ score: (Mental Age/Chronological Age) x 100 (see the worked example after this card)
Grade Equivalent
* Primarily used for interpretation of educational achievement tests
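A quick ratio IQ example with hypothetical ages:

```python
mental_age, chronological_age = 12, 10
ratio_iq = (mental_age / chronological_age) * 100  # 120
```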
Disadvantages of Developmental Norms
Test Interpretation
- Don't allow for comparison of individuals at different age levels, because the standard deviation is not taken into account
Within-Group Norms
Test Interpretation
- Compare score to those of most similar standardization sample
- E.g. percentile ranks, standard scores
Percentile Ranks
Test Interpretation
- Indicates the percentage of people in standardization sample who fall below a given raw score
- E.g. 90th percentile = you scored better than 90% of others
- Disadvantage: ordinal data, so we can't quantify the difference in scores between someone at the 90th and someone at the 80th percentile rank
Standard Scores: define
Test Interpretation
- Show a raw score’s distance from the mean in standard deviation units
- Can compare an individual at different ages
4 types of Standard Scores
Test Interpretation
Z-Score
* shows how many SDs above/below the mean. E.g. +1.0 = one SD above the mean
T-Score
* have a mean of 50, SD of 10
* T score 60 = score falls 1 SD above mean
Stanine Score
* Literally means ‘standard 9’, scores range 1-9
* Mean of 5, SD of 2
Deviation IQ Score
* mean 100, SD 15
* E.g. IQ tests
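A minimal sketch converting one hypothetical raw score into each type of standard score:

```python
# Hypothetical: raw score of 65 on a test with mean 50 and SD 10
raw, mean, sd = 65, 50, 10
z = (raw - mean) / sd        # 1.5 SDs above the mean
t_score = 50 + 10 * z        # 65
deviation_iq = 100 + 15 * z  # 122.5
stanine = max(1, min(9, round(5 + 2 * z)))  # 8 (stanines are capped at 1-9)
```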