Test Construction Flashcards
Item Characteristic Curve
A graphical representation of a test item’s difficulty, discrimination, and chance of false positives. Difficulty (degree of attribute needed to pass the item): indicated by the position of the curve on the X axis. Discrimination (ability to differentiate between high and low scorers): indicated by the slope of the curve. Chance of false positives (probability of getting the answer correct by guessing): indicated by the Y-intercept of the curve.
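As a rough illustration of how the three properties shape the curve, here is a minimal sketch that assumes a three-parameter logistic form. The card does not name a model, so the function name icc_3pl and the parameter names theta, a, b, and c are illustrative assumptions.

```python
import math

def icc_3pl(theta, a, b, c):
    """Probability of a correct response under an assumed 3-parameter logistic curve.

    theta: examinee's level of the attribute (X axis)
    a: discrimination (controls the slope of the curve)
    b: difficulty (shifts the curve along the X axis)
    c: guessing level (lower asymptote, i.e. the chance of a false positive)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# At theta == b the curve sits halfway between c and 1.
halfway = icc_3pl(theta=0.0, a=1.0, b=0.0, c=0.2)
```

In this sketch, raising b moves the curve to the right (a harder item), raising a makes it steeper (a more discriminating item), and raising c lifts the left end of the curve (more successful guessing).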
Criterion-Related Validity Coefficient
A value that indicates the strength of the correlation between test scores and performance on a chosen criterion.
Test Characteristic Curve
A graphical representation of the expected number of test items a participant answers correctly versus the constructs measured by the test
Item difficulty
AKA item difficulty index or ‘p’. Defined as the proportion of examinees who answer the item correctly; it reflects how much of the attribute an individual must possess to pass the item.
What are the item difficulty (p) ranges?
0 to 1. 0 means that no one passed the item (too hard) and 1 means that everyone passed (too easy). Average item difficulty should be 0.5.
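The index can be computed directly from scored responses. A minimal sketch, assuming each response is coded 1 (correct) or 0 (incorrect); the function name item_difficulty is illustrative.

```python
def item_difficulty(responses):
    """Return p, the proportion of examinees who answered the item correctly.

    responses: list of 0/1 scores for a single item, one entry per examinee.
    """
    return sum(responses) / len(responses)

p = item_difficulty([1, 1, 0, 0])  # 0.5: half of the examinees passed
```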
With item difficulty, what are the floor and ceiling effects?
A floor effect occurs when a test cannot distinguish among people at the low end of a distribution (the items are too hard for them), while a ceiling effect occurs when a test cannot distinguish among people at the high end of a distribution (the items are too easy for them).
What is item discrimination?
The ability of the item to unambiguously separate those who fail from those who pass. Represented visually by the slope of the item characteristic curve; steeper slopes indicate more discrimination.
How is item discrimination assessed?
Index D (item discrimination index): the proportion of high-scorers who answered the item correctly minus the proportion of low-scorers who answered the item correctly.
What are the D ranges?
-1 to +1; it is desirable to have positive values of D, which would indicate that more high-scoring examinees (rather than low-scoring examinees) answered the item correctly.
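A minimal sketch of computing D, taken as the high-scorers' proportion correct minus the low-scorers' proportion correct, so that positive values mean more high scorers passed the item. The cards do not say how the two groups are formed; using the top and bottom 27% of total scores is a common convention and is an assumption here, as are the function and parameter names.

```python
def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Return D = p(upper group correct) - p(lower group correct).

    item_scores: 0/1 scores on one item, one per examinee.
    total_scores: each examinee's total test score, in the same order.
    fraction: share of examinees placed in each extreme group (assumed 27%).
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))  # size of each extreme group
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower

# Every high scorer passes and every low scorer fails, so D is at its maximum.
d = discrimination_index([0, 0, 0, 1, 1, 1], [1, 2, 3, 4, 5, 6], fraction=0.5)
```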
Ratio measure
A level of measurement describing a variable with attributes that have all the qualities of nominal, ordinal, and interval measures as well as a true zero point; measurement of physical objects is an example of ratio measure.
Interval measure
A level of measurement describing a variable whose attributes are rank-ordered and have equal distances between adjacent attributes with no true zero point; the Fahrenheit temperature scale is an example of this, because the distance between 17 and 18 is the same as the distance between 89 and 90.
Nominal scale
A variable whose attributes are simply representations for groups and have no ranked relationship; gender would be an example of a nominal scale of measurement because male does not imply more gender than female.
Item Response Theory
IRT focuses on determining specific parameters of test items. Makes use of item characteristic curves, which provide info about item difficulty, item discrimination, and the probability of false positives.
Assumptions of IRT
A single underlying trait is measured; the relationship between the trait and the item response can be displayed in an item characteristic curve; a large sample size is required.
Computer Adaptive Assessment
Uses IRT; customizes test to the examinee’s ability level.
Classical Test Theory
CTT; AKA Classical Measurement Theory. An approach to testing that assumes that individual items are as good a measure of a latent trait as other items; thus, CTT focuses on the reliability of a set of items. In CTT, item and test parameters are sample-dependent.
Kappa Coefficient
Measures the degree to which judges agree; a measure of inter-rater reliability. Increases when raters are well-trained and aware of being observed. Applicable only with nominal, ordinal, or discontinuous data.
Ranges of Kappa Coefficient
-1 to +1; .80 - .90 indicates good agreement
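A minimal sketch of the calculation, assuming the two-rater form (Cohen's kappa), which corrects observed agreement for the agreement expected by chance. The card does not specify which kappa variant is meant, so the two-rater form and the function name cohens_kappa are assumptions.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Return kappa = (observed - expected) / (1 - expected) for two raters.

    rater_a, rater_b: category labels assigned to the same cases, in order.
    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions per category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1.0 - expected)

perfect = cohens_kappa(["y", "y", "n", "n"], ["y", "y", "n", "n"])  # 1.0
chance = cohens_kappa(["y", "y", "n", "n"], ["y", "n", "y", "n"])   # 0.0
```

The second call shows why kappa is preferred to raw percent agreement: the raters agree on half the cases, but that is exactly what chance would produce, so kappa is 0.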
Convergent Validity
Indicates the degree of correlation between two instruments that are intended to measure the same thing
Metric Data
A term used to refer to interval/ratio data
Continuous Data
A term used to refer to interval/ratio data
Internal Consistency
A measure indicating the extent to which items within an instrument are correlated with each other; internal consistency indicates the extent to which the given items measure the same construct.
Kuder-Richardson Formula 20
A method of evaluating internal consistency reliability; used when test items are dichotomously scored; used when test items vary in difficulty; indicates the degree to which test items are homogeneous; falsely elevates internal consistency when used with timed tests.
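A minimal sketch of the KR-20 calculation, using the standard form (k / (k - 1)) * (1 - sum(p*q) / variance of total scores). Using the population variance of total scores is an assumption (some texts use the sample variance), and the function name kr20 is illustrative.

```python
from statistics import pvariance

def kr20(scores):
    """Return the KR-20 reliability estimate for dichotomously scored items.

    scores: list of examinee rows, each a list of 0/1 item scores.
    Assumes population variance of total scores and at least two items.
    """
    n = len(scores)        # number of examinees
    k = len(scores[0])     # number of items
    var_total = pvariance([sum(row) for row in scores])
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n   # item difficulty
        pq_sum += p * (1.0 - p)                 # item variance p*q
    return (k / (k - 1)) * (1.0 - pq_sum / var_total)

# Four examinees, three items of increasing difficulty.
reliability = kr20([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]])  # 0.75
```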
Single-Subjects Designs
Involve one or more participants and focus on assessing variables within an individual rather than between individuals. They are idiographic (differences within a participant) rather than nomothetic (differences between participants).
2 types of single-subject designs
Case study (describes an individual by using tests or naturalistic observation) or experimental (determine how the introduction of a factor affects behavior)
Problems with single-subject designs
Autocorrelation (when measured on the same variable multiple times, the variable becomes correlated with itself); Time-intensive (multiple assessments or intense observations are time-consuming); Generalizability (may not generalize); Practice effects (scores may increase from practice)
Nomothetic
An approach to personality that focuses on groups of individuals and tries to find the commonalities between individuals.
Multicollinearity
Very high multiple correlations among some or all predictors in an equation.
Quantitative Research
Systematic empirical exploration of relationships; deductive, rather than inductive. Involves the collection and statistical analysis of quantitative data, whose results can often be generalized.
Reliability
Refers to the consistency or repeatability of data; pertains to quantitative research.
ANOVA
Tests for differences in the mean scores of groups based on one or more variables. DV must be continuous and IV must be categorical. Tests the null hypothesis that the means of the groups are equal.
ANOVA assumptions
Independence of observations (each participant in only one cell); Normality (distribution of scores clusters around the mean with fewer observations falling farther from the mean; AKA bell-shaped curve); and Homogeneity of Variance (variance of every group is the same as the variance of every other group, AKA homoscedasticity).
2 types of ANOVA
One-Way ANOVA (tests the main effect of one IV); or Two-Way ANOVA (tests main effects of first IV (A), second IV (B), and the interaction of the two (A*B))
Interaction effect (ANOVA)
The effect of one IV on the DV differs depending on the level of the other IV
What are F-ratios?
Ratios of effect variance to error variance. In One-Way ANOVA, there is one F-ratio of the effect of the IV. In a Two-Way ANOVA, there are three F-ratios (main effect A, B, and interaction effect)
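For the one-way case, the ratio can be sketched as the between-groups mean square (effect variance) divided by the within-groups mean square (error variance). This is a minimal illustration only; the function name one_way_f is an assumption, and it returns only F, not a p-value.

```python
def one_way_f(groups):
    """Return the F-ratio for a one-way ANOVA.

    groups: list of lists, each holding the DV scores for one level of the IV.
    """
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    means = [sum(g) / len(g) for g in groups]

    # Effect variance: how far group means fall from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Error variance: how far scores fall from their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

f_ratio = one_way_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])  # 3.0
```

In a two-way design the same error term would sit in the denominator of each of the three F-ratios (A, B, and A*B).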
Advantages of Two-Way ANOVA over One-Way ANOVA:
Includes interaction effects; increases power; reduces familywise error rate.
Heterogeneity
The violation of the assumption of homogeneity of variance, such that the variances of the groups are not equal. ANOVA is robust to such a violation if there are no outliers, sample sizes are large and relatively equal across groups and within levels, and the hypothesis is two-tailed.
Chi-Square Test
Statistical method of testing for an association between categorical variables; specifically, it tests for the equality of expected and observed frequencies or proportions.
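A minimal sketch of the comparison between observed and expected frequencies, assuming a test of association in a two-way contingency table. The function name chi_square_independence is illustrative, and the sketch returns only the statistic and its degrees of freedom, not a p-value.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table.

    table: list of rows of observed counts, e.g. [[10, 20], [20, 10]].
    Expected counts assume the two categorical variables are unrelated.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

chi2, df = chi_square_independence([[10, 20], [20, 10]])
```

Here every expected count is 15, so each of the four cells contributes 25/15 to the statistic, with 1 degree of freedom.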
MANOVA
An extension of ANOVA methods to cover cases where there is more than one DV and where the DVs cannot simply be combined. The MANOVA combines the DVs in such a way as to maximize differences between groups. In addition to identifying whether changes in the IV have a significant effect on the DV, the technique seeks to identify the interactions among the IVs and the DVs, if any.
ANCOVA
A general linear model with one continuous DV and one or more IVs, plus a covariate. ANCOVA is a merger of ANOVA and regression for continuous variables. ANCOVA tests whether IVs have an effect after removing the variance for which one or more covariates account; the inclusion of covariates can increase statistical power because it accounts for some of the variability.
Dichotomous/Continuous Variables
Continuous variables can assume any intermediate value between two other values, so there is an infinite number of possible values between those two values. Dichotomous variables have only two values (yes or no).
Point-biserial correlation
Examines the relationship between a dichotomous variable and a continuous variable. Can only be used with TRUE dichotomous variables.
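A minimal sketch of the point-biserial coefficient, using the form (M1 - M0) / SD * sqrt(p * q), where SD is the population standard deviation of the continuous variable. Coding the dichotomy as 0/1 and the function name point_biserial are assumptions for illustration.

```python
import math
from statistics import mean, pstdev

def point_biserial(binary, continuous):
    """Return the point-biserial correlation between a 0/1 variable and scores.

    binary: list of 0/1 values (a true dichotomy, coded 0 and 1).
    continuous: list of scores in the same order.
    """
    ones = [y for x, y in zip(binary, continuous) if x == 1]
    zeros = [y for x, y in zip(binary, continuous) if x == 0]
    p = len(ones) / len(binary)   # proportion coded 1
    q = 1.0 - p                   # proportion coded 0
    return (mean(ones) - mean(zeros)) / pstdev(continuous) * math.sqrt(p * q)

r_pb = point_biserial([0, 0, 1, 1], [1, 2, 3, 4])
```

This value is numerically the same as an ordinary Pearson correlation computed on the 0/1 codes and the continuous scores.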
Biserial correlation coefficients
Examine the relationship between an artificially-created (made from a continuous variable) dichotomous variable and a continuous variable.