test theory & practice Flashcards
maximum performance test
asks the person to do his or her best to solve one or more problems; examples are intelligence and achievement tests
typical performance test
asks the person to respond to one or more tasks, where the responses are typical for the person; examples are personality and attitude tests
dimensionality of a test
is equal to the number of latent attributes (variables) that affect test performance
latent
unobserved
unidimensional test
test that measures one latent attribute
multidimensional test
test that measures more than one latent attribute
mental test
consists of cognitive tasks, such as problems and questions
physical test
consists of instruments to make somatic or physiological measurements
a pure power test
consists of problems that the test taker tries to solve; the test taker has ample time to work on each of the test items, and the emphasis is on measuring the accuracy with which the problems are solved
ability test
sometimes also called an aptitude test; it is an instrument for measuring a person's best performance in an area that is not explicitly taught in training or educational programs
achievement test
an instrument for measuring performance in an area that is explicitly taught in training and educational programs
other evaluation mode
asks others to evaluate a person's ability to perform a task
description
means that the test is only used to describe performances
dichotomous scale
where test takers' responses are graded in two ordered categories, that is, correct or incorrect
ordinal polytomous scale
where test takers' responses are graded in more than two ordered categories, for example, a correct, partly correct, or incorrect answer (or categories a, b, c, d, e)
a measurement procedure is reactive when
test takers can deliberately distort their construct value, for example, an unmotivated student who pretends to be highly motivated on a self-report school attitude test
a measurement procedure is nonreactive when
test takers cannot distort their construct value, for example, drivers whose record shows many traffic offences cannot disguise that this record indicates a negative attitude towards traffic safety
rational method
uses a loose description of the construct that is based on the knowledge of experts or members of the target population
prototypical method
asks members of the target population to think of persons who have the construct and to write down the behaviour of these persons that is typical of the construct
internal method
starts with a broad pool of personality or attitude items
external method
starts from a broad pool of items and a criterion that has to be predicted (e.g., success in a job)
construct method
starts from an explicit theory, and items are derived from that theory
facet design method
does not use an explicit theory, but it starts from a conceptual analysis of the construct
behavioural facets
classify types of behaviour
situational facets
classify the situations in which the behaviours appear
endorsement response scale
asks the test taker to indicate his or her degree of endorsement of the statement
dichotomous scale
scale that has only two categories
ordinal polytomous scale
a scale that has more than two ordered categories
bounded continuous scale
a continuous scale that has two end points (bounds)
indicative item
item where a high frequency or endorsement response indicates a high level of the construct
contra-indicative item
is an item where a high frequency or endorsement response indicates a low level of the construct
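Contra-indicative items are commonly reverse-scored before item scores are combined, so that a high score always points to a high level of the construct. A minimal sketch, assuming an ordinal polytomous scale with integer categories 1 to 5 (the scale bounds, the data, and the helper name `reverse_score` are hypothetical):

```python
def reverse_score(score: int, low: int = 1, high: int = 5) -> int:
    """Reverse-score a contra-indicative item on an integer scale low..high."""
    if not low <= score <= high:
        raise ValueError(f"score {score} outside scale {low}..{high}")
    # The lowest category maps to the highest one and vice versa.
    return low + high - score

# Hypothetical responses of one person; True marks a contra-indicative item.
responses = [5, 2, 4, 1]
contra = [False, True, False, True]
scored = [reverse_score(r) if c else r for r, c in zip(responses, contra)]
print(scored)  # [5, 4, 4, 5]
```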
concurrent interview
asks the test taker to think aloud while answering the items
retrospective interview
asks the test takers to recollect their thinking after completing the items
coefficient of identity
can be applied to assess interrater agreement and intrarater consistency
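As an illustration, the sketch below assumes the Zegers and ten Berge form of the coefficient of identity, e = 2Σxy / (Σx² + Σy²). It equals 1 only when two raters give identical ratings, whereas a correlation would stay at 1 after one rater shifts all ratings by a constant. The ratings and function name are hypothetical:

```python
def coefficient_of_identity(x, y):
    """Identity coefficient 2*sum(x*y) / (sum(x^2) + sum(y^2)) for paired ratings."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty rating lists of equal length")
    numerator = 2 * sum(a * b for a, b in zip(x, y))
    denominator = sum(a * a for a in x) + sum(b * b for b in y)
    return numerator / denominator

rater_a = [3, 4, 2, 5]
rater_b = [3, 4, 2, 5]   # identical to rater_a
rater_c = [4, 5, 3, 6]   # rater_a shifted up by one point
print(coefficient_of_identity(rater_a, rater_b))            # 1.0
print(round(coefficient_of_identity(rater_a, rater_c), 3))  # 0.971
```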
response style
the differential use of the item response scale by different persons; important response styles are acquiescence, dissentience, extremity, and midpoint response styles
acquiescence
is the tendency to agree with an endorsement statement, independently of the content of the statement (yea saying)
dissentience
tendency to disagree with an endorsement statement independently of the content of the statement (nay saying)
extremity response style
is the tendency to choose extremes of the item response scale
midpoint response style
tendency to choose the middle of the response scale
social desirability
a person's tendency to deceive either oneself or others
self deception
tendency to deceive oneself
impression management
tendency to deceive others by making a good or bad impression on them
content validation
experts evaluate whether the test adequately covers all aspects of the construct
observed test score
is computed after the separate test items are scored
observed test score
derived from the item scores by taking the unweighted or weighted sum of the item scores
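A minimal sketch of both versions of the observed test score for one person; the item scores, weights, and function name are hypothetical:

```python
def observed_test_score(item_scores, weights=None):
    """Unweighted (weights=None) or weighted sum of one person's item scores."""
    if weights is None:
        return sum(item_scores)
    if len(weights) != len(item_scores):
        raise ValueError("one weight per item is required")
    return sum(w * s for w, s in zip(weights, item_scores))

scores = [1, 0, 1, 1]  # hypothetical dichotomous item scores
print(observed_test_score(scores))                        # 3
print(observed_test_score(scores, weights=[1, 1, 2, 2]))  # 5
```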
latent variable score
is derived from the item responses under the assumption of a latent variable item response model
item non-response
means that a test taker did not respond to some of the items of the test
imputed item score
score that is substituted for a missing item response
estimate of a parameter
a value derived from empirical data
measurement precision information
applies to the test score of a single person; information is the within-person aspect of measurement precision
measurement precision reliability
concerns the differentiation of test scores of different test takers from a population
true test score
the expected value of a test taker's observed test scores across the hypothetical repeated test administrations of the thought experiment
error of measurement
the difference between a test taker's observed test score at an arbitrary measurement occasion and his or her true score
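The thought experiment behind the true score and the error of measurement can be sketched numerically. The observed scores below are hypothetical replications for one test taker; because the true score is an expected value, the mean of a handful of replications only approximates it:

```python
from statistics import mean, pvariance

# Hypothetical observed scores of ONE test taker over repeated administrations
# in the thought experiment (in practice these replications cannot be observed).
observed = [21, 24, 19, 22, 24]

# The true score is the expected value of the observed scores; the mean of a
# finite set of replications is only an approximation of it.
true_score = mean(observed)
# Error of measurement on each occasion: observed score minus true score.
errors = [x - true_score for x in observed]

print(true_score)         # 22
print(errors)             # [-1, 2, -3, 0, 2]
print(pvariance(errors))  # within-person error variance of these replications
```

A large spread of these errors corresponds to a small amount of information for this test taker.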
a small amount of information (large within-person error variance) means
that the test taker's observed test scores vary widely around his or her true score across repeated test administrations
a large amount of information (small within-person error variance) means
that the test taker's observed test scores do not vary widely around his or her true score across repeated test administrations
parallel tests
tests that measure the same true score, with equal within-person error variance and uncorrelated errors across hypothetical repeated test administrations, for each of the test takers of a population
classical test theory
is based on the definitions of test taker j’s true score and error of measurement, and on their generalisation to a randomly selected person from a population of persons
standard error of measurement of a test
is the square root of the error variance in the population of persons
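Because classical test theory splits observed-score variance into true-score variance and error variance, the error variance can also be written as (1 - reliability) times the observed-score variance, which gives a common way to compute the standard error of measurement. A minimal sketch with hypothetical values:

```python
import math

def standard_error_of_measurement(observed_sd: float, reliability: float) -> float:
    """SEM = sqrt(error variance), with error variance = (1 - reliability) * observed variance."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    error_variance = (1.0 - reliability) * observed_sd ** 2
    return math.sqrt(error_variance)

# Hypothetical test: observed-score SD of 10 and reliability of 0.91.
print(round(standard_error_of_measurement(observed_sd=10.0, reliability=0.91), 2))  # 3.0
```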
the importance of a lower bound is
that a high value of a lower bound of reliability, such as Cronbach’s coefficient alpha, implies that the theoretical reliability is also high
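Coefficient alpha can be computed from a persons-by-items score matrix as alpha = k/(k-1) * (1 - sum of item variances / total-score variance). A minimal sketch with hypothetical 0/1 data; the function name is illustrative:

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Coefficient alpha for a persons x items score matrix (a list of rows)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across persons, and of the total (sum) scores.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical 0/1 scores of four persons on three items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(cronbach_alpha(scores))  # 0.75
```

Population variances are used for both terms; sample variances would give the same alpha, because the n versus n - 1 factor cancels in the ratio.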
location of the item score distribution is
the place on the scale where the item scores are centered
dispersion
scatter of the item scores
shape
form of the item score distribution
classical item difficulty
of a maximum performance item is a parameter that indicates the location of the item score distribution in a population of persons
classical item attractiveness
of a typical performance item is a parameter that indicates the location of the item score distribution in a population of persons
item p value
the mean score of a dichotomously (0/1) scored item in a population, that is, the proportion of persons with item score 1
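A minimal sketch of item p values as column means of a hypothetical 0/1 score matrix:

```python
# Hypothetical 0/1 scores: rows are persons, columns are items.
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
n_persons = len(scores)
# The p value of an item is its mean 0/1 score, i.e. the proportion scoring 1.
p_values = [sum(row[i] for row in scores) / n_persons for i in range(len(scores[0]))]
print(p_values)  # [0.75, 0.75, 0.25]
```

For a maximum performance item a higher p value means an easier item, so the p value serves as the classical item difficulty of a dichotomous item.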
classical item discrimination
parameter that indicates the extent to which the item differentiates between the true test scores of a population of persons
the item-rest correlation
is the product-moment correlation between the item scores and the rest scores of the test, where the rest score is the test score with the studied item deleted
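A minimal sketch of the item-rest correlation for each item of a hypothetical 0/1 score matrix; the helper names are illustrative:

```python
import math

def pearson(x, y):
    """Product-moment correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)

def item_rest_correlation(scores, item):
    """Correlate one item's scores with the rest scores (total minus that item)."""
    item_scores = [row[item] for row in scores]
    rest_scores = [sum(row) - row[item] for row in scores]
    return pearson(item_scores, rest_scores)

# Hypothetical 0/1 scores: rows are persons, columns are items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
for item in range(len(scores[0])):
    print(item, round(item_rest_correlation(scores, item), 3))
```

Using the rest score instead of the total score keeps the studied item out of the score it is correlated with, so the item is not partly correlated with itself.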
popularity of a distractor of a multiple choice item
is the proportion of persons of a population who selected the distractor
item distractor-rest correlations
are the product-moment correlations between the separate dichotomous correct-answer/distractor variables and the rest scores
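Distractor popularity and the option-rest correlations can be sketched together for one multiple-choice item. The responses, key, rest scores, and function names below are hypothetical:

```python
import math

def pearson(x, y):
    """Product-moment correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)

def popularity(responses, option):
    """Proportion of persons who selected the given option."""
    return sum(r == option for r in responses) / len(responses)

def option_rest_correlation(responses, option, rest_scores):
    """Correlation between the 0/1 'chose this option' variable and the rest scores."""
    chosen = [1 if r == option else 0 for r in responses]
    return pearson(chosen, rest_scores)

# Hypothetical data for one multiple-choice item with key "A".
responses = ["A", "B", "A", "C", "A", "B"]
rest_scores = [5, 2, 4, 1, 3, 3]
key = "A"

for option in sorted(set(responses)):
    role = "correct answer" if option == key else "distractor"
    print(option, role, round(popularity(responses, option), 2),
          round(option_rest_correlation(responses, option, rest_scores), 2))
```

In this made-up example the correct answer correlates positively with the rest score and both distractors correlate negatively, which is the pattern expected of a well-functioning item.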
the best thing to look at when you want to make a statement about how well an item discriminates between persons
the item-rest correlation
the difference between a correlation and a covariance
the covariance depends on the measurement scales of the variables; the correlation does not
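A small numeric check of this difference, using hypothetical data: a linear change of units (multiply by 10, add 5) changes the covariance but leaves the correlation unchanged. The same holds for the variance, since the covariance of a variable with itself is its variance:

```python
import math

def covariance(x, y):
    """Population covariance of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

def correlation(x, y):
    """Covariance divided by the product of the standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

x = [1, 2, 3, 4]
y = [2, 1, 4, 3]
x_rescaled = [10 * v + 5 for v in x]  # e.g. a change of units and origin

print(covariance(x, y), covariance(x_rescaled, y))    # 0.75 7.5
print(correlation(x, y), correlation(x_rescaled, y))  # 0.6 0.6
```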
which statistics are expressed on (and depend on) the measurement scale of the variables
variance and covariance
reliability is
the proportion of the observed-score variance that is true-score variance
the reliability increases as the
measurement error variance decreases
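Both cards follow from writing reliability as true-score variance divided by observed-score variance, where the observed-score variance is the sum of true-score and error variance. A minimal sketch with a hypothetical fixed true-score variance and shrinking error variance:

```python
def reliability(true_var: float, error_var: float) -> float:
    """Proportion of observed-score variance that is true-score variance."""
    # Classical test theory: observed variance = true variance + error variance.
    observed_var = true_var + error_var
    return true_var / observed_var

# Hypothetical: fixed true-score variance, decreasing error variance.
for error_var in [50.0, 25.0, 10.0]:
    print(error_var, round(reliability(true_var=100.0, error_var=error_var), 3))
```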
the reliability of a test increases as
the covariances between the items increase
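For Cronbach's coefficient alpha, this can be seen by writing the total-score variance as the sum of the item variances plus the item covariances. With k items, item variances \sigma_i^2, and item covariances \sigma_{ij} (notation assumed here):

```latex
\sigma_X^2 = \sum_{i=1}^{k} \sigma_i^2 + \sum_{i \neq j} \sigma_{ij},
\qquad
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_X^2}\right)
       = \frac{k}{k-1}\,\frac{\sum_{i \neq j} \sigma_{ij}}{\sum_i \sigma_i^2 + \sum_{i \neq j} \sigma_{ij}}
```

With the item variances held fixed, the last fraction grows as the sum of the item covariances grows, so alpha increases.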
what are the requirements that parallel tests would need to fulfil?
the parallel tests must have the same within-person error variance for each of the test takers of the population; the errors of measurement of parallel tests must not be correlated across repeated test administrations; and the parallel tests must measure the same true score for each of the test takers of the population
reliability
the between-persons aspect of measurement precision; it informs us about the consistency of the test
validity
whether a (reliable) test measures what it is supposed to measure