Assessment 1 Flashcards
Measurement theory
A branch of applied mathematics that is useful in measurement & data analysis
The fundamental idea of MT is that measurements are not the same as the attribute being measured
Hence, if you want to draw conclusions about the attribute, you must take into account the nature of the correspondence between the attribute & the measurements
Correlation
Magnitude & direction of the relationship between an IV (predictor) & DV (criterion)
Linear regression= y=ax+b
Shows a relationship, not a comparison between variables
Not itself a statistical technique for hypothesis testing; can be used after hypothesis testing to calculate effect size
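A minimal sketch of the two ideas on this card: the correlation coefficient (magnitude & direction) and the regression line y=ax+b. The data and the helper names `pearson_r` and `fit_line` are made up for illustration; r squared is printed as one common effect size measure.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: the sign gives direction, the absolute value gives magnitude."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares slope a and intercept b for y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    b = my - a * mx
    return a, b

hours = [1, 2, 3, 4, 5]        # hypothetical predictor (IV)
score = [52, 55, 61, 64, 68]   # hypothetical criterion (DV)

r = pearson_r(hours, score)
a, b = fit_line(hours, score)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, line: y = {a:.2f}x + {b:.2f}")
```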
Curvilinear relationship
As one variable increases, so does the other variable, but only up to a certain point; after that, as the first variable continues to increase, the other decreases
e.g. Yerkes-Dodson law
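A rough illustration of why a curvilinear relationship needs care: with a made-up, perfectly symmetric inverted U (arousal vs performance, in the spirit of Yerkes-Dodson), the linear correlation over the whole range is about zero even though each half is strongly related. All numbers and the `pearson_r` helper are assumptions for the sketch.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

arousal = list(range(1, 10))                         # hypothetical arousal levels 1..9
performance = [25 - (a - 5) ** 2 for a in arousal]   # made-up inverted U peaking at 5

r_all = pearson_r(arousal, performance)               # whole range
r_rising = pearson_r(arousal[:5], performance[:5])    # up to the peak
r_falling = pearson_r(arousal[4:], performance[4:])   # from the peak onwards

print(f"whole range: {abs(r_all):.3f}, rising half: {r_rising:.3f}, falling half: {r_falling:.3f}")
```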
Spurious correlation
Coincidental correlation
Can be misleading
Correlations need to be theoretically driven
Measurement theory: accuracy & precision
Accuracy= closeness of a measured value to a standard or known value e.g. a sample with a mean IQ score of 120 is not accurate (IQ is normed to a population mean of 100)
Precision= closeness of 2 or more measurements to each other e.g. IQ score in 3 different sessions
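The distinction can be sketched numerically: accuracy as the gap between the mean measurement and a reference value, precision as the spread of repeated measurements. The three session scores and the reference of 100 are assumptions for illustration, as are the function names.

```python
import statistics

def accuracy_error(measurements, reference):
    """Distance of the mean measurement from the known/standard value (0 = accurate)."""
    return statistics.mean(measurements) - reference

def precision_spread(measurements):
    """Spread of repeated measurements around their own mean (small = precise)."""
    return statistics.stdev(measurements)

sessions = [119, 120, 121]   # hypothetical IQ scores from 3 sessions
reference = 100              # assumed reference value (conventional IQ mean)

print("accuracy error:", accuracy_error(sessions, reference))
print("precision spread:", precision_spread(sessions))
```

In this made-up case the scores are precise (they sit close together) but not accurate (they sit far from the reference).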
Latent variables
Latent variables= constructs that are unobserved (hidden) & are inferred from data collected on related observable variables
SEM (structural equation modelling)= multivariate statistical analysis technique used to analyse structural relationships among observed & unobserved (latent) variables
Implies a structure for the covariances between the observed variables
Latent variable models
The relationship between the observable & the unobservable quantities is described by a mathematical function
Classical test theory
Behavioural perspective
Measures the overall score on a test
Manifest behaviour is treated as the sole representation of a construct, with no consideration of latent traits
Assumes the existence of measurement error
Therefore aims to elaborate strategies (statistics) to control or evaluate the magnitude of error
Unit of analysis is the whole test (item sum or mean)
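CTT's treatment of error is usually written as observed score = true score + error (X = T + E). A small sketch with made-up numbers, chosen so that the errors average zero and are uncorrelated with the true scores, shows how the observed variance splits and how reliability follows as true variance over observed variance. True scores are never seen in practice; they are listed here only for illustration.

```python
import statistics

true_scores = [10, 20, 30, 40]   # hypothetical true scores T
errors = [1, -1, -1, 1]          # hypothetical errors E: mean 0, uncorrelated with T
observed = [t + e for t, e in zip(true_scores, errors)]   # X = T + E

var_t = statistics.pvariance(true_scores)
var_e = statistics.pvariance(errors)
var_x = statistics.pvariance(observed)

# With uncorrelated errors, var(X) = var(T) + var(E).
reliability = var_t / var_x
print(f"var(T) = {var_t}, var(E) = {var_e}, var(X) = {var_x}, reliability = {reliability:.3f}")
```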
CTT evaluation
1) standard error of measurement applies to all scores in a particular pop
2) longer tests are more reliable than shorter ones
3) test scores obtain meaning by comparing their position in a norm group
4) unbiased assessment of item properties depends on having representative samples
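Points 1 and 2 can be sketched with two standard CTT formulas: the standard error of measurement, SD x sqrt(1 - reliability), and the Spearman-Brown prediction for a lengthened test. The input values (SD of 15, reliabilities of 0.91 and 0.70) are assumptions chosen for the example.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """CTT SEM: a single value for every score in the population the inputs describe."""
    return sd * math.sqrt(1 - reliability)

def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened length_factor times with parallel items."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

sem = standard_error_of_measurement(sd=15, reliability=0.91)   # hypothetical test
doubled = spearman_brown(reliability=0.70, length_factor=2)    # hypothetical short test, doubled

print(f"SEM = {sem:.2f}")
print(f"reliability after doubling = {doubled:.3f}")
```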
Item response theory
Cognitive perspective
The answer a subject gives to an item depends on his or her level on the latent trait, the magnitude of his or her theta
Proposes the validation of items & not of tests
This favours the composition of large groups of independent items that can be used to create or customise different tests for different purposes
Unidimensional IRT
Premise that the interactions of a person with test items can be adequately represented by a mathematical expression containing a single parameter describing the characteristics of the person
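One common choice for such a mathematical expression is a logistic function of the person's theta. The sketch below assumes that form (it is not specified on the card); `theta` is the single person parameter, while `b` (difficulty) and `a` (discrimination) are assumed item parameters.

```python
import math

def p_correct(theta, b, a=1.0):
    """Probability of a correct response under a logistic item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Same item (b = 0), three hypothetical persons with different theta values.
for theta in (-2.0, 0.0, 2.0):
    print(f"theta = {theta:+.1f} -> P(correct) = {p_correct(theta, b=0.0):.3f}")
```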
Assumptions 1 & 2 of unidimensional IRT
1) unidimensionality= a single latent trait variable is sufficient to explain the common variance among item responses
2) local independence= the response of any person to any test item is assumed to depend solely on the person's single parameter & the item's vector of parameters; LI is evidence for unidimensionality if the IRT model contains person parameters on only 1 dimension
Implications of local independence
Probability of a collection of responses can be determined by multiplying the probabilities of each of the individual responses
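A short sketch of this multiplication rule, assuming a one-parameter logistic response function (an assumption, not stated on the card) and made-up item difficulties: the probability of the whole response pattern is the product of the per-item probabilities.

```python
import math

def p_correct(theta, b):
    """Assumed logistic response function with item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def pattern_likelihood(theta, difficulties, responses):
    """Probability of a full response pattern (1 = correct, 0 = incorrect) under local independence."""
    probs = []
    for b, u in zip(difficulties, responses):
        p = p_correct(theta, b)
        probs.append(p if u == 1 else 1.0 - p)
    return math.prod(probs)

# Hypothetical person (theta = 0) answering three items: correct, correct, incorrect.
likelihood = pattern_likelihood(0.0, [-1.0, 0.0, 1.0], [1, 1, 0])
print(f"likelihood of pattern = {likelihood:.4f}")
```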
Assumptions 3 & 4 of unidimensional IRT
3) the characteristics of a test item remain constant over all of the situations where it is used
4) monotonicity= probability of a correct response to the test item increases, or at least does not decrease, as the locations of examinees increase on the coordinate dimension
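Monotonicity can be checked on a grid of theta values. The sketch assumes a logistic response function and a hypothetical item with b = 0; `is_non_decreasing` is an illustrative helper name.

```python
import math

def p_correct(theta, b, a=1.0):
    """Assumed logistic item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def is_non_decreasing(values):
    """True if each value is at least as large as the one before it."""
    return all(later >= earlier for earlier, later in zip(values, values[1:]))

thetas = [x / 2 for x in range(-6, 7)]          # grid from -3.0 to 3.0
curve = [p_correct(t, b=0.0) for t in thetas]   # response probabilities along the grid

print("monotone:", is_non_decreasing(curve))
```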
IRT evaluation
1) standard error of measurement differs across scores but generalises across pops
2) shorter tests can be more reliable than longer ones
3) test scores obtain meaning by comparing their distance from items
4) unbiased estimates of item properties may be obtained from unrepresentative samples
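Point 1 can be illustrated with the information function of a two-parameter logistic model, where the standard error at a given theta is 1 / sqrt(information). The model choice and the three (a, b) item pairs are assumptions for the sketch.

```python
import math

def p_correct(theta, b, a=1.0):
    """Assumed two-parameter logistic item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def total_information(theta, items):
    """Sum of item information a^2 * P * (1 - P) over (a, b) item pairs."""
    total = 0.0
    for a, b in items:
        p = p_correct(theta, b, a)
        total += a * a * p * (1.0 - p)
    return total

def standard_error(theta, items):
    """Standard error of the theta estimate at a given trait level."""
    return 1.0 / math.sqrt(total_information(theta, items))

items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]   # hypothetical (a, b) pairs

for theta in (0.0, 3.0):
    print(f"theta = {theta:.1f} -> SE = {standard_error(theta, items):.2f}")
```

With these made-up items the error is smallest near the item difficulties and grows for trait levels far from them.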
Test blueprint (specifications)
Tells you exactly what skills will be tested & how many points each question is worth
May include important details
Ensures that test assesses level or depth of learning you want to measure
Development of test specs more common for skills tests
Test specs indicate what dimensions & descriptors can be evaluated in the test & in what proportions
The operationalisation of subjective concepts enables their measurement from a set of descriptors, which will in future represent the phenomenon under investigation
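A blueprint's proportions can be turned into item counts mechanically. The dimension names, percentages and test length below are hypothetical, as is the function name.

```python
def items_per_dimension(blueprint, total_items):
    """Turn percentage weights into item counts; the weights must sum to 100."""
    if sum(blueprint.values()) != 100:
        raise ValueError("blueprint percentages must sum to 100")
    # Floor division: counts can fall short of total_items if a share is not a whole number.
    return {dim: total_items * pct // 100 for dim, pct in blueprint.items()}

blueprint = {"recall": 50, "application": 30, "analysis": 20}   # hypothetical dimensions
counts = items_per_dimension(blueprint, total_items=20)
print(counts)
```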
Specific criteria for development of items
1) Behavioural= items must express a behaviour, not an abstraction
2) Objectivity= items should allow for a right or wrong response
3) Simplicity= items should express a single idea in order to avoid ambiguities
4) Clarity= items should be intelligible even to the lowest level of the target pop; use short sentences with simple & unambiguous expressions
5) Relevance= items must be consistent with the psych trait & other items covering the same construct
6) Precision= items must have a position defined in the attribute continuum & be distinct from other items that cover the same continuum
7) Neutrality= do not use extreme expressions e.g. excellent, miserable; the magnitude of the person's reaction is given in the response scale
Specific criteria regarding test development
1) amplitude= range of simple descriptors to more complex descriptors
2) balance of amplitude
Validity
The extent to which a test accurately measures what it is intended to measure
Validity: CTT
1st period (1900-1950)= content-related validity (content & face validity)
2nd period (1950-1970)= criterion-related validity (concurrent & predictive validity)
3rd period (1970-current)= construct-related validity (convergent & discriminant validity)
Content related validity
Systematic review of the test content to determine if items cover a representative sample of the universe of behaviours to be measured & to determine if the choice of items is appropriate & relevant
Global, mostly non-statistical procedure
Content-related validity: panel of experts (raters)
A consultation with experts in the area of the evaluated construct, who analyse the representativeness of items in relation to the theory of the construct & help judge the adequacy of items for the target pop
Average of 5 judges
Agreement rate of 80% expected between judges
Asked to make suggestions of improvement
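The 80% rule can be sketched as a per-item agreement rate across judges. The ratings below are invented (1 = judge considers the item adequate, 0 = not), and treating agreement as the share of judges endorsing an item is one simple reading of the rule.

```python
def agreement_rate(ratings):
    """Share of judges who rated the item as adequate (1 = adequate, 0 = not)."""
    return sum(ratings) / len(ratings)

# Hypothetical ratings from 5 judges for 3 items.
panel = {
    "item_1": [1, 1, 1, 1, 1],
    "item_2": [1, 1, 1, 1, 0],
    "item_3": [1, 0, 1, 0, 1],
}

retained = [item for item, ratings in panel.items() if agreement_rate(ratings) >= 0.80]
print("retained:", retained)
```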
Content related validity: pilot study
Procedure seeks to verify whether items have been well understood by target audience
Content validity can be demonstrated statistically by item analysis
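Item analysis on pilot data often starts with two statistics per item: difficulty (proportion correct) and an item-rest correlation as a discrimination index. Choosing these two is an assumption here, and the response matrix is made up (rows = persons, columns = items, 1 = correct).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def item_difficulty(responses, item):
    """Proportion of persons answering the item correctly."""
    return sum(person[item] for person in responses) / len(responses)

def item_rest_correlation(responses, item):
    """Correlation between the item and the total score on the remaining items."""
    item_scores = [person[item] for person in responses]
    rest_scores = [sum(person) - person[item] for person in responses]
    return pearson_r(item_scores, rest_scores)

# Hypothetical pilot data: 6 persons x 3 items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]

for item in range(3):
    print(item, round(item_difficulty(responses, item), 2),
          round(item_rest_correlation(responses, item), 2))
```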
Content related validity: face validity
Not a statistical or numerical technique
Regards whether a test is an apparent measure of its associated criterion
Involves language & layout- the way content is presented
The test should look good to test takers