Assessment 1 Flashcards
Measurement theory
A branch of applied mathematics that is useful in measurement & data analysis
The fundamental idea of MT is that measurements are not the same as the attribute being measured
Hence, if you want to draw conclusions about the attribute, you must take into account the nature of the correspondence between the attribute & the measurements
Correlation
Magnitude & direction of the relationship between an IV (predictor) & DV (criterion)
Linear regression= y=ax+b
Shows a relationship, not a comparison between variables
Not a statistical technique for hypothesis testing, can be used after hypothesis testing to calculate effect size
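A minimal sketch of both ideas (the paired scores are hypothetical): Pearson r gives the magnitude & direction of the relationship, and the least-squares line y = ax + b follows from r and the two standard deviations.

```python
import statistics

# Hypothetical paired scores for a predictor (IV) and a criterion (DV)
x = [2, 4, 5, 7, 9]
y = [3, 5, 7, 8, 10]

n = len(x)
mx, my = statistics.mean(x), statistics.mean(y)
sx, sy = statistics.stdev(x), statistics.stdev(y)

# Pearson r: standardised covariance, always between -1 and +1
r = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / ((n - 1) * sx * sy)

# Least-squares regression line y = ax + b
slope = r * sy / sx
intercept = my - slope * mx

print(round(r, 3), round(slope, 3), round(intercept, 3))
```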
Curvilinear relationship
As one variable increases, so does the other variable, but only up to a certain point, after which, as one variable continues to increase, the other decreases
e.g. Yerkes Dodson law
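A hypothetical numeric illustration: with a symmetric inverted U (as the Yerkes-Dodson law describes for arousal & performance), the linear covariance is zero, so Pearson r completely misses the relationship.

```python
# Hypothetical arousal (x) & performance (y) values forming an inverted U:
# performance peaks at moderate arousal, as in the Yerkes-Dodson law
arousal = [1, 2, 3, 4, 5, 6, 7]
performance = [2, 5, 7, 8, 7, 5, 2]

n = len(arousal)
# n * covariance, kept in exact integers: n*Sum(xy) - Sum(x)*Sum(y)
lin = n * sum(a * p for a, p in zip(arousal, performance)) \
    - sum(arousal) * sum(performance)

print(lin)  # 0 -> Pearson r = 0 despite a strong curvilinear relationship
```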
Spurious correlation
Coincidental correlation
Can be misleading
Correlations need to be theoretically driven
Measurement theory: accuracy & precision
Accuracy= closeness of a measured value to a standard or known value e.g. a sample with a mean IQ score of 120 is not accurate relative to the population standard of 100
Precision= closeness of 2 or more measurements to each other e.g. IQ score in 3 different sessions
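The IQ example can be made concrete with a short sketch (the session scores are hypothetical): the three measurements are precise (close to each other) but not accurate (far from the population standard of 100).

```python
import statistics

true_value = 100              # population standard for IQ
scores = [118, 120, 122]      # one person's scores across 3 sessions

accuracy_error = abs(statistics.mean(scores) - true_value)  # distance from the standard
precision = statistics.stdev(scores)                        # spread of the measurements

print(accuracy_error, precision)  # large bias, small spread: precise but not accurate
```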
Latent variables
Latent variables= constructs are unobserved, hidden or latent variables inferred from the data collected on related observable variables
SEM= multivariate stats analysis technique used to analyse structural relationships among observable & unobserved (latent) variables
Implies a structure for the covariances between the observed variables
Latent variable models
The relationship between the observable & the unobservable quantities is described by a mathematical function
Classical test theory
Behavioural perspective
Measures the overall score on a test
Manifest behaviour is taken as the only representation of a construct, with no consideration of latent traits
Assumes the existence of measurement error
Therefore aims to elaborate strategies (statistics) to control or evaluate the magnitude of error
Unit of analysis is the whole test (item sum or mean)
CTT evaluation
1) standard error of measurement applies to all scores in particular pop
2) longer tests more reliable than shorter
3) test scores obtain meaning by comparing their position in a norm group
4) unbiased assessment of item properties depends on having representative samples
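Point 1 can be sketched with the usual CTT formula (an assumption here, not stated on the card): SEM = SD × √(1 − r), a single value applied to every score in the population.

```python
import math

def sem(sd, reliability):
    """CTT standard error of measurement: SEM = SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical IQ-style test: SD = 15, reliability r_xx = 0.91
print(round(sem(15, 0.91), 2))  # 4.5
```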
Item response theory
Cognitive perspective
The answer a subject gives to an item depends on his or her level on the latent trait, the magnitude of his or her theta
Proposes the validation of items & not of tests
This favours the composition of large groups of independent items that can be used to create or customise different tests for different purposes
Unidimensional IRT
Premise that the interactions of a person with test items can be adequately represented by a mathematical expression containing a single parameter describing the characteristics of the person
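A minimal sketch of this premise using the Rasch (1PL) model, a standard unidimensional IRT model (the numbers are hypothetical): a single person parameter theta and a single item difficulty b determine the response probability.

```python
import math

def rasch_prob(theta, b):
    """Rasch (1PL): P(correct) = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# A person located exactly at the item's difficulty succeeds half the time
print(rasch_prob(0.0, 0.0))  # 0.5

# Monotonicity: higher theta -> higher probability on the same item
print(round(rasch_prob(2.0, 0.0), 3))
```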
Assumptions 1 & 2 of unidimensional IRT
1) unidimensionality= a single latent trait variable is sufficient to explain the common variance among item responses
2) local independence= the response of any person to any test item is assumed to depend solely on the person's single parameter & the item's vector of parameters; LI is evidence for unidimensionality if the IRT model contains person parameters on only 1 dimension
Implications of local independence
Probability of a collection of responses can be determined by multiplying the probabilities of each of the individual responses
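A sketch of this multiplication rule, reusing a hypothetical Rasch (1PL) item model: the likelihood of a whole response pattern is the product of the per-item probabilities.

```python
import math

def rasch_prob(theta, b):
    """Rasch (1PL) probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

theta = 1.0                      # hypothetical person location
difficulties = [-1.0, 0.0, 2.0]  # hypothetical item difficulties
responses = [1, 1, 0]            # correct, correct, incorrect

# Local independence: multiply P for correct items, (1 - P) for incorrect
likelihood = 1.0
for b, u in zip(difficulties, responses):
    p = rasch_prob(theta, b)
    likelihood *= p if u == 1 else (1 - p)

print(round(likelihood, 4))
```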
Assumptions 3 & 4 of unidimensional IRT
3) the characteristics of a test item remain constant over all of the situations where it is used
4) monotonicity= probability of a correct response to the test item increases, or at least does not decrease, as the locations of examinees increase on the coordinate dimension
IRT evaluation
1) standard error of measurement differs across scores but generalised across pops
2) shorter tests can be more reliable than longer ones
3) test score obtain meaning by comparing their distance from items
4) unbiased estimates of item properties may be obtained from unrepresentative samples
Test blueprint (specifications)
Tells you exactly what skills will be tested & how many points each question is worth
May include important details
Ensures that test assesses level or depth of learning you want to measure
Development of test specs more common for skills tests
Test specs indicate what dimensions & descriptors can be evaluated in the test & in what proportions
The operationalisation of subjective concepts enables their measurement from a set of descriptors, which will represent the phenomenon under investigation
Specific criteria for development of items
1) Behavioural= items must express a behaviour, not an abstraction
2) Objectivity= items should allow for a right or wrong response
3) Simplicity= items should express a single idea in order to avoid ambiguities
4) Clarity= items should be intelligible even to the lowest level of target pop, short sentences with simple & unambiguous expressions
5) Relevance= items must be consistent with the psych trait & other items covering the same construct
6) precision= items must have a position defined in the attribute continuum & be distinct from other items that cover the same continuum
7) neutrality= do not use extreme expressions e.g. excellent, miserable; the magnitude of the person's reaction is given in the response scale
Specific criteria regarding test development
1) amplitude= range of simple descriptors to more complex descriptors
2) balance of amplitude
Validity
The extent to which a test accurately measures what it is intended to measure
Validity: CTT
1st period (1900-1950)= content-related validity (content & face validity)
2nd period (1950-1970)= criterion-related validity (concurrent & predictive validity)
3rd period (1970-current)= construct-related validity (convergent & discriminant validity)
Content related validity
Systematic review of the test content to determine if items cover a representative sample of the universe of behaviours to be measured & to determine if the choice of items is appropriate & relevant
Global, mostly non-statistical procedure
Content-related validity: panel of experts (raters)
A consultation with experts in area of evaluated construct, analyse the representativeness of items in relation to theory on construct & assist in adequacy of items to the target pop
Average of 5 judges
Agreement rate of 80% expected between judges
Asked to make suggestions of improvement
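The 80% agreement rule can be sketched as a simple per-item check (the judges' ratings here are hypothetical 1 = relevant / 0 = not relevant votes):

```python
# Hypothetical ratings from 5 judges (1 = item represents the construct)
ratings_per_item = {
    "item_1": [1, 1, 1, 1, 1],
    "item_2": [1, 1, 1, 1, 0],
    "item_3": [1, 0, 1, 0, 0],
}

CUTOFF = 0.80  # expected agreement rate between judges

for item, ratings in ratings_per_item.items():
    agreement = sum(ratings) / len(ratings)
    verdict = "keep" if agreement >= CUTOFF else "revise or drop"
    print(item, agreement, verdict)
```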
Content related validity: pilot study
Procedure seeks to verify whether items have been well understood by target audience
Content validity can be demonstrated statistically by item analysis
Content related validity: face validity
Not a statistical or numerical technique
Regards whether a test is an apparent measure of its associated criterion
Involves language & layout- the way content is presented
The test should look good to test takers
Content validity procedures
1) setting test goals
2) selection of the universe of behaviours that appear to measure the construct (pool of items)
3) item development
4) item analysis
5) final choice of test items
Criterion-related validity
Extent to which a measure is related to either a present or future outcome
Such evidence is provided by high correlations between a test & well-defined criterion measure (ideal is above 0.5)
Criterion will depend on type of construct evaluated e.g. academic performance or group dynamics
Criterion is a standard on which a test will be compared against
Criterion-related validity: concurrent validity
Derived from assessments of the simultaneous relationship between test & criterion, such as between a learning disability test & school performance
Involves determining the current status of a person in relation to some classification scheme, such as diagnostic categories
Criterion-related validity: predictive validity
The extent to which a score on a scale or test predicts scores on some criterion
Construct-related validity: construct validity
The degree to which a test measured what it claims or purports to be measuring
Whether a scale or test measures construct adequately
From the construct validity, the degree to which a person has a certain characteristic is inferred
Failures in the validation process may stem from the instrument, its administration or the theory
Construct validity
Content, criterion & construct
Construct-related validity: convergent validity
Degree to which 2 instruments measuring the same construct are theoretically & empirically related
Ideal correlation is above 0.5
Construct-related validity: discriminant validity
If 2 measures of the same quality show high correlations, then 2 measures that do not assess the same quality should not
2 tests of unrelated constructs should have low correlations- they should discriminate between 2 qualities that are not related to each other
May also refer to the ability of an instrument to discriminate between groups of subjects with a high magnitude of an attribute & subjects with a low magnitude
Ideal correlation in discriminant validity is below 0.3
Also called divergent validity
IRT validity
Can be used to investigate any type of test, whether measuring abilities or attitudes
Content validity & proper representation of latent trait under measurement is one of primary concerns of IRT, hence development of specifications matrix
Items are developed in IRT systematically- based on descriptors- rather than in an intuitive way as in CTT
The remaining steps for the test development & therefore content validity are similar to those deployed by CTT
Content validity procedures
1) setting goals
2) identification of dimensions & descriptors
3) specification matrix
4) item development
5) item analysis
6) final choice of test items
Criterion related validity
Not common, but specific stats methods have been adopted for the study of criterion validity; non-linear correlations can offer more precision for correlation tests
Criterion could be behaviour or another instrument
Factor analysis
A procedure for reducing scores on many variables to scores on a smaller number of ‘factors’
Purpose is to explore underlying variance structure of set of correlation coefficients
Thus, factor analysis is useful for exploring & verifying patterns in set of correlation coefficients
Full info factor analysis
Most modern technique for study of construct unidimensionality
Does not require computation of intercorrelations between items, since the models are based on individuals' patterns of response rather than on the correlational structure of the multivariate latent response distribution
These models are therefore ‘full info’ models
To attest to construct unidimensionality, FA oblique rotation technique used with the extraction of 2 factors in order to verify if they’re correlated
Presence of high correlation between first 2 factors indicates that, even if there is a second order factor, only a single latent trait is being assessed, so the assumption of unidimensionality is met
Acceptable correlation is at least 0.5
Test info curve
Shows for which range of theta levels the test is particularly valid
Items above the validity range are too difficult & below the range, too easy
Factorial components of variance
The variance of the observed variables in relation to their factors can be divided into several parts
A= specific variance
B= common variance
C= error variance
Common variance
Covariance between items & factors
The percentage of communality (common variance) defines the quality of the behavioural representation of the latent trait from the observable variables (test items)
Communality= the sum of squared factor loadings across all factors- the proportion of each variable's variance that can be explained by the factors
True variance
Reliability-> true variance
True variance= communality + specificity
Uniqueness= complement of communality, corresponds to difference between total variance & value of communality
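Communality & uniqueness can be sketched from one item's loadings (the loading values are hypothetical):

```python
# Hypothetical loadings of a single item on two extracted factors
loadings = [0.70, 0.30]

# Communality: sum of squared loadings = variance explained by the factors
communality = sum(l ** 2 for l in loadings)

# Uniqueness: the complement of communality (specific + error variance)
uniqueness = 1.0 - communality

print(round(communality, 2), round(uniqueness, 2))  # 0.58 0.42
```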
Factor analysis advantages
Most used technique for validation of psych instruments
Reduces large number of variables to manageable item pool
Factor analysis disadvantages
Usefulness depends on ability of the researchers to develop a set of conditions that make the technique viable
Subjectivity of factors
Traditional FA assumes that the relations between the variables are typically linear
Level of measurement
In conventional factor analysis, all variables must be quantitative
Important to test normality of variables, seeking to control large deviations of normality
Reliability for CTT
For CTT; reliability was most investigated psychometric parameter & several statistical techniques were developed to estimate it
Corresponds to consistency of scores obtained by same individuals when they are re-examined with same test at different times, different sets of equivalent items or under other variable conditions of examination
Also concerns accuracy of measure or internal consistency (item-total correlation) between items of test
Error variance
Any condition that is irrelevant to objectives of a test represents error variance
Factors reducing error variance= standardisation of test administration, environmental control, instructions, time limits, normative sample
Test-retest reliability
Reliability coefficient is correlation between scores obtained by same individual in 2 different administrations of same test
Coefficient of stability or constancy
Ideal correlation is >0.6
Error variance source= interval between test administrations
Split half reliability
Reliability coefficient is correlation between scores obtained by same individuals in same test divided into 2 equivalent halves
Coefficient of internal consistency
Ideal correlation is >0.6
Error variance source= content sampling
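Splitting a test halves its length, so the half-test correlation is usually stepped up with the Spearman-Brown formula (a standard companion to split-half reliability; the r value is hypothetical):

```python
def spearman_brown(r_half):
    """Estimate full-test reliability from a half-test correlation:
    r_full = 2*r / (1 + r)."""
    return 2.0 * r_half / (1.0 + r_half)

# Hypothetical correlation between odd-item & even-item half scores
print(round(spearman_brown(0.6), 2))  # 0.75
```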
Inter-rater reliability
Reliability coefficient is correlation between scores assigned by each rater
Error variance source= different raters perceptions
Parallel (equivalent) forms
Reliability coefficient is correlation between scores obtained by same respondents in applications of equivalent forms of the test
Coefficient of equivalence
Ideal correlation is >0.7
True variance greater than error variance
Error variance source= content sampling
Kuder-Richardson
Reliability coefficient is correlation between scores obtained in each item of test
Kuder-Richardson coefficient
Kuder-Richardson 20 formula used when items are dichotomous
Ideal coefficient is between 0.7-0.9
Error variance source= content sampling & content heterogeneity
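A minimal KR-20 sketch for dichotomous items (the response matrix is hypothetical): KR-20 = (k/(k−1)) × (1 − Σpq / total-score variance).

```python
def kr20(item_responses):
    """Kuder-Richardson 20 for dichotomous (0/1) items.
    item_responses: one list of 0/1 item scores per respondent."""
    n_items = len(item_responses[0])
    n_people = len(item_responses)

    # p = proportion correct per item, q = 1 - p
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(person[j] for person in item_responses) / n_people
        pq_sum += p * (1 - p)

    # Population variance of the total scores
    totals = [sum(person) for person in item_responses]
    mean_t = sum(totals) / n_people
    var_t = sum((t - mean_t) ** 2 for t in totals) / n_people

    return (n_items / (n_items - 1)) * (1 - pq_sum / var_t)

# Hypothetical 4-item test answered by 5 people
data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(kr20(data), 2))
```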
Cronbach’s alpha
Reliability coefficient is correlation between scores obtained in each item of the test
Similar to Kuder-Richardson technique
Cronbach’s alpha formula is modification of Kuder-Richardson equation & reflects magnitude of covariance among items
Varies from 0-1 (1=100% internal consistency)
Ideal correlation is between 0.7-0.9
Error variance source= content sampling & content heterogeneity
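A minimal sketch of the alpha computation (the Likert responses are hypothetical): alpha = (k/(k−1)) × (1 − Σ item variances / total-score variance), which reflects the covariance among items.

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha for k numeric items.
    item_scores: one list of item scores per respondent."""
    k = len(item_scores[0])

    def pop_var(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / len(values)

    item_vars = sum(pop_var([person[j] for person in item_scores])
                    for j in range(k))
    total_var = pop_var([sum(person) for person in item_scores])
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical 3-item Likert responses from 4 people
data = [
    [4, 5, 4],
    [3, 3, 3],
    [5, 5, 4],
    [2, 2, 1],
]
alpha = cronbach_alpha(data)
print(round(alpha, 2))
```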
Factors affecting reliability
Greater sample variability → greater reliability → greater test accuracy
Greater number of items → greater reliability → greater test accuracy
Reliability for IRT
Reliability not as important as for CTT, though is part of validation process
Reliability can be obtained by estimating item information curve, which corresponds to graphical analysis that shows level of theta to which item brings maximum information
If the amount of info is small, ability cannot be estimated with precision & estimates will be widely scattered about the true ability
Item info function depends on the item parameters a, b & c
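A sketch of the item information function for the 2PL case (the parameter values are hypothetical; with a guessing parameter c the formula changes): I(theta) = a² × P × (1 − P), which peaks where theta equals the item difficulty b.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_info(theta, a, b):
    """2PL item information: I(theta) = a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a ** 2 * p * (1 - p)

a, b = 1.5, 0.0  # hypothetical discrimination & difficulty

print(item_info(0.0, a, b))            # maximum a^2/4, reached at theta = b
print(round(item_info(2.0, a, b), 4))  # less information away from b
```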
Standardisation
The meaning of test scores derives from the frames of reference we use to interpret them & from the context in which the scores are obtained
Standardisation refers to the need for uniformity in all procedures while using a valid & accurate test
Norm
Created with scores from respondents who participated in normative study
Expected to be drawn from representative sample (normative group)
Norm-referenced tests
Norms usually presented in form of tables with descriptive stats that summarise performance of the group or groups in question
When norms are collected from test performance of groups of people, these reference groups are labelled normative or standardised samples
Norms typically created from calculation of percentiles & standard scores
Mean & SD are main stats used to create norms
Norm-referenced tests continued
Norms most widely used frame of reference for interpreting test scores
Used for test takers comparison
Score used to place the test taker's performance within a pre-existing distribution of scores or data obtained from the performance of a suitable comparison group
These scores can be obtained by linear/non-linear transformations of obtained data
Z-scores
Non-normalised standard score (linear)
Transforms the group's original scores into units of SD
Normally distributed scores of tests with different means, standard deviations & score ranges can be meaningfully compared once they have been linearly transformed into common scale as long as same reference group used
T-scores
Non-normalised standard score (linear)
Same purpose as z-scores, with advantage of eliminating negative numbers & decimal numbers
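Both linear transformations can be sketched together (the raw normative scores are hypothetical): z puts scores into SD units, and T = 50 + 10z removes the negatives & most decimals.

```python
import statistics

raw = [10, 12, 14, 16, 18]   # hypothetical normative-group scores
mean = statistics.mean(raw)
sd = statistics.stdev(raw)

z_scores = [(x - mean) / sd for x in raw]    # linear, non-normalised
t_scores = [50 + 10 * z for z in z_scores]   # T = 50 + 10z

print([round(z, 2) for z in z_scores])
print([round(t, 1) for t in t_scores])
```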