Assessment 1 Flashcards

1
Q

Measurement theory

A

A branch of applied mathematics that is useful in measurement & data analysis

The fundamental idea of MT is that measurements are not the same as the attribute being measured
Hence, if you want to draw conclusions about the attribute, you must take into account the nature of the correspondence between the attribute & the measurements

2
Q

Correlation

A

Magnitude & direction of the relationship between an IV (predictor) & DV (criterion)

Linear regression= y = ax + b

Shows a relationship, not a comparison between variables

Not a statistical technique for hypothesis testing, but can be used after hypothesis testing to calculate effect size
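
Worked reference (standard formula, added for illustration; not from the original card):
Pearson's r = covariance(X, Y) / (SD of X × SD of Y), ranging from −1 to +1; e.g. cov = 9, SD of X = 3, SD of Y = 5 → r = 9 / 15 = 0.6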

3
Q

Curvilinear relationship

A

As one variable increases, so does the other, but only up to a certain point, after which, as one variable continues to increase, the other decreases
e.g. the Yerkes-Dodson law

4
Q

Spurious correlation

A

Coincidental correlation
Can be misleading
Correlations need to be theoretically driven

5
Q

Measurement theory: accuracy & precision

A

Accuracy= closeness of a measured value to a standard or known value e.g. a sample with a mean IQ score of 120 is not accurate relative to the population mean of 100

Precision= closeness of 2 or more measurements to each other e.g. a person's IQ scores across 3 different sessions

6
Q

Latent variables

A

Latent variables= constructs that are unobserved, hidden or latent variables, inferred from the data collected on related observable variables

SEM= multivariate stats analysis technique used to analyse structural relationships among observed & unobserved (latent) variables
Implies a structure for the covariances between the observed variables

7
Q

Latent variable models

A

The relationship between the observable & the unobservable quantities is described by a mathematical function

8
Q

Classical test theory

A

Behavioural perspective

Measures the overall score on a test
Manifest behaviour is treated as the sole representation of a construct, with no consideration of latent traits

Assumes the existence of measurement error
Therefore aims to elaborate strategies (statistics) to control or evaluate the magnitude of error
Unit of analysis is the whole test (item sum or mean)

9
Q

CTT evaluation

A

1) standard error of measurement applies to all scores in a particular pop (see formula below)
2) longer tests are more reliable than shorter ones
3) test scores obtain meaning by comparing their position in a norm group
4) unbiased assessment of item properties depends on having representative samples
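
Worked reference (standard CTT result, added for illustration):
SEM = SD × sqrt(1 − reliability); e.g. SD = 15, reliability = 0.91 → SEM = 15 × sqrt(0.09) = 4.5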

10
Q

Item response theory

A

Cognitive perspective

The answer a subject gives to an item depends on his or her level on the latent trait, the magnitude of his or her theta

Proposes the validation of items & not of tests
This favours the composition of large groups of independent items that can be used to create or customise different tests for different purposes

11
Q

Unidimensional IRT

A

Premise that the interactions of a person with test items can be adequately represented by a mathematical expression containing a single parameter describing the characteristics of the person
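
Illustrative case (the one-parameter/Rasch model, one of several IRT models; added for reference):
P(correct | theta) = 1 / (1 + e^−(theta − b)), where theta is the single person parameter & b is the item difficulty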

12
Q

Assumptions 1 & 2 of unidimensional IRT

A

1) unidimensionality= a single latent trait variable is sufficient to explain the common variance among item responses
2) local independence= the response of any person to any test item is assumed to depend solely on the person's single parameter & the item's vector of parameters; LI is evidence for unidimensionality if the IRT model contains person parameters on only 1 dimension

13
Q

Implications of local independence

A

Probability of a collection of responses can be determined by multiplying the probabilities of each of the individual responses
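
Worked example (illustrative numbers): if, for a given theta, P(item 1 correct) = 0.8 & P(item 2 correct) = 0.6, then under local independence P(both correct) = 0.8 × 0.6 = 0.48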

14
Q

Assumptions 3 & 4 of unidimensional IRT

A

3) the characteristics of a test item remain constant over all of the situations where it is used
4) monotonicity= probability of correct response to the test item increases or does not decrease as the locations of examinees increase on the coordinate dimension

15
Q

IRT evaluation

A

1) standard error of measurement differs across scores but generalises across pops
2) shorter tests can be more reliable than longer ones
3) test scores obtain meaning by comparing their distance from items
4) unbiased estimates of item properties may be obtained from unrepresentative samples

16
Q

Test blueprint (specifications)

A

Tells you exactly what skills will be tested & how many points each question is worth

May include important details

Ensures that test assesses level or depth of learning you want to measure

Development of test specs more common for skills tests

Test specs indicate what dimensions & descriptors can be evaluated in the test & in what proportions

The operationalisation of subjective concepts enables their measurement from a set of descriptors, which will serve as representations of the phenomenon under investigation

17
Q

Specific criteria for development of items

A

1) Behavioural= items must express a behaviour, not an abstraction
2) Objectivity= items should allow for a right or wrong response
3) Simplicity= items should express a single idea in order to avoid ambiguities
4) Clarity= items should be intelligible even to the lowest level of target pop, short sentences with simple & unambiguous expressions
5) Relevance= items must be consistent with the psych trait & other items covering the same construct
6) Precision= items must have a position defined in the attribute continuum & be distinct from other items that cover the same continuum
7) Neutrality= do not use extreme expressions e.g. excellent, miserable; the magnitude of the person's reaction is given in the response scale

18
Q

Specific criteria regarding test development

A

1) amplitude= range of simple descriptors to more complex descriptors
2) balance of amplitude

19
Q

Validity

A

The extent to which a test accurately measures what it is intended to measure

20
Q

Validity: CTT

A

1st period (1900-1950)= content-related validity (content & face validity)

2nd period (1950-1970)= criterion-related validity (concurrent & predictive validity)

3rd period (1970-current)= construct-related validity (convergent & discriminant validity)

21
Q

Content related validity

A

Systematic review of the test content to determine if items cover a representative sample of the universe of behaviours to be measured & to determine if the choice of items is appropriate & relevant

Global, mostly non-statistical procedure

22
Q

Content-related validity: panel of experts (raters)

A

A consultation with experts in the area of the evaluated construct; they analyse the representativeness of items in relation to theory on the construct & assist in the adequacy of items to the target pop

Average of 5 judges

Agreement rate of 80% expected between judges

Asked to make suggestions of improvement

23
Q

Content related validity: pilot study

A

Procedure seeks to verify whether items have been well understood by target audience

Content validity can be demonstrated statistically by item analysis

24
Q

Content related validity: face validity

A

Not a statistical or numerical technique
Regards whether a test is an apparent measure of its associated criterion
Involves language & layout- the way content is presented
The test should look good to test takers

25
Q

Content validity procedures

A

1) setting test goals
2) selection of the universe of behaviours that appear to measure the construct (pool of items)
3) item development
4) item analysis
5) final choice of test items

26
Q

Criterion-related validity

A

Extent to which a measure is related to either a present or future outcome

Such evidence is provided by high correlations between a test & well-defined criterion measure (ideal is above 0.5)

Criterion will depend on type of construct evaluated e.g. academic performance or group dynamics

Criterion is a standard against which a test will be compared

27
Q

Criterion-related validity: concurrent validity

A

Derived from assessments of the simultaneous relationship between test & criterion, such as between a learning disability test & school performance

Involves determining the current status of a person in relation to some classification scheme, such as diagnostic categories

28
Q

Criterion-related validity: predictive validity

A

The extent to which a score on a scale or test predicts scores on some criterion

29
Q

Construct related validity; construct validity

A

The degree to which a test measures what it claims or purports to be measuring

Whether a scale or test measures the construct adequately

From the construct validity, the degree to which a person has a certain characteristic is inferred

Failures in the validation process may stem from the instrument, its administration or the theory

30
Q

Construct validity

A

Content, criterion & construct

31
Q

Construct-related validity: convergent validity

A

Degree to which 2 instruments measuring the same construct are theoretically & empirically related

Ideal correlation is above 0.5

32
Q

Construct-related validity: discriminant validity

A

If 2 measures of the same quality show high correlations, then 2 measures that do not assess the same quality should not

2 tests of unrelated constructs should have low correlations- they should discriminate between 2 qualities that are not related to each other

May also be related to the ability of an instrument to discriminate groups of subjects that have a high magnitude in an attribute against subjects who have low magnitude

Ideal correlation in discriminant validity is below 0.3

Also called divergent validity

33
Q

IRT validity

A

Can be used to investigate any type of test, whether measuring abilities or attitudes
Content validity & proper representation of the latent trait under measurement is one of the primary concerns of IRT, hence the development of the specification matrix
Items are developed via IRT in a systematic way (based on descriptors) rather than in an intuitive way, as in CTT
The remaining steps for test development & therefore content validity are similar to those deployed by CTT

34
Q

Content validity procedures

A

1) setting goals
2) identification of dimensions & descriptors
3) specification matrix
4) item development
5) item analysis
6) final choice of test items

35
Q

Criterion related validity

A

Not common, but the adoption of specific stats methods for the study of criterion validity (e.g. non-linear correlations) can offer more precision for correlation tests

Criterion could be behaviour or another instrument

36
Q

Factor analysis

A

A procedure for reducing scores on many variables to scores on a smaller number of ‘factors’

Purpose is to explore underlying variance structure of set of correlation coefficients
Thus, factor analysis is useful for exploring & verifying patterns in set of correlation coefficients
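
A minimal sketch of exploratory factor analysis in Python, assuming scikit-learn is available; the data & variable names are illustrative only, not from the course:

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))        # 200 respondents x 6 items (simulated, illustrative)
fa = FactorAnalysis(n_components=2)  # reduce 6 observed variables to 2 factors
fa.fit(X)
print(fa.components_)                # factor loadings (2 factors x 6 items)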

37
Q

Full info factor analysis

A

Most modern technique for study of construct unidimensionality

Does not require computation of intercorrelations between items since they are based on individuals' patterns of response rather than on the correlational structure of the multivariate latent response distribution
These models are therefore 'full info' models

To attest to construct unidimensionality, the FA oblique rotation technique is used with the extraction of 2 factors in order to verify if they're correlated
Presence of high correlation between the first 2 factors indicates that, even if there is a second-order factor, only a single latent trait is being assessed, so the assumption of unidimensionality is met
Acceptable correlation is at least 0.5

38
Q

Test info curve

A

Shows for which range of theta levels the test is particularly valid

Items above the validity range are too difficult & below the range, too easy

39
Q

Factorial components of variance

A

The variance of the observed variables in relation to their factors can be divided into several parts

A= specific variance
B= common variance
C= error variance

40
Q

Common variance

A

Covariance between items & factors

The percentage of communality (common variance) defines the quality of the behavioural representation of the latent trait from the observable variables (test items)

Communality= the sum of squared factor loadings in all factors- the proportion of each variable's variance that can be explained by the factors
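
Worked example (illustrative loadings): an item loading 0.7 on factor 1 & 0.4 on factor 2 has communality = 0.7² + 0.4² = 0.49 + 0.16 = 0.65, so its uniqueness = 1 − 0.65 = 0.35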

41
Q

True variance

A

Reliability-> true variance
True variance= communality + specificity

Uniqueness= complement of communality, corresponds to difference between total variance & value of communality

42
Q

Factor analysis advantages

A

Most used technique for validation of psych instruments

Reduces large number of variables to manageable item pool

43
Q

Factor analysis disadvantages

A

Usefulness depends on ability of the researchers to develop a set of conditions that make the technique viable

Subjectivity of factors

Traditional FA assumes that the relations between the variables are typically linear

44
Q

Level of measurement

A

In conventional factor analysis, all variables must be quantitative

Important to test normality of variables, seeking to control large deviations from normality

45
Q

Reliability for CTT

A

For CTT, reliability was the most investigated psychometric parameter & several statistical techniques were developed to estimate it

Corresponds to consistency of scores obtained by same individuals when they are re-examined with same test at different times, different sets of equivalent items or under other variable conditions of examination

Also concerns accuracy of measure or internal consistency (item-total correlation) between items of test

46
Q

Error variance

A

Any condition that is irrelevant to objectives of a test represents error variance

Factors reducing error variance= standardisation of test administration, environmental control, instructions, time limits, normative sample

47
Q

Test-retest reliability

A

Reliability coefficient is correlation between scores obtained by same individual in 2 different administrations of same test

Coefficient of stability or constancy

Ideal correlation is >0.6

Error variance source= interval between test administrations

48
Q

Split half reliability

A

Reliability coefficient is correlation between scores obtained by same individuals in same test divided into 2 equivalent halves

Coefficient of internal consistency

Ideal correlation is >0.6

Error variance source= content sampling
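
A minimal sketch of an odd-even split in Python (simulated data & variable names are assumptions, not from the course):

import numpy as np

rng = np.random.default_rng(1)
responses = rng.integers(0, 2, size=(100, 20))   # 100 people x 20 dichotomous items (simulated)
odd_half = responses[:, 0::2].sum(axis=1)        # total score on odd-numbered items
even_half = responses[:, 1::2].sum(axis=1)       # total score on even-numbered items
r_half = np.corrcoef(odd_half, even_half)[0, 1]  # correlation between the two halves
print(r_half)                                    # near 0 here because the data are random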

49
Q

Inter-rater reliability

A

Reliability coefficient is correlation between scores assigned by each rater

Error variance source= different raters perceptions

50
Q

Parallel (equivalent) forms

A

Reliability coefficient is correlation between scores obtained by same respondents in the application of equivalent forms of the test

Coefficient of equivalence

Ideal correlation is >0.7
True variance greater than error variance

Error variance source= content sampling

51
Q

Kuder-Richardson

A

Reliability coefficient is correlation between scores obtained in each item of test

Kuder-Richardson coefficient

Kuder-Richardson 20 formula used when items are dichotomous

Ideal coefficient is between 0.7-0.9

Error variance source= content sampling & content heterogeneity
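
Standard KR-20 formula (general form, added for reference):
KR-20 = (k / (k − 1)) × (1 − Σ(p × q) / variance of total scores), where k = number of items, p = proportion passing each item & q = 1 − p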

52
Q

Cronbach’s alpha

A

Reliability coefficient is correlation between scores obtained in each item of the test
Similar to Kuder-Richardson technique

Cronbach’s alpha formula is modification of Kuder-Richardson equation & reflects magnitude of covariance among items
Varies from 0-1 (1=100% internal consistency)

Ideal correlation is between 0.7-0.9

Error variance source= content sampling & content heterogeneity
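
Standard formula (general form, added for reference):
alpha = (k / (k − 1)) × (1 − Σ(item variances) / variance of total scores), where k = number of items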

53
Q

Factors affecting reliability

A

Greater sample variability, greater reliability = greater test accuracy

Greater number of items, greater reliability = greater test accuracy (see formula below)
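
Illustrative formula for the length effect (the Spearman-Brown prophecy, a standard CTT result not named on the card):
new reliability = (n × r) / (1 + (n − 1) × r); e.g. doubling a test with r = 0.6 gives (2 × 0.6) / (1 + 0.6) = 0.75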

54
Q

Reliability for IRT

A

Reliability not as important as for CTT, though is part of validation process

Reliability can be obtained by estimating the item information curve, which corresponds to graphical analysis that shows the level of theta at which the item brings maximum information

If the amount of info is small, ability cannot be estimated with precision & estimates will be widely scattered about the true ability

Item info function depends on item parameters a, b & c
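
Illustrative case (2-parameter logistic model, one common choice; added for reference):
item information I(theta) = a² × P(theta) × (1 − P(theta)), which peaks where P(theta) = 0.5, i.e. near theta = b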

55
Q

Standardisation

A

The meaning of test scores derives from the frames of reference we use to interpret them & from the context in which the scores are obtained

Standardisation refers to the need for uniformity in all procedures while using a valid & accurate test

56
Q

Norm

A

Created with scores from respondents who participated in normative study

Expected to be drawn from representative sample (normative group)

57
Q

Norm-referenced tests

A

Norms usually presented in form of tables with descriptive stats that summarise performance of the group or groups in question

When norms are collected from test performance of groups of people, these reference groups are labelled normative or standardised samples

Norms typically created from calculation of percentiles & standard scores

Mean & SD are main stats used to create norms

58
Q

Norm-referenced tests continued

A

Norms most widely used frame of reference for interpreting test scores

Used for test-taker comparison

Score used to place the test taker's performance within a pre-existing distribution of scores or data obtained from performance of suitable comparison group

These scores can be obtained by linear/non-linear transformations of obtained data

59
Q

Z-scores

A

Non-normalised standard score (linear)

Transforms original scores of group measures into units of SD

Normally distributed scores of tests with different means, standard deviations & score ranges can be meaningfully compared once they have been linearly transformed into common scale as long as same reference group used
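
Standard formula (added for reference): z = (X − mean) / SD; e.g. an IQ score of 115 on a scale with mean 100 & SD 15 gives z = (115 − 100) / 15 = 1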

60
Q

T-scores

A

Non-normalised standard score (linear)

Same purpose as z-scores, with advantage of eliminating negative numbers & decimal numbers
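
Standard formula (added for reference): T = 10z + 50, giving mean 50 & SD 10; e.g. z = 1 → T = 60, z = −0.5 → T = 45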