Measurement Theory and Assessment 1 Flashcards

1
Q

Psychometrician

A

Specialist in psychology or education who develops and evaluates psychological tests

2
Q

Test

A

A standardised procedure for sampling behaviour and describing it with categories or scores

3
Q

Characteristics of a test

A
  • standardised procedure
  • for a specific sample of behaviour
  • uses scores or categories
  • uses norms or standards
  • makes a prediction of non-test behaviour
4
Q

Norm-referenced test

A

Performance of the examinee is referenced to standardisation sample

5
Q

Criterion-referenced test

A

Determines where the examinee stands with respect to tightly defined educational objectives

6
Q

Assessment

A

Appraising/estimating the magnitude of one or more attributes in a person

7
Q

Group tests

A

Suitable to the testing of large groups of individuals simultaneously (e.g. pen-and-paper tests)

8
Q

Individual tests

A

Designed to be administered one-on-one

9
Q

Types of psychological tests

A
  • intelligence tests
  • aptitude tests
  • achievement tests
  • creativity tests
  • personality tests
  • interest inventories
  • behavioural procedures
  • neuropsychological tests
10
Q

Responsibilities of test publishers

A
  • publication and marketing issues
  • competence of test purchasers
11
Q

Responsibilities of test users

A
  • best interests of the client
  • confidentiality and the duty to warn
  • expertise of the test user
  • informed consent
  • obsolete tests and the standard of care
  • responsible report writing
  • communication of test results
  • consideration of individual differences
12
Q

Diagnostics

A

Getting to know a situation in order to be able to make a decision

13
Q

Psychodiagnostics

A

Getting to know an individual’s psychosocial functioning
- reliable and valid description of their psychosocial reality
- find possible explanations for problems
- test possible explanations

14
Q

Scientific diagnostics

A
  • ideally repeatable
  • ideally approach reality
15
Q

Uses of tests

A
  • problem analysis
  • classification and diagnosis
  • treatment planning
  • program/treatment evaluation
  • self-knowledge
  • scientific research
16
Q

Committee on tests and testing in The Netherlands (COTAN)

A

Criteria
- principles of test construction
- goal
- group
- function
- standardisation
- quality of test material
- quality of test manual
- norms: representative reference group
- reliability: consistency, repeatability
- validity: does the test assess what it aims to?

17
Q

Tests need to:

A
  • be relevant
  • be performed by qualified individuals
  • have role integrity
  • be confidential
  • have informed consent
  • be independent and objective
18
Q

Classical test theory

A

Test scores are influenced by two factors: consistency factors (the true score) and inconsistency factors (measurement error)

X = T + e
(observed score = true score + error)
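The X = T + e model can be illustrated with a small simulation (an illustrative sketch, not part of the card; the numbers are arbitrary): over many hypothetical administrations, random error averages out and the mean observed score approaches the true score.

```python
import random

random.seed(42)

# Sketch of X = T + e: T is a fixed true score for one examinee,
# e is random (unsystematic) error with mean ~0.
T = 100
observed = [T + random.gauss(0, 5) for _ in range(10_000)]

# Random error averages out, so the mean observed score approaches T.
mean_obs = sum(observed) / len(observed)
```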

19
Q

Sources of measurement error

A
  1. item selection: choosing an instrument/parts of an instrument
  2. test administration: general environment aspects, countenance of an examiner
  3. test scoring: subjectively scored tests are vulnerable to mistakes/bias from the scorer
  4. systematic measurement error: consistent error where something unwanted is measured
20
Q

Correlation coefficient (r)

A

Degree of linear relationship between two sets of scores obtained from the same people

Range: -1.00 to 1.00
Positive correlation (r > 0.00) or negative correlation (r < 0)
The closer r is to 1 (as an absolute value), the stronger the relationship
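A minimal computation of r (a plain-Python sketch; `pearson_r` is an illustrative helper name, not from the card):

```python
def pearson_r(x, y):
    # Degree of linear relationship between two sets of scores obtained
    # from the same people; ranges from -1.00 to 1.00.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# A perfectly linear relationship yields r = 1.0
r = pearson_r([2, 4, 6, 8, 10], [1, 3, 5, 7, 9])
```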

21
Q

Test-retest reliability

A

Administering an identical test to the same sample group

22
Q

Alternate forms reliability

A

Two tests are independently created to measure the same thing; they typically have the same (or similar) means and standard deviations; scores on the two forms (from the same sample group) should correlate strongly and positively

23
Q

Split-half reliability

A

Correlate scores from the 1st and 2nd half of a test to each other (instead of administering 2 tests)

24
Q

Spearman-Brown formula

A

Corrects for the underestimation of reliability when using split-half reliability
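The correction itself is a one-line formula (a sketch; the function name is my own):

```python
def spearman_brown(r_half):
    # Full-test reliability predicted from a half-test correlation:
    # r_full = 2 * r_half / (1 + r_half)
    return 2 * r_half / (1 + r_half)

# A split-half correlation of 0.70 implies full-length reliability of about 0.82
r_full = spearman_brown(0.70)
```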

25
Q

Coefficient alpha (Cronbach’s alpha)

A

Mean of all possible split-half coefficients, corrected by the Spearman-Brown formula

Range: 0.00 to 1.00
Index of internal consistency of the items; tendency for items to correlate positively
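Coefficient alpha can be computed directly from item scores (an illustrative sketch using the standard variance-ratio form of alpha; population variance is assumed throughout):

```python
def cronbach_alpha(items):
    # items: one list of scores per item, all over the same examinees.
    k = len(items)
    n = len(items[0])

    def var(xs):  # population variance (consistent use is what matters)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(var(it) for it in items) / var(totals))

# Three perfectly consistent items give alpha = 1.0
alpha = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
```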

26
Q

Kuder-Richardson formula

A

Similar to Cronbach's alpha; used for tests whose items have only two answer options (e.g. right/wrong)
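For dichotomous (0/1) items the computation reduces to KR-20 (a sketch; `kr20` is my own helper name):

```python
def kr20(matrix):
    # matrix: rows = examinees, columns = items scored 0 (wrong) or 1 (right).
    n, k = len(matrix), len(matrix[0])
    totals = [sum(row) for row in matrix]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n  # population variance
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in matrix) / n  # item difficulty (proportion correct)
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_t)

# Perfectly consistent right/wrong patterns give a reliability of 1.0
rel = kr20([[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]])
```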

27
Q

Interscorer reliability

A

A sample of tests is independently scored by two or more examiners; scores for the tests from each examiner are correlated (should have a strong, positive correlation). Used for subjective scoring tests

28
Q

Systematic errors

A
  • either positive or negative
  • average measurement error is not 0
  • can be due to test construction/an inconsistency in the assessed construct
  • relate to validity - how well the test measures what it is supposed to measure
29
Q

Unsystematic errors

A
  • are random and unpredictable
  • are both positive and negative
  • average measurement error is 0
  • are not related to the true score
  • relate to reliability - they affect the consistency of scores
30
Q

Raw score

A

Most basic information provided by a psychological test

e.g. how many questions were answered correctly

31
Q

Norm group

A

Sample of examinees, representative of the population for whom the test is intended

32
Q

Norm-referenced test

A

Results of an examinee are interpreted using the instrument’s corresponding norms

33
Q

Measurements of central tendency

A

Mean: average, good for normally distributed data

Median: middle number/score, better than mean when distribution of data is skewed, used for percentiles

Mode: most common score, shows the peak on a skewed distribution

34
Q

Percentile

A

Percentage of people who scored below a specific raw score (e.g. score of 25 > 94th percentile, 94% of participants scored below 25)
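A percentile rank is simple to compute (an illustrative sketch; "strictly below" is the convention used on the card, assumed here):

```python
def percentile_rank(scores, raw):
    # Percentage of people who scored below the given raw score.
    below = sum(1 for s in scores if s < raw)
    return 100 * below / len(scores)

# With scores 1..100, a raw score of 95 has 94 scores below it
pr = percentile_rank(list(range(1, 101)), 95)
```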

35
Q

Standard score

A

Distance from the mean in standard deviation units, aka a z-score

36
Q

T-scores

A

Transformation of z-scores to avoid negative and decimal numbers
M = 50, SD = 10

T = 10z + 50
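The T = 10z + 50 transformation as code (a sketch; function names are my own):

```python
def z_to_t(z):
    # T = 10z + 50 rescales z-scores to M = 50, SD = 10.
    return 10 * z + 50

def raw_to_t(raw, mean, sd):
    z = (raw - mean) / sd
    return z_to_t(z)

# A score 1 SD above the mean: z = 1, so T = 60
t = raw_to_t(115, 100, 15)
```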

37
Q

Stanine

A

Raw scores converted to a system using 1 to 9
M = 5, SD ≈ 2
Scores are ranked lowest to highest then assigned to stanines by percentage:
1 > bottom 4%
2 > next 7%
3 > next 12%
4 > next 17%
5 > next 20%
6 > next 17%
7 > next 12%
8 > next 7%
9 > next 4%
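The percentages above imply cumulative percentile cut-offs, which can be applied directly (a sketch; the handling of exact boundary values is a convention assumed here):

```python
# Cumulative upper percentile bounds for stanines 1-8, implied by the
# 4/7/12/17/20/17/12/7/4 percentages.
CUTS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine(percentile):
    for s, cut in enumerate(CUTS, start=1):
        if percentile <= cut:
            return s
    return 9  # top 4%
```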

38
Q

Random sampling

A

Each member of the population (or subset thereof) has an equal chance of getting selected

39
Q

Stratified random sampling

A

Create strata (groups) from the population based on certain demographics then selecting the sample randomly (can be proportional)

40
Q

Expectancy table

A

Shows the relationship between test scores and the expected outcomes on a different, relevant task

e.g. scores on a scholastic aptitude test and subsequent college grade point average

41
Q

Criterion-referenced test

A

Compare the examinee’s score to a predefined performance standard

Used often for education purposes

43
Q

Most commonly used confidence intervals (CI)

A

68% CI > X ± 1SD
90% CI > X ± 1.65SD
95% CI > X ± 1.96SD
99% CI > X ± 2.58SD

X = score
SD = standard deviation
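These bands can be computed directly (a sketch using the conventional z-values; in test interpretation the SD here would usually be the standard error of measurement):

```python
Z = {68: 1.00, 90: 1.65, 95: 1.96, 99: 2.58}

def confidence_interval(score, sd, level):
    # Band X ± z*SD around an observed score.
    z = Z[level]
    return (score - z * sd, score + z * sd)

# e.g. an IQ score of 100 with an SEM of 3, at the 95% level
low, high = confidence_interval(100, 3, 95)
```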

44
Q

Difference score (SEdiff)

A

Used to determine whether the difference between pre- and post-treatment scores is meaningful or merely due to the unreliability (measurement error) of the test

45
Q

Relative norms

A

Goal: classify on a continuum
Interpretation: control group
Items should maximally discriminate

46
Q

Absolute norms

A

Goal: determine whether a criterion has been reached
Interpretation: previously determined criterion
Items should be relevant to the criterion
Especially used in education

47
Q

Norms

A

Summary of distribution of characteristics in a representative sample

Need to be up-to-date:
- 15 years > outdated
- 20 years > unusable

48
Q

Summative assessment

A
  • used for selection, qualification, or prognosis
  • assessment of learning

e.g. course exam

49
Q

Formative assessment

A
  • strengths & weaknesses
  • aimed at instruction (compare with own scores or those of peers)
  • assessment for learning

e.g. polls & feedback on a report

50
Q

Flynn effect

A

IQ increases by 3 points every 10 years

51
Q

Validity

A

A test is valid to the extent that inferences made from it are appropriate, meaningful, and useful

Types of validity:
- content validity
- criterion-related validity
- construct validity

52
Q

Content validity

A

The degree to which the content of a test is representative of the sample of behaviour/construct the test is designed to assess
- affected by a proper selection of items and thorough assessment of the construct
- can be evaluated using an expert panel

53
Q

Face validity

A

Does the test look valid to test users, examiners, and examinees?
- more a matter of social acceptability than a technical form of validity

54
Q

Criterion validity

A

Correlation between an examinee’s test score and the behaviour/construct you want to predict
- concurrent validity
- predictive validity

55
Q

Concurrent validity

A

Assess the behaviour at approximately the same time (usually the same day), using both the predictor and criterion tests

56
Q

Predictive validity

A

Assess the behaviour at a separate time (usually with a long period in between) in order to predict future behaviour: the predictor test is given first, the criterion test later

57
Q

Construct validity

A

The extent to which the test/measure accurately assesses what it is supposed to, measured by correlating the test to another test
- convergent validity
- discriminant validity

58
Q

Convergent validity

A

Assess the relationship between the main test scores to those of a test which assesses the same construct
- ideally > a strong, positive correlation

59
Q

Discriminant validity

A

Assess the relationship between the main test scores and test scores on another unrelated test (one which does not assess the same construct)
- ideally > little or no correlation

60
Q

Test construction process

A
  1. Defining the test
  2. Selecting a scaling method
  3. Constructing the items & analysis
  4. Revising the test
  5. Publishing the test

If test is found to be inadequate after step 4, return to step 3

61
Q

Representative scaling methods

A
  • Expert rankings
  • Likert scales
  • Guttman scales
  • Thurstone scales
  • Absolute scales
  • Empirical scales
62
Q

Item-difficulty index

A

Method for testing items

Proportion of examinees who get the item correct in a tryout; identifies the items which should be altered or discarded from the test
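As code, the index is just a proportion (a sketch; the helper name is my own):

```python
def item_difficulty(responses):
    # Proportion of examinees who answered the item correctly (0/1 scoring).
    return sum(responses) / len(responses)

# 6 of 8 examinees correct
p = item_difficulty([1, 1, 1, 1, 1, 1, 0, 0])
```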

63
Q

Item-reliability index

A

Method for testing items

Items should display internal consistency and good correlation to total test scores

64
Q

Item-validity index

A

Method for testing items

Used to identify predictively useful test items; how well does each item contribute to the overall predictive validity

65
Q

Item-characteristic curves

A

Method for testing items

Graphical display of the relationship between the probability of a correct response and the examinee’s position on the underlying trait being measured by the test

66
Q

Item-discrimination index

A

Method for testing items

Statistical index of how efficiently an item discriminates between people who obtain high and low scores on the entire test
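One common way to compute this is the upper-lower groups method (a sketch; the group fraction, often 25-27%, is a convention assumed here):

```python
def discrimination_index(item_correct, total_scores, frac=0.25):
    # Upper-lower index: p(correct in top group) - p(correct in bottom group).
    n = len(total_scores)
    k = max(1, round(frac * n))
    order = sorted(range(n), key=lambda i: total_scores[i])
    low, high = order[:k], order[-k:]
    p_low = sum(item_correct[i] for i in low) / k
    p_high = sum(item_correct[i] for i in high) / k
    return p_high - p_low

# An item answered correctly only by high scorers discriminates perfectly
d = discrimination_index([1, 1, 1, 1, 0, 0, 0, 0], [10, 9, 8, 7, 4, 3, 2, 1])
```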

67
Q

Cross validation

A

Method for revising a test

Using the original regression equation in a new sample to determine whether the test still predicts the criterion well

68
Q

Validity shrinkage

A

Method for revising a test

Often, a test predicts the relevant criterion less accurately with a new sample

69
Q

Feedback from examinees

A

Method for revising a test

Receive feedback from the examinees in the try-out sample on the:
- behaviour of examiners
- testing conditions
- clarity of exam instructions
- convenience in using the answer sheet
- perceived suitability of the test
- perceived cultural fairness of the test
- perceived sufficiency of time
- perceived difficulty of the test
- emotional response to the test
- level of guessing
- level/method of cheating by the examinee or others

70
Q

Factor analysis

A

Summarises the interrelationships among a large number of variables in a concise and accurate manner as an aid in conceptualisation

71
Q

CHC Theory broad ability factors

A
  • fluid intelligence/reasoning (Gf)
  • crystallised intelligence/knowledge (Gc)
  • domain-specific knowledge (Gkn)
  • visual-spatial abilities (Gv)
  • auditory processing (Ga)
  • broad retrieval (Gr)
  • cognitive processing speed (Gs)
  • decision/reaction time or speed (Gt)
72
Q

Sternberg’s Triarchic Theory of Intelligence

A
  • componential (analytic) intelligence > executive processes
  • experiential (creative) intelligence > dealing with novelty
  • contextual (practical) intelligence > adaptation
73
Q

IQ tests measure…

A
  • problem-solving abilities
  • verbal abilities
  • global capacity vs specific mental functions
  • speed of response and thinking
74
Q

IQ tests do not measure…

A
  • learning competence
  • social competence
75
Q

IQ experts to know…

A
  • Galton: IQ as sensory keenness (speed)
  • Spearman: IQ as a global capacity (g) and specific factors (s)
  • Thurstone: IQ as 7 primary mental abilities
  • Luria: IQ as simultaneous vs successive processing
  • Guilford: IQ as the SOI model; added creativity; model consists of: operations, contents, and products
76
Q

Cattell-Horn-Carroll (CHC) Theory (1968)

A
  • hierarchical structure of intelligence
  • stratum 3: overall capacity (g)
  • stratum 2: broad cognitive abilities
  • stratum 1: narrow cognitive abilities
77
Q

Gardner Multiple Intelligences (1983)

A
  • critique on g > no underlying general factor exists
  • introduced multiple intelligences: people smart, music smart, etc.
  • found evidence in brain studies (localisation)
  • evolutionary plausible
78
Q

Wechsler IQ: broad cognitive skill indexes

A
  • verbal comprehension index (VCI)
  • visual spatial index (VSI)
  • fluid reasoning index (FRI)
  • working memory index (WMI)
  • processing speed index (PSI)
79
Q

Wechsler IQ test: psychometric properties

A

Full-scale IQ: M = 100, SD = 15, 55 - 145
Indexes IQ: M = 100, SD = 15, 55 - 145

Individual subtests: M = 10, SD = 3, 1 - 19

FSIQ alpha: 0.96 (SEM = 3)
FSIQ test-retest: 0.95
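The quoted SEM of 3 follows from the SD and reliability figures via the standard SEM formula (a quick check, not part of the card):

```python
import math

# SEM = SD * sqrt(1 - reliability); with SD = 15 and alpha = 0.96
# this reproduces the SEM of 3 quoted for the FSIQ.
sem = 15 * math.sqrt(1 - 0.96)
```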