Measurement Theory and Assessment 1 Flashcards

1
Q

Psychometrician

A

Specialist in psychology or education who develops and evaluates psychological tests

2
Q

Test

A

A standardised measure for sampling behaviour which describes it using categories or scores

3
Q

Characteristics of a test

A
  • standardised procedure
  • for a specific sample of behaviour
  • uses scores or categories
  • uses norms or standards
  • makes a prediction of non-test behaviour
4
Q

Norm-referenced test

A

Performance of the examinee is referenced to standardisation sample

5
Q

Criterion-referenced test

A

Determines where the examinee stands with regard to tightly defined educational objectives

6
Q

Assessment

A

Appraising/estimating the magnitude of one or more attributes in a person

7
Q

Group tests

A

Suitable to the testing of large groups of individuals simultaneously (e.g. pen-and-paper tests)

8
Q

Individual tests

A

Designed to be administered one-on-one

9
Q

Types of psychological tests

A
  • intelligence tests
  • aptitude tests
  • achievement tests
  • creativity tests
  • personality tests
  • interest inventories
  • behavioural procedures
  • neuropsychological tests
10
Q

Responsibilities of test publishers

A
  • publication and marketing issues
  • competence of test purchasers
11
Q

Responsibilities of test users

A
  • best interests of the client
  • confidentiality and the duty to warn
  • expertise of the test user
  • informed consent
  • obsolete tests and the standard of care
  • responsible report writing
  • communication of test results
  • consideration of individual differences
12
Q

Diagnostics

A

Getting to know a situation in order to be able to make a decision

13
Q

Psychodiagnostics

A

Getting to know an individual’s psychosocial functioning
- reliable and valid description of their psychosocial reality
- find possible explanations for problems
- test possible explanations

14
Q

Scientific diagnostics

A
  • ideally repeatable
  • ideally approach reality
15
Q

Uses of tests

A
  • problem analysis
  • classification and diagnosis
  • treatment planning
  • program/treatment evaluation
  • self-knowledge
  • scientific research
16
Q

Committee on tests and testing in The Netherlands (COTAN)

A

Criteria
- principles of test construction
- goal
- group
- function
- standardisation
- quality of test material
- quality of test manual
- norms: representative reference group
- reliability: consistency, repeatability
- validity: does the test assess what it aims to?

17
Q

Tests need to:

A
  • be relevant
  • be performed by qualified individuals
  • have role integrity
  • be confidential
  • have informed consent
  • be independent and objective
18
Q

Classical test theory

A

Test scores are influenced by two factors: consistency factors (the stable attribute being measured) and inconsistency factors (measurement error)

X = T + e
(X = observed score, T = true score, e = measurement error)
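As an illustration (not part of the deck), a short Python sketch with hypothetical numbers can simulate the X = T + e decomposition: because unsystematic error e is independent of the true score T, the variance of observed scores splits into true-score variance plus error variance, and their ratio behaves like a reliability coefficient.

```python
import random

random.seed(42)

# Each observed score X is a fixed true score T plus random,
# unsystematic error e with mean 0 (X = T + e).
true_scores = [random.gauss(100, 15) for _ in range(10_000)]
observed = [t + random.gauss(0, 5) for t in true_scores]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Since e is independent of T, var(X) ≈ var(T) + var(e), so the
# ratio var(T) / var(X) estimates reliability (here ≈ 225/250 = 0.90).
reliability = variance(true_scores) / variance(observed)
print(round(reliability, 2))
```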

19
Q

Sources of measurement error

A
  1. item selection: choosing an instrument/parts of an instrument
  2. test administration: general environment aspects, countenance of an examiner
  3. test scoring: subjectively scored tests are vulnerable to mistakes/bias from the scorer
  4. systematic measurement error: consistent error where something unwanted is measured
20
Q

Correlation coefficient (r)

A

Degree of linear relationship between two sets of scores obtained from the same people

Range: -1.00 to 1.00
Positive correlation (r > 0) or negative correlation (r < 0)
The closer r is to 1 (as an absolute value), the stronger the relationship
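As an aside, the coefficient can be computed from paired scores with a minimal Python sketch (hypothetical data; a real analysis would typically use a statistics library):

```python
def pearson_r(xs, ys):
    """Pearson correlation between two paired lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# A perfectly increasing linear relationship gives r = 1;
# reversing one list gives r = -1 (up to floating-point rounding).
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 4))   # 1.0
print(round(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]), 4))   # -1.0
```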

21
Q

Test-retest reliability

A

Administering an identical test to the same sample group on a second occasion and correlating the two sets of scores

22
Q

Alternate forms reliability

A

Two tests are independently created to measure the same thing; they typically have the same (or similar) means and standard deviations; the correlation between scores on the two forms (from the same sample group) should be strong and positive

23
Q

Split-half reliability

A

Correlate scores from the 1st and 2nd half of a test to each other (instead of administering 2 tests)

24
Q

Spearman-Brown formula

A

Corrects for the underestimation of reliability when using split-half reliability
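The prophecy formula can be sketched in Python; the general form projects reliability for an n-fold change in test length, and n = 2 gives the usual split-half correction:

```python
def spearman_brown(r_half, n=2):
    """Project the reliability of a test lengthened n-fold.

    A split-half correlation reflects a test of only half the length,
    so with n = 2 this corrects the underestimate for the full test.
    """
    return n * r_half / (1 + (n - 1) * r_half)

# A half-test correlation of .70 corresponds to a full-test
# reliability of about .82.
print(round(spearman_brown(0.70), 2))  # 0.82
```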

25
Q

Coefficient alpha (Cronbach's alpha)

A

Mean of all possible split-half coefficients, corrected by the Spearman-Brown formula
Range: 0.00 to 1.00
Index of the internal consistency of the items; the tendency for items to correlate positively with one another

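For illustration, alpha can be computed from an item-by-person score matrix using the variance form of the formula (rather than averaging every split-half coefficient directly); the data below are hypothetical:

```python
def cronbach_alpha(items):
    """items: list of per-item score lists, one entry per examinee."""
    k = len(items)          # number of items
    n = len(items[0])       # number of examinees

    def var(xs):            # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # Total test score per examinee, summed across items.
    totals = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

# Three items answered by four examinees (made-up scores).
scores = [[2, 4, 3, 5], [3, 5, 4, 5], [1, 4, 2, 4]]
print(round(cronbach_alpha(scores), 2))  # 0.97
```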
26
Q

Kuder-Richardson formula

A

Similar to Cronbach's alpha, but used for tests whose items have only two answer options (dichotomous scoring)

27
Q

Interscorer reliability

A

A sample of tests is independently scored by two or more examiners, and the scores from the examiners are correlated (a strong, positive correlation is expected)
Used for subjectively scored tests

28
Q

Systematic errors

A
  • either positive or negative
  • average measurement error is not 0
  • can be due to test construction or an inconsistency in the assessed construct
  • serve as a measure of validity: how well is the test measuring what it is supposed to?
29
Q

Unsystematic errors

A
  • random and unpredictable
  • both positive and negative
  • average measurement error is 0
  • not related to the true score
  • a measure of reliability: they affect the consistency of scores
30
Q

Raw score

A

The most basic information provided by a psychological test, e.g. how many questions were answered correctly

31
Q

Norm group

A

Sample of examinees, representative of the population for whom the test is intended

32
Q

Norm-referenced test

A

Results of an examinee are interpreted using the instrument's corresponding norms

33
Q

Measurements of central tendency

A
  • mean: the average; good for normally distributed data
  • median: the middle score; better than the mean when the distribution is skewed; used for percentiles
  • mode: the most common score; shows the peak of a skewed distribution
34
Q

Percentile

A

Percentage of people who scored below a specific raw score (e.g. a score of 25 at the 94th percentile means 94% of participants scored below 25)

35
Q

Standard score

A

Distance from the mean in standard deviation units, a.k.a. a z-score

36
Q

T-scores

A

Transformation of z-scores to avoid negative and decimal numbers
M = 50, SD = 10
T = 10z + 50

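The two conversions above can be sketched together in Python (hypothetical IQ-style numbers, M = 100, SD = 15):

```python
def z_score(raw, mean, sd):
    """Standard score: distance from the mean in SD units."""
    return (raw - mean) / sd

def t_score(z):
    """Rescale a z-score to M = 50, SD = 10 (T = 10z + 50)."""
    return 10 * z + 50

# A raw score one SD above the mean: z = 1.0, T = 60.0.
z = z_score(115, mean=100, sd=15)
print(z, t_score(z))  # 1.0 60.0
```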
37
Q

Stanine

A

Raw scores converted to a scale running from 1 to 9
M = 5, SD ≈ 2
Scores are ranked lowest to highest and then assigned to stanines by percentage:
1 > bottom 4%
2 > next 7%
3 > next 12%
4 > next 17%
5 > next 20%
6 > next 17%
7 > next 12%
8 > next 7%
9 > top 4%

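The percentage bands above can be turned into a lookup by cumulating them (4, 11, 23, 40, 60, 77, 89, 96); a sketch mapping a percentile rank to a stanine:

```python
import bisect

# Cumulative upper boundaries (in percentile-rank terms) of
# stanines 1-8; anything above 96 falls into stanine 9.
BOUNDARIES = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine(percentile_rank):
    """Map a percentile rank (0-100) to a stanine of 1-9."""
    return bisect.bisect_left(BOUNDARIES, percentile_rank) + 1

print(stanine(50))  # 5 (the middle 20% band)
print(stanine(2))   # 1 (bottom 4%)
print(stanine(98))  # 9 (top 4%)
```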
38
Q

Random sampling

A

Each member of the population (or a subset thereof) has an equal chance of being selected

39
Q

Stratified random sampling

A

Create strata (groups) from the population based on certain demographics, then select the sample randomly from each stratum (can be proportional)

40
Q

Expectancy table

A

Shows the relationship between test scores and the expected outcome on a different, relevant task, e.g. scores on a scholastic aptitude test and subsequent college grade point average

41
Q

Criterion-referenced test

A

Compares the examinee's score to a predefined performance standard
Often used for educational purposes

42
Q

Most commonly used confidence intervals (CI)

A

68% CI > X ± 1.00 * SD
90% CI > X ± 1.65 * SD
95% CI > X ± 1.96 * SD
99% CI > X ± 2.58 * SD
X = score, SD = standard deviation

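A small Python sketch of these intervals, using the conventional two-sided normal multipliers (1.96 for 95%, 2.58 for 99%) and hypothetical IQ-style numbers:

```python
# Standard normal multipliers for common confidence levels.
Z = {68: 1.00, 90: 1.65, 95: 1.96, 99: 2.58}

def confidence_interval(score, sd, level=95):
    """Interval around a score, rounded to 2 decimals for display."""
    half_width = Z[level] * sd
    return (round(score - half_width, 2), round(score + half_width, 2))

# 95% interval around a score of 100 with SD = 15.
print(confidence_interval(100, 15, level=95))  # (70.6, 129.4)
```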
44
Q

Difference score (SEdiff)

A

Used to determine whether the difference between pre- and post-treatment scores is real (or due to the unreliability of the tests)

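As a sketch (assuming both tests are on the same scale): with SEM = SD * sqrt(1 - r) for each test, the standard error of the difference combines the two errors as SEdiff = sqrt(SEM1² + SEM2²) = SD * sqrt(2 - r1 - r2):

```python
def se_diff(sd, r1, r2):
    """Standard error of a difference between two scores.

    sd: shared standard deviation of the two score scales
    r1, r2: reliability coefficients of the two tests
    """
    return sd * (2 - r1 - r2) ** 0.5

# Hypothetical example: SD = 15, reliabilities .90 and .85.
print(round(se_diff(15, 0.90, 0.85), 2))  # 7.5
```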
45
Q

Relative norms

A

Goal: classify on a continuum
Interpretation: control group
Items should maximally discriminate

46
Q

Absolute norms

A

Goal: determine whether a criterion has been reached
Interpretation: previously determined criterion
Items should be relevant to the criterion
Especially used in education

47
Q

Norms

A

Summary of the distribution of characteristics in a representative sample
Need to be up-to-date:
- 15 years > outdated
- 20 years > unusable

48
Q

Summative assessment

A
  • used for selection, qualification, or prognosis
  • assessment of learning, e.g. a course exam
49
Q

Formative assessment

A
  • identifies strengths & weaknesses
  • aimed at instruction (compare with one's own scores or those of peers)
  • assessment for learning, e.g. polls & feedback on a report
50
Q

Flynn effect

A

IQ increases by 3 points every 10 years

51
Q

Validity

A

A test is valid to the extent that inferences made from it are appropriate, meaningful, and useful
Types of validity:
  • content validity
  • criterion-related validity
  • construct validity
52
Q

Content validity

A

The degree to which the content of a test is representative of the sample of behaviour/construct the test is designed to assess
  • affected by proper selection of items and thorough assessment of the construct
  • can be evaluated using an expert panel
53
Q

Face validity

A

Does the test look valid to test users, examiners, and examinees?
More a matter of social acceptability than a technical form of validity

54
Q

Criterion validity

A

Correlation between an examinee's test score and the behaviour/construct you want to predict
  • concurrent validity
  • predictive validity
55
Q

Concurrent validity

A

Assess the behaviour at approximately the same time (usually the same day), using both the predictor and criterion tests

56
Q

Predictive validity

A

Assess the behaviour at separate times (usually with a long period in between) in order to predict future behaviour: the predictor test is administered first, followed later by the criterion test

57
Q

Construct validity

A

The extent to which the test/measure accurately assesses what it is supposed to, measured by correlating the test with another test
  • convergent validity
  • discriminant validity
58
Q

Convergent validity

A

Assess the relationship between the main test's scores and those of a test that assesses the same construct
Ideally > a high, positive correlation

59
Q

Discriminant validity

A

Assess the relationship between the main test's scores and scores on an unrelated test (one that does not assess the same construct)
Ideally > a low or absent correlation

60
Q

Test construction process

A
  1. Defining the test
  2. Selecting a scaling method
  3. Constructing the items & analysis
  4. Revising the test
  5. Publishing the test
If the test is found to be inadequate after step 4, return to step 3

61
Q

Representative scaling methods

A
  • expert rankings
  • Likert scales
  • Guttman scales
  • Thurstone scales
  • absolute scales
  • empirical scales
62
Q

Item-difficulty index

A

Method for testing items
Proportion of examinees who get the item correct in a tryout; identifies items that should be altered or discarded from the test

63
Q

Item-reliability index

A

Method for testing items
Items should display internal consistency and a good correlation with total test scores

64
Q

Item-validity index

A

Method for testing items
Used to identify predictively useful test items: how well does each item contribute to the overall predictive validity?

65
Q

Item-characteristic curves

A

Method for testing items
Graphical display of the relationship between the probability of a correct response and the examinee's position on the underlying trait measured by the test

66
Q

Item-discrimination index

A

Method for testing items
Statistical index of how efficiently an item discriminates between people who obtain high and low scores on the entire test

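The difficulty and discrimination indexes above can be sketched in Python (made-up 0/1 response data; the discrimination variant shown compares pass rates in the top- and bottom-scoring groups on the whole test):

```python
def item_difficulty(responses):
    """responses: list of 0/1 correctness values for one item.

    Returns the proportion of examinees who got the item right.
    """
    return sum(responses) / len(responses)

def item_discrimination(upper, lower):
    """upper/lower: 0/1 responses for the item from the groups with
    high vs low total test scores. Positive values mean the item
    separates strong from weak examinees."""
    return item_difficulty(upper) - item_difficulty(lower)

print(item_difficulty([1, 1, 1, 0, 0]))                 # 0.6
print(item_discrimination([1, 1, 1, 1], [1, 0, 0, 0]))  # 0.75
```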
67
Q

Cross validation

A

Method for revising a test
Using the original regression equation in a new sample to determine whether the test still predicts the criterion well

68
Q

Validity shrinkage

A

Method for revising a test
Often, a test predicts the relevant criterion less accurately in a new sample

69
Q

Feedback from examinees

A

Method for revising a test
Receive feedback from the examinees in the tryout sample on the:
  • behaviour of examiners
  • testing conditions
  • clarity of exam instructions
  • convenience of using the answer sheet
  • perceived suitability of the test
  • perceived cultural fairness of the test
  • perceived sufficiency of time
  • perceived difficulty of the test
  • emotional response to the test
  • level of guessing
  • level/method of cheating by the examinee or others
70
Q

Factor analysis

A

Summarises the interrelationships among a large number of variables in a concise and accurate manner as an aid in conceptualisation

71
Q

CHC Theory broad ability factors

A
  • fluid intelligence/reasoning (Gf)
  • crystallised intelligence/knowledge (Gc)
  • domain-specific knowledge (Gkn)
  • visual-spatial abilities (Gv)
  • auditory processing (Ga)
  • broad retrieval (Gr)
  • cognitive processing speed (Gs)
  • decision/reaction time or speed (Gt)
72
Q

Sternberg's Triarchic Theory of Intelligence

A
  • componential (analytic) intelligence > executive processes
  • experiential (creative) intelligence > dealing with novelty
  • contextual (practical) intelligence > adaptation
73
Q

IQ tests measure...

A
  • problem-solving abilities
  • verbal abilities
  • global capacity vs specific mental functions
  • speed of response and thinking
74
Q

IQ tests do not measure...

A
  • learning competence
  • social competence
75
Q

IQ experts to know...

A
  • Galton: IQ as sensory keenness (speed)
  • Spearman: IQ as a global capacity (g) and specific factors (s)
  • Thurstone: IQ as 7 primary mental abilities
  • Luria: IQ as simultaneous vs successive processing
  • Guilford: IQ as the SOI model; added creativity; model consists of operations, contents, and products
76
Q

Cattell-Horn-Carroll (CHC) Theory (1968)

A
  • hierarchical structure of intelligence
  • stratum 3: overall capacity (g)
  • stratum 2: broad cognitive abilities
  • stratum 1: narrow cognitive abilities
77
Q

Gardner's Multiple Intelligences (1983)

A
  • critique of g > no underlying general factor exists
  • introduced multiple intelligences: people smart, music smart, etc.
  • found evidence in brain studies (localisation)
  • evolutionarily plausible
78
Q

Wechsler IQ: broad cognitive skill indexes

A
  • verbal comprehension index (VCI)
  • visual spatial index (VSI)
  • fluid reasoning index (FRI)
  • working memory index (WMI)
  • processing speed index (PSI)
79
Q

Wechsler IQ test: psychometric properties

A

Full-scale IQ: M = 100, SD = 15, range 55 - 145
Index IQs: M = 100, SD = 15, range 55 - 145
Individual subtests: M = 10, SD = 3, range 1 - 19
FSIQ alpha: 0.96 (SEM = 3)
FSIQ test-retest: 0.95