PA 2 Flashcards

1
Q

What is CTT?

A

Classical test theory

2
Q

What model is CTT based on?

A

The True Score Model

3
Q

What are the basic principles of the True Score Model, for an individual and for a population?

A

Individual: Observed Score (X) = True Score (T) + Error (E)
Population: Total variance = true variance + error variance
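
An illustrative set of numbers (invented, not from the deck): a true score of T = 100 with an error of E = +5 on one occasion gives an observed score of X = 100 + 5 = 105; at the population level, a true variance of 90 plus an error variance of 10 gives a total variance of 100.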

4
Q

Define “error” in CTT

A

Error is the component of the observed score that is unrelated to the test taker's true ability or the trait being measured

5
Q

Define “reliability” in simple terms

A

Consistency in measurement

6
Q

Define “reliability” in CTT

A

Reliability is the proportion of the total variance attributed to true variance

7
Q

What is the basic formula for reliability in CTT?

A

Reliability = true variance / total variance, where total variance = true variance + error variance (see the sketch below)
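
A minimal Python sketch of this ratio (not part of the original card; the variance values are invented for illustration):

```python
# CTT reliability: the proportion of total variance that is true variance.
# The variance values below are invented for illustration only.

true_variance = 90.0   # variance of true scores (sigma^2_T)
error_variance = 10.0  # variance of measurement error (sigma^2_E)

total_variance = true_variance + error_variance  # sigma^2_X
reliability = true_variance / total_variance

print(reliability)  # 0.9
```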

8
Q

Define systematic v random error

A

Systematic Error: a source of error that is constant, proportionate, or predictable.
Random Error: a source of error that is unpredictable, inconsistent, and unrelated to what is being measured, i.e., noise.

9
Q

List and define 5 types of possible measurement error

A

• Test Construction: variation due to differences in items on the same test or between tests (i.e., item/content sampling).
• Test Administration: variation due to the testing environment, including testtaker variables (e.g., stress, discomfort, lack of sleep) and examiner variables (e.g., demeanor).
• Test Scoring and Interpretation: variation due to scorers and scoring systems (e.g., subjectivity in scoring).
• Sampling Error: how representative the sample is of the population.
• Methodological Errors: poor training, unstandardized administration, unclear or biased questions.

10
Q

What is IRT?

A

Item Response Theory

11
Q

What is the core difference between CTT and IRT?

A

CTT assumes all items on a test have an equal ability to measure the underlying construct of interest.
IRT provides a way to model the probability that a person with a particular ability level will correctly answer a question that is “tuned” to that ability level.

12
Q

Define “difficulty” and “discrimination” in IRT

A

• Difficulty: the extent to which an item is not easily accomplished, solved, or comprehended.
• Discrimination: the degree to which an item differentiates among people with higher or lower levels of the trait, ability, or construct being measured.

13
Q

List 4 main types of reliability

A

Test‐retest reliability
Parallel and Alternate forms reliability
Internal consistency reliability
Interrater/interscorer reliability

14
Q

What is test-retest reliability and how is it obtained?

A

An estimate of reliability over time.

Obtained by correlating pairs of scores from the same people doing the same test at different times.
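
A minimal sketch of that correlation step (the scores are invented; np.corrcoef returns the Pearson correlation matrix):

```python
# Test-retest reliability as the Pearson correlation between two
# administrations of the same test to the same people (invented scores).
import numpy as np

time1 = np.array([12, 15, 9, 20, 17, 11])   # first administration
time2 = np.array([13, 14, 10, 19, 18, 12])  # second administration, later

r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(round(r_test_retest, 3))
```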

15
Q

Name situations where the test-retest method is not recommended

A
Unstable variables (e.g., mood vs. personality)
Long intervals between tests (reliability estimates tend to decrease)
16
Q

Define and distinguish between Parallel and Alternate Forms Reliability methods

A
  • Parallel forms: Two versions of a test in which the means and variances of the test scores are equal.
  • Alternate forms: two similar forms of a test, but they do not meet the strict requirement of parallel forms.
  • In both cases, reliability is obtained by correlating the scores of the same people on the different forms.
17
Q

Define Split Half Reliability

A

Split‐half reliability: a measure of internal consistency, obtained by correlating the scores on two equivalent halves of a single test administered once.

18
Q

Describe the 3 basic steps of Split Half Reliability method

A
  • Step 1. Divide the test into two halves.
  • Step 2. Correlate scores on the two halves of the test.
  • Step 3. Generalise the half‐test reliability to the full‐test reliability using the Spearman‐Brown formula.
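
A minimal sketch of the three steps (the item-response matrix and the odd/even split rule are invented for illustration):

```python
# Split-half reliability: split, correlate, then step up with Spearman-Brown.
import numpy as np

scores = np.array([  # rows = people, columns = items (1 = correct)
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 1],
])

# Step 1: divide the test into two halves (here, odd- vs. even-numbered items).
half_a = scores[:, 0::2].sum(axis=1)
half_b = scores[:, 1::2].sum(axis=1)

# Step 2: correlate scores on the two halves.
r_half = np.corrcoef(half_a, half_b)[0, 1]

# Step 3: generalise to full-test reliability with the Spearman-Brown formula.
r_full = (2 * r_half) / (1 + r_half)
print(round(r_half, 3), round(r_full, 3))
```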
19
Q

Define the Spearman‐Brown (S‐B) formula

A

The S‐B formula allows one to estimate full-test internal consistency reliability from the correlation between two halves of one test, and to predict how reliability changes as the number of items increases or decreases (see the formula below)
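
For reference, the standard form of the formula (not stated in the card) is r_SB = k·r / (1 + (k − 1)·r), where r is the observed reliability and k is the factor by which the number of items changes. Stepping a half-test up to a full test uses k = 2: e.g., r_half = 0.80 gives r_full = (2 × 0.80) / (1 + 0.80) ≈ 0.89.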

20
Q

List and define 4 methods of estimating Internal Consistency

A
  • Spearman‐Brown (S‐B) formula: steps the correlation between two halves of the one test up to an estimate of full-test reliability.
  • Inter‐item consistency/correlation: the degree of relatedness of items on a test; used to gauge the homogeneity of a test.
  • Kuder‐Richardson formula 20: best choice for determining the inter‐item consistency of DICHOTOMOUS items.
  • Coefficient (Cronbach’s) alpha: the mean of all possible split‐half correlations, corrected by the Spearman‐Brown formula. The most popular approach for internal consistency; values range from 0 to 1 (see the sketch below).
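
A minimal sketch of coefficient alpha (the item scores are invented; with dichotomous 0/1 items the same computation corresponds to KR-20):

```python
# Coefficient alpha: (k / (k - 1)) * (1 - sum(item variances) / total variance).
import numpy as np

items = np.array([  # rows = people, columns = items (invented Likert ratings)
    [4, 5, 4, 3],
    [2, 3, 3, 2],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [4, 4, 5, 4],
], dtype=float)

k = items.shape[1]                                # number of items
item_variances = items.var(axis=0, ddof=1).sum()  # sum of per-item variances
total_variance = items.sum(axis=1).var(ddof=1)    # variance of total scores

alpha = (k / (k - 1)) * (1 - item_variances / total_variance)
print(round(alpha, 3))
```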
21
Q

Name 2 disadvantages of Cronbach’s alpha

A
  1. It provides a lower-bound estimate of reliability.
  2. It is not a measure of unidimensionality, i.e., it is a function only of the number of items and the average inter‐item correlation.
22
Q

If a test measures more than one variable, what is the best way to test reliability?

A

Factor analysis

23
Q

Define Interrater (Interscorer) reliability. What sorts of studies often need this?

A

The degree of agreement/consistency between two or more scorers (or judges or raters).
Often used in behavioural studies

24
Q

What are the two main ways to obtain Interrater reliability, and when are they used?

A
  • Use intraclass correlation for continuous measures.
  • Use Cohen’s Kappa for categorical measures (see the sketch below).
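
A minimal sketch of Cohen’s Kappa for two raters (the ratings are invented; kappa corrects observed agreement for chance agreement):

```python
# Cohen's kappa = (p_observed - p_expected) / (1 - p_expected), invented ratings.
from collections import Counter

rater1 = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes"]
rater2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "yes"]

n = len(rater1)
p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n  # raw agreement

# Chance agreement from each rater's marginal category proportions.
c1, c2 = Counter(rater1), Counter(rater2)
p_expected = sum((c1[cat] / n) * (c2[cat] / n) for cat in set(rater1) | set(rater2))

kappa = (p_observed - p_expected) / (1 - p_expected)
print(round(kappa, 3))
```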

25
Q

Name 5 different test characteristics that determine what type of reliability measure you should use

A
  • Are the items homogeneous or heterogeneous?
  • Is the trait being measured dynamic or static?
  • Is the range of test scores restricted or not?
  • Is it a speed or a power test?
  • Criterion‐referenced or norm‐referenced?
26
Q

What is SEM and how does it relate to reliability?

A

Standard Error of Measurement. Provides a measure of the precision of an observed test score (i.e., an estimate of the amount of error in an observed score, or of the extent of deviation between the observed and true scores).
Generally: higher reliability = lower SEM.
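
The usual formula (a standard CTT result, not stated in the card) is SEM = SD × √(1 − r), where SD is the test’s standard deviation and r its reliability. E.g., SD = 15 and r = 0.91 give SEM = 15 × √0.09 = 4.5.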

27
Q

In test measures, what does CI stand for, and what does it mean?

A

Confidence Interval. It gives a probable spread of true scores around the observed score, e.g., a 95% CI.
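
Worked example with invented numbers (assuming normally distributed error): a 95% CI is roughly the observed score ± 1.96 × SEM, so X = 100 with SEM = 4.5 gives an interval of about 91.2 to 108.8.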

28
Q

In test measures, what does SED stand for, and what does it mean?

A

Standard Error of Difference. A measure of how large a difference between two test scores must be before it can be considered statistically significant.
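
A standard formula (not stated in the card) is SED = √(SEM₁² + SEM₂²). E.g., two tests each with SEM = 4.5 give SED = √(20.25 + 20.25) ≈ 6.4.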

29
Q

List 3 situations where you would use SED

A

• How did Person A’s performance on test 1 compare with their own performance on test 2?
• How did Person A’s performance on test 1 compare with Person B’s performance on test 1?
• How did Person A’s performance on test 1 compare with Person B’s performance on test 2?
(NB: both tests must be on the same scale.)

30
Q

What is norm-referenced testing?

A

Deriving meaning from a person’s test score by comparing it to a reference group.

31
Q

What is a normative sample?

A

The reference group to which test‐takers are compared.

32
Q

What is a criterion‐referenced test?

A

A test that compares an individual’s score to a particular predetermined standard, criterion, level of performance, or mastery (e.g. a driving exam)

33
Q

Define standardization

A

The process of administering a test to a representative sample to establish norms

34
Q

Define sampling

A

The process of selecting a portion of the target population for testing, where the population shares at least one common, observable characteristic.

35
Q

Define stratified sampling

A

Purposefully including a representation of different subgroups of a population

36
Q

Define stratified random sampling

A

Divide the population into strata, then randomly sample from each stratum. Final numbers can be proportionate to the population or not, depending on the study requirements (see the sketch below).
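
A minimal pandas sketch of proportionate stratified random sampling (the DataFrame and its stratum column are invented; DataFrameGroupBy.sample requires pandas 1.1+):

```python
# Proportionate stratified random sampling: sample 10% within each stratum.
import pandas as pd

population = pd.DataFrame({
    "person_id": range(1, 101),
    "stratum": ["urban"] * 70 + ["rural"] * 30,  # invented strata
})

sample = (
    population.groupby("stratum", group_keys=False)
    .sample(frac=0.10, random_state=42)  # 10% from each stratum
)
print(sample["stratum"].value_counts())  # urban 7, rural 3
```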

37
Q

What is a purposive sample?

A

Selecting a sample believed to be representative of the intended population

38
Q

What is incidental or convenience sampling?

A

Using a sample that is convenient or available for use. May not be representative of the population, so may be hard to generalise.

39
Q

Describe 6 steps in the process of developing norms

A
  1. Obtain a normative sample.
  2. Standardise a setting for test administration.
  3. Administer the test with a standard set of instructions.
  4. Collect and analyze data.
  5. Summarize data using descriptive statistics, including measures of central tendency and variability.
  6. Provide a detailed description of the standardization and administration protocol.
40
Q

What is a percentile, and what is a potential problem with this method of assessing a norm?

A

A percentile is the percentage of people in the normative sample whose scores fall below a particular raw score.
Easy to calculate and popular; however, real differences between raw scores may be minimized near the ends of the distribution and exaggerated in the middle (see the sketch below).
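
A minimal sketch of a percentile-rank lookup (the normative scores are invented; scipy.stats.percentileofscore with kind="strict" gives the percentage of the sample strictly below the score):

```python
# Percentile rank of a raw score within a normative sample.
from scipy import stats

norm_sample = [35, 40, 42, 47, 50, 53, 55, 58, 61, 66]  # invented norms
print(stats.percentileofscore(norm_sample, 55, kind="strict"))  # 60.0
```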

41
Q

Define age norm

A

The average performance of a normative sample segmented by age

42
Q

Define grade norm

A

The average performance of a normative sample segmented by grade

43
Q

Define subgroup norm

A

Norms for a normative sample segmented by any of the criteria initially used in selecting the sample

44
Q

Define national norm

A

Norms derived from a normative sample that is nationally representative of the population

45
Q

Define national anchor norm

A

An equivalency table for scores on two different tests, allowing scores to be compared on a common basis.

46
Q

Define local norms

A

Normative information with respect to the local population’s performance on some test

47
Q

Describe the “normal curve”

A

A bell‐shaped, symmetrical, mathematically defined curve that is highest at its center. Can be conveniently divided into areas defined by units of standard deviations.
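
A useful anchor for those areas: roughly 68% of scores fall within ±1 SD of the mean, about 95% within ±2 SD, and about 99.7% within ±3 SD.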

48
Q

Define “standard score”

A

A raw score converted from its original scale to another scale with a predefined mean and standard deviation

49
Q

Define Z score

A

Conversion of a raw score into the number of standard deviation units the score lies above or below the mean: z = (raw score − mean) / SD

50
Q

Define T scores

A

Scores using a scale where the mean is 50 and the SD is 10. Also known as the “fifty plus or minus ten” scale.
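
Worked example with invented numbers: a raw score of 60 on a test with mean 50 and SD 8 gives z = (60 − 50) / 8 = 1.25, and the corresponding T score is 50 + 10 × 1.25 = 62.5.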