Chapter 6: Psychometrics, Test Design, and Essential Statistics Flashcards

1
Q

When performance on a test is not normally distributed but there is some variance, what is the best way to interpret the test result using normative data?

A) z score
B) T score
C) cut-off
D) percentile

A

D - percentile

Use of standardized scores that assume a normal distribution is not appropriate when the normative data are not normally distributed, as occurs when most people do relatively well on the test. Thus, interpretation of either a z or T score would be inappropriate.

Although a cut-off score could be used, dichotomizing the sample into intact or impaired loses some of the critical measurement meaning of the score, especially when scores fall very close to the cut-off.

Thus, use of a percentile distribution is most appropriate b/c this at least tells the clinician the proportion of the sample that did as well or worse than the patient of interest.

2
Q

An individual is given a battery of tests with at least three tests in each of five cognitive domains. He performs in the mildly impaired range on one test in each of two separate cognitive domains. How do you interpret this pattern of performance?

A) The pt is clearly impaired in two important cognitive domains; I diagnose him accordingly and provide treatment recommendations in my report.
B) The pt is essentially intact in almost all cognitive domains; I make no diagnosis and clarify in my report that no treatment is deemed necessary.
C) The pt may be impaired in one or more domain; I need more tests to be sure and will send a request for that in a report to the insurance company.
D) This may be due to normal variability. Unless a disorder is otherwise indicated by history, I make no diagnosis but comment on the variability in my report.

A

D - This may be due to normal variability.

Unless a disorder is otherwise indicated by history, I make no diagnosis but comment on the variability in my report.

3
Q

On Test 1, an individual obtains a T score of 65. On Test 2, she obtains a standard score of 115. On Test 3, the individual scores in the 67th percentile. On Test 4, the individual’s score is equivalent to a z score of 0.5. Which of the following is the correct ordering of these scores, from lowest to highest?

A) Test 4, 3, 2, 1
B) Test 3, 4, 2, 1
C) Test 4, 2, 3, 1
D) Test 3, 2, 4, 1

A

B - Test 3, 4, 2, 1

Converting each score to one common metric will allow for their comparisons.

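Percentiles:
Test 1 (T = 65, z = 1.5) = 93rd percentile
Test 2 (115 with mean 100 and SD 15, z = 1.0) = 84th percentile
Test 3 = 67th percentile
Test 4 (z = 0.5) = 69th percentile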
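The conversion to a common metric can be checked with a short sketch. It assumes each metric is normally distributed with its conventional mean and SD (T score: 50/10; the 115 treated as a standard score: 100/15; z score: 0/1). The function name `to_percentile` is hypothetical, chosen only for this illustration.

```python
from statistics import NormalDist

def to_percentile(score, mean, sd):
    """Percentile rank of a score under an assumed normal distribution."""
    return 100 * NormalDist(mean, sd).cdf(score)

percentiles = {
    "Test 1": to_percentile(65, 50, 10),    # T score (assumed mean 50, SD 10)
    "Test 2": to_percentile(115, 100, 15),  # standard score (assumed mean 100, SD 15)
    "Test 3": 67.0,                         # already reported as a percentile
    "Test 4": to_percentile(0.5, 0, 1),     # z score
}

# Sort test names from lowest to highest percentile.
ordering = sorted(percentiles, key=percentiles.get)
print(ordering)
```

Once every score is on the percentile metric, the ordering follows from a simple sort.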
4
Q

The concept of regression to the mean is best expressed by which of the following?

A) The apple doesn’t fall all that far from the tree.
B) Highly intelligent ppl have even smarter children.
C) Ppl of superior intelligence are likely to have high average children.
D) Ppl who are low average are at high risk of having impaired children.

A

C - People of superior intelligence are likely to have high average children

Answers B & D - imply regression AWAY from the mean

Answer A - indicates no systematic expected change in predictions with repeated measurement.

5
Q

You desire to calculate a confidence interval around an obtained test score to determine how confident you are that the obtained score reflects the person’s true ability in this domain. For the most appropriate estimate of the confidence interval, you should use the following in your calculation:

A) Validity coefficient for the test.
B) Standard error of the mean.
C) Standard error of the estimate.
D) Standard error of the measurement.

A

C - Standard error of the estimate

As stated in the section on SEM/SEE, the SEE is based on the obtained score, requires no knowledge of the true score, and takes the reliability of the test into account.

6
Q

You have created a new test that you want to use clinically, but its psychometric properties are unknown. You want to know about the incremental validity of the test, and you have knowledge of the base rate of the condition that the test was designed to detect. To calculate the incremental validity if you are equally worried about false positives and false negatives, you need to know the test’s _____.

A) overall hit rate
B) sensitivity
C) specificity
D) positive predictive value

A

A - Overall hit rate

Although positive predictive value could be used to calculate incremental validity, it is only useful if we are interested in our test’s incremental ability to make a positive diagnosis, not as an indicator of overall diagnostic accuracy.

Thus, the overall hit rate is the best choice in the situation b/c we are interested in both yes and no decisions based on the test (both PPV and NPV).

7
Q

Which of the following will NOT affect the reliability of an intra-individual difference score (i.e., the reliability of difference in performance between two tests within one individual)?

A) reliability of test one
B) correlation between the tests
C) variance of the distribution of the difference scores
D) actual difference between the two scores

A

D - Actual difference between the two scores

The reliability of the tests, the association between them, and the error variance are all included in the formula required to calculate the reliable change interval. The actual difference between the scores is only a point of reference that is compared to the calculated interval.

8
Q

Which of the following is not a well-validated clinical use of regression?

A) prediction of premorbid ability level
B) prediction of membership in a clinical group
C) prediction of performance in one domain based on performance in others
D) prediction of future test performance based on past test performance

A

C - prediction of performance in one domain based on performances in others.

Answer C is the use that is NOT well validated b/c performance within each domain is believed to be relatively independent of other domains, so there is little basis for predicting performance in one domain from the others.

9
Q

Two individuals are administered the same test. Person 1 scores in the 48th percentile; Person 2 scores in the 93rd %ile. It is later found that there was an error in scoring of the test on these two administrations only, and 3 points are then added to each person’s score. Given this information, which of the following is true?

A) Both percentile ranks will increase by the same amount.
B) Person 1’s percentile rank will increase more than Person 2’s
C) Person 2’s percentile rank will increase more than Person 1’s
D) Neither percentile rank will change

A

B - Person 1’s percentile rank will increase more than Person 2’s

Both individuals’ scores will change in reference to the normative group b/c of the added points. However, under the assumption of a normal distribution, scores are densest near the middle, so a given raw-score difference spans more percentile ranks there than at the extremes. Thus, changing a raw score by 3 points will have a larger influence on a percentile rank close to the middle of the distribution.
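A minimal sketch of this effect, assuming normally distributed scores and, purely for illustration, that 3 raw points equal 0.2 SD (as on a test with SD = 15). The card does not give the test's SD, so that value and the function name `shifted_percentile` are assumptions.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

def shifted_percentile(percentile, shift_in_sd):
    """Percentile rank after adding shift_in_sd SD units to a score."""
    z = std_normal.inv_cdf(percentile / 100)
    return 100 * std_normal.cdf(z + shift_in_sd)

shift = 3 / 15  # assumption: 3 raw points on a test with SD = 15

gain_person_1 = shifted_percentile(48, shift) - 48  # near the middle
gain_person_2 = shifted_percentile(93, shift) - 93  # near the upper tail
print(round(gain_person_1, 1), round(gain_person_2, 1))
```

The same shift in SD units moves the 48th-percentile score across more percentile ranks than the 93rd-percentile score.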

10
Q

Assuming a normal distribution, how many people would score between a 600 and a 900 on a standardized test with a mean of 750 and a standard deviation of 150 (N = 1000)?

A) 840
B) 680
C) 640
D) 720

A

B - 680

600 falls 1 SD below the mean; 900 falls 1 SD above the mean.

68% of a normally distributed sample falls between +/- 1 SD from the mean.

Thus, in a sample of 1000, 680 (68%) fall +/-1 SD from the mean.
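The 68% figure is a rounded value. As a quick check using only Python's standard library, the sketch below computes the exact proportion between 600 and 900 under the stated normal distribution; the variable names are illustrative.

```python
from statistics import NormalDist

scores = NormalDist(mu=750, sigma=150)

# Proportion of the distribution between 1 SD below and 1 SD above the mean.
proportion = scores.cdf(900) - scores.cdf(600)
expected_count = 1000 * proportion
print(round(proportion, 4), round(expected_count))
```

The exact proportion is about 68.27%, or roughly 683 of 1000 people, which rounds to the 680 offered among the answer choices.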

11
Q

TRUE or FALSE

A finding of one or more impaired scores in a relatively large battery is RELATIVELY COMMON in normative samples without neurological impairment.

A

TRUE

Unless the findings fit a profile that is consistent with an impaired domain or expected impairment based on medical history/presumed etiology (i.e., variability across scores in ADHD), the findings should not be OVER-INTERPRETED but considered in this light and discussed as possible normal variance in the interpretation section of the report.

12
Q

DEFINITION:

Alpha

A

The probability of type I error in making a decision about the tenability of a null hypothesis (“False Positive”)

Also, a measure of a test’s reliability (coefficient alpha) that reflects the internal consistency of the items.

13
Q

DEFINITION:

Alternate forms
aka parallel tests

A

Tests constructed to be similar in content, high in reliability, and equivalent.

14
Q

DEFINITION:

Bayes’ theorem

A

Probability/statistics theorem employed in decision analysis to allow the posterior probability of an event to be calculated.
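As an illustration of a posterior probability in a diagnostic setting, the sketch below applies Bayes' theorem to obtain the probability of a condition given a positive test result. The base rate, sensitivity, and specificity are made-up values, and `posterior_probability` is a hypothetical helper name.

```python
def posterior_probability(base_rate, sensitivity, specificity):
    """P(condition | positive test) via Bayes' theorem."""
    true_pos = sensitivity * base_rate               # P(positive and condition)
    false_pos = (1 - specificity) * (1 - base_rate)  # P(positive and no condition)
    return true_pos / (true_pos + false_pos)

# Hypothetical values: 10% base rate, 80% sensitivity, 90% specificity.
ppv = posterior_probability(base_rate=0.10, sensitivity=0.80, specificity=0.90)
print(round(ppv, 3))
```

With a low base rate, the posterior probability stays below 50% even for a reasonably accurate test, which is why base rates matter when interpreting positive results.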

15
Q

DEFINITION:

Beta

A

The probability of making a Type II error in statistical hypothesis testing (“False Negative”)

16
Q

DEFINITION:

Central Limit Theorem

A

If n independent variates have finite variances, then the standardized form of their sum approaches a normal distribution as n approaches infinity.

17
Q

DEFINITION:

Conditional Probability

A

The probability of an event or outcome, given that a different event has occurred.

The basis of Bayes’ Theorem

18
Q

DEFINITION:

Confidence Interval

A

Interval around a statistic (i.e., observed test score, sample mean), usually expressed in SD units or percentages, that reflects the expected sample-to-sample variability

19
Q

DEFINITION:

Content Validity

A

Degree to which scores on a measure capture all the aspects of a dimension of interest. Can be demonstrated by parallel validity.

20
Q

DEFINITION:

Construct Validity

A

Degree to which scores on a measure support inferences about a dimension of interest. Can be demonstrated via factor analysis or other methods that illustrate convergent and discriminant validity.

21
Q

DEFINITION:

Descriptive Statistics

A

Show the main features of the data involving the central tendency, variability, and the shape of the summarized data points

22
Q

DEFINITION:

Discriminant Analysis

A

The process of using a score profile to determine whether an individual belongs to one group (e.g., a specific diagnosis) or another (e.g., no diagnosis or a different diagnosis).

Can also be used to describe differences between two or more groups on a set of measures (DESCRIPTIVE discriminant analysis) or to classify subjects into groups on the basis of a set of measures (PREDICTIVE discriminant analysis).

23
Q

DEFINITION:

Ecological Validity

A

The degree to which a measure predicts behavior in everyday situations;

a form of external validity

24
Q

DEFINITION:

External Validity

A

Degree to which results from a particular test or measure can be generalized to situations or related to information beyond the test itself (correlation of the measure with another measure of some independent criterion).

25
Q

DEFINITION:

False Negative

A

(aka Type II error or beta error)

Error that occurs when a test incorrectly indicates the absence of a particular trait or condition when the trait or condition actually exists.

Funny Ex: Telling an obviously pregnant woman she isn’t pregnant

26
Q

DEFINITION:

False Positive

A

(aka type I error or alpha error) Error that occurs when a test incorrectly indicates the presence of a trait or condition when none genuinely exists

Funny Ex: Telling a man he is pregnant

27
Q

DEFINITION:

Inferential Statistics

A

Methods used to reach conclusions that extend beyond the immediate data to wider samples and conditions

28
Q

DEFINITION:

Internal consistency

A

Estimate of the reliability of a measure or score based on the average correlation among items within a test.

The size of the coefficient depends on both the average correlation among the items AND the number of items; represented by coefficient alpha.
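The dependence on both the average inter-item correlation and the number of items can be seen in the standardized form of coefficient alpha, alpha = k * r / (1 + (k - 1) * r). The sketch below uses that standard formula with made-up values; `standardized_alpha` is a hypothetical helper name.

```python
def standardized_alpha(n_items, mean_inter_item_r):
    """Standardized coefficient alpha from item count and mean inter-item correlation."""
    k, r = n_items, mean_inter_item_r
    return (k * r) / (1 + (k - 1) * r)

# Same average inter-item correlation (0.30), different test lengths.
alpha_10 = standardized_alpha(10, 0.30)
alpha_20 = standardized_alpha(20, 0.30)
print(round(alpha_10, 2), round(alpha_20, 2))
```

Holding the average correlation constant, the longer test yields the higher coefficient.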

29
Q

DEFINITION:

Item characteristic curves

A

(aka item response function) Shows probability of a correct response as a function of the level of overall performance of the person.

30
Q

DEFINITION:

Item Response Theory

A

Models the response of each test-taker of a given ability to each item on the test; based on the idea that the probability of a correct/keyed response to an item is a mathematical function of person parameters (i.e., trait or ability) and item parameters (i.e., difficulty or discrimination).

31
Q

DEFINITION:

Kappa

A

A measure of reliability reflecting agreement among raters that is adjusted for the expected level of chance agreement.
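For two raters, the chance adjustment can be sketched with Cohen's kappa, kappa = (observed agreement - expected agreement) / (1 - expected agreement). The ratings below are invented for illustration, and `cohens_kappa` is a hypothetical helper, not a library function.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters classifying the same cases."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    return (observed - expected) / (1 - expected)

# Invented ratings: "imp" = impaired, "ok" = intact.
rater_1 = ["imp", "imp", "ok", "ok", "ok", "imp", "ok", "ok", "ok", "ok"]
rater_2 = ["imp", "ok", "ok", "ok", "ok", "imp", "ok", "imp", "ok", "ok"]
kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 2))
```

Here the raters agree on 80% of cases, but because 58% agreement is expected by chance from their marginal rates, kappa is considerably lower than the raw agreement.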

32
Q

DEFINITION:

Kuder-Richardson Formula

A

A formula for estimating the reliability of a dichotomously scored test; influenced by number of items, variance, and item difficulty.

33
Q

DEFINITION:

Kurtosis

A

A measure describing the degree to which a distribution of scores is clustered around the mean;

if peaked, leptokurtic
if flat, platykurtic

34
Q

DEFINITION:

Meta-Analysis

A

The statistical analysis of results obtained from different independent studies. This technique can be used to create metanorms using data from several independent samples.

35
Q

DEFINITION:

Multiple Correlation

A

A correlation coefficient that expresses the relationship between the criterion scores and an additive combination of predictor scores.

Denoted with R

36
Q

DEFINITION:

Normal Distribution

A

(aka bell-shaped or Gaussian distribution)

A bell-shaped score distribution that is symmetric around the mean.

Approximately 68% of the scores fall between -1.0 and +1.0 SDs about the mean, approximately 95% of the scores fall between -2.0 and +2.0 SDs, and approximately 99.7% of the scores fall between -3.0 and +3.0 SDs.

37
Q

DEFINITION:

Partial Correlation

A

A measure of the relationship between two variables that exists after accounting for the relationships of a third measure to each of the two variables

38
Q

DEFINITION:

Percentile Rank

A

(aka centile)

A score indicating the percentage of scores at or below the comparison value (the distribution is divided into 100 equal groups)

39
Q

DEFINITION:

Person Separation Index

A

Summary assessment of the ability of a test to discriminate between groups, taking into account measurement error; analogous to reliability statistic

40
Q

DEFINITION:

Power Tests

A

Tests in which there is no time restriction or time component

41
Q

DEFINITION:

Practice Effects

A

Improvement in performance as a function of having previously taken the test or performed the task.

42
Q

DEFINITION:

Predictive Accuracy

A

asdf

43
Q

DEFINITION:

Predictive Validity

A

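The degree to which scores on a measure predict performance on a criterion assessed at a later time; a form of criterion-related validity.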

44
Q

DEFINITION:

Prevalence

A

(aka BASE RATE)

The total number of cases of a particular phenomenon present in a population at a given time or within a given period (new cases that develop within a period define incidence).

45
Q

DEFINITION:

Probability

A

Long-range relative frequency

the ratio of the number of times the event of interest occurs relative to the number of opportunities for the event to occur.

46
Q

DEFINITION:

Profile Analysis

A

A multivariate statistical technique that evaluates whether profiles differ in terms of shape and overall elevation

47
Q

DEFINITION:

Psychometrics

A

the measurement of psychological functions and individual behavioral differences

48
Q

DEFINITION:

Quantile

A

The expression of a distribution as equal, ordered subgroups.

Quantiles can be made by dividing the distribution into any number of equal groups. For example, quartiles create four groups in the distribution.

49
Q

DEFINITION:

Receiver Operating Characteristic curve (ROC curve)

A

A plot of the probability of detecting a condition against the probability of false alarms; based on signal detection theory; used for establishing optimal cut-offs for prediction of binary outcomes.

50
Q

DEFINITION:

Regression Analysis

A

A method of statistical analysis in which a single outcome variable is related to one or more predictor variables by examining the tendency for scores on the outcome to move in concert with scores on the predictors; usually linear, but curvilinear and nonlinear models are possible.

51
Q

DEFINITION:

Regression Toward the Mean

A

The tendency for scores at the extremes of a distribution to migrate toward the mean on repeated assessment due to increased probability closer to the center of the distribution.

Whenever the correlation between two scores is imperfect, there will be regression to the mean.
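The link between imperfect correlation and regression to the mean can be sketched with the usual linear prediction, predicted = mean + r * (observed - mean). This assumes the same mean and SD at both assessments; the scores and the function name `predicted_retest_score` are illustrative only.

```python
def predicted_retest_score(observed, mean, correlation):
    """Expected retest score under simple linear regression (equal means and SDs assumed)."""
    return mean + correlation * (observed - mean)

# Hypothetical case: observed score of 130 on a scale with mean 100, r = 0.80.
predicted = predicted_retest_score(observed=130, mean=100, correlation=0.80)
no_regression = predicted_retest_score(observed=130, mean=100, correlation=1.0)
print(predicted, no_regression)
```

Only a perfect correlation leaves the extreme score where it was; any lower correlation pulls the prediction toward the mean.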

52
Q

DEFINITION:

Reliability

A

the degree to which scores on a test are systematic and the degree to which a measure is free from measurement error

a generic term for several important measurement characteristics including:

Internal consistency, split halves, test-retest, alternate forms, and interrater reliability.

53
Q

DEFINITION:

Reliable Change Index (RCI)

A

Used to determine whether changes present on follow-up testing exceed what is considered to result from the methodological aspects associated with repeat assessment; based on test-retest reliability, the standard error of the test, and practice effects; confidence levels are selected based on desired level of precision.

54
Q

DEFINITION:

Signal Detection Theory

A

Ability to detect the presence of a signal from background noise;

generates the d’ statistic, the distance between the noise distribution and the signal plus noise distribution

has been adapted for use in characterizing response styles in recognition memory testing.
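Under the common equal-variance Gaussian model (an assumption, not stated on the card), d’ is the difference between the z-transformed hit rate and false alarm rate. The rates below are invented, and `d_prime` is a hypothetical helper name.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """d' = z(hit rate) - z(false alarm rate); rates must lie strictly between 0 and 1."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# Hypothetical recognition memory performance: 85% hits, 20% false alarms.
sensitivity_index = d_prime(hit_rate=0.85, false_alarm_rate=0.20)
print(round(sensitivity_index, 2))
```

Rates of exactly 0 or 1 have no finite z value, so in practice they are adjusted slightly before computing d’.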

55
Q

DEFINITION:

Skew

A

A measure of the asymmetry of a probability distribution;

The “tail” of the distribution is in the direction of the skew (i.e., positive or negative)

56
Q

DEFINITION:

Standard Deviation

A

A statistical measure of variability of scores around the mean;

The square root of variance

a measure of dispersion

57
Q

DEFINITION:

Standard Error of the Estimate (SEE)

A

An estimate of the accuracy of a prediction of test performance;

based on the difference between the obtained score and the predicted score, the number of pairs of scores, and the reliability of the test.

58
Q

DEFINITION:

Standard Error of Measurement (SEM)

A

A measure of the variability of scores obtained on a test relative to the “true scores”

A reliable test has a small standard error of measurement.
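The relationship between reliability and the SEM can be sketched with the conventional formula SEM = SD * sqrt(1 - reliability), together with a band of plus or minus 1.96 SEM around an obtained score. The score, SD, and reliability values are hypothetical, as are the function names.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement from the test SD and its reliability."""
    return sd * math.sqrt(1 - reliability)

def confidence_band(score, sd, reliability, z=1.96):
    """Approximate 95% band around an obtained score using the SEM."""
    half_width = z * sem(sd, reliability)
    return score - half_width, score + half_width

# Hypothetical test: SD = 15, reliability = 0.90, obtained score = 110.
low, high = confidence_band(score=110, sd=15, reliability=0.90)
print(round(sem(15, 0.90), 2), round(low, 1), round(high, 1))
```

Raising the reliability shrinks the SEM and therefore narrows the band.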

59
Q

DEFINITION:

Standard Error of the Mean

A

A measure of the degree to which a sample mean varies from sample to sample around the true mean of the population;

The standard deviation of the sampling distribution of the sample mean;

The larger the sample, the smaller the standard error of the mean;

standard deviation of the sample divided by the square root of the sample size

60
Q

DEFINITION:

Standard Score

A

A score transformed to reflect its distance from the mean, measured in SD units (e.g., T score, z score, scaled score); most appropriate for use with normal distributions.

61
Q

DEFINITION:

Standardization

A

The process by which data from a group of individuals intended to represent a population of interest are collected and analyzed;

Can be based on healthy individuals or a clinical population

Adequate standardization improves the reliability and validity of test results

62
Q

DEFINITION:

Stratified Sample

A

Process of selecting random samples from specific groups (strata) to ensure adequate representation of demographic groups.

63
Q

DEFINITION:

T score

A

Standard score based on normal distribution with a mean of 50 and a SD of 10

64
Q

DEFINITION:

T-test

A

(aka Student’s t-test)

A test of differences between sample means based on Student’s t distribution; takes into account the ratio of the difference between sample means to the sample variability

paired = interested in the difference between two variables for the same subject

independent = determines whether there is a statistically significant difference between the means in two unrelated groups

65
Q

DEFINITION:

Validity

A

The degree to which a measure can be used to support a specific inference.

Validity is NOT a property of a test but of the inferences that the test is designed to produce;

Types: construct, content, concurrent, predictive, criterion, internal, external, discriminant, and convergent

66
Q

DEFINITION:

Variance

A

Measure of the degree to which scores deviate from the mean.

67
Q

DEFINITION:

Z Score

A

A linearly derived standard score with a distribution mean of zero and a standard deviation of one.

68
Q

Which of the following RCI ranges reflects a significant difference in test scores?

(A) +/- 1.43
(B) +/- 1.67
(C) +/- 1.88
(D) +/- 1.96

A

(D) +/- 1.96 reflects a significant difference in test scores.

Calculation of the RCI uses the standard error of the difference (based on the SEM for the test) and computes a z score for the difference between the individual’s test scores based on the normal probability distribution.

RCI should be adjusted for practice effects

DO NOT overinterpret a significant RCI
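One common way to compute a practice-adjusted RCI is sketched below, assuming SEM = SD * sqrt(1 - r) and a standard error of the difference of sqrt(2) * SEM. Other RCI variants exist, and all numeric values here are hypothetical.

```python
import math

def reliable_change_index(baseline, retest, sd, test_retest_r, practice_effect=0.0):
    """Practice-adjusted RCI: (observed change - expected practice gain) / SE of the difference."""
    sem = sd * math.sqrt(1 - test_retest_r)
    se_diff = math.sqrt(2) * sem
    return (retest - baseline - practice_effect) / se_diff

# Hypothetical case: score drops from 100 to 88; SD = 15, r = 0.90, expected practice gain = 2.
rci = reliable_change_index(baseline=100, retest=88, sd=15,
                            test_retest_r=0.90, practice_effect=2.0)
is_reliable = abs(rci) > 1.96  # exceeds the +/- 1.96 criterion
print(round(rci, 2), is_reliable)
```

Subtracting the expected practice gain makes a decline look larger and an improvement look smaller, which is the point of the adjustment.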

69
Q

When interpreting an obtained test score, one must consider that said score is composed of ______________.

(A) true score and random error
(B) true score and both random and nonrandom error
(C) true score only
(D) random error only

A

(B) True score and both random and nonrandom error

As stated in classical test theory, any obtained test score X for an individual consists of a true score and an error component. Some of the error is random while some of the error is systematic (and not random)

X = T + E,

Where:
X is an observed score,
T is the true score,
E is random error.

70
Q

How does increasing the sample size affect the standard error of the estimate?

(A) a larger sample size produces a larger standard error of the estimate
(B) A larger sample size produces a smaller standard error of the estimate.
(C) A larger sample size has no effect on the standard error of the estimate
(D) A larger sample size can produce a larger standard error of the estimate, but only if the sample is larger than 10.

A

(B) A larger sample size produces a smaller standard error of the estimate.

A larger sample size produces a smaller standard error of the estimate and increases the likelihood of rejecting the null hypothesis.

Recall that SEE is the SD of true scores if the observed score is held constant. The SEE helps the clinician calculate the range of scores in which the true score is likely to fall. The larger the sample, the greater the likelihood that one will be able to accurately predict where the true score will fall.

71
Q

Would you conclude impairment in a particular domain based on one neuropsychological test score falling 1 standard deviation below the mean?

(A) No, finding one or more scores that fall more than one SD below the mean is common, and if using the assumption of the normal distribution of scores, this occurs 16% of the time.
(B) No, finding one or more scores that fall more than one SD below the mean is common, and if using the assumption of the normal distribution of scores, this occurs 4% of the time.
(C) Yes, finding one or more scores falling more than one standard deviation below the mean is uncommon and, if using the assumption of the normal distribution of scores, this occurs 2% of the time.
(D) Yes, finding one or more scores falling more than one standard deviation below the mean is uncommon, and if using the assumption of the normal or skewed distribution of scores, this occurs less than 1% of the time.

A

(A) No, finding one or more scores that fall more than one SD below the mean is common, and if using the assumption of the normal distribution of scores, this occurs 16% of the time.

Finding one or more scores that fall more than 1 SD below the intraindividual or normative mean is not uncommon; based on assumptions of the normal distribution, it will occur 16% of the time. Even a more conservative definition of a low score (such as 2.0 SDs below the mean) results in low scores being detected about 4% of the time by chance alone.
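To see how quickly low scores accumulate across a battery, the sketch below computes the chance of at least one score below a cut-off. It makes the simplifying assumption that test scores are independent and normally distributed; real batteries have correlated tests, so the figures are only illustrative. The function name is hypothetical.

```python
from statistics import NormalDist

def prob_at_least_one_low(n_tests, cutoff_z=-1.0):
    """P(at least one score below cutoff_z) across n independent, normally distributed tests."""
    p_low = NormalDist().cdf(cutoff_z)  # about 0.16 for a cut-off of -1 SD
    return 1 - (1 - p_low) ** n_tests

for n in (1, 5, 15):
    print(n, round(prob_at_least_one_low(n), 2))
```

With a single test the chance is about 16%, and it rises steadily as more tests are administered.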

72
Q

What is the simplest item response theory (IRT) model?

(A) The simplest IRT model is a three-parameter model, which is not algebraically equivalent to the Rasch model.
(B) The simplest IRT model is a two-parameter model, which is not algebraically equivalent to the Rasch model.
(C) The simplest IRT model is a two-parameter model, which is algebraically equivalent to the Rasch model.
(D) The simplest IRT model is a one-parameter model, which is algebraically equivalent to the Rasch model.

A

(D) The simplest IRT model is a one-parameter model, which is algebraically equivalent to the Rasch model.

According to the Rasch model, an individual’s response to a binary item (e.g., right/wrong, true/false, agree/disagree) is determined by the individual’s trait level and the difficulty of the item. One way of expressing the Rasch model is in terms of the probability that an individual with a particular trait level will correctly answer an item that has a particular difficulty.
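That probability is usually written as a logistic function of the difference between trait level and item difficulty, P = 1 / (1 + exp(-(theta - b))). A minimal sketch follows, with trait and difficulty expressed on the same logit scale; `rasch_probability` is a hypothetical name.

```python
import math

def rasch_probability(trait_level, item_difficulty):
    """Probability of a correct response under the one-parameter (Rasch) model."""
    return 1 / (1 + math.exp(-(trait_level - item_difficulty)))

p_matched = rasch_probability(trait_level=0.0, item_difficulty=0.0)  # ability equals difficulty
p_easier = rasch_probability(trait_level=1.0, item_difficulty=0.0)   # ability above difficulty
print(p_matched, round(p_easier, 2))
```

When trait level equals item difficulty the probability of success is exactly 0.5, and it rises as ability exceeds difficulty.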

73
Q

The Spearman-Brown Formula can be used to _____________.

(A) calculate the effect on reliability of lengthening the test
(B) calculate the effect on reliability of removing individual items
(C) assess for the homogeneity of items within the test
(D) assess the inter-rater reliability of a test

A

(A) calculate the effect on reliability of lengthening the test

The Spearman-Brown formula is used to calculate the likely effect of lengthening a test to a certain number of items. The formula predicts the reliability of a new test composed by replicating the current test n times (or, equivalently, creating a test with n parallel forms of the current exam).

Thus n = 2 implies doubling the exam length by adding items with the same properties as those in the current exam. Values of less than one may be used to predict the effect of shortening a test.
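The prediction can be sketched with the Spearman-Brown formula, predicted reliability = n * r / (1 + (n - 1) * r), where r is the current reliability and n is the length factor. The starting reliability of 0.70 is a made-up value, and `spearman_brown` is a hypothetical helper name.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by length_factor."""
    n, r = length_factor, reliability
    return (n * r) / (1 + (n - 1) * r)

doubled = spearman_brown(0.70, 2)    # test twice as long
halved = spearman_brown(0.70, 0.5)   # test half as long
print(round(doubled, 2), round(halved, 2))
```

Doubling the test raises the predicted reliability, while a length factor below one lowers it.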

74
Q

DEFINITION:

Rasch Model

A

a psychometric model for analyzing categorical data, such as answers to questions on a reading assessment or questionnaire responses, as a function of the trade-off between (a) the respondent’s abilities, attitudes, or personality traits and (b) the item difficulty.