Psychometrics & Statistics Flashcards

1
Q

In Classical Test Theory, what 2 components comprise any obtained test score?

A

A true score (T) and random error (E)

2
Q

Descriptive versus Inferential statistics

A

Descriptive statistics quantitatively describe the main features of data, i.e., central tendency (mean, median, mode), variability (SD, variance), and the shape of the distribution.

Inferential statistics help reach conclusions that extend beyond the data alone using various methods, such as the general linear model (t-test, ANOVA, ANCOVA), regression analysis, and multivariate methods (factor analysis, cluster analysis, linear discriminant function, multidimensional scaling).

3
Q

Describe Kurtosis versus Skew

A

Kurtosis captures the degree to which a distribution of scores is clustered around the mean, i.e., whether the distribution is peaked (leptokurtic) or flat (platykurtic).

Skew is a measure of the asymmetry of a probability distribution; it reflects the tendency of scores to cluster at the higher end (negative skew) or the lower end (positive skew) of the distribution. Skewed distributions alter the rank order of the measures of central tendency (mean, median, mode).

4
Q

What is IRT?

A

Item Response Theory focuses on item-level characteristics rather than on test-level characteristics.
Item-level responses are analyzed to compare the probability of a correct answer against underlying person parameters (i.e., trait or ability) and item parameters (i.e., difficulty or discrimination), using an item characteristic curve (ICC).
The ICC is also known as the 'item response function.' Depending on the model, it can provide information on difficulty alone (1PL), difficulty & discrimination (2PL), or difficulty, discrimination & guessing (3PL).

5
Q

What is probability theory?

A

The probability of an outcome is defined as the ratio of times that outcome occurs over an infinite number of replays, analogous to games of chance.

6
Q

What are the key elements of Bayesian theory?

A

Bayes’s theorem is used in decision making analysis to allow the posterior probability of an event to be calculated. Key elements are posterior probability, prior probability, and likelihood.

The probability of outcome B given an event A equals the probability of A given B, times the prior probability of B, divided by the prior probability of A: P(B|A) = P(A|B) x P(B) / P(A).
Conditional probability is the probability of an event given that a different event has occurred.
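
A minimal sketch of the formula above, in Python. The diagnostic numbers (1% base rate, 90% hit rate, 10% false-positive rate) are hypothetical, chosen only for illustration.

```python
# Bayes' theorem: P(B|A) = P(A|B) * P(B) / P(A)
def posterior(likelihood, prior_b, prior_a):
    """P(B|A) given P(A|B), P(B), and P(A)."""
    return likelihood * prior_b / prior_a

# Example: P(disease) = 0.01, P(positive | disease) = 0.9,
# P(positive | no disease) = 0.1.
# P(positive) = 0.9*0.01 + 0.1*0.99
p_pos = 0.9 * 0.01 + 0.1 * 0.99
p_disease_given_pos = posterior(0.9, 0.01, p_pos)
print(round(p_disease_given_pos, 3))  # 0.083
```

Note how a low base rate keeps the posterior small even with a sensitive test, which is the same point the predictive-power cards below make.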

7
Q

What type of distribution do measures of motor ability and reaction times typically have?

A

positively skewed

8
Q

What is regression towards the mean?

A

Tendency for scores at extremes of a distribution to migrate toward the mean on repeated assessment.

In a pair of independent measurement scores from the same distribution, samples far from the mean on the first set of scores tend to be closer to the mean on the second set; they appear to regress because of increased probability closer to the center of the distribution.

9
Q

Variance versus Standard Deviation

A

Variance is the degree to which scores deviate from the mean; the average of the squared differences from the mean of each observation in a distribution.

SD is a statistical measure of variability of scores around the mean; equals the square root of the variance.
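
The two definitions above can be computed directly; a short sketch using the population (divide-by-n) form, with made-up scores:

```python
import math

def variance(xs):
    # Population variance: mean of squared deviations from the mean
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def sd(xs):
    # SD is the square root of the variance
    return math.sqrt(variance(xs))

scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(scores), sd(scores))  # 4.0 2.0
```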

10
Q

What is the purpose of transformations?

A

Transformations allow distributions that lack true normality to better approximate a normal curve. Use of a transformation must be related to an essential measurement concern that can be identified and expressed.

11
Q

What are Standard scores

A

Transformations of normally distributed data onto a scale with a set mean and SD, so that measures can be compared. Examples include T scores, z-scores, and index scores.
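
For example, z-scores (mean 0, SD 1) and T scores (mean 50, SD 10) are simple linear transformations; the raw score, mean, and SD below are illustrative:

```python
def z_score(x, mean, sd):
    # z: distance from the mean in SD units
    return (x - mean) / sd

def t_score(x, mean, sd):
    # T scores rescale z to mean 50, SD 10
    return 50 + 10 * z_score(x, mean, sd)

# A raw score of 115 on a scale with mean 100, SD 15 (index-score metric)
print(z_score(115, 100, 15))  # 1.0
print(t_score(115, 100, 15))  # 60.0
```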

12
Q

Using percentiles, what distribution shape is created?

A

Data take on a “rectangular” shape that forces artificially even intervals, regardless of the underlying values.
Percentiles are often used because of their familiarity and ease of understanding for laypersons and test consumers.

13
Q

What is Reliability

A

Consistency of test results under varying test administration conditions; the degree to which scores are systematic and the measure is free from measurement error. The reliability index is the ratio of true-score variance to total variance; r values range from 0 to 1, with .80 or higher considered acceptable.

14
Q

What is Validity?

Describe external versus internal validity.

A

Degree to which a measure can be used to support a specific inference. Validity is a property of the inferences the test is designed to support rather than of the test itself, and thus concerns an external set of considerations in establishing the credibility of a test.

External validity is the degree to which test results can be generalized to other groups and situations (e.g., ecological validity).

Internal validity is the degree to which observed effects are real, i.e., not caused by confounding variables or extraneous factors.

15
Q

What is Test-retest reliability

A

Stability of scores on repeated administration of an instrument to the same person.
Error variance: random fluctuation in performance from one administration to another.

16
Q

Describe Reliability versus sensitivity to change.

What is RCI?

A

A perfectly reliable measure may not sufficiently detect change; test design involves a trade-off that should optimize reliability versus sensitivity to change.

Reliable change index (RCI) is used to determine if changes exceed what is considered to result from methodological aspects associated with repeat assessment. RCI is based on test-retest reliability, standard error of the test, and practice effects.
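
One common RCI formulation (the Jacobson-Truax approach, which omits a practice-effect correction) divides the score change by the standard error of the difference, itself derived from the SEM. A sketch, with hypothetical scores, SD, and reliability:

```python
import math

def rci(score1, score2, sd, reliability):
    # SEM = SD * sqrt(1 - r); SE of the difference = sqrt(2) * SEM
    sem = sd * math.sqrt(1 - reliability)
    se_diff = math.sqrt(2) * sem
    return (score2 - score1) / se_diff

# |RCI| > 1.96 is conventionally treated as reliable change
print(round(rci(100, 110, 15, 0.9), 2))  # 1.49 -> not reliable change
```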

17
Q

What is Alternate form reliability?

A

Reliability coefficient that captures the stability of a test over time and the consistency of responses to different samples of items assessing the same knowledge or performance.
Alternate forms are parallel tests constructed to be similar in content, high in reliability, and equivalent. They reduce effects of error variance due to practice effects.

18
Q

What is internal consistency reliability?

A

Evaluation of the internal consistency of a test by splitting it in different ways using only a single administration. Reliability of a half-test is the correlation between half scores of the test. Also known as split-half reliability.

19
Q

What is Spearman-Brown formula?

A

Calculates the likely effect of lengthening a test. Lengthening a test will increase consistency with respect to item sampling (i.e., internal consistency) but not necessarily stability over time (i.e., test-retest)
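
The prophecy formula itself is r' = k*r / (1 + (k-1)*r), where k is the factor by which the test is lengthened. A sketch with an illustrative split-half reliability of .70:

```python
def spearman_brown(r, k):
    """Predicted reliability when a test with reliability r is lengthened by factor k."""
    return k * r / (1 + (k - 1) * r)

# Doubling a test whose split-half reliability is .70
print(round(spearman_brown(0.70, 2), 3))  # 0.824
```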

20
Q

What is Inter-item reliability?

A

Consistency: estimation of content-sampling error and the heterogeneity of the domain of knowledge or behavior.
Higher homogeneity is associated with better inter-item reliability, but it must match the relative homogeneity of the construct or criterion the test is trying to measure.

21
Q

What is Kuder-Richardson formula?

A

The Kuder-Richardson formula (KR-20) estimates the reliability of a dichotomously scored test; it is influenced by the number of items, variance, and item difficulty. It yields a reliability coefficient representing the mean of all split-half reliabilities obtainable from the test.
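
A minimal KR-20 sketch (population variances; the 4-examinee, 3-item response matrix is made up):

```python
def kr20(data):
    # data: rows = examinees, columns = items scored 0/1
    n_items = len(data[0])
    n = len(data)
    totals = [sum(row) for row in data]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n  # total-score variance
    pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in data) / n  # proportion passing item j
        pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - pq / var_t)

data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(data))  # 0.75
```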

22
Q

What is Cronbach’s coefficient of alpha?

A

Inter-item reliability coefficient for tests that use a numeric response for each item (e.g., Likert-type scales).
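
A sketch of the alpha formula, alpha = (k/(k-1)) * (1 - sum of item variances / total-score variance). For dichotomous items alpha reduces to KR-20, which the made-up 0/1 data below illustrates:

```python
def cronbach_alpha(data):
    # data: rows = respondents, columns = items (numeric responses)
    k = len(data[0])
    n = len(data)
    def var(xs):
        m = sum(xs) / n
        return sum((x - m) ** 2 for x in xs) / n
    item_vars = sum(var([row[j] for row in data]) for j in range(k))
    total_var = var([sum(row) for row in data])
    return (k / (k - 1)) * (1 - item_vars / total_var)

data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(cronbach_alpha(data))  # 0.75, same as KR-20 on these dichotomous items
```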

23
Q

What is Interrater reliability?

A

Correlation between separate scorings of the same test materials. Scorer variance may systematically affect the outcomes of tests when scoring is not highly standardized or judgment comes into play.

Kappa is a measure of reliability reflecting agreement among raters adjusted for the expected level of chance agreement.

24
Q

What is Content validity?

A

Degree to which scores on a measure capture all aspects of the dimension of interest. Determined through systematic evaluation of test by experts.

The domain must be fully defined at the outset and analyzed to ensure all major elements are covered in the correct proportions. NOT equivalent to face validity (the extent to which the test merely appears to measure the construct of interest).

25
Q

What is Predictive validity?

A

Coefficient that represents the relative success of a test at predicting a previously defined criterion; a type of criterion validity.

26
Q

What is Concurrent validity?

A

Coefficient that represents the degree to which a test measures what it was intended to measure by looking at the performance on a test against a previously validated measure (concurrent criterion); associated with criterion validity.

27
Q

What is Construct Validity?

A

Degree to which a test successfully measures a theoretical construct or trait.
Used to support inferences about a dimension of interest.

28
Q

What validity techniques are important for establishing construct validity?

A

Convergent and Discriminant validity; Internal consistency; Factorial validity.

Convergent validity- two or more approaches to measurement of a trait are positively correlated.

Discriminant validity- low correlation coefficient between two similar approaches to measurement of different traits

29
Q

The ability of a test to discriminate between individuals with and without a specified condition (i.e., detect the presence or absence of a condition) is described by…?

A

Sensitivity and Specificity

Sensitivity is the ability to correctly discriminate those that have the condition.

Specificity is the ability to correctly discriminate those that DO NOT have the condition.

30
Q

SN/SP of a screening test versus confirmatory test

A

A screening test has high sensitivity and low specificity.

A confirmatory test has low sensitivity and high specificity.

31
Q

What is predictive power?

A

The likelihood that an individual who receives a particular score has (or does not have) the specified condition; dependent on the base rate of the condition.

32
Q

Describe PPP versus NPP

A

Positive predictive power (PPP) is the probability that those with abnormal test scores truly have the condition of interest: PPP = True Positives / (True Positives + False Positives), where the denominator is the total who test positive.

Negative predictive power (NPP) is the probability that people with a normal score do not have the condition: NPP = True Negatives / (True Negatives + False Negatives).
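
All four classification indices fall out of a 2x2 confusion matrix; the counts below are hypothetical:

```python
# Hypothetical counts: 100 people have the condition, 900 do not
tp, fp, fn, tn = 80, 30, 20, 870

sensitivity = tp / (tp + fn)   # 80/100 = 0.8
specificity = tn / (tn + fp)   # 870/900 ~ 0.967
ppp = tp / (tp + fp)           # 80/110 ~ 0.727
npp = tn / (tn + fn)           # 870/890 ~ 0.978
```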

33
Q

If the prevalence of a condition is very high, what happens to predictive power?

A

PPP goes up while NPP goes down if conditions are highly prevalent.

In contrast, low prevalence will cause PPP to decrease and NPP to increase.

34
Q

What is Pre-test probability versus post-test probability?

A

Pre-test probability is the estimated probability that a patient has the condition prior to testing; it depends on the base rate of the condition of interest.

Post-test probability is the probability that a patient has the condition given a positive test result.

Both relate to likelihood ratios: the likelihood ratio of a positive test, LR+ = sensitivity / (1 - specificity), should be greater than 1, while the likelihood ratio of a negative test, LR- = (1 - sensitivity) / specificity, should be between 0 and 1. Post-test probability is obtained by multiplying the pre-test odds by the likelihood ratio.
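
A sketch of the odds-form calculation (pre-test odds x likelihood ratio = post-test odds); the sensitivity, specificity, and 10% pre-test probability are hypothetical:

```python
def post_test_probability(pre_test_p, lr):
    # Convert probability to odds, apply the LR, convert back
    pre_odds = pre_test_p / (1 - pre_test_p)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

sens, spec = 0.9, 0.8
lr_pos = sens / (1 - spec)   # ~4.5, greater than 1
lr_neg = (1 - sens) / spec   # 0.125, between 0 and 1

print(round(post_test_probability(0.10, lr_pos), 3))  # 0.333
```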

35
Q

What is an ROC curve used for?

A

Receiver operating characteristic curves plot sensitivity (Y axis) against 1 - specificity (X axis); they visualize the performance of a test and help determine the optimal cut-off score for prediction of binary outcomes.
The chosen cut-off should exclude the most people without the condition without missing an unacceptable number of people who have the condition.

36
Q

What is standard error of measurement?

A

SEM is the error variance around a true score, i.e., a measure of variability of scores obtained on a test relative to the “true scores.” SEM is equal to the standard deviation (SD) of the random errors around the true score.
A test with high reliability has a low standard error of measurement.
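
The standard formula is SEM = SD * sqrt(1 - r), where r is the test's reliability; the SD and reliability values below are illustrative:

```python
import math

def sem(sd, reliability):
    # SEM = SD * sqrt(1 - reliability)
    return sd * math.sqrt(1 - reliability)

# Higher reliability -> smaller SEM (index-score metric, SD = 15)
print(round(sem(15, 0.95), 2))  # 3.35
print(round(sem(15, 0.70), 2))  # 8.22
```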

37
Q

What is standard error of estimate?

A

SEE is a measure of variability in predicted scores, based on the obtained score without requiring knowledge of the true score. It can help interpret test performance by constructing a confidence interval around the estimated criterion score.

SEE is larger when the SD is large or the validity coefficient is small.

38
Q

What is a confidence interval?

A

Interval around a statistic (i.e., observed score, sample mean) that reflects the expected sample-to-sample variability, usually expressed in SD units or percentages.

It allows for an estimate of uncertainty of an obtained test score based on 1) properties of the test and 2) properties of the normative sample.
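
One common, simplified way to build such an interval is to band the obtained score by z x SEM (some methods instead center the interval on an estimated true score); the numbers below are illustrative:

```python
import math

def score_ci(score, sd, reliability, z=1.96):
    # 95% CI (z = 1.96) around an obtained score, using the SEM
    sem = sd * math.sqrt(1 - reliability)
    return (score - z * sem, score + z * sem)

lo, hi = score_ci(100, 15, 0.9)
print(round(lo, 1), round(hi, 1))  # 90.7 109.3
```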

39
Q

What is Standardization?

A

Process by which data from a group of individuals intended to represent a population of interest (either healthy or clinical populations) are collected and analyzed. Adequate standardization improves the reliability and validity of test results.

40
Q

What is the central limit theorem?

A

If n independent variates have finite variances, their standardized sum tends toward a normal distribution as n approaches infinity.

41
Q

What is profile analysis?

A

Plotting standardized scores on a battery of tests (a graph or profile) and making inferences about cognitive functioning and probable diagnosis on the basis of the pattern in test performance.

42
Q

What is Discriminant analysis?

A

Multivariate technique used to describe differences between two or more groups on a set of measures and to classify subjects into groups. One can use a score profile to determine whether an individual belongs to a group of interest (e.g., a specific diagnosis) or not.

43
Q

How are regression equations useful in test interpretation?

A

Can estimate premorbid level of functioning based on demographic and specific test performances, account for demographic characteristics, or assess change in an individual’s functioning.

44
Q

What is Type I versus Type II error?

A

Type I error is a false positive (rejecting a true null hypothesis), incorrectly indicating the presence of a trait/condition when none exists. Alpha is the probability of making a Type I error.

Type II error is a false negative (failing to reject a false null hypothesis), incorrectly ruling out a trait/condition when it actually exists. Beta is the probability of making a Type II error.

45
Q

What is History effect?

A

Threat to internal validity when events occur between the pre-test and the post-test that could affect participants in such a way as to impact the dependent variable.

46
Q

What is Maturation effect?

A

Threat to internal validity when changes are seen in subjects’ test performance because of the time that has elapsed since first test.

47
Q

What is Instrumentation?

A

Threat to internal validity when measurements are not accurate or procedures are not standardized.

48
Q

What is multiple correlation?

A

Correlation coefficient that expresses the relationship between the criterion scores and an additive combination of predictor scores. Expressed as R.

49
Q

What is partial correlation?

A

A measure of the relationship between two variables that exists after accounting for the relationships of a third measure to each of the two variables.

50
Q

In normal distribution, what percentage of scores fall between -1 and 1 SD, -2 and 2 SD, and -3 and 3 SD?

A

68% between -1 and 1
95% between -2 and 2
99.7% between -3 and 3
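
These percentages (the 68-95-99.7 rule) can be checked from the normal CDF, since P(|Z| < k) = erf(k / sqrt(2)):

```python
import math

def within(k):
    # Fraction of a normal distribution within k SDs of the mean
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * within(k), 1))  # 68.3, 95.4, 99.7
```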

51
Q

Item difficulty versus Item discrimination

A

Item difficulty index (p) is determined by dividing the number who answered the item correctly by the total number in the sample; values range from 0 to 1.

Item discrimination is the extent to which an item differentiates those who obtain high versus low scores. D = U - L, where U is the proportion of the upper-scoring group answering the item correctly and L is the proportion of the lower-scoring group; D values range from -1 to +1, with .35 or higher considered acceptable.
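
A sketch of both indices for one item, using made-up 0/1 responses for equal-sized upper and lower groups:

```python
# 1 = correct, 0 = incorrect, for upper- and lower-scoring groups on one item
upper = [1, 1, 1, 1, 0]
lower = [1, 0, 0, 0, 0]

# Difficulty p: proportion of the whole sample answering correctly
p = (sum(upper) + sum(lower)) / (len(upper) + len(lower))
# Discrimination D = U - L (proportion correct in upper minus lower group)
d = sum(upper) / len(upper) - sum(lower) / len(lower)
print(p, round(d, 2))  # 0.5 0.6
```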

52
Q

What is norm-referenced interpretation versus criterion-referenced interpretation?

A

Norm referenced uses percentile ranks, standard scores, and/or age/grade equivalents

Criterion referenced requires prespecified standard; uses percentage, regression equation, and/or expectancy table