Psychological Testing Flashcards

1
Q

*What is a raw score?

A

The most basic level of information provided by a psychological test.

2
Q

*What are the defining features for most tests?

A

It has a standardized procedure, it is designed to predict something, it samples behavior, and it yields a score or category result.

3
Q

*What is the difference between norm-referenced and criterion-referenced tests?

A

A criterion-referenced test asks whether or not a standard has been met, but is not interested in comparing you to anyone else. A norm-referenced test asks how you compare to the population (based on a representative sample).

4
Q

*What is the difference between testing and assessment?

A

Assessment is a more comprehensive term, referring to the entire process of compiling information about a person and using it to predict behavior. It can be defined as appraising or estimating the magnitude of one or more attributes in a person.

5
Q

*How is the information obtained from an individually administered test different from that obtained on a group test?

A

The information from individual tests includes the motivation of the subject, and the examiner can assess the relevance of other factors.

6
Q

*What are the responsibilities of test publishers?

A

They should be familiar with APA guidelines and ensure the test has appropriate psychometric soundness. Finally, they should ensure the competence of test purchasers.

7
Q

*What are the three categories used by APA and many test publishers regarding examiner qualifications?

A

Levels A, B, and C.
Level A tests require minimal training and are usually paper-and-pencil.
Level B tests require training in statistics and knowledge of test construction; some graduate training is needed. E.g., aptitude tests and personality inventories.
Level C includes the most complex instruments and requires a master's degree. E.g., projective tests and individual tests of intelligence.

8
Q

*What are the responsibilities of test users?

A

A test user should (1) be aware of ethical guidelines, (2) have expertise in using the test, (3) obtain informed consent, (4) stop using obsolete tests as soon as possible, and (5) properly communicate test results. Any exceptions to the communication of results must be outlined in the informed consent.

9
Q

*What are the strengths and weaknesses of the testing used in China from 2200BC through 1906AD?

A

Strengths: Incorporated relevant selection criteria, like penmanship.
Weaknesses: Unnecessarily grueling, failed to validate their selection procedures.

10
Q

*How did Wundt contribute to the history of testing?

A

Founded the first psychological laboratory. Had people observe a pendulum to study the speed of thought.

11
Q

*How did Galton contribute to the history of testing?

A

Galton studied individual differences when everyone else was looking at sameness. He also had the ideas for correlations and standard deviations. He believed that genetics influenced everything.

12
Q

*How did Cattell contribute to the history of testing?

A

Cattell assessed intelligence using brass instruments. He used hand squeezes and feeling two points tests, because he believed people with the best sensory perception skills would be the smartest.

13
Q

*How did Esquirol contribute to the history of testing?

A

Esquirol drew a distinction between emotional problems (which have an onset) and intellectual disability (which is lifelong). He diagnosed people and classified levels of severity. Although way ahead of his time, he relied only on language to make mental distinctions; current methods aren't so reliant on verbal measures.

14
Q

*How did Binet contribute to the history of testing?

A

At the request of the French government, Binet developed a mental test to identify which children needed special help in school. This was an important moment because it recognized that some children had different needs and required a valid identification method. Binet had a strong research background.

The 1905 Binet-Simon Scale was the first legitimate test of intelligence. It had many verbal questions, as well as digit span. The score was a mental age.

15
Q

*How did Goddard contribute to the history of testing?

A

Goddard tested the intelligence of immigrants, but didn’t consider cultural background.

16
Q

*How did Hollingworth contribute to the history of testing?

A

Leta Hollingworth coined the term “gifted.”

17
Q

*How did Terman contribute to the history of testing?

A

He developed the Stanford-Binet (currently the SB5). He was well known for his work with gifted children. He was the one who called the score an IQ.

18
Q

*How did Yerkes contribute to the history of testing?

A

Yerkes created the Army Alpha and Army Beta, which marked the birth of group testing.

19
Q

*How did Otis contribute to the history of testing?

A

Created the multiple-choice question format.

20
Q

*How did Rorschach contribute to the history of testing?

A

Rorschach gave us the inkblots and much of projective testing. Eventually, this led to sentence-completion and projective drawing tests.

21
Q

*How did the 1916 Stanford-Binet differ from the 1905 Binet Simon Scale?

A

Terman multiplied the quotient by 100. The number of items increased to 90 (from 30 in 1905), and the new scale was suitable for children and adults at levels from inferior to superior. The norm sample was representative.

22
Q

*How did the needs of the American military in WWI and WWII influence psychological testing?

A

It brought about a need for group testing.

23
Q

*What is the importance of Woodworth’s Personal Data Sheet in the history of testing?

A

Basically, all personality tests descend from Woodworth's Personal Data Sheet. It was all yes-or-no questions and essentially led to the MMPI.

24
Q

*What is the impact of evidence-based practice and outcomes assessment on the field of testing?

A
  • A need for validated instruments for evaluating treatments
  • Not instruments that examine global personality factors, but ones that focus on particular symptoms or diagnoses
  • E.g., a quick instrument for assessing PTSD: is my client getting better or worse?
25
Q

*What is the norm group and why is it so important in testing?

A

A norm group consists of a sample of examinees who are representative of the population for whom the test is intended. We need to know where someone falls in relation to the distribution.

26
Q

*What are the three measures of central tendency and their advantages and disadvantages?

A

Mean, median, and mode. The mean uses every score but is pulled toward extreme values; the median is unaffected by outliers, so if the results are skewed, use the median; the mode works even for nominal data but can be unstable.
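
The skew advice can be sketched in Python (the scores below are invented for illustration):

```python
# Mean vs. median on a right-skewed set of scores.
scores = [10, 11, 12, 12, 13, 14, 95]  # one extreme high score

mean = sum(scores) / len(scores)  # 167 / 7, pulled upward by the outlier

def median(xs):
    """Middle value of the sorted scores (average of the two middle
    values when the count is even)."""
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

print(round(mean, 2))  # 23.86
print(median(scores))  # 12 — unaffected by the extreme score
```

The single outlier drags the mean far above where most scores sit, while the median stays put; that is why the median is preferred for skewed distributions.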

27
Q

*What is the standard deviation?

A

The standard deviation reflects the degree of dispersion in a group of scores. If the scores are tightly packed, the SD is small. If the distribution is wide, the SD is large.

28
Q

*What are the advantages and disadvantages of using percentiles?

A

Percentiles are easy for laymen and experts to understand, and they give the score in relation to the population. However, equal percentile differences do not correspond to equal raw-score differences, because scores bunch up near the middle of the distribution.
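
A minimal percentile-rank sketch in Python (the norm-group scores are hypothetical). One common definition counts the percentage of reference-group scores falling below the obtained score:

```python
def percentile_rank(score, scores):
    """Percent of scores in the reference group that fall below `score`."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

# Hypothetical norm group of 10 examinees.
norm_group = [55, 60, 65, 70, 70, 75, 80, 85, 90, 95]
print(percentile_rank(80, norm_group))  # 60.0 — six of ten scores fall below 80
```

Note the caveat from the card: a five-point percentile gain near the middle of the distribution spans far less raw-score distance than the same gain in the tails.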

29
Q

*How are standard scores used in testing?

A

A standard score expresses the distance from the mean in standard-deviation units (e.g., -0.50, 2.50).

30
Q

*What is the mean and standard deviation for z scores?

A

A standard score is also called a z score.

M = 0 and SD = 1
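
A z score is just a linear rescaling of the raw score; a one-function Python sketch (the IQ-style values are illustrative):

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in standard-deviation units."""
    return (raw - mean) / sd

# On an IQ-style scale (M = 100, SD = 15):
print(z_score(115, 100, 15))  # 1.0  (one SD above the mean)
print(z_score(85, 100, 15))   # -1.0 (one SD below the mean)
```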

31
Q

*What is the mean and standard deviation for T scores?

A

A T score is also a standard score.

M = 50 and SD = 10

32
Q

*What is the mean and standard deviation for IQ scores?

A

M = 100 and SD = 15

33
Q

*What is the mean and standard deviation for CEEB scores?

A

M = 500 and SD = 100

34
Q

*What are stanines?

A

A stanine ("standard nine") converts all scores to a single digit ranging from 1 to 9, with M = 5 and SD = 2. The single digit allowed a score to fit in one column of a keypunched card.
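
All of these standard-score families are linear transformations of z, which a short Python sketch makes explicit (stanines additionally round to a single digit and clamp to the 1-9 range; the example values are illustrative):

```python
def from_z(z, mean, sd):
    """Rescale a z score onto another standard-score metric."""
    return mean + sd * z

def stanine(z):
    """Stanine: M = 5, SD = 2, rounded and clamped to 1-9."""
    return max(1, min(9, round(5 + 2 * z)))

z = 1.5  # one and a half SDs above the mean
print(from_z(z, 50, 10))    # T score: 65.0
print(from_z(z, 100, 15))   # IQ: 122.5
print(from_z(z, 500, 100))  # CEEB: 650.0
print(stanine(z))           # 8
```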

35
Q

*What considerations are involved in selecting a norm group?

A

The sample should be representative. Sampling is usually a combination of random sampling and stratified random sampling.

36
Q

*What are the characteristics of criterion-referenced tests compared with norm-referenced tests?

A

Purpose: compare an examinee's performance to a standard vs. to other examinees.
Item content: narrow domain of skills with real-world relevance vs. broad domain with indirect relevance.
Item selection: most items of similar difficulty vs. items of varying difficulty.
Interpretation of scores: scores usually expressed as a percentage with a predetermined passing level vs. scores usually expressed as a standard score, percentile, or grade equivalent.

37
Q

*What is reliability?

A

Consistency

38
Q

*What are the sources of measurement error?

A

Systematic and unsystematic.

39
Q

*What class is an aptitude test in?

A

Level B

40
Q

*What class is an individual intelligence test in?

A

Level C

41
Q

*What is a psychometrician?

A

A specialist in psychology or education who develops and evaluates psychological tests.

42
Q

*What is the definition of a test?

A

A standardized procedure for sampling behavior and describing it with categories or scores.

43
Q

*What is a standardized procedure?

A

One in which the test is administered uniformly from one examiner and setting to another.

44
Q

*What is a norm?

A

A summary of test results for a large and representative group of subjects.

45
Q

*What is a standardization sample?

A

The sample for the norm, which must be representative of the population for whom the test is intended or else it is not possible to determine an examinee’s relative standing.

46
Q

*What is a norm-referenced test?

A

The performance of each examinee is interpreted in reference to a relevant standardization sample.

47
Q

*What is a criterion-referenced test?

A

A test where the objective is to determine where the examinee stands with respect to very tightly defined educational objectives.

48
Q

*What is the difference between group tests and individual tests?

A

Group tests are usually paper-and-pencil measures, while individual tests are instruments that by their design and purpose must be administered one on one.

49
Q

*Why is it important to establish rapport?

A

Rapport gives the testing environment a comfortable, warm atmosphere that motivates examinees and elicits their cooperation. Failure to establish rapport may result in test anxiety.

50
Q

*What are the important components of informed consent?

A

The test taker should be made aware of the following using language they understand:

  1. The reasons for testing
  2. The type of test to be used
  3. The intended use and range of material consequences of the intended use
  4. What will be released afterward
51
Q

*What is standard of care?

A

Usually, the customary or reasonable care taken in the profession. Beware of obsolete tests.

52
Q

*What is the stereotype threat?

A

The threat of confirming, as a self-characteristic, a negative stereotype about one's group.

53
Q

*Physiognomy

A

The notion that we can judge the inner character of a person based on external appearance.

54
Q

*What is an age norm?

A

Depicts the level of test performance for each separate age group in the normative sample.

55
Q

*What are local and subgroup norms?

A

Local norms are derived from representative local examinees, as opposed to a national sample. Similarly, a subgroup norm consists of scores obtained from an identified subgroup instead of a diverse national sample.

56
Q

*What is an expectancy table?

A

A table that portrays the established relationship between test scores and expected outcome on a relevant task.

57
Q

*Classical theory of measurement

A

Test scores are influenced by two kinds of factors: factors of consistency (the true score) and factors of inconsistency (measurement error). The examiner wants to measure the factors of consistency.

58
Q

*Unsystematic measurement error

A

Arises from item selection, test administration, and test scoring; its effects are unpredictable and inconsistent.

59
Q

*Systematic measurement error

A

The test consistently measures something it was not designed to measure.

60
Q

*Correlation coefficient

A

r = the degree of linear relationship between two variables.

61
Q

*Reliability coefficient

A

rxx = the ratio of true-score variance to the total variance of test scores.

62
Q

*Coefficient alpha

A

The mean of all possible split-half coefficients
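
In practice that definition is computed directly from item variances: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). A Python sketch with made-up data:

```python
def cronbach_alpha(items):
    """Coefficient alpha. `items` is a list of item-score lists,
    one inner list per item, each with one entry per examinee."""
    k = len(items)     # number of items
    n = len(items[0])  # number of examinees

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Each examinee's total score across all items.
    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(var(it) for it in items) / var(totals))

# Hypothetical data: 3 items scored for 4 examinees.
items = [[1, 2, 3, 4], [1, 2, 3, 3], [2, 2, 3, 4]]
print(round(cronbach_alpha(items), 2))  # 0.96
```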

63
Q

*Spearman Brown

A

Fewer items mean lower reliability. The Spearman-Brown formula estimates how reliability changes when a test is shortened or lengthened.
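
The prophecy formula quantifies this: if the current reliability is r and the test's length is multiplied by a factor k, the predicted reliability is kr / (1 + (k - 1)r). A Python sketch with illustrative values:

```python
def spearman_brown(r, factor):
    """Predicted reliability when test length is multiplied by `factor`."""
    return factor * r / (1 + (factor - 1) * r)

r = 0.80  # reliability of the current test
print(round(spearman_brown(r, 2), 3))    # 0.889 — doubling the length raises it
print(round(spearman_brown(r, 0.5), 3))  # 0.667 — halving the length lowers it
```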

64
Q

*Restriction of range

A

When the sample covers only a narrow range of scores, the correlation coefficient (and thus the test-retest reliability estimate) is depressed.

65
Q

What is standard error of measurement?

A

Theoretically, if the subject took the test many times, the resulting scores would form a normal curve. The standard deviation of that curve is the SEM.

66
Q

What is a confidence interval?

A

A confidence interval expresses, as a percentage, the confidence we have that the true score falls within a certain range around the obtained score, based on multiples of the SEM.
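
Both ideas can be sketched together in Python, using the standard formula SEM = SD * sqrt(1 - rxx) and a 95% interval (z = 1.96); the scale values are illustrative:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - rxx)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(score, sd, reliability, z=1.96):
    """Interval around an obtained score; z = 1.96 gives ~95% confidence."""
    e = z * sem(sd, reliability)
    return (score - e, score + e)

# IQ-style scale (M = 100, SD = 15) with reliability rxx = .91:
print(round(sem(15, 0.91), 2))  # 4.5
lo, hi = confidence_interval(110, 15, 0.91)
print(round(lo, 2), round(hi, 2))  # 101.18 118.82
```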

67
Q

What is the standard error of the difference?

A

A statistical measure that can help a test user determine whether the difference between scores is significant. It is usually used for sub-scores on a test.
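
When the two scores sit on the same scale, a common formula is SEdiff = SD * sqrt(2 - r11 - r22), where r11 and r22 are the two reliabilities. A Python sketch with illustrative values:

```python
import math

def se_difference(sd, r1, r2):
    """Standard error of the difference between two scores on the same scale."""
    return sd * math.sqrt(2 - r1 - r2)

# Two subtests on an SD = 15 scale with reliabilities .91 and .84:
sed = se_difference(15, 0.91, 0.84)
print(round(sed, 2))         # 7.5
print(round(1.96 * sed, 1))  # 14.7 — roughly the gap needed at the .05 level
```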

68
Q

Ch4: What is validity?

A

Does the test measure what it claims to measure?

A test is valid to the extent that inferences made from it are appropriate, meaningful, and useful.

69
Q

What is the relationship between validity and reliability?

A

If a test is not reliable, it cannot be valid. However, a reliable test can still be invalid: something can be consistently bad. (You have to understand the relationship between reliability and validity.)

70
Q

What do we mean by a continuum of validity?

A

Validity cannot be captured in statistical summaries, instead it is on a continuum ranging from weak to acceptable to strong, based on the three types of validity evidence.

71
Q

What are the three categories of accumulating validity evidence?

A

Content validity
Criterion-related validity
Construct validity

An ideal validation includes several types of evidence in all three categories.

72
Q

*What is face validity?

A

For one thing, it's not actually validity. It's how the test looks to examinees. It's important because it can affect a person's approach to the test. It's loosely related to content validity.

73
Q

What is content validity?

A

Content validity is determined by the degree to which the questions, tasks, or items on a test are representative of the universe of behavior the test was designed to sample.

Item sampling - do the items on the test fit the content you want to test?
Types of skills - recall or recognition; STM or LTM.

Especially useful when a great deal is known about the construct.

74
Q

What is criterion-related validity?

A

The test score is compared to an outcome measure (the criterion). The criterion can be concurrent, e.g., people take a new IQ test and an established IQ test at the same time. The criterion can also be predictive, as in college readiness tests and employment tests.

75
Q

What makes a good criterion for criterion-related validity?

A

RELIABLE - consistency of scores.

APPROPRIATE - Obvious in principle, but sometimes tricky in practice. Should the criterion measure of an aptitude test indicate satisfaction, success, or continuance in the activity?

FREE FROM CONTAMINATION BY THE TEST - Criterion contamination occurs when knowledge of test scores influences the criterion: I want to see whether the test is useful, but you already used the test to determine whom to hire.
The criterion can also be contaminated by overlapping content; e.g., if both the test and the criterion ask about eating habits and sleeping habits, the correlation will be artificially inflated.

76
Q

What is decision theory?

A

The purpose of psychological testing is not measurement for its own sake, but measurement in the service of decision making.
Making decisions based on test scores results in a matrix of outcomes, with hits and misses (including false positives and false negatives). You have to determine where you want your mistakes to be.

77
Q

What is construct validity?

A

A construct is a theoretical, intangible quality or trait in which individuals differ. Construct validity is theory based: Based on my understanding of this particular construct, what would I expect to see in a test?

No criterion or universe of content is accepted as entirely adequate to define the quality to be measured, so a variety of evidence is required to establish construct validity.

78
Q

What is test homogeneity?

A

A measure of construct validity.

Does it measure a single construct?

If my theory says this is a unitary construct, and internal-consistency analysis suggests the test measures just one construct, that supports homogeneity. But the test could be measuring one thing and still not the right thing.

79
Q

What are appropriate developmental changes?

A

A measure of construct validity.

Is my construct something that changes as people age?

Egocentrism is an example: scores should go down as kids get older.

80
Q

What are theory-consistent group differences?

A

A measure of construct validity.

Can we predict who will have high and low scores for this construct?

Different rates of extroversion in different professions. Nuns are high in social interest. Models and criminals are low in social interest.

81
Q

What are theory-consistent intervention effects?

A

A measure of construct validity.

Does the construct change in the appropriate direction after intervention/treatment?

People’s scores of spatial orientation should increase after training, more than those who did not receive training.

82
Q

What is convergent and discriminant validation?

A

A measure of construct validity.

What should it correlate with and what should it be different from?

Intelligence and social interest are theoretically unrelated.
Anxiety and eating disorders overlap.

83
Q

What is factor analysis?

A

A measure of construct validity.

How many factors are you actually measuring?

If you think you’re measuring three factors, and a factor analysis shows three factors, that’s a good sign.

84
Q

What is classification accuracy?

A

A measure of construct validity.

How well does it give accurate identification of test takers? Test makers strive for high levels of:
SENSITIVITY: Accurate identification of patients who have a syndrome.
SPECIFICITY: Accurate identification of normal patients.

These are measured by percentages.
Sensitivity: 79% (correctly identifies 79% of affected individuals)
Specificity: 83% (correctly identifies 83% of unaffected individuals).
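
These percentages come straight from the counts in a 2x2 classification table; a small Python sketch using hypothetical counts that match the figures above:

```python
def sensitivity(true_pos, false_neg):
    """Proportion of affected individuals the test correctly flags."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of unaffected individuals the test correctly clears."""
    return true_neg / (true_neg + false_pos)

# Hypothetical validation sample: 100 affected and 100 unaffected examinees.
print(sensitivity(79, 21))  # 0.79 — 79 of 100 affected correctly identified
print(specificity(83, 17))  # 0.83 — 83 of 100 unaffected correctly identified
```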

85
Q

What are extravalidity concerns?

A

Side effects and unintended consequences of testing.

86
Q

What are some of the unintended side effects of testing?

A

AKA Extravalidity concerns.

Children who are identified may feel unusual or dumb. There can be legal consequences. Along with traditional validity, tests should also be evaluated for (1) values in interpretation, (2) usefulness in a particular application, and (3) potential and actual social consequences.

87
Q

What does NOIR stand for?

A

Nominal
Ordinal
Interval
Ratio

88
Q

What is a nominal scale?

A

Where the scales are simply categories, without any inherent order.

Male = 1, Female = 2.

89
Q

What is an ordinal scale?

A

A scale with categories following a specific order, but the distance between the categories is variable.

Freshman, Sophomore, Junior, Senior.
Ranking something from most liked to least liked

90
Q

What is an interval scale?

A

A scale in which the units have an order and equal distance between each unit. It does not possess an absolute zero.

A Likert scale is considered an interval scale for statistical purposes.

91
Q

What is a ratio scale?

A

A ratio scale is rare in psychological measurement. A scale with an absolute 0, which also allows for categorization, ranking, and intervals.

92
Q

What are some scaling methods? Which ones are best?

A

“No single scaling method is uniformly better than the others.”

Expert Ranking
Likert scales
Empirical keying
Rational scale construction

93
Q

What’s an example of expert ranking?

A

The Glasgow Coma Scale

How would experts rank each of these responses?

94
Q

What are methods of absolute scaling?

A

A procedure for obtaining a measure of absolute item difficulty based on different age groups of test takers.