Psychological Testing #2 Flashcards
*Restriction of range
When the sample's scores span only a narrow range, correlation-based reliability estimates such as test-retest come out artificially low, because restricted variability shrinks correlations.
What is standard error of measurement?
Theoretically, if the subject took the test many times, their various scores would form a normal curve. The standard deviation of that hypothetical distribution of scores is the standard error of measurement (SEM).
What is a confidence interval?
A confidence interval is the range around an obtained score within which we are confident the true score falls. It is built from multiples of the SEM and stated as a percentage (e.g., a 95% confidence interval).
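A minimal sketch of both ideas in Python, assuming the standard formulas SEM = SD × √(1 − reliability) and CI = score ± z × SEM; the test values are made up for illustration:

```python
import math

# Standard error of measurement: SEM = SD * sqrt(1 - reliability)
sd = 15             # hypothetical test standard deviation
reliability = 0.91  # hypothetical reliability coefficient
sem = sd * math.sqrt(1 - reliability)  # = 4.5

# 95% confidence interval around an obtained score: score +/- 1.96 * SEM
score = 110
lower, upper = score - 1.96 * sem, score + 1.96 * sem
print(f"SEM = {sem:.1f}; 95% CI = [{lower:.1f}, {upper:.1f}]")
```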
What is the standard error of the difference?
A statistical measure that can help a test user determine whether the difference between scores is significant. It is usually used for sub-scores on a test.
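A short sketch using the standard formula SEdiff = √(SEM₁² + SEM₂²); the subtest reliabilities and observed difference are hypothetical:

```python
import math

# SEdiff = sqrt(SEM1^2 + SEM2^2)
sd = 15
r1, r2 = 0.90, 0.85   # hypothetical reliabilities of two subtests
sem1 = sd * math.sqrt(1 - r1)
sem2 = sd * math.sqrt(1 - r2)
se_diff = math.sqrt(sem1**2 + sem2**2)  # = 7.5 here

# A difference larger than ~1.96 * SEdiff is significant at the .05 level
diff = 12  # observed difference between two sub-scores
print(f"SEdiff = {se_diff:.1f}; significant: {abs(diff) > 1.96 * se_diff}")
```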
Ch4: What is validity?
Does the test measure what it claims to measure?
A test is valid to the extent that inferences made from it are appropriate, meaningful, and useful.
What is the relationship between validity and reliability?
If a test is not reliable, it's not going to be valid. However, a reliable test can still be invalid: something can be consistently bad. (You have to understand that reliability is necessary but not sufficient for validity.)
What do we mean by a continuum of validity?
Validity cannot be captured in a single statistical summary; instead, it falls on a continuum ranging from weak to acceptable to strong, based on the three categories of validity evidence.
What are the three categories of accumulating validity evidence?
Content validity
Criterion-related validity
Construct validity
An ideal validation includes several types of evidence in all three categories.
What is face validity?
Well, for one, it's not actually validity. It's how the test looks to examinees. It's important because it can affect a person's approach to the test. It's loosely related to content validity.
What is content validity?
Content validity is determined by the degree to which the questions, tasks, or items on a test are representative of the universe of behavior the test was designed to sample. Especially useful when a great deal is known about the construct.
Item sampling (behavior) - Do the items on the test fit the content you want to test? If I'm testing 4th-grade math level and I include skills that aren't taught until 5th grade, that's poor content validity.
Types of skills (responses) - Multiple choice or open-ended?
“Expert review” is often the choice of evidence.
What is criterion-related validity?
The test score is compared to an outcome measure (criterion). The criterion can be concurrent, e.g., people take a new IQ test and an established IQ test at the same time. The criterion can also be predictive, as in college readiness tests and employment tests.
What makes a good criterion for criterion-related validity?
RELIABLE - consistency of scores.
APPROPRIATE - Well duh, but actually sometimes this can be tricky. Should the criterion measure of an aptitude test indicate satisfaction, success, or continuance in the activity?
FREE FROM THE CONTAMINATION OF THE TEST - This becomes a problem when your criterion is contaminated by the test score itself: I want to see if the test is useful, but you already used the test to decide whom to hire. The criterion can also be contaminated by overlap between questions; e.g., if both the test and the criterion ask about eating habits and sleeping habits, the shared items will artificially inflate the correlation.
What is decision theory?
The purpose of psychological testing is not measurement for its own sake, but measurement in the service of decision making.
Making decisions based on test scores results in a matrix of outcomes: hits (true positives and true negatives) and misses (false positives and false negatives). You have to determine where you can best afford your mistakes to be.
What is construct validity?
A construct is a theoretical, intangible quality or trait in which individuals differ. Construct validity is theory based: Based on my understanding of this particular construct, what would I expect to see in a test?
No criterion or universe of content is accepted as entirely adequate to define the quality to be measured, so a variety of evidence is required to establish construct validity.
What is test homogeneity?
A measure of construct validity.
Does it measure a single construct?
If my theory says this is a unitary construct, then an internal-consistency analysis should show the items hanging together as a single construct. But beware: a homogeneous test is measuring one thing, and that one thing might not be the right thing.
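A minimal sketch of checking homogeneity with an internal-consistency coefficient (Cronbach's alpha); the item data are fabricated so that all items reflect one underlying trait:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_subjects, n_items) matrix of item scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

rng = np.random.default_rng(0)
trait = rng.normal(size=200)                   # one underlying construct
items = trait[:, None] + rng.normal(scale=0.8, size=(200, 5))
print(f"alpha = {cronbach_alpha(items):.2f}")  # high alpha -> homogeneous test
```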
What are appropriate developmental changes?
A measure of construct validity. Is my construct something that changes as people age? Egocentrism, for example: the scores should go down as kids get older.
What are theory-consistent group differences?
A measure of construct validity. Can we predict who will have high and low scores for this construct? Different rates of extroversion in different professions. Nuns are high in social interest; models and criminals are low in social interest.
What are theory-consistent intervention effects?
A measure of construct validity. Does the construct change in the appropriate direction after intervention/treatment? People's spatial-orientation scores should increase after training, more than the scores of those who did not receive training.
What is convergent and discriminant validation?
A measure of construct validity. What should the test correlate with, and what should it be different from? Intelligence and social interest are theoretically unrelated; anxiety and eating disorders overlap.
What is factor analysis?
A measure of construct validity. How many factors are you actually measuring? If you think you're measuring three factors and a factor analysis shows three factors, that's a good sign.
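A rough sketch using scikit-learn's FactorAnalysis (assuming scikit-learn is available); the item data are fabricated from three latent factors, so a three-factor solution should fit well:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
factors = rng.normal(size=(300, 3))      # three latent traits
loadings = rng.normal(size=(3, 12))      # each of 12 items loads on them
items = factors @ loadings + rng.normal(scale=0.5, size=(300, 12))

fa = FactorAnalysis(n_components=3).fit(items)
# If three factors reproduce the item covariances well, that supports
# a theory that predicted three factors.
print(fa.components_.shape)  # (3, 12): loading of each factor on each item
```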
What is classification accuracy?
A measure of construct validity.
How well does it give accurate identification of test takers? Test makers strive for high levels of:
SENSITIVITY: Accurate identification of patients who have a syndrome.
SPECIFICITY: Accurate identification of normal patients.
These are reported as percentages. Sensitivity: 79% (correctly identifies 79% of affected individuals). Specificity: 83% (correctly identifies 83% of unaffected individuals).
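A small sketch computing both rates from the decision matrix of hits and misses; the counts are invented to match the percentages above:

```python
# Decision matrix counts (hypothetical screening test)
true_pos  = 79   # affected, test says affected    (hit)
false_neg = 21   # affected, test says normal      (miss)
true_neg  = 83   # unaffected, test says normal    (hit)
false_pos = 17   # unaffected, test says affected  (miss)

sensitivity = true_pos / (true_pos + false_neg)  # 0.79
specificity = true_neg / (true_neg + false_pos)  # 0.83
print(f"sensitivity = {sensitivity:.0%}, specificity = {specificity:.0%}")
```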
What are extravalidity concerns?
Side effects and unintended consequences of testing.
What are some of the unintended side effects of testing? (AKA extravalidity concerns.)
Children identified may feel unusual or dumb. There can be legal consequences.
How do we prevent extravalidity problems?
Along with traditional validity, the test should also be evaluated for (1) values in interpretation, (2) usefulness in the particular application, and (3) potential and actual social consequences.
What does NOIR stand for?
Nominal
Ordinal
Interval
Ratio
What is a nominal scale?
Where the scores are simply categories, without any inherent order.
Male = 1, Female = 2.
What is an ordinal scale?
A scale with categories following a specific order, but the distance between the categories is variable.
Freshman, Sophomore, Junior, Senior.
Ranking something from most liked to least liked
What is an interval scale?
A scale in which the units have an order and equal distance between each unit. It does not possess an absolute 0. A Likert scale is considered an interval scale for statistical purposes.
What is a ratio scale?
A ratio scale is rare in psychological measurement. A scale with an absolute 0, which also allows for categorization, ranking, and intervals.
What are some scaling methods? Which ones are best?
"No single scaling method is uniformly better than the others." Expert Ranking Likert scales Guttman scales Empirical keying Rational scale construction
What’s an example of expert ranking?
The Glasgow Coma Scale
How would experts rank each of these responses?
What are methods of absolute scaling?
A procedure for obtaining a measure of absolute item difficulty based on different age groups of test takers. You don't want questions to be bunched around certain ages, leaving gaps at others.
What is empirical keying?
You develop a long list of questions, try them out on contrasting groups (depressed/not depressed, delinquents/non-delinquents), and see whether the groups answer the questions differently.
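A minimal sketch of the idea: compare endorsement rates for each candidate item across the contrasting groups and keep the items that separate them, regardless of what the items appear to measure. All data here are fabricated:

```python
import numpy as np

rng = np.random.default_rng(2)
n_items = 20
# Fraction of each group endorsing each item (fabricated tryout data)
depressed = rng.uniform(0.2, 0.9, size=n_items)
controls  = rng.uniform(0.2, 0.9, size=n_items)

# Keep items whose endorsement rates differ substantially between groups
keyed = [i for i in range(n_items) if abs(depressed[i] - controls[i]) > 0.25]
print(f"items retained for the scale: {keyed}")
```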
What is the heart of the method of rational scaling?
That all the scale items correlate positively with each other and also with the total score for the scale. The questions need to correlate with each other, or we won’t keep them.
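A small sketch of that check: correlate each item with the total score (here, the total minus the item itself) and drop items that don't correlate positively. The data are fabricated, with one deliberately unrelated item:

```python
import numpy as np

rng = np.random.default_rng(3)
trait = rng.normal(size=150)
items = trait[:, None] + rng.normal(scale=1.0, size=(150, 8))
items[:, 7] = rng.normal(size=150)   # one item unrelated to the construct

for i in range(items.shape[1]):
    # Corrected item-total correlation: item vs. total excluding the item
    rest = items.sum(axis=1) - items[:, i]
    r = np.corrcoef(items[:, i], rest)[0, 1]
    print(f"item {i}: r = {r:+.2f}  {'keep' if r > 0.2 else 'drop'}")
```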
What are the initial questions of test construction?
Range of difficulty
Item format
Item difficulty
Item-discrimination
How would range of difficulty be different for different types of tests?
Norm-referenced tests would have a greater range of difficulty, because we want to know who the outliers are.
Criterion-referenced tests would be more restricted, because no one cares if you're in the 99th percentile of drivers on your driving test.
What are some examples of item format and what are their strengths and weaknesses?
Multiple-choice questions can capture conceptual as well as factual knowledge and can be easily judged for fairness using statistics. However, they can be difficult to write with good distractors, and weak distractors can cue a half-knowledgeable respondent.
Matching questions are problematic because the responses may not be independent.
True/false questions can be easy to understand, but people may choose the most socially desirable answer.
Forced choice questions can prevent people from picking the most desirable option, but they haven’t been embraced yet by test developers.
What are the best types of items to use?
It depends on the test.
How do we measure item difficulty?
We measure the proportion of people who get the item correct.
An item with a difficulty of .3 is an item that 30% of people got correct, so it's hard. An easier question would be around .8.
Generally, item difficulty hovers around .5 with a range of .3-.7, but this will change depending on the type of test.
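A minimal sketch of the computation: item difficulty is just the column mean of a 0/1 response matrix. The response data are fabricated:

```python
import numpy as np

rng = np.random.default_rng(4)
# responses: (n_examinees, n_items), 1 = correct, 0 = incorrect (fabricated)
responses = (rng.random((100, 5)) < [0.3, 0.5, 0.5, 0.6, 0.8]).astype(int)

difficulty = responses.mean(axis=0)  # proportion correct per item
print(difficulty)                    # roughly [0.3, 0.5, 0.5, 0.6, 0.8]
```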
What are the two types of item-discrimination?
- High vs. low scorers - If most of the high-scoring people get an item right and the low-scoring people get it wrong, it's a good question. What if most of the people earning As and Bs get it wrong while the people earning Cs and Ds get it right? Then there might be a problem with the key, or the question is poorly worded. (See the sketch after this list.)
- Analysis of item choices - What was the variability of the choices? Did everyone guess A or B while no one chose C or D? Then C and D are wastes of space; you want good distractors. Occasionally an option can be too close to the actual answer, so you make the distractor less like it.
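A small sketch of the high-vs.-low-scorers discrimination index: the difference between the item's pass rate in the top and bottom scoring groups (the upper/lower 27% split is a common convention; all data fabricated):

```python
import numpy as np

rng = np.random.default_rng(5)
ability = rng.normal(size=200)
difficulties = np.linspace(-1, 1, 10)
# Fabricated responses: higher ability -> higher chance of a correct answer
p_correct = 1 / (1 + np.exp(-(ability[:, None] - difficulties)))
responses = (rng.random((200, 10)) < p_correct).astype(int)

totals = responses.sum(axis=1)
top = responses[totals >= np.percentile(totals, 73)]     # upper ~27%
bottom = responses[totals <= np.percentile(totals, 27)]  # lower ~27%
d = top.mean(axis=0) - bottom.mean(axis=0)  # discrimination index per item
print(d)  # positive values: high scorers pass the item more often
```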
What is cross-validation and how is it related to validity shrinkage?
Cross-validation means using the original regression equation in a new sample to determine whether the test predicts the criterion as well as it did in the original sample. Because the regression equation was fit to the quirks of the original sample, it almost always predicts less well in the second sample. This drop in predictive power is called validity shrinkage.
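A rough sketch of the procedure on simulated data (the predictors and criterion are invented, so the exact amount of shrinkage will vary from run to run): the regression equation is estimated on the development sample only, then applied unchanged to the new sample.

```python
import numpy as np

rng = np.random.default_rng(6)

def sample(n):
    x = rng.normal(size=(n, 5))                        # five test predictors
    y = x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=n)   # criterion measure
    return x, y

x_dev, y_dev = sample(60)   # original (development) sample
x_new, y_new = sample(60)   # fresh cross-validation sample

# Fit the regression equation on the development sample only
design = np.column_stack([x_dev, np.ones(60)])
coef, *_ = np.linalg.lstsq(design, y_dev, rcond=None)

def validity(x, y):
    pred = np.column_stack([x, np.ones(len(y))]) @ coef
    return np.corrcoef(pred, y)[0, 1]

# The validity coefficient typically drops in the new sample: shrinkage
print(f"original r = {validity(x_dev, y_dev):.2f}, "
      f"cross-validated r = {validity(x_new, y_new):.2f}")
```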
How might you get feedback from examinees, and how will that contribute to test development?
You can give questionnaires to the examinees after the test or you can have them think aloud about it in an open-ended manner.
The Inter-University entrance exam was modified in numerous ways in response to feedback. Time limits on some sections were increased. Perceived culturally unfair items were deleted.
How are testing materials important?
Practical design matters: a tri-fold board instead of just one piece of cardboard, books that stand up on their own. Intelligence tests have a lot of components that need to be manipulated, and on top of those there are manuals, stopwatches, and small children to manage.
What are the two manuals you need for a test and why?
Technical manual and user's manual - a test user needs both. The technical manual gives the background (development, norms, reliability, and validity evidence) and helps you determine whether you want to use the test; the user's manual covers administration, scoring, and interpretation.
What is a real definition, and how is it different from an operational definition?
A real definition is one that seeks to tell us the true nature of the thing being defined. An operational definition is a definition of a concept in terms of the way it is measured.
What are the shortcomings of operational definitions of intelligence?
They are circular: “What the tests test.”
They block further progress in understanding the nature of intelligence.
How does the textbook define intelligence?
Intelligence is:
- The capacity to learn from experience.
- The capacity to adapt to one’s environment.
These two themes occur again and again in definitions of intelligence. Many textbooks also include the ability to engage in abstract reasoning.