Psychological Testing Flashcards

1
Q

*What is a raw score?

A

The most basic level of information provided by a psychological test.

2
Q

*What are the defining features for most tests?

A

It has a standardized procedure, it is designed to predict something, it samples behavior, and it yields a score or category result.

3
Q

*What is the difference between norm-referenced and criterion-referenced tests?

A

A criterion-referenced test asks whether or not a standard has been met, but is not interested in comparing you to anyone else. A norm-referenced test asks how you compare to the population (based on a representative sample).

4
Q

*What is the difference between testing and assessment?

A

Assessment is a more comprehensive term, referring to the entire process of compiling information about a person and using it to predict behavior. It can be defined as appraising or estimating the magnitude of one or more attributes in a person.

5
Q

*How is the information obtained from an individually administered test different from that obtained on a group test?

A

The information from individual tests includes the motivation of the subject, and the examiner can assess the relevance of other factors.

6
Q

*What are the responsibilities of test publishers?

A

They should be familiar with APA guidelines and ensure the test has appropriate psychometric soundness. Finally, they should ensure the competence of test purchasers.

7
Q

*What are the three categories used by APA and many test publishers regarding examiner qualifications?

A

Levels A, B, and C.
Level A tests require minimal training and are usually paper-and-pencil.
Level B tests require training in statistics and knowledge of test construction; some graduate training is needed. E.g., aptitude tests and personality inventories.
Level C includes the most complex instruments and requires a master's degree. E.g., projective tests and individual tests of intelligence.

8
Q

*What are the responsibilities of test users?

A

A test user should (1) be aware of ethical guidelines, (2) have expertise in using the test, (3) obtain informed consent, (4) stop using obsolete tests as soon as possible, and (5) properly communicate test results. Any exceptions to the communication of results must be outlined in the informed consent.

9
Q

*What are the strengths and weaknesses of the testing used in China from 2200BC through 1906AD?

A

Strengths: Incorporated relevant selection criteria, like penmanship.
Weaknesses: Unnecessarily grueling, failed to validate their selection procedures.

10
Q

*How did Wundt contribute to the history of testing?

A

Founded the first psychological laboratory. Had people observe a pendulum to study the speed of thought.

11
Q

*How did Galton contribute to the history of testing?

A

Galton studied individual differences when everyone else was looking at sameness. He also had the ideas for correlations and standard deviations. He believed that genetics influenced everything.

12
Q

*How did Cattell contribute to the history of testing?

A

Cattell assessed intelligence using brass instruments. He used hand squeezes and feeling two points tests, because he believed people with the best sensory perception skills would be the smartest.

13
Q

*How did Esquirol contribute to the history of testing?

A

Esquirol drew a distinction between emotional problems (which have an onset) and intellectual disability (which is lifelong). He diagnosed people and classified levels of severity. Although way ahead of his time, he relied only on language to make mental distinctions; current methods aren't so reliant on verbal measures.

14
Q

*How did Binet contribute to the history of testing?

A

At the request of the French government, Binet developed a mental test to identify which children needed special help in school. This was an important moment because it recognized that some children had different needs and required a valid identification method. Binet had a strong research background.

The 1905 Binet-Simon Scale was the first legitimate test of intelligence. It had many verbal questions, as well as digit span. The score was a mental age.

15
Q

*How did Goddard contribute to the history of testing?

A

Goddard tested the intelligence of immigrants, but didn’t consider cultural background.

16
Q

*How did Hollingworth contribute to the history of testing?

A

Leta Hollingworth coined the term “gifted.”

17
Q

*How did Terman contribute to the history of testing?

A

He developed the Stanford-Binet (currently the SB5). He was well known for his work with gifted children. He was the one who called the score an IQ.

18
Q

*How did Yerkes contribute to the history of testing?

A

Yerkes created the Army Alpha and Army Beta, which marked the birth of group testing.

19
Q

*How did Otis contribute to the history of testing?

A

Created the multiple-choice question format.

20
Q

*How did Rorschach contribute to the history of testing?

A

Rorschach gave us the inkblots and much of projective testing. Eventually, this led to sentence-completion and projective drawing tests.

21
Q

*How did the 1916 Stanford-Binet differ from the 1905 Binet Simon Scale?

A

Terman multiplied the quotient by 100. The number of items increased to 90 (from 30 in 1905), and the new scale was suitable for children and adults at levels from inferior to superior. The norm sample was representative.

22
Q

*How did the needs of the American military in WWI and WWII influence psychological testing?

A

It brought about a need for group testing.

23
Q

*What is the importance of Woodworth’s Personal Data Sheet in the history of testing?

A

Basically, all personality tests descend from Woodworth's Personal Data Sheet. It was all yes-or-no questions and essentially led to the MMPI.

24
Q

*What is the impact of evidence-based practice and outcomes assessment on the field of testing?

A
  • A need for validated instruments for evaluating treatments
  • Not instruments that examine global personality factors, but ones that focus on particular symptoms or diagnoses
  • E.g., a quick instrument for assessing PTSD: is my client getting better or worse?
25
Q

*What is the norm group and why is it so important in testing?

A

A norm group consists of a sample of examinees who are representative of the population for whom the test is intended. We need to know where someone falls in relation to the distribution.

26
Q

*What are the three measures of central tendency and their advantages and disadvantages?

A

Mean, median, and mode. The mean uses every score but is pulled toward extreme values; the median is unaffected by outliers, so if the results are skewed, use the median; the mode works even for nominal data but can be unstable.
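
The skew advice can be sketched in Python (the scores below are invented for illustration):

```python
# Mean vs. median on a right-skewed set of scores.
scores = [10, 11, 12, 12, 13, 14, 95]  # one extreme high score

mean = sum(scores) / len(scores)  # 167 / 7, pulled upward by the outlier

def median(xs):
    """Middle value of the sorted scores (average of the two middle
    values when the count is even)."""
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

print(round(mean, 2))  # 23.86
print(median(scores))  # 12 — unaffected by the extreme score
```

The single outlier drags the mean far above where most scores sit, while the median stays put; that is why the median is preferred for skewed distributions.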

27
Q

*What is the standard deviation?

A

The standard deviation reflects the degree of dispersion in a group of scores. If the scores are tightly packed, the SD is small. If the distribution is wide, the SD is large.

28
Q

*What are the advantages and disadvantages of using percentiles?

A

Percentiles are easy for laymen and experts to understand, and they give the score in relation to the population. However, equal percentile differences do not correspond to equal raw-score differences, because scores bunch up near the middle of the distribution.
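
A minimal percentile-rank sketch in Python (the norm-group scores are hypothetical). One common definition counts the percentage of reference-group scores falling below the obtained score:

```python
def percentile_rank(score, scores):
    """Percent of scores in the reference group that fall below `score`."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

# Hypothetical norm group of 10 examinees.
norm_group = [55, 60, 65, 70, 70, 75, 80, 85, 90, 95]
print(percentile_rank(80, norm_group))  # 60.0 — six of ten scores fall below 80
```

Note the caveat from the card: a five-point percentile gain near the middle of the distribution spans far less raw-score distance than the same gain in the tails.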

29
Q

*How are standard scores used in testing?

A

A standard score expresses the distance from the mean in standard-deviation units (e.g., -0.50, 2.50).

30
Q

*What is the mean and standard deviation for z scores?

A

A standard score is also called a z score.

M = 0 and SD = 1
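
A z score is just a linear rescaling of the raw score; a one-function Python sketch (the IQ-style values are illustrative):

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in standard-deviation units."""
    return (raw - mean) / sd

# On an IQ-style scale (M = 100, SD = 15):
print(z_score(115, 100, 15))  # 1.0  (one SD above the mean)
print(z_score(85, 100, 15))   # -1.0 (one SD below the mean)
```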

31
Q

*What is the mean and standard deviation for T scores?

A

A T score is also a standard score.

M = 50 and SD = 10

32
Q

*What is the mean and standard deviation for IQ scores?

A

M = 100 and SD = 15

33
Q

*What is the mean and standard deviation for CEEB scores?

A

M = 500 and SD = 100

34
Q

*What are stanines?

A

A stanine ("standard nine") converts all scores to a single digit ranging from 1 to 9, with M = 5 and SD = 2. The single digit allowed a score to fit in one column of a keypunched card.
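
All of these standard-score families are linear transformations of z, which a short Python sketch makes explicit (stanines additionally round to a single digit and clamp to the 1-9 range; the example values are illustrative):

```python
def from_z(z, mean, sd):
    """Rescale a z score onto another standard-score metric."""
    return mean + sd * z

def stanine(z):
    """Stanine: M = 5, SD = 2, rounded and clamped to 1-9."""
    return max(1, min(9, round(5 + 2 * z)))

z = 1.5  # one and a half SDs above the mean
print(from_z(z, 50, 10))    # T score: 65.0
print(from_z(z, 100, 15))   # IQ: 122.5
print(from_z(z, 500, 100))  # CEEB: 650.0
print(stanine(z))           # 8
```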

35
Q

*What considerations are involved in selecting a norm group?

A

The sample should be representative. Sampling is usually a combination of random sampling and stratified random sampling.

36
Q

*What are the characteristics of criterion-referenced tests compared with norm-referenced tests?

A

Purpose: compare an examinee's performance to a standard vs. to other examinees.
Item content: narrow domain of skills with real-world relevance vs. broad domain with indirect relevance.
Item selection: most items of similar difficulty vs. items of varying difficulty.
Interpretation of scores: scores usually expressed as a percentage with a predetermined passing level vs. scores usually expressed as a standard score, percentile, or grade equivalent.

37
Q

*What is reliability?

A

Consistency

38
Q

*What are the sources of measurement error?

A

Systematic and unsystematic.

39
Q

*What class is an aptitude test in?

A

Level B

40
Q

*What class is an individual intelligence test in?

A

Level C

41
Q

*What is a psychometrician?

A

A specialist in psychology or education who develops and evaluates psychological tests.

42
Q

*What is the definition of a test?

A

A standardized procedure for sampling behavior and describing it with categories or scores.

43
Q

*What is a standardized procedure?

A

One in which the test is administered uniformly from one examiner and setting to another.

44
Q

*What is a norm?

A

A summary of test results for a large and representative group of subjects.

45
Q

*What is a standardization sample?

A

The sample for the norm, which must be representative of the population for whom the test is intended or else it is not possible to determine an examinee’s relative standing.

46
Q

*What is a norm-referenced test?

A

The performance of each examinee is interpreted in reference to a relevant standardization sample.

47
Q

*What is a criterion-referenced test?

A

A test where the objective is to determine where the examinee stands with respect to very tightly defined educational objectives.

48
Q

*What is the difference between group tests and individual tests?

A

Group tests are usually paper-and-pencil measures, while individual tests are instruments that by their design and purpose must be administered one on one.

49
Q

*Why is it important to establish rapport?

A

Rapport gives the testing environment a comfortable, warm atmosphere that motivates examinees and elicits their cooperation. Failure to establish rapport may result in test anxiety.

50
Q

*What are the important components of informed consent?

A

The test taker should be made aware of the following using language they understand:

  1. The reasons for testing
  2. The type of test to be used
  3. The intended use and range of material consequences of the intended use
  4. What will be released afterward
51
Q

*What is standard of care?

A

Usually, the customary or reasonable care taken in the profession. Beware of obsolete tests.

52
Q

*What is the stereotype threat?

A

The threat of confirming, as a self-characteristic, a negative stereotype about one's group.

53
Q

*Physiognomy

A

The notion that we can judge the inner character of a person based on external appearance.

54
Q

*What is an age norm?

A

Depicts the level of test performance for each separate age group in the normative sample.

55
Q

*What are local and subgroup norms?

A

Local norms are derived from representative local examinees, as opposed to a national sample. Similarly, a subgroup norm consists of scores obtained from an identified subgroup instead of a diverse national sample.

56
Q

*What is an expectancy table?

A

A table that portrays the established relationship between test scores and expected outcome on a relevant task.

57
Q

*Classical theory of measurement

A

Test scores are influenced by two kinds of factors: factors of consistency (the true score) and factors of inconsistency (measurement error). The examiner wants to measure the factors of consistency.

58
Q

*Unsystematic measurement error

A

Arises from item selection, test administration, and test scoring; its effects are unpredictable and inconsistent.

59
Q

*Systematic measurement error

A

The test consistently measures something it was not designed to measure.

60
Q

*Correlation coefficient

A

r = the degree of linear relationship between two variables.

61
Q

*Reliability coefficient

A

rxx = the ratio of true-score variance to the total variance of test scores.

62
Q

*Coefficient alpha

A

The mean of all possible split-half coefficients
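
In practice that definition is computed directly from item variances: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). A Python sketch with made-up data:

```python
def cronbach_alpha(items):
    """Coefficient alpha. `items` is a list of item-score lists,
    one inner list per item, each with one entry per examinee."""
    k = len(items)     # number of items
    n = len(items[0])  # number of examinees

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Each examinee's total score across all items.
    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(var(it) for it in items) / var(totals))

# Hypothetical data: 3 items scored for 4 examinees.
items = [[1, 2, 3, 4], [1, 2, 3, 3], [2, 2, 3, 4]]
print(round(cronbach_alpha(items), 2))  # 0.96
```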

63
Q

*Spearman Brown

A

Fewer items mean lower reliability. The Spearman-Brown formula estimates how reliability changes when a test is shortened or lengthened.
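
The prophecy formula quantifies this: if the current reliability is r and the test's length is multiplied by a factor k, the predicted reliability is kr / (1 + (k - 1)r). A Python sketch with illustrative values:

```python
def spearman_brown(r, factor):
    """Predicted reliability when test length is multiplied by `factor`."""
    return factor * r / (1 + (factor - 1) * r)

r = 0.80  # reliability of the current test
print(round(spearman_brown(r, 2), 3))    # 0.889 — doubling the length raises it
print(round(spearman_brown(r, 0.5), 3))  # 0.667 — halving the length lowers it
```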

64
Q

*Restriction of range

A

When the sample covers only a narrow range of scores, the correlation coefficient (and thus the test-retest reliability estimate) is depressed.

65
Q

What is standard error of measurement?

A

Theoretically, if the subject took the test many times, the resulting scores would form a normal curve. The standard deviation of that curve is the SEM.

66
Q

What is a confidence interval?

A

A confidence interval expresses, as a percentage, the confidence we have that the true score falls within a certain range around the obtained score, based on multiples of the SEM.
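
Both ideas can be sketched together in Python, using the standard formula SEM = SD * sqrt(1 - rxx) and a 95% interval (z = 1.96); the scale values are illustrative:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - rxx)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(score, sd, reliability, z=1.96):
    """Interval around an obtained score; z = 1.96 gives ~95% confidence."""
    e = z * sem(sd, reliability)
    return (score - e, score + e)

# IQ-style scale (M = 100, SD = 15) with reliability rxx = .91:
print(round(sem(15, 0.91), 2))  # 4.5
lo, hi = confidence_interval(110, 15, 0.91)
print(round(lo, 2), round(hi, 2))  # 101.18 118.82
```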

67
Q

What is the standard error of the difference?

A

A statistical measure that can help a test user determine whether the difference between scores is significant. It is usually used for sub-scores on a test.
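
When the two scores sit on the same scale, a common formula is SEdiff = SD * sqrt(2 - r11 - r22), where r11 and r22 are the two reliabilities. A Python sketch with illustrative values:

```python
import math

def se_difference(sd, r1, r2):
    """Standard error of the difference between two scores on the same scale."""
    return sd * math.sqrt(2 - r1 - r2)

# Two subtests on an SD = 15 scale with reliabilities .91 and .84:
sed = se_difference(15, 0.91, 0.84)
print(round(sed, 2))         # 7.5
print(round(1.96 * sed, 1))  # 14.7 — roughly the gap needed at the .05 level
```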

68
Q

Ch4: What is validity?

A

Does the test measure what it claims to measure?

A test is valid to the extent that inferences made from it are appropriate, meaningful, and useful.

69
Q

What is the relationship between validity and reliability?

A

If a test is not reliable, it cannot be valid. However, a reliable test can still be invalid: something can be consistently bad. (You have to understand the relationship between reliability and validity.)

70
Q

What do we mean by a continuum of validity?

A

Validity cannot be captured in statistical summaries, instead it is on a continuum ranging from weak to acceptable to strong, based on the three types of validity evidence.

71
Q

What are the three categories of accumulating validity evidence?

A

Content validity
Criterion-related validity
Construct validity

An ideal validation includes several types of evidence in all three categories.

72
Q

*What is face validity?

A

For one thing, it's not actually validity. It's how the test looks to examinees. It's important because it can affect a person's approach to the test. It's loosely related to content validity.

73
Q

What is content validity?

A

Content validity is determined by the degree to which the questions, tasks, or items on a test are representative of the universe of behavior the test was designed to sample.

Item sampling - do the items on the test fit the content you want to test?
Types of skills - recall or recognition; STM or LTM.

Especially useful when a great deal is known about the construct.

74
Q

What is criterion-related validity?

A

The test score is compared to an outcome measure (the criterion). The criterion can be concurrent, e.g., people take a new IQ test and an established IQ test at the same time. The criterion can also be predictive, as in college readiness tests and employment tests.

75
Q

What makes a good criterion for criterion-related validity?

A

RELIABLE - consistency of scores.

APPROPRIATE - Obvious in principle, but sometimes tricky in practice. Should the criterion measure of an aptitude test indicate satisfaction, success, or continuance in the activity?

FREE FROM CONTAMINATION BY THE TEST - Criterion contamination occurs when knowledge of test scores influences the criterion: I want to see whether the test is useful, but you already used the test to determine whom to hire.
The criterion can also be contaminated by overlapping content; e.g., if both the test and the criterion ask about eating habits and sleeping habits, the correlation will be artificially inflated.

76
Q

What is decision theory?

A

The purpose of psychological testing is not measurement for its own sake, but measurement in the service of decision making.
Making decisions based on test scores results in a matrix of outcomes, with hits and misses (including false positives and false negatives). You have to determine where you want your mistakes to be.

77
Q

What is construct validity?

A

A construct is a theoretical, intangible quality or trait in which individuals differ. Construct validity is theory based: Based on my understanding of this particular construct, what would I expect to see in a test?

No criterion or universe of content is accepted as entirely adequate to define the quality to be measured, so a variety of evidence is required to establish construct validity.

78
Q

What is test homogeneity?

A

A measure of construct validity.

Does it measure a single construct?

If my theory says this is a unitary construct, and internal-consistency analysis suggests the test measures just one construct, that supports homogeneity. But the test could be measuring one thing and still not the right thing.

79
Q

What are appropriate developmental changes?

A

A measure of construct validity.

Is my construct something that changes as people age?

Egocentrism is an example: scores should go down as kids get older.

80
Q

What are theory-consistent group differences?

A

A measure of construct validity.

Can we predict who will have high and low scores for this construct?

Different rates of extroversion in different professions. Nuns are high in social interest. Models and criminals are low in social interest.

81
Q

What are theory-consistent intervention effects?

A

A measure of construct validity.

Does the construct change in the appropriate direction after intervention/treatment?

People’s scores of spatial orientation should increase after training, more than those who did not receive training.

82
Q

What is convergent and discriminant validation?

A

A measure of construct validity.

What should it correlate with and what should it be different from?

Intelligence and social interest are theoretically unrelated.
Anxiety and eating disorders overlap.

83
Q

What is factor analysis?

A

A measure of construct validity.

How many factors are you actually measuring?

If you think you’re measuring three factors, and a factor analysis shows three factors, that’s a good sign.

84
Q

What is classification accuracy?

A

A measure of construct validity.

How well does it give accurate identification of test takers? Test makers strive for high levels of:
SENSITIVITY: Accurate identification of patients who have a syndrome.
SPECIFICITY: Accurate identification of normal patients.

These are measured by percentages.
Sensitivity: 79% (correctly identifies 79% of affected individuals)
Specificity: 83% (correctly identifies 83% of unaffected individuals).
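
These percentages come straight from the counts in a 2x2 classification table; a small Python sketch using hypothetical counts that match the figures above:

```python
def sensitivity(true_pos, false_neg):
    """Proportion of affected individuals the test correctly flags."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of unaffected individuals the test correctly clears."""
    return true_neg / (true_neg + false_pos)

# Hypothetical validation sample: 100 affected and 100 unaffected examinees.
print(sensitivity(79, 21))  # 0.79 — 79 of 100 affected correctly identified
print(specificity(83, 17))  # 0.83 — 83 of 100 unaffected correctly identified
```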

85
Q

What are extravalidity concerns?

A

Side effects and unintended consequences of testing.

86
Q

What are some of the unintended side effects of testing?

A

AKA Extravalidity concerns.

Children who are identified may feel unusual or dumb. There can be legal consequences. Along with traditional validity, tests should also be evaluated for (1) values in interpretation, (2) usefulness in a particular application, and (3) potential and actual social consequences.

87
Q

What does NOIR stand for?

A

Nominal
Ordinal
Interval
Ratio

88
Q

What is a nominal scale?

A

Where the scales are simply categories, without any inherent order.

Male = 1, Female = 2.

89
Q

What is an ordinal scale?

A

A scale with categories following a specific order, but the distance between the categories is variable.

Freshman, Sophomore, Junior, Senior.
Ranking something from most liked to least liked

90
Q

What is an interval scale?

A

A scale in which the units have an order and equal distance between each unit. It does not possess an absolute zero.

A Likert scale is considered an interval scale for statistical purposes.

91
Q

What is a ratio scale?

A

A ratio scale is rare in psychological measurement. A scale with an absolute 0, which also allows for categorization, ranking, and intervals.

92
Q

What are some scaling methods? Which ones are best?

A

“No single scaling method is uniformly better than the others.”

Expert Ranking
Likert scales
Empirical keying
Rational scale construction

93
Q

What’s an example of expert ranking?

A

The Glasgow Coma Scale

How would experts rank each of these responses?

94
Q

What are methods of absolute scaling?

A

A procedure for obtaining a measure of absolute item difficulty based on different age groups of test takers.