Exam One Flashcards

1
Q

Psychological testing

A

Refers to all possible uses, applications, and underlying concepts of psychological and educational tests.

2
Q

Psychologists’ responsibility around test administration

A

Duty to select fair (representative), appropriate, up-to-date, reliable, and valid tests, because test scores drive decision-making

3
Q

Types of psychological tests

A
  1. Achievement- refers to previous learning (course material)
  2. Aptitude- refers to the potential for learning or acquiring a specific skill (SAT)
  3. Intelligence- general potential to solve problems, adapt, think abstractly, and learn from experience
4
Q

Types of personality tests

A

Structured/Objective- multiple choice, true/false, or Likert scale format; usually self-report

Projective- test materials or required response (or both) are ambiguous (Rorschach)

5
Q

How to evaluate utility of tests

A

Aspects of psychometric soundness

  • reliability (consistency)
  • validity (accuracy)

Test construction

  • item creation and/or selection
  • logical vs. theoretical vs. empirical considerations

Test administration

  • variation in scores due to administrator, examinee, and/or random error

6
Q

Early antecedents for tests

A

Han Dynasty- test batteries used for work-related evaluations
Ming Dynasty- testing rounds in testing centers used to nominate public officials
British missionaries- advocated a civil service testing system (modeled on the Chinese exams)
US- American Civil Service Commission

7
Q

Darwin/Galton/Cattell

A

Darwin
-On the Origin of Species: evolution acts upon individual differences (survival and reproduction of the fittest)
Galton
-Documented individual differences in cognitive and physical abilities
-Founder of eugenics (selective reproduction of individuals with “desirable” traits)
Cattell
-Studied individual differences in cognitive and physical abilities
-Coined the term “mental tests”

8
Q

Experimental psychologists

A

Donders

  • reaction time tests
  • early cognitive psychology experiments

Wundt

  • First psych lab
  • Sensation and perception

This era brought the scientific method to psychological testing
(which requires rigorous experimental control)

9
Q

Intelligence tests

A

Binet-Simon scale- first intelligence test; first use of a standardized sample
Stanford-Binet scale- US revision; standardization sample of 1,000; edited and new items
Group tests- developed by Yerkes in response to WWI; Army Alpha and Army Beta (1917)
Wechsler Intelligence Tests- included a nonverbal (“performance”) subscale of intelligence

10
Q

standardized sample

A

A norm-based sample; a test taker’s score is interpreted by comparing it to other people’s scores

11
Q

representative sample

A

comprises individuals similar to those for whom the test is to be used

12
Q

Mental age

A

A child’s performance expressed as the age group whose average performance it matches (e.g., a 6-year-old who performs like the average 8-year-old has a mental age of 8)

13
Q

Personality tests

A
  • measures traits
  • Woodworth Personal Data Sheet- screened military recruits for likelihood of “shell shock”
  • Rorschach
  • Thematic Apperception Test
14
Q

Modern personality tests

A

Objective tests- no assumptions made about the meaning of a test response
MMPI, CPI; 16PF (based on factor analysis, which finds the minimum number of dimensions needed to account for a large number of variables)

15
Q

Descriptive statistics

A

Statistics describing a sample or population: measures of central tendency and variability.
-can be used with ANY type of data
-including experimental or non-experimental data

16
Q

Inferential statistics

A

Statistical procedures that allow inferences to be made from the sample to the population.
-used to infer causality
-largely limited to experimental data
-type of data dictates type of analysis used
-must be careful of data distribution
(parametric vs. nonparametric)

17
Q

Nominal data

A

Categorical data; no mathematical meaning
(dichotomous if two categories)
e.g., gender, political party, religion, species, team

18
Q

Ordinal data

A

Indicates order- cannot know how far apart each item is (no equal intervals)
first to last; most to least
-basketball standings, sibling-line position, IQ scores

19
Q

Interval data

A

Data with equal intervals between values but no true zero.

  • temperature in degrees, SAT scores
  • most psychological measures; Likert scale
20
Q

Ratio data

A

Interval data with true zero.
most physical measures- height, weight,
speed, distance, volume, area

21
Q

Normal distribution

A

Bell-shaped; symmetric around the central tendency (mean = median = mode)
-most statistical procedures in psychology assume normally distributed scores
-parametric statistics are based on symmetrical (normal) distributions

22
Q

Characteristics of parametric distributions

A

-approximate symmetry
-the distribution can be divided into standard deviation units
-the size of a deviation can be mathematically defined on any measure that is interval or ratio in nature

23
Q

Skew

A

The degree of departure from symmetry.
Positively skewed- most scores fall on the left side; the tail extends to the right.
Negatively skewed- most scores fall on the right side; the tail extends to the left.
Bimodal- two areas of the curve at equal frequencies with a dip in between.

24
Q

Variance

A

The variation in, or differences among, people’s scores in a distribution on measure X

  • arises from natural, random differences among subjects (Ss)
  • environmental variations
  • measurement error
  • researcher error (overt, covert)
25
Q

Percentile ranks and how to calculate

A

Percentile rank- the percentage of scores that fall below a particular score within a distribution
Calculate:

  • divide number of cases below the score of interest by total number of cases in the group
  • multiply results by 100
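
A minimal Python sketch of this calculation (the score distribution and the score of interest below are invented for illustration):

    # Percentile rank: percentage of scores in the distribution falling below a given score
    def percentile_rank(scores, score):
        below = sum(1 for s in scores if s < score)  # cases below the score of interest
        return below / len(scores) * 100             # multiply the proportion by 100

    data = [55, 60, 62, 70, 75, 80, 85, 90, 95, 99]  # hypothetical score distribution
    print(percentile_rank(data, 80))                 # 50.0 -> five of the ten scores fall below 80
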
26
Q

Standard scores

A

z-scores: raw scores converted to a scale with a fixed mean and standard deviation
-a score measured in SD units (the deviation of a score from the mean, in SD units)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

Calculating a z-score

A
  • Find difference between observed score and mean for the distribution
  • Divide difference by SD of distribution

Mean exam score = 11.05 (SD = 7.01)
For a score of 14, z = (14 - 11.05) / 7.01 = .42
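
The same arithmetic as a short Python sketch, using the exam figures above:

    # z = (X - mean) / SD: a score's deviation from the mean in SD units
    def z_score(x, mean, sd):
        return (x - mean) / sd

    print(round(z_score(14, 11.05, 7.01), 2))  # 0.42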

28
Q

Norms

A

Allow for evaluation of one’s performance relative to a larger group

29
Q

Norm-referenced tests

A

-each test taker’s performance evaluated against standardized sample
-typically used for the purpose of making comparisons
with a larger group
-norms should be current, relevant, and representative
of the group to which the individual is being compared

30
Q

Criterion-referenced tests

A

-represent predetermined level of performance to be reached (“benchmarks”)
-scores are compared to a preset “criterion score” (not
compared to others)
-No Child Left Behind

31
Q

Correlation vs. regression

A

Correlation assesses the magnitude and direction of a relationship. Regression is used to make predictions about scores on one variable from knowledge of scores on another variable. These predictions are obtained from the regression line (line of best fit).
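
A minimal least-squares sketch in Python (the x/y values are invented), showing how the fitted line y = b0 + b1*x is then used for prediction:

    # Fit the line of best fit (y = b0 + b1*x) by ordinary least squares, then predict.
    xs = [1, 2, 3, 4, 5]  # hypothetical predictor scores
    ys = [2, 4, 5, 4, 6]  # hypothetical criterion scores
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b0 = my - b1 * mx
    print(round(b0 + b1 * 6, 2))  # 6.6 -> predicted criterion score for a new x of 6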

32
Q

Correlation coefficient (r)

A

-strength of association between variables
-Ranges between -1.0 and +1.0
-Calculated between 2 variables for the entire group, not 1 individual
-Reflects the amount of variability that is shared between 2 variables
+/- .10: weak, +/- .30: moderate, +/- .50: strong
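
A Pearson r sketch in Python, computed for a whole (invented) group of paired scores:

    from math import sqrt

    # Pearson r: direction and strength of the linear association across the whole group
    def pearson_r(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

    print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 6]), 2))  # 0.85 -> strong association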

33
Q

p-value

A

Indicates whether the association is greater than what would be expected by chance.

34
Q

Shared variance (r²)

A

Also called common variance or the coefficient of determination; indicates effect size

35
Q

Correlation does not equal causation

A
  1. Mediating variables may explain the relationship
  2. Relationships can be bidirectional (thus both would be causal)
  3. Causality can be inferred only under experimental manipulations
36
Q

Experimental conditions

A

Experiments:

  1. random assignment of participants
  2. manipulation of at least one independent variable
37
Q

Coefficient of determination

A

Correlation coefficient squared and then converted into a percentage; indicates effect size
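
A one-line illustration in Python (r = .50 is an arbitrary example):

    r = 0.50             # correlation between two variables (arbitrary example)
    print(r ** 2 * 100)  # 25.0 -> the two variables share 25% of their variance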

38
Q

Coefficient of alienation

A

A measure of nonassociation between two variables: subtract r² from 1 (where r² is the coefficient of determination)
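
Continuing the arbitrary r = .50 example (note: some texts instead define the coefficient of alienation as the square root of 1 - r²; this sketch follows the card’s formulation):

    r = 0.50
    print(1 - r ** 2)  # 0.75 -> 75% of the variance is not shared (nonassociation)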

39
Q

Statistical significance

A

p < .05 (the conventional cutoff)- the probability that the observed result occurred by chance is less than 5%

40
Q

Reliability

A

refers to the accuracy, dependability, consistency, or repeatability of test results

41
Q

Classical test theory

A

-Assumes each person has a true score (T) that would be obtained if there were no errors in measurement
-Because measurement instruments are imperfect, the observed score (X) for each person almost always differs from the person’s true ability
-Difference between observed and true score = measurement error (E)
-T (true score) = X (observed score) - E (measurement error)
-Major assumptions: errors occur randomly and are normally distributed
-error cannot be eliminated
-some error is systematic

42
Q

Standard error of measurement

A

Provides an estimate of how much an individual’s score would be expected to change on re-testing with the same or an equivalent form of the test

  • averaging scores over an infinite number of testings yields an estimate of true ability/knowledge (T, the true score); the standard deviation of all those scores = SEM
  • creates a confidence band within which a person’s true score would be expected to fall
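
A sketch assuming the standard formula SEM = SD * sqrt(1 - reliability); the SD, reliability, and observed score below are invented:

    from math import sqrt

    def sem(sd, reliability):
        # Expected spread of observed scores around the true score on re-testing
        return sd * sqrt(1 - reliability)

    s = sem(15, 0.91)   # e.g., an IQ-style scale: SD = 15, reliability = .91
    print(round(s, 1))  # 4.5
    observed = 100
    print(round(observed - 1.96 * s, 1), round(observed + 1.96 * s, 1))  # ~95% confidence band
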
43
Q

Domain sampling method

A

Instead of testing your ability to spell every possible word, we select a random sample of words.

T- % correct in spelling all words in English language
X- % correct in spelling all words in sample

  • As the sample gets larger (the closer X comes to T), reliability increases and error decreases
  • Because we do not know T:
  • calculate the correlations among all sampling occasions (Xs)
  • the correlations are then averaged to predict T
44
Q

Item response theory

A

A newer method, now often preferred to CTT; IRT uses an adaptive method to assess ability
-the test increases in difficulty if the previous question is answered correctly
-the test decreases in difficulty if the previous question is answered incorrectly
-items near the test taker’s ability level are heavily sampled

Overall result is a more reliable estimate of ability
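
A hedged sketch of the two-parameter logistic item response function used in many IRT models (a = item discrimination, b = item difficulty; all values invented):

    from math import exp

    def p_correct(theta, a, b):
        # Probability that an examinee with ability theta answers the item correctly
        return 1 / (1 + exp(-a * (theta - b)))

    print(round(p_correct(0.0, 1.0, 0.0), 2))  # 0.5 -> item difficulty matched to ability
    print(round(p_correct(0.0, 1.0, 1.0), 2))  # 0.27 -> a harder item for the same examinee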

45
Q

Measurement error affecting reliability

A
  1. questionable measurement precision
  2. item sampling
  3. construction of test items
  4. factors related to test environment
  5. varying judgments or beliefs of raters/observers
  6. scoring of the test (objectivity of evaluator)
  7. difficulty of the test
  8. factors related to test-taker
46
Q

Measures to assess reliability

A
  • test-retest
  • parallel forms (ideal but rarely used)
  • internal consistency reliability (single test; most frequently used)
47
Q

Test-retest

A

The same test is administered to the same person at different points in time.
-also called the time sampling method
-only useful when assessing stable traits
-must reduce carryover or practice effects
-the interval between measurements must be considered:
-shorter intervals -> higher carryover
-Be careful of developmental milestones

48
Q

Parallel forms

A

Compares scores on two different measures of the same quality
-also called equivalent or alternate forms method

A rigorous assessment of reliability

  • carryover effects are eliminated
  • greater sampling of domain
  • Generally underutilized
    • difficult to get people “back in the door”
49
Q

Internal consistency

A

Extent to which different items on a test measure the same attribute or trait. Scores from 2 halves are correlated with each other

50
Q

Methods to assess internal consistency

A
  • split-half
  • KR20 (Kuder & Richardson)
  • Cronbach alpha (coefficient alpha)
51
Q

Split-half reliability

A

One test is split into two equal halves
Each half is compared to the other
-can be split randomly, first/second halves, or odd/even
The Spearman-Brown formula is used to correct for half-length and increases the estimate of reliability
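
The Spearman-Brown half-length correction as a Python sketch (the .70 half-test correlation is illustrative):

    def spearman_brown(r_half):
        # Estimated full-test reliability from the correlation between the two halves
        return 2 * r_half / (1 + r_half)

    print(round(spearman_brown(0.70), 2))  # 0.82 -> corrected upward from the half-test value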

52
Q

KR20 reliability

A

-Simultaneously considers all possible ways of splitting the test (avoids the problems of split-half methods)
-Only appropriate for tests in which items are
dichotomous (0 - incorrect/ 1- correct)
-finds the proportion of people who got each item right v. wrong
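
A sketch of the KR20 formula, KR20 = (k / (k - 1)) * (1 - sum(pq) / total-score variance), on an invented set of dichotomous item scores:

    def kr20(item_scores):
        # item_scores: one list of 0/1 item scores per test taker
        n = len(item_scores)     # number of test takers
        k = len(item_scores[0])  # number of items
        totals = [sum(person) for person in item_scores]
        mean = sum(totals) / n
        variance = sum((t - mean) ** 2 for t in totals) / n  # total-score variance
        sum_pq = 0.0
        for i in range(k):
            p = sum(person[i] for person in item_scores) / n  # proportion passing item i
            sum_pq += p * (1 - p)
        return (k / (k - 1)) * (1 - sum_pq / variance)

    data = [[1, 1, 1, 0], [1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
    print(round(kr20(data), 2))  # ~0.47 on this tiny invented dataset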

53
Q

Coefficient alpha

A

Cronbach alpha: considered to be the most general and rigorous formula for determining reliability estimate through internal consistency
-can be used on Likert scales, when items can’t be classified as “right” or “wrong”
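
Coefficient alpha generalizes KR20 beyond right/wrong items: alpha = (k / (k - 1)) * (1 - sum of item variances / total-score variance). A sketch on invented Likert-style responses:

    def cronbach_alpha(item_scores):
        # item_scores: one list of item ratings (e.g., 1-5 Likert) per respondent
        n = len(item_scores)
        k = len(item_scores[0])

        def variance(values):
            m = sum(values) / len(values)
            return sum((v - m) ** 2 for v in values) / len(values)

        sum_item_vars = sum(variance([p[i] for p in item_scores]) for i in range(k))
        total_var = variance([sum(p) for p in item_scores])
        return (k / (k - 1)) * (1 - sum_item_vars / total_var)

    data = [[4, 5, 4], [2, 3, 3], [5, 5, 4], [1, 2, 2], [3, 3, 4]]
    print(round(cronbach_alpha(data), 2))  # ~0.93 on this invented data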

54
Q

Inter-rater reliability

A

Measure of reliability in behavioral observation studies

  • code a behavior from observational or behavioral study- compare degree of overlap among different observers
  • Start with an ethogram- operational definitions of variables
55
Q

Kappa statistic

A

indicates actual agreement as corrected by level of chance agreement among different raters
1.0= perfect agreement between observers
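
Cohen’s kappa as a Python sketch, kappa = (observed agreement - chance agreement) / (1 - chance agreement), on invented codes from two observers:

    def cohens_kappa(rater1, rater2):
        # Agreement corrected for the level of agreement expected by chance
        n = len(rater1)
        p_o = sum(a == b for a, b in zip(rater1, rater2)) / n  # observed agreement
        categories = set(rater1) | set(rater2)
        p_e = sum((rater1.count(c) / n) * (rater2.count(c) / n) for c in categories)
        return (p_o - p_e) / (1 - p_e)

    r1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
    r2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "no"]
    print(cohens_kappa(r1, r2))  # 0.5 -> moderate agreement beyond chance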

56
Q

Reliability coefficients

A

Range from 0.0 to 1.0
1.0 = perfect reliability
If r = .90, then 10% of the variation in scores is attributable to measurement error

.90 and above = highly reliable test
.70-.89 = moderately reliable

57
Q

Validity

A

Extent to which a test measures the quality it purports to measure
-the test accurately reflects whatever construct, trait, or characteristic it claims to measure

Evidence for validity comes from showing the association between the test and other variables.

58
Q

Face validity

A
  • based on logical rather than statistical analysis
  • the appearance, at a surface level, that a test measures what it purports to measure

59
Q

Content validity

A

Evidence that the content of a test adequately represents the conceptual domain it is designed to cover

  • test items are a fair sample of the total potential content and relevant to construct being tested
  • based on logical rather than statistical analysis
  1. Construct underrepresentation: failure to capture important components of a construct
  2. Construct irrelevant variance- scores are influenced by factors irrelevant to the construct
60
Q

Can a test be content valid without being face valid?

A
  • Yes- e.g., depression measures
  • child abuse queries

61
Q

Criterion validity

A

Extent to which a test corresponds with a particular criterion (standard against which test is compared)
- typically used when objective is to predict future performance on an unknown criterion

examples:
premarital test -> marriage success
SAT -> college freshman GPA

62
Q

Sub classes of criterion validity

A

Predictive- the test or measure predicts future performance/success on a particular criterion; correlation (r) describes the extent to which one variable predicts the other
SAT -> success in college

Concurrent- the criterion measure is taken at the same time as the test; correlation (r) describes the extent to which one variable correlates with another at the same time
work samples -> job performance

63
Q

Validity coefficient

A

Relationship between a test and a criterion-
usually Pearson r
-tells the extent to which the test is valid for
making statements about the criterion

There is less consensus regarding the acceptable size of validity coefficients (VCs)

  • coefficients of .60 or higher are rare
  • .30 - .40 considered to be acceptable
  • even tests with lower validity coefficients can yield useful information
    • the correlation between cholesterol and heart disease is quite low, but it has important predictive consequences for reducing mortality rates
64
Q

Construct validity

A

Process used to establish meaning of a test through a series of studies

  • simultaneously define a construct and develop tests to measure it
  • look for correlation between the test and other measures
65
Q

Convergent evidence for construct validity

A

evidence that a test measures the same attribute as do other measures that purport
to measure the same construct
- tests should correlate well (highly) if believed to measure same construct
-what measures should a new depression measure/Health Index/reading ability test correlate with?

66
Q

Discriminant evidence for construct validity

A

evidence that a test measures something different from what other available tests measure
-test would not correlate with unrelated tests

67
Q

Incremental validity

A

Measure of unique information gained through
using a test
-how much does information from test add to what is already known?
-how well does it improve the accuracy of decisions?
-based on logical analysis vs. statistical analysis

68
Q

Test item writing

A
  • define clearly what you want to measure (operational definition)
  • Generate an item pool (more items than you will end up including)
  • Avoid long/difficult Qs
  • Avoid items that convey 2 or more ideas
  • Consider making positively and negatively worded items
  • Be mindful of diversity
69
Q

Item format- dichotomous

A
  • 2 alternatives for each item
  • overall less reliable and therefore less precise

70
Q

Item format- Likert

A
  • rating scale with a continuum of alternatives to indicate agreement
  • may or may not contain a neutral point
  • is open to factor analysis
71
Q

Item format- polytomous

A
  • multiple alternatives for each item
  • probability of selecting correct answer by chance is lower
  • diminishing returns
72
Q

Item format- category

A
  • rating system typically using more alternatives (1-10)
  • heavily context dependent (reduces validity)
  • diminishing returns
73
Q

Item analysis

A

General term for a set of methods used to evaluate test items

74
Q

Item difficulty

A
  • asks what percentage of test takers got the item right
  • usually want item difficulty to fall between chance level and 100% (usually .30-.70)
  • if 84% get item #1 correct, its difficulty is .84
  • the higher the number, the easier the item
75
Q

Calculating item difficulty

A

Calculating optimal item difficulty:

  1. Subtract chance from 100% success (1.0)
  2. Divide by 2
  3. Add this value to chance

If chance is .25 (4 alternatives):

  1. (1.0-.25) / 2 = .75/2 =.375
  2. .375 + .25 = .625
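
The same optimal-difficulty arithmetic as a Python sketch:

    def optimal_difficulty(chance):
        # Halfway between chance-level success and perfect success (1.0)
        return (1.0 - chance) / 2 + chance

    print(optimal_difficulty(0.25))  # 0.625 for an item with 4 alternatives
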
76
Q

Item discriminability

A

Determines whether people who have done well on a particular item have also done well on the entire test

77
Q

The Extreme Group method

A
  • type of item analysis
  • compares those who did well to those who did poorly
  • calculation of a discrimination index - find the difference in the proportion of people in each group who got each item correct
  • Higher Discrimination Index= Higher Discriminability
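
A small sketch of the discrimination index (the 0/1 item results for the two extreme groups are invented):

    def discrimination_index(top_group, bottom_group):
        # Difference in proportion correct on one item: top vs. bottom scorers
        return sum(top_group) / len(top_group) - sum(bottom_group) / len(bottom_group)

    # 0/1 results on one item for the highest- and lowest-scoring test takers
    print(round(discrimination_index([1, 1, 1, 0, 1], [0, 1, 0, 0, 0]), 2))  # 0.8 - 0.2 = 0.6
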
78
Q

Point Biserial method

A
  • type of item analysis
  • Correlation between a dichotomous and a continuous variable (individual item versus overall test score)
  • Is less useful on tests with fewer items
  • point biserial correlations closer to 1.0 indicate better questions
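
A point-biserial sketch assuming the standard formula r_pb = ((M1 - M0) / SD) * sqrt(pq), correlating one 0/1 item with total test scores (data invented):

    from math import sqrt

    def point_biserial(item, totals):
        # item: 0/1 scores on one question; totals: overall test scores for the same people
        n = len(item)
        ones = [t for x, t in zip(item, totals) if x == 1]
        zeros = [t for x, t in zip(item, totals) if x == 0]
        m1, m0 = sum(ones) / len(ones), sum(zeros) / len(zeros)
        mean = sum(totals) / n
        sd = sqrt(sum((t - mean) ** 2 for t in totals) / n)
        p = len(ones) / n  # proportion answering the item correctly
        return (m1 - m0) / sd * sqrt(p * (1 - p))

    print(round(point_biserial([1, 1, 0, 1, 0], [18, 16, 9, 14, 8]), 2))  # ~0.94 on this tiny example
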
79
Q

Limitations of item analysis

A
  • test analysis can tell us about the quality of a test, but it doesn’t help students learn
  • Purposes of tests are varied, and may emphasize ranking students over identifying weaknesses or gaps in knowledge
  • If teachers feel they need to “teach to the test” the outcomes of a test may be misleading and indicate more mastery than actually exists
80
Q

Relationship between examiner and test taker

A
  • role of feedback (type of feedback given to test taker)
  • role of race and gender of tester on test taker
  • role of language of the test taker (tests are highly linguistic)
81
Q

2 types of stereotype threat

A
  • anxiety over how one will be evaluated and how well s/he will perform
  • for members of a stereotyped group, pressure to disconfirm negative stereotypes
82
Q

Stereotype threat hypotheses

A
  • STT depletes working memory
  • STT leads to reduced effort and, in turn, reduced performance
  • STT causes physiological arousal that can disrupt performance
83
Q

Response acquiescence

A

respondents tend to agree or give the response they perceive to be expected

84
Q

Expectancy effects

A

(Rosenthal effects):

  • can influence what interviewer expects out of interviewee
  • told a child is “smart” or “bad” ahead of time
  • giving examinee “benefit of the doubt” because he/she is pleasant
85
Q

Subject effects

A

(Hawthorne effects):

  • can influence what subject expects out of test/interview
  • may act in accordance with those expectations
86
Q

Empirical findings about the manner in which tests are administered

A
  • the less personalized the modality, the more likely information is to be disclosed
  • will disclose even more when confidentiality of responses is ensured
87
Q

Advantages of computerized administration

A
  • responses automatically recorded (reduces error)
  • standardization ensured
  • precisely timed responses
  • examiner bias controlled
88
Q

Structured interview

A
  • specific set of questions
  • standardized- questions are printed and asked with exact phrasing

89
Q

Unstructured interview

A
  • use transitional phrases or playback/restatement/summarizing/clarifying/understanding statements
  • goal is to lead to elaboration by the interviewee with minimum effort by the interviewer to maintain the flow
90
Q

Clinical interview v. assessment interview

A

A clinical interview is used when you will likely be seeing the client moving forward in therapy, whereas an assessment interview is conducted to gather information that answers a referral question, e.g., “does this child have ASD?”
-In an assessment interview, you are more likely to use standardized tests (intelligence, personality, paper-and-pencil) and to talk to multiple sources

91
Q

Interview validity

A
  • Seek convergent or even divergent validity
    • Correlate interview data with other measures (GPA, job performance, etc.)
  • Usually moderate validity coefficients (.40)
92
Q

Errors that bias interview validity

A
  • early impressions “stick” even if evidence to the contrary emerges
  • One prominent characteristic of interviewee biases interviewer’s judgments
  • misunderstanding of cultural differences
93
Q

Interview reliability

A

-interviewer reliability coefficients are quite variable
-unstructured interviews have the lowest reliability, though they may lead to fairer outcomes than other assessment tools
-interviews vary in their standardization- they can focus on different areas of importance
-structured interviews provide higher reliability estimates
-but they don’t provide as much or as varied information as unstructured or semi-structured interviews