Exam One Flashcards

1
Q

Psychological testing

A

Refers to all possible uses, applications, and underlying concepts of psychological and educational tests.

2
Q

Psychologists’ responsibility around test administration

A

Duty to select fair (representative), appropriate, up-to-date, reliable, and valid tests, because test scores drive decision-making

3
Q

Types of psychological tests

A
  1. Achievement- refers to previous learning (course material)
  2. Aptitude- refers to the potential for learning or acquiring a specific skill (SAT)
  3. Intelligence- general potential to solve problems, adapt, think abstractly, and learn from experience
4
Q

Types of personality tests

A

Structured/Objective- multiple choice, true/false, or Likert scale format; usually self-report

Projective- test materials or required response (or both) are ambiguous (Rorschach)

5
Q

How to evaluate utility of tests

A

Aspects of psychometric soundness

  • reliability (consistency)
  • validity (accuracy)

Test construction

  • item creation and/or selection
  • logical vs. theoretical vs. empirical considerations

Test administration

  • variation in scores due to administrator, examinee, and/or random error

6
Q

Early antecedents for tests

A

Han Dynasty- test batteries used for work-related evaluations
Ming Dynasty- testing rounds in testing centers used to nominate public officials
British missionaries- advocated a civil service testing system (modeled on the Chinese exams)
US- American Civil Service Commission

7
Q

Darwin/Galton/Cattell

A

Darwin
-On the Origin of Species: evolution acts upon individual differences (survival and reproduction of the fittest)
Galton
-Documented individual differences in cognitive and physical abilities
-Founder of eugenics (selective reproduction of individuals with “desirable” traits)
Cattell
-Studied individual differences in cognitive and physical abilities
-Coined the term “mental tests”

8
Q

Experimental psychologists

A

Donders

  • reaction time tests
  • early cognitive psychology experiments

Wundt

  • First psych lab
  • Sensation and perception

This era brought the scientific method to psychological testing
(which requires rigorous experimental control)

9
Q

Intelligence tests

A

Binet-Simon scale- first intelligence test; first use of a standardized sample
Stanford-Binet scale- US revision; standardization sample of 1,000; edited and new items
Group tests- developed by Yerkes in response to WWI; Army Alpha and Army Beta (1917)
Wechsler Intelligence Tests- included a nonverbal (“performance”) subscale of intelligence

10
Q

standardized sample

A

A norm-based sample; a test taker’s score is interpreted by comparing it to other people’s scores

11
Q

representative sample

A

comprises individuals similar to those for whom the test is to be used

12
Q

Mental age

A

A child’s performance expressed as the age group whose average performance it matches (e.g., a 6-year-old who performs like the average 8-year-old has a mental age of 8)

13
Q

Personality tests

A
  • measures traits
  • Woodworth Personal Data Sheet- screened military recruits for likelihood of “shell shock”
  • Rorschach
  • Thematic Apperception Test
14
Q

Modern personality tests

A

Objective tests- no assumptions made about the meaning of a test response
MMPI, CPI; 16PF (based on factor analysis, which finds the minimum number of dimensions needed to account for a large number of variables)

15
Q

Descriptive statistics

A

Statistics describing a sample or population: measures of central tendency and variability.
-can be used with ANY type of data
-including experimental or non-experimental data

16
Q

Inferential statistics

A

Statistical procedures that allow inferences to be made from the sample to the population.
-used to infer causality
-largely limited to experimental data
-type of data dictates type of analysis used
-must be careful of data distribution
(parametric vs. nonparametric)

17
Q

Nominal data

A

Categorical data; no mathematical meaning
(dichotomous if two categories)
e.g., gender, political party, religion, species, team

18
Q

Ordinal data

A

Indicates order- cannot know how far apart each item is (no equal intervals)
first to last; most to least
-basketball standings, sibling-line position, IQ scores

19
Q

Interval data

A

Data with equal intervals between values but no true zero.

  • temperature in degrees, SAT scores
  • most psychological measures; Likert scale
20
Q

Ratio data

A

Interval data with true zero.
most physical measures- height, weight,
speed, distance, volume, area

21
Q

Normal distribution

A

Bell-shaped; symmetric around the central tendency (mean = median = mode)
-most statistical procedures in psychology assume normally distributed scores
-parametric statistics are based on symmetrical (normal) distributions

22
Q

Characteristics of parametric distributions

A

-approximate symmetry
-the distribution can be divided into standard deviation units
-the size of a deviation can be mathematically defined on any measure that is interval or ratio in nature

23
Q

Skew

A

The degree of departure from symmetry.
Positively skewed- most scores fall on the left side; the tail extends to the right.
Negatively skewed- most scores fall on the right side; the tail extends to the left.
Bimodal- two areas of the curve at equal frequencies with a dip in between.

24
Q

Variance

A

The variation in, or differences among, people’s scores in a distribution on measure X

  • arises from natural, random differences among subjects (Ss)
  • environmental variations
  • measurement error
  • researcher error (overt, covert)
25
Q

Percentile ranks and how to calculate

A

Percentile rank- the percentage of scores that fall below a particular score within a distribution
Calculate:

  • divide number of cases below the score of interest by total number of cases in the group
  • multiply results by 100
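
A minimal Python sketch of this calculation (the score distribution and the score of interest below are invented for illustration):

    # Percentile rank: percentage of scores in the distribution falling below a given score
    def percentile_rank(scores, score):
        below = sum(1 for s in scores if s < score)  # cases below the score of interest
        return below / len(scores) * 100             # multiply the proportion by 100

    data = [55, 60, 62, 70, 75, 80, 85, 90, 95, 99]  # hypothetical score distribution
    print(percentile_rank(data, 80))                 # 50.0 -> five of the ten scores fall below 80
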
26
Q

Standard scores

A

z-scores: raw scores converted to a scale with a fixed mean and standard deviation
-a score measured in SD units (the deviation of a score from the mean, in SD units)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

Calculating a z-score

A
  • Find difference between observed score and mean for the distribution
  • Divide difference by SD of distribution

Mean exam score = 11.05 (SD = 7.01)
For a score of 14, z = (14 - 11.05) / 7.01 = .42
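
The same arithmetic as a short Python sketch, using the exam figures above:

    # z = (X - mean) / SD: a score's deviation from the mean in SD units
    def z_score(x, mean, sd):
        return (x - mean) / sd

    print(round(z_score(14, 11.05, 7.01), 2))  # 0.42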

28
Q

Norms

A

Allow for evaluation of one’s performance relative to a larger group

29
Q

Norm-referenced tests

A

-each test taker’s performance evaluated against standardized sample
-typically used for the purpose of making comparisons
with a larger group
-norms should be current, relevant, and representative
of the group to which the individual is being compared

30
Q

Criterion-referenced tests

A

-represent predetermined level of performance to be reached (“benchmarks”)
-scores are compared to a preset “criterion score” (not
compared to others)
-No Child Left Behind

31
Q

Correlation vs. regression

A

Correlation assesses the magnitude and direction of a relationship. Regression is used to make predictions about scores on one variable from knowledge of scores on another variable. These predictions are obtained from the regression line (line of best fit).
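
A minimal least-squares sketch in Python (the x/y values are invented), showing how the fitted line y = b0 + b1*x is then used for prediction:

    # Fit the line of best fit (y = b0 + b1*x) by ordinary least squares, then predict.
    xs = [1, 2, 3, 4, 5]  # hypothetical predictor scores
    ys = [2, 4, 5, 4, 6]  # hypothetical criterion scores
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b0 = my - b1 * mx
    print(round(b0 + b1 * 6, 2))  # 6.6 -> predicted criterion score for a new x of 6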

32
Q

Correlation coefficient (r)

A

-strength of association between variables
-Ranges between -1.0 and +1.0
-Calculated between 2 variables for the entire group, not 1 individual
-Reflects the amount of variability that is shared between 2 variables
+/- .10: weak, +/- .30: moderate, +/- .50: strong
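
A Pearson r sketch in Python, computed for a whole (invented) group of paired scores:

    from math import sqrt

    # Pearson r: direction and strength of the linear association across the whole group
    def pearson_r(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

    print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 6]), 2))  # 0.85 -> strong association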

33
Q

p-value

A

Indicates whether the association is greater than what would be expected by chance.

34
Q

Shared variance (r²)

A

Also called common variance or the coefficient of determination; indicates effect size

35
Q

Correlation does not equal causation

A
  1. Mediating variables may explain the relationship
  2. Relationships can be bidirectional (thus both would be causal)
  3. Causality can be inferred only under experimental manipulations
36
Q

Experimental conditions

A

Experiments:

  1. random assignment of participants
  2. manipulation of at least one independent variable
37
Q

Coefficient of determination

A

Correlation coefficient squared and then converted into a percentage; indicates effect size
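
A one-line illustration in Python (r = .50 is an arbitrary example):

    r = 0.50             # correlation between two variables (arbitrary example)
    print(r ** 2 * 100)  # 25.0 -> the two variables share 25% of their variance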

38
Q

Coefficient of alienation

A

A measure of nonassociation between two variables: subtract r² from 1 (where r² is the coefficient of determination)
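
Continuing the arbitrary r = .50 example (note: some texts instead define the coefficient of alienation as the square root of 1 - r²; this sketch follows the card’s formulation):

    r = 0.50
    print(1 - r ** 2)  # 0.75 -> 75% of the variance is not shared (nonassociation)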

39
Q

Statistical significance

A

p < .05 (the conventional cutoff)- the probability that the observed result occurred by chance is less than 5%

40
Q

Reliability

A

refers to the accuracy, dependability, consistency, or repeatability of test results

41
Q

Classical test theory

A

-Assumes each person has a true score (T) that would be obtained if there were no errors in measurement
-Because measurement instruments are imperfect, the observed score (X) for each person almost always differs from the person’s true ability
-Difference between observed and true score = measurement error (E)
-T (true score) = X (observed score) - E (measurement error)
-Major assumptions: errors occur randomly and are normally distributed
-error cannot be eliminated
-some error is systematic

42
Q

Standard error of measurement

A

Provides an estimate of how much an individual’s score would be expected to change on re-testing with the same or an equivalent form of the test

  • averaging scores over an infinite number of testings yields an estimate of true ability/knowledge (T, the true score); the standard deviation of all those scores = SEM
  • creates a confidence band within which a person’s true score would be expected to fall
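
A sketch assuming the standard formula SEM = SD * sqrt(1 - reliability); the SD, reliability, and observed score below are invented:

    from math import sqrt

    def sem(sd, reliability):
        # Expected spread of observed scores around the true score on re-testing
        return sd * sqrt(1 - reliability)

    s = sem(15, 0.91)   # e.g., an IQ-style scale: SD = 15, reliability = .91
    print(round(s, 1))  # 4.5
    observed = 100
    print(round(observed - 1.96 * s, 1), round(observed + 1.96 * s, 1))  # ~95% confidence band
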
43
Q

Domain sampling method

A

Instead of testing your ability to spell every possible word, we select a random sample of words.

T- % correct in spelling all words in English language
X- % correct in spelling all words in sample

  • As the sample gets larger (the closer X comes to T), reliability increases and error decreases
  • Because we do not know T:
  • calculate the correlations among all sampling occasions (Xs)
  • the correlations are then averaged to predict T
44
Q

Item response theory

A

A newer method, now often preferred to CTT; IRT uses an adaptive method to assess ability
-the test increases in difficulty if the previous question is answered correctly
-the test decreases in difficulty if the previous question is answered incorrectly
-items near the test taker’s ability level are heavily sampled

Overall result is a more reliable estimate of ability
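
A hedged sketch of the two-parameter logistic item response function used in many IRT models (a = item discrimination, b = item difficulty; all values invented):

    from math import exp

    def p_correct(theta, a, b):
        # Probability that an examinee with ability theta answers the item correctly
        return 1 / (1 + exp(-a * (theta - b)))

    print(round(p_correct(0.0, 1.0, 0.0), 2))  # 0.5 -> item difficulty matched to ability
    print(round(p_correct(0.0, 1.0, 1.0), 2))  # 0.27 -> a harder item for the same examinee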

45
Q

Measurement error affecting reliability

A
  1. questionable measurement precision
  2. item sampling
  3. construction of test items
  4. factors related to test environment
  5. varying judgments or beliefs of raters/observers
  6. scoring of the test (objectivity of evaluator)
  7. difficulty of the test
  8. factors related to test-taker
46
Q

Measures to assess reliability

A
  • test-retest
  • parallel forms (ideal but rarely used)
  • internal consistency reliability (single test; most frequently used)
47
Q

Test-retest

A

The same test is administered to the same person at different points in time.
-also called the time sampling method
-only useful when assessing stable traits
-must reduce carryover or practice effects
-the interval between measurements must be considered:
-shorter intervals -> higher carryover
-Be careful of developmental milestones

48
Q

Parallel forms

A

Compares scores on two different measures of the same quality
-also called equivalent or alternate forms method

A rigorous assessment of reliability

  • carryover effects are eliminated
  • greater sampling of domain
  • Generally underutilized
    • difficult to get people “back in the door”
49
Q

Internal consistency

A

Extent to which different items on a test measure the same attribute or trait. Scores from 2 halves are correlated with each other

50
Q

Methods to assess internal consistency

A
  • split-half
  • KR20 (Kuder & Richardson)
  • Cronbach alpha (coefficient alpha)
51
Q

Split-half reliability

A

One test is split into two equal halves
Each half is compared to the other
-can be split randomly, first/second halves, or odd/even
The Spearman-Brown formula is used to correct for half-length and increases the estimate of reliability
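
The Spearman-Brown half-length correction as a Python sketch (the .70 half-test correlation is illustrative):

    def spearman_brown(r_half):
        # Estimated full-test reliability from the correlation between the two halves
        return 2 * r_half / (1 + r_half)

    print(round(spearman_brown(0.70), 2))  # 0.82 -> corrected upward from the half-test value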

52
Q

KR20 reliability

A

-Simultaneously considers all possible ways of splitting the test (avoids the problems of split-half methods)
-Only appropriate for tests in which items are
dichotomous (0 - incorrect/ 1- correct)
-finds the proportion of people who got each item right v. wrong
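
A sketch of the KR20 formula, KR20 = (k / (k - 1)) * (1 - sum(pq) / total-score variance), on an invented set of dichotomous item scores:

    def kr20(item_scores):
        # item_scores: one list of 0/1 item scores per test taker
        n = len(item_scores)     # number of test takers
        k = len(item_scores[0])  # number of items
        totals = [sum(person) for person in item_scores]
        mean = sum(totals) / n
        variance = sum((t - mean) ** 2 for t in totals) / n  # total-score variance
        sum_pq = 0.0
        for i in range(k):
            p = sum(person[i] for person in item_scores) / n  # proportion passing item i
            sum_pq += p * (1 - p)
        return (k / (k - 1)) * (1 - sum_pq / variance)

    data = [[1, 1, 1, 0], [1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
    print(round(kr20(data), 2))  # ~0.47 on this tiny invented dataset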

53
Q

Coefficient alpha

A

Cronbach alpha: considered to be the most general and rigorous formula for determining reliability estimate through internal consistency
-can be used on Likert scales, when items can’t be classified as “right” or “wrong”
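
Coefficient alpha generalizes KR20 beyond right/wrong items: alpha = (k / (k - 1)) * (1 - sum of item variances / total-score variance). A sketch on invented Likert-style responses:

    def cronbach_alpha(item_scores):
        # item_scores: one list of item ratings (e.g., 1-5 Likert) per respondent
        n = len(item_scores)
        k = len(item_scores[0])

        def variance(values):
            m = sum(values) / len(values)
            return sum((v - m) ** 2 for v in values) / len(values)

        sum_item_vars = sum(variance([p[i] for p in item_scores]) for i in range(k))
        total_var = variance([sum(p) for p in item_scores])
        return (k / (k - 1)) * (1 - sum_item_vars / total_var)

    data = [[4, 5, 4], [2, 3, 3], [5, 5, 4], [1, 2, 2], [3, 3, 4]]
    print(round(cronbach_alpha(data), 2))  # ~0.93 on this invented data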

54
Q

Inter-rater reliability

A

Measure of reliability in behavioral observation studies

  • code a behavior from observational or behavioral study- compare degree of overlap among different observers
  • Start with an ethogram- operational definitions of variables
55
Q

Kappa statistic

A

indicates actual agreement as corrected by level of chance agreement among different raters
1.0= perfect agreement between observers
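
Cohen’s kappa as a Python sketch, kappa = (observed agreement - chance agreement) / (1 - chance agreement), on invented codes from two observers:

    def cohens_kappa(rater1, rater2):
        # Agreement corrected for the level of agreement expected by chance
        n = len(rater1)
        p_o = sum(a == b for a, b in zip(rater1, rater2)) / n  # observed agreement
        categories = set(rater1) | set(rater2)
        p_e = sum((rater1.count(c) / n) * (rater2.count(c) / n) for c in categories)
        return (p_o - p_e) / (1 - p_e)

    r1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
    r2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "no"]
    print(cohens_kappa(r1, r2))  # 0.5 -> moderate agreement beyond chance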

56
Q

Reliability coefficients

A

Range from 0.0 to 1.0
1.0 = perfect reliability
If r = .90, then 10% of the variation in scores is attributable to measurement error

.90 and above = highly reliable test
.70-.89 = moderately reliable

57
Q

Validity

A

Extent to which a test measures the quality it purports to measure
-the test accurately reflects whatever construct, trait, or characteristic it claims to measure

Evidence for validity comes from showing the association between the test and other variables.

58
Q

Face validity

A
  • based on logical rather than statistical analysis
  • the appearance, at a surface level, that a test measures what it purports to measure

59
Q

Content validity

A

Evidence that the content of a test adequately represents the conceptual domain it is designed to cover

  • test items are a fair sample of the total potential content and relevant to construct being tested
  • based on logical rather than statistical analysis
  1. Construct underrepresentation: failure to capture important components of a construct
  2. Construct irrelevant variance- scores are influenced by factors irrelevant to the construct
60
Q

Can a test be content valid without being face valid?

A
  • Yes- e.g., depression measures
  • child abuse queries

61
Q

Criterion validity

A

Extent to which a test corresponds with a particular criterion (standard against which test is compared)
- typically used when objective is to predict future performance on an unknown criterion

examples:
premarital test -> marriage success
SAT -> college freshman GPA

62
Q

Sub classes of criterion validity

A

Predictive- the test or measure predicts future performance/success on a particular criterion; correlation (r) describes the extent to which one variable predicts the other
SAT -> success in college

Concurrent- the criterion measure is taken at the same time as the test; correlation (r) describes the extent to which one variable correlates with another at the same time
work samples -> job performance

63
Q

Validity coefficient

A

Relationship between a test and a criterion-
usually Pearson r
-tells the extent to which the test is valid for
making statements about the criterion

There is less consensus regarding the acceptable size of validity coefficients (VCs)

  • coefficients of .60 or higher are rare
  • .30 - .40 considered to be acceptable
  • even tests with lower validity coefficients can yield useful information
    • the correlation between cholesterol and heart disease is quite low, but it has important predictive consequences for reducing mortality rates
64
Q

Construct validity

A

Process used to establish meaning of a test through a series of studies

  • simultaneously define a construct and develop tests to measure it
  • look for correlation between the test and other measures
65
Q

Convergent evidence for construct validity

A

evidence that a test measures the same attribute as do other measures that purport
to measure the same construct
- tests should correlate well (highly) if believed to measure same construct
-what measures should a new depression measure/Health Index/reading ability test correlate with?

66
Q

Discriminant evidence for construct validity

A

evidence that a test measures something different from what other available tests measure
-test would not correlate with unrelated tests

67
Q

Incremental validity

A

Measure of unique information gained through
using a test
-how much does information from test add to what is already known?
-how well does it improve the accuracy of decisions?
-based on logical analysis vs. statistical analysis

68
Q

Test item writing

A
  • define clearly what you want to measure (operational definition)
  • Generate an item pool (more items than you will end up including)
  • Avoid long/difficult Qs
  • Avoid items that convey 2 or more ideas
  • Consider making positively and negatively worded items
  • Be mindful of diversity
69
Q

Item format- dichotomous

A
  • 2 alternatives for each item
  • overall less reliable and therefore less precise

70
Q

Item format- Likert

A
  • rating scale with a continuum of alternatives to indicate agreement
  • may or may not contain a neutral point
  • is open to factor analysis
71
Q

Item format- polytomous

A
  • multiple alternatives for each item
  • probability of selecting correct answer by chance is lower
  • diminishing returns
72
Q

Item format- category

A
  • rating system typically using more alternatives (1-10)
  • heavily context dependent (reduces validity)
  • diminishing returns
73
Q

Item analysis

A

General term for a set of methods used to evaluate test items

74
Q

Item difficulty

A
  • asks what percentage of test takers got the item right
  • usually want item difficulty to fall between chance level and 100% (usually .30-.70)
  • if 84% get item #1 correct, its difficulty is .84
  • the higher the number, the easier the item
75
Q

Calculating item difficulty

A

Calculating optimal item difficulty:

  1. Subtract chance from 100% success (1.0)
  2. Divide by 2
  3. Add this value to chance

If chance is .25 (4 alternatives):

  1. (1.0-.25) / 2 = .75/2 =.375
  2. .375 + .25 = .625
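
The same optimal-difficulty arithmetic as a Python sketch:

    def optimal_difficulty(chance):
        # Halfway between chance-level success and perfect success (1.0)
        return (1.0 - chance) / 2 + chance

    print(optimal_difficulty(0.25))  # 0.625 for an item with 4 alternatives
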
76
Q

Item discriminability

A

Determines whether people who have done well on a particular item have also done well on the entire test

77
Q

The Extreme Group method

A
  • type of item analysis
  • compares those who did well to those who did poorly
  • calculation of a discrimination index - find the difference in the proportion of people in each group who got each item correct
  • Higher Discrimination Index= Higher Discriminability
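
A small sketch of the discrimination index (the 0/1 item results for the two extreme groups are invented):

    def discrimination_index(top_group, bottom_group):
        # Difference in proportion correct on one item: top vs. bottom scorers
        return sum(top_group) / len(top_group) - sum(bottom_group) / len(bottom_group)

    # 0/1 results on one item for the highest- and lowest-scoring test takers
    print(round(discrimination_index([1, 1, 1, 0, 1], [0, 1, 0, 0, 0]), 2))  # 0.8 - 0.2 = 0.6
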
78
Q

Point Biserial method

A
  • type of item analysis
  • Correlation between a dichotomous and a continuous variable (individual item versus overall test score)
  • Is less useful on tests with fewer items
  • point biserial correlations closer to 1.0 indicate better questions
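
A point-biserial sketch assuming the standard formula r_pb = ((M1 - M0) / SD) * sqrt(pq), correlating one 0/1 item with total test scores (data invented):

    from math import sqrt

    def point_biserial(item, totals):
        # item: 0/1 scores on one question; totals: overall test scores for the same people
        n = len(item)
        ones = [t for x, t in zip(item, totals) if x == 1]
        zeros = [t for x, t in zip(item, totals) if x == 0]
        m1, m0 = sum(ones) / len(ones), sum(zeros) / len(zeros)
        mean = sum(totals) / n
        sd = sqrt(sum((t - mean) ** 2 for t in totals) / n)
        p = len(ones) / n  # proportion answering the item correctly
        return (m1 - m0) / sd * sqrt(p * (1 - p))

    print(round(point_biserial([1, 1, 0, 1, 0], [18, 16, 9, 14, 8]), 2))  # ~0.94 on this tiny example
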
79
Q

Limitations of item analysis

A
  • test analysis can tell us about the quality of a test, but it doesn’t help students learn
  • Purposes of tests are varied, and may emphasize ranking students over identifying weaknesses or gaps in knowledge
  • If teachers feel they need to “teach to the test” the outcomes of a test may be misleading and indicate more mastery than actually exists
80
Q

Relationship between examiner and test taker

A
  • role of feedback (type of feedback given to test taker)
  • role of race and gender of tester on test taker
  • role of language of the test taker (tests are highly linguistic)
81
Q

2 types of stereotype threat

A
  • anxiety over how one will be evaluated and how well s/he will perform
  • for members of a stereotyped group, pressure to disconfirm negative stereotypes
82
Q

Stereotype threat hypotheses

A
  • STT depletes working memory
  • STT leads to reduced effort and, in turn, reduced performance
  • STT causes physiological arousal that can disrupt performance
83
Q

Response acquiescence

A

respondents tend to agree or give the response they perceive to be expected

84
Q

Expectancy effects

A

(Rosenthal effects):

  • can influence what interviewer expects out of interviewee
  • told a child is “smart” or “bad” ahead of time
  • giving examinee “benefit of the doubt” because he/she is pleasant
85
Q

Subject effects

A

(Hawthorne effects):

  • can influence what subject expects out of test/interview
  • may act in accordance with those expectations
86
Q

Empirical findings about the manner in which tests are administered

A
  • the less personalized the modality, the more likely information is to be disclosed
  • will disclose even more when confidentiality of responses is ensured
87
Q

Advantages of computerized administration

A
  • responses automatically recorded (reduces error)
  • standardization ensured
  • precisely timed responses
  • examiner bias controlled
88
Q

Structured interview

A
  • specific set of questions
  • standardized- questions are printed and asked with exact phrasing

89
Q

Unstructured interview

A
  • use transitional phrases or playback/restatement/summarizing/clarifying/understanding statements
  • goal is to lead to elaboration by the interviewee with minimum effort by the interviewer to maintain the flow
90
Q

Clinical interview v. assessment interview

A

A clinical interview is used when you will likely be seeing the client moving forward in therapy, whereas an assessment interview is conducted to gather information that answers a referral question, e.g., “does this child have ASD?”
-In an assessment interview, you are more likely to use standardized tests (intelligence, personality, paper-and-pencil) and to talk to multiple sources

91
Q

Interview validity

A
  • Seek convergent or even divergent validity
    • Correlate interview data with other measures (GPA, job performance, etc.)
  • Usually moderate validity coefficients (.40)
92
Q

Errors that bias interview validity

A
  • early impressions “stick” even if evidence to the contrary emerges
  • One prominent characteristic of interviewee biases interviewer’s judgments
  • misunderstanding of cultural differences
93
Q

Interview reliability

A

-interviewer reliability coefficients are quite variable
-unstructured interviews have the lowest reliability, though they may lead to fairer outcomes than other assessment tools
-interviews vary in their standardization- they can focus on different areas of importance
-structured interviews provide higher reliability estimates
-but they don’t provide as much or as varied information as unstructured or semi-structured interviews