Class Notes Flashcards
if one level of IV is assigned via randomization
considered experimental – as a consumer, be keenly aware of which IV is randomized – e.g., randomized on tx but not on gender in a 2 x 2 factorial design
any time you administer the same measure to the same participants (to collect data on the DV) at a later date (at any point – could be 5 minutes apart)
Repeated measures design
subsumed within repeated measures – generally speaking, typically more than 20 observations (e.g., clinical trials); no longer need 20 for statistical methods to hold (need at least 10) – tend to occur in fixed intervals, depending on what you're collecting data on – logarithmic data collection: the time period between observations continues to get bigger (exponentially)
Time-series design
Difference between repeated measures and time series design
number of observations (collected data on DVs)
In non-equivalent group design one should
control with pre-test
Threats that concern ______ relate to statistical conclusion validity
integrity of treatment itself
Threats that concern ______ relate to internal validity
making comparisons between tx groups
What does randomization do?
“Ensure” that participant groups are equal prior to treatment
What does randomization not do?
Ensure anything that happens after treatment – possibility of history effects
Important issues to look for in time series designs
- change in intercept (level)
- slope
- stability of effect (continuous or discontinuous effect)
- delayed vs immediate effect (instantaneous vs delayed – when you see effects taking place)
Standard error is based on
sample of samples
What type of research is meta-analysis?
Ex-post facto
Assessment is defined as
- Overarching, sampling behavior
* In contrast to research when we assess people
Measurement is defined as
- Establishing quantitative rules for assigning numbers to represent attributes of persons
- Numbers are assigned to attributes of people, not to the people themselves
- Distinction between observations and inferences
- Must think: How representative is this of behaviors outside of this context?
Test, scales, and measures are defined as
Objective, quantitative measurement using standardized procedures; psychometric properties of scores essential
What are Rating protocols?
Taxonomies, classification and rating systems done by an observer (usually)
What is Evaluation?
Assessing the congruency between what is expected and what actually occurs (ranges from formal to informal; may be quantitative) (Chen, 1990)
What is Clinical assessment?
Less formal, typically not fully standardized or quantitative
What is a Scale?
- often used interchangeably (not always) with measure, questionnaire or test
- Some say questionnaire is less formal
- Assumed to be assessing a single construct or domain
What is a construct?
Trait, domain, ability, latent variable, theta (θ)
What is theta?
Item Response Theory (IRT) uses this to talk about the construct itself – latent variable
What are the differing types of item responses?
- Dichotomous
- Polytomous
- Graded responses
What are dichotomous item responses?
Two levels (true or false, yes or no, etc.)
What are polytomous item responses?
Three or more levels, often ordered but not always
What are graded item responses?
More than 2 ordered response options
All graded responses are polytomous but not all polytomous items are graded items
What is Classical Test Theory (CTT)?
- Total sum of squares partitioned into true score variance vs. error score variance
- Partitioning variance
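The CTT partition above can be sketched numerically. A minimal simulation (illustrative values, not from the notes), assuming the CTT model X = T + E with error independent of true score:

```python
import numpy as np

# Illustrative CTT simulation: observed score X = true score T + error E.
rng = np.random.default_rng(0)
true = rng.normal(50, 10, size=10_000)   # true scores T (SD = 10)
error = rng.normal(0, 5, size=10_000)    # error scores E (SD = 5), independent of T
observed = true + error                  # observed scores X

# Reliability = true-score variance / observed-score variance
reliability = true.var() / observed.var()
print(round(reliability, 2))  # should be close to 10**2 / (10**2 + 5**2) = 0.80
```

The variance ratio is the CTT definition of reliability: the proportion of observed-score variance that is true-score variance.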
What is Modern Test Theory = Item Response Theory (IRT)?
Has to do with probability
- What’s the probability that someone will respond in a certain way?
- Not only assigning where the individual is on the construct
What is Standard Error of Measurement (SEM or SEm) ?
- Estimate of extent to which an observed score deviates from true score
- Create confidence interval
- Probability that an individual’s true score lies within a range
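The SEm card can be made concrete with the standard CTT formula SEm = SD × sqrt(1 − r_xx) (the formula is assumed here, not stated in the notes; the SD, reliability, and observed score are hypothetical):

```python
import math

# Standard CTT formula (assumed): SEm = SD * sqrt(1 - r_xx)
sd, r_xx = 15.0, 0.91           # hypothetical scale SD and reliability
sem = sd * math.sqrt(1 - r_xx)  # standard error of measurement

observed = 110.0                # hypothetical observed score
# 95% confidence interval for the individual's true score
lo, hi = observed - 1.96 * sem, observed + 1.96 * sem
print(round(sem, 1), (round(lo, 1), round(hi, 1)))  # → 4.5 (101.2, 118.8)
```

Note how higher reliability shrinks SEm and tightens the confidence interval around the observed score.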
What is reliability?
- How consistently does a scale measure what it is designed to measure?
What are types of reliability?
- Test-retest
- Parallel forms
- Split halves
- Internal consistency
- Inter-rater
Parallel forms
- Administer once in different forms and see how they correlate
Split-halves
- See how one half of test correlates to other half within each participant
- But need twice as many items (think statistical power and N)
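A sketch of the split-half idea, using the Spearman-Brown correction: the raw half-to-half correlation underestimates full-length reliability (each half has only half the items, hence the "need twice as many items" point), so it is stepped up. Data and item counts below are made up:

```python
import numpy as np

def split_half_reliability(items: np.ndarray) -> float:
    """items: participants x items score matrix (hypothetical data)."""
    odd = items[:, 0::2].sum(axis=1)    # half 1: odd-numbered items
    even = items[:, 1::2].sum(axis=1)   # half 2: even-numbered items
    r_half = np.corrcoef(odd, even)[0, 1]
    return 2 * r_half / (1 + r_half)    # Spearman-Brown prophecy formula

rng = np.random.default_rng(1)
trait = rng.normal(size=200)                                   # 200 participants
data = trait[:, None] + rng.normal(size=(200, 10))             # 10 noisy items
print(round(split_half_reliability(data), 2))
```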
Three types of validity
- Content
- Criterion
- Construct
What is content validity?
- Extent to which items are representative of or sampled from content domain being measured
- Most important for classroom tests
What are limitations of content validity?
- Often non-statistical approach is used – ask panel of experts
- Often relies upon face validity - does this look like it tests what it’s supposed to test
What are two types of criterion validity?
- Predictive
- Concurrent
What is predictive validity?
Test administered first, criterion outcome measured later – look at whether scores predict outcomes
What is concurrent validity?
Measure the test and the criterion at the same time
What are limitations of concurrent validity?
- Often use 1-item scales
- Lack validity and reliability
- Extremely limited variance
What is construct validity?
- How well scores measure a specific trait or construct
* Requires a priori specification and operationalization of the construct
Steps in developing CTT test construction
- Qualitative research
- Explicate theorizing
- Define constructs
- Generate item pool
- Content validity study
- Derivation sample
- Cross-validation study
- Subsequent studies testing evidence of construct validity
What are limitations of CTT?
- Cannot change anything about the scale
- Standard error of measurement is presumed constant across levels of the construct, items, and scores
- Statistics are sample dependent
What are main characteristics of IRT?
- Probabilistic
- Statistical/mathematical theory
- Scaling items as well as people on the latent trait
- Wide variety of IRT models (dichotomous – more common – and polytomous)
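The probabilistic idea can be sketched with the two-parameter logistic (2PL) model, one of the common dichotomous IRT models: the probability of the keyed response given latent trait theta, item discrimination a, and item difficulty b (parameter values below are hypothetical):

```python
import math

# 2PL item response function: P(correct | theta) = 1 / (1 + exp(-a * (theta - b)))
def p_correct(theta: float, a: float, b: float) -> float:
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# At theta == b the probability is exactly .5; higher theta -> higher probability.
print(p_correct(theta=0.0, a=1.5, b=0.0))            # → 0.5
print(round(p_correct(theta=1.0, a=1.5, b=0.0), 2))  # → 0.82
```

This is what "scaling items as well as people" means: b places the item on the same theta metric that locates the person.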
What are basic assumptions of IRT?
- Well-known constructs and well-established tests
a. know dimensionality
b. know validity
c. know what’s correct and incorrect - extensive item banks and data banks
- large samples (>3000)
In contrast to assumptions of IRT, what is typical of CPY instrumentation?
- New scales
- Constructs are not well known or defined
a. confounds
b. unknown dimensionality
c. ordered rating scale data (Likert) – non-existent item pools; no data banks
- small sample sizes (in best cases n = ~300-400)
What is measurement invariance?
Cross-cultural applications of tests, scales, and measures
4 questions to ask regarding measurement invariance
To what extent…
- Can a construct be conceptualized equivalently across cultures?
- is the same construct being measured equivalently across cultures?
- can mean scores be compared equivalently across cultures?
- can measures of association (correlation) be compared equivalently across cultures?
Multiple regression
- Find the plane (hyperplane) that best fits the data, instead of a line (as in simple regression)
- Linear composite of IVs to best explain DVs
- Combination of weights on IVs constitutes linear composites
Confirmatory Factor Analysis (CFA)
- forces the data into this model then assess how well the model fits the data
- Also assess how scales correlate with one another (Orthogonal – no correlation – or Oblique – allows correlation)
Exploratory Factor Analysis (EFA)
- Misapplication of statistical procedure for measurement development
- Analogous to doing “atheoretical” research
- No constraints on data
- See how many factors come out ("semi-magical")
- interpret the factors that come out based on the data that they’re based on
- Do a CFA because we have implicit theorizing
Things to look for in articles relating to measurement
- Look for ceiling/floor effects (Determine the possible range for the scale or subscale: how many items, what's the rating scale? If the mean ± 1 SD runs past the top or bottom of the possible range, scores are highly skewed – you no longer have adequate prediction of the probability of error when scores are skewed)
- Type 1 and Type 2 error rates are not protected
* “I adapted this test”
* Changed the rating scale, items, instructions, order, etc. – changed anything
* Computing Cronbach's alpha is insufficient to protect against these threats
* Different rating scales for same scale/sub-scale
Analysis of variance and multiple regression
are identical!
Multiple regression =
1 DV, always univariate
Multivariate =
multiple IVs and multiple DVs
Multiple regression in linear model
y = a + b1x1 + b2x2 + … + bkxk + e
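The linear model above can be fit by ordinary least squares; a minimal sketch with made-up data (the column of ones estimates the intercept a):

```python
import numpy as np

# Simulate y = a + b1*x1 + b2*x2 + e with known weights, then recover them by OLS.
rng = np.random.default_rng(2)
x1, x2 = rng.normal(size=100), rng.normal(size=100)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=100)

X = np.column_stack([np.ones(100), x1, x2])      # design matrix: [1, x1, x2]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)     # least-squares weights [a, b1, b2]
print(coef.round(1))                             # ≈ [1.0, 2.0, -0.5]
```

The fitted weights are the "combination of weights on IVs" that forms the linear composite best explaining the DV.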
Multi-collinearity
cannot interpret the overlap of explained variance between variables in multiple regression
Part correlation
unique variance explained by one predictor in the model, controlling for other predictors in the model
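A sketch of the part (semipartial) correlation by the residualization route: regress x1 on x2, keep the residual (the unique part of x1), and correlate it with y. Data below are made up; the squared result is the unique variance explained:

```python
import numpy as np

def part_correlation(y, x1, x2):
    """Semipartial r of y with the part of x1 not shared with x2."""
    b = np.polyfit(x2, x1, 1)              # regress x1 on x2
    x1_resid = x1 - np.polyval(b, x2)      # unique part of x1 (x2 removed)
    return np.corrcoef(y, x1_resid)[0, 1]

rng = np.random.default_rng(3)
x2 = rng.normal(size=500)
x1 = 0.7 * x2 + rng.normal(size=500)       # correlated predictors
y = x1 + x2 + rng.normal(size=500)
print(round(part_correlation(y, x1, x2), 2))
```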
If summing all squared zero-order correlations (r² with the DV) and the value exceeds 1.0…
by definition you have profound multicollinearity (x1 and x2 are highly correlated)
Moderation
- Answers for whom or when does this relation apply?
- Affects magnitude and/or strength of relation
- Easier to think about as high low, but better to use continuous
Interactions refer to moderation or mediation effects?
Moderator
If interaction term is significant…
main effects are meaningless
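Moderation is tested by adding the product term x*z to the regression; its weight (b3) is how the moderator z changes the x → y slope. A sketch with made-up data, including simple slopes at z = ±1 SD (the "for whom or when" question):

```python
import numpy as np

# Simulate y with a true interaction, then recover it with a product term.
rng = np.random.default_rng(4)
x, z = rng.normal(size=300), rng.normal(size=300)
y = 1.0 + 0.5 * x + 0.2 * z + 0.8 * (x * z) + rng.normal(scale=0.1, size=300)

X = np.column_stack([np.ones(300), x, z, x * z])  # design: [1, x, z, x*z]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2, b3 = coef
print(coef.round(1))                              # ≈ [1.0, 0.5, 0.2, 0.8]

# Simple slope of x at a given z is b1 + b3*z: the relation differs by level of z.
for z_val in (-1.0, 1.0):
    print(round(b1 + b3 * z_val, 1))              # ≈ -0.3, then ≈ 1.3
```

The sign flip in the simple slopes illustrates why the main effect b1 alone is not interpretable when b3 is significant.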
Mediation
*How or why a relation exists
Full mediation
Mediator variable completely accounts for relation between IV and DV
Partial mediation
Explains much of, but not all, of the relation between IV and DV (IV and DV path alone still exists but is nearer to non-significance)
MAXMINCON
- Maximize experimental variance
- Minimize error variance
- Control extraneous variance
MSH = SSH / dfH
Mean square for the hypothesis (between-groups; want to increase)
MSE = SSE / dfE
Mean square error (within-groups; want to decrease)
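A worked one-way ANOVA example (made-up group scores) ties the two formulas together: F = MSH / MSE, so maximizing experimental variance (SSH) and minimizing error variance (SSE) both raise F:

```python
import numpy as np

# Three hypothetical treatment groups, n = 3 each.
groups = [np.array([4., 5, 6]), np.array([6., 7, 8]), np.array([9., 10, 11])]
grand = np.concatenate(groups).mean()

ssh = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)  # between-groups SS
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)       # within-groups SS
df_h = len(groups) - 1                                       # k - 1
df_e = sum(len(g) for g in groups) - len(groups)             # N - k

msh, mse = ssh / df_h, sse / df_e
print(round(msh, 1), round(mse, 1), round(msh / mse, 1))  # → 19.0 1.0 19.0
```

Spreading the group means further apart raises SSH (MAXimize experimental variance); tighter within-group scores lower SSE (MINimize error variance); both push F up.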
Maximize experimental variance by
- Ensure maximum variability in Y due to X
- Make treatments as different as possible, but realistic
Control extraneous variance a priori by
- Homogenize on the confounding variable (restriction of range)
- Match participants on all relevant conditions
- Randomly assign participants to treatment conditions
Control extraneous variance a priori or post hoc by
- Build a blocking variable into the design to control the confound
- Covary any confounding variable (analysis of covariance)
Minimize error variance by
- Block on any variable that is related to the DV but not related to the IV
- Covary on any variable that is related to the DV but not related to the IV
- Maximize the reliability of the measures used (r_tt)
- Increase the sample size (error and statistical power are a function of N)
- Use repeated measures designs instead of between groups designs