Research Methods III Flashcards

1
Q

Covariance

A
  • Reflects the degree to which 2 variables vary together.
  • Relationship between two continuous variables in original RAW (unstandardized) scale.
  • Scale dependent.
2
Q

Sum of Squares

A

Compute the deviations of X & Y from their means, square them, and sum them (Σ). A sum of squares cannot be a negative value.

3
Q

Sum of Products

A

Compute the product of the X and Y deviations for each case and sum them (Σ).

4
Q

Correlation

A

Standardized (z-score) measure of linear relationship between 2 continuous variables.
- Standardized.
- Z-score.
- Scale invariant.
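
A minimal Python sketch (with made-up numbers) tying the covariance, SS, SP, and correlation cards together; the data values are illustrative only:

    import numpy as np

    x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
    y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

    dx, dy = x - x.mean(), y - y.mean()            # deviations from the means
    ss_x, ss_y = (dx ** 2).sum(), (dy ** 2).sum()  # sums of squares (never negative)
    sp = (dx * dy).sum()                           # sum of products

    cov = sp / (len(x) - 1)                        # covariance: scale-dependent
    r = sp / np.sqrt(ss_x * ss_y)                  # correlation: scale-invariant
    print(cov, r)
    print(np.corrcoef(x * 100, y)[0, 1])           # rescaling X changes cov, not r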

5
Q

Fisher z-test

A

Tests the difference between 2 independent-sample correlations.
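
A sketch of the Fisher r-to-z comparison in Python, assuming two independent samples with hypothetical rs and ns:

    import numpy as np
    from scipy.stats import norm

    def fisher_z_test(r1, n1, r2, n2):
        # Transform each r to z, then compare with a normal test statistic.
        z1, z2 = np.arctanh(r1), np.arctanh(r2)
        se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
        z = (z1 - z2) / se
        return z, 2 * norm.sf(abs(z))   # two-tailed p-value

    print(fisher_z_test(0.55, 100, 0.30, 120))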

6
Q

Effect size of r

A
  • Pearson correlation
  • Provides a measure of effect size due to being based on standardized scores.
    ±.1 = small effect, ±.3 = medium effect, ±.5 = large effect
7
Q

Regression

A

The statistical technique for producing the best-fitting straight line to predict Y from X.

8
Q

Regression equation

A

Yi = b0 + b1Xi + Ei

9
Q

Yi

A

Dependent or outcome variable, criterion variable

10
Q

Xi

A

Independent variable, predictor variable

11
Q

b1

A

Regression coefficient for the predictor.
Gradient (slope).

12
Q

b0

A

y intercept
value of y when x = 0

13
Q

Ei

A

the errors in prediction based on the regression

14
Q

Assumptions of Regression:
Linearity

A

Based on linear correlations, assumes linear bivariate relationship between each x and y, and also between y and predicted y.

15
Q

Assumptions of regression:
Normality

A

Normally distributed, both univariate and multivariate distributions of residuals.
Y scores are independent and normally distributed (Shapiro-Wilk)

16
Q

Assumptions of regression:
Independence of scores

A

Independence of Y (outcome: DV) scores.

17
Q

Assumptions of regression:
Independence of errors

A

Errors (residuals) from observations should not be correlated with each other (Durbin-Watson test)

18
Q

Assumptions of regression: Minimal multicollinearity

A

Predictors (IVs) should not be highly correlated with each other.
No higher than r = .80 for predictors.
Want Variance Inflation Factor (VIF) to be less than 10.
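
A sketch checking the VIF cutoff from this card and the Durbin-Watson test from the previous one, using statsmodels on simulated data (all values arbitrary):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor
    from statsmodels.stats.stattools import durbin_watson

    rng = np.random.default_rng(1)
    x1 = rng.normal(size=200)
    x2 = 0.5 * x1 + rng.normal(size=200)   # moderately correlated predictors
    y = 1 + 2 * x1 + 0.5 * x2 + rng.normal(size=200)

    X = sm.add_constant(np.column_stack([x1, x2]))
    fit = sm.OLS(y, X).fit()
    print([variance_inflation_factor(X, i) for i in (1, 2)])  # want < 10
    print(durbin_watson(fit.resid))        # ~2 suggests independent errors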

19
Q

Assumptions of regression:
Homoscedasticity

A

Variance of residuals is uniform for all values of Y (test with Levene’s test); assumed when the sample size is large. Cannot be assumed if the sample size is small.

20
Q

Ordinary Least Squares regression (OLS)

A
  • Yields values for b-weights (regression coefficients) and the y-intercept that result in the sum of the squared residuals being at its minimum (smallest).
    Best-fitting line = smallest total error.
    Resulting regression line = least-squares error solution.

b-weights + y-intercept = SS residuals at minimum
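
A bare-bones least-squares sketch (toy data) showing that b1 = SP / SSx and b0 = Ȳ − b1·X̄ put the squared residuals at their minimum:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    dx, dy = x - x.mean(), y - y.mean()
    b1 = (dx * dy).sum() / (dx ** 2).sum()   # slope: SP / SS_x
    b0 = y.mean() - b1 * x.mean()            # y-intercept

    resid = y - (b0 + b1 * x)
    print(b0, b1, (resid ** 2).sum())        # SS residual is at its minimum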

21
Q

Partially standardized regression coefficient

A

Regression coefficient predicting Y from X.
Only standardized on x, not y.

22
Q

The Regression Model & Sum of Squares

A

Squaring each of the deviations and summing across observations yields SS for each source of variability of y.

23
Q

ANOVA to test the Regression Model:
Regression

A

Variability in y that can be explained by the predictor(s) – represents the component of Y that is shared with x1

24
Q

ANOVA to test the Regression Model:
Residual

A

Variability in Y that cannot be explained by the predictor – simply what is ‘left over’ after accounting for X.

25
Q

t-test to test the Regression coefficient
b-coefficients

A

unstandardized (raw) regression coefficients

26
Q

t-test to test the Regression coefficient:
B(beta) standardized (z-score) regression coefficients

A

One standard deviation increase in X results in an expected change of beta standard deviation units in Y.

Increase in X, change in Y

27
Q

Suppression

A

A second predictor variable (X2) that is unrelated to Y (dependent variable) raises the amount of variance explained by the first predictor by eliminating certain irrelevant aspects of the first predictor (X1).
X2 suppresses some of the “error” or “irrelevant” variance in X1.

28
Q

Classical suppression

A

r = 0, but beta ≠ 0

29
Q

Surprising Suppression

A

beta > r
Surprising because the beta values for both X1 and X2 increase.

30
Q

Partial Correlation

A

Correlation between 1 DV and 1 IV (Y and X1) with one or more other variables (e.g., X2, X3) partialed out of BOTH the DV and the first IV.

31
Q

Semi-partial Correlation

A

The part correlation has variable 2 partialed out of predictor 1 ONLY. It is the correlation of Y with the part of X1 that is independent of X2.

32
Q

Coding

A

Regression framework is flexible.
Categorical or nominal independent variables can be used in Multiple Regression/Correlations.

33
Q

Vectors

A

g − 1 vectors are needed.
They represent the df of the IV.
All coding systems yield the same correlations, BUT they produce different regression equations.

34
Q

Dummy Coding

A

0s and 1s.
Representation of a variable consisting of g categories by creating g − 1 variables (vectors), where each of the g − 1 categories is coded 1 on a single vector while the remaining categories are coded 0 on these vectors.
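
A sketch of dummy coding for a hypothetical 3-group factor (g − 1 = 2 vectors), with group C as the reference group; data values are made up:

    import numpy as np
    import statsmodels.api as sm

    group = np.array(["A", "A", "B", "B", "C", "C"])
    y = np.array([4.0, 5.0, 7.0, 8.0, 2.0, 3.0])

    d1 = (group == "A").astype(float)   # 1 for A, 0 otherwise
    d2 = (group == "B").astype(float)   # 1 for B, 0 otherwise

    fit = sm.OLS(y, sm.add_constant(np.column_stack([d1, d2]))).fit()
    print(fit.params)  # intercept = mean of C; each b = group mean - reference mean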

35
Q

Regression = …

A

ANOVA

36
Q

Contrast Coding

A

Identification of specific comparisons of interest & assigning values that enable the treatments to be directly compared.
In a two group design, one group is assigned +1 and the other group is assigned -1.

37
Q

Effects Coding

A

A method of coding categorical variables in which each group is compared to the weighted or unweighted mean of all the groups.
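
A sketch of unweighted effect coding, reusing the toy data from the dummy-coding example above, with group C coded −1 on both vectors:

    import numpy as np
    import statsmodels.api as sm

    group = np.array(["A", "A", "B", "B", "C", "C"])
    y = np.array([4.0, 5.0, 7.0, 8.0, 2.0, 3.0])

    e1 = np.where(group == "A", 1.0, np.where(group == "C", -1.0, 0.0))
    e2 = np.where(group == "B", 1.0, np.where(group == "C", -1.0, 0.0))

    fit = sm.OLS(y, sm.add_constant(np.column_stack([e1, e2]))).fit()
    print(fit.params)  # intercept = grand mean; b1 = mean(A) - grand mean, etc.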

38
Q

Effects Coding:
Unweighted effects coding

A

Used when you want to compare the mean of a particular group with the grand mean, regardless of proportions in the population or in the sample.

39
Q

Effects coding:
The group coded -1 is known as…

A

The reference group.

40
Q

Effects coding:
The group coded 1 may be referred to as…

A

A coded group.

41
Q

b0 = ?

A

Grand mean across all groups.

42
Q

b1 = ?

A

Difference between the group mean and the grand mean.

43
Q

Dummy Coding b-weight

A

Indicates differences between conditions (each coded group vs. the reference group).

44
Q

Contrast Coding b-weight

A

Used to calculate difference between conditions compared.

45
Q

Effect Coding b-weight

A

Indicates the difference between Y-bar A (the group mean) & Y-bar T (the grand mean).

46
Q

Dummy Coding y-intercept

A

Y-bar when all X’s are 0; the mean of the reference group

47
Q

Contrast Coding y-intercept

A

Y-bar; mean of all the groups

48
Q

Effect Coding y-intercept

A

Y-bar T; mean of all the groups

49
Q

Ordinal regression

A

When the range is sufficient (i.e., approximately > 6ish), treat as continuous; if not, treat as nominal.
Only need 1 df if treated as continuous.
The df is larger (k − 1) if treated as nominal, which can reduce power.

50
Q

Power of regression predictor type from highest to lowest:

A

Continuous, ordinal, nominal

51
Q

Interaction

A

The effect of 1 IV on the DV changes based on the level of another IV.
If each factor has 2 levels (or 2 groups), only one vector is needed for each factor in Multiple Regression/Correlation (MRC).

52
Q

How to code for an interaction

A

The code for the interaction is simply the multiplication of Vector A and Vector B.
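
For example, with contrast-coded vectors in a 2 × 2 design:

    import numpy as np

    A = np.array([1, 1, -1, -1])   # factor A vector
    B = np.array([1, -1, 1, -1])   # factor B vector
    print(A * B)                   # interaction vector: [ 1 -1 -1  1]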

53
Q

Correlation when there is no IV effect

A

Correlation is 0 when there is no IV effect; it differs from 0 when there is an IV effect because the means of the treatment groups differ.

54
Q

R-squared for overall regression

A

Sum all the R-squareds for the main effect of A, the main effect of B and the interaction.

55
Q

R-squared is analogous to Sum of Squares

A

SS is used to partition variance in ANOVA; R-squared plays the analogous role in regression.

56
Q

Is it a different or same result in MR (multiple regression) as in ANOVA?

A

Same!

57
Q

Multicollinearity problem if you don’t center

A

Centering removes shared variance between the interaction (A × B) term & the independent variables that make up the interaction.
The correlations of the variables X & Z with Y do not change when you center.
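
A quick simulation (arbitrary means and SDs) of why centering helps: the correlation between X and the X × Z product term collapses toward 0 once both predictors are centered:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=50, scale=10, size=500)
    z = rng.normal(loc=30, scale=5, size=500)

    print(np.corrcoef(x, x * z)[0, 1])      # large: multicollinearity with X*Z

    xc, zc = x - x.mean(), z - z.mean()     # centered predictors
    print(np.corrcoef(xc, xc * zc)[0, 1])   # near 0 after centering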

58
Q

The individual predictors’ correlations with the interaction term…

A

…drop to 0 when you center.

59
Q

Centering corrects for multicollinearity and…

A

…avoids accounting for parts of Y more than once.

60
Q

By centering, the regression coefficients of the individual predictors become…

A

…the main effect of each predictor.

61
Q

In regression equations without interaction terms…

A

The y-intercepts are different but the b-weights are exactly the same.

62
Q

In regression equations with interaction terms…

A

The y-intercepts and b-weights are different.

63
Q

Interpreting the plots for possible interactions between 2 continuous variables

A

If the lines cross (are not parallel), there is an interaction.

64
Q

MR Formula:

A

Y = b0 + b1X1 + b2X2 + b3(X1 × X2) + eY

65
Q

b1

A

Regression coefficient specific to when the value of predictor X2 = 0

66
Q

b2

A

Regression coefficient specific to when the value of predictor X1 = 0

67
Q

b3

A

The interaction’s predictive effect.
Describes how b1 and b2 change as a function of X2 and X1.

68
Q

The value of b1 changes by b3 units….

A

For every one unit increase in X2.

69
Q

The value of b2 changes by b3 units…

A

For every one unit increase in X1.

70
Q

Interpreting the coefficient for the interaction term (b3):
The tobs statistic for b3 provides…

A

A NHT (null hypothesis test) for the X1 × X2 interaction.

71
Q

Interpreting the coefficient for the interaction term (b3):
If p(tobs) < .05, we…

A

Reject Ho & conclude the magnitude of b1 depends on the level of X2 and that b2 depends on X1.

72
Q

Interpreting the coefficient for the interaction term (b3):
If p(tobs) > .05, we…

A

Fail to reject Ho & conclude the magnitudes of b1 & b2 are constant across all values of X2 & X1 – the X1 × X2 term is usually dropped to improve precision.

73
Q

Simple Slope analysis

A

Interpret the simple-slope plots to see if the coefficient for the interaction term is positive, 0, or negative.
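
A minimal simple-slope sketch, assuming hypothetical fitted coefficients from Y = b0 + b1X + b2Z + b3(X × Z):

    # Slope of X evaluated at -1 SD, the mean, and +1 SD of the moderator Z.
    b1, b3 = 0.40, 0.25   # hypothetical coefficients
    z_sd = 1.5            # hypothetical SD of Z

    for z in (-z_sd, 0.0, z_sd):
        print(f"slope of X at Z = {z:+.2f}: {b1 + b3 * z:.3f}")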

74
Q

Non-linear (curvilinear) regression

A

Modeled by squaring the predictor (adding an X² term).

75
Q

Looking at the graphs to see if the quadratic predictor has a positive or negative effect on the regression line:

A

If the line curves down, the quadratic effect is negative; if it curves up, it is positive.

76
Q

Proportion & probability of a nominal dependent variable

A

For a dichotomous variable Y coded [ 0, 1 ]

77
Q

The mean as a proportion for a nominal variable

A

Can’t do multiple linear regression (OLS) with a nominal DV because it violates the normality and homoscedasticity assumptions.

78
Q

OLS or linear least squares

A

A method for estimating the unknown parameters in a linear regression model, based on the smallest sum of the squared errors between the regression line and the raw Y scores.

79
Q

Assumptions that do not need to be met in logistic regression:

A
  • Linearity.
  • Homoscedasticity.
  • Normality.
80
Q

Assumptions of logistic regression:

A

Dependent variable should be measured on a dichotomous scale.
Have one or more independent variables (IVs), which can be continuous or discrete.
Independence of observations.
Dependent variable should have mutually exclusive and exhaustive categories.
Independence of errors.

81
Q

Other assumptions of logistic regression:

A

Little to no multicollinearity.
Linear relationship between any continuous independent variables (IVs) and the logit transformation of the dependent variable.
Large sample sizes.
Needs at least 10 cases per independent variable; some recommend at least 30 cases.
No outliers.

82
Q

Latent Response Variable (y*)

A

a variable that cannot be directly measured or observed
one approach to conceptualizing Logistic Regression (LogReg)

83
Q

The Logistic Curve

A

Relates the independent variable, x, to the rolling mean of the DV

84
Q

Estimating the b weights via maximum likelihood

A

b-coefficients for a continuous, normal Y, as in linear regression, are estimated using ‘least squares.’
This is not possible for a dichotomous Y (logistic model).
Rely on Maximum Likelihood Estimation (MLE) instead.

85
Q

Odds Ratio

A

Exponential of B (exp(B)).
Indicator of the change in odds resulting from a unit change in the predictor.
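
A sketch of a logistic fit and its odds ratios with statsmodels, on simulated data with arbitrarily chosen true coefficients:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    x = rng.normal(size=300)
    p = 1 / (1 + np.exp(-(-0.5 + 1.2 * x)))   # true logistic model
    y = rng.binomial(1, p)                    # dichotomous DV

    fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
    print(fit.params)                         # B: log-odds scale
    print(np.exp(fit.params))                 # exp(B): odds ratios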

86
Q

Better measures of R-squared for Logistic Regression
Cox & Snell’s R-squared

A

common measure of “effect size”; measures how well the logistic regression model fits the data in predicting the dependent variable

87
Q

Better measures of R-squared for Logistic Regression:
Nagelkerke’s R-squared

A

This is an adjusted version of Cox & Snell R-square that adjusts the scale of the statistic to cover the full range from 0 to 1.
Chi-square test of independence to statistically test the logistic regression.

88
Q

Null deviance

A

Represents the deviance of the model that contains no predictors other than the constant.

89
Q

Residual deviance

A

Represents the deviance of the fitted model. This deviance should be LESS than the null deviance because a lower value represents BETTER accuracy.

90
Q

Why need a LR with just the y-intercept (Step 0)?

A

We want to test it against the other models (with predictors) to see if they are better models.
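
Continuing the Logit sketch above, the Step 0 comparison is a likelihood-ratio chi-square between the null and residual deviances:

    from scipy.stats import chi2

    null_dev = -2 * fit.llnull    # deviance of the intercept-only (Step 0) model
    resid_dev = -2 * fit.llf      # deviance of the model with the predictor

    lr = null_dev - resid_dev     # likelihood-ratio chi-square
    print(lr, chi2.sf(lr, df=1))  # df = number of predictors added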

91
Q

Regression to the Mean (RtM)

A

A statistical phenomenon that occurs whenever you have a nonrandom sample from a population and two measures that are imperfectly correlated.

92
Q

Characteristics of Regression to the Mean

A
  • Statistical phenomenon, can occur because sample is not random.
  • Group phenomenon.
  • Happens between any two variables.
  • Relative phenomenon.
  • You can have regression to the mean occur going up and down.
  • The more extreme the sample group, the greater the regression to the mean.
  • The less correlated the two variables, the greater the regression to the mean.
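
A simulation sketch of these last two points: select an extreme group on an imperfectly correlated pretest and watch its posttest mean regress (all values arbitrary):

    import numpy as np

    rng = np.random.default_rng(3)
    n, r = 100_000, 0.5                       # imperfect test-retest correlation
    pre = rng.normal(size=n)
    post = r * pre + np.sqrt(1 - r ** 2) * rng.normal(size=n)

    extreme = pre > np.quantile(pre, 0.95)    # nonrandom, extreme sample
    print(pre[extreme].mean())                # ~2.1 SDs above the mean
    print(post[extreme].mean())               # ~1.0: regressed toward the mean
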
93
Q

The Problem of RtM in Repeated Measures Design

A
  • In a between-groups scenario, TRUE random sampling & assignment ensures that RtM effects are equivalent across conditions.
  • Some studies sample ‘special’ populations and/or non-randomly assign subjects to conditions.
94
Q

Pair-link diagram

A

The X-axis is suppressed and two vertical (pre & post) axes are constructed.

95
Q

Galton squeeze diagram

A
  • It is similar to the pair link plot but standardized variables are used & the Y axis represents the means on the post test for each score of Y.
96
Q

The problem of repeated measures t-test to study change…

A

They don’t take baseline into account.

97
Q

The problem of using change scores to study change…

A

They don’t take baseline into account.

98
Q

ANCOVA as an alternative to study change

A

A sequential regression model that examines the treatment effect while controlling for pretest scores; baseline scores are covaried out.
In a repeated measures design, the covariate is the pre-test or baseline score. It examines post-test scores while controlling for pre-test.

99
Q

Lord’s paradox

A

Raises the issue of when it is appropriate to control for baseline status. In three papers, Frederic Lord noted that different results are obtained if researchers adjust for pre-existing differences.

Depending on the data analyses you run, you can end up with completely different results.

100
Q

The common problem with t-test & ANCOVA in studying change

A

What they don’t take into account is that within a group, individuals have different baselines and can have different levels of change.

101
Q

The 3 ways to analyze repeated measures data (longitudinal data):

A
  • Change scores
  • General Linear Model (GLM; aka repeated measures ANOVA or ANCOVA)
  • Mixed Effects Model
102
Q

Regression to the model problem in ANCOVA

A

This approach attempts to model differences at T2 ‘controlling for’ a person’s T1 score. Ironically, whenever groups are not exactly equivalent at T1, this approach actually introduces RtM.

103
Q

Long or short format for mixed effects modeling?

A

Long format.
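
A pandas sketch of reshaping hypothetical wide data (one column per time point) into the long format mixed models expect:

    import pandas as pd

    wide = pd.DataFrame({"subject": [1, 2],
                         "t1": [10.0, 12.0],
                         "t2": [11.0, 15.0],
                         "t3": [13.0, 16.0]})

    # One row per subject per occasion.
    long = wide.melt(id_vars="subject", var_name="time", value_name="score")
    print(long.sort_values(["subject", "time"]))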

104
Q

Residual covariance

A

What’s left after accounting for average group effects.

105
Q

Advantages of using a Mixed Effects Model:
Missing Data

A

The default approach to missing data in nearly all statistical packages is Listwise Deletion, which drops any observation with any missing data on any variable involved in the analysis. If the percentage missing is small & the missing data are a random sample of the data set, this is a reasonable approach; a mixed model instead uses all available observations for every case.

106
Q

Advantages of using a Mixed Effects Model:
Post hoc tests

A

Because of the way the Sum of Squares are calculated in the multivariate approach, post-hoc tests are not available for repeated measures factors. They are available, however, using the mixed model.

107
Q

Advantages of using a Mixed Effects Model:
Flexibility in treating time as continuous

A

Depending on the design of the study, rather than treating time as (say) four categories, it can be more accurate to treat time as a continuous variable. This allows you to model a regression line for time rather than estimate four means.

108
Q

Advantages of using a Mixed Effects Model:
A single dependent variable can be used in other analyses.

A

A study may have a two-factor (2 × 4) repeated measures design to see if the impact of the two factors on an outcome is mediated by a third variable. Each subject has eight values of the mediator (one for each condition) and eight values on the final outcome.

109
Q

Advantages of using a Mixed Effects Model:
Easier to build into larger mixed models

A

In a study examining schools and teachers, children may be clustered within teachers. If so, we would need to include teacher as another level in the mixed model. Changing from a 2-level to a 3-level model is simple if the model is already set up as a mixed model.

110
Q

Fixed effects

A

Population or Group average effects.

111
Q

Random effects

A

Deviations (of the individuals in a group) from the population (group) average effects.

112
Q

Interpreting the mixed effects model:
Ŷi = b0 + b1X1 + u0 + u1Zi + Ei

A
  • b0 (fixed intercept) = the population average of the y-intercept
113
Q

u0

A

the deviations from the population average of the y-intercept effect

114
Q

u1

A

the deviations from the population average slope for each of the subjects

115
Q

Maximum Likelihood (ML) Estimation

A
  • Raw ML (Full Information Maximum Likelihood; FIML).
  • Assumes missing data are Missing at Random.
  • Variance estimates are generally biased downward (which can result in negative variances).
116
Q

Best Linear Unbiased Estimator (BLUE)

A

Applies to the coefficients that you get from the model.
Used to compare nested models, i.e., the various structures that we fit to our random effects.
When comparing fixed-effects models to each other, we cannot use residual (restricted) ML; we need to use raw ML.

117
Q

Random intercepts

A

For each cluster, a different intercept is allowed to exist.
u0 represents the deviations from the population average intercept.

118
Q

Random slopes

A

The slope is allowed to vary within each level of the cluster.
u1 represents the deviation from the population average slope for each subject.
Random slopes should vary across the subjects.
The slope is allowed to differ between clusters.
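
A sketch of a random-intercept-and-slope model in statsmodels on simulated long-format data (all parameter values made up):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(4)
    subj = np.repeat(np.arange(30), 4)            # 30 people x 4 occasions
    time = np.tile(np.arange(4.0), 30)
    u0 = rng.normal(scale=1.0, size=30)[subj]     # person-specific intercept deviations
    u1 = rng.normal(scale=0.3, size=30)[subj]     # person-specific slope deviations
    y = 5 + 0.8 * time + u0 + u1 * time + rng.normal(scale=0.5, size=120)
    df = pd.DataFrame({"subject": subj, "time": time, "y": y})

    # Random slope for time nested within a random intercept per subject.
    fit = smf.mixedlm("y ~ time", df, groups=df["subject"], re_formula="~time").fit()
    print(fit.summary())   # fixed effects = group averages; random = deviations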

119
Q

Nesting

A

Random slopes are generally nested within a random intercept.
There could be multiple random slopes within an intercept.
Random intercepts can be nested within each other.

120
Q

fixed regression intercept (b0)

A

the Y score when all predictors are zero

121
Q

fixed effect of being ‘at a given time’ (b1 x T2_dum)

A

this is the within-subjects factor (change over time)

122
Q

fixed effect of being in a given condition at a given time (b3 x Ts_dum x MBSR_dum)

A

This is the interaction between the within-subjects factor (time) and the between-subjects factor (treatment group).

123
Q

estress_it

A

random person & time-specific error/residual

124
Q

Covariance matrix

A

A raw-score version of a correlation matrix, where each diagonal element measures the relationship of a variable (time point) with itself, i.e., its variance.

125
Q

Residual covariance matrix

A

The relationships between the variances of time points (random effect variance) after taking out any variance due to fixed effects (group average main effects, interactions, & y-intercept).

126
Q

Unstructured covariance matrix

A

Not imposing any constraints on the values.

127
Q

Auto regressive covariance matrix

A

In the growth study dataset, for example, the response variable of each subject is measured at various ages.
We may suspect that the residual error terms within a subject are correlated.
A reasonable choice for the residual error covariance is therefore a block diagonal matrix, where each block is a first-order autoregressive (AR1) covariance matrix.

128
Q

Reading BIC to determine which MEM model is better

A

If BIC is smaller, we have improved the model.

129
Q

Examining interactions using graphs:

A

If the lines cross (are not parallel), there is an interaction.

130
Q

Number of vectors needed to code nominal variables in MEM regression

A

Need g-1 vectors.

131
Q

Slicing by 1 IV to look at the simple effects of another IV

A

Testing for all possible simple effects of Time at different levels of the Condition.

132
Q

Pairwise Comparisons and problem of Experimentwise alpha error

A

The pairwise comparison table shows the group comparisons at a particular level of the treatment condition. If you try to interpret all of these group comparisons, you will need to worry about experimentwise (Type I) alpha error.

133
Q

How many vectors are needed for continuous IVs (predictors)?

A

Need 1 vector for continuous variables.

134
Q

In SPSS, “BY” is for…

A

Discrete IVs.

135
Q

In SPSS, “WITH” is for…

A

Continuous variables.

136
Q

Type III Fixed Effects

A

Testing main effects and interactions.

137
Q

Estimated table

A

Shows the means of all 3 treatment groups at each time point.

138
Q

Univariate tests

A

Simple effects table.

139
Q

Using Type III Tests of Fixed Effects to interpret main effects of IVs & the Interaction

A

If the p value is less than .05, then the IV is significant.

140
Q

Sources of missing data

A
  • Unfinished surveys
  • Experimenter-level error (in research design or implementation)
  • Participant being unavailable for a certain time or dropping out
  • Data entry error
141
Q

At what point does missing data become a problem?
What are the different opinions about this?

A

In OLS, if data are missing at one time point for a participant, all of that participant’s data must be deleted before analysis.
- Different opinions exist on how much missing data is allowable (e.g., 5% (Schafer, 1999); 10% (Bennett, 2001))

142
Q

Missing Completely at Random (MCAR)

A
  • Assumption that the probability of missing an observation does not depend on any variables.
  • No selection bias.
  • Events that lead to the data being missing are independent of observable & unobservable parameters of interest.
  • If data is truly MCAR, analysis can be done without bias – however, data is rarely MCAR.
143
Q

Missing at Random (MAR)

A
  • Assumption that the missing data are unrelated to true data value after accounting for other known characteristics of the subject.
  • Missingness is not random but can be accounted for by variables that can be added into analysis.
  • Ex: Highly depressed individuals may be more likely to drop out of an experiment or not finish their surveys because of their depression.
144
Q

Missing Not at Random (MNAR)

A
  • Missing data is due to unfixable selection bias.
  • Nonignorable nonresponse.
  • Leads to biased results.
145
Q

Using logistic Regression to test MAR

A

You can use logistic regression with the missing scores dummy coded as the dichotomous criterion or outcome variable (Missing score versus Not Missing score) and possible predictors.
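
A sketch of that MAR check, simulating dropout whose probability rises with a depression score (all numbers illustrative):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    dep = rng.normal(size=200)                # hypothetical depression scores
    outcome = rng.normal(size=200)
    p_miss = 1 / (1 + np.exp(-(dep - 1)))     # dropout risk rises with depression
    outcome[rng.random(200) < p_miss] = np.nan

    missing = np.isnan(outcome).astype(int)   # dummy-coded criterion
    fit = sm.Logit(missing, sm.add_constant(dep)).fit(disp=0)
    print(fit.params)   # a significant predictor suggests MAR rather than MCAR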

146
Q

MEM/RIML to handle missing responses

A

Full Information Maximum Likelihood (FIML): Directly estimates parameters using ALL observed data for every case.

147
Q

Pros for FIML

A
  • Requires a single step for imputation & analysis.
  • Uses all available data even if some cases are missing data.
  • Produces unbiased standard error.
  • Can be used with smaller sample (N < 100)
148
Q

Cons for FIML

A

All variables related to the missing data need to be included in the analysis.

149
Q

EM (expectation maximization) algorithm

A

A 2-step iterative process. The 2 steps repeat until results converge (successive iterations do not show different parameters).
1: Expectation: Uses parameters (initially based on complete-case data) to estimate values for missing data.
2: Maximization: Uses complete-case data & estimated values for missing data to estimate new model parameters.
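
A deliberately tiny EM-style sketch that estimates just a mean with some values missing (assumed MCAR); real EM implementations update all model parameters the same way:

    import numpy as np

    rng = np.random.default_rng(6)
    y = rng.normal(loc=10.0, scale=2.0, size=50)
    y[rng.random(50) < 0.2] = np.nan           # ~20% missing

    mu = 0.0                                   # crude starting value
    for _ in range(100):
        filled = np.where(np.isnan(y), mu, y)  # E-step: expected values for missing
        new_mu = filled.mean()                 # M-step: re-estimate the parameter
        if abs(new_mu - mu) < 1e-10:           # converged: iterations stop changing
            break
        mu = new_mu
    print(mu)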

150
Q

Pros for EM

A
  • Minimizes bias in parameters, so larger samples yield less bias.
  • Ideal for exploratory & reliability analyses.
151
Q

Cons for EM

A
  • Initial estimates based on list-wise deletion (so it does not use all available data).
  • Biased standard errors (but bias reduces as sample size increases)
  • Less efficient than FIML for hypothesis testing.
152
Q

“Worst” MNAR scenario in GLM

A

Gives the worst estimate of the true mean.

153
Q

FIML in MEM is best in…

A

Estimating the group means in all missing types.

154
Q

Discrete/Ordinal Time

A

All of our prior statistical models treated the Time (week) variable as a discrete ordinal ‘factor.’

155
Q

Individual Growth Curves

A

Instead of predicting the average group change, the emphasis is now on trajectories of change that vary randomly across persons i.

156
Q

Time as a continuous predictor

A

Provides a potentially more powerful parameterization.

157
Q

Linear model
How many times points are needed?
How many curves in the model?

A

The model predicts the Y score (DV) based on a straight line.
Needs 3 time points and no curves.

158
Q

Quadratic model
How many times points are needed?
How many curves in the model?

A

This model predicts the Y score (DV) based on a line with 1 curve (or bend in the line). Needs 4 time points.

159
Q

Cubic Model
How many times points are needed?
How many curves in the model?

A

This model predicts the Y score (DV) based on a line with 2 curves. Needs 5 time points.

160
Q

Most things that happen to people…

A

(e.g. change in memory, change in height, change in anxiety, etc.) do not change constantly at the same rate. Linear is many times not realistic.

161
Q

Multilevel Models/Hierarchical models/Random coefficient models

A

Measurement occasion or time is a nested factor that is nested within a person. It is multilevel or clustered because the Person (Level 2) is a higher level than the measurement of the DV at different time points (level 1).

162
Q

Nesting

A

Growth curves are implicitly nested. E.g. Time nested in people.

163
Q

Spaghetti plot

A

In this graph, all 12 linear trajectories are plotted in the same graph. The lines represent linear ‘trajectories’ for each of the first 12 people in the previously shown database comparing CBT, MBSR, & both.

164
Q

B0i

A

average group y-intercept

165
Q

B1i

A

average group slope

166
Q

“i” in B0 and B1

A

deviation of individual in y-intercept & slope from the group y-intercept & slope

167
Q

γ (gamma)

A

fixed effects (intercept & slope) at the person level (Level 2)

168
Q

Fixed Intercept

A

γ00 is the average score across all groups

169
Q

Fixed Slope

A

γ10 is the average regression coefficient (slope)

170
Q

U0i is the random intercept

A

Represents person-specific y-intercept deviations.

171
Q

U1i, or any u that is not U0i, is the random slope

A

represents person-specific slope deviations

172
Q

Residual variance (population mean squared)

A

A measure of what was not predicted by the MEM model: the deviation of the actual score from the predicted score.

173
Q

Level 1 (Occasion) Random Effects

A

Reflects the error or deviation of the individual subjects’ raw scores from predicted y-scores.

174
Q

Level 2 (Person) Random Effects

A

Reflects person-specific deviations in individual y-intercepts & slopes.

175
Q

Fixed effects

A

This reflects the group average y-intercept & regression coefficient (slope)

176
Q

Unconditional linear growth curve ?

A

Linear growth curve with only Time as IV

177
Q

Residual (population mean squared)

A

unexplained variance (errors in prediction) at Level 1 (occasion)

178
Q

Level 1 Equation (Quadratic Model)

A

Time is squared in this equation.
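
Reusing the simulated data frame from the mixed-model sketch above, a squared Time term can be added via the formula (a hypothetical extension; that simulated data set is actually linear):

    # Quadratic growth: Time and Time squared as Level 1 predictors.
    quad = smf.mixedlm("y ~ time + I(time ** 2)", df,
                       groups=df["subject"], re_formula="~time").fit()
    print(quad.summary())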

179
Q

Using population mean-squared to compare linear versus quadratic models

A

Smaller is better. It means that the MEM is doing a better job of fitting the data.

180
Q

Time-invariant covariates (TICs)

A

Predicting intercepts & trajectories.

181
Q

Conditional quadratic growth model

A

Include IVs called time-invariant covariates (TICs) other than Time. These IVs try to explain between-group effects. They are “invariant” because the TIC code or value is the SAME across the rows of a subject.
For instance, treatment condition (CBT, MBSR, & BOTH) is a TIC; someone in the CBT group has the same code for all time points.