439 midterm Flashcards

1
Q

population & parameter

A
  • pop: the entire set of things of interest
  • par: A property or number descriptive of the population (a fixed number, but in practice, we do not know its value)
2
Q

sample & statistic

A
  • sample: A part of the population. Typically, this provides the data that we will examine to gather information
  • stat/estimate: A property or number that describes a sample (use a statistic to estimate an unknown parameter)
3
Q

descriptive & inferential statistics

A
  • descriptive: Summarize/describe the properties of samples (or populations when they are completely known)
  • inferential: Draw conclusions/make inferences about the properties of populations from sample data
4
Q

types of variables

A
  • nominal (classifies/identifies objects, can be dichotomous or multi-categorical) and ordinal (ranking data): categorical (discrete/qualitative)
  • interval (rating data with equal distances) and ratio (Special kind of interval scale with a meaningful zero point): continuous (numerical/quantitative)
5
Q

univariate and multivariate

A
  • uni: one DV, can have multiple IVs (linear, logistic regression)
  • multi: multiple DVs regardless of the number of IVs (dimension reduction, cluster analysis)
6
Q

normal distribution

A
  • Y is continuous and normally distributed in the population
  • mean = median = mode
  • Y~ N(mean, SD)
  • 68% of scores within 1SD of the mean, 95% within 2SDs, 99.7% within 3SDs
7
Q

z (standard) score

A
  • we can convert Y scores to z scores that follow the standard normal distribution (z ~ N(0,1))
  • deviation of a sample score from the population mean divided by the population standard deviation (expressing the deviation in SD units): z = (Y − μ)/σ
  • used to determine how extreme any score is under the standard normal distribution (see the sketch below)
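A minimal Python sketch of this conversion (the course itself uses JASP; the population values here are made up for illustration):

```python
from scipy import stats

# hypothetical population values (illustration only)
mu, sigma = 100, 15      # population mean and SD
y = 130                  # an observed score

z = (y - mu) / sigma     # deviation from the mean, expressed in SD units
# two-tailed probability of a score at least this extreme under N(0, 1)
p_two_tailed = 2 * (1 - stats.norm.cdf(abs(z)))
print(z, p_two_tailed)   # 2.0, ~0.046
```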
8
Q

types of statistical inference

A
  • significance tests (computing a p value)
  • confidence intervals
  • both types are based on sampling distributions of statistics
9
Q

sampling distribution of statistics

A
  • the distribution of the values taken by a statistic (like a mean) in all possible samples of size N from the same population
  • the sampling distribution of the mean is normal if the population is normal
  • if the pop is not normally distributed, we rely on the central limit theorem
10
Q

central limit theorem

A
  • as N increases, the distribution of a sample mean becomes closer to a normal distribution. This is true no matter how the population is distributed, as long as it has mean μ and standard deviation σ
  • X̄ (sample mean) ~ N(μ, σ/√N)
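A short simulation sketch (illustrative only, not course material) of the theorem: means of samples from a skewed exponential population still cluster around μ with spread close to σ/√N.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50                               # sample size; exponential(1) has mean 1 and SD 1

# draw many samples of size N and keep each sample mean
sample_means = rng.exponential(scale=1.0, size=(10_000, N)).mean(axis=1)

print(sample_means.mean())           # close to mu = 1
print(sample_means.std(ddof=1))      # close to sigma / sqrt(N) = 1 / sqrt(50) ~ 0.14
```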
11
Q

Null Hypothesis Significance Testing

A
  1. State the null and alternative hypotheses (assumptions made about a parameter; we test the hypothesis of no effect if we think there is an effect)
  2. Calculate the value of an appropriate test statistic (how far the data are from the null: for one parameter in the null we use a t-test, for more than one an F-test, and for frequency distributions a chi-square test)
  3. Find the p-value for the observed data (the test statistic value).
  4. State a conclusion (the significance level alpha sets the area of extreme scores that would be unlikely if the null is true; the cutoff value of the statistic based on alpha is the critical value)
12
Q

what is a p-value

A
  • a conditional probability (a probability computed under the condition that the null is true)
  • how likely a test statistic as extreme as the one you computed is if the null is true
  • the smaller the value, the less compatibility between your data and the null (support for the alternative)
  • if the p-value is smaller than the alpha level, there is a statistically significant effect and we reject the null
  • just because we have a statistically significant effect doesn't mean it's meaningful; a p-value is highly affected by sample size
13
Q

effect sizes

A
  • magnitude of a treatment effect
  • Pearson's r, correlation squared (R2), Cohen's d, omega or omega squared
  • small effect: r = 0.1, r2 = 0.01, d = 0.25
  • medium: r = 0.3, r2 = 0.06, d = 0.5
  • large effect: r = 0.5, r2 = 0.15, d = 0.8
14
Q

types of errors in NHST

A
  • Type I: reject the null when it is true (false positive) = alpha level
  • Type II: fail to reject the null when it is false (false negative) = beta level
  • 1 - beta = power, the probability of correctly rejecting a false null
  • alpha and beta are related to each other (increase alpha = increase power and Type I error rate = decrease beta and Type II error rate)
15
Q

z-test

A
  • purpose: to test whether a sample mean differs from a population mean
  • assumption 1: the population is normally distributed
  • assumption 2: the population’s SD is known
  • assumption 3: independence of observations (simple random sample)
  • if your z-score is greater (in absolute value) than 1.96, we reject the null at alpha = .05
  • limitation: knowing the SD of the population
16
Q

t-test

A
  • used when we don’t know the pop SD, but want to test whether a sample mean differs from a population mean
  • assumptions: population is normal, independence of observations
  • you use the sample SD in the formula, so the test statistic follows the t distribution rather than the standard normal
  • t distribution changes shape based on N (with a large enough N (df>30), the t distribution approximates the standard normal distribution, so you can use critical value 1.96)
  • t distribution has fatter tails than normal, but still bell-shaped and symmetrical
  • if t is large, either the numerator is large or the denominator is small (large N = large t statistic even if your numerator is small)
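A minimal one-sample t-test sketch in Python with made-up data (the course uses JASP), testing whether a sample mean differs from a hypothesized population mean when the population SD is unknown:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=105, scale=15, size=25)   # hypothetical scores
mu_0 = 100                                        # population mean under the null

t_stat, p_value = stats.ttest_1samp(sample, popmean=mu_0)
df = len(sample) - 1
print(t_stat, df, p_value)   # compare |t| with the critical t value for df = 24
```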
17
Q

important p-value information

A
  • a p-value doesn't give you any information about the null itself
  • it tells you how likely it is, assuming the null is true, to get a test statistic at least as extreme as the one observed in your sample
  • = prob(your data if H0 is true), NOT prob(H0 is true given your data)
  • you’re not testing whether the null is true
  • p-value doesn’t give any information about the size or importance of your effect
  • p-values indicate how incompatible your data are with a statistical model as long as the underlying assumptions hold
  • p-values should not be the deciding factor in making a conclusion, should not be reported selectively
18
Q

p-hacking

A
  • trying different statistical methods until p<.05
  • conducting multiple tests for subsets of samples or controlling for different covariates
  • collecting more data until p<.05
  • excluding some observations
  • dropping one of the conditions
19
Q

scatterplot

A
  • displays the form (linear, nonlinear), direction (positive, negative), strength (weak, strong) of a relationship between two quantitative variables measured on the same individual
  • we usually need a numerical measure to supplement the graph (a correlation)
20
Q

correlation

A
  • measures the direction and strength of the linear relationship between two quantitative variables (Pearson correlation coefficient r)
  • a correlation treats both variables as equals (shows a symmetric linear relationship)
21
Q

pearson r

A
  • standardized covariance
  • covariance indicates the degree to which two variables vary together, but is not meaningful on its own because it is scale dependent
  • standardizing the covariance (dividing by the standard deviations of the variables) bounds it between -1 and 1 so we can compare the relationships between variables
22
Q

covariance

A
  • the degree to which X and Y vary together
  • positive Cov = moving in the same direction, negative = moving in opposite directions, 0 = no linear relationship (see the sketch below)
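A quick numpy sketch (made-up data) tying this card to the previous one: Pearson's r is just the covariance rescaled by the two standard deviations.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(size=100)            # hypothetical related variable

cov_xy = np.cov(x, y, ddof=1)[0, 1]           # covariance (scale dependent)
r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))  # standardized covariance
print(r, np.corrcoef(x, y)[0, 1])             # same value either way
```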
23
Q

when should we not use r?

A
  • if two variables have a nonlinear relationship (there could be a nonlinear association, but r won't pick up on it)
  • if observations aren’t independent (there will be existing correlations between them)
  • if there are outliers (very sensitive to outliers, will pull the line)
  • if homoscedasticity is violated (if one variable has unequal variability across the range of the other variable)
  • if the sample size is very small (N=3-6)
  • if both variables are not continuous
24
Q

other correlation coefficients

A
  • point-biserial: binary & continuous variables
  • phi coefficient: two binary variables
  • Spearman rank order: two ordinal variables (like Judges’ ranks)
  • Kendall’s tau: two ordinal variables (but N is small and there are many tied ranks)
25
Q

why could two variables be correlated

A
  • by chance
  • two variables could be mutually affecting each other (price and demand)
  • relationship could be driven by an underlying cause (a confounder)
26
Q

lurking factors

A
  • potential causes for a relationship that aren’t measured
27
Q

partial correlation

A

relationship between 2 variables after removing the influence of another variable

28
Q

statistical test for a correlation coefficient

A
  • H0: ρ = 0, H1: ρ =/= 0
  • use a t-test (one parameter in the null) when the two variables are jointly normal (bivariate normality; check with the Shapiro-Wilk test)
  • general form: t = (sample statistic - population parameter under H0)/SE of the statistic (for a mean, SE = s/√N)
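A sketch of this test in Python with made-up data; converting r to a t statistic with N - 2 degrees of freedom (t = r·√(N-2)/√(1-r²)) is the standard formula, stated here as a supplement to the card.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(size=40)
y = 0.4 * x + rng.normal(size=40)            # hypothetical data

r, p = stats.pearsonr(x, y)                  # tests H0: rho = 0
n = len(x)
t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)   # equivalent t statistic, df = n - 2
print(r, p, t)
```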
29
Q

linear regression

A
  • used if we have a directional hypothesis (how X affects Y), can show an asymmetric linear relationship between predictor and outcome variables
  • the effect of X on Y is beta (regression coefficient/slope)
  • there will also be error that accounts for some variation in Y (not just X)
30
Q

simple linear regression

A
  • only one X (how DV changes when IV changes)
  • assumption of linearity
  • used as a mathematical model summarizing the relationship by fitting a straight line/regression line (Ŷ) to the data that predicts values of Y based on X (Ŷ = alpha + beta·X)
31
Q

interpret alpha and beta for simple linear regression

A
  • alpha: intercept (average value of Y when X = 0)
  • beta: slope (amount by which Y changes on average when X changes by one unit)
32
Q

method of least squares

A
  • finding the best fitting regression line
  • minimizing the vertical distance between a data point and the line (minimizing the residuals Yi - Ŷi)
  • compute all the residuals for all data points, square them, sum the squares
  • we square the residuals to avoid the negative and positive residuals cancelling out to zero
  • this method is problematic when you have outliers because squaring large values makes them larger
  • B estimate = cov(X,Y)/var(X)
  • alpha estimate = mean of Y - (B estimate × mean of X), as in the sketch below
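A minimal sketch of these least-squares formulas applied to made-up data (illustration only; the course uses JASP):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(10, 2, size=50)
y = 3 + 0.8 * x + rng.normal(0, 1, size=50)   # hypothetical data with known slope

beta_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # cov(X, Y) / var(X)
alpha_hat = y.mean() - beta_hat * x.mean()                  # mean(Y) - B * mean(X)

residuals = y - (alpha_hat + beta_hat * x)
print(alpha_hat, beta_hat, (residuals ** 2).sum())          # minimized sum of squared residuals
```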
33
Q

statistical significance of the slope

A
  • H0: B=0, H1: B=/=0
  • use a t-test
  • assumptions of normality and independence
  • df = N-2 (two variables)
  • t = (stat-parameter)/(SE(B))
  • if |t| > CV or p < .05, the slope is significantly different from 0, which suggests a significant effect of X on Y
34
Q

partitioning of variance in simple linear regression for getting the F ratio

A
  • SS(Total) = SS(Regression) + SS(Error)
  • SS(Regression): The variation in Y explained by the regression line.
  • SS(Error): The variation in Y unexplained by the regression line (residuals).
  • df(Reg) = # of IVs in the model (= 1)
  • df(E) = N-2
  • divide each SS by its corresponding df to get the MS
  • F = MS(Reg)/MS(E)
  • compare F ratio to F distribution using df(reg) and df(E)
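A self-contained sketch of this partition for a simple linear regression on made-up data (names like y_hat are just for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(size=60)
y = 2 + 0.5 * x + rng.normal(size=60)       # hypothetical data

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()
y_hat = a + b * x                           # predicted values from the fitted line

ss_reg = ((y_hat - y.mean()) ** 2).sum()    # variation explained by the line
ss_err = ((y - y_hat) ** 2).sum()           # residual (unexplained) variation
ss_total = ss_reg + ss_err

df_reg, df_err = 1, len(y) - 2
F = (ss_reg / df_reg) / (ss_err / df_err)
p = stats.f.sf(F, df_reg, df_err)           # compare F with the F(df_reg, df_err) distribution
print(F, p)
```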
35
Q

how are t and F related in simple linear regression

A
  • they’re testing the same thing ONLY in simple linear reg
  • t^2 = F
36
Q

goodness-of-fit of the regression model

A
  • coefficient of determination (R2)
  • proportion of variation in Y accounted for by the model (R2 = SS(Reg)/SS(T))
  • ranges between 0 and 1
  • in simple linear reg, r = sqrt(R2)
37
Q

simple linear regression in JASP

A
  • model summary gives R2
  • the ANOVA table gives the overall significance of the regression (F-test)
  • coefficients: the unstandardized values are the estimates (dividing an estimate by its SE gives t)
  • in simple linear regression, both ANOVA table and t values in coefficients are testing the same thing so you get the same conclusion about whether to reject the null
38
Q

multiple linear regression

A
  • one DV with multiple IVs (which IV matters most for the DV?)
  • describes how the DV (Y) changes as multiple IVs (Xj, J ≥ 2) change
  • with two IVs, the data points sit in 3D space and the fitted model is a regression plane
  • equation of the regression plane: Ŷ = alpha + B1X1 + B2X2 + … + BjXj
  • IVs can be continuous or discrete (will change the interpretation)
  • alpha: Average value of Y when Xj = 0
  • beta: Amount by which Y changes on average when Xj changes by one unit, holding the other IVs constant (partialled out/controlled).
  • we can still use least squares method to estimate the intercept and slope
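A minimal multiple-regression sketch with two made-up predictors, using statsmodels (an assumed tool choice; the course itself uses JASP):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 0.6 * x1 - 0.3 * x2 + rng.normal(size=n)   # hypothetical DV

X = sm.add_constant(np.column_stack([x1, x2]))     # adds the intercept column
model = sm.OLS(y, X).fit()                         # least-squares estimates
print(model.params)                                # alpha, B1, B2
print(model.fvalue, model.rsquared)                # overall F test and R^2
```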
39
Q

standardized regression coefficients

A
  • effect of a standardized IV on the standardized DV (z-scores)
  • change in the standard deviation of the DV that results from a change of one standard deviation in Xj, holding the other IVs constant
  • Used to compare the effects of IVs on the DV, when the IVs are measured in different units of measurement
40
Q

partitioning variance in multiple linear regression

A
  • SS(T) = SS(Reg) + SS(E)
  • df(Reg) = J (number of IVs)
  • df(E) = N-J-1
  • MS(Reg)/MS(E) = F
41
Q

significance testing multiple linear regression

A
  • H0: B1 = B2 = … = Bj = 0, H1: not all Bs are equal to 0
  • if we reject the null, we test individual coefficients with individual t-tests to see which is different from zero (H0: Bj = 0, H1: Bj =/= 0)
42
Q

goodness-of-fit in multiple linear regression

A
  • R2: proportion of the total variation in Y accounted for by the regression model (= SS(Reg)/SS(T))
  • also called the squared multiple correlation (SMC)
  • ranges from 0 (no explanation) to 1 (perfect)
  • the F statistic can also be calculated based on R2
  • by adding another IV to the model, SS(Error) will always decrease (even if the decrease is negligible), so SS(Reg) becomes larger and R2 increases
  • it is impossible for R2 to decrease when you add an IV
43
Q

adjusted R2

A
  • If no substantial increase in R2 is obtained by adding a new IV, adjusted R2 tends to decrease (it accounts for the # of IVs in the model)
  • unlike R2, it can be smaller than 0, so it is difficult to interpret as a proportion of variance
  • this index prefers a simpler model when different models provide similar explanatory value for Y (which is also easier to interpret in practice)
  • only use it when comparing different models; the best one (statistically) will have the largest adjusted R2 (see the sketch below)
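The card does not give the adjustment itself; the standard textbook formula is adjusted R2 = 1 - (1 - R2)(N - 1)/(N - J - 1). A tiny sketch with made-up numbers:

```python
def adjusted_r2(r2: float, n: int, j: int) -> float:
    """Penalize R^2 for the number of IVs (j) given sample size n (standard formula)."""
    return 1 - (1 - r2) * (n - 1) / (n - j - 1)

# adding a nearly useless IV barely raises R^2 but lowers adjusted R^2
print(adjusted_r2(0.300, n=50, j=2))   # ~0.270
print(adjusted_r2(0.305, n=50, j=3))   # ~0.260
```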
44
Q

sample size guidelines for linear regression

A
  • 10-15 cases per IV
  • N ≥ 104 + J for individual IVs
45
Q

assumptions for linear regression

A
  1. The relationship between X and Y is linear (linearity).
  2. The error term ‘e’ follows a normal distribution with mean zero and constant variance (σ2) (normality & homoscedasticity)
  3. The error terms of observations are not related to each other (independence)
  4. There are no outliers
  5. There are no high correlations among IVs (no multicollinearity)
46
Q

Assessing normality in LR

A
  • descriptive statistics (sample skewness and kurtosis ~ 0, if >0 = positive skewness or kurtosis, if <0 = negative)
  • skewness stat / SE(skewness) = t(skewness)
  • kurtosis stat / SE(kurtosis) = t(kurtosis)
  • if |t(skewness)| or |t(kurtosis)| > 3.2, this suggests a violation of normality
  • Shapiro-Wilk test for normality assess skewness and kurtosis together (if significant = suggests that the sample may not come from a normal distribution, but very easy to give significant results with a large sample size)
  • normal quantile plot: sort observations from smallest to largest, find their z scores, plot observations against corresponding z scores (if the data are normal, the points lie close to a straight line)
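A sketch of these checks in Python with deliberately skewed, made-up data; the SE approximations used for the ratios (√(6/N) for skewness, √(24/N) for kurtosis) are standard large-sample approximations and an assumption here, not taken from the card.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
y = rng.exponential(size=200)            # deliberately skewed, hypothetical data
n = len(y)

skew = stats.skew(y)
kurt = stats.kurtosis(y)                 # excess kurtosis (0 under normality)
se_skew = np.sqrt(6 / n)                 # large-sample approximation (assumption)
se_kurt = np.sqrt(24 / n)                # large-sample approximation (assumption)
print(skew / se_skew, kurt / se_kurt)    # |ratio| > 3.2 flags a violation (per the card)

w, p = stats.shapiro(y)                  # Shapiro-Wilk test of normality
print(w, p)                              # a significant p suggests non-normality
```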
47
Q

assumption of homoscedasticity

A
  • homoscedasticity: (“equal scatter”) is true if all the error terms have the same variance.
  • Heteroscedasticity (“unequal scatter”) indicates the violation of this assumption.
  • If homoscedasticity is violated, the variances (or standard deviations) of regression coefficient estimates tend to be underestimated (t ratios tend to be inflated)
48
Q

checking linearity and homoscedasticity

A
  • Residual plot: plot residuals vs. predicted Y values (often uses standardized residuals)
  • if the plot shows an arc shape = nonlinearity
  • if the plot shows a cone/funnel shape (spread that widens or narrows) = heteroscedasticity
  • if the points are randomly scattered = the assumptions look satisfied
49
Q

independence assumption

A
  • Autocorrelation: correlation between error terms ordered in time (time-series data) or space (cross-sectional data)
  • time-series correlation: repeated measures of the same variable are correlated (IQ scores over time)
  • spatial correlation: similar observations from the same area (sample from the same class = dependency)
  • if independence is violated, variances of regression coefficients tend to be underestimated
50
Q

Assessing independence

A
  • Durbin-Watson test of autocorrelation
  • a value around 2 (1.5<d<2.5) indicates the assumption is satisfied
  • d<1 or d>3 are cause for concern
  • d = 4 (perfect negative correlation)
  • d = 0 (perfect positive correlation)
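A quick sketch of the Durbin-Watson check on regression residuals using statsmodels (made-up data with independent errors):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(8)
x = rng.normal(size=80)
y = 1 + 0.5 * x + rng.normal(size=80)      # hypothetical data, independent errors

fit = sm.OLS(y, sm.add_constant(x)).fit()
d = durbin_watson(fit.resid)
print(d)                                   # values near 2 suggest no autocorrelation
```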
51
Q

outliers assumption

A
  • first check data are coded properly
  • A data point represents an outlier if it is disconnected from the rest of the distribution (|z| > 3.3)
  • if there is a concern, run LR with and without the data point to see if it has any influence on regression analysis
  • Cook’s distance measures the influence of a data point on the regression equation
  • Cook’s D > 1 is a cause for concern
  • Cook’s D > 4 is a serious outlier
52
Q

multicollinearity assumption

A
  • a situation in which two or more IVs are highly correlated (e.g., |r| ≥ .9)
  • consequences: unstable regression coefficient estimates (large variance of estimates, lower t ratios), large SE = small t stat = large p value = more Type II errors
  • high R2 (or significant F) but no significant t ratios
  • unexpected signs (positive or negative) of regression coefficients
  • matrix inversion problem
53
Q

checking multicollinearity

A
  • check correlations between IVs
  • tolerance: 1 - R2 (R2 for the regression of each IV on the other IVs, ignoring the DV. The higher the intercorrelation of the IVs, the closer the tolerance is to zero)
  • tolerance < 0.1 is a problem
  • variance inflation factor (VIF) = 1/tolerance (VIF > 10 is a problem)
  • condition index: measure of dependency of one variable on the others
  • condition indices are computed as the square roots of the ratios of the largest eigenvalue to each successive eigenvalue
  • with 3 IVs and 1 DV there are 4 condition indices; the first = 1, they ascend, and the last (largest) is the condition number
  • any condition index > 30 is a problem
  • condition number < 100 is fine
  • condition number > 1000 is serious multicollinearity
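A sketch of tolerance and VIF with statsmodels on made-up, deliberately collinear predictors; variance_inflation_factor expects the design matrix with the intercept column included.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(9)
n = 200
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)   # nearly redundant with x1 (hypothetical)
x3 = rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for j in range(1, X.shape[1]):              # skip the intercept column
    vif = variance_inflation_factor(X, j)
    print(f"X{j}: VIF = {vif:.1f}, tolerance = {1 / vif:.3f}")   # VIF > 10 is a problem
```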
54
Q

dealing with non-normality

A
  • data transformation: sqrt(Y) for weak skew, log(Y) for mild skew, 1/Y for strong skew
  • only done on Y
  • prioritize the transformation that addresses kurtosis
  • resampling methods: Jackknife or bootstrap
55
Q

dealing with non-linearity

A
  • data transformation
  • add another IV (x1^2)
  • nonlinear methods
56
Q

dealing with heteroscedasticity

A
  • data transformation
  • other estimation methods (weighted least squares)
  • other regression methods
57
Q

dealing with dependence

A
  • data transformation
  • generalized least squares
  • other regressions
58
Q

dealing with outliers

A
  • remove them if Cook’s d >1
  • robust regression (e.g., minimizing the sum of absolute residuals instead of the sum of squared residuals)
59
Q

dealing with multicollinearity

A
  • drop a variable; look at standardized coefficients and remove the one that is least related to Y
  • create a composite variable: sum the variables, find their mean, or their difference (the difference may be more interpretable)
  • other regression models (machine learning)
60
Q

dealing with nonlinearity

A
  • polynomial regression
  • Y = alpha + B1X + B2X^2 + e (second-degree polynomial; adds one curve to the regression line for a quadratic shape)
  • Y = alpha + B1X + B2X^2 + B3X^3 + e (third-degree polynomial; adds two curves to the regression line)
  • adding a polynomial term will cause R2 to increase because the model fits better (use adjusted R2 to compare and see which fits best, as in the sketch below)
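A sketch of comparing a straight-line fit against a second-degree polynomial on made-up curved data, using adjusted R2 as the card suggests (statsmodels is an assumed tool choice):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
x = rng.uniform(-2, 2, size=120)
y = 1 + 0.5 * x + 0.8 * x**2 + rng.normal(scale=0.5, size=120)   # hypothetical curved data

linear = sm.OLS(y, sm.add_constant(x)).fit()
quadratic = sm.OLS(y, sm.add_constant(np.column_stack([x, x**2]))).fit()

# adding the x^2 term always raises R^2; adjusted R^2 shows whether it is worth it
print(linear.rsquared_adj, quadratic.rsquared_adj)
```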
61
Q

bootstrap method

A
  • when normality is violated
  • instead of using the sampling distribution of statistics to find the standard error (all possible samples of size N), the bootstrap method uses your actual data as a population (so requires no assumption of normality)
  • JASP draws 5,000 samples of N observations from your sample (resampling with replacement), computes an estimate of B in each, and takes the SD of those estimates as the SE (sketch below)
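A minimal bootstrap sketch for the standard error and percentile CI of a regression slope, resampling cases with replacement (made-up data with non-normal errors):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 60
x = rng.normal(size=n)
y = 2 + 0.7 * x + rng.exponential(size=n)   # hypothetical data, non-normal errors

def slope(x, y):
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

boot_slopes = []
for _ in range(5000):
    idx = rng.integers(0, n, size=n)        # resample cases with replacement
    boot_slopes.append(slope(x[idx], y[idx]))

se_boot = np.std(boot_slopes, ddof=1)       # SD of the bootstrap estimates = bootstrap SE
ci = np.percentile(boot_slopes, [2.5, 97.5])
print(se_boot, ci)                          # percentile 95% CI for the slope
```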
62
Q

ANOVA terminology

A
  • IV = factor (in ANOVA, always a nominal variable) with multiple response categories (levels = k)
  • DV is continuous
  • single factor designs (1 IV = one-way ANOVA)
  • factorial designs (more than one IV = two-way, three-way ANOVA)
  • between subject: subjects only belong to one group (independent-group)
  • within-subject: subjects belong to all groups (repeated-measures)
  • ANOVA used to test effects of IV on DV (same objective as LR) by comparing means
  • H0: u1 = u2 = … = uk
  • H1: not all uk are the same
63
Q

ANOVA assumptions

A
  • populations follow a normal distribution
  • homoscedasticity (homogeneity of variance in the populations)
  • independence of observations
64
Q

why do we need the ANOVA assumptions

A
  • normal distributions are defined by a mean and standard deviation
  • assumption of equal variances means that the standard deviation is fixed, so the only way the populations can differ is by the mean (so if we reject the null, the groups are different)
  • they must be representative samples (independence of observations) so that they’re representative of their populations
65
Q

ANOVAs with k=2 and k=3

A
  • if k=2, H0: u1 = u2, we can do a t-test (t = ((X̄1 - X̄2) - (u1 - u2)) / SE(X̄1 - X̄2)); as X̄1 - X̄2 increases, t increases and p decreases (reject the null)
  • if k=3, H0: u1 = u2 = u3, we compute the variance of the sample means as a measure of distance
  • (k=3) small variance around the grand mean = group means are close together = fail to reject the null; large variance (means far from the grand mean) = sample means are very different from each other = reject the null
  • (k=3) compute the between-groups variance and the within-groups (residual) variance to get an F ratio and compare it to the F distribution
66
Q

how does ANOVA work

A
  1. divides the variance observed in data into different parts resulting from different sources;
  2. assesses the relative magnitude of the different parts of variance (F ratio)
  3. examines whether a particular part of the variance is greater than expectation under the null hypothesis
    - two sources of variance: between-groups (due to different treatments/levels of a factor across groups) and within-groups (random fluctuations of subjects within each group)
67
Q

F distribution

A
  • varies in shape according to df(B) and df(W); df(B), the numerator df, is always given first
  • right-skewed
  • find the critical value according to df(B) and df(W) and the alpha level - if F > CV we reject the null
68
Q

effect sizes for ANOVA

A
  • cohen’s d (standardized mean difference)
  • eta squared (n2): ratio of variance explained in the DV by one or more IVs, which is equivalent for R2 in LR (a biased effect size that estimates the amount of variance explained based on the sample, and not based on the population)
  • omega squared (w2) which is an unbiased version of n2
  • 0.01 is small, 0.06 is medium, 0.14 is large (w2, n2)
69
Q

follow-up tests for ANOVA

A
  • F-test gives a global effect of an IV on a DV (omnibus or overall test)
  • to see which pairs of means are different, we do post hoc (a posteriori/unplanned) comparisons IF three or more means were compared in F-test
  • Tukey’s HSD is the simplest and most accurate (especially when sample sizes are equal and homogeneity of variances is met)
70
Q

ANOVA. vs. LR

A
  • they have the same objective
  • ANOVA can be viewed as a special case of LR with nominal IVs with multiple levels
  • if a nominal variable has more than 2 levels, it cannot be used as it is in linear regression (needs to be transformed into dummy variables/dummy-coded)
  • LR with dummy variables is possible, but it’s usually better to use ANOVA because it has fewer assumptions (but with complex designs like four-way ANOVA or nominal+continuous IVs, it’s better to run LR)
71
Q

dummy coding

A
  • assignment of binary values (0 or 1) to represent membership in each level of a nominal variable
  • expresses group membership of observations using zeroes and ones
  • number of dummy variables is always k-1 (if you have three levels of a nominal variable, you have 2 dummy variables)
72
Q

steps of dummy coding

A

Step 1: Create K – 1 new variables as dummy variables, where K = number of levels/groups.
Step 2: Choose one of K groups as a baseline (a group against which all other groups will be compared). Usually, this is a control group.
Step 3: Assign the baseline group values of 0 for all dummy variables.
Step 4: For the kth dummy variable (k = 1, …, K-1), assign the value 1 to the kth group. Assign all other groups 0 for this variable.
- observations in group 1 will have the value 1 for X1, the value 0 for X2
- observations in group 2 will have the value 0 for X1, the value 1 for X2
- observations in group 3 will have the value 0 for both X1 and X2
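A sketch of these steps with pandas, using a hypothetical three-level factor where "group3" is treated as the baseline:

```python
import pandas as pd

df = pd.DataFrame({"group": ["group1", "group2", "group3", "group1", "group3"]})

# K - 1 = 2 dummy variables; dropping the baseline column makes group3 the reference
dummies = pd.get_dummies(df["group"], dtype=int).drop(columns="group3")
print(dummies)
#    group1  group2
# 0       1       0   <- group 1: X1 = 1, X2 = 0
# 1       0       1   <- group 2: X1 = 0, X2 = 1
# 2       0       0   <- group 3 (baseline): 0 on both dummies
# ...
```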

73
Q

LR with dummy variables

A
  • equation: Ŷi = α + β1X1i + β2X2i
  • intercept: mean of Y for group 3 (baseline group), X1 = X2 = 0
  • B1: difference in Y between the means of G1 and G3
  • B2: difference in Y between the means of G2 and G3
  • H0: B1 = B2 = 0 (and B1 = u1-u3 = 0, B2 = u2-u3 = 0, so H0: u1 = u2 = u3) - both LR and ANOVA are testing the same thing
74
Q

extraneous variables

A
  • random assignment avoids systematic bias, but leaves individual differences uncontrolled (subjects may not be well-matched)
  • these extraneous variables (age, sex, etc.) can still affect the DV
  • extraneous variables in ANCOVA are called covariates or concomitant variables
  • ANCOVA allows you to compare group means AND control covariates at once
75
Q

adjusted means

A
  • differences found in ANCOVA while controlling covariates (mean after eliminating effect of covariate)
76
Q

ANCOVA in a LR context

A
  • Y= α + β1X1+ β2X2+ β3Z + e, where X1 and X2 are dummy variables, and Z is the covariate
  • alpha: mean level of Y only for the baseline group (X1=X2=0) and controlling for covariate (z=0)
  • B1: adjusted mean difference between group 1 and baseline group (difference in means controlling for covariate)
  • B2: adjusted mean difference between group 2 and baseline
  • B3: effect of covariate on DV controlling for other IVs (X1 and X2)
  • B1 and B2 only control for z, not other IVs (because X1 and X2 are not correlated and cannot overlap, so B1 doesn’t need to control for X2 and B2 doesn’t need to control for X1)
77
Q

adjusted means

A
  • B1 and B2 are adjusted mean differences
  • adjusted means are individual linear regression models computed based on the same value of covariate (so they can be compared)
  • adjusted means are regression lines with different intercepts but the same slope
78
Q

partitioning variance in ANCOVA

A
  • linear regression analysis which includes both nominal (dummy-coded) and continuous variables as IVs
  • SS(Reg): The variation in Y explained by the regression model - df(Reg) = J
  • SS(E): The variation in Y unexplained by the regression model - df(E) = N-J-1
  • SS(Reg) comes from two sources: SS(Group) and SS(Covariate)
  • SS(G): the effects of groups/treatments, group mean differences (H0: u1A = u2A = … = ukA) - df(G) = J-P(#covariates)
  • SS(CV): effects of covariates (Bcv = 0 or Bcv1 = Bcv2 = … = Bcvk) - df(CV) = P(# covariates)
  • BUT SS(G) + SS(CV) =/= SS(Reg) due to potential correlations between group and covariates
79
Q

ANCOVA assumptions

A
  • same assumptions as for ANOVA and Linear Regression
  • also the assumption of homogeneity of regression slopes across different groups (H0: Bz1 = Bz2 = … = Bzk): the effect of Z on Y must be equal across groups (the intercepts can still be different)
  • equal slopes assumption means there is no interaction between IV and covariate (JASP table with interaction effect must be nonsignificant)
  • ANCOVA only looks at main effects of IVs, not interaction effects (it requires a lack of interaction)
80
Q

interaction effect

A
  • The extent to which the effect of one factor depends on the level of the other factor.
  • An interaction is present when the effect of one factor on the DV changes at the different levels of the other factor.
  • The presence of an interaction indicates that the main effects alone do not fully describe the outcome of a factorial experiment.
  • can be graphed in a cell means plot for two-way ANOVA (if lines do not have the same slope, there is an interaction, but if lines are parallel there is no interaction)
  • same interpretation in LR: If one predictor’s effect on the dependent variable depends on another predictor, we say that the two predictors interact, or that there is an interaction effect between them on the dependent variable (instead of assuming that a regression coefficient is independent of another regression coefficient)
81
Q

interaction in LR

A
  • The effect of a predictor (X1) on Y depends on the value of another predictor (X2), or the effect is moderated by X2
  • X1 is the focal predictor and X2 is the moderator
  • 3 types: interaction between 2 continuous predictors, interaction between 1 nominal and 1 continuous predictor (the nominal predictor can be binary or multi-categorical), or interaction between 2 nominal variables (ANOVA)
82
Q

interaction between two continuous predictors regression equation

A
  • Y = alpha + B1X1 + B2X2 + B3X1X2
  • product term to represent the interaction effect
  • rephrased as Y = alpha + B2X2 + X1(B1 + B3X2) = B0 + B1*X1, where B0 = alpha + B2X2
  • B1* = B1 + B3X2, which is a simple linear regression equation (B1 is the intercept and B3 is the slope) to denote how the slope of X1 (the effect of X1 on Y; B1*) changes based on a 1-unit increase in X2 (the moderator)
  • if B3 is 0, the effect of the focal variable X1 doesn’t change across values of X2, so no interaction
  • if B3 =/= 0, the effect of X1 on Y depends on the value of X2 (moderator)
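A sketch of fitting the product-term model and then computing the simple (conditional) slope of X1 at chosen values of the moderator X2 (B1* = B1 + B3·X2); data, seed, and moderator values are all hypothetical, and statsmodels is an assumed tool choice:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(12)
n = 150
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 0.4 * x1 + 0.3 * x2 + 0.5 * x1 * x2 + rng.normal(size=n)   # built-in interaction

X = sm.add_constant(np.column_stack([x1, x2, x1 * x2]))   # include the product term
fit = sm.OLS(y, X).fit()
a, b1, b2, b3 = fit.params

# simple/conditional slope of X1 at particular values of the moderator X2
for x2_value in (-1.0, 0.0, 1.0):
    print(x2_value, b1 + b3 * x2_value)    # B1* = B1 + B3 * X2
```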
83
Q

interpret the coefficients in the regression equation for an interaction between two continuous predictors

A
  • Y = alpha + B1X1 + B2X2 + B3X1X2
  • alpha: average value of Y when all IVs = 0 (so the product term is also 0)
  • B1: average change in Y when X1 increases by one unit, controlling for X2 (X2=0) *the product term also includes X1, so B1 cannot be interpreted as a main effect
  • B2: effect of X2 on Y when X1=0 (average change in Y when X2 increases by one unit when X1 = 0)
  • B3: average change in the slope of X1 when you increase one unit of X2 (the moderator), this is the difference between the slopes
  • the value of B1* (B1 + B3X2) at each value of X2 is called a simple/conditional effect/slope of X1 (the effect of X1 on Y for a particular value of X2)
84
Q

interaction between continuous predictors: inference

A
  • H0: B3 = 0 (no interaction effect, the slopes are equal)
  • H1: B3 =/= 0 (interaction effect present)
  • use a t-test (one parameter)
  • rejection of H0 suggests that the effect of X1 depends on X2
85
Q

covariates in an interaction between two continuous predictors

A
  • Y = alpha + B1X1 + B2X2 + B3X1X2 + B4X3
  • alpha, B1, B2, B3 all have the same interpretation, with the added phrase “controlling for X3” (holding the covariates constant)
  • B4: effect of X3 on Y controlling for all the other IVs (because X3 doesn’t interact with other variables)
  • you can add more covariates to expand the regression equation with B5X4, B6X5, etc.
86
Q

interaction between binary and continuous predictors

A
  • effect of X1 on Y may differ between two response categories in X2
  • same regression equation: Y = alpha + B1X1 + B2X2 + B3X1X2
87
Q

interpret the coefficients in the equation for an interaction between binary and continuous predictors

A
  • Y = alpha + B1X1 + B2X2 + B3X1X2
  • in this example, X2 is the binary moderator
  • alpha: average value of Y when X1 = X2 = 0 (so product term is also 0) ONLY for the baseline group (coded with 0)
  • B1: effect of X1 on Y when X2 = 0 only for the baseline group (average change in Y when you increase one unit of X1 when X2 is 0 for the 0 group)
  • B2: effect of X2 (binary) on Y when X1 = 0 (the mean difference in y between the two groups when X1 is 0) - the difference in between intercepts on a graph
  • B3: difference in the slopes for X1 between the two groups (change in the effect of X1 on average when X2 increases by one unit) - the difference in slopes on a graph (if it’s 0, no interaction; if it’s not 0, there is an interaction)
88
Q

interaction between a multicategorical nominal predictor and a continuous predictor

A
  • if X2 is a three-category predictor with two dummy-coded variables D1 and D2 (group 3 is the baseline)
  • Y = alpha + B1X1 + B2D1 + B3D2 + B4X1D1 + B5X1D2
89
Q

interpret the coefficients for an equation for an interaction between multicategorical nominal and continuous variables

A
  • Y = alpha + B1X1 + B2D1 + B3D2 + B4X1D1 + B5X1D2 (where D1 and D2 are dummy-coded, so now can be considered binary, X1 is the focal IV, B4 and B5 are interaction effect product terms)
  • alpha: average of Y for group 3 (baseline; D1=D2=0) when X1 = 0
  • B1: effect of X1 on Y for group 3 (baseline) when D1=D2=0 (average change in Y when you increase one unit of X1 only for the baseline group)
  • B2: mean difference in Y between group 1 and baseline group when X1=0 (effect of D1 on Y when X1=0) - difference in intercepts between groups 1 & 3
  • B3: mean difference in Y between group 2 and baseline when X1=0 (effect of D2 on Y when X1=0) - difference in intercepts between groups 2 & 3
  • B4: difference in slopes for X1 between group 1 and baseline
  • B5: difference in slopes for X1 between group 2 and baseline
  • if either B4 OR B5 =/= 0, there is an interaction effect
  • if BOTH B4 AND B5 = 0, there is no interaction
90
Q

hierarchical principle

A
  • whenever you have an interaction term, you also need to include the main effects (even if they’re not statistically significant)
  • X1X2 is typically correlated with X1 and X2, and excluding X1 and/or X2 tends to alter the meaning of the interaction
91
Q

mean-centering

A
  • widely believed that we should never test the interaction between X1 and X2 by including their product term without first mean-centering both predictors
  • instead of B1X1, it would be B1(X1-mean of X1), same for other coefficients
  • believed that since X1X2 is highly correlated with X1 and/or X2 (multicollinearity problem), mean-centering will alleviate this problem (increasing the tolerance of the product and lowering the standard error of the regression coefficient of the product term), but IT CANNOT
  • mean-centering will increase the tolerance, but will also change the variance of the product term (these two cancel each other out, so SE will still be large)
92
Q

why do we mean-center

A
  • it doesn’t work for addressing multicollinearity
  • we only consider doing it when variables are continuous
  • it can be helpful to interpret regression coefficients (When we mean-center X1 and X2, the means of the centered X1 and X2 are zero. Then, β1 indicates the effect of X1 on Y among those average on X2. β2 indicates the effect of X2 on Y among those average on X1)
93
Q

mediation analysis in LR

A
  • used to quantify pathways of influence or the process by which an independent variable can influence a dependent variable
94
Q

types of effects in mediation analysis

A
  • direct effects: influence of one variable on another that is not mediated by any other variables. Regression coefficient estimates indicate direct effects
  • indirect effects: Influence of a variable mediated by at least one intervening variable. They are estimated as the product of direct effects
  • total effects: Direct effect + Indirect effect
95
Q

simple mediation model

A
  • X1 affects Y: direct effect (B2)
  • X2 affects Y: direct effect (B3)
  • X1 affects X2: direct effect (B4)
  • X1 affects Y through X2: indirect effect (B3B4, multiply these coefficients)
  • total effect of X1 on Y = B2 + B3B4
  • if you add covariates X3 and X4 (which also affect Y), the interpretations remain the same with the added phrase “holding X3 and X4 constant”
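A sketch of this simple mediation model estimated with two regressions, plus a bootstrap percentile CI for the indirect effect (the approach the later cards recommend over the Sobel test). Data and labels are hypothetical; following this card, B4 is the X1 → X2 path, B3 is the X2 → Y path, and B2 is the direct effect of X1 on Y.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(13)
n = 200
x1 = rng.normal(size=n)                       # predictor
x2 = 0.6 * x1 + rng.normal(size=n)            # mediator (hypothetical)
y = 0.3 * x1 + 0.5 * x2 + rng.normal(size=n)  # outcome (hypothetical)

def effects(x1, x2, y):
    b4 = sm.OLS(x2, sm.add_constant(x1)).fit().params[1]                           # X1 -> X2
    b2, b3 = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit().params[1:]
    return b2, b3 * b4                        # direct effect, indirect effect (B3 * B4)

direct, indirect = effects(x1, x2, y)

boot = []
for _ in range(5000):
    idx = rng.integers(0, n, size=n)          # resample cases with replacement
    boot.append(effects(x1[idx], x2[idx], y[idx])[1])

ci = np.percentile(boot, [2.5, 97.5])
print(direct, indirect, ci)                   # a CI excluding 0 = significant indirect effect
```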
96
Q

Sobel test

A
  • we may test the statistical significance of unstandardized indirect effects with a single mediator based on the Sobel test (in large samples only)
  • Sobel test: method for estimating standard error of B3B4, then the ratio B3B4/SE(B3B4) is approximated as a z statistic (only for large samples)
  • Sobel test requires that B ~ Normal (B3~Normal and B4~Normal, but their product term isn’t normal), so this test is based on a false assumption
  • Sobel test tends to be lower in power
97
Q

inference about indirect effects

A
  • bootstrap and Monte Carlo are better ways to test statistical significance using confidence intervals because they don’t make assumptions about the distribution of the indirect effect
98
Q

JASP mediation analysis

A
  • predictor: X1
  • mediator: X2
  • outcome: Y
  • background confounder: covariates
  • in all tables, make sure the CI is based on bootstrap method, then check whether 0 is included in the 95% CI
  • if 0 is included, the effect is not statistically significant (if 0 is excluded, the effect is significant)
  • z and p in the tables will be based on the Sobel test