Lecture 5 Flashcards
is it possible that your population mean is not within your sample confidence interval?
yes. over the long run, about 5% of 95% confidence intervals will not contain the population mean.
in general, you would be highly unlikely to know the true population mean.
is the width of the confidence intervals of all samples in a population the same everywhere?
no. the width of the interval may differ from sample to sample, because the population standard deviation is usually unknown and has to be estimated from each sample
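A rough sketch of why interval width differs between samples. The two samples, the function name `ci_mean`, and the use of a normal critical value (a t value would normally be used for samples this small) are all assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def ci_mean(sample, confidence=0.95):
    """Approximate CI for a mean, using a normal critical value (assumption)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(sample) / sqrt(len(sample))
    m = mean(sample)
    return m - half_width, m + half_width

# Two hypothetical samples of the same size but different spread
sample_a = [48, 50, 52, 49, 51, 50, 47, 53]
sample_b = [40, 55, 62, 45, 58, 50, 38, 52]

low_a, high_a = ci_mean(sample_a)
low_b, high_b = ci_mean(sample_b)
width_a = high_a - low_a
width_b = high_b - low_b
print(round(width_a, 2), round(width_b, 2))
```

Same sample size, same formula, but the sample with the larger estimated standard deviation gets the wider interval.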
why, in multiple regression, is the coefficient for each IV not the same as the one in simple regression?
because in multiple regression, the overlap (correlation) among the IVs in their relationship to the DV is partialled out (removed)
in a multiple regression equation, each IV's regression coefficient therefore reflects its relationship to the DV with the other IVs held constant
what does intercept indicate in a regression model?
it is the predicted value of the DV when an individual scores 0 on every IV
a score of 0 makes each (regression coefficient x IV score) term equal 0, so only the intercept is left
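A minimal sketch of the intercept idea. The function name `predict` and the intercept and coefficient values are made up for illustration.

```python
def predict(intercept, coefficients, iv_scores):
    """Predicted DV score = intercept + sum of (coefficient * IV score)."""
    return intercept + sum(b * x for b, x in zip(coefficients, iv_scores))

b0 = 2.5              # hypothetical intercept
coefs = [0.8, -0.3]   # hypothetical partial regression coefficients

# Scoring 0 on every IV leaves only the intercept
prediction_at_zero = predict(b0, coefs, [0, 0])
print(prediction_at_zero)
```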
what does it mean when you use an unbiased interval estimator
means that the actual coverage rate will equal the nominal rate (eg: 95%) over the long run
what does it mean when you use a biased interval estimator
means that the actual coverage rate would be SMALLER or LARGER than the nominal rate over the long run (eg: 89% or 96%)
what does it mean when you use a consistent interval estimator
means that the actual coverage rate gets closer to the nominal rate (eg: 95%) as sample size gets larger
what does nominal level of coverage rate refer to
it refers to the 95% in the description of a confidence interval
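The coverage cards above can be checked with a small simulation. This is only a sketch: the population values, sample size, number of replications and the seed are arbitrary assumptions, and the interval uses a normal critical value, so its actual coverage would be expected to sit slightly below the nominal 95% at this sample size.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

random.seed(1)                 # fixed seed so the run is repeatable
POP_MEAN, POP_SD = 100, 15     # hypothetical population
N, REPS = 30, 2000             # sample size and number of repeated samples
z = NormalDist().inv_cdf(0.975)  # critical value for a nominal 95% interval

hits = 0
for _ in range(REPS):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    m = mean(sample)
    half_width = z * stdev(sample) / sqrt(N)
    if m - half_width <= POP_MEAN <= m + half_width:
        hits += 1

coverage = hits / REPS   # actual (simulated) coverage rate
print(coverage)
```

Comparing `coverage` with 0.95 is the actual-versus-nominal comparison; rerunning with a larger `N` is one way to look at consistency.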
what does multiple R-squared = 0.47 mean
means that 47% of the variation in the DV is explained by IV 1 and IV 2 together
if sample size is 100 and there are 2 IVs, what is the (denominator) df?
97
df = sample size - numerator df (no. of IVs) - 1 = 100 - 2 - 1
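The same arithmetic as a tiny sketch (the function name `denominator_df` is made up here):

```python
def denominator_df(sample_size, num_ivs):
    """Denominator df = n - number of IVs - 1."""
    return sample_size - num_ivs - 1

df_two_ivs = denominator_df(100, 2)
print(df_two_ivs)
```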
4 things we need in order to calculate a confidence interval for R2
- estimated value (of R2)
- numerator df (no of IVs)
- denominator df (n - no of IVs -1)
- desired level of confidence (0.95)
what does it mean if your R2 and adjusted R2 are different
means that the R2 estimate is biased
adj r2 is not unbiased but it’s less biased than r2 estimate
what affects the possible difference between R2 and adjusted R2?
- the no. of IVs (eg: 2 IVs vs 20 IVs –> many IVs = obs r2 more inflated, so adj r2 is smaller relative to it)
- the sample size (smaller sample = obs r2 more inflated, so adj r2 is smaller relative to it)
many IVs + small sample size = larger difference between obs r2 and adj r2
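A sketch of how the gap behaves, assuming one commonly used adjustment formula, adj R2 = 1 - (1 - R2)(n - 1)/(n - k - 1). The software used in the lecture may apply a different adjustment, and the sample sizes below are made up.

```python
def adjusted_r2(r2, n, num_ivs):
    """One common adjustment (an assumption here, not necessarily the lecture's)."""
    return 1 - (1 - r2) * (n - 1) / (n - num_ivs - 1)

obs_r2 = 0.47

# Few IVs with a large sample: small gap
gap_few_ivs_large_n = obs_r2 - adjusted_r2(obs_r2, n=100, num_ivs=2)

# Many IVs with a small sample: large gap
gap_many_ivs_small_n = obs_r2 - adjusted_r2(obs_r2, n=50, num_ivs=20)

print(round(gap_few_ivs_large_n, 3), round(gap_many_ivs_small_n, 3))
```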
2 ways to make regression coefficient directly comparable
- standardised partial regression coefficient
- use squared semi-partial correlation
how does standardised partial regression coefficient work
- all in standard deviation unit
- transform all obs scores to z scores (all standardised to have sd of 1)
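A minimal sketch of the z-score transformation. The raw scores and the function name `to_z_scores` are made up for illustration.

```python
from statistics import mean, stdev

def to_z_scores(scores):
    """Subtract the mean and divide by the sd, so the result has mean 0 and sd 1."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]

raw = [12, 15, 9, 20, 14]   # hypothetical observed scores
z = to_z_scores(raw)
print([round(v, 2) for v in z])
```

Running the regression on z scores for the DV and every IV is what puts all coefficients in sd units.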
how do you interpret standardised regression coefficient?
same as an unstandardised regression coefficient, but you need to acknowledge what a one unit change represents (one standard deviation)
what does it mean when you have standardised regression coefficient of 0.74 in IV1
means that 1 sd increase in IV1 would result in 0.74 sd increase in DV, holding constant IV2
what does it mean when you have standardised regression coefficient of -0.13 in IV2
means that for 1 sd increase in IV2, there would be a decrease of 0.13 sd in DV, holding constant IV1
what is semi-partial correlation
semipartial correlation is the correlation between the DV and IV1 after the part of IV1 that overlaps with IV2 has been removed from IV1 (but not from the DV)
why is squared semi-partial correlation useful?
- it corresponds directly to R2: it is the amount by which R2 would drop if that IV were removed from the model
- it also indicates the unique proportion of variation in the DV explained by each IV
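A sketch of the link between the squared semipartial correlation and R2 for the two-IV case, using the standard formulas based on the three pairwise correlations. The correlation values are made up.

```python
from math import sqrt

# Hypothetical correlations: DV with IV1, DV with IV2, IV1 with IV2
r_y1, r_y2, r_12 = 0.60, 0.40, 0.30

# R2 for the model with both IVs
r2_full = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

# Semipartial correlation of IV1 (IV2 removed from IV1 only), then squared
sr1 = (r_y1 - r_y2 * r_12) / sqrt(1 - r_12**2)
sr1_sq = sr1**2

# Drop in R2 if IV1 is removed (a model with IV2 alone has R2 = r_y2 squared)
drop_in_r2 = r2_full - r_y2**2

print(round(sr1_sq, 4), round(drop_in_r2, 4))
```

The two printed values agree, which is the sense in which the squared semipartial is the unique proportion of DV variation explained by IV1.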
assumptions of linear regression
- independence of observations
- linearity
- constant variance of residuals (homoscedasticity)
- normality (of residuals)
what does independence of observations mean?
one person's scores are independent of everyone else's
what does linearity mean?
scores on the dv are an additive linear function of scores on the set of ivs
what does homoscedasticity mean?
variance of residuals is the same at every score on each iv
what does normality mean?
residuals are well-modelled by a normal distribution
qqplot –> points should fall within the confidence ‘window’ (envelope) around the line
how to make sure independence of observation is met?
- don’t duplicate scores (to make bigger sample)
- make sure one person's responses do not determine another person's responses
scatter plot with LOESS line –> how to tell if non linearity is present?
check whether the red dotted (LOESS) line is roughly straight; clear bends or curves suggest non-linearity
in multiple regression equation, each coefficient for each IV is called ……..
partial regression coefficient
how do you get values of intercept and regression coefficients in multiple linear regression equation?
by the method of Ordinary Least Squares (whereby the intercept and regression coefficients are estimated in such a way as to minimise the sum of squared residuals)
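A minimal sketch of Ordinary Least Squares with two IVs, using the closed-form solution based on sums of squares and cross-products. The data and all names are made up; real analyses would use a statistics package rather than this hand calculation.

```python
from statistics import mean

# Hypothetical data: two IVs and one DV
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [9.5, 7.5, 19.2, 18.1, 28.7]

def cross(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
s1y, s2y = cross(x1, y), cross(x2, y)
det = s11 * s22 - s12 ** 2   # must be non-zero (IVs not perfectly correlated)

# OLS estimates of the partial regression coefficients and intercept
b1 = (s1y * s22 - s2y * s12) / det
b2 = (s2y * s11 - s1y * s12) / det
b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)

def ssr(a, c1, c2):
    """Sum of squared residuals for a candidate intercept and coefficients."""
    return sum((yi - (a + c1 * u + c2 * v)) ** 2 for yi, u, v in zip(y, x1, x2))

best = ssr(b0, b1, b2)
print(round(b0, 3), round(b1, 3), round(b2, 3), round(best, 3))
```

Nudging any of the three estimates away from the OLS values (for example `ssr(b0 + 0.1, b1, b2)`) gives a larger sum of squared residuals.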
What is a residual score in a linear regression model?
That part of the observed scores on the dependent variable not being explained by the regression model.
which summary characteristics are of most interest to us in analysing data using a linear regression model with two independent variables
- The overall strength of prediction of the model (indicated by the size of the R-squared statistic)
- the relative strength of prediction of each independent variable (indicated by the standardised regression coefficient)
Why might prediction be viewed as indicating an asymmetric relationship?
because one variable (the DV) is treated as the outcome being predicted while the others (the IVs) are treated as predictors, so the roles are not interchangeable
what defines an important deviation score that is central to multiple linear regression?
The deviation of the observed Y score from the predicted Y score (observed minus predicted), ie: the residual.
residual scores can be either positive or negative in value, true or false?
true
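A small sketch of residuals as observed minus predicted scores. The observed and predicted values are made up.

```python
observed = [10, 12, 15, 11]
predicted = [11.0, 11.5, 14.0, 11.5]   # hypothetical model predictions

# Residual = observed DV score - predicted DV score
residuals = [o - p for o, p in zip(observed, predicted)]
print(residuals)
```

A negative residual means the model over-predicted that person's score; a positive one means it under-predicted.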
what is R-squared statistic
It is a measure of the strength of prediction in the regression model.
what does R-squared value of 0 mean
no prediction
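A sketch of R-squared as 1 - (sum of squared residuals / total sum of squares). The scores and the function name `r_squared` are made up for illustration.

```python
from statistics import mean

def r_squared(observed, predicted):
    """Proportion of variation in the observed scores explained by the predictions."""
    m = mean(observed)
    ss_total = sum((o - m) ** 2 for o in observed)
    ss_resid = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1 - ss_resid / ss_total

dv = [3, 5, 7, 9]                        # hypothetical DV scores
no_prediction = [mean(dv)] * len(dv)     # model that only ever predicts the mean
perfect = list(dv)                       # model that predicts every score exactly

r2_none = r_squared(dv, no_prediction)
r2_perfect = r_squared(dv, perfect)
print(r2_none, r2_perfect)
```

When the model does no better than predicting the DV mean for everyone, R-squared is 0.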
What is the difference between an unstandardised partial regression coefficient and a standardised partial regression coefficient?
A standardised partial regression coefficient is estimated in a multiple linear regression model using z scores rather than observed scores.
- NOT simple linear regression
If a standardised regression coefficient is ‒0.25, what does it mean?
a 1 sd increase in IV1 leads to a decrease of 0.25 sd in DV (equivalently, a 1 sd decrease in IV1 leads to a 0.25 sd increase), holding constant IV2
Why is using a straight line function different to displaying a two-dimensional scatterplot of real data?
Real data would (almost certainly) not form a straight line of individual data values.
If real data formed a straight line in a two-dimensional scatterplot, this would imply that the values of the X variable perfectly predict the values of the Y variable.
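A sketch contrasting points that sit exactly on a line with scattered points, using a hand-written Pearson correlation. Both data sets are made up.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation computed from deviations around the means."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
on_a_line = [2 * v + 1 for v in x]        # every point exactly on y = 2x + 1
scattered = [3.2, 4.5, 7.9, 8.1, 11.6]    # hypothetical 'real' data

r_line = pearson_r(x, on_a_line)
r_scattered = pearson_r(x, scattered)
print(r_line, round(r_scattered, 3))
```

Only the points lying exactly on the line give perfect prediction; the scattered data give a correlation below 1.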