Sabina's lectures 1 to 7 Flashcards
Variance & SD
σ² (or V) = the degree to which a variable ‘varies’ around its mean
V = Σ(X − X̄)² / (N − 1) = SS / df
SD = √V (in the same units as the data, so easier to interpret)
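These two formulas can be sketched in a few lines of Python (a minimal illustration; the function names and toy data are my own, not from the lecture):

```python
from math import sqrt

def sample_variance(xs):
    """V = SS / df: sum of squared deviations from the mean over N - 1."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)   # SS: sum of squares
    return ss / (n - 1)                      # df = N - 1

def sample_sd(xs):
    """SD = sqrt(V): back in the original units, easier to interpret."""
    return sqrt(sample_variance(xs))

scores = [2, 4, 4, 4, 5, 5, 7, 9]            # toy data, mean = 5
print(sample_variance(scores))               # 32 / 7 ≈ 4.571
print(sample_sd(scores))
```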
Covariance
CoV = the degree to which two variables ‘vary’ simultaneously or co-vary
Note: the variance of a variable is… its covariance with itself.
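A small sketch makes the note above concrete: computing a variable's covariance with itself returns its variance (function name and data are illustrative, not from the lecture):

```python
def sample_cov(xs, ys):
    """Cov(X, Y) = sum((X - mean_X)(Y - mean_Y)) / (N - 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(sample_cov(x, y))   # how x and y co-vary: 1.5
print(sample_cov(x, x))   # covariance of x with itself = variance of x: 2.5
```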
Correlation
The degree of linear relationship between two variables; essentially, it is a standardised covariance: r = Cov(X, Y) / (SDx × SDy)
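“Standardised covariance” can be shown directly: divide the covariance by the two SDs and the result is bounded between −1 and +1 (a minimal sketch; names and data are my own):

```python
from math import sqrt

def correlation(xs, ys):
    """r = Cov(X, Y) / (SD_X * SD_Y): covariance standardised to [-1, 1]."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sdx = sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sdy = sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return cov / (sdx * sdy)

print(correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # ≈ 0.775
```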
Continuous versus discrete variables
Variables can be continuous (any value within a range) or discrete/categorical (a set of distinct categories)
Regression sum of squares
The regression sum of squares reflects the part of the DV we can predict; R² is the proportion predicted, and (1 − R²) is the proportion we cannot predict.
how to work out t
t = b / SEb
df Residual
The residual degrees of freedom: df residual = N − k − 1, where k is the number of predictors. The residual itself reflects the part of the DV we cannot predict.
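The two cards above fit together in one worked example for a single-predictor regression: compute the slope, its standard error, t = b / SE_b, and df residual = N − k − 1 (a sketch with made-up data; the function name is my own):

```python
from math import sqrt

def slope_t(xs, ys):
    """One-predictor regression: returns b, SE_b, t = b / SE_b, df = N - k - 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                                  # slope
    a = my - b * mx                                # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    df_res = n - 1 - 1                             # N - k - 1, with k = 1 predictor
    se_b = sqrt((ss_res / df_res) / sxx)           # standard error of the slope
    return b, se_b, b / se_b, df_res

b, se_b, t, df = slope_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b, se_b, t, df)   # b = 0.6, df = 3
```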
Confidence Intervals (CI)
b is an estimate of the population parameter; ultimately, we want to know the true value of the regression coefficient. The CI helps to illustrate this idea (i.e., if we conducted this research 100 times, about XX% of the intervals computed would contain the true, yet unknown, slope).
how to use CIs
If the range includes 0, then we can conclude that the finding is NOT statistically significant, and vice versa.
› We can also use the CI to test whether the slope is different from a particular value (e.g., whether this slope is different from the one found in previous studies).
› SPSS does not calculate CIs automatically
The CI is, in a sense, our range of plausible parameter values: if I performed the experiment 100 times, this is the range I would expect b to fall in.
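Both uses of the CI from this card can be sketched in code. The example values below are made up, and t_crit ≈ 3.182 is the two-tailed 95% critical t for df = 3 (an assumption of this toy example, not a value from the lecture):

```python
def slope_ci(b, se_b, t_crit):
    """CI = b ± t_crit * SE_b."""
    return b - t_crit * se_b, b + t_crit * se_b

# Toy values; t_crit = 3.182 is the two-tailed 95% critical t for df = 3.
lo, hi = slope_ci(b=0.6, se_b=0.283, t_crit=3.182)
print(lo, hi)
print(lo <= 0 <= hi)     # True: the CI includes 0, so b is not significant at .05
print(lo <= 1.5 <= hi)   # the CI can also test b against a previously found slope
```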
Converting from b to β [in italics!]
β = b × (SDx / SDy)    b = β × √(Vy / Vx)
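The conversion is a simple rescaling, and the two formulas undo each other (a minimal sketch with illustrative values; the function names are my own):

```python
from math import sqrt

def b_to_beta(b, sd_x, sd_y):
    """Standardise: beta = b * (SD_x / SD_y)."""
    return b * sd_x / sd_y

def beta_to_b(beta, var_x, var_y):
    """Unstandardise: b = beta * sqrt(V_y / V_x), i.e. beta * (SD_y / SD_x)."""
    return beta * sqrt(var_y / var_x)

beta = b_to_beta(0.6, sd_x=1.58, sd_y=1.22)
print(beta)
print(beta_to_b(beta, var_x=1.58 ** 2, var_y=1.22 ** 2))  # round-trips back to 0.6
```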
If B is equal to zero
the predictor contributes nothing to the prediction (its term drops out). It will still be featured in the regression equation when reporting (DO NOT TAKE IT OUT).
The most common null hypothesis is that b = 0: the slope is not different from zero, so nothing systematic is happening.
The null value does not have to be zero, though. We can ask whether the new slope is different from, say, 1.5 (e.g., a slope found in previous studies); if the CI includes that value, it is a tenable null hypothesis.
MR advantages
Can use both categorical and continuous independent variables
› Can easily incorporate multiple independent variables
› Is appropriate for the analysis of experimental or nonexperimental research
Factors Affecting the Results of the Regression Equation
› Sample size (N)
› The amount of scatter of the points around the regression line, indexed by Σ(Y − Y′)² = SSresidual. Other things being equal, the smaller SSresidual, the larger SSregression, and hence the larger the F-ratio.
› The range of values in the X variable, indexed by Σ(X − X̄)²
Assumptions Underlying MR (only a glimpse now)
Dependent variable is a linear function of the IVs
- can be ‘forced’ if one selects only extreme cases of X… such selection can make the regression appear linear even when the relationship is curvilinear across the full range of X values. Bad practice…
› Each observation is drawn independently
› Errors are normally distributed
› The mean of errors is = 0
› Errors are not correlated with each other, nor with the IV
› Homoscedasticity
- Variance of errors is not a function of the IVs
- The variance of errors is constant, i.e., the same at all values of X (all levels of the IV)
Regression df
The number of IVs (k)
Do you report the non-significant parts in the regression conclusion?
YES