Exam 1 Flashcards
What are three uses of multiple regression?
Description
Prediction
Theory Testing
What three criteria must be met for making causal statements?
Covariation
Temporal Precedence
Ruling out alternative explanations
How do you calculate sample variance?
∑(x-xbar)^2/(n-1)
How do you calculate sample standard deviation?
√[∑(x-xbar)^2/(n-1)]
How do you calculate variation?
∑(x-xbar)^2
How do you calculate covariation?
∑(x-xbar)(y-ybar)
How do you calculate covariance?
SP/(n-1), where SP is the covariation (sum of cross-products), ∑(x-xbar)(y-ybar)
How do you calculate correlation?
SP/(√SSx * √SSy), where SP is the covariation and SSx and SSy are the variations of x and y
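The formulas on the cards above can be sketched in a few lines of Python. The data are made up for illustration, and the variable names (ss_x for variation, sp for covariation) are just labels chosen here.

```python
import math

# Made-up illustration data; not taken from the flashcards.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Variation (sum of squares, SS) and covariation (sum of cross-products, SP).
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)
sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

var_x = ss_x / (n - 1)                            # sample variance of x
sd_x = math.sqrt(var_x)                           # sample standard deviation of x
cov_xy = sp / (n - 1)                             # covariance
r_xy = sp / (math.sqrt(ss_x) * math.sqrt(ss_y))   # correlation

print(ss_x, sp, var_x, round(sd_x, 3), cov_xy, round(r_xy, 3))
```

Note that variance and covariance are just variation and covariation divided by n-1, so each quantity builds on the previous one.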
What is a phi correlation?
Phi correlations are correlations between two true dichotomies. For example, we can correlate treatment condition and gender.
What are point-biserial correlations?
Point-biserial correlations are correlations between a continuous variable and a true dichotomy (a categorical variable with two levels). For example, we can correlate treatment condition with number of sick days.
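Computationally, both phi and point-biserial correlations are ordinary Pearson correlations in which each dichotomy is coded numerically (here 0/1). The sketch below assumes made-up data; the coding of treatment and gender and the helper name pearson_r are illustrative choices.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: SP / sqrt(SSx * SSy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data: two true dichotomies coded 0/1 and one continuous variable.
treatment = [0, 0, 0, 0, 1, 1, 1, 1]   # 0 = control, 1 = treated
gender    = [0, 0, 0, 1, 0, 1, 1, 1]   # arbitrary 0/1 coding
sick_days = [5, 6, 7, 6, 2, 3, 4, 3]

phi = pearson_r(treatment, gender)        # dichotomy with dichotomy
r_pb = pearson_r(treatment, sick_days)    # dichotomy with continuous variable

print(round(phi, 3), round(r_pb, 3))
```

The sign of each coefficient depends on which category is coded 1, so it should be interpreted against the coding scheme.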
What is the difference between true and artificial dichotomies?
True dichotomies are discrete categories, while artificial dichotomies are categories that are created by making a cut score on a continuum.
What does the notation in the subscript of byx signify?
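The subscript gives the direction of the regression: the first letter (y) is the criterion and the second (x) is the predictor, so byx is the slope for the regression of Y on X. It is the slope of the regression line that summarizes the relationship between the predictor variable and the criterion, and it is also called the regression coefficient.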
What is the function of the regression intercept?
This is the y-intercept of the regression equation, also called the regression constant. It gives the predicted value of Y when X=0. Its function is to equate the means of the observed and predicted scores: the mean of the predicted scores always equals the mean of the observed scores.
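A minimal sketch of the slope and intercept, assuming the usual least-squares formulas byx = SP/SSx and a = ybar - byx * xbar (these formulas are not stated on the cards) and made-up data. It also checks that the predicted scores have the same mean as the observed scores.

```python
# Made-up illustration data; not taken from the flashcards.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_x = sum((xi - x_bar) ** 2 for xi in x)

b_yx = sp / ss_x              # slope: regression of Y on X
a = y_bar - b_yx * x_bar      # intercept: predicted Y when X = 0

y_hat = [a + b_yx * xi for xi in x]
mean_y_hat = sum(y_hat) / n   # should match y_bar

print(round(b_yx, 3), round(a, 3), round(mean_y_hat, 3), y_bar)
```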
If there is no relationship between predictor X and criterion Y, what is the best prediction of Y for any value of X?
The best prediction of Y for any value of X will be the mean of Y.
Into what two components can we partition any observed criterion score? What does each of these components measure? Are they orthogonal or do they share overlapping variance?
A criterion score can be partitioned into the part that can be predicted from X (the predicted score) and the part that cannot be predicted from X (the residual). These parts are orthogonal (non-overlapping). An observed score equals the predicted score plus the residual.
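The partition can be checked numerically. The sketch below uses made-up data and a hypothetical helper, fit_line, to show that each observed score is the predicted score plus the residual, that the two parts have zero cross-product (orthogonality), and that the variation of Y splits into predictable plus residual variation.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for the regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b = sp / ss_x
    return y_bar - b * x_bar, b

# Made-up illustration data; not taken from the flashcards.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_bar = sum(y) / len(y)

a, b = fit_line(x, y)
y_hat = [a + b * xi for xi in x]                    # predictable part
residual = [yi - yh for yi, yh in zip(y, y_hat)]    # unpredictable part

# Orthogonality: cross-product of predicted deviations and residuals is zero.
cross_product = sum((yh - y_bar) * e for yh, e in zip(y_hat, residual))

ss_y = sum((yi - y_bar) ** 2 for yi in y)
ss_y_hat = sum((yh - y_bar) ** 2 for yh in y_hat)
ss_residual = sum(e ** 2 for e in residual)

print(round(cross_product, 10), ss_y, round(ss_y_hat, 3), round(ss_residual, 3))
```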
What does r^2yyhat measure? What is it called?
This is the squared multiple correlation, that is, the squared correlation between the observed scores (Y) and the predicted scores (Yhat). It is a global effect size measure of a complete regression equation. There is only one R2 multiple for the whole regression equation. It’s the proportion of variation in the criterion Y accounted for by the set of predictors.
Given SSy, SSyhat (predictable variation), and SSy-yhat (residual variation), how would you compute r2multiple?
You would divide SSyhat by SSy.
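As a rough check on made-up data, the ratio SSyhat/SSy can be compared with the squared correlation between the observed and predicted scores. The two should agree. The helper name pearson_r is an illustrative choice.

```python
import math

def pearson_r(u, v):
    """Pearson correlation: SP / sqrt(SSu * SSv)."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    sp = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    ss_u = sum((ui - u_bar) ** 2 for ui in u)
    ss_v = sum((vi - v_bar) ** 2 for vi in v)
    return sp / math.sqrt(ss_u * ss_v)

# Made-up illustration data; not taken from the flashcards.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares line for the regression of y on x.
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]

ss_y = sum((yi - y_bar) ** 2 for yi in y)
ss_y_hat = sum((yh - y_bar) ** 2 for yh in y_hat)

r2_from_ss = ss_y_hat / ss_y                 # SSyhat / SSy
r2_from_corr = pearson_r(y, y_hat) ** 2      # squared correlation of Y with Yhat

print(round(r2_from_ss, 3), round(r2_from_corr, 3))
```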
In general, what hypothesis is tested in the analysis of regression?
Hypothesis tests are always written in terms of population parameters. In the analysis of regression, the null hypothesis is that the population squared multiple correlation (rho squared) equals 0, meaning that the proportion of variance explained by the predictor(s) equals 0. The alternative hypothesis is that the population squared multiple correlation is greater than 0. In the one-predictor case this is equivalent to testing whether the population regression coefficient equals 0.
What is the structure of the F test and its degrees of freedom in the one-predictor case?
The F test is the ratio of the mean square regression (predictable) to the mean square residual. Said differently, it is (the systematic variance among the predicted scores plus the random variance among the predicted scores) divided by the random variance among the residual scores. df = (p, n-p-1), which is (1, n-2) in the one-predictor case.
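A minimal sketch of the F ratio, assuming the sums of squares are already known. The input values (SSregression = 3.6, SSresidual = 2.4, n = 5) are hypothetical, and f_ratio is an illustrative helper name.

```python
def f_ratio(ss_regression, ss_residual, n, p):
    """Return the F ratio and its degrees of freedom (p, n - p - 1)."""
    df_regression = p
    df_residual = n - p - 1
    ms_regression = ss_regression / df_regression   # mean square regression
    ms_residual = ss_residual / df_residual         # mean square residual
    return ms_regression / ms_residual, (df_regression, df_residual)

# Hypothetical sums of squares from a one-predictor regression with n = 5.
f_value, df = f_ratio(ss_regression=3.6, ss_residual=2.4, n=5, p=1)

print(round(f_value, 3), df)
```

The resulting F is compared with the critical F for those degrees of freedom from a table or statistical software.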
What is meant by biased versus unbiased estimators?
An unbiased estimator is one whose expected value equals the corresponding population parameter. Examples are the sample mean and the sample regression coefficient. A biased estimator is one whose expected value differs from the corresponding population parameter. It can be negatively biased (expected value smaller than the population value) or positively biased (expected value greater than the population value). An example of a negatively biased estimator is the sample variance computed by dividing by n rather than by the degrees of freedom, n-1. An example of a positively biased estimator is the sample r2 multiple.
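Bias can be seen by simulation. The sketch below assumes the negatively biased example refers to dividing the sum of squares by n rather than n-1, and it assumes a normal population with variance 1. Both are illustrative choices, not details from the cards.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

POP_SD = 1.0    # assumed population standard deviation (variance = 1)
N = 5           # size of each sample
REPS = 20000    # number of samples drawn

divide_by_n = []
divide_by_n_minus_1 = []
for _ in range(REPS):
    sample = [random.gauss(0, POP_SD) for _ in range(N)]
    mean = sum(sample) / N
    ss = sum((v - mean) ** 2 for v in sample)
    divide_by_n.append(ss / N)                  # biased estimator
    divide_by_n_minus_1.append(ss / (N - 1))    # unbiased estimator

mean_biased = statistics.mean(divide_by_n)
mean_unbiased = statistics.mean(divide_by_n_minus_1)

print(round(mean_biased, 3), round(mean_unbiased, 3))
```

Across many samples the n-1 version averages close to the population variance, while the n version averages below it.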
What is a sampling distribution? How would you create a sampling distribution of a regression coefficient? Is the sample regression coefficient an unbiased statistic or a biased statistic?
A sampling distribution is the relative frequency distribution of a sample statistic computed in each of j samples of n people drawn, with replacement, from a population. We can create a sampling distribution of a regression coefficient by repeatedly drawing samples of size n, computing the sample regression coefficient in each one, and plotting the relative frequency distribution of those coefficients. This shows how stable or unstable the sample regression coefficient is from sample to sample. The sample regression coefficient is an unbiased statistic.
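The repeated-sampling idea can be sketched as a small simulation. Everything about the population here is an assumption for illustration: a true slope of 0.5, a true intercept of 2.0, X uniform on 0 to 10, and normal errors with standard deviation 1.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

TRUE_SLOPE = 0.5       # assumed population regression coefficient
TRUE_INTERCEPT = 2.0   # assumed population intercept

def sample_slope(n):
    """Draw one sample of size n and return its regression coefficient."""
    x = [random.uniform(0, 10) for _ in range(n)]
    y = [TRUE_INTERCEPT + TRUE_SLOPE * xi + random.gauss(0, 1) for xi in x]
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    return sp / ss_x

# j = 2000 samples of n = 30; the list of slopes is the sampling distribution.
slopes = [sample_slope(n=30) for _ in range(2000)]

mean_slope = statistics.mean(slopes)   # close to TRUE_SLOPE if unbiased
sd_slopes = statistics.stdev(slopes)   # spread of the sampling distribution

print(round(mean_slope, 3), round(sd_slopes, 3))
```

The mean of the simulated slopes landing near the assumed population slope is what unbiasedness looks like in practice.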
What is the standard error, what does it measure? What two things can we do to decrease the standard error of the regression coefficient?
The standard error is a measure of the instability of the sample regression coefficient as an estimate of the population regression coefficient. A standard error is the standard deviation of a sampling distribution. To decrease the standard error, we can increase the sample size and sample widely on X.
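Both remedies work by increasing SSx. The sketch below assumes the one-predictor formula for the standard error of the slope, sqrt(MSresidual / SSx), which is not stated on the card, and holds a hypothetical residual variance fixed so that only the X values change.

```python
import math

def se_of_slope(ms_residual, x):
    """Standard error of the slope in one-predictor regression."""
    x_bar = sum(x) / len(x)
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    return math.sqrt(ms_residual / ss_x)

MS_RESIDUAL = 0.8  # hypothetical residual variance, held fixed for comparison

se_base = se_of_slope(MS_RESIDUAL, [1, 2, 3, 4, 5])          # baseline sample
se_larger_n = se_of_slope(MS_RESIDUAL, [1, 2, 3, 4, 5] * 2)  # twice the sample size
se_wider_x = se_of_slope(MS_RESIDUAL, [1, 3, 5, 7, 9])       # same n, wider spread on X

print(round(se_base, 4), round(se_larger_n, 4), round(se_wider_x, 4))
```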
What is the standard error of estimate?
This is the standard deviation of the residual scores. It measures the typical size of the prediction errors, which gives a yardstick for judging how extreme a given residual is.
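A minimal sketch, assuming the usual convention of dividing the residual sum of squares by n-p-1 rather than n-1. The observed and predicted scores are made up (the predictions come from a hypothetical line yhat = 2.2 + 0.6x with x = 1 to 5).

```python
import math

def standard_error_of_estimate(y, y_hat, p=1):
    """Standard deviation of the residuals, using n - p - 1 degrees of freedom."""
    n = len(y)
    ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return math.sqrt(ss_residual / (n - p - 1))

# Made-up observed scores and their predicted scores.
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

s_est = standard_error_of_estimate(y, y_hat)

print(round(s_est, 4))
```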
What is the general structure of all confidence intervals?
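C[ byx - A < Byx < byx + A ] = 1 - alpha, where byx is the sample regression coefficient, Byx is the population regression coefficient, and A is the allowance factor. In general, the interval is the sample statistic plus or minus the allowance factor.
What two values are contained in the allowance factor? What is another name for the allowance factor?
The allowance factor is the product of the critical t value and the standard error of the regression coefficient. Another name for it is the margin of error.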
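Putting the pieces together, a confidence interval for the regression coefficient can be sketched as below. The inputs are hypothetical (byx = 0.6, standard error = 0.2828, from a sample with df = 3), and the critical t of 3.182 is the two-tailed 95% table value for 3 degrees of freedom.

```python
def confidence_interval(b_yx, se_b, t_critical):
    """Return (lower, upper) limits: b_yx minus and plus the allowance factor."""
    allowance = t_critical * se_b   # allowance factor (margin of error)
    return b_yx - allowance, b_yx + allowance

T_CRIT_DF3_95 = 3.182  # two-tailed 95% critical t for df = 3, from a t table

# Hypothetical sample results.
lower, upper = confidence_interval(b_yx=0.6, se_b=0.2828, t_critical=T_CRIT_DF3_95)

print(round(lower, 3), round(upper, 3))
```

With a different sample size or confidence level, only the critical t changes; the structure of the interval stays the same.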