Statistics: Regression analysis Flashcards
Give another name for a regression equation
A prediction equation
What is meant by a model in statistics?
A simple approximation of how variables relate in the population
What is meant by a conditional distribution?
The probability distribution of y values at a fixed value of x
When you have several quantitative variables what can software provide?
A correlation matrix
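A minimal sketch of what such a matrix contains, written in plain Python rather than with any particular statistics package. The variable names and data values are hypothetical, chosen only for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return pairwise correlations as a dict of dicts keyed by variable name."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical data: three quantitative variables measured on five subjects.
data = {
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 55, 61, 64, 70],
    "absences": [9, 7, 6, 4, 1],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(row[other], 2) for other in data])
```

Each variable correlates perfectly with itself, so the diagonal is 1, and the matrix is symmetric because the correlation of x with y equals the correlation of y with x.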
Describe the relationship between x and y with regard to their distances from their means
If an x value is a certain number of standard deviations from its mean, the predicted y value is r times that many standard deviations from its mean
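This property can be checked numerically. The sketch below assumes the usual least-squares line, with slope r * (sd of y / sd of x) and a line passing through the two means. The data are made up for illustration.

```python
from math import sqrt

# Made-up illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sd_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
sd_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
r = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / ((n - 1) * sd_x * sd_y)

# Least-squares line: slope = r * sd_y / sd_x, passing through (mean_x, mean_y).
slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x

# Take an x value two standard deviations above the mean of x.
x_new = mean_x + 2 * sd_x
y_hat = intercept + slope * x_new

# The prediction lies r * 2 standard deviations above the mean of y.
z_pred = (y_hat - mean_y) / sd_y
print(round(z_pred, 4), round(2 * r, 4))
```

Because |r| is at most 1, the prediction is never further from its mean (in standard deviation units) than x is from its own mean.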
Show two forms of prediction error
- The error using the regression line to make a prediction (y - y^)
- The error using the mean of y to make a prediction (y - mean of y)
What formula do you use to summarize the size of the errors in each scenario?
The total sum of squares, sum of (y - mean of y)^2, for predictions using the mean; the residual sum of squares, sum of (y - y^)^2, for predictions using the regression equation
How do we eliminate the dependence on units when summarising errors?
By calculating the proportional reduction in error: r^2 = (total sum of squares - residual sum of squares) / total sum of squares
What does it mean if r^2 = 0.6?
The error (measured by the sum of squared errors) using y^ to predict y is 60% smaller than the error using the mean of y to predict y.
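A short worked sketch of the two error summaries and the resulting proportional reduction in error. The data are made up for illustration, and the line is fitted by ordinary least squares.

```python
# Made-up illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Ordinary least-squares slope and intercept.
slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
         / sum((a - mean_x) ** 2 for a in x))
intercept = mean_y - slope * mean_x
y_hat = [intercept + slope * a for a in x]

# Error when every prediction is the mean of y.
tss = sum((b - mean_y) ** 2 for b in y)
# Error when predictions come from the regression line.
sse = sum((b - p) ** 2 for b, p in zip(y, y_hat))

# Proportional reduction in error; a unit-free number between 0 and 1.
r_squared = (tss - sse) / tss
print(round(tss, 2), round(sse, 2), round(r_squared, 2))
```

For these values the total sum of squares is 10 and the residual sum of squares is 3.6, so r^2 = 0.64: the regression line reduces the squared prediction error by 64% relative to using the mean of y.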
Name two disadvantages of r^2
- It’s easier to interpret the original scale than a squared scale
- The direction of the relationship is lost
Name two factors that have a strong influence on the size of the correlation, apart from outliers
- If the observations are summaries for groups of subjects rather than for individual subjects, the correlation tends to increase in magnitude
- The size of the correlation depends on the range of the x values sampled. The correlation is smaller when we sample only a restricted range of x values than when we use the entire range.
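Both effects can be seen in a small sketch. The data below are contrived, not real: y equals x plus a deviation of +3 or -3 that alternates so that it cancels within each adjacent pair. That makes the grouping effect unusually clean; real data would show a weaker version of the same pattern.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Contrived data: x runs from 1 to 20; y deviates from x by +3 (odd x) or -3 (even x).
x = list(range(1, 21))
y = [a + (3 if a % 2 else -3) for a in x]

# Correlation over the full range of x, using individual observations.
r_full = pearson_r(x, y)

# Correlation when only a restricted range of x (8 to 13) is sampled.
restricted = [(a, b) for a, b in zip(x, y) if 8 <= a <= 13]
r_restricted = pearson_r([a for a, _ in restricted], [b for _, b in restricted])

# Correlation between group means, with adjacent pairs of subjects as groups.
group_x = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
group_y = [(y[i] + y[i + 1]) / 2 for i in range(0, len(y), 2)]
r_grouped = pearson_r(group_x, group_y)

print(round(r_restricted, 3), round(r_full, 3), round(r_grouped, 3))
```

Averaging within groups removes individual-level variability, which is why a correlation between group summaries can overstate how well x predicts y for an individual.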
What is the name given to the error of making predictions about individuals based on the summary results of groups?
Ecological fallacy
State the basic assumption for using a regression line for description
The population means of y at different values of x have a straight-line relationship with x, of the form μy = α + βx
State the extra assumptions for using regression to make statistical inference
- Data was gathered using randomisation
- The population values of y at each value of x follow a normal distribution, with the same standard deviation at each x value
What is the null hypothesis that x and y are statistically independent?
H0: β = 0
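One common way to test this hypothesis is a t statistic for the sample slope, sketched below under the inference assumptions listed above. The flashcards do not name the test, so treat this as an illustration rather than the deck's own method. The data are made up, and the resulting statistic would be compared with a t distribution on n - 2 degrees of freedom.

```python
from math import sqrt

# Made-up illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((a - mean_x) ** 2 for a in x)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

# Sample slope b (the estimate of the population slope) and intercept a.
b = sxy / sxx
a = mean_y - b * mean_x

# Residual standard deviation, using n - 2 degrees of freedom.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s = sqrt(sse / (n - 2))

# Standard error of the slope, and the t statistic for a null slope of 0.
se_b = s / sqrt(sxx)
t_stat = b / se_b
df = n - 2
print(round(t_stat, 3), df)
```

The same statistic can be written in terms of the correlation as t = r * sqrt(n - 2) / sqrt(1 - r^2), which is why testing for a zero slope and testing for a zero correlation give the same result.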