Agresti chapter 12 Flashcards
Regression analysis
y is the response variable and x is the explanatory variable.
Regression model, Population regression equation:
mu_y = alpha + beta*x
alpha is the population y-intercept.
beta is the population slope.
(alpha and beta are parameters; in practice their values are unknown.)
mu_y denotes the population mean of y for all subjects at a given value of x.
* The true relationship is rarely exactly linear; the equation only approximates it, which is why it is called a model.
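Because alpha and beta are unknown in practice, they are usually estimated from sample data. The sketch below uses the ordinary least-squares formulas to do that. The function name `least_squares_line` and the data values are made up for illustration; they are not from the chapter.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (a, b): sample estimates of the intercept alpha and slope beta."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-products and sum of squares of x around their means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # estimated slope
    a = y_bar - b * x_bar    # estimated y-intercept
    return a, b

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
a, b = least_squares_line(xs, ys)
print(round(a, 2), round(b, 2))
```

For this made-up data the estimated slope is about 1.97 and the estimated intercept about 0.09.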
Conditional distribution
Probability distribution of the y values at a fixed value of x. An additional parameter, sigma, describes the standard deviation of each conditional distribution.
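A small simulation can make the idea of a conditional distribution concrete. The sketch below draws y values at one fixed x, centered at alpha + beta*x with a common standard deviation (called `sigma` in the code). The normal shape is the assumption used for inference in this chapter; the parameter values and the function name `draw_conditional_y` are hypothetical.

```python
import random
from statistics import mean, stdev

def draw_conditional_y(x, alpha, beta, sigma, n, seed=0):
    """Simulate n values of y at a fixed x, assuming a normal conditional distribution."""
    rng = random.Random(seed)      # seeded so the draws are reproducible
    mu_y = alpha + beta * x        # population mean of y at this x
    return [rng.gauss(mu_y, sigma) for _ in range(n)]

# Hypothetical parameter values, for illustration only
draws = draw_conditional_y(x=3, alpha=1.0, beta=2.0, sigma=2.0, n=10_000)
print(round(mean(draws), 2), round(stdev(draws), 2))
```

With these values the draws should average close to 1.0 + 2.0*3 = 7 and have a standard deviation close to 2.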
Regression towards the mean
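If an x value is a certain number of standard deviations from its mean, then the predicted y is r times that many standard deviations from its mean, where r is the correlation. Because r is between -1 and 1, the predicted y is relatively no farther from its mean than x is from its mean.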
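The relationship between the two z-scores can be checked numerically. The sketch below fits the least-squares line to a small made-up data set and confirms that an x value two standard deviations above its mean gives a predicted y that is 2r standard deviations above its mean. The helper names `correlation` and `predicted_z` are illustrative.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r between xs and ys."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def predicted_z(xs, ys, x_new):
    """z-score of the predicted y at x_new, using the least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    b = correlation(xs, ys) * s_y / s_x   # slope written in terms of r
    a = y_bar - b * x_bar
    y_hat = a + b * x_new
    return (y_hat - y_bar) / s_y

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = correlation(xs, ys)
x_new = mean(xs) + 2 * stdev(xs)      # x is 2 SDs above its mean
z_y = predicted_z(xs, ys, x_new)      # predicted y is 2*r SDs above its mean
print(round(r, 2), round(z_y, 2))
```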
Total sum of squares
The total prediction error when y-bar (the sample mean of y, ignoring x) is used to predict y: sum of (y - y-bar)^2.
Proportional reduction in error (assessed with the squared correlation)
r^2 = (total sum of squares - residual sum of squares) / total sum of squares. It falls between 0 and 1. If r^2 = 0.40, the error using y-hat to predict y is 40% smaller than the error using y-bar to predict y.
Property of r^2
The closer r^2 is to 1, the stronger the linear association, and the more effective the regression equation is, compared with y-bar, at predicting y.
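The proportional reduction in error can be computed directly from the two sums of squares. The sketch below fits a least-squares line to a small made-up data set, then compares the error from predicting with y-bar (total sum of squares) with the error from predicting with y-hat (residual sum of squares). The function name `r_squared` and the data are illustrative.

```python
from statistics import mean

def r_squared(xs, ys):
    """Return (r2, tss, sse) for the least-squares line through the data."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    # Error when y-bar is used to predict every y (x ignored)
    tss = sum((y - y_bar) ** 2 for y in ys)
    # Error when y-hat = a + b*x is used to predict y
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return (tss - sse) / tss, tss, sse

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r2, tss, sse = r_squared(xs, ys)
print(round(r2, 2), round(tss, 2), round(sse, 2))
```

For this data r^2 is about 0.64, so predictions from the regression line have about 64% less error than predictions from y-bar.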
Ecological fallacy
Making predictions about individuals based on the summary results of groups (should be avoided).
Statistical inference about regression
Assumes the data were gathered using randomization.
Assumes the population values of y at each value of x follow a normal distribution, with the same standard deviation at each value of x.