Linear regression and correlation Flashcards
Fitting a line to scatterplot?
Method of least squares. The vertical differences between the points and the line are both positive and negative, so they would cancel if simply added; must square them first. Similar to standard deviation! Choose the line that has the smallest value for the sum of squares.
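A minimal sketch of the least-squares idea, using invented heights and FEV1 values (the data and the function names `least_squares` and `sum_of_squares` are assumptions for illustration, not from the course). It uses the standard closed-form slope and intercept, then checks that nudging the line makes the sum of squares larger.

```python
# Hypothetical data, invented for illustration only.
heights = [160.0, 165.0, 170.0, 175.0, 180.0]  # cm
fev1 = [3.0, 3.4, 3.6, 4.1, 4.4]               # litres


def least_squares(x, y):
    """Return (a, b) for the line y = a + b*x minimising the sum of squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b


def sum_of_squares(x, y, a, b):
    """Sum of squared vertical differences between the points and the line."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


a, b = least_squares(heights, fev1)
best = sum_of_squares(heights, fev1, a, b)

# Any other line gives a larger sum of squares.
shifted = sum_of_squares(heights, fev1, a + 0.1, b)
tilted = sum_of_squares(heights, fev1, a, b + 0.001)
print(a, b, best, shifted, tilted)
```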
In regression, does σ of FEV1 (Y) depend on height (X axis)?
No: assumed that SD is constant for all heights (i.e. spread of FEV1 about the mean FEV1).
In regression, is it true that the FEV1 of a given student depends on his height?
No - only that the mean of the population of FEV1 depends linearly on the other variable. Y axis given as mean; X is actual value. Consider the FEV1s for the population of students with a given height and assume that this mean varies linearly with height.
Three parameters to estimate in regression?
σ (SD about the line), α (intercept) and β (slope). Just as for a single variable can estimate the population mean μ and σ, here μ = α + βx, where α and β are population parameters and cannot be known. Instead, use a sample, fit the line y = a + bx and use a and b as sample estimates. Also need to estimate σ about the line using s. Intercept (a) is on the same scale as y so uses the same units. Slope (b) is in units of y per unit of x (e.g. litres per cm).
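A short sketch of how s, the estimate of σ about the line, could be computed once a and b are known. The data are invented, and the divisor n - 2 (one degree of freedom lost for each of a and b) is the standard textbook choice rather than something stated on the card.

```python
import math

# Hypothetical data, invented for illustration only.
heights = [160.0, 165.0, 170.0, 175.0, 180.0]  # cm
fev1 = [3.0, 3.4, 3.6, 4.1, 4.4]               # litres

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(fev1) / n
sxx = sum((x - x_bar) ** 2 for x in heights)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, fev1))

b = sxy / sxx          # estimate of beta, in litres per cm
a = y_bar - b * x_bar  # estimate of alpha, in litres

# Residuals: what is left of each FEV1 after the fitted line is subtracted.
residuals = [y - (a + b * x) for x, y in zip(heights, fev1)]

# s estimates sigma, the SD about the line; divisor n - 2 because two
# parameters (a and b) were estimated from the same sample.
s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
print(a, b, s)
```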
b and β?
Just as use SE to see if m is a good estimate of μ, can see if b is a good estimate of β. Again, use SE, but more complicated.
Hypothesis testing in regression?
Has a limited role; more about how good estimate of β is. The one hypothesis of interest is whether β = 0. This is because if β=0 then the mean of Y does not change with the x variable (i.e. no association).
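As a rough illustration of the test of β = 0, the sketch below computes the SE of the slope and the ratio b / SE(b) for invented data. The formula SE(b) = s / sqrt(Sxx) is the standard one for simple linear regression; the P value itself would come from comparing the ratio with a t distribution on n - 2 degrees of freedom, which is left to statistical software here.

```python
import math

# Hypothetical data, invented for illustration only.
heights = [160.0, 165.0, 170.0, 175.0, 180.0]  # cm
fev1 = [3.0, 3.4, 3.6, 4.1, 4.4]               # litres

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(fev1) / n
sxx = sum((x - x_bar) ** 2 for x in heights)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, fev1))

b = sxy / sxx
a = y_bar - b * x_bar

# SD about the line (divisor n - 2).
rss = sum((y - (a + b * x)) ** 2 for x, y in zip(heights, fev1))
s = math.sqrt(rss / (n - 2))

# Standard error of the slope, and the test statistic for beta = 0.
se_b = s / math.sqrt(sxx)
t_stat = b / se_b  # compare with a t distribution on n - 2 degrees of freedom
print(b, se_b, t_stat)
```

A large ratio, as here, corresponds to a small P value, i.e. evidence that mean FEV1 changes with height.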
Regression in Minitab?
Skip all but “the regression equation is”. Gives “FEV1” when should be mean FEV1. Also look at Predictor, Constant, Ht (x). Constant = intercept. Get SE of Ht listed under SE Coef, and get P value for Ht too. Next bit of output is S, the same as our s, i.e. estimate of SD (spread of FEV1 about the fitted line).
In summary, get estimated slope and intercept (under Coef), the SE of the slope, under SE Coef, the P value for β=0, and the SD about the line (s).
How can intercept be negative?
This is because the intercept is the mean FEV1 when height = 0, which would obviously not make sense. For this reason the intercept is only important for positioning the fitted line.
What does a P value of 0.005 mean for the hypothesis β=0?
Means that there is very strong evidence that FEV1 mean does depend on the height. Does not mean that FEV1 can be determined once height is known, just that mean FEV1 changes with height.
“Naming” regression?
Regression of y on x (seeing how mean y varies with x).
What does s account for in regression?
The variation left in FEV1 values AFTER height has been accounted for
Clinical applications of regression?
Some important variables might be very difficult or invasive to measure, so predicting them from other, related variables would be helpful. In practice, the natural variability often gives wide limits for the prediction.
Single value estimate of FEV1 for height of 180cm?
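Mean FEV1 = α + βh, but we must use the estimates a and b, so the estimated mean FEV1 for his height is a + 180b = 4.20 L, with a = -9.19 and b ≈ 0.07. Note that b is shown rounded: -9.19 + 180 × 0.07 is only 3.41, so the quoted 4.20 L presumably comes from the unrounded slope.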
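The arithmetic can be checked with a few lines of Python. The helper name `predicted_mean` is hypothetical, and the "implied" slope is only a back-calculation from the quoted 4.20 L, not a value taken from the Minitab output.

```python
def predicted_mean(a, b, x):
    """Estimated mean of y at a given x from the fitted line y = a + b*x."""
    return a + b * x


a = -9.19         # intercept from the card (litres)
b_rounded = 0.07  # slope as printed on the card (litres per cm)
h = 180           # height in cm

# Using the rounded slope gives about 3.41 L, not 4.20 L.
estimate_rounded = predicted_mean(a, b_rounded, h)

# Slope that would be needed to reproduce the quoted 4.20 L (back-calculated).
implied_b = (4.20 - a) / h  # about 0.0744, which rounds to 0.07

print(round(estimate_rounded, 2), round(implied_b, 4))
```

Because the slope is multiplied by 180, a rounding error in the third decimal place shifts the estimate by most of a litre, so the unrounded coefficient should be used for predictions.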
Intervals in regression?
Before, acknowledged uncertainty in m as an estimate of μ by computing an interval where we expect μ to lie. These intervals narrow as the size of the sample from which the interval is calculated increases (as the estimate of μ becomes more precise). The same holds in regression: estimating the mean using a + bh means that uncertainty in a and b makes the estimated mean FEV1 at height h (4.20 L) uncertain. This is more complicated, but intervals still get smaller as sample size increases.
Why are confidence intervals not appropriate for an individual's value in regression?
Because if estimating the FEV1 of one person of given height (rather than the mean FEV1 at that height), making the sample as large as possible will not make the interval indefinitely smaller: the variation between individuals of the same height (σ) will still remain. Therefore cannot use a formula that allows sample size to reduce the interval width to 0.
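A small sketch of the contrast, using the standard textbook standard errors for simple linear regression evaluated at the mean height (so the term for distance from the mean height drops out). The value of s and the function names are assumptions chosen for illustration.

```python
import math

s = 0.5  # hypothetical SD about the line, in litres


def se_mean_at_centre(s, n):
    """SE of the estimated MEAN of y at x = x_bar: shrinks to 0 as n grows."""
    return s * math.sqrt(1 / n)


def se_individual_at_centre(s, n):
    """SE for predicting ONE new individual at x = x_bar: never falls below s."""
    return s * math.sqrt(1 + 1 / n)


for n in (10, 100, 10000):
    print(n, se_mean_at_centre(s, n), se_individual_at_centre(s, n))
```

However large the sample, the second quantity stays above s, which is why the interval for an individual cannot be shrunk towards zero width.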