Hypothesis testing 3 Flashcards
How do you fit a regression line to data by the least squares method?
Equation of line: Y = a + bX, where:
a = Y-axis intercept
b = slope of line
Conditions of the ‘least squares method’:
- The line passes through the centre of the cluster of points, i.e. the mean point (x̄, ȳ).
- The sum of the distances, d, from the fitted line must be zero (i.e. ∑d = 0), where d = y1 - yL:
  - y1 = actual Y value of any datum point
  - yL = corresponding value of Y on the fitted line
- The sum of squares of the distances (∑d^2) must be as small as possible.
Slope of line (regression coefficient):
b = {∑ xy - (∑x∑y/n)}/{∑x^2 - (∑x)^2/n}
where:
n = number of pairs of observations
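The card's slope formula can be sketched in Python as below. This is a minimal illustration, not the lecture's own method: the data are hypothetical, the helper names `fit_line` and `residuals` are made up for this example, and the intercept formula a = ȳ - b·x̄ is an addition to the card (it follows from the line passing through the mean point). Running it shows that ∑d is zero up to floating-point rounding.

```python
# Minimal sketch: least-squares line Y = a + bX using the card's
# computational formula for the slope b. Data and helper names are hypothetical.

def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)  # number of pairs of observations
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # b = {∑xy - (∑x∑y/n)} / {∑x^2 - (∑x)^2/n}
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Assumption added for illustration: the line passes through (x̄, ȳ), so a = ȳ - b·x̄
    a = sum_y / n - b * sum_x / n
    return a, b

def residuals(xs, ys, a, b):
    """Distances d = y1 - yL between each actual Y and the fitted line."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(xs, ys)
d = residuals(xs, ys, a, b)
print(f"a = {a:.3f}, b = {b:.3f}")
print(f"sum d = {sum(d):.2e}, sum d^2 = {sum(v * v for v in d):.4f}")
```

Nudging b away from the fitted value and recomputing ∑d^2 gives a larger total, which is the "as small as possible" condition in action.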
What are the assumptions of linear regression?
Three assumptions:
- For each X there is a normal distribution of Y from which the sample values of Y are drawn at random.
- The normal distribution of Y corresponding to a specific X value has a mean that lies on a straight line, termed the population regression line.
- Deviations of the points from the fitted line are normally distributed with a mean of zero and constant variance.
How do you test the goodness-of-fit to the line (coefficient of determination)?
r = correlation coefficient
r^2 = coefficient of determination
- proportion of the variance of Y attributable to the linear regression on X
- provides an estimate of the strength of the relationship
A regression line commonly accounts for only a small amount of the variation in Y, leaving much of the variation to be explained by other variables.
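A minimal sketch computing r and r^2 from the usual computational sums of squares. The data are hypothetical and deliberately noisy, chosen so that the line explains only a modest share of the variation in Y; the function name `correlation_stats` is illustrative.

```python
import math

def correlation_stats(xs, ys):
    """Return (r, r^2) using computational sums of squares."""
    n = len(xs)  # number of pairs of observations
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
    return r, r * r                 # r^2 = coefficient of determination

# Hypothetical data with a lot of scatter around any straight line
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 1, 4, 2, 6, 3]
r, r2 = correlation_stats(xs, ys)
print(f"r = {r:.3f}, r^2 = {r2:.3f}")  # r^2 ≈ 0.163: about 16% of the variance in Y
```

Here roughly 84% of the variation in Y is left unexplained by X, which is the situation the card describes.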
How do you check the goodness-of-fit to the line (analysis of variance)?
Determines the statistical significance of the line rather than the strength of the relationship between the two variables; it is therefore a test of the null hypothesis that any observed relationship occurred by chance.
The total variation:
Yss = ∑ y^2 - (∑ y)^2/n where:
n = number of pairs of observations
Linear effect = {∑xy - (∑x)(∑y)/n}^2/{∑x^2 - (∑x)^2/n}
portion of variance accounted for by the line
Error Variance = Yss - Linear effect
deviation from the line
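Each sum of squares is divided by its degrees of freedom (1 for the linear effect, n - 2 for error) to give a mean square, and the two mean squares are compared using the variance ratio (F) test: F = linear-effect mean square / error mean square.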
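The ANOVA card can be sketched as follows, using hypothetical data. The degrees of freedom (1 for the linear effect, n - 2 for error) are the standard ones for a straight line with two fitted parameters; the resulting F would then be compared with F tables at (1, n - 2) degrees of freedom. The function name `regression_anova` is illustrative.

```python
def regression_anova(xs, ys):
    """Return (Yss, linear effect, error SS, F) for a simple linear regression."""
    n = len(xs)  # number of pairs of observations
    sum_x, sum_y = sum(xs), sum(ys)
    yss = sum(y * y for y in ys) - sum_y ** 2 / n            # total variation
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    sxx = sum(x * x for x in xs) - sum_x ** 2 / n
    linear = sxy ** 2 / sxx                                  # accounted for by the line
    error = yss - linear                                     # deviation from the line
    ms_linear = linear / 1                                   # 1 degree of freedom
    ms_error = error / (n - 2)                               # n - 2 degrees of freedom
    return yss, linear, error, ms_linear / ms_error

# Hypothetical data
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 1, 4, 2, 6, 3]
yss, linear, error, f_ratio = regression_anova(xs, ys)
print(f"Yss = {yss:.3f}, linear = {linear:.3f}, error = {error:.3f}, F = {f_ratio:.3f}")
```

With these data F is below 1, so there is no evidence against the null hypothesis; a large F relative to the tabulated critical value would lead to rejecting it.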
How do you check the goodness-of-fit to the line (t-test of the slope of the line)?
Ratio of slope to its standard error (t):
t = b / √{mean square error / [∑x^2 - (∑x)^2/n]}
where:
mean square error is taken from the analysis of variance table
n = number of pairs of observations
P value obtained from t tables with n - 2 degrees of freedom (the error degrees of freedom in the analysis of variance table)
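A sketch of the slope t-test on hypothetical data. It computes the mean square error as in the analysis of variance, divides the slope by its standard error √(MSE / Sxx), and reports n - 2 degrees of freedom. For a simple linear regression t^2 equals the ANOVA F ratio, so the two tests give the same conclusion. The function name `slope_t_test` is illustrative.

```python
import math

def slope_t_test(xs, ys):
    """Return (b, t, df) for the t-test of the slope of a least-squares line."""
    n = len(xs)  # number of pairs of observations
    sum_x, sum_y = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs) - sum_x ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    yss = sum(y * y for y in ys) - sum_y ** 2 / n
    b = sxy / sxx                                 # slope (regression coefficient)
    ms_error = (yss - sxy ** 2 / sxx) / (n - 2)   # mean square error, as in the ANOVA
    se_b = math.sqrt(ms_error / sxx)              # standard error of the slope
    return b, b / se_b, n - 2

# Hypothetical data
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 1, 4, 2, 6, 3]
b, t, df = slope_t_test(xs, ys)
print(f"b = {b:.3f}, t = {t:.3f}, df = {df}")
```

The P value would then be read from t tables (or obtained from a statistics library) at `df` degrees of freedom.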
What is chi square distribution?
A variable has a chi-square distribution if its distribution has the shape of a special type of right-skewed curve.
Properties of χ^2-curves:
- The total area under a χ^2-curve equals 1.
- A χ^2-curve starts at 0 on the horizontal axis and extends indefinitely to the right, approaching, but never touching, the horizontal axis.
- A χ^2-curve is right-skewed.
As the number of degrees of freedom becomes larger, χ^2-curves look increasingly like normal curves.
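The properties above can be checked by simulation. This sketch relies on a standard fact not stated on the card: a χ^2 variable with k degrees of freedom is the sum of k squared independent standard normal values. The sample size, seed, and helper names are arbitrary choices for illustration. Every draw is non-negative (the curve starts at 0), and the sample skewness shrinks as k grows, consistent with the curves looking more normal.

```python
import random

def chi_square_draws(k, size, rng):
    """Draw `size` values, each a sum of k squared standard normal values."""
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(size)]

def sample_skewness(values):
    """Moment coefficient of skewness (positive = right-skewed)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(1)  # fixed seed so the run is repeatable
minimums, skews = {}, {}
for k in (2, 10, 50):   # degrees of freedom
    draws = chi_square_draws(k, 10_000, rng)
    minimums[k] = min(draws)
    skews[k] = sample_skewness(draws)
    print(f"df = {k:2d}: min = {minimums[k]:.4f}, skewness = {skews[k]:.2f}")
```

The theoretical skewness is √(8/k), i.e. 2, about 0.89 and 0.4 for these k, so the printed values should land near those.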
What is the equation for the chi-square (χ^2) test statistic?
χ^2 = ∑ (fo - fe)^2 / fe
where:
fo = observed frequencies
fe = expected frequencies
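The formula can be applied directly, as in the sketch below. The die-roll counts are hypothetical, and the degrees of freedom (number of categories - 1) are the usual choice for a goodness-of-fit test when no parameters are estimated from the data; that rule is not on the card.

```python
def chi_square_statistic(observed, expected):
    """χ^2 = ∑ (fo - fe)^2 / fe over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Hypothetical example: 60 rolls of a die, 10 expected per face if the die is fair
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6
chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1  # assumption: goodness of fit with no estimated parameters
print(f"chi-square = {chi2:.2f} with {df} degrees of freedom")  # chi-square = 1.00 with 5 degrees of freedom
```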
Chi-square goodness-of-fit test: interpretation (see card)
See slides 11-13, lecture 9.