Lecture 6: p-values and confidence intervals Flashcards
What is a point estimate?
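A single value, calculated from the sample, that serves as the best estimate of an unknown population parameter (for example, the sample mean as an estimate of the population mean).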
Left-skewed distribution
a distribution that has a concentration of data on the upper end and the tail on the left
Skewness depends on where the tail is (so tail on left is left skewness)
What does correlation measure?
Correlation measures the degree of relationship between two or more variables
It looks at association
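As a rough illustration of measuring association between two variables, the sketch below computes the Pearson correlation coefficient by hand. The data values and the function name `pearson_r` are made up for this example and do not come from the lecture.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both spreads, in [-1, 1]."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with a strong positive association.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(xs, ys)
print(round(r, 3))
```

A value near +1 or -1 suggests a strong linear association; a value near 0 suggests little linear association.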
What is the goodness of fit of a model?
Goodness of fit describes how well a model fits a set of observations
What is standard error of regression?
The standard error of the regression (S), also known as the standard error of the estimate, represents the average distance that the observed values fall from the regression line. Conveniently, it tells you how wrong the regression model is on average using the units of the response variable
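A minimal sketch of how S could be computed for a one-predictor model, assuming ordinary least squares and the usual n - 2 degrees of freedom. The data and the helper names `fit_line` and `standard_error_of_regression` are hypothetical. Strictly, S is a root-mean-square of the residuals rather than a plain average distance, but it is read in the same way.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def standard_error_of_regression(xs, ys):
    """S = sqrt(sum of squared residuals / (n - 2)) for a straight-line fit."""
    intercept, slope = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sse = sum(r ** 2 for r in residuals)
    return math.sqrt(sse / (len(xs) - 2))

# Hypothetical data, not taken from the lecture.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
s = standard_error_of_regression(xs, ys)
print(round(s, 3))
```

Because S is in the units of the response variable, a value of about 0.19 here means the observed values typically sit about 0.19 units from the fitted line.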
R^2 (Goodness of Fit)
R^2 is a goodness-of-fit statistic.
Values: between 0 and 1.
Interpretation: The larger the better.
Meaning: Proportion of the outcome’s variability that the model explains.
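The "proportion of variability explained" reading can be sketched as 1 minus the ratio of unexplained variation to total variation. The observed and predicted values below are hypothetical; the predictions are assumed to come from a least-squares line fitted to the same data.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical outcome values and the model's predictions for them.
observed = [2.1, 3.9, 6.2, 7.8, 10.1]
predicted = [2.04, 4.03, 6.02, 8.01, 10.00]
r2 = r_squared(observed, predicted)
print(round(r2, 4))
```

An R^2 close to 1 means the model leaves very little of the outcome's variability unexplained.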
What are some questions to think about for generalising?
*Would I get the same coefficient if I built my model using different data?
*How likely am I to estimate the correct value?
confidence interval
statistical range, with a given probability, that takes random error into account
A confidence interval is a range of values, calculated from the sample, that is constructed so that a stated proportion of such intervals (the confidence level) would contain the population parameter if the sampling were repeated many times.
What is meant by interval width?
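Interval width = the distance between the lower and upper boundaries of the interval around your sample’s estimate
When should you be concerned about confidence intervals?
Be concerned about confidence intervals if they include a contradictory estimate, for example an interval that spans both negative and positive coefficients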
Match the confidence level with the interval width
99%
90%
10%
99% - very wide
90% - wide
10% - narrow
Interval width increases with confidence level: the higher the confidence level, the wider the interval
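The matching above can be checked with a small sketch that builds intervals at each confidence level around the same estimate. It assumes a normal approximation (estimate plus or minus z times the standard error); the estimate 0.32 is borrowed from the p-value cards, and the standard error of 0.10 is invented for illustration.

```python
from statistics import NormalDist

def confidence_interval(estimate, std_error, level):
    """Normal-approximation interval: estimate +/- z * standard error."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    return estimate - z * std_error, estimate + z * std_error

estimate, std_error = 0.32, 0.10  # std_error is a hypothetical value
widths = {}
for level in (0.10, 0.90, 0.99):
    low, high = confidence_interval(estimate, std_error, level)
    widths[level] = high - low
    print(f"{level:.0%}: ({low:.3f}, {high:.3f}) width {widths[level]:.3f}")
```

The estimate and its standard error stay fixed; only the confidence level changes, and the width grows with it.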
ANALOGY
Analogy: A bigger net is more likely to catch the fish you are looking for.
If we assume there is no association, what will you expect?
Assuming there is no association, you will expect:
- a zero coefficient is very likely
- tiny coefficients are somewhat likely
- and big coefficients are unlikely
What is p-value?
The probability of estimating a coefficient at least as big as the one observed, assuming the true coefficient is actually zero
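One way to make this definition concrete is a permutation test, which is a technique chosen here for illustration and not necessarily how the lecture computes p-values. Shuffling the outcome breaks any association, so the shuffled slopes show what coefficients look like when the true coefficient is zero. The data, seed and function names below are hypothetical.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def permutation_p_value(xs, ys, n_shuffles=2000, seed=0):
    """Share of shuffles whose |slope| is at least the observed |slope|."""
    rng = random.Random(seed)
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # shuffling removes any real association
        if abs(slope(xs, shuffled)) >= observed:
            hits += 1
    # Add 1 to both counts so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)

# Hypothetical data built to have a slope close to 0.32.
xs = list(range(12))
ys = [0.32 * x + 0.1 * ((-1) ** x) for x in xs]
p_value = permutation_p_value(xs, ys)
print(p_value < 0.05)
```

Almost no shuffled data set produces a slope as large as the observed one, so the p-value is small.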
Small p-value
1. When we assume the true coefficient is zero, the probability of sampling an estimate as large as 0.32 is small, i.e. our estimate is unlikely.
2. But we know our sample estimate is 0.32.
3. Therefore, we concede that our assumption is probably wrong.
4. We conclude that an association is likely.
Large p-value
1. When we assume the true coefficient is zero, the probability of sampling an estimate as large as 0.32 is high, i.e. our estimate is likely.
2. And we know our estimated coefficient is 0.32.
3. Therefore, we concede that our assumption could still be right.
4. We conclude that an association is not likely.
Difference between small p-value and large p-value
Small p-value:
*Zero-assumption is probably wrong.
*An association is likely.
Large p-value:
*Zero-assumption is probably right.
*An association is unlikely
Example:
The p-value is so small that it is reported as <0.001
Typically, a p-value <0.05 is considered statistically significant. This threshold is called alpha.
GRL
What do neither confidence intervals nor p-values tell us?
Neither confidence intervals nor p-values tell you anything about the effect of the exposure or how meaningful your conclusion is.
What is the order for dealing with data?
O
G
T
A
Observe - look at data
Guess - build a model to estimate the relationship
Test - how well does the model fit (R^2 and S)
Assess - confidence intervals etc
R^2 summary
R^2
large = good
small = bad
Confidence interval summary
If a confidence interval includes a contradiction = bad
P-value summary
Small = good
Large = bad.
Relationship between sample size and confidence interval width
The larger your sample, the more sure you can be that their answers truly reflect the population. So, for a given confidence level, the larger your sample size, the narrower your confidence interval.
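A short sketch of this effect for the interval around a sample mean, assuming a normal approximation and a standard error of sd / sqrt(n). The standard deviation of 10 and the sample sizes are hypothetical.

```python
import math
from statistics import NormalDist

def interval_width(std_dev, n, level=0.95):
    """Width of a normal-approximation interval for a mean: 2 * z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    return 2 * z * std_dev / math.sqrt(n)

# Same confidence level and spread, increasing sample size.
sample_sizes = (25, 100, 400)
widths = [interval_width(10.0, n) for n in sample_sizes]
for n, w in zip(sample_sizes, widths):
    print(n, round(w, 2))
```

Under these assumptions, quadrupling the sample size halves the interval width.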
What does it mean if the confidence interval for a coefficient encompasses zero?
Zero is the null value of the coefficient (it means no association). If a 95% confidence interval includes zero, then the coefficient is not statistically significantly different from zero at the 5% level.
What does a regression coefficient mean?
Regression coefficients are estimates of the unknown population parameters and describe the relationship between a predictor variable and the response.
What does it mean if the P-value is less than your significance level?
If the P value is less than your significance (alpha) level, the result is statistically significant, so you reject the null hypothesis
(GRL)
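The decision rule and the "interval encompasses zero" check can be sketched together. This assumes a normal approximation for the coefficient (regression software typically uses a t distribution instead); the estimate 0.32 comes from the earlier cards, while the standard error of 0.10 and the function names are hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(estimate, std_error):
    """P(|Z| >= |estimate / std_error|) when the true coefficient is zero."""
    z = abs(estimate / std_error)
    return 2 * (1 - NormalDist().cdf(z))

def is_significant(p_value, alpha=0.05):
    """Reject the null hypothesis when the p-value is below alpha."""
    return p_value < alpha

def interval_includes_zero(estimate, std_error, level=0.95):
    """True when the normal-approximation interval contains zero."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    return estimate - z * std_error <= 0 <= estimate + z * std_error

estimate, std_error = 0.32, 0.10  # std_error is a hypothetical value
p = two_sided_p_value(estimate, std_error)
print(is_significant(p), interval_includes_zero(estimate, std_error))
```

With these numbers the p-value falls below alpha and the 95% interval excludes zero, so both checks point to the same conclusion.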