L4: Causality and statistical interpretation part 2 Flashcards
summarising categorical data?
Use either the number of events (e.g. number of people) or the proportion of events.
univariate analysis?
Univariate Analysis
Univariate analysis refers to analyzing a single variable to understand its distribution, central tendency, and variability.
Continuous Variable: This is a variable that can take on a wide range of values (e.g., height, weight, age). For example, if you have a dataset of people’s ages, the mean would be the average age, and the standard deviation would measure how spread out the ages are around the mean.
Mean: The average value of the variable.
Standard deviation: A measure of how spread out the values are from the mean.
Categorical Variable: This type of variable falls into categories or groups (e.g., gender, blood type, smoking status). For categorical variables, you analyze the count (how many people are in each category) and proportions (what percentage of people are in each category).
Count: The total number of people in each category.
Proportion: The percentage of people in each category.
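The summaries above can be sketched in a few lines of Python. This is only an illustration: the ages and smoking categories are invented, and the variable names are assumptions, not anything from the lecture.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical data, invented purely for illustration
ages = [34, 45, 29, 52, 40]
smoking_status = ["smoker", "non-smoker", "non-smoker", "ex-smoker", "non-smoker"]

# Continuous variable: mean and (sample) standard deviation
mean_age = mean(ages)
sd_age = stdev(ages)

# Categorical variable: count and proportion in each category
counts = Counter(smoking_status)
proportions = {category: n / len(smoking_status) for category, n in counts.items()}

print(f"Age: mean = {mean_age:.1f}, SD = {sd_age:.1f}")
for category, n in counts.items():
    print(f"{category}: n = {n} ({proportions[category]:.0%})")
```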
comparing distribution?
T-test:
The t-test is used when you are comparing the distribution of a continuous variable (e.g., weight, age) between two groups (a dichotomous variable, which means it has two categories, e.g., male vs. female).
Example: Comparing the average age (continuous variable) between male and female (dichotomous variable).
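As a rough sketch of what the t-test computes, the snippet below works out a t statistic by hand for two invented groups. It assumes Welch's version of the test (which does not require equal variances); the lecture does not say which version is meant. A statistics package would then convert the statistic to a p-value using the t distribution.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical ages in two groups, invented for illustration
ages_male = [30, 35, 40, 45, 50]
ages_female = [28, 32, 36, 40, 44]

def welch_t(group_a, group_b):
    """Return Welch's t statistic for the difference between two group means."""
    # Squared standard error of the difference in means
    se_squared = variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    return (mean(group_a) - mean(group_b)) / sqrt(se_squared)

t_stat = welch_t(ages_male, ages_female)
print(f"t = {t_stat:.2f}")
```

The further the t statistic is from 0, the less compatible the data are with the null hypothesis of no difference in means.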
ANOVA:
ANOVA (Analysis of Variance) is used to compare the distribution of a continuous variable across more than two groups (i.e., a categorical variable with more than two categories).
Example: Comparing the average weight (continuous variable) between different ethnic groups (categorical variable with more than two groups).
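A minimal sketch of the idea behind one-way ANOVA, using three invented groups: the F statistic compares how much the group means vary between groups with how much individuals vary within groups. The group names and weights are hypothetical.

```python
from statistics import mean

# Hypothetical weights (kg) in three groups, invented for illustration
groups = {
    "group_a": [60, 62, 64],
    "group_b": [70, 72, 74],
    "group_c": [65, 67, 69],
}

def one_way_anova_f(samples):
    """Return the one-way ANOVA F statistic for a list of groups."""
    all_values = [x for sample in samples for x in sample]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Variation of individual values around their own group mean
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    return (ss_between / df_between) / (ss_within / df_within)

f_stat = one_way_anova_f(list(groups.values()))
print(f"F = {f_stat:.2f}")
```

A large F means the differences between group means are large relative to the spread within groups; a package would turn F into a p-value using the F distribution.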
Chi-squared test:
The Chi-squared test is used when you’re comparing two categorical variables to see if there is a relationship between them.
Example: Comparing smoking status (e.g., smoker vs. non-smoker) and gender (male vs. female).
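The chi-squared test compares the observed counts with the counts expected if the two variables were unrelated. Below is a hand-rolled sketch for a 2x2 table, without continuity correction; the counts are invented, and the function name is an assumption for illustration only.

```python
from math import erfc, sqrt

# Hypothetical 2x2 table of counts, invented for illustration
#            smoker  non-smoker
observed = [[30, 70],   # male
            [20, 80]]   # female

def chi_squared_2x2(table):
    """Return (chi-squared statistic, p-value) for a 2x2 table, no continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if the two variables were independent
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = chi_squared_2x2(observed)
print(f"chi-squared = {chi2:.2f}, p = {p_value:.3f}")
```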
- P-value and Observational Studies
In observational studies, Table 1 typically presents descriptive statistics (e.g., counts, means, and proportions). P-values are not always required in descriptive tables (like Table 1) because they are more relevant when testing hypotheses or comparing groups.
P-values are calculated when you conduct hypothesis tests (like the t-test, ANOVA, or chi-squared test). They show whether any observed differences are statistically significant.
p values and hypothesis?
P-values need to be shown in a randomised controlled trial.
p > 0.05 = no statistically significant difference (fail to reject the null hypothesis).
p < 0.05 = reject the null hypothesis and accept the alternative.
T-test
Null hypothesis: no difference in mean age between those who continued on the exercise pathway and those who did not.
Alternative hypothesis: difference in mean age between those who continued on the exercise pathway and those who did not.
🡪 Compare the mean of a continuous variable between two groups (i.e., a dichotomous variable). If the categorical variable has more than 2 categories, then you need to run an ANOVA test.
Chi-squared
Null hypothesis: no difference in distribution of a categorical variable X between those who were exposed and those who were not (i.e. another categorical variable).
Alternative hypothesis: difference in distribution of a categorical variable X between those who were exposed and those who were not (i.e. another categorical variable).
correlation coefficient
The association between 2 continuous variables
Continuous vs continuous
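Do not use the word correlation when contrasting with causation: say "causation vs association", not "causation vs correlation". Correlation refers specifically to the association between two continuous variables.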
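A small sketch of a correlation coefficient between two continuous variables. It assumes Pearson's coefficient (the card does not name a specific one), and the paired values are invented.

```python
from math import sqrt
from statistics import mean

# Hypothetical paired measurements, invented for illustration
ages = [30, 40, 50, 60, 70]
systolic_bp = [118, 125, 130, 138, 144]

def pearson_r(x, y):
    """Return Pearson's correlation coefficient between two equal-length lists."""
    mean_x, mean_y = mean(x), mean(y)
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

r = pearson_r(ages, systolic_bp)
print(f"r = {r:.3f}")
```

The coefficient lies between -1 and 1; values near 0 suggest little linear association, and a strong correlation still says nothing about causation.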
Confounders can be dealt with at the level of study design (last lecture): matching, randomisation, etc.
regression analysis?
Regression Analysis Basics:
Regression analysis is used to predict an outcome (dependent variable) based on one or more exposure variables (independent variables). The idea is to find a relationship or pattern in the data that can help you predict the outcome.
For example, predicting blood pressure (outcome) based on age, weight, or exercise habits (exposure variables).
Univariate (Crude) Regression:
When you have only one exposure variable in your model, it’s called univariate regression or crude regression.
Univariate means you’re looking at one factor (exposure) to predict or explain the outcome, without considering any other potential confounders or additional variables.
This kind of model is simple and doesn’t account for other variables that might influence the relationship between your exposure and outcome.
If you add more exposure variables to the model, you move into multivariable regression (often loosely called multivariate regression), which is adjusted regression. This allows you to account for multiple variables at once and see how each one independently affects the outcome while controlling for the others.
For example, when predicting blood pressure, you could look at age, weight, and exercise all at once to see how each contributes to blood pressure, adjusting for the others.
dependent variable= outcome
independent variable= exposure and potential confounders
If a table in the exam shows a crude risk ratio and an adjusted risk ratio that differ, is that because the adjusted estimate takes the confounders into account? In the example on slide 48, walking speed goes up by 0.03%?
different types of outcome?
- Continuous Outcome 🡪 Linear Regression
Continuous outcome = A variable that can take on any value within a range, e.g., weight, blood pressure, cholesterol level.
Linear regression is used when the outcome variable is continuous.
Equation: Outcome = Intercept + (Slope × Exposure Variable)
Interpretation: The coefficient (slope) tells you the change in the outcome for each unit increase in the exposure.
Example: If you’re studying the effect of physical activity (exposure) on blood pressure (outcome):
Blood Pressure = Intercept + (Slope × Physical Activity)
The slope represents the change in blood pressure associated with a 1-unit change in physical activity.
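The intercept and slope in the equation above can be estimated by least squares. The sketch below does this by hand for one exposure; the activity hours and blood pressure readings are invented, and `fit_line` is a hypothetical helper name.

```python
from statistics import mean

# Hypothetical data: weekly hours of physical activity and systolic blood pressure
activity_hours = [0, 1, 2, 3, 4, 5]
blood_pressure = [140, 138, 135, 133, 131, 128]

def fit_line(x, y):
    """Return the least-squares (intercept, slope) for Outcome = Intercept + Slope * Exposure."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(activity_hours, blood_pressure)
print(f"Blood Pressure = {intercept:.1f} + ({slope:.2f} x Physical Activity)")
```

Here a negative slope would be read as: each extra hour of activity is associated with that many units lower blood pressure, in this made-up dataset.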
- Dichotomous Outcome (e.g., Case or Control in a Case-Control Study) 🡪 Logistic Regression
Dichotomous outcome = A variable that has two categories or outcomes, such as yes/no, disease/no disease, or case/control.
Logistic regression is used when the outcome is binary or dichotomous.
Odds Ratio (OR) is the key result you get from logistic regression. It measures the odds of the outcome happening in one group compared to another group.
If you’re using case-control studies, logistic regression is typically used.
Example: If you’re studying the effect of smoking on the likelihood of lung cancer:
log(Odds of Lung Cancer) = Intercept + (Slope × Smoking Status)
The odds ratio, exp(Slope), tells you how many times higher the odds of lung cancer are in people who smoke compared with non-smokers.
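For a single yes/no exposure, the crude odds ratio can be read straight off a 2x2 table, and a logistic regression with only that exposure would give a slope equal to the log of this odds ratio. A minimal sketch with invented case-control counts:

```python
from math import log

# Hypothetical case-control counts, invented for illustration
smokers_cases, smokers_controls = 60, 30
non_smokers_cases, non_smokers_controls = 40, 70

# Odds of being a case within each exposure group
odds_smokers = smokers_cases / smokers_controls              # 2.0
odds_non_smokers = non_smokers_cases / non_smokers_controls  # about 0.57

# Odds ratio: odds in the exposed divided by odds in the unexposed
odds_ratio = odds_smokers / odds_non_smokers
# The slope a crude logistic regression would report for smoking status
log_odds_ratio = log(odds_ratio)

print(f"OR = {odds_ratio:.2f}, log(OR) = {log_odds_ratio:.2f}")
```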
- Time to Event Analysis 🡪 Cox Proportional Hazards Regression
Time to event outcome = An outcome that involves the time until a specific event occurs, e.g., time until death, time until disease progression, etc.
Cox proportional hazards regression is used for survival analysis or time to event analysis.
This type of regression considers both the time to the event and the hazard (rate) of the event occurring.
Example: If you’re studying survival time after cancer treatment:
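log(Hazard) = log(Baseline Hazard) + (Slope × Treatment Type)
Cox regression models the hazard of the event over follow-up time rather than the survival time itself; the hazard ratio is exp(Slope).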
The hazard ratio (HR) obtained from the Cox regression tells you how the risk of the event (e.g., death) changes based on the exposure (e.g., treatment type).
look at
Table 1 - descriptive statistics
Slide 53: looked at pre-menopausal and post-menopausal women separately because of effect modifiers?
Slide 54: adjusted odds ratios. They are adjusted because the authors are trying to deal with confounding; these are the variables they considered to be confounders, as they are associated with both the exposure and the outcome and are not an intermediate on the causal pathway. Give this answer in the exam and essay.
Say whether you agree that they are confounders, whether they fit these criteria, and which specific confounders were accounted for.
From the table on slide 54: interpret the odds ratio with its 95% confidence interval. Look at the direction (below or above 1) and whether the interval includes 1.
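To practise reading an odds ratio with its 95% confidence interval, the sketch below computes both from a 2x2 table and states the direction. It assumes the standard log-based (Woolf) interval for a crude odds ratio; the counts are invented and the function names are hypothetical. Adjusted odds ratios in a paper come from a regression model, but they are read the same way.

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Return (OR, lower, upper) for a 2x2 table.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR), Woolf's method
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

def interpret(odds_ratio, lower, upper):
    """Describe the direction and whether the 95% CI includes 1."""
    direction = "higher" if odds_ratio > 1 else "lower"
    includes_one = lower <= 1 <= upper
    verdict = "includes 1 (not statistically significant)" if includes_one \
        else "excludes 1 (statistically significant)"
    return f"OR {odds_ratio:.2f} ({lower:.2f} to {upper:.2f}): {direction} odds in the exposed; CI {verdict}"

# Hypothetical counts, invented for illustration
or_value, ci_lower, ci_upper = odds_ratio_ci(60, 30, 40, 70)
print(interpret(or_value, ci_lower, ci_upper))
```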
Slide 57: start with the crude (univariate) analysis.
Not dealing yet with any potential confounders.
(watch lecture capture)
18-month lag time: the clock started 18 months after. This is to prevent reverse causation. Everyone who dies from prostate cancer is out, so the effect of physical activity can be tested.
Potential for information bias so there is a variety of measurements ??
Watch the slides: interpreting tables is in the exam.
Even when adjusting for confounders, always say that residual confounding may remain.