Statistics Flashcards
Categorical data
Data that is finite and countable
Chi squared
Odds ratios
Continuous data
Data that can take any value within a range
Eg measurements
T test
Provide mean difference with confidence intervals
Reference ranges
Used to describe the variations of a measurement for a defined population
Taken from standard deviation
Shows consistency of results
And whether it failed in some people
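A minimal sketch of how a 95% reference range can be taken from the standard deviation, assuming the measurement is roughly normally distributed so that mean +/- 1.96 SD covers about 95% of the population. The function name and the sample values are made up for illustration.

```python
import statistics

def reference_range(values, z=1.96):
    """Return (lower, upper) limits as mean +/- z standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return mean - z * sd, mean + z * sd

# Hypothetical measurements from a defined population
measurements = [4, 5, 6, 7, 8]
lower, upper = reference_range(measurements)
print(f"95% reference range: {lower:.2f} to {upper:.2f}")
```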
Confidence intervals
Use 95%
Means we are 95% confident that the true value lies within the interval
Measure reliability
Hypothesis testing
Whether the results happened by chance or whether the association is real
Start with a null hypothesis
Compare actual results with expected results
Work out the p value
P values
The probability of seeing an association at least as strong as the one observed if the null hypothesis were true (ie by chance alone)
P < 0.05 -> reject null hypothesis
Chi squared test
For categorical data that isn’t paired
1) compute expected numbers under null hypothesis
Expected count=row total x column total/ overall total
2) calculate difference between observed and expected values
X2 = sum of (observed-expected)^2/ expected
3) look up the value on the chi squared table for 1 degree of freedom (2x2 table)
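The three steps above can be sketched for a 2x2 table as follows. This is an illustration only: the counts are invented, no continuity correction is applied, and the p value uses the closed form that holds for 1 degree of freedom instead of a printed table.

```python
import math

def chi_squared_2x2(table):
    """Chi squared statistic and p value for a 2x2 table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Step 1: expected count = row total x column total / overall total
            expected = row_totals[i] * col_totals[j] / total
            # Step 2: add (observed - expected)^2 / expected for this cell
            x2 += (observed - expected) ** 2 / expected
    # Step 3: upper tail of the chi squared distribution, valid for 1 df only
    p_value = math.erfc(math.sqrt(x2 / 2))
    return x2, p_value

# Hypothetical counts: rows = exposed/unexposed, columns = outcome yes/no
observed = [[20, 30], [30, 20]]
x2, p = chi_squared_2x2(observed)
print(f"X2 = {x2:.2f}, p = {p:.4f}")
```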
Independent t test
Independent continuous data that is normally distributed
1) difference between means
2) calculate standard error
3) T= difference/SE
4) find p value on t distribution on N-2 degrees of freedom
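A rough sketch of steps 1 to 3, assuming both groups share the same variance (the pooled standard error). The data are made up, and the p value would still be read from a t distribution with N-2 degrees of freedom.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Return (t value, degrees of freedom) for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    # Step 1: difference between means
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    # Step 2: standard error from the pooled variance (equal variances assumed)
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    # Step 3: T = difference / SE, with N - 2 degrees of freedom
    return diff / se, n_a + n_b - 2

# Hypothetical measurements in two unrelated groups
t_value, df = independent_t([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])
print(f"t = {t_value:.2f} on {df} degrees of freedom")
```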
Paired samples t test
Paired continuous data that is normally distributed
Difference between means over standard error
Use n-1 degrees of freedom for finding p value
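A short sketch of the paired calculation, working on the within-pair differences. The before/after values are invented for illustration.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t value, degrees of freedom) for paired measurements."""
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    mean_diff = statistics.mean(differences)
    # Standard error of the mean difference
    se = statistics.stdev(differences) / math.sqrt(n)
    return mean_diff / se, n - 1

# Hypothetical before and after measurements on the same four people
t_value, df = paired_t([10, 12, 14, 16], [11, 14, 15, 18])
print(f"t = {t_value:.2f} on {df} degrees of freedom")
```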
Data that isn’t normally distributed
Try log transforming
Use a non-parametric test-> Mann-Whitney
Correlation
Mutual relation of two or more variables
Doesn’t show causation only association
Regression
Measure of relation between the mean values
Used to determine the strength of association
Assumes a one way causal effect
Finds the line of best fit
R= degree of linear relationship
R2= % of variation explained by exposure
Prevalence
Number of people with disease/total pop
Incidence risk
Number of new cases/total population at risk
Incidence rate
Number of new cases/person time at risk
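The three frequency measures differ only in what goes in the numerator and denominator, as this small sketch shows. All numbers are hypothetical.

```python
def prevalence(existing_cases, total_population):
    """Proportion of the population with the disease at one point in time."""
    return existing_cases / total_population

def incidence_risk(new_cases, population_at_risk):
    """Proportion of people at risk who become new cases during the period."""
    return new_cases / population_at_risk

def incidence_rate(new_cases, person_time_at_risk):
    """New cases per unit of person time (eg per person year)."""
    return new_cases / person_time_at_risk

# Hypothetical town: 1000 people, 50 already have the disease,
# 10 new cases arise among the other 950 over 2000 person years of follow up
print(prevalence(50, 1000))
print(incidence_risk(10, 950))
print(incidence_rate(10, 2000))
```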
Ecological fallacy
Can’t assume the relationship we see at population level can be directly transferred to the individual level
Relative risk ratio
The risk of an outcome in an exposed group is compared with the risk in a baseline (unexposed) group
Risk ratio= risk of event in exposed/risk of event in unexposed
=1 -> no difference
>1 -> exposed more likely to have outcome
<1 -> exposed less likely to have outcome
Categorical data
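A small sketch of the risk ratio calculation from counts. The function and argument names are illustrative and the counts are invented.

```python
def risk_ratio(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Risk of the event in the exposed divided by risk in the unexposed."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    return risk_exposed / risk_unexposed

# Hypothetical cohort: 30 of 100 exposed and 15 of 100 unexposed have the outcome
rr = risk_ratio(30, 100, 15, 100)
print(f"Risk ratio = {rr:.2f}")  # above 1, so the outcome is more likely in the exposed
```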
Odds ratio
How likely an event is to happen compared to how likely the event won’t happen
Odds ratio= odds in exposed/odds in unexposed
Must be used in case control studies
Categorical data
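A matching sketch for the odds ratio. Note that odds divide events by non-events, not by the group total. Names and counts are made up for illustration.

```python
def odds_ratio(exposed_events, exposed_non_events,
               unexposed_events, unexposed_non_events):
    """Odds of the event in the exposed divided by odds in the unexposed."""
    odds_exposed = exposed_events / exposed_non_events
    odds_unexposed = unexposed_events / unexposed_non_events
    return odds_exposed / odds_unexposed

# Hypothetical counts: 30 events and 70 non-events in the exposed,
# 15 events and 85 non-events in the unexposed
print(f"Odds ratio = {odds_ratio(30, 70, 15, 85):.2f}")
```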
Effect size in continuous data
Mean difference
=0 -> no difference
>0 -> outcome is higher in exposed
Confidence intervals
For measures of effect
Require a % confidence
Estimate of size is in the middle
If the interval crosses the no-difference value (0 for a mean difference, 1 for a ratio) the result is not statistically significant, as the true effect could lie in either direction
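A rough sketch of a 95% confidence interval for a mean difference, assuming samples large enough for the normal multiplier 1.96 to be reasonable (small samples would need a t distribution multiplier). The data are invented.

```python
import math
import statistics

def mean_difference_ci(group_a, group_b, z=1.96):
    """Return (estimate, lower, upper) for the difference in means."""
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    # The estimate sits in the middle of the interval
    return diff, diff - z * se, diff + z * se

# Hypothetical outcome values in exposed and unexposed groups
estimate, lower, upper = mean_difference_ci([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])
crosses_zero = lower <= 0 <= upper
print(f"Mean difference {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
print("Crosses 0:", crosses_zero)
```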
Paired data
Difference of the responses between a pair
Eg before and after, wives and husbands
When we know in advance that observations in one data set are directly related to those in another data set
Independent data
Responses of one treatment group compared to another
Two unrelated sets of units are measured
Univariate analysis
Descriptive
Summarises data one variable at a time
Bivariate analysis
Comparison of 2 groups -> relationship between them
Correlation and measure of effect tests
Regression coefficient
Y= a + bx
Regression coefficients
-> a= y intercept when x=0
-> b= gradient
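A minimal sketch of finding a and b by ordinary least squares, which is one common way to get the line of best fit. The data points are made up.

```python
import statistics

def fit_line(x, y):
    """Return (a, b) for the least squares line Y = a + bx."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # gradient
    a = mean_y - b * mean_x  # y intercept when x = 0
    return a, b

# Hypothetical exposure (x) and outcome (y) values
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"Y = {a:.1f} + {b:.1f}x")
```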
B coefficient
Gradient
An estimate of how much, on average Y increases/ decreases for each unit increase in x
Positive b-> outcome increases as exposure increases-> positive correlation
Negative B-> opposite
B= 0 -> outcome and exposure not related
T test for b coefficient
Test of the relationship between the dependent variable and a specific independent variable
T value= b coefficient/ std error of b
Use to find p value
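A sketch of the t value for b in a simple (one variable) linear regression, assuming the usual least squares standard error. The p value would then come from a t distribution with n-2 degrees of freedom. Data are invented.

```python
import math
import statistics

def slope_t_value(x, y):
    """Return (b, standard error of b, t value) for Y = a + bx."""
    n = len(x)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual variance uses n - 2 because a and b were both estimated
    residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(residual_ss / (n - 2) / sxx)
    return b, se_b, b / se_b

# Hypothetical data
b, se_b, t_value = slope_t_value([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"b = {b:.2f}, SE = {se_b:.3f}, t = {t_value:.2f}")
```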
Correlation r values
Measures the degree of linear association between two variables
0-0.3 -> weak positive
0.3-0.5 -> moderate
>0.5 -> strong
Still need p values
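A small sketch of computing Pearson's r by hand and labelling its strength with the cut-offs on this card. The labelling helper is hypothetical, applies the cut-offs to the size of r regardless of sign, and the data are made up.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def strength(r):
    """Label the size of r using the flashcard cut-offs (illustrative helper)."""
    size = abs(r)
    if size < 0.3:
        return "weak"
    if size <= 0.5:
        return "moderate"
    return "strong"

r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"r = {r:.3f} ({strength(r)}), r squared = {r ** 2:.2f}")
```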
Analysis of variance
Extension to t test
Continuous data
Compares means in > 2 groups, so used when one outcome is measured across several groups
Eg demographic analysis -> how heterogeneous is your group
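A rough sketch of the one way analysis of variance F statistic, which compares variation between group means with variation within groups. The groups are invented, and the p value would be read from an F distribution with the two degrees of freedom returned.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df between, df within) for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    # Variation of individual values around their own group mean
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k

# Hypothetical continuous outcome in three groups
f_value, df_between, df_within = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F = {f_value:.2f} on ({df_between}, {df_within}) degrees of freedom")
```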
Multiple logistic regression
Regression for binary outcomes eg yes/no (the counterpart of linear regression)
Single outcome and more than one independent variable
Still gives you regression coefficients
Used to predict probabilities of different possible outcomes of categorically distributed dependent variable given a set of independent variables
Multiple linear regression
Continuous data
Two or more explanatory variables and one response variable
Type 1 error
False rejection of a null hypothesis -> find an effect that isn’t there
Often the result of excessive statistical testing
Decrease the risk by lowering the significance level (eg from 0.05 to 0.01)
Avoid multiple significance testing or multiple sub group analysis
Defining the hypothesis with primary and secondary outcomes reduces type 1 error
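A quick sketch of why excessive testing inflates type 1 error, assuming the tests are independent and each uses the same significance level. The function name is illustrative.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

# With a 0.05 significance level the chance of a spurious finding grows quickly
for n_tests in (1, 5, 10, 20):
    rate = familywise_error_rate(0.05, n_tests)
    print(f"{n_tests} tests -> {rate:.2f}")
```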
Type 2 error
Failure to reject a false null hypothesis -> miss an effect that is there
Not enough power to find significant difference-> sample size too small
Need at least 80% power