All stats Flashcards
Name the 2 broad categories data can be split into
- Categorical
- Quantitative
What can categorical data be split into?
- Binary
- Nominal
- Ordinal
What can quantitative data be split into?
- Discrete
- Continuous
What is binary data?
Data split into 2 categories
Give an example of binary data
Success/failure
Yes/No
What is nominal data?
Data split into more than 2 categories that have no natural order
Give an example of nominal data
Eye colour
Hair colour
Hair type
What is ordinal data?
Categorical data whose categories have a natural order
Give an example of ordinal data
Happiness rating on a scale of 1-10
Customer service rating of 1-5
What is discrete data?
Numerical data that can only take distinct, countable values (usually whole numbers)
Give examples of discrete data
- Number of kids
- Movie rating in stars
What is continuous data?
Numerical data that can take any value within a range (it is measured rather than counted)
Give examples of continuous data
Height
Time
Weight
Name the best way to represent categorical data
In a bar chart
Name the best way to represent continuous data
Histogram or box plot
Define skewness
Skewness is a measure of the asymmetry of a probability distribution around its mean
Name the 3 ways to describe skewness
- Left skew
- Symmetrical
- Right skew
Describe the relationship between median and mean in a data set that is left skewed
Mean < median
Describe the relationship between median and mean in a data set that is right skewed
Mean > median
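A minimal sketch of the two skewness cards above, using only Python's standard library. The numbers are made up for illustration: one list with a long right tail and its mirror image with a long left tail.

```python
from statistics import mean, median

# Hypothetical right-skewed data: most values are small, one is very large.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
# Mirror image of the same values, which gives a left skew.
left_skewed = [-x for x in right_skewed]

# The long tail drags the mean towards it, while the median barely moves.
print("right skew: mean", mean(right_skewed), "median", median(right_skewed))
print("left skew:  mean", mean(left_skewed), "median", median(left_skewed))
```

In the right-skewed list the mean comes out above the median, and in the left-skewed list it comes out below.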
What is central tendency?
Measures that describe the centre (typical value) of a data set
Give examples of central tendency measures
Mean
Median
Mode
What are variation measures?
Measures of spread or variability
Give examples of variation measures
- Variance
- Standard deviation
What is the standard deviation
A measure of the average scatter around the mean
The greater the spread of the data, the greater the SD
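A short illustration of the central tendency and variation cards above, assuming two made-up data sets that share the same mean but differ in spread. `pvariance` and `pstdev` treat the data as a whole population (divide by n); `variance` and `stdev` in the same module are the sample versions (divide by n - 1).

```python
from statistics import mean, median, mode, pvariance, pstdev

# Hypothetical data: same centre, different spread.
narrow = [9, 10, 10, 10, 11]
wide = [2, 6, 10, 14, 18]

# Measures of central tendency.
print("narrow: mean", mean(narrow), "median", median(narrow), "mode", mode(narrow))
print("wide:   mean", mean(wide), "median", median(wide))

# Measures of variation: the standard deviation is the square root of the variance.
print("narrow: variance", pvariance(narrow), "SD", pstdev(narrow))
print("wide:   variance", pvariance(wide), "SD", pstdev(wide))
```

Both lists have a mean of 10, but the more spread out list has the larger variance and SD.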
What is normal distribution used to describe?
Used to describe continuous data that forms a bell shaped symmetrical curve
What is a key characteristic of normally distributed data
Mean, median and mode are all equal
What symbol do we use to represent the mean?
μ
What symbol do we use to represent the SD?
σ
Give examples of data that could be normally distributed
Height
Age
Weight
Bone density
Exam scores
Blood pressure (BP)
How do we check for normality
- Look at the histogram: does it appear bell shaped?
- Are the mean, median and mode similar?
- Do about two thirds (68%) of the data lie within 1 SD of the mean?
- Run numerical tests of normality
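The informal checks in the list above can be sketched as follows. This is only an illustration on simulated data (hypothetical heights with mean 170 and SD 10); it covers the mean versus median check and the "about two thirds within 1 SD" check, not the formal numerical tests, which need a statistics package.

```python
import random
from statistics import mean, median, stdev

# Simulated, hypothetical data drawn from a normal distribution.
rng = random.Random(42)
data = [rng.gauss(170, 10) for _ in range(2000)]

m = mean(data)
s = stdev(data)

# Check 1: for normal data the mean and median should be close.
print("mean", round(m, 2), "median", round(median(data), 2))

# Check 2: roughly 68% of the values should lie within 1 SD of the mean.
within_1sd = sum(m - s <= x <= m + s for x in data) / len(data)
print("share within 1 SD:", round(within_1sd, 3))
```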
Describe a Q-Q plot for normally distributed data
- Follows a straight line
Give examples of numerical tests we can use to assess normality
- Kolmogorov-Smirnov
- Shapiro-Wilk
What requirement must a quantitative data set fulfil before we can apply the central limit theorem to it?
As a rule of thumb, the sample size must be larger than 30
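A rough simulation of why the central limit theorem is useful, under assumed settings that are not from the cards: the raw data come from a strongly right-skewed exponential distribution with mean 1, and each sample has 30 observations. The means of many such samples should look far more symmetrical than the raw data.

```python
import random
from statistics import mean, median

rng = random.Random(0)

def sample_mean(n):
    """Mean of n draws from a right-skewed exponential distribution (mean 1)."""
    return mean([rng.expovariate(1.0) for _ in range(n)])

# Raw skewed data versus the means of 2000 samples of size 30.
raw = [rng.expovariate(1.0) for _ in range(2000)]
means = [sample_mean(30) for _ in range(2000)]

# Skewed raw data: the mean sits well above the median.
print("raw data:     mean - median =", round(mean(raw) - median(raw), 3))
# Sample means: mean and median nearly coincide, as for a bell-shaped curve.
print("sample means: mean - median =", round(mean(means) - median(means), 3))
```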
What do μ and σ mean and what do they determine on a curve for normally distributed data?
The mean and the standard deviation
Together they determine the shape of the curve
What does μ mean and what does it determine on a curve for normally distributed data?
μ is the mean and it determines the line of symmetry on a bell curve
What does σ mean and what does it determine on a curve for normally distributed data?
σ is the standard deviation and it determines the spread of data around the mean
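What does the empirical rule state?
For normally distributed data, about 68%, 95% and 99.7% of values lie within 1, 2 and 3 standard deviations of the mean
Any normal curve can be standardised so that:
μ = 0
σ = 1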
How much of the population lies within 1 standard deviation of the mean (μ ± 1σ)?
68%
How much of the population lies within 2 standard deviations of the mean (μ ± 2σ)?
95%
How much of the population lies within 3 standard deviations of the mean (μ ± 3σ)?
99.7%
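The three percentages above can be checked against the standard normal curve (μ = 0, σ = 1). A minimal sketch using `statistics.NormalDist` from the standard library; the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal population lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k) * 100:.1f}%")
```

The exact values are roughly 68.3%, 95.4% and 99.7%, which the cards round to 68%, 95% and 99.7%.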
Define population
The group of all items of interest
Define sample
A set of data drawn from the population
Define parameter
A descriptive measure of a population
Define statistic
A descriptive measure of a sample
What is inferential statistics
Drawing conclusions/inferences about characteristics of a population based on SAMPLE data
What is descriptive statistics
Using numerical calculations, graphs or tables to summarise and describe the data
What is a statistical inference?
The process of making an estimate, prediction or decision about a population based on the data from a sample
What is standard error?
The standard deviation of the sample mean, i.e. of its sampling distribution (SE = SD / √n)
How do we calculate confidence interval
Sample statistic ± (multiplier for how confident we want to be × SE); for 95% confidence the multiplier is 1.96
What does the sample statistic equal
The sample mean
What do we mean when we say we are 95% confident
We are 95% confident that the true population mean lies within this interval
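A worked sketch of the standard error and confidence interval cards above. The sample of heights is invented for illustration, and the multiplier 1.96 assumes the normal approximation; with a sample this small a t-based multiplier would normally be used instead.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of heights in cm.
sample = [172, 168, 181, 175, 169, 178, 174, 166, 183, 171, 177, 170]

n = len(sample)
sample_mean = mean(sample)

# Standard error: sample standard deviation divided by the square root of n.
standard_error = stdev(sample) / sqrt(n)

# 95% confidence interval: sample mean +/- 1.96 * SE.
lower = sample_mean - 1.96 * standard_error
upper = sample_mean + 1.96 * standard_error

print(f"mean = {sample_mean:.1f}, SE = {standard_error:.2f}")
print(f"95% CI = ({lower:.1f}, {upper:.1f})")
```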
What is hypothesis testing
Testing whether the difference in values obtained is significant or not
Talk through the steps of hypothesis testing
- Decide statistical question
- Assume the null hypothesis
- Predict the sampling variability assuming the null hypothesis
- Do the experiment
- Calculate the p value
- Carry out the hypothesis test: reject or fail to reject the null hypothesis
When do we accept (fail to reject) our null hypothesis?
If the p value is greater than 0.05 (p>0.05)
There is no evidence of an association between the 2 factors
When do we reject our null hypothesis
If the p value is LESS than 0.05 (p<0.05)
There IS evidence of an association between the 2 factors
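The p value decision rule above can be sketched with a simple two-sided z-test of a sample mean against a null value. This is an assumed, simplified test chosen for illustration (the cards do not name a specific test), the data are made up, and with small samples a t-test would be the more usual choice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def two_sided_p_value(sample, null_mean):
    """Approximate two-sided p value for H0: population mean == null_mean."""
    standard_error = stdev(sample) / sqrt(len(sample))
    z = (mean(sample) - null_mean) / standard_error
    # Probability of a result at least this extreme if the null is true.
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical measurements.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7, 5.9, 5.3, 5.5]

for null_mean in (5.0, 5.5):
    p = two_sided_p_value(sample, null_mean)
    decision = "reject H0" if p < 0.05 else "fail to reject H0"
    print(f"H0 mean = {null_mean}: p = {p:.4f} -> {decision}")
```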
What gives us more information hypothesis test or confidence interval?
Confidence interval
What does a confidence interval overlapping with zero indicate?
There is no significant difference and therefore we do not reject the null hypothesis
What is a type I error
When you reject the null hypothesis when it is true (false positive)
What is a type II error
When you accept (fail to reject) the null hypothesis when it is false (false negative)
What does power mean in terms of statistics?
The probability of finding a difference in 2 groups if one truly exists (the probability of NOT making a type II error)
Do we want our study to have a high or low power?
High power (at least 0.8/80%)
List some factors that affect the power
- Size of effect
- Standard deviation
- Sample size
- Significance level
How does size of effect affect the power of our study
A larger difference in observed values will increase the power as values are further from 0
How does standard deviation affect the power of our study
A larger SD decreases the power as it means more variability, giving a wider, flatter curve
How does sample size affect the power of our study
A larger sample size increases the power as it narrows the curves so they overlap less, and a true difference is more likely to fall within the “rejection” region
How does significance level affect the power of our study
Making the significance level stricter (a smaller α, e.g. 0.01 instead of 0.05) decreases the power; a larger α increases it
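The four factors above can be explored with an approximate power calculation. This sketch assumes a two-sided one-sample z-test with a known SD, which is a simplification picked for illustration; the function name `power` and all the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power(effect, sd, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test."""
    z_crit = Z.inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha = 0.05
    shift = effect * sqrt(n) / sd       # true effect measured in standard errors
    # Probability that the test statistic lands in either rejection region.
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)

baseline = power(effect=5, sd=10, n=30)
print("baseline:         ", round(baseline, 3))
print("larger effect:    ", round(power(effect=10, sd=10, n=30), 3))
print("larger SD:        ", round(power(effect=5, sd=20, n=30), 3))
print("larger sample:    ", round(power(effect=5, sd=10, n=60), 3))
print("stricter alpha:   ", round(power(effect=5, sd=10, n=30, alpha=0.01), 3))
```

Under these assumptions a larger effect or sample raises the power, while a larger SD or a stricter (smaller) alpha lowers it.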
What is correlation?
Describes the strength and direction of the relationship between two variables
What is regression
Regards one variable as the predictor and one as the outcome
What is the ‘predictor variable’?
Independent variable
What is the ‘outcome variable’?
Dependent variable
What assumptions do we make when looking at regression
- Y is normally distributed at each value of X
- The variance of Y at every value of X is the same
How do we calculate the residual of a data set
Observed value - predicted value
How do we calculate the predicted value when calculating regression?
We read it off the fitted straight line (the linear graph)
What formula does a linear graph follow
y=mx+c
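A small sketch tying the regression cards together: fitting y = mx + c by ordinary least squares, then computing predicted values and residuals (observed minus predicted). The data points are invented, and least squares is assumed here as the fitting method since the cards do not name one.

```python
from statistics import mean

# Hypothetical data: x is the predictor, y is the outcome.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar, y_bar = mean(x), mean(y)

# Least-squares slope (m) and intercept (c) of the line y = m*x + c.
m = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
c = y_bar - m * x_bar

# Predicted values come from the fitted line; residual = observed - predicted.
predicted = [m * xi + c for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

print(f"y = {m:.2f}x + {c:.2f}")
print("residuals:", [round(r, 2) for r in residuals])
```

With a least-squares fit that includes an intercept, the residuals always sum to zero (up to rounding).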
List some functions of multivariate analysis
- Control for confounders
- Test for interactions between predictors
- Improve predictions
Define risk ratio
Rate (risk) of the condition in the exposed group: rate (risk) of the condition in the unexposed group
When are risk ratios used
Used for categorical data
What is an odds ratio?
Odds of event occurring in a treatment group: odds of event occurring in a control group
What does an odds ratio of 1 mean?
No difference between control and treatment group
What does an odds ratio other than 1 mean?
There is an association between the groups
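A worked example of the risk ratio and odds ratio cards above, using a made-up 2x2 table. The groups are labelled exposed and unexposed here; the same arithmetic applies to a treatment and a control group.

```python
# Hypothetical 2x2 table: number with the condition out of each group's total.
exposed_cases, exposed_total = 30, 100
unexposed_cases, unexposed_total = 10, 100

# Risk = cases / everyone in the group.
risk_exposed = exposed_cases / exposed_total
risk_unexposed = unexposed_cases / unexposed_total
risk_ratio = risk_exposed / risk_unexposed

# Odds = cases / non-cases in the group.
odds_exposed = exposed_cases / (exposed_total - exposed_cases)
odds_unexposed = unexposed_cases / (unexposed_total - unexposed_cases)
odds_ratio = odds_exposed / odds_unexposed

print(f"risk ratio = {risk_ratio:.2f}")
print(f"odds ratio = {odds_ratio:.2f}")
```

A value of 1 for either ratio would mean the two groups do not differ; here both are above 1, so the condition is more common in the exposed group.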
What is survival analysis?
A statistical method for analysing longitudinal data on occurrence of events
Name the curve commonly used to describe survivorship of study populations
The Kaplan Meier curve
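A minimal sketch of how the points on a Kaplan-Meier curve are usually calculated: at each time an event occurs, the running survival probability is multiplied by (1 - events / number still at risk). The function name, follow-up times and censoring flags below are all hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time an event occurs.

    events[i] is True if the event happened at times[i],
    or False if that subject was censored (left the study) at times[i].
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        event_count = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if event_count:
            survival *= 1 - event_count / at_risk
            curve.append((t, survival))
        # Everyone observed at time t (event or censored) leaves the risk set.
        at_risk -= leaving
    return curve

# Hypothetical follow-up times (months) and whether the event was observed.
times = [2, 3, 3, 5, 8]
events = [True, True, False, True, False]

for t, s in kaplan_meier(times, events):
    print(f"month {t}: estimated survival = {s:.2f}")
```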
What does a correlation coefficient of -1 mean?
Perfect negative relationship: as the x variable increases, y decreases
What does a correlation coefficient of +1 mean?
Perfect positive relationship: as the x variable increases, y increases
What does a correlation coefficient of 0 mean?
No linear association: as x increases, y shows no tendency to increase or decrease (flat trend on a graph)
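The three correlation cards above can be illustrated with Pearson's correlation coefficient, assumed here as the coefficient in question since the cards do not name one. The helper `pearson_r` and the data are made up for the example.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]

print("y rises with x:  ", pearson_r(x, [2, 4, 6, 8, 10]))
print("y falls with x:  ", pearson_r(x, [10, 8, 6, 4, 2]))
print("no linear trend: ", pearson_r(x, [1, 3, 2, 3, 1]))
```

Note that the coefficient is undefined (division by zero) when y is exactly constant, so the last example uses values that vary but show no upward or downward trend.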