Biostatistics Flashcards
What is biostatistics and what is its purpose
The collection and analysis of data (statistics), applied specifically to understanding the effects of a drug or medical procedure on people and animals
It's used to understand medical and pharmacy journals and helps us answer clinical questions from patients and providers. Ex: on a question, we should be able to determine whether a drug is appropriate for a patient based on whether they fit a study's inclusion and exclusion criteria. Ex: consider relative risk
What is a study manuscript
A description of the research completed with the results
What is peer review
When a researcher sends their manuscript to a journal and the editor sends it to experts in the field, who assess the research design, the methods, the value of the results, the conclusion, how well it's written, and whether it is appropriate for the journal. The outcome is acceptance (usually with revisions) or rejection.
List the steps to publication
Research Question
Design the Study
Enroll the Subjects
Collect the Data
Analyze the Data
Publish the Data
What is continuous data and its two types
Data (usually numerical) that has a logical order with values that continuously increase or decrease by the same/a measurable amount
Two types are ratio data and interval data
What is ratio data
Continuous data with equal differences between values and a MEANINGFUL ZERO (zero means none). Ex: age, height, weight, BP. A blood pressure of zero is meaningful because it means no pressure (the patient would be dead)
What is interval data
Continuous data with equal differences between values but NO MEANINGFUL ZERO
Ex: Celsius and Fahrenheit scales. A temperature of zero doesn't mean no temperature; it is just a point on the scale that means it's cold
What is discrete data and the two types
Categorical data
Two types: nominal and ordinal data
What is nominal data
Data sorted into arbitrary categories (names) with no ranking, often yes/no. Ex: male vs female, ethnicity, marital status, mortality
What is ordinal data
Data that is ranked in a logical order, such as a pain scale or NYHA functional class, but the categories do not increase by the same amount (a pain score of 4 is higher than 2, but that doesn't mean it is twice the pain)
What are the measures of central tendency and which data type is each preferred for
Mean (preferred for continuous data that is normally distributed)
Median (preferred for ordinal data or continuous data that is skewed)
Mode (preferred for nominal data)
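Worked example: the three measures can be computed with Python's standard statistics module. This is a minimal sketch; the blood pressure readings, pain scores, and blood types below are made-up values used only for illustration.

```python
import statistics

# Hypothetical data, invented for this example
systolic_bp = [118, 122, 125, 130, 135]        # continuous, roughly symmetric
pain_scores = [2, 3, 3, 4, 8]                  # ordinal
blood_types = ["A", "O", "O", "B", "O", "AB"]  # nominal

mean_bp = statistics.mean(systolic_bp)         # mean for continuous, normally distributed data
median_pain = statistics.median(pain_scores)   # median for ordinal (or skewed) data
mode_type = statistics.mode(blood_types)       # mode for nominal data

print(mean_bp, median_pain, mode_type)
```

Note that the mode is the only one of the three that works on category names, which is why it is paired with nominal data.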
Describe standard deviation
Describes how spread out the data are around the mean, reported as mean +/- SD. For normally distributed data:
About 68% of the data fall within 1 SD of the mean
About 95% of the data fall within 2 SD of the mean
About 99.7% of the data fall within 3 SD of the mean
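Worked example: a short sketch that computes a sample SD and checks the 68/95/99.7 percentages against the standard normal curve. The data values and the helper name fraction_within are assumptions made for this illustration; the percentages only apply when the data are normally distributed.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical measurements, invented for this example
values = [4, 6, 8, 10, 12]
avg = mean(values)
sd = stdev(values)  # sample standard deviation (divides by n - 1)

def fraction_within(k):
    """Fraction of a normal distribution lying within k SD of the mean."""
    standard_normal = NormalDist()  # mean 0, SD 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

print(f"mean = {avg}, SD = {sd:.2f}")
for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
```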
What is the range
The highest value minus the lowest value
What is the mode
The value that occurs most frequently
What is a Gaussian or “normal” distribution vs skewed data
It's a symmetrical bell curve, usually seen in continuous data with large sample sizes.
68% of the values fall within 1 SD of the mean and 95% of the values fall within 2 SD of the mean. The mean, median, and mode are equal, so any of them can describe the middle (the mean is preferred).
Data are not normally distributed (“skewed”) when the sample size is small or there are outliers. With a small number of values, an outlier has a large impact on the mean, so the median is a better indicator of central tendency. The graph skews in the direction of the outliers. The median is also used to describe the middle for ordinal data.
Negative skew = left skew
Positive skew = right skew
(skew refers to the tail of the data not the hump)
Distortion of central tendency can be fixed by collecting more values
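Worked example: a sketch of how a single outlier pulls the mean but barely moves the median. The length-of-stay values and the function name describe_center are hypothetical, and comparing mean to median is only a rough rule of thumb for the direction of skew, not a formal test.

```python
from statistics import mean, median

def describe_center(values):
    """Return (mean, median, rough skew direction) for a list of numbers."""
    m, med = mean(values), median(values)
    if m > med:
        direction = "positive (right) skew"   # tail of high values pulls the mean up
    elif m < med:
        direction = "negative (left) skew"    # tail of low values pulls the mean down
    else:
        direction = "no skew detected"
    return m, med, direction

# Hypothetical hospital lengths of stay in days
symmetric = [2, 3, 4, 5, 6]
with_outlier = symmetric + [40]   # one very long stay

print(describe_center(symmetric))
print(describe_center(with_outlier))
```

With the outlier added, the mean jumps from 4 to 10 while the median only moves from 4 to 4.5, which is why the median is preferred for skewed data.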
Independent variable
The variable changed/manipulated by the researcher
Dependent variable
The outcome, which may be affected by the independent variable
Null hypothesis
States that there is no difference between groups. It's written H0 and read “H-naught”
- The null hypothesis is what the researcher tries to disprove. The alternative hypothesis (Ha) is what the researcher proposes, tests, and hopes to show is supported by the data.
What is the alpha level and how does it relate to the p-value
Alpha is the threshold for statistical significance: the maximum probability the investigator will accept of rejecting the null hypothesis when it is actually true. It corresponds to the tails of a normally distributed bell curve. The investigator sets alpha, usually at 5% (0.05). A lower alpha is stricter, but it requires more subjects, a larger treatment effect, and more data, aka more money.
If alpha is set at 5% and the p-value is < 0.05, the result is statistically significant: the null hypothesis is rejected and the alternative hypothesis is accepted
If the p-value is greater than or equal to alpha (p ≥ 0.05), the study failed to reject the null hypothesis and the result is not statistically significant
If the alternative hypothesis is accepted, phrasing that can be used:
You reject the null hypothesis or fail to accept it
If the null hypothesis is accepted, phrasing that can be used:
You accept the null hypothesis or fail to reject it
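Worked example: the decision rule for comparing a p-value to alpha can be written as a small function. This is a sketch; the function name interpret_p_value and the sample p-values are assumptions for illustration.

```python
def interpret_p_value(p_value, alpha=0.05):
    """Compare a p-value to the alpha level chosen by the investigator."""
    if p_value < alpha:
        return "reject the null hypothesis (statistically significant)"
    # p-value greater than or equal to alpha
    return "fail to reject the null hypothesis (not statistically significant)"

print(interpret_p_value(0.03))              # below the usual alpha of 0.05
print(interpret_p_value(0.05))              # equal to alpha counts as not significant
print(interpret_p_value(0.03, alpha=0.01))  # a stricter alpha changes the conclusion
```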
The alpha level corresponds to the area in the tails of a normally distributed graph
If 95% of values are within about 2 SD (more exactly 1.96 SD) of the mean, this corresponds to an alpha of 5%
If 99% of values are within about 2.6 SD of the mean, this corresponds to an alpha of 1% (99.7% within 3 SD corresponds to an alpha of about 0.3%)
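Worked example: the link between alpha and the tails of the normal curve can be checked numerically. This sketch assumes a two-tailed test on a standard normal distribution; the function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1

def two_tailed_cutoff(alpha):
    """Number of SDs from the mean that leaves alpha split across both tails."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha / 2)

def alpha_for_cutoff(k):
    """Total area in both tails beyond k SD from the mean."""
    return 2 * (1 - STANDARD_NORMAL.cdf(k))

print(f"alpha 0.05 -> {two_tailed_cutoff(0.05):.2f} SD")
print(f"alpha 0.01 -> {two_tailed_cutoff(0.01):.2f} SD")
print(f"3 SD -> alpha {alpha_for_cutoff(3):.4f}")
```

An alpha of 0.05 lands at about 1.96 SD, an alpha of 0.01 at about 2.58 SD, and a cutoff of 3 SD leaves about 0.0027 in the tails.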
Formula for confidence interval (CI)
CI = 1 - alpha (ex: an alpha of 0.05 gives a 95% CI)
CI can be expressed as a range of values (ex: 95% CI 6%-34%)
This means that you are 95% confident that the true value of ____ for the general population lies somewhere between 6% and 34%. The narrower the range, the more precise the estimate; the wider the range, the less precise
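Worked example: a sketch of a confidence interval for a mean, showing that a larger sample gives a narrower (more precise) range. It assumes the normal approximation, mean +/- z * SD / sqrt(n); small samples would normally use a t distribution instead. The function name and the numbers are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sd, n, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha is 0.05
    margin = z * sd / sqrt(n)                # margin shrinks as n grows
    return sample_mean - margin, sample_mean + margin

# Hypothetical systolic BP summary: mean 120, SD 10
small_low, small_high = mean_confidence_interval(120, 10, n=25)
large_low, large_high = mean_confidence_interval(120, 10, n=100)

print(f"n = 25:  95% CI {small_low:.2f} to {small_high:.2f}")
print(f"n = 100: 95% CI {large_low:.2f} to {large_high:.2f}")
```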
If the p-value is greater than or equal to alpha (0.05), what does this mean about the significance and the CI
The result is not statistically significant, and the 95% CI will include the value that means no effect (0 for a difference, 1 for a ratio such as relative risk)