DL2: Quiz 1 Flashcards
Define sensitivity?
TP/(TP+FN)
The proportion of pts with dz who test +
Define specificity?
TN/(TN+FP)
The proportion of pts without dz who test -
Define PPV?
Probability that people who test positive have the disease
Define NPV?
Probability that people who test negative do not have the disease
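Worked example: a minimal sketch of the four test-accuracy measures above, computed from a 2x2 table. The counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical 2x2 counts, invented for illustration
tp, fp = 90, 30    # test positive: with disease, without disease
fn, tn = 10, 170   # test negative: with disease, without disease

sensitivity = tp / (tp + fn)  # patients with disease who test positive
specificity = tn / (tn + fp)  # patients without disease who test negative
ppv = tp / (tp + fp)          # positive tests that are truly diseased
npv = tn / (tn + fn)          # negative tests that are truly disease-free

print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
print(f"PPV={ppv:.2f} NPV={npv:.2f}")
```

Sensitivity and specificity divide by disease status (columns of the table), while PPV and NPV divide by test result (rows).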
Define population?
All possible subjects of interest to the study
Define sample?
A subset of the population that is meant to represent the population
Define statistic?
A number that represents a property of the sample
Define ratio?
One number divided by another
Define proportion?
A ratio in which the numerator is part of the denominator (a part divided by the whole)
Define probability?
The chance of an event occurring
Define risk?
Probability of an event occurring
Define rate?
A proportion measured over a time period
Define incidence?
new cases that occurred/population at risk
Proportion of people who develop a condition during a time period
Define prevalence?
All existing cases/total population
Proportion of people who have a condition at one point in time
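Worked example: a small sketch contrasting incidence and prevalence. The cohort size and case counts are hypothetical, and the sketch assumes that people who already have the condition are not at risk of becoming new cases.

```python
# Hypothetical cohort followed for one year (numbers are invented)
population = 1000
existing_cases_at_start = 50
new_cases_during_year = 19

# Prevalence: everyone who has the condition at one point in time
prevalence = existing_cases_at_start / population

# Incidence: new cases among those still at risk (existing cases excluded)
at_risk = population - existing_cases_at_start
incidence = new_cases_during_year / at_risk

print(f"prevalence={prevalence:.3f} incidence={incidence:.3f}")
```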
Qualitative data?
Categorical
Nominal: pertaining to names
Ordinal: categories have an order or rank
Quantitative data?
Continuous
Interval: no absolute zero (addition and subtraction)
Ratio: has absolute zero, no negative numbers (multiply and divide)
Independent variable?
The one we can manipulate
Dependent variable?
The one we measure
Covariates/Confounders?
Any variable other than the chosen independent variable that may affect the dependent variable
Mean?
Sum of all observations/number of observations
Median?
Middle number when observations are placed in numerical order
Mode?
Most frequent observation
Range?
Highest value minus lowest
Variance?
The average squared deviation from the mean: subtract the mean from each measurement, square the results, sum them, and divide by n (population) or n-1 (sample)
Standard dev?
The square root of the variance
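Worked example: the summary statistics above, sketched with Python's standard statistics module. The data values are hypothetical. Note that statistics.variance divides by n-1 (sample) while statistics.pvariance divides by n (population).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

mean = statistics.mean(data)
median = statistics.median(data)      # average of the two middle values here
mode = statistics.mode(data)
value_range = max(data) - min(data)

pop_variance = statistics.pvariance(data)    # divides by n
sample_variance = statistics.variance(data)  # divides by n - 1
pop_sd = statistics.pstdev(data)             # square root of pop_variance

print(mean, median, mode, value_range)
print(pop_variance, round(sample_variance, 3), pop_sd)
```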
A: Lowest observation
B: lower quartile
C: Median
D: Upper quartile
E: Highest observation
Descriptive stats?
Organizes and summarizes data (skewness, mean, median, mode, standard dev, scatter plots)
Inferential stats?
Estimate population parameters, and how confident we can be in our conclusions
Simple random sampling?
Probability sampling
Every subject has equal probability of being selected
Systematic random sampling?
Probability sampling
Select every nth subject
Randomly selects subjects with known sampling strategies
Stratified sampling?
Probability sampling
Divide population into relevant strata and take random samples from each stratum
Cluster sampling?
Probability sampling
Divide population into clusters and randomly select a subset of the clusters
Convenience sampling?
Non-Probability sampling
Select subjects based on availability, not representative of population
Volunteer sampling?
Non-Probability sampling
Take all subjects who volunteer
Why is probability better than non-probability sampling?
Non-probability sampling is not based on probability and is susceptible to selection bias
Stratified vs cluster sampling
Stratified:
1. Partition population into mutually exclusive homogeneous groups based on a factor that may influence the measured variable
2. Obtain a simple random sample from each group
3. Collect data on each subject that was randomly sampled from each group
4. A heterogeneous population is split into homogeneous subpopulations (the collection of strata is exhaustive)
Cluster:
1. Divide population into groups
2. Obtain a simple random sample of clusters
3. Collect data on every subject in each of the randomly selected clusters (heterogeneous)
4. Useful when target of an intervention is a system rather than individual
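Worked example: a rough sketch of the two procedures compared above, using Python's random module. The population, the "clinic" clusters, the "sex" strata, and both helper functions are hypothetical names made up for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 40 patients, 4 clinics, each clinic 5 F and 5 M
population = [
    {"id": i, "clinic": i % 4, "sex": "F" if (i // 4) % 2 == 0 else "M"}
    for i in range(40)
]

def stratified_sample(pop, key, n_per_stratum):
    """Random sample of n_per_stratum subjects from EVERY stratum."""
    strata = {}
    for person in pop:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(random.sample(members, n_per_stratum))
    return sample

def cluster_sample(pop, key, n_clusters):
    """Random sample of clusters, then EVERY subject in the chosen clusters."""
    clusters = sorted({person[key] for person in pop})
    chosen = set(random.sample(clusters, n_clusters))
    return [person for person in pop if person[key] in chosen]

strat = stratified_sample(population, "sex", 5)    # 5 F + 5 M
clust = cluster_sample(population, "clinic", 2)    # all patients in 2 clinics

print(len(strat), len(clust))
```

The difference is where the randomness sits: stratified sampling randomizes subjects within every group, cluster sampling randomizes which groups are taken whole.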
What type of distribution?
Normal
What type of distribution?
Binomial
Poisson distribution?
Discrete, quantitative data that occurs independently and randomly in time at some constant mean rate.
Primarily used to estimate the probability of rare events and predict the number of times an event occurs
Gives the probability that an outcome will occur a specified number of times when the number of trials is large and the probability of an occurrence is small
Ex: Used to calculate number of deaths from lung cancer in a year in a town. Info is used to compare observed and expected values to decide if the number of deaths from cancer is higher or lower than expected
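Worked example: a sketch of the observed-versus-expected comparison described above, using the Poisson probability formula P(X = k) = e^(-λ) λ^k / k!. The expected count of 3 deaths per year and the observed count of 8 are hypothetical, and poisson_pmf is a helper written for this card.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean rate is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

expected_deaths = 3   # hypothetical expected deaths per year in the town
observed_deaths = 8   # hypothetical observed count

# Probability of seeing 8 or more deaths if the true mean is 3
p_at_least_observed = 1 - sum(
    poisson_pmf(k, expected_deaths) for k in range(observed_deaths)
)

print(f"P(X >= {observed_deaths}) = {p_at_least_observed:.4f}")
```

A small probability here suggests the observed count is higher than the expected rate would typically produce.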
What type of distribution?
Poisson distribution
What causes skewness?
Outliers
Kurtosis?
A measure of the combined weight of the tails relative to the rest of the distribution
Mean
Median
Mode
What is the purpose of data transformation?
To change skewed or unknown distributions to a normal distribution in order to calculate p-value
What is central limit theorem?
When equally sized samples are drawn from a non-normal distribution, the plotted means of the samples will approximate a normal distribution, as long as the non-normality was not due to outliers
Sufficiently large sample is generally considered 30 or more
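Worked example: a small simulation sketch of the central limit theorem. It assumes an exponential distribution (strongly right-skewed, mean 1, SD 1) as the non-normal source, with samples of size 30. The seed and the number of repetitions are arbitrary choices.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

sample_size = 30
repetitions = 2000

# Draw many equally sized samples from a skewed distribution and keep each mean
sample_means = [
    statistics.mean([random.expovariate(1.0) for _ in range(sample_size)])
    for _ in range(repetitions)
]

center = statistics.mean(sample_means)    # should sit near the true mean, 1
spread = statistics.stdev(sample_means)   # should sit near 1 / sqrt(30)
predicted_spread = 1 / math.sqrt(sample_size)

print(f"mean of means={center:.3f} sd of means={spread:.3f} "
      f"predicted sd={predicted_spread:.3f}")
```

A histogram of sample_means would look roughly bell-shaped even though the individual observations are skewed.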
What is p-value?
The probability of obtaining a measurement at least as extreme as the one obtained, assuming the null hypothesis is true.
What is null-hypothesis?
A hypothesis that states that there is no significant difference between 2 sets of data.
Type 1 error?
Rejecting the null hypothesis when the null hypothesis is true
False positive
Type 2 error?
Failing to reject the null hypothesis when the null hypothesis is false
False negative
What is 𝛂?
Significance level: the threshold (between 0 and 1) the p-value must fall below to reject the null hypothesis; the accepted probability of a type I error
When would you reject the null?
P<𝛂
- a small p-value (i.e., less than alpha) is an “unlikely” result to obtain if the null hypothesis is true, allowing us to reject the null hypothesis (i.e., we see a statistically significant difference in the two groups).
- a large p-value (i.e., larger than alpha) is a “likely” result to obtain if the null hypothesis is true, so we fail to reject the null hypothesis (i.e., we do not see a statistically significant difference in the two groups).
What is ß?
Probability of a type II error (FN)
What type of graph? What does it do?
Histogram
Presents data as frequency counts over some interval
What type of graph? What are its components?
Boxplot
1. Thin lined box indicates the IQR – the 25th to the 75th percentiles of the data.
2. Within the thin lined box is the bolded line – the median.
3. Extending from both ends of the thin lined box are the tails (or whiskers), which reach the minimum and maximum points within 1.5 IQRs of the box edges.
4. The circle is an outlier, defined as data between 1.5 and 3.0 IQRs beyond the box edges.
5. The asterisk is an extreme outlier, defined as data points more than 3.0 IQRs beyond the box edges.
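Worked example: a sketch of how the boxplot components could be computed. It assumes the common convention that the 1.5 and 3.0 IQR distances are measured from the box edges (Q1 and Q3), and it uses the default quartile method of statistics.quantiles, which can differ slightly from other software. The data are hypothetical.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 40]  # hypothetical observations

q1, median, q3 = statistics.quantiles(data, n=4)  # box edges and bold line
iqr = q3 - q1

# Fences measured from the box edges
inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr

# Whiskers stop at the most extreme points still inside the inner fences
whisker_low = min(x for x in data if x >= inner_low)
whisker_high = max(x for x in data if x <= inner_high)

# Circles: between 1.5 and 3.0 IQRs out; asterisks: more than 3.0 IQRs out
outliers = [x for x in data
            if outer_low <= x < inner_low or inner_high < x <= outer_high]
extreme_outliers = [x for x in data if x < outer_low or x > outer_high]

print(q1, median, q3, iqr)
print(whisker_low, whisker_high, outliers, extreme_outliers)
```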
What type of graph? What are its components?
Scatterplot
Presents data from 2 variables both measured on a continuous scale
Useful for assessing the association between 2 variables and assessing assumptions of tests such as linearity and absence of outliers
Confidence interval?
Range of values in which we have some level of confidence the true population value will lie
Smaller CI means less variability
A 95% CI corresponds to an alpha of 5%
Narrow CI: little variation and more precise
Wide CI: Greater variation and less precise
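Worked example: a sketch of a 95% CI for a mean. The summary statistics are hypothetical, and the sketch assumes a large-sample normal approximation (z of about 1.96); for small samples a t-based interval would usually be wider.

```python
import math
from statistics import NormalDist

# Hypothetical summary statistics
sample_mean, sample_sd, n = 120.0, 12.0, 36

confidence = 0.95
alpha = 1 - confidence                   # 95% CI pairs with alpha = 5%
z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a two-sided 95% CI

standard_error = sample_sd / math.sqrt(n)
margin = z * standard_error
ci_low, ci_high = sample_mean - margin, sample_mean + margin

print(f"95% CI: {ci_low:.2f} to {ci_high:.2f}")
```

Less variability (smaller SD) or a larger sample shrinks the standard error, which narrows the interval.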
What does overlap of CI box plots mean?
Directly related to p-value
less overlap = larger difference and lower p-value
p < alpha = reject null and statistically significant
More overlap = smaller difference and higher p-value; p > alpha = fail to reject null and no statistical significance
Calculate risk ratio?
Risk in people with risk factor/risk in people w/o risk factor
RR = (a/(a+b)) / (c/(c+d))
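Worked example: a sketch of the risk ratio formula above. It assumes the usual 2x2 layout (a = exposed with event, b = exposed without event, c = unexposed with event, d = unexposed without event); the counts are hypothetical.

```python
# Hypothetical 2x2 counts
a, b = 20, 80   # risk factor present: event, no event
c, d = 10, 90   # risk factor absent:  event, no event

risk_exposed = a / (a + b)
risk_unexposed = c / (c + d)
risk_ratio = risk_exposed / risk_unexposed

print(f"risk exposed={risk_exposed:.2f} risk unexposed={risk_unexposed:.2f} "
      f"RR={risk_ratio:.2f}")
```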
Calculate absolute reduction or increase?
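ARR or ARI
ARR (absolute risk reduction) = CER-EER: risk of control-risk of experimental
ARI (absolute risk increase) = EER-CER: risk of experimental-risk of control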
Calculate relative risk reduction?
RRR
(Risk of control-risk of experimental)/risk of control
(CER-EER)/CER
Calculate number needed to treat?
NNT
1/ARR (absolute risk reduction)
Calculate number needed to harm?
NNH
1/ARI (Absolute risk increase)
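Worked example: a sketch of the absolute difference, relative risk reduction, and NNT. The event rates are hypothetical. The sketch takes the absolute value of the difference so the sign convention does not matter, uses exact fractions to avoid floating-point rounding, and rounds NNT up to a whole patient, which is a common convention.

```python
import math
from fractions import Fraction

# Hypothetical event rates: 10 of 100 treated, 15 of 100 controls
eer = Fraction(10, 100)   # experimental event rate
cer = Fraction(15, 100)   # control event rate

absolute_difference = abs(eer - cer)             # ARR here, since EER < CER
relative_risk_reduction = absolute_difference / cer
number_needed_to_treat = math.ceil(1 / absolute_difference)

print(float(absolute_difference),
      round(float(relative_risk_reduction), 3),
      number_needed_to_treat)
```

If the experimental event rate were higher than the control rate, the same arithmetic would give the absolute risk increase and the number needed to harm.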
Calculate odds of risk factor in cases (with event)?
a/c
Calculate odds of risk factor in control (no event)?
b/d
Calculate odds ratio?
(a/c)/(b/d) = ad/bc
Ratio of the odds of an exposure in the case group to the odds of an exposure in the control group
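Worked example: a sketch of the odds ratio from a case-control 2x2 table, assuming a = cases with the risk factor, b = controls with the risk factor, c = cases without it, and d = controls without it. The counts are hypothetical.

```python
# Hypothetical case-control counts
a, b = 30, 10   # risk factor present: cases, controls
c, d = 20, 40   # risk factor absent:  cases, controls

odds_in_cases = a / c        # odds of exposure among cases
odds_in_controls = b / d     # odds of exposure among controls
odds_ratio = odds_in_cases / odds_in_controls
cross_product = (a * d) / (b * c)   # same value via ad/bc

print(odds_in_cases, odds_in_controls, odds_ratio, cross_product)
```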
Cohort studies?
Observes development of disease in exposed and unexposed groups
Case control studies?
Select subjects with event, compare presence of risk factor in cases with event to controls without event
RR CI interpretation?
- RR CI contains 1: no difference in risk. Do not reject H0.
- RR entire CI > 1: risk in intervention group > risk in control group.
- RR entire CI < 1: risk in intervention group < risk in control group.
OR interpretations?
- OR CI contains 1: no difference in odds. Do not reject H0.
- OR entire CI > 1: Odds in Case(or event) group > odds in control group. Reject H0
- OR entire CI < 1: Odds in Case (or event) group < odds in control group. Reject H0
______________ tests make assumptions about the parameters of the population distribution from which the sample data is taken.
Parametric
______________ tests do NOT make assumptions about data distribution. However, they do require groups to have approximately the same dispersion.
Non-parametric
When should non-parametric test be used?
- Data don’t seem to follow distribution
- Assumptions underlying parametric tests are not met
- Data appear to be very skewed
- Data has significant outliers
Types of parametric tests?
- Paired t-test
- Unpaired t-test
- Pearson correlation
- One way ANOVA
Types of non-parametric equivalents?
- Wilcoxon signed-rank test
- Mann-Whitney U test
- Spearman correlation
- Kruskal-Wallis test
What is paired variable?
Compare 2 different variables (or measurements) for the same group
What are independent variables?
Compare outcomes on the same variable for 2 different groups
Which is which?
a: one tailed (5%)
b: 2 tailed (2.5%)
What are t-tests? Types?
Tests for differences between means; the larger the statistic, the greater the difference between the groups
Independent sample: compares means of 2 groups
Paired: compares means from same group at different times
One sample: compares the mean of one group to known mean
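Worked example: a sketch of the independent-sample and paired t statistics computed by hand. It assumes the pooled-variance (Student's) form for the independent test; the data and both helper function names are hypothetical. Converting t to a p-value needs a t-distribution table or a statistics package, which is left out here.

```python
import math
import statistics

def independent_t(x, y):
    """Pooled-variance t statistic and degrees of freedom for 2 groups."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se, nx + ny - 2

def paired_t(before, after):
    """t statistic and degrees of freedom for the same group measured twice."""
    diffs = [a - b for a, b in zip(after, before)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se, len(diffs) - 1

# Hypothetical data
t_ind, df_ind = independent_t([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])
t_pair, df_pair = paired_t([10, 12, 9, 11], [12, 15, 10, 13])

print(f"independent: t={t_ind:.3f}, df={df_ind}")
print(f"paired:      t={t_pair:.3f}, df={df_pair}")
```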
Degrees of freedom?
A measure of the amount of independent data that can be used to estimate a parameter
Determines the probability distribution of the test statistic in a hypothesis test
Number of data points which are free to vary
What is degrees of freedom dependent on?
1. Number of groups compared
2. Number of parameters needed to estimate the standard deviation
How do you test for independence?
- Random samples
- Categorical data (counts)
- Non-Parametric
- Tests whether a categorical variable is related to another
How do you test for goodness of fit?
- Random samples
- Categorical data (counts)
- Non-Parametric
- Tests whether data is representative of the full population.
- Compares observed data to a theoretical model
What does chi square compare?
Observed frequency with expected frequency
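Worked example: a sketch of the chi-square statistic for a test of independence, computed as the sum of (observed - expected)^2 / expected over every cell. The 2x2 counts are hypothetical. For reference, the critical value for 1 degree of freedom at alpha = 0.05 is 3.841.

```python
# Hypothetical observed counts: rows = exposure (yes/no), columns = outcome (yes/no)
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if the two variables were unrelated
        expected = row_totals[i] * col_totals[j] / total
        chi_square += (obs - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print(f"chi-square={chi_square:.3f}, df={degrees_of_freedom}")
```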
What is survival analysis?
Branch of stats for analyzing the expected duration of time until an event occurs
Must deal with censored data
What are the causes of censored data?
- event doesn’t occur during study period
- subject lost to follow up
- subject dies from something other than studied cause
What is kaplan-meier analysis? Assumptions?
Non-Parametric survival analysis method – no assumptions about how event probability changes over time.
- Censoring is independent of event probability
- Survival probabilities are comparable in early and later recruited subjects
- Censoring is not more likely in one group than another
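Worked example: a minimal sketch of the Kaplan-Meier product-limit estimate. At each event time the running survival is multiplied by (1 - events / number at risk), and censored subjects leave the at-risk count without causing a drop. The follow-up times, the event flags, and the kaplan_meier helper are hypothetical.

```python
def kaplan_meier(times, events):
    """Return (time, survival) steps; events: 1 = event, 0 = censored."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        flags_at_t = [e for tt, e in zip(times, events) if tt == t]
        n_events = sum(flags_at_t)
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Everyone observed at time t (event or censored) leaves the risk set
        at_risk -= len(flags_at_t)
    return curve

# Hypothetical follow-up in months; the subjects at 3 (one of two) and 8 are censored
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.2f}")
```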
What is hazard ratio?
The relative risk of complications based on comparison of event rates.
What is intention to treat?
Every patient randomized enters the primary analysis, in the group to which they were assigned
What is per-protocol?
Analysis includes only those patients who strictly adhered to the protocol
Identifies effect under ideal conditions
When would you use a forest plot?
To summarize data from multiple papers (e.g., in a meta-analysis) in a single image