Statistics and Study Design Flashcards by Ashley Kim

What are descriptive studies vs analytical studies?

Descriptive studies: describe characteristics of a population/phenomenon (descriptive surveys, case reports, cross-sectional studies)

Analytical studies: test hypotheses, determine associations (observational - survey, cohort, case-control vs experimental - RCT)
- Analytical studies employ inferential statistical tools

Data Types
- Nominal
- Ordinal
- Interval
- Ratio scale

Nominal - includes dichotomous/binary (pregnancy yes/no) and categorical

Ordinal - ranked (e.g. OHSS severity)

Interval - units with linear relationship to each other but NO absolute zero (e.g. temperature in Celsius or Fahrenheit)

Ratio scale - has a true (absolute) zero, so ratios between values are meaningful (e.g. Kelvin temperature scale, weight in lbs, age)

Line charts are useful for…
Bar charts are useful for…
Pie charts are useful for…
Scatter plots are useful for…

Line charts - Seeing trends over time
Bar charts - Categories
Pie charts - Showing parts of a whole
Scatter plots - Showing the relationship between two variables and how the data are distributed

What defines a normal distribution?

Symmetric, bell-shaped distribution described by its mean and SD
Observations are independent with “thin” tails (few extremes)

Central limit theorem

Sums (or means) of many independent observations will converge towards a normal distribution even if their underlying distributions are not normal
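
A small simulation can make the theorem concrete. The sketch below sums draws from a uniform distribution (clearly not normal) and checks what fraction of the sums land within 1 SD of their mean; a normal distribution would give roughly 68%. The seed, the number of terms, and the number of repetitions are arbitrary choices made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable (arbitrary choice)


def sum_of_uniforms(n_terms):
    """Sum n_terms independent draws from Uniform(0, 1)."""
    return sum(random.random() for _ in range(n_terms))


# 10,000 sums, each of 30 uniform draws (both numbers are illustrative)
sums = [sum_of_uniforms(30) for _ in range(10_000)]

mean_of_sums = statistics.fmean(sums)
sd_of_sums = statistics.pstdev(sums)

# Fraction of sums within 1 SD of the mean; a normal distribution gives ~0.68
within_1_sd = sum(abs(s - mean_of_sums) <= sd_of_sums for s in sums) / len(sums)

print(round(mean_of_sums, 2), round(sd_of_sums, 2), round(within_1_sd, 2))
```

Each individual draw is flat between 0 and 1, yet the sums cluster around 15 in a bell shape.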

Define SD in a normal distribution

Percentage of observations falling within each distance of the mean:
Within 1 SD - 68.3%
Within 2 SD - 95.4%
Within 3 SD - 99.7%
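
The coverage figures can be checked directly from the normal distribution. A minimal sketch, using the standard identity that the probability of falling within k SD of the mean is erf(k / sqrt(2)); the function name is made up for this example.

```python
import math


def coverage_within(k):
    """Probability that a normal variable falls within k SD of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {coverage_within(k):.1%}")
```

Running it gives values close to 68.3%, 95.4%, and 99.7%.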

Scope of data collection
- Census
- Sample or “study population”

Census - data compiled about the total population (e.g. SART)

Sample - process of drawing sample contains an element of randomness and may not truly represent the overall population

law of large numbers

As the sample size grows, sample statistics (e.g. the sample mean) converge towards the true population values, so a large enough sample approximates the distribution of the population from which it was drawn

Type 1 error vs Type 2 error

Type 1: null hypothesis is falsely rejected (false positive), alpha “arrogant”

Type 2: null hypothesis fails to be rejected and an actual relationship between populations is missed (false negative), beta “bashful”

Statistical calculations:
- Sensitivity
- Specificity
- PPV
- NPV
- Accuracy
- Precision
- ROC curve

SENSITIVITY (true positive rate): test’s ability to correctly identify those with the condition
- TP / (TP + FN), where TP + FN = everyone who truly has the condition

SPECIFICITY (true negative rate): test’s ability to correctly identify those without the condition
- TN / (TN + FP), where TN + FP = everyone who truly does not have the condition

PPV: TP / (TP + FP), where TP + FP = total test positives; probability that a positive test accurately indicates presence of the condition

NPV: TN / (TN + FN), where TN + FN = total test negatives; probability that a negative test accurately indicates absence of the condition

ACCURACY = the proportion of all test results that are correctly identified, both positive and negative. Represents the overall effectiveness of the test across all outcomes. Also, how close is the measured value to the true value/standard?
- (TP + TN) / total population

PRECISION = the proportion of positive identifications that were correct (i.e. PPV). Also, the consistency of repeated measurements, indicating how close the measurements are to each other, regardless of their proximity to the actual/real value.

ROC curve: (plot of true positive rate as a function of false positive rate)
- Random classifier is a straight diagonal line
- A better test is shifted up and to the left
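
The formulas above can be worked through on a single 2x2 table. The sketch below is illustrative only: the function name and the counts (90 true positives, 30 false positives, 10 false negatives, 870 true negatives) are made up.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute the standard test-performance measures from 2x2 counts."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # precision
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
    }


# Hypothetical screening test applied to 1,000 people
metrics = diagnostic_metrics(tp=90, fp=30, fn=10, tn=870)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the test has high sensitivity (0.90) and specificity (about 0.97), yet the PPV is only 0.75 because the condition is uncommon in the sample.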

Measurement error types

Random (noise)
Systematic (bias)
Missing data (either random or bias)
Censoring (how was data excluded from analysis?)

p-value

Probability of observing results at least as extreme as those seen, assuming the null hypothesis is true

Lower p-value (<0.05 typically) indicates observed data are unlikely under the null hypothesis, indicating significant evidence against it
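
As a rough illustration, a two-sided p-value can be computed from a z statistic with the standard library. This sketch assumes the test statistic follows a standard normal distribution under the null hypothesis; the function name is hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """P(|Z| >= |z|) when Z is standard normal under the null hypothesis."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


p = two_sided_p_value(1.96)
print(round(p, 3))
```

A z statistic of about 1.96 sits right at the conventional 0.05 threshold; larger absolute values give smaller p-values.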

Statistical power
- Factors influencing?
- Relationship to type 2 error?

Probability that a test will correctly reject a false null hypothesis (detect a true effect)

Factors influencing power:
- Significance level (alpha)
- Sample size
- Effect size
- Variance

HIGH power REDUCES risk of type 2 error

POWER (typically 0.8) = 1 - beta (beta = likelihood of type 2 error, typically 0.2)
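
How these factors interact can be sketched with a normal approximation for comparing two group means. This is a simplified illustration, not a substitute for a proper power calculation: it assumes a two-sided z-test with equal group sizes, takes the effect size as a standardized difference (difference in means / SD), and ignores the negligible chance of rejecting in the wrong direction. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Expected size of the test statistic if the effect is real
    expected_z = effect_size * sqrt(n_per_group / 2)
    return NormalDist().cdf(expected_z - z_crit)


power = approx_power(effect_size=0.5, n_per_group=64)
beta = 1 - power
print(round(power, 2), round(beta, 2))
```

With a standardized effect of 0.5 and 64 subjects per group, power comes out near the conventional 0.8. Increasing the sample size or the effect size raises it; a stricter alpha lowers it.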

Power calculation considerations
- For non-inferiority trials
- For superiority trials

Must be done a priori (prior to study), not post-hoc (after study)

Cannot do power calculation without prior info (effect size, variance) (e.g. pilot study)

Non-inferiority vs superiority design will affect power calculations:

Non-inferiority trials are used in lieu of “equivalence” trials (proving true equivalence would need infinite subjects): the non-inferiority margin is typically pre-specified as a bound that the 95% CI for the difference in outcomes between groups must not cross
Superiority trials are typically placebo-controlled and require fewer subjects

Standard deviation equation

Variance =

Standard error =

SD = square root [sum ((difference between each observation and the mean) ^2) / N], where N is the size of the population (for a sample, divide by N - 1 instead)

Variance = SD ^2

Standard error = SD / square root (N)

*Appropriate for normal distributions, but not for skewed data
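
A short worked example of the three formulas, using a small made-up data set and the population form of the SD (dividing by N):

```python
import math
import statistics

observations = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data
n = len(observations)
mean = statistics.fmean(observations)

# SD: square root of the average squared distance from the mean
sd = math.sqrt(sum((x - mean) ** 2 for x in observations) / n)
variance = sd ** 2
standard_error = sd / math.sqrt(n)

print(mean, sd, round(variance, 3), round(standard_error, 3))
```

Here the mean is 5 and the SD is 2. The standard library's `statistics.pstdev` gives the same population SD, while `statistics.stdev` divides by N - 1 and returns a slightly larger value.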

What is a limitation of observational studies?

Higher risk of confounding variables

What is a limitation of experimental studies?

May not always replicate real-world conditions

Effect size

Clinical vs statistical significance
Cannot generally be estimated precisely unless population studied is very LARGE and statistical power is very HIGH
Rough estimate, frequently represented as a 95% CI
Overestimation of effect size is more likely than underestimation

Overestimation of effect size ___ as power decreases

increases

Confidence interval

Acknowledges that the sample mean (considered a “point estimate”) will differ from the true mean of the population

Gives a range that is expected to contain the actual population mean at a stated confidence level (e.g. 95%)

Error can be reduced by increasing sample size

The higher the confidence level (99% vs 90%), the broader the interval

Describes the precision of the estimate rather than the spread of individual observations, although its width is calculated from the standard error. Assumes the sampling distribution will approximate a normal distribution
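
A minimal sketch of how such an interval can be computed, assuming the normal approximation (mean plus or minus z times the standard error). For small samples a t-based interval would normally be preferred; the data and function name are made up for illustration.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation CI for the mean: mean +/- z * SE."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))  # sample SD (N - 1)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * se, mean + z * se


sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 4.9, 5.2, 4.8]  # made-up values
ci_90 = mean_confidence_interval(sample, 0.90)
ci_99 = mean_confidence_interval(sample, 0.99)
print(ci_90, ci_99)
```

The 99% interval is wider than the 90% interval around the same point estimate, and both shrink as the sample size grows because the standard error falls.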

Relative risk (risk ratio)

Probability of outcome in an exposed group divided by the probability of outcome in an UNexposed group

Measures association between exposure and outcome

RR>1: outcome is increased by exposure
RR=1: no association between exposure and outcome
RR<1: exposure is protective factor
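
A worked example with made-up cohort counts (30 of 100 exposed subjects and 10 of 100 unexposed subjects develop the outcome); the function name is hypothetical.

```python
def relative_risk(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    return risk_exposed / risk_unexposed


rr = relative_risk(30, 100, 10, 100)
print(round(rr, 2))
```

The risks are 0.30 and 0.10, so the relative risk is about 3: the outcome is roughly three times as likely in the exposed group.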

Odds ratio

Quantifies strength of association between 2 events (A and B)

Ratio of the odds of A in the presence of B : odds of A in the absence of B

Which statistic is used in case-control studies?

Odds ratio

The __ approximates the relative risk when the outcome is rare

Odds ratio
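
The rare-outcome approximation can be seen by computing both measures on two hypothetical 2x2 tables, one with a rare outcome and one with a common outcome. The counts and function names are made up; a, b are exposed subjects with/without the outcome and c, d are unexposed subjects with/without the outcome.

```python
def odds_ratio(a, b, c, d):
    """Odds of outcome in the exposed divided by odds in the unexposed."""
    return (a / b) / (c / d)


def risk_ratio(a, b, c, d):
    """Risk of outcome in the exposed divided by risk in the unexposed."""
    return (a / (a + b)) / (c / (c + d))


rare = (2, 998, 1, 999)     # outcome in 0.2% vs 0.1% of subjects
common = (60, 40, 30, 70)   # outcome in 60% vs 30% of subjects

for label, table in (("rare", rare), ("common", common)):
    print(label, round(odds_ratio(*table), 3), round(risk_ratio(*table), 3))
```

In the rare table the odds ratio (about 2.002) is almost identical to the relative risk (2.0). In the common table the relative risk is still 2.0, but the odds ratio is 3.5, so it overstates the relative risk.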

What are the advantages/disadvantages of applying parametric statistics?

Parametric statistics: data drawn from specific probability distribution (e.g. normal distribution)
- Examples: t-test, ANOVA, regression
- Advantage: provides estimates of mean, variance (population parameters)
- Disadvantages: unreliable if assumptions are violated; not suitable for all types of data

t-test can be used to determine...
- One-sample t-test
- Independent samples t-test
- Paired t-test

If the means of 2 sets of data are significantly different from each other

One-sample t-test: compares mean of a single sample to a known mean (or expected value) to see if the sample mean significantly differs from the known mean

Independent samples t-test: compares the means of 2 independent groups to determine if there is a statistically significant difference between them

Paired t-test aka dependent t-test: compares means of 2 dependent groups, e.g. before/after intervention
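
The three variants differ only in how the t statistic is built. The sketch below computes the statistics by hand on made-up before/after measurements; it stops short of the p-value, which comes from comparing t against a t-distribution (a statistics library is normally used for that step). The function names are hypothetical, and the independent-samples version shown is Welch's form, which does not assume equal variances.

```python
import math
import statistics


def one_sample_t(sample, known_mean):
    """t = (sample mean - known mean) / standard error of the mean."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.fmean(sample) - known_mean) / se


def independent_t(group_a, group_b):
    """Welch's t statistic for two independent groups."""
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / se


def paired_t(before, after):
    """One-sample t-test on the within-subject differences."""
    differences = [a - b for a, b in zip(after, before)]
    return one_sample_t(differences, 0)


before = [10, 12, 9, 11, 13]   # made-up measurements
after = [12, 14, 10, 13, 16]

print(round(one_sample_t(before, 10), 3))
print(round(independent_t(after, before), 3))
print(round(paired_t(before, after), 3))
```

On the same data the paired statistic (about 6.3) is much larger than the independent one (about 1.6), because pairing removes the between-subject variation.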

What is a t-distribution? When is it used?

Used when the population SD is UNKNOWN and estimated from the sample
Similar to normal distribution, but with fatter tails
Exact shape depends on sample size and degrees of freedom
Becomes similar to normal distribution (at > 30 observations)

z statistic
- What is it?
- When can it be used?
- What is it used for?

Measures the number of SD a data point or sample mean is from the population mean
Can be used when the population distribution is normal and the population SD is known (if unknown, use t-statistic)
Used in statistical hypothesis testing and CI estimation

What is non-parametric statistics?

Branch of statistics not solely based on parameterized distributions (described by parameters such as mean, variance)
Non-parametric statistics: either distribution-free or based on a distribution with unspecified parameters (used for non-normal distributions, or when parametric assumptions are violated)
Includes descriptive statistics and statistical inference (e.g. for categorical, ordinal data)

Non-parametric tests

- Wilcoxon rank-sum test
- Wilcoxon signed-rank test
- Mann-Whitney U test
- Kruskal-Wallis test

Chi-square test
Fisher's exact test

Both compare categorical variables
Fisher's exact test: used with small sample sizes, 2x2 tables, uses OR
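
A sketch of the chi-square calculation for a 2x2 table, written out by hand to show the observed-versus-expected logic. It assumes the Pearson statistic without continuity correction; the p-value line uses the identity that a chi-square variable with 1 degree of freedom is a squared standard normal. The counts and function name are made up.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 df) for a 2x2 table.

    Rows are groups, columns are outcome yes/no: [[a, b], [c, d]].
    """
    n = a + b + c + d
    observed = [a, b, c, d]
    # Expected count = row total * column total / grand total
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


statistic, p_value = chi_square_2x2(30, 70, 10, 90)
print(round(statistic, 2), p_value)
```

With 30/100 events in one group and 10/100 in the other, the statistic is 12.5 and the p-value is well below 0.05. When expected counts are small, Fisher's exact test is the usual alternative.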

What test is used for multiple comparison testing?

ANOVA (parametric): compares multiple groups to each other
Only indicates that a significant difference between groups exists (or does not exist); does not reveal which groups differ (post-hoc pairwise tests are needed for that)

What is considered "significant" correlation?

Correlation coefficient > 0.3

Regression

Examines relationship between 1+ independent variable & dependent variable (how y changes based on x)
Best fit line/curve
- Simple linear - single independent and single dependent variable
- Multiple linear - multiple independent variables
- Non-linear (includes logistic)
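
For the simple linear case, the best fit line can be found with ordinary least squares. The sketch below uses made-up data and a hypothetical function name; it fits y = intercept + slope * x by minimizing squared vertical distances.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    # Slope = covariance of x and y divided by variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4, 5]                 # independent variable (made up)
ys = [2.1, 3.9, 6.2, 7.8, 10.1]      # dependent variable (made up)

slope, intercept = fit_line(xs, ys)
print(round(slope, 2), round(intercept, 2))
```

Here the slope is about 1.99, meaning y rises by roughly 2 for each one-unit increase in x.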

Logistic regression

A type of non-linear regression
Dependent variable is categorical/binary
"Adjusted" odds ratio (adjusts for specific confounding variables)
