Statistics and Study Design Flashcards

1
Q

What are descriptive studies vs analytical studies?

A

Descriptive studies: describe characteristics of a population/phenomenon (descriptive surveys, case reports, cross-sectional studies)

Analytical studies: test hypotheses, determine associations (observational - survey, cohort, case-control vs experimental - RCT)
- Analytical studies employ inferential statistical tools

2
Q

Data Types
- Nominal
- Ordinal
- Interval
- Ratio scale

A

Nominal - unordered categories, including dichotomous/binary (e.g. pregnancy yes/no)

Ordinal - ranked categories (e.g. OHSS severity)

Interval - equal spacing between units but NO absolute zero, so ratios are not meaningful (e.g. temperature in Celsius or Fahrenheit)

Ratio scale - has an absolute zero, so ratios are meaningful (e.g. Kelvin temperature scale, weight in lbs, age)

3
Q

Line charts are useful for…
Bar charts are useful for…
Pie charts are useful for…
Scatter plots are useful for…

A

Line charts - Seeing trends over time
Bar charts - Comparing values across categories
Pie charts - Showing parts of a whole
Scatter plots - Showing the relationship between two variables (and how the data points are distributed)

4
Q

What defines a normal distribution?

A

Symmetric, bell-shaped distribution described by its mean and SD (mean = median = mode). Observations are independent, with “thin” tails (few extremes)

5
Q

Central limit theorem

A

Sums (or means) of many independent observations converge towards a normal distribution as the sample size grows, even if the underlying distributions are not normal
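
The convergence can be seen in a small simulation. This is only a sketch: the exponential distribution, the seed, the trial counts, and the helper names `sample_means` and `skewness` are all assumptions chosen for illustration. Skewness is used as a rough gauge of non-normality, since a normal distribution has skewness 0.

```python
import random
import statistics

def sample_means(n, trials, rng):
    """Return `trials` sample means, each from n draws of a skewed exponential(1) variable."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(trials)]

def skewness(values):
    """Population skewness: 0 for a symmetric (e.g. normal) distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (1, 5, 50):
    means = sample_means(n, 5000, rng)
    print(f"n={n:>2}: skewness of sample means = {skewness(means):.2f}")
```

The skewness of the sample means should shrink towards 0 as n grows, even though each individual draw comes from a strongly skewed distribution.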

6
Q

Define SD in a normal distribution

A

Within ±1 SD of the mean - 68.3% of observations
Within ±2 SD - 95.4%
Within ±3 SD - 99.7%
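
These percentages follow from the normal cumulative distribution and can be recomputed with the error function. A minimal sketch; the helper name `coverage_within` is an assumption made for this example.

```python
import math

def coverage_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {coverage_within(k):.2%}")
```

This gives about 68.27%, 95.45%, and 99.73% for 1, 2, and 3 SD respectively.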

7
Q

Scope of data collection
- Census
- Sample or “study population”

A

Census - data compiled about the total population (e.g. SART)

Sample - the process of drawing a sample contains an element of randomness, so the sample may not truly represent the overall population

8
Q

law of large numbers

A

As the sample size grows, the sample mean (and the distribution of the sample more generally) converges on that of the population from which it was drawn

9
Q

Type 1 error vs Type 2 error

A

Type 1: null hypothesis is falsely rejected (false positive), alpha “arrogant”

Type 2: null hypothesis fails to be rejected and an actual relationship between populations is missed (false negative), beta “bashful”

10
Q

Statistical calculations:
- Sensitivity
- Specificity
- PPV
- NPV
- Accuracy
- Precision
- ROC curve

A

SENSITIVITY (true positive rate): test’s ability to correctly identify those with the condition
- TP / (TP + FN), where TP + FN = everyone who has the condition

SPECIFICITY (true negative rate): test’s ability to correctly identify those without the condition
- TN / (TN + FP), where TN + FP = everyone who does not have the condition

PPV: TP / (TP + FP), where TP + FP = total test positives; the probability that a positive test accurately indicates presence of the condition

NPV: TN / (TN + FN), where TN + FN = total test negatives; the probability that a negative test accurately indicates absence of the condition

ACCURACY = the proportion of all test results that are correctly identified, both positive and negative. Represents the overall effectiveness of the test across all outcomes. Also, how close is the measured value to the true value/standard?
- (TP + TN) / total population

PRECISION = the proportion of positive identifications that were correct (i.e. PPV). Also, the consistency of repeated measurements, indicating how close the measurements are to each other, regardless of their proximity to the actual/real value.

ROC curve: (plot of true positive rate as a function of false positive rate)
- Random classifier is a straight diagonal line
- A better test is shifted up and to the left
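
The formulas above can be worked through on a single 2x2 table. A minimal sketch: the counts are hypothetical and the function name `diagnostic_metrics` is an assumption for illustration.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute common test-performance measures from 2x2 table counts."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # also called precision
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
    }

# Hypothetical counts: 100 people with the condition, 900 without.
metrics = diagnostic_metrics(tp=90, fp=30, fn=10, tn=870)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, sensitivity is 0.90, specificity about 0.967, PPV 0.75, NPV about 0.989, and accuracy 0.96. Note how the PPV is much lower than the sensitivity because the condition is uncommon in this made-up population.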

11
Q

Measurement error types

A
  • Random (noise)
  • Systematic (bias)
  • Missing data (may be missing at random or in a biased way)
  • Censoring (how was data excluded from analysis?)
12
Q

p-value

A

Probability of observing results at least as extreme as those seen, assuming the null hypothesis is true

A lower p-value (<0.05 typically) indicates the observed data are unlikely under the null hypothesis, providing evidence against it
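
For a test statistic that follows a standard normal distribution, the two-sided p-value is the area in both tails beyond the observed value. A minimal sketch, assuming a z statistic has already been computed; the function name `two_sided_p_from_z` is hypothetical.

```python
from statistics import NormalDist

def two_sided_p_from_z(z):
    """Probability of a standard normal value at least as extreme as z, in either direction."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

for z in (1.0, 1.96, 3.0):
    print(f"z = {z}: p = {two_sided_p_from_z(z):.4f}")
```

A z of 1.96 corresponds to p of about 0.05, which is where the conventional threshold comes from.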

13
Q

Statistical power
- Factors influencing?
- Relationship to type 2 error?

A

Probability that a test will correctly reject a false null hypothesis (detect a true effect)

Factors influencing power:
- Significance level (alpha)
- Sample size
- Effect size
- Variance

HIGH power REDUCES risk of type 2 error

POWER (typically 0.8) = 1 - beta (beta = likelihood of type 2 error, typically 0.2)
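
How these factors interact can be sketched with a normal approximation for comparing two group means. This is an illustrative approximation only, assuming equal group sizes, a known common SD, and a two-sided test; the function name `approx_power` and the example numbers are assumptions.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test with equal group sizes.

    Ignores the negligible chance of rejecting in the wrong direction.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se_diff = sd * sqrt(2 / n_per_group)  # standard error of the difference in means
    return NormalDist().cdf(effect / se_diff - z_crit)

# Hypothetical scenario: true difference of 0.5 SD between groups.
for n in (20, 64, 150):
    print(f"n per group = {n}: power = {approx_power(0.5, 1.0, n):.2f}")
```

With a difference of half an SD, roughly 64 subjects per group gives power close to 0.8. Power rises with a larger sample, a larger effect, or a larger alpha, and falls as variance grows.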

14
Q

Power calculation considerations
- For non-inferiority trials
- For superiority trials

A

Must be done a priori (prior to study), not post-hoc (after study)

Cannot do a power calculation without prior information (expected effect size, variance), e.g. from a pilot study

Non-inferiority vs superiority design will affect power calculations:

  • Non-inferiority trials are used in lieu of “equivalence” trials (demonstrating exact equivalence would need infinite subjects): a non-inferiority margin is prespecified, and non-inferiority is typically concluded when the bound of the 95% CI for the difference in outcomes between groups stays within that margin
  • Superiority trials are typically placebo-controlled and generally require fewer subjects
15
Q

Standard deviation equation

Variance =

Standard error =

A

SD = square root [sum ((difference between each observation and the mean) ^2) / size of population]
- For a sample rather than a whole population, divide by (N - 1) instead of N

Variance = SD ^2

Standard error = SD / square root (N)

*Appropriate for normal distributions, but not for skewed distributions
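
A short worked example of these three quantities, using Python's standard `statistics` module. The data values are hypothetical, and the standard error here uses the sample SD (N - 1 denominator), which is an assumption about the usual practical case.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5

population_sd = statistics.pstdev(data)   # divides by N
sample_sd = statistics.stdev(data)        # divides by N - 1
variance = statistics.pvariance(data)     # population variance = population SD squared
standard_error = sample_sd / math.sqrt(len(data))

print(f"population SD = {population_sd:.3f}")
print(f"sample SD     = {sample_sd:.3f}")
print(f"variance      = {variance:.3f}")
print(f"SE of mean    = {standard_error:.3f}")
```

For this data set the population SD is 2 and the variance is 4; the sample SD is slightly larger (about 2.14), giving a standard error of about 0.76.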

16
Q

What is a limitation of observational studies?

A

Higher risk of confounding variables

17
Q

What is a limitation of experimental studies?

A

May not always replicate real-world conditions

18
Q

Effect size

A
  • Clinical vs statistical significance
  • Cannot generally be estimated precisely unless population studied is very LARGE and statistical power is very HIGH
  • Rough estimate, frequently represented as a 95% CI
  • Overestimation of effect size is more likely than underestimation
19
Q

Overestimation of effect size ___ as power decreases

A

increases

20
Q

Confidence interval

A

Acknowledges that the mean of the sample (considered a “point estimate”) will generally differ from the true mean of the population

Describes a range of plausible values for the actual population mean: with a 95% CI, about 95% of intervals built this way from repeated samples would contain the true mean

Error can be reduced by increasing sample size

The higher the confidence level (99% vs 90%), the broader the interval

Not itself a measure of variability in the data (unlike SD or variance), although its width is built from the standard error. Assumes the sampling distribution of the mean approximates a normal distribution
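
A minimal sketch of a confidence interval for a mean, using the normal approximation (mean ± z × standard error). The data are hypothetical and the function name `mean_confidence_interval` is an assumption; with a sample this small, a t critical value would normally be used instead of z.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(data, level=0.95):
    """Normal-approximation CI for the mean: mean +/- z * (SD / sqrt(n))."""
    mean = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(len(data))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for 95%
    return mean - z * se, mean + z * se

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
for level in (0.90, 0.95, 0.99):
    low, high = mean_confidence_interval(data, level)
    print(f"{level:.0%} CI: {low:.2f} to {high:.2f}")
```

The interval widens as the confidence level rises, and it narrows as the sample size grows because the standard error shrinks.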

21
Q

Relative risk (risk ratio)

A

Probability of outcome in an exposed group divided by the probability of outcome in an UNexposed group

Measures the association between exposure and outcome

RR>1: exposure is associated with increased risk of the outcome
RR=1: no association
RR<1: exposure is a protective factor

22
Q

Odds ratio

A

Quantifies strength of association between 2 events (A and B)

Ratio of the odds of A in the presence of B : odds of A in the absence of B

23
Q

Which statistic is used in case-control studies?

A

Odds ratio

24
Q

The __ approximates the relative risk when the outcome is rare

A

Odds ratio
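
The rare-outcome approximation can be checked numerically by computing both measures from the same counts. A minimal sketch: the counts are hypothetical and the function names `relative_risk` and `odds_ratio` are assumptions for illustration.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)

def odds_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Odds of the outcome with exposure divided by odds without exposure."""
    odds_exposed = exposed_cases / (exposed_total - exposed_cases)
    odds_unexposed = unexposed_cases / (unexposed_total - unexposed_cases)
    return odds_exposed / odds_unexposed

rare = (10, 1000, 5, 1000)       # hypothetical: outcome in 1% vs 0.5%
common = (400, 1000, 200, 1000)  # hypothetical: outcome in 40% vs 20%

for label, counts in (("rare", rare), ("common", common)):
    print(f"{label}: RR = {relative_risk(*counts):.2f}, OR = {odds_ratio(*counts):.2f}")
```

In both scenarios the relative risk is 2. The odds ratio is about 2.01 when the outcome is rare but about 2.67 when it is common, so the odds ratio overstates the relative risk as the outcome becomes more frequent.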

25
Q

What are the advantages/disadvantages of applying parametric statistics?

A

Parametric statistics: assume the data are drawn from a specific probability distribution (e.g. normal distribution)

e.g. t-test, ANOVA, regression

Advantage: provides estimates of population parameters (mean, variance)

Disadvantages: unreliable if assumptions are violated; not suitable for all types of data

26
Q

t-test can be used to determine…

one-sample t-test
independent samples t-test
paired t-test

A

If the means of 2 sets of data are significantly different from each other

one-sample t-test: compares mean of a single sample to a known mean (or expected value) to see if the sample mean significantly differs from the known mean

independent samples t-test: compares the means of 2 independent groups to determine if there is a statistically significant difference between them

paired t-test aka dependent t-test: compares means of 2 dependent groups, e.g. before/after intervention
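
As one worked case, the paired t-test reduces to a one-sample test on the within-pair differences: t = mean difference / (SD of differences / square root of n). A minimal sketch with hypothetical before/after measurements; the function name `paired_t_statistic` is an assumption, and the p-value lookup against the t-distribution is left out.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for a paired t-test on matched measurements."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return statistics.fmean(diffs) / se, n - 1

# Hypothetical measurements on the same 5 subjects before and after an intervention.
before = [120, 130, 125, 140, 135]
after = [115, 128, 120, 135, 130]

t, df = paired_t_statistic(before, after)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

Here the mean difference is 4.4 with a standard error of 0.6, giving t of about 7.33 on 4 degrees of freedom.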

27
Q

What is a t-distribution?
When is it used?

A

Used when the population SD is UNKNOWN and estimated from the sample

Similar to normal distribution, but with fatter tails. Exact shape depends on the degrees of freedom, which are set by the sample size

Becomes similar to the normal distribution as the sample size grows (rule of thumb: > 30 observations)

28
Q

z statistic
- What is it?
- When can it be used?
- What is it used for?

A

Measures the number of SD a data point or sample mean is from the population mean

Population distribution is normal
SD should be known
(if unknown, use t-statistic)

Used in statistical hypothesis testing and CI estimation
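
A brief sketch of the two forms of the z statistic, for a single observation and for a sample mean. The function names and the example numbers are hypothetical; the population mean and SD are assumed known.

```python
import math

def z_for_observation(x, population_mean, population_sd):
    """Number of SDs a single data point lies from the population mean."""
    return (x - population_mean) / population_sd

def z_for_sample_mean(sample_mean, population_mean, population_sd, n):
    """Number of standard errors a sample mean lies from the population mean."""
    return (sample_mean - population_mean) / (population_sd / math.sqrt(n))

# Hypothetical population with mean 100 and SD 15.
print(z_for_observation(130, 100, 15))
print(z_for_sample_mean(103, 100, 15, n=100))
```

Both calls give z = 2: a single value 30 points above the mean is as unusual as a mean of 100 observations sitting just 3 points above it, because the standard error shrinks with the square root of n.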

29
Q

What is non-parametric statistics?

A

Branch of statistics that does not rely solely on parameterized distributions (i.e. on parameters such as mean and variance)

Non-parametric methods are either distribution-free or assume a distribution with unspecified parameters (used for non-normal distributions or when parametric assumptions are violated)

Includes descriptive statistics and statistical inference (e.g. for categorical or ordinal data)

30
Q

Non-parametric tests

A
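  • Wilcoxon rank-sum test (2 independent groups; equivalent to the Mann-Whitney U test)
  • Wilcoxon signed-rank test (paired data)
  • Mann-Whitney U test (2 independent groups)
  • Kruskal-Wallis test (3+ independent groups)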
31
Q

Chi-square test
Fisher’s exact test

A

Both compare categorical variables (test for an association between them)

Fisher’s exact test is used with small sample sizes, typically on 2x2 tables; the associated effect measure is the OR
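
The chi-square statistic compares observed counts with the counts expected if the two variables were unrelated. A minimal sketch for a 2x2 table, without continuity correction; the counts are hypothetical and the function name `chi_square_2x2` is an assumption. The p-value step relies on the fact that, with 1 degree of freedom, a chi-square value is the square of a standard normal z.

```python
from statistics import NormalDist

def chi_square_2x2(table):
    """Return (chi-square statistic, p-value) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Valid for 1 degree of freedom only (a 2x2 table).
    p_value = 2 * (1 - NormalDist().cdf(chi2 ** 0.5))
    return chi2, p_value

# Hypothetical counts: rows = treatment A / B, columns = pregnant yes / no.
chi2, p = chi_square_2x2([[20, 30], [30, 20]])
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

Every expected count here is 25, so the statistic is 4.0 and the p-value is about 0.046. When expected counts are small, this approximation becomes unreliable, which is where Fisher's exact test is preferred.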

32
Q

What test is used for multiple comparison testing?

A

ANOVA (parametric): comparing multiple groups to each other

Only indicates that a significant difference between groups exists (or does not exist); does not reveal which groups differ

33
Q

What is considered “significant” correlation?

A

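Rule of thumb: correlation coefficient > 0.3 in absolute value (at least moderate strength); whether a correlation is statistically significant also depends on sample size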
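
Whether a given correlation coefficient reaches statistical significance depends on how many observations it is based on. One way to see this is the standard conversion of a Pearson r into a t statistic with n - 2 degrees of freedom. A minimal sketch; the function name `t_from_r` and the sample sizes are assumptions for illustration.

```python
import math

def t_from_r(r, n):
    """t statistic (n - 2 degrees of freedom) for testing whether a Pearson r differs from 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

for n in (10, 30, 100):
    print(f"r = 0.3, n = {n}: t = {t_from_r(0.3, n):.2f}")
```

The same r of 0.3 gives t of about 0.89 with 10 observations but about 3.11 with 100. The two-sided 5% critical values are roughly 2.31 for 8 degrees of freedom and about 1.98 for 98, so only the larger sample would be called significant.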

34
Q

Regression

A

Examines the relationship between one or more independent variables and a dependent variable (how y changes based on x)

Best fit line/curve

Simple linear - single independent and single dependent variable

Multiple linear - multiple independent variables

Non-linear (includes logistic)
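
For the simple linear case, the best-fit line can be computed directly with ordinary least squares: slope = sum of cross-deviations / sum of squared x-deviations, and the line passes through the point of means. A minimal sketch with hypothetical data; the function name `fit_line` is an assumption.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
print(f"y = {intercept:.1f} + {slope:.1f} * x")
```

For this data the fitted line is y = 2.2 + 0.6x, meaning y rises by 0.6 on average for each one-unit increase in x.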

35
Q

Logistic regression

A

A type of non-linear regression

Dependent variable is categorical/binary

Yields an “adjusted” odds ratio (adjusted for the specific confounding variables included in the model)
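
The link between a logistic regression coefficient and an odds ratio can be shown with a small calculation: the model predicts log-odds, so exponentiating a coefficient gives the odds ratio for a one-unit increase in that variable. A minimal sketch; the intercept and coefficient values are hypothetical rather than fitted to any data, and the function names are assumptions.

```python
import math

def predicted_probability(intercept, coefficient, x):
    """Logistic model: probability of the outcome for a given value of x."""
    log_odds = intercept + coefficient * x
    return 1 / (1 + math.exp(-log_odds))

def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

b0, b1 = -2.0, 0.7  # hypothetical intercept and coefficient

p_unexposed = predicted_probability(b0, b1, 0)
p_exposed = predicted_probability(b0, b1, 1)

print(f"odds ratio from probabilities: {odds(p_exposed) / odds(p_unexposed):.3f}")
print(f"exp(coefficient):              {math.exp(b1):.3f}")
```

Both lines print the same value (about 2.01), which is why logistic regression output is usually reported as odds ratios.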