Stats Flashcards

1
Q

Why do we need statistics?

A

Statistics allows us to draw conclusions about the general population from a sample.

2
Q

The Central Limit Theorem

A

The sampling distribution of the mean of independent, random observations drawn from any population (with finite variance) will be normal or nearly normal if the sample size is large enough.
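
A minimal simulation sketch of this idea, assuming a skewed (exponential) population with mean 1 and SD 1. The sample size, number of samples, and seed are arbitrary choices for illustration, not values from the card.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 50     # observations per sample (assumed value)
NUM_SAMPLES = 2000   # number of repeated samples (assumed value)

# Draw repeated samples from a skewed population and keep each sample's mean.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Theory: the sample means cluster around 1 with SD close to 1/sqrt(50).
print(f"mean of sample means: {mean_of_means:.3f}")
print(f"SD of sample means:   {sd_of_means:.3f}")
```

Even though the population is strongly skewed, a histogram of `sample_means` should look roughly bell shaped.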

3
Q

The Gaussian Distribution

A

Mean +/- 1 SD - about 68% of values
Mean +/- 2 SD - about 95% of values
Mean +/- 3 SD - about 99.7% of values
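
These percentages can be checked from the normal distribution itself. The sketch below uses the standard identity P(|Z| < k) = erf(k / sqrt(2)); the function name `fraction_within` is made up for illustration.

```python
import math


def fraction_within(k_sd: float) -> float:
    """Fraction of a normal distribution within k_sd standard deviations of the mean."""
    return math.erf(k_sd / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k) * 100:.1f}%")
```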

4
Q

Parametric Statistics

A

A class of statistical procedures that rely on assumptions about the shape of the distribution (assume normal) in the underlying population and about the form or parameters (means, SD) of the assumed distribution.

5
Q

Non-Parametric

A

A class of statistical procedures that do not rely on assumptions about the shape or form of the probability distribution from which the data were drawn.

6
Q

Measures of Location

A

Mean
Median
Mode

7
Q

Measures of Dispersion

A
Range
Variance
Standard Deviation
Standard Error
Confidence Interval
8
Q

Left Skewed (Negative)

A

Mean < Median < Mode (the long left tail pulls the mean down)

9
Q

Right Skewed (Positive)

A

Mode < Median < Mean (the long right tail pulls the mean up)

10
Q

Boxplots

A

Largest observed value that is not an outlier
75th percentile
Median
25th percentile
Smallest observed value that is not an outlier

11
Q

Range

A

The difference between the largest and smallest sample values. (Without outliers)

12
Q

Variance

A

The average of the squared distance of each value from the mean (dividing by N-1 gives the sample variance)
S^2 = [Sigma(X-M)^2] / [N-1]

Reliable but not user friendly so not often reported.
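
A short worked sketch of the formula S^2 = [Sigma(X-M)^2] / [N-1], using made-up values. The square root of the result is the standard deviation.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample

mean = sum(values) / len(values)                 # M
sum_sq = sum((x - mean) ** 2 for x in values)    # Sigma(X - M)^2
variance = sum_sq / (len(values) - 1)            # divide by N - 1
sd = math.sqrt(variance)                         # standard deviation

print(f"variance = {variance:.3f}, SD = {sd:.3f}")

# The standard library computes the same sample variance.
assert math.isclose(variance, statistics.variance(values))
```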

13
Q

Standard Deviation

A

Tells you how tightly the values in a sample are clustered around the mean.
Square root of the variance

14
Q

Standard Error

A

Estimate of how far the sample mean is likely to be from the population mean.

SEM = SD/sqrt(N)
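
A minimal sketch of SEM = SD/sqrt(N) on made-up values, using the sample SD from Python's `statistics` module.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample

sd = statistics.stdev(values)        # sample SD (N - 1 in the denominator)
sem = sd / math.sqrt(len(values))    # SEM = SD / sqrt(N)

print(f"SD = {sd:.3f}, SEM = {sem:.3f}")
```

Because N sits under a square root, quadrupling the sample size only halves the SEM.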

15
Q

When to use SD vs SEM?

A

SD - If the scatter is caused by biological variability and you want to show that variability.

SEM - If the variability is caused by experimental imprecision and you want to show the precision of the calculated mean.

16
Q

Confidence Interval

A

An estimate of the range that is likely to contain the true population mean.

CI = X +/- (SEM x Z)

X: sample mean ; Z: 1.96, critical value of the normal distribution for a 95% CI

If the CI for a difference between groups includes zero, the difference is not statistically significant.
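
A small sketch of CI = X +/- (SEM x Z) for a 95% interval. The sample mean, SD, and N below are hypothetical numbers chosen for easy arithmetic.

```python
import math

sample_mean = 120.0   # X, hypothetical sample mean
sd = 15.0             # hypothetical sample SD
n = 100               # hypothetical sample size
Z_95 = 1.96           # critical value for a 95% CI under the normal distribution

sem = sd / math.sqrt(n)     # SEM = SD / sqrt(N)
margin = Z_95 * sem         # SEM x Z
lower = sample_mean - margin
upper = sample_mean + margin

print(f"95% CI: {lower:.2f} to {upper:.2f}")
```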

17
Q

Outliers

A

Grubbs' Test
Z = |mean - value| / SD
If Z is greater than Z tab, the value is an outlier. If Z is less than Z tab, you must keep it.
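
A sketch of this check on made-up data. The critical value `Z_TAB` is an assumption, taken as the commonly tabulated two-sided value for N = 8 at alpha = 0.05; verify it against your own table before relying on it.

```python
import statistics

values = [10, 11, 9, 10, 12, 10, 11, 25]  # made-up data; 25 is the suspect value
Z_TAB = 2.126  # assumed tabulated critical value for N = 8, alpha = 0.05

mean = statistics.fmean(values)
sd = statistics.stdev(values)

# Test the value that lies farthest from the mean.
suspect = max(values, key=lambda v: abs(v - mean))
z = abs(mean - suspect) / sd
is_outlier = z > Z_TAB

print(f"suspect = {suspect}, Z = {z:.2f}, outlier: {is_outlier}")
```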

18
Q

Precision

A

The degree to which repeated measurements under unchanged conditions show the same results. High precision lowers SD.

19
Q

Accuracy

A

The degree of closeness of measurements of a quantity to that quantity’s true value. High accuracy reflects the true population mean.

20
Q

Repeatability

A

Same as precision (if you can repeat it)

21
Q

Reproducibility

A

The ability of an entire experiment or study to be duplicated, either by the same researcher or by someone else working independently.

22
Q

Sources of Variability

A

Random error

Systematic error

23
Q

Null Hypothesis (Ho)

A

States that there is NO difference between groups. A study is designed to disprove this assertion by testing for a statistically significant difference between A and B (the claim that a difference exists is called the alternative hypothesis).

Presumed true until statistical evidence indicates otherwise.

Ho: u1 = u2

24
Q

Alternative Hypothesis (Ha)

A

There IS a treatment difference between groups. If you reject Ho, you are accepting the alternative hypothesis.

25
Q

Type I Error

A

alpha
Occurs when Ho is true, but it is rejected in error.
False positive.
When a p-value is

26
Q

Type II Error

A

beta
Occurs when Ho is false, yet it is not rejected (accepted in error).
False negative.
Power = 1 - beta

27
Q

z - statistic

A

A z-test is any statistical test for which the distribution of the test statistic can be approximated by a normal distribution. Because of the central limit theorem, many test stats are approximately normally distributed for large samples (n>30).

z = (x-u)/ [SD/sqrt(n)]

Then you compare it to a z table of critical values and find the probability of getting a value greater than or equal to the calculated z.
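
A minimal sketch of z = (x-u) / [SD/sqrt(n)] with hypothetical numbers. Instead of a printed z table, it uses `statistics.NormalDist` from the standard library to get the upper-tail probability.

```python
import math
from statistics import NormalDist

sample_mean = 105.0       # x, hypothetical sample mean
population_mean = 100.0   # u, hypothetical population mean
sd = 15.0                 # hypothetical SD
n = 36                    # hypothetical sample size

z = (sample_mean - population_mean) / (sd / math.sqrt(n))

# Probability of a z value at least this large under the standard normal.
p_one_sided = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, one-sided p = {p_one_sided:.4f}")
```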

28
Q

t - statistic

A

Similar to z-test.

Use when n

29
Q

p - value

A

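The probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true (often loosely described as the probability that the result was due to chance).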

30
Q

Power

A

The probability that the test will reject the null hypothesis when the null hypothesis is false (avoiding type II error).

Power = 1 - beta

A higher statistical power means that a true effect is more likely to be detected, i.e. a false null hypothesis is more likely to be rejected.

31
Q

Student t - test

A

N

If the calculated t is greater than ttab, then reject Ho and conclude that the sample means are significantly different.

32
Q

Degrees of Freedom

A

If you have an N of 4, you have 3 degrees of freedom
(N-1)

For an unpaired t-test with two groups of N each: df = 2N-2

33
Q

Paired t-test

A

The observed data are from the same subject or from a matched subject and are drawn from a population with a normal distribution.

34
Q

Unpaired t-test

A

The observed data are from two independent, random samples drawn from populations with a normal distribution.
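
A sketch of an unpaired (pooled-variance) t-test on two made-up groups. It assumes equal variances, and `T_TAB` is the tabulated two-tailed critical value for df = 8 at alpha = 0.05. Note that df = n_a + n_b - 2, which is 2N-2 when both groups have N values.

```python
import math
import statistics

group_a = [5, 6, 7, 8, 9]   # made-up data
group_b = [1, 2, 3, 4, 5]   # made-up data
T_TAB = 2.306               # two-tailed critical t for df = 8, alpha = 0.05

n_a, n_b = len(group_a), len(group_b)
df = n_a + n_b - 2

# Pooled variance: weighted average of the two sample variances.
pooled_var = (
    (n_a - 1) * statistics.variance(group_a)
    + (n_b - 1) * statistics.variance(group_b)
) / df

se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_calc = (statistics.fmean(group_a) - statistics.fmean(group_b)) / se_diff
reject_h0 = abs(t_calc) > T_TAB

print(f"t = {t_calc:.2f}, df = {df}, reject Ho: {reject_h0}")
```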

35
Q

One tailed t-test

A

Will test either if the mean is significantly greater than x or if less than x, but not both. Provides more power to detect an effect in one direction by not testing the effect in the other direction.

36
Q

Two tailed t-test

A

More robust than 1 tailed

Will test whether the mean is significantly different from x in either direction (greater than or less than).

37
Q

ANOVA

A

ANalysis Of VAriance

  • To compare three or more means
  • We use sum of squares

F = MS(bg) / MS(wg) (the F-statistic: mean square between groups / mean square within groups)
signal:noise ratio
The higher the F value, the more likely you can reject Ho that the means are equal. Compare F to a table of critical values to get the corresponding p-value.
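
A sketch of the F calculation from sums of squares, on three made-up groups. `F_TAB` is the tabulated critical F for df = (2, 6) at alpha = 0.05.

```python
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]   # made-up data, three groups
F_TAB = 5.14                                  # critical F for df = (2, 6), alpha = 0.05

all_values = [x for g in groups for x in g]
grand_mean = sum(all_values) / len(all_values)

# Between-groups sum of squares: group means vs the grand mean.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within-groups sum of squares: each value vs its own group mean.
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

ms_between = ss_between / df_between   # MS(bg)
ms_within = ss_within / df_within      # MS(wg)
f_stat = ms_between / ms_within

print(f"F = {f_stat:.2f}, reject Ho: {f_stat > F_TAB}")
```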

38
Q

One way ANOVA

A

1 measurement variable and 1 nominal variable

Ex: measure glycogen content for multiple samples of heart, liver, kidney, lung, etc.

39
Q

Two way ANOVA

A

1 measurement variable and 2 nominal variables

Ex: measure response to 3 different drugs in men and women. Drug treatment is one factor and gender is the other.

40
Q

Post Hoc analyses

A

A significant ANOVA only tells us that at least one mean differs from the others (at minimum, the smallest and largest means likely differ). What about other means?
- run a post hoc test!

Only used if Ho is rejected.

41
Q

Mann-Whitney U Test

A

Non-parametric alternative to two-sample t-test
- Uses rank of measurement instead of actual measurement

  • Calculate and look up value in table. If calc
42
Q

Pearson Correlation Coefficient (r)

A

Measure of the linear correlation between two variables.
+1 - perfect positive correlation
0 - no linear correlation
-1 - perfect negative correlation
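
A small sketch that computes r by hand on made-up paired data, using the usual definition r = Sxy / sqrt(Sxx * Syy).

```python
import math

x = [1, 2, 3, 4, 5]   # made-up data
y = [2, 4, 5, 4, 5]   # made-up data

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)

r = s_xy / math.sqrt(s_xx * s_yy)

print(f"r = {r:.4f}")
```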

43
Q

Linear Regression

A

Goal is to create a line that minimizes the sum of the squares of the vertical distances of the points from the line.

It assumes that your data is linear, and finds the slope and intercept that make a straight line that best fits your data.
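
A minimal least-squares sketch on made-up data. The slope and intercept below are the closed-form values that minimize the sum of squared vertical distances.

```python
x = [1, 2, 3, 4, 5]   # made-up data
y = [2, 4, 5, 4, 5]   # made-up data

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Closed-form least-squares estimates.
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x

# Sum of squared vertical distances from the fitted line.
residual_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

print(f"y = {intercept:.2f} + {slope:.2f} * x, residual SS = {residual_ss:.2f}")
```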

44
Q

Categorical Data

A

No mean or normal distribution. The mode can be reported; a median only makes sense for ordinal data.

Data divided into groups (Yes/no)

Discrete data (nominal or ordinal)

45
Q

Contingency Table

A

To measure associations between categorical variables

              Cat 1
Cat 2      a        b
           c        d

  • Assumes that all data are independent (each person fits into one box only)
  • Can be any size
46
Q

Chi Square

A
  • Measures the observed frequencies and compares them to the expected.

How to calculate:
1. Expected measures in each box are calculated
- (Total in row * total in column) / TOTAL
2. Calculate Chi Square
X2 = Sum [ (obs - exp)^2 / exp ]
3. Compare to table
- Calculated X2 should be > critical value to reject Ho
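
The three steps above can be sketched as follows for a made-up 2x2 table. `CHI2_TAB` is the tabulated critical value for df = 1 at alpha = 0.05.

```python
observed = [[20, 30], [30, 20]]   # made-up 2x2 contingency table
CHI2_TAB = 3.841                  # critical value for df = 1, alpha = 0.05

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Step 1: expected count = (row total * column total) / TOTAL
        exp = row_totals[i] * col_totals[j] / total
        # Step 2: accumulate (obs - exp)^2 / exp
        chi2 += (obs - exp) ** 2 / exp

# Step 3: compare to the table value.
reject_h0 = chi2 > CHI2_TAB
print(f"X2 = {chi2:.2f}, reject Ho: {reject_h0}")
```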

47
Q

Fisher’s Exact

A

X2 is not valid for 2x2 contingency tables with very small samples. Use Fisher’s exact test instead.

48
Q

Chi Square Assumptions

A
  • Data are frequency data
  • There is an adequate sample size
  • Measures are independent of each other
49
Q

Odds Ratio

A

Odds - Probability of the event occurring compared with the probability that it will not occur.

Odds ratio is the ratio of 2 odds. Used mostly in case-control studies. Measure of association between an exposure and an outcome.

OR = (a/c)/(b/d) = ad/bc
= odds that a case is exposed/ odds that control is exposed

OR = 1 (No difference in odds of exposure)
OR > 1 (Increased odds of exposure)
OR < 1 (Decreased odds of exposure)
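
A worked sketch of OR = (a/c)/(b/d) = ad/bc with made-up counts. The cell layout is an assumption inferred from the formula on this card: a and c are exposed and unexposed cases, b and d are exposed and unexposed controls.

```python
# Assumed layout:      cases   controls
#   exposed              a        b
#   unexposed            c        d
a, b, c, d = 30, 10, 70, 90   # made-up counts

odds_case_exposed = a / c        # odds that a case is exposed
odds_control_exposed = b / d     # odds that a control is exposed
odds_ratio = odds_case_exposed / odds_control_exposed

print(f"OR = {odds_ratio:.2f}")
```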

50
Q

RCT

A

Bias

  • Randomization removes selection bias
  • Investigator bias (use blinding)
  • Subject bias (use blinding)

Things to control:

  • Diet
  • Health changes
  • Non compliance
  • Drop outs
  • Lifestyle
  • Events
51
Q

Investigator Bias

A

Allocation concealment: ppl randomizing individuals are blinded as to which subjects go into which group

Investigator blinding

52
Q

Subject Bias

A

Hawthorne Effect: People change behavior in a study

Subject blinding

53
Q

Case Control Study

A
  • Compare patients who have a disease with patients who do not, and look back retrospectively to compare how frequently the exposure to a risk factor is present

Threats to internal validity:

  • Control group selection (matching)
  • Recall bias
  • Can’t determine risk directly (use odds ratio)

–> look at people who already have disease and determine odds of exposure. What are the odds that the diseased group was exposed?

54
Q

Cohort Study

A

A cohort is a group of people who share a common characteristic or experience within a defined period. The cohort is followed over time, and its outcomes are compared with those of a subset of the group who were not exposed to the intervention.
- Incidence studies

–> measure how many people develop disease out of a total. What is the relative incidence of disease in both groups?

55
Q

Risk

A

= number subjects with unfavorable event in arm/total number subjects in arm
(Absolute risk)

–> makes risks/benefits look SMALLER

56
Q

Relative Risk (RR)

A

risk in treatment/risk in control
(Exposed/nonexposed)

RR = [a/(a+b)] / [c/(c+d)]

RR = 1 : no difference in risk
RR < 1 : fewer events in tx group vs control
RR > 1 : more events in tx group vs control

–> makes risks/benefits look BIGGER

57
Q

Exposure

A

Can occur at a single point in time or over a period of time.

Characterizing exposure:

  • Ever been exposed
  • Current dose
  • Largest dose taken
  • Total cumulative dose
  • Years of exposure
58
Q

Attributable Risk

A

The additional incidence of disease related to exposure, taking into account the background incidence of disease from other causes.

  • Implies that the risk factor is a cause and not just a marker

Also called RISK DIFFERENCE - difference between 2 absolute risks

ARR (absolute risk reduction) = Risk control - Risk treatment

59
Q

Relative Risk Reduction (RRR)

A

By how much the treatment reduced the risk of bad outcomes relative to the control group.

RRR = 1 - RR
= [Risk control - Risk treatment] / risk control

60
Q

NNT

A

NNT = 1/ARR (in decimal) OR 100/ARR (in percent)
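
A sketch tying together absolute risk, RR, ARR, RRR, and NNT for a made-up trial with 100 subjects per arm. The cell names follow the RR card: a and b are events and non-events in the treatment arm, c and d in the control arm.

```python
import math

a, b = 10, 90   # treatment arm: events, non-events (made-up)
c, d = 20, 80   # control arm: events, non-events (made-up)

risk_treatment = a / (a + b)   # absolute risk in treatment arm
risk_control = c / (c + d)     # absolute risk in control arm

rr = risk_treatment / risk_control    # relative risk
arr = risk_control - risk_treatment   # absolute risk reduction
rrr = 1 - rr                          # relative risk reduction
nnt = 1 / arr                         # number needed to treat (ARR as a decimal)

print(f"RR = {rr:.2f}, ARR = {arr:.2f}, RRR = {rrr:.2f}, NNT = {nnt:.0f}")
assert math.isclose(rrr, arr / risk_control)   # the two RRR formulas agree
```

Here a 50% relative risk reduction corresponds to only a 10 percentage point absolute reduction, which is why relative measures make effects look bigger.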

61
Q

Sensitivity

A

Probability of testing positive, given patient has disease

= a/(a+c)

Overall Accuracy = (a+d)/(a+b+c+d)

62
Q

Specificity

A

Probability of testing negative, given that patient does not have disease

= d/(b+d)

Overall Accuracy = (a+d)/(a+b+c+d)

63
Q

Prevalence

A

Proportion of a group of people possessing a clinical condition or outcome at a given point in time.

= (a+c)/(a+b+c+d)

Other names

  • prior probability
  • pretest probability
64
Q

Positive Predictive Value

A

Probability of having dz, given a positive test

= a/(a+b)

65
Q

Negative Predictive Value

A

Probability of not having dz, given negative test

= d/(c+d)
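
A sketch that computes the diagnostic measures from the last few cards for one made-up 2x2 table. The layout is an assumption inferred from the formulas: a = true positives, b = false positives, c = false negatives, d = true negatives.

```python
# Assumed layout:     disease +   disease -
#   test +               a           b
#   test -               c           d
a, b, c, d = 90, 50, 10, 850   # made-up counts
total = a + b + c + d

sensitivity = a / (a + c)     # P(test + | disease)
specificity = d / (b + d)     # P(test - | no disease)
prevalence = (a + c) / total  # pretest probability
ppv = a / (a + b)             # P(disease | test +)
npv = d / (c + d)             # P(no disease | test -)
accuracy = (a + d) / total    # overall accuracy

print(f"sens = {sensitivity:.2f}, spec = {specificity:.2f}, prev = {prevalence:.2f}")
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}, accuracy = {accuracy:.2f}")
```

With a prevalence of only 10%, the PPV is much lower than the sensitivity even though the test itself performs well.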

66
Q

Likelihood Ratio

A

Positive
LR+ = Sensitivity / (1 - Specificity)

Negative
LR- = (1 - Sensitivity) / Specificity

The probability of that test result in people with the disease divided by probability of the result in people without the disease.

The ratio expresses how many times more (or less) likely a test result is to be found in diseased, compared with non-diseased, people.

67
Q

Posttest Odds

A

Pretest Odds x Likelihood ratio
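
A worked sketch with hypothetical values for sensitivity, specificity, and pretest probability. It uses the standard conversions odds = p / (1 - p) and p = odds / (1 + odds).

```python
sensitivity = 0.9    # hypothetical
specificity = 0.8    # hypothetical
pretest_prob = 0.2   # hypothetical pretest probability (prevalence)

lr_positive = sensitivity / (1 - specificity)
lr_negative = (1 - sensitivity) / specificity

pretest_odds = pretest_prob / (1 - pretest_prob)
posttest_odds = pretest_odds * lr_positive          # after a positive test
posttest_prob = posttest_odds / (1 + posttest_odds)

print(f"LR+ = {lr_positive:.2f}, LR- = {lr_negative:.3f}")
print(f"posttest odds = {posttest_odds:.3f}, posttest probability = {posttest_prob:.3f}")
```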

68
Q

The 4As of EBM

A

Ask
Acquire
Appraise
Apply

69
Q

New EBM pyramid

A
Systems (Not in effect--> put sx in computer) 
Summaries (Up to date) 
Synopses of Syntheses (Dynamed) 
Syntheses (Systematic reviews) 
Synopses of Studies (Journal club) 
-------
Individual studies (primary studies)
70
Q

Berkson’s Bias

A

Occurs when the sample is taken from a hospitalized population.
- Systematically higher exposure rate among hospital patients, distorting odds ratio, etc.

71
Q

Review Article

A
  • Synthesize results and pull together major findings

Strength
- Provide good discussion from experts

Weakness
- Subject to bias of author

72
Q

Systematic Review

A

Reviews of the literature that follow a prescribed protocol to remove bias

Goals

  • Provide up to date summary of all good published lit
  • Assimilate large amounts of data
  • Objective collation of results
  • Reliable recs
73
Q

Elements of a Systematic Review

A
  1. Define a specific question (PICO)
  2. Find all relevant studies (pub and unpub)
    • Inc sensitivity and reduce bias
  3. Select strongest studies
    • RCTs/no obs?
  4. Describe scientific strength of selected studies
  5. Determine if quality is assc with results
    • (4&5) examine internal validity
    • Review for bias
    • Review includes > 1 researcher
  6. Summarize studies in figures (forest plots)
  7. Determine if pooling of studies (meta analysis) is good
  8. If yes, calculate summary effect size and CI
  9. Identify reasons for heterogeneity if present
74
Q

Meta Analyses

A
  • If results are similar, can be pooled and analyzed together
  • Requires study question similarity

Results weighted by sample size

  • Fixed effects model: when studies ask the same Q
  • Random effect model: assumes studies are asking diff Qs but somewhat similar
75
Q

Systematic review Sources of Bias

A

Study bias

  • Biased samples
  • Berkson’s bias
  • Subject/investigator bias
  • Author bias
  • > 1 researcher to prevent bias
  • Publisher bias
76
Q

Web of Causation

A
  • To counter reliance on single - cause model
  • Includes biological, behavioral, and social factors
  • Graphical depiction
77
Q

Cause - Sufficient

A

With it the effect will result regardless of the presence or absence of other factors

78
Q

Cause - Necessary

A
  • Without it, the effect will not occur

- Need other factors for the event to occur

79
Q

Single Cause Model

A

Koch’s Postulates

80
Q

Bradford Hill Criteria

A
  1. Temporality
    - Exposure precedes dz
    - ONLY required criterion on the list for causality
  2. Strength of association
    - Strong assoc. doesn't ALWAYS imply causality
    - Weak assoc. doesn't negate
  3. Dose response
    - lack doesn’t negate
  4. Reversibility
    - lack doesn’t negate
  5. Consistency with other knowledge
    - Consistent results across study designs
  6. Biological plausibility
    - lack doesn’t negate
  7. Specificity of the association
    - exposure assoc. with one specific dz outcome
    - weakest of criteria
    - many dz have multiple causes
  8. Analogy
    - cause is analogous to other established relationships

(9. Coherence)

The more criteria fulfilled, the stronger the case for causality.

81
Q

Ecological Studies

A

Level of analysis is groups rather than individuals
- Aggregate studies

Fallacy

  • Ascribing group characteristics to individual members of that group
  • Assumption only valid if exposure is homogeneous within groups
  • We don’t know whether the individuals had high rates
  • All we have is average values of smoking levels and rates of lung ca mortality in each country
82
Q

Components of a Medical Paper

A
  1. Abstract
  2. Introduction
  3. Methods
    - Study design, subjects, sampling, sample size, outcome, stats
  4. Results
  5. Discussion
  6. References
83
Q

Cross Sectional Study

A

A study that examines the relationship between diseases and other variables of interest at a single point in time.

Determine your sample first and see what exists in that sample.

Prevalence

84
Q

Peer Review Process

A
  1. Author submits manuscript
  2. Editor reviews
    - Rejects manuscript
    - Assigns to reviewers for external eval
  3. Reviewers review
    - Make recs to editor
  4. Editor makes decision
    - Rejection, modification, publication
  5. Publisher publishes finalized manuscript
85
Q

Levels of Evidence

A
  1. Systematic review of randomized trials
  2. Randomized trials
  3. Non-randomized controlled cohort/follow-up study
  4. Case-series, case-control, or historically controlled studies
  5. Mechanism-based reasoning
86
Q

How to critically appraise a medical paper?

A
  • Choose appropriate tool
  • Answer all questions in the tool
  • Draw your own conclusions
  • May stop if the quality of a paper is of concern
  • RWJMS general article review sheet
  • Oxford centre for EBM
  • EQUATOR network
87
Q

How to report trials?

A

CONSORT - Consolidated standards of Reporting Trials

  • 25 item checklist
  • A flow diagram

Alternatively, consult the EQUATOR Network for reporting guidelines.