Biostatistics Flashcards

1
Q

Definition of random variables

A

A variable whose observed values may be considered outcomes of an experiment and whose values cannot be anticipated with certainty before the experiment is conducted

2
Q

What are the two types of random variables?

A

1) Discrete variables (e.g. dichotomous, categorical)

2) Continuous variables

3
Q

Discrete Variables

A

1) Can only take a limited number of values within a given range
2) Nominal: Classified into groups in an unordered manner and with no indication of relative severity (e.g., sex, mortality, disease presence, race, marital status)
3) Ordinal: Ranked in a specific order but with no consistent level of magnitude of differences between ranks (e.g., NYHA functional class)
4) COMMON ERROR: measure of central tendency - In most cases, means and standard deviations should not be reported with ordinal data.

4
Q

Continuous Variables

A

(Sometimes referred to as Counting Variables)

1) Continuous variables can take on any value within a given range.
2) Interval: Data are ranked in a specific order with a consistent change in magnitude between units; the zero point is arbitrary (e.g., degrees Fahrenheit)
3) Ratio: Like “interval” but with an absolute zero (e.g., degrees Kelvin, heart rate, blood pressure, time, distance)

5
Q

Descriptive Statistics

A

Used to summarize and describe data that are collected or generated in research studies. This is done both visually and numerically

6
Q

Visual methods of describing data

A

1) Frequency distribution
2) Histogram
3) Scatterplot

7
Q

Numerical methods of describing data: Measures of central tendency

A

a. Mean (i.e., average)
b. Median
c. Mode

8
Q

Mean

A

i. Sum of all values divided by the total number of values
ii. Should generally be used only for continuous and normally distributed data
iii. Very sensitive to outliers and tends toward the tail, which has the outliers
iv. Most commonly used and most understood measure of central tendency
v. Geometric mean

SHOULD ONLY BE USED FOR NORMALLY DISTRIBUTED CONTINUOUS DATA

9
Q

Median

A

i. Midpoint of the values when placed in order from highest to lowest. Half of the observations
are above it and half are below.
ii. Also called 50th percentile
iii. Can be used for ordinal or continuous data (especially good for skewed populations)
iv. Insensitive to outliers
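
The contrast between the mean and the median can be illustrated with a short sketch. The lengths of stay below are hypothetical values made up for illustration; a single extreme value shifts the mean noticeably while the median stays put.

```python
from statistics import mean, median

# Hypothetical lengths of stay in days; the last value is an outlier.
stays = [2, 3, 3, 4, 5, 6, 40]

stay_mean = mean(stays)      # sum of all values / number of values
stay_median = median(stays)  # middle value once the data are sorted

print(f"mean = {stay_mean:.1f} days, median = {stay_median} days")

# Dropping the outlier changes the mean much more than the median.
trimmed = stays[:-1]
print(f"without outlier: mean = {mean(trimmed):.1f}, median = {median(trimmed)}")
```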

10
Q

Mode

A

i. Most common value in a distribution
ii. Can be used for nominal, ordinal, or continuous data
iii. Sometimes, there may be more than one mode (e.g., bimodal, trimodal).
iv. Does not meaningfully describe distributions with a large range of values, each of which
occurs infrequently
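
As a small illustration of a bimodal result, the sketch below uses Python's standard `statistics.multimode`, which returns every value tied for most common. The NYHA class list is hypothetical example data.

```python
from statistics import multimode

# Hypothetical NYHA functional classes for six patients (ordinal data).
nyha_classes = ["II", "III", "II", "I", "III", "IV"]

# multimode returns all most-common values, in order of first appearance.
modes = multimode(nyha_classes)

print(modes)  # two values tie, so this sample is bimodal
```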

11
Q

Numerical methods of describing data

A

Measures of data spread and variability

a) Standard deviation
b) Range
c) Percentiles
d) Inferential Statistics

12
Q

Standard Deviation

A

i. Measure of the variability about the mean; most common measure used to describe the spread
of data
ii. Square root of the variance (average squared difference of each observation from the mean);
returns variance back to original units (nonsquared)
iii. Appropriately applied only to continuous data that are normally or near-normally distributed
or that can be transformed to be normally distributed
iv. By the empirical rule, about 68% of the sample values are found within ±1 SD, about 95% within
±2 SD, and about 99.7% within ±3 SD.
v. The coefficient of variation relates the mean and the SD (SD/mean × 100%).
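
The relationships among variance, SD, and the coefficient of variation can be sketched as follows. The heart rates are hypothetical, and the sketch assumes the sample formulas (dividing by n - 1), which is what `statistics.variance` and `statistics.stdev` compute.

```python
from statistics import mean, stdev, variance

# Hypothetical resting heart rates (beats/min).
heart_rates = [60, 70, 80, 90, 100]

hr_mean = mean(heart_rates)
hr_variance = variance(heart_rates)  # sample variance, in squared units
hr_sd = stdev(heart_rates)           # square root of the variance, original units

# Coefficient of variation: SD relative to the mean, as a percentage.
cv_percent = hr_sd / hr_mean * 100

print(f"mean = {hr_mean}, variance = {hr_variance}, SD = {hr_sd:.2f}")
print(f"CV = {cv_percent:.1f}%")
```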

13
Q

Range

A

i. Difference between the smallest and largest values in a data set; does not give a tremendous
amount of information by itself.
ii. Easy to compute (simple subtraction)
iii. Size of range is very sensitive to outliers.
iv. Often reported as the actual values rather than the difference between the two extreme values

14
Q

Percentiles

A

i. The point (value) in a distribution that is larger than a given percentage of the other
values in the sample. Can be calculated by ranking all data in a data set
ii. The 75th percentile lies at a point at which 75% of the other values are smaller.
iii. Does not assume the population has a normal distribution (or any other distribution)
iv. The interquartile range (IQR) is an example of the use of percentiles to describe the middle
50% values. The IQR encompasses the 25th–75th percentile.
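
A minimal sketch of computing the quartiles and the IQR with the standard library is shown below. The data are hypothetical, and the `method="inclusive"` setting is one of several percentile conventions; other software may return slightly different cut points for the same data.

```python
from statistics import quantiles

# Hypothetical measurements, already in ranked order.
values = [2, 4, 6, 8, 10, 12, 14, 16, 18]

# n=4 splits the data into quartiles: 25th, 50th, and 75th percentiles.
q1, q2, q3 = quantiles(values, n=4, method="inclusive")

# The IQR spans the middle 50% of the values.
iqr = q3 - q1

print(f"25th = {q1}, 50th (median) = {q2}, 75th = {q3}, IQR = {iqr}")
```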

15
Q

Inferential Statistics

A
  1. Conclusions or generalizations made about a population (large group) from the study of a sample of that
    population
  2. Choosing and evaluating statistical methods depend, in part, on the type of data used.
  3. An educated statement about an unknown population is commonly referred to in statistics as an
    inference.
  4. Statistical inference can be made by estimation or hypothesis testing.
16
Q

Beta

A
  • the probability of a Type II error

- the larger the Beta the lower the power of the study and the greater the chance of making a Type II error

17
Q

Probability of NOT making a Type II error

A

Power = 1 - Beta

18
Q

Type II error

A
  • when the study states there is no difference between the groups but in reality there is a difference
19
Q

Cost-minimization analysis

A

Medications are considered to be equal in outcome, so only costs are compared

20
Q

Cost-effectiveness analysis

A
  • Natural units (mm Hg blood pressure, blood glucose)

- Meds are not considered equal

21
Q

Cost-benefits analysis

A
  • Dollars

- Outcomes are expressed as benefit:cost or as net cost or net benefit

22
Q

Cost-utility analysis

A
  • quality-adjusted life-year (QALY)
23
Q

Kaplan-Meier curve

A
  • used to assess the “time to an event”

- many times this “event” is death or mortality, but in reality it can be any event
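
One way to see how a time-to-event curve is built is the product-limit calculation sketched below. This is an illustrative, simplified version: the follow-up times and event flags are hypothetical, and the handling of censored subjects (those who leave follow-up without the event, flagged 0) is an assumption not spelled out on this card.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with >= 1 event.

    times  -- follow-up time for each subject
    events -- 1 if the event occurred at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)   # subjects still being followed
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        same_time = [e for (tt, e) in data if tt == t]
        n_events = sum(same_time)
        if n_events:
            # Multiply by the fraction of at-risk subjects who survive time t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Everyone observed at time t (event or censored) leaves the risk set.
        at_risk -= len(same_time)
        i += len(same_time)
    return curve


# Hypothetical follow-up in months; 0 marks a censored subject.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"month {t}: estimated survival = {s:.2f}")
```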

24
Q

Cox Proportional Hazards Model

A
  • allows the investigators to control for confounding variables or factors that may be influencing the endpoint of their study in addition to the intervention (or independent variable) being studied
25
Q

Correlation analysis

A
  • generates a correlation coefficient (r) which can range from -1 to +1
  • done to describe the “strength” of the relationship between 2 variables
  • the closer r is to +1 or -1, the stronger the “correlation” or “relationship” between the 2 variables; values near 0 indicate a weak linear relationship
  • it does NOT imply anything about causation
  • the correlation coefficient (r) cannot determine which of the two variables came first or which influences the other
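
To make the -1 to +1 range concrete, here is a minimal sketch of the Pearson correlation coefficient computed by hand. The function name `pearson_r` and the data are illustrative assumptions, not taken from the source.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired measurements.
dose = [1, 2, 3, 4]
rising = [2, 4, 6, 8]    # increases with dose
falling = [8, 6, 4, 2]   # decreases with dose

r_up = pearson_r(dose, rising)
r_down = pearson_r(dose, falling)

print(f"r (rising) = {r_up:.2f}, r (falling) = {r_down:.2f}")
```

Both relationships are equally strong; only the sign differs, and neither value says anything about which variable drives the other.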
26
Q

Regression analysis

A
  • done to provide the “predictability” of one variable on another variable
27
Q

Determining the appropriate statistical test for analyzing an endpoint

A

1) First ask, “How many groups or samples does this study have?”
2) Second, “Are these groups (samples) related (i.e., same patients) or independent (i.e., not the same patients in both groups)?”
3) Now determine the endpoint: “What is the endpoint?” (nominal, ordinal, or continuous data)
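
The three questions can be sketched as a simple lookup. The pairings below are taken from the individual test cards in this deck; the function name `suggest_tests` and the key layout are illustrative assumptions, and the table is a study aid rather than a complete decision rule.

```python
# (number of groups, design, data type) -> tests named on this deck's cards.
TESTS = {
    ("two", "independent", "nominal"): ["Chi-square", "Fisher's exact"],
    ("two", "independent", "ordinal"): ["Mann-Whitney U", "Wilcoxon rank sum"],
    # Mann-Whitney U is the fallback when continuous data are skewed.
    ("two", "independent", "continuous"): ["Student's t-test", "Mann-Whitney U"],
    ("two", "related", "nominal"): ["McNemar"],
    ("two", "related", "ordinal"): ["Sign test", "Wilcoxon signed rank"],
    ("two", "related", "continuous"): ["Paired t-test"],
    ("three or more", "independent", "nominal"): ["Chi-square for k independent samples"],
    ("three or more", "independent", "ordinal"): ["Kruskal-Wallis one-way ANOVA"],
    ("three or more", "independent", "continuous"): ["1-way ANOVA"],
    ("three or more", "related", "nominal"): ["Cochran Q"],
    ("three or more", "related", "ordinal"): ["Friedman 2-way ANOVA"],
    ("three or more", "related", "continuous"): ["2-way ANOVA"],
}


def suggest_tests(n_groups, related, data_type):
    """Answer the three questions in order and look up the matching tests."""
    if n_groups < 2:
        raise ValueError("this lookup covers comparisons of 2 or more groups")
    size = "two" if n_groups == 2 else "three or more"
    design = "related" if related else "independent"
    return TESTS[(size, design, data_type)]


print(suggest_tests(2, related=False, data_type="nominal"))
print(suggest_tests(4, related=True, data_type="ordinal"))
```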

28
Q

Chi-square

A
  • two independent samples

- nominal data

29
Q

Fisher’s exact

A
  • two independent samples

- nominal data

30
Q

Mann-Whitney U

A
  • Two independent samples
  • ordinal data
    OR
  • Two independent samples
  • continuous data
31
Q

Wilcoxon Rank Sum

A
  • two independent samples

- ordinal data

32
Q

Student’s t-test

A
  • two independent samples
  • continuous data
  • must have an adequate sample size (30-40)
  • data must be continuous and follow a normal distribution.
  • if data are skewed or have significant outliers then a Mann-Whitney U test should be done instead
33
Q

McNemar Test

A
  • related or paired samples

- nominal data

34
Q

Sign Test

A
  • related or paired samples

- ordinal data

35
Q

Wilcoxon Signed Rank

A
  • related or paired samples

- ordinal data

36
Q

Paired t-test

A
  • related or paired samples

- continuous data

37
Q

Chi-square for k independent samples

A
  • 3 or more independent samples

- nominal data

38
Q

Kruskal-Wallis one-way ANOVA

A

ANOVA = analysis of variance

  • 3 or more independent samples
  • ordinal data
39
Q

1-way ANOVA

A

ANOVA = analysis of variance

  • 3 or more independent samples
  • continuous data
40
Q

Cochran Q

A
  • 3 or more related samples

- nominal data

41
Q

Friedman 2-way ANOVA

A

ANOVA - analysis of variance

  • 3 or more related samples
  • ordinal data
42
Q

2-way ANOVA

A

ANOVA - analysis of variance

  • 3 or more related samples
  • continuous data
43
Q

Measures of Correlation with nominal data

A

contingency coefficient

44
Q

Measures of Correlation with ordinal data

A
  1. Spearman
  2. Kendall rank
  3. Kendall coe
45
Q

Measures of Correlation with continuous data

A
  1. Pearson’s Correlation
46
Q

Nominal variables:

A
  • classified into groups in an unordered manner and with no indication of relative severity
    • sex (M/F)
    • mortality (yes/no)
    • disease state (present/absent)
47
Q

Ordinal variables

A
  • ranked in a specific order but with no consistent level of magnitude of difference between ranks
    • NYHA functional class I, II, III, IV
48
Q

Standard Deviation

A
  • measure of the variability about the mean

- only applied to continuous data

49
Q

Empirical Rule of Standard Deviation

A
  • 68% within + or - 1 SD
  • 95% within + or - 2 SD
  • 99.7% within + or - 3 SD
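
These percentages can be checked against a normal distribution with the standard library's `statistics.NormalDist`. The sketch assumes a standard normal (mean 0, SD 1); the helper name `coverage` is illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1.
std_normal = NormalDist(mu=0, sigma=1)


def coverage(k):
    """Fraction of a normal distribution lying within +/- k SD of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


within = {k: coverage(k) for k in (1, 2, 3)}

for k, fraction in within.items():
    print(f"within +/- {k} SD: {fraction * 100:.1f}%")
```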
50
Q

Coefficient of Variation (CV)

A
  • SD/mean x 100%
51
Q

Variance

A

SD² (SD squared)

52
Q

Interquartile range (IQR)

A
  • percentile that describes the middle 50%

- encompasses the 25th to 75th percentile

53
Q

Calculate the Standard Deviation

A

See page 141

54
Q

Calculate the SEM (standard error of the mean)

A

See page 141
SEM=SD/sqrt(n)
- quantifies uncertainty in the estimate of the mean, not variability in the sample

Application - 95% CI ≈ mean ± 1.96 x SEM (or mean ± 2 x SEM)
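
A minimal sketch of the SEM and the approximate 95% CI follows. The sample values are hypothetical, and the 1.96 multiplier follows the card; for a sample this small, a t-distribution multiplier is generally used instead, so treat the interval as illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of five measurements.
sample = [4, 8, 6, 5, 7]

n = len(sample)
sample_mean = mean(sample)
sample_sd = stdev(sample)   # sample SD (n - 1 in the denominator)

# SEM = SD / sqrt(n): uncertainty in the estimate of the mean.
sem = sample_sd / sqrt(n)

# Approximate 95% confidence interval around the mean.
ci_low = sample_mean - 1.96 * sem
ci_high = sample_mean + 1.96 * sem

print(f"mean = {sample_mean}, SD = {sample_sd:.2f}, SEM = {sem:.2f}")
print(f"approx. 95% CI: {ci_low:.2f} to {ci_high:.2f}")
```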

55
Q

How do we assess for Normal (Gaussian) Distribution?

A
  • Median ~ Mean

- Formal test - Kolmogorov-Smirnov Test

56
Q

Parametric

A

Mean/SD define a normal distribution

57
Q

Null hypothesis

A

No difference between the comparator groups

58
Q

Alternative hypothesis

A

States that there is a difference

Tx A is not equal to Tx B