Review (2) Flashcards

1
Q

normal distribution

A
  • symmetrical bell shape
  • data likely to be dependent on many, often small, random factors
  • if data sets are large, chances are that a parametric test will apply since the data should follow a symmetrical bell curve
  • small data sets may be skewed to one side and thus the parametrics may not apply (use non-parametric counterparts)
2
Q
properties of a normally distributed variable:
mean
1SD
2SD
3SD
A

■ mean = median
■ 1 SD (1 Z score) on each side of the mean encompasses 34.1% of all values (68.3% for both sides)
■ 2 SDs (2 Z scores) encompass 47.7% on each side (95.4% for both sides)
● the value 2 SDs above the mean is also described as roughly the 97.5th percentile
■ 3 SDs encompass 49.85% on each side (99.7% for both sides)
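
These coverage figures can be checked numerically. The sketch below (Python, standard library only) computes the exact fraction of a normal distribution lying within k standard deviations of the mean via the error function; `coverage_within` is a hypothetical helper name, not from the source.

```python
import math

def coverage_within(k_sd: float) -> float:
    """Fraction of a normal distribution within +/- k_sd standard deviations.
    P(|Z| < k) = erf(k / sqrt(2)) for a standard normal Z."""
    return math.erf(k_sd / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {coverage_within(k):.3%}")
# within 1 SD: 68.269%, within 2 SD: 95.450%, within 3 SD: 99.730%
```

These are the exact values behind the rounded 68% / 95% / 99.7% figures on the card.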

3
Q

Central Limit Theorem

A

● Under certain conditions, the sum of a large number of independent random variables will have an approximately normal distribution
● Other distributions can be approximated by the normal distribution (chi-square, t, etc.)
■ the median separates the lower and upper halves of the observations
■ the theorem provides an estimate of the uncertainty of the mean of a population based upon a sample mean

4
Q

Central Limit Theorem: Properties

A

■ distribution of sample means will be approximately normal regardless of whether the distribution of the original values in the population is normal or not
■ mean of the means of all possible samples equals the population mean
■ the standard deviation of the distribution of the means of all samples (standard error of the mean) is equal to the standard deviation of the population divided by the square root of the sample size
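
These properties can be demonstrated by simulation. The sketch below draws repeated samples from a deliberately non-normal population (uniform on [0, 1]) and checks that the standard deviation of the sample means matches SD/√n; all names and the chosen sample sizes are illustrative, not from the source.

```python
import random
import statistics

random.seed(1)

population_sd = (1 / 12) ** 0.5   # SD of Uniform(0, 1) -- NOT a normal population
n = 30                            # size of each sample
num_samples = 20_000              # number of repeated samples

# Build the distribution of sample means
means = [statistics.fmean(random.random() for _ in range(n))
         for _ in range(num_samples)]

print("mean of sample means:", statistics.fmean(means))   # ~0.5 (the population mean)
print("SD of sample means:  ", statistics.stdev(means))   # ~ population_sd / sqrt(n)
print("predicted SEM:       ", population_sd / n ** 0.5)  # ~0.0527
```

Even though the population is uniform, the distribution of sample means is approximately normal, centered on the population mean, with spread equal to the standard error of the mean.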

5
Q

Difference between standard deviation and standard error (of means)

A

■ standard error is NOT a measure of variability among individual observations; it measures the precision of the sample mean as an estimate of the population mean

■ standard deviation IS a measure of variability among individual observations

6
Q

Degree of Freedom

A

● in standard deviation: DoF = n - 1 for a sample, DoF = n for a population
● one less than the total number of values in a sample
● definition: number of values that are free to vary in a sample
○ if there are 100 values and you know the mean, then knowing any 99 of the values determines the 100th. Therefore, it is said that 1 value is fixed and the other 99 are free to vary.

7
Q

Degrees of freedom for ANOVA test

A

○ DFtotal = total number of observations - 1
○ DFbetween groups = total number of groups - 1
○ DFwithin groups = number of groups x (number of observations per group - 1)
○ DFtotal = DFwithin groups + DFbetween groups
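
A quick numerical check of these identities for a balanced design; `anova_df` is a hypothetical helper name, not from the source.

```python
def anova_df(num_groups: int, obs_per_group: int):
    """Degrees of freedom for a balanced one-way ANOVA."""
    total_obs = num_groups * obs_per_group
    df_total = total_obs - 1
    df_between = num_groups - 1
    df_within = num_groups * (obs_per_group - 1)
    # the identity on the card: DFtotal = DFbetween + DFwithin
    assert df_total == df_between + df_within
    return df_between, df_within, df_total

# e.g. 3 groups of 10 observations each
print(anova_df(3, 10))   # (2, 27, 29)
```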

8
Q

Underlying concept behind tests of significance

A

■ differences can be “explainable”
● effect of being in a particular group
■ differences can be “unexplainable”
● due to natural variation or unmeasured group difference
■ if explained variation is significantly higher than unexplained variation, we can conclude that the groups really are different
■ based upon the ratio of explained variation to unexplained variation
● if ratio is large, groups are statistically different
● if small, groups are not statistically different.

9
Q

Parametric vs. Non-Parametric

A

parametric: follows a normal distribution
● usually comparing group means
non-parametric: does not follow a normal distribution
● usually comparing group medians

10
Q

Non-parametric test advantages and disadvantages

A

● Non-parametric test advantages
○ fewer assumptions to fulfill
■ variables do not have to follow a distribution
○ useful for dealing with outliers
○ intuitive and easier to do by hand with smaller samples
○ can be used for categorical data
● Non-parametric test disadvantages
○ less efficient than parametric counterpart
■ following a distribution allows you to take advantage of its properties
■ lack of power
○ hypothesis test over effect estimation
○ too many ties problematic

11
Q

Null vs. alternative hypotheses

A

● Null Hypothesis (Ho ): hypothesize that there is no difference between the two groups
○ Four elements of the sound clinical trial: PICO
■ Patient population or problem
■ Intervention (Tx, usually)
■ Comparative Intervention (if necessary)
■ Outcomes (precisely defined)
● Alternate Hypothesis (H1): hypothesize that the two groups are different

12
Q

p-value

A

probability of observing a group difference at least as large as the one seen, if the difference arose by natural variation alone (p-value)
if this probability is sufficiently low, then conclude there is a group effect
the p-value cutoff depends on the question being asked

13
Q

When p value is less than alpha, then we can

alpha

beta

A

reject the null hypothesis

Alpha - “the chance we are willing to accept of being wrong by finding a difference between two treatments when none really exists.” (often alpha = 0.05)

Beta - “the chance we are willing to accept of being wrong by not finding a difference between two treatments when there really is a difference”

14
Q

P-value fallacy

ex. if we assume a p-value of 0.04:
correct interpretation
incorrect interpretation

A

Incorrect interpretation: assuming that the p-value equals the chance that there is no difference b/t Tx
really → the p-value is the chance of obtaining the observed difference in means (or a greater one), assuming there is no true difference

Correct interpretation of p value: P(results | Ho) - “Assuming the null hypothesis is true, there is a 4% chance of obtaining the difference in means we observed, or a greater difference, in the trial”
therefore, it is unlikely to observe the measured difference between the groups if the groups were really equivalent; we can infer that the two groups are different

15
Q

Confidence Interval

A

○ measure of precision of estimate of the difference based on the results of the study
○ measure of the magnitude of the difference
○ true diff b/t two population means lies in an interval
■ 95% CI = diff in sample means will lie in this interval 95 out of 100 times
○ size of interval depends on difference b/t sample means, level of significance and corresponding t value, & st error of the diff in sample means
○ “If the procedure were to be repeated multiple times (repeated sampling), the results should fall within the interval 95% of the time”
■ estimate of precision
○ “We can be 95% sure that the true value will fall within this range”
○ statistic that quantifies the uncertainty in measurement
○ If the CI of RR excludes 1, then the RR is statistically significant, if it includes 1, then the RR is statistically non significant
○ 90% CI is narrower than 95% CI
○ larger sample size → narrower CI
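
As an illustration of these points, a minimal sketch of a 95% CI for a difference in means, using the large-sample normal critical value 1.96 (a t critical value would be used for small samples). The data and the helper name `diff_ci_95` are hypothetical.

```python
import math
import statistics

def diff_ci_95(a, b):
    """Approximate 95% CI for the difference in means of two independent
    samples (large-sample sketch using z = 1.96)."""
    diff = statistics.fmean(a) - statistics.fmean(b)
    # standard error of the difference in sample means
    se = math.sqrt(statistics.variance(a) / len(a) +
                   statistics.variance(b) / len(b))
    return diff - 1.96 * se, diff + 1.96 * se

group1 = [142, 138, 150, 145, 140, 148, 143, 139]   # hypothetical measurements
group2 = [135, 130, 141, 133, 138, 129, 136, 134]
low, high = diff_ci_95(group1, group2)
print(f"95% CI for difference: ({low:.1f}, {high:.1f})")
# the interval excludes 0 -> the difference is statistically significant at alpha = 0.05
```

Note how the interval conveys both precision (its width) and magnitude (its location), which a bare p-value does not.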

16
Q

how CI is related to p-value

advantages of CI over p-value

A

● How is it related to the p-value
○ CIs are more versatile than p-values
○ if the 95% CI includes the null value (no difference), the p-value will be non-significant (p ≥ 0.05), and vice versa

● Advantages of confidence interval over p-value
○ Provide a measure of precision and magnitude of estimates.
○ Less prone to misinterpretation.
○ In general, preferred way to express statistical significance of results of studies of therapies.

17
Q

PICO Method → Developing a clinical question

A
P = Patient population or problem
I = Intervention (usually a Tx)
C = Comparison Intervention (if necessary)
O = 1+ precisely defined outcomes
18
Q

Sample size

A

n = 16 / (phi)^2
An approximate sample size for each group is obtained with the formula n = 16 divided by the square of the non-centrality parameter.
The non-centrality parameter (phi) is the minimum magnitude of effect worth detecting divided by the standard deviation of the outcome variable: phi = magnitude of effect / standard deviation.
If we want to detect a difference of 10 and our SD = 14, then phi = 10/14, so n = 16 / (10/14)^2 ≈ 32 per group.
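
The card's arithmetic can be sketched as below. This is Lehr's rule of thumb, which assumes 80% power at two-sided alpha = 0.05; `sample_size_per_group` is an illustrative name, not from the source.

```python
import math

def sample_size_per_group(delta: float, sd: float) -> int:
    """Lehr's rule of thumb: n = 16 / phi^2 per group,
    where phi = delta / sd (80% power, two-sided alpha = 0.05)."""
    phi = delta / sd
    return math.ceil(16 / phi ** 2)

# The card's example: detect a difference of 10 when SD = 14
print(sample_size_per_group(10, 14))   # 32 per group
```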

19
Q

Type 1 vs 2 Error

A

Type 1 Error: rejecting the null hypothesis when there actually is no difference between 2 Tx (false positive)
alpha value = willingness to accept the possibility of making a Type 1 error (usually 0.05)
Type 2 Error: accepting the null hypothesis when there actually is a difference between 2 Tx (false negative)
beta value = willingness to accept possibility of making a Type 2 error (usually 0.20)

20
Q

Study Power

A

● Chance of finding a diff b/t Tx based on sample studied when one really does exist in the population
○ esp important in studies where you conclude that there is no difference b/t Tx
○ low power → important difference may have been missed
● Ratio of Magnitude of Effect to Variability of outcome variable in population
○ aka noncentrality parameter (phi = mag of effect/st dev of outcome variable)
○ larger phi → higher power to detect
● 1 – β is known as the power of a study.
○ Beta - “the chance we are willing to accept of being wrong by not finding a difference between two treatments when there really is a difference”
○ 1 – β is the probability of detecting a difference when one exists.
○ conventionally β = 0.20.
● Determinants of study power
○ the larger the sample size:
■ higher the study power
■ more the sample resembles the population (fewer errors in inference)
■ Researchers usually decide upon a level of power and determine the sample size necessary to achieve it
● If this is helpful, think of Power in terms of Super Heroes. The more power you have, the greater chance of making a difference.

21
Q

Magnitude of Effect

greater magnitude of effect → (smaller/greater) power to detect
greater variation in outcome → (more/less) power to detect
larger sample size → (higher/lower) power

A

● amount of difference you wish to detect
○ smallest magnitude of diff. that is considered clinically important
○ represented by delta
● Variability of outcome
○ variation in outcome variable range
○ represented by st dev sign

greater
less
higher

22
Q

Study Bias

examples (5)

A

○ systematic tendency to produce an outcome that differs from the underlying truth
○ systematic error in a study that distorts results in a non-random way

publication
referral
selection
recall
lead-time
23
Q

study bias: selection

  • probability sample
  • simple random sampling
  • stratified random sampling
  • cluster sampling
  • non probability sampling
    • consecutive
    • convenience
A

● Volunteer
● Sampling
○ selecting members of the accessible part of the target population
○ probability sample - uses techniques to ensure that each member of the population of interest has a known, specific chance of being selected for the study
○ simple random sampling - using random numbers to select the corresponding patient
○ stratified random sampling - divide population into strata (specific groups of interest, such as race) and then use simple random sampling within each strata
○ cluster sampling - takes advantage of natural groupings of patients and is useful when the population of interest is spread geographically
■ if we are studying hospitals across the country
■ list the hospitals and randomly select a hospital
■ then randomly select patients within those hospitals
○ non probability sampling - not random sampling, uses methods like convenience and consecutive sampling
■ consecutive sampling - simply trying to recruit every patient who meets the inclusion and exclusion criteria
■ convenience sampling - taking the patients most readily available

24
Q

study bias: lead-time

A

● form of selection bias
● statistical distortion of results which can lead to incorrect conclusions about the data
● people who are screened may be diagnosed earlier than the normal population → this lengthens the time between diagnosis and death and may lead to inappropriate conclusions
● can occur when the lengths of intervals are analyzed by selecting intervals that occupy randomly chosen points in time or space
○ favors longer intervals, thus skewing data
● can affect data on screening tests for cancer
○ faster-growing tumors generally have a shorter asymptomatic phase than slower-growing tumors and are also often associated with a poorer prognosis
○ slower-growing tumors are hence likely to be over-represented in screening tests
○ this can mean screening tests are erroneously associated with improved survival, even if they have no actual effect on prognosis

25
Q

methods to minimize study bias

A

● Altering allocation, the process by which a patient receives one treatment or another in a clinical trial
● Randomization
● Blinding
○ single, double
○ not always possible though
● Allocation concealment
○ always possible
○ keeps clinicians and participants unaware of upcoming assignments to groups (treatment vs. control, etc)
○ those responsible for recruiting people into the trial are unaware of the group to which a participant will be allocated, should that subject agree to be in the study
○ avoids both conscious and unconscious selection bias of patients into the study
○ unlike randomization it can always be accomplished in RCT
○ without it, even properly developed random allocation sequences can be subverted

26
Q

Confounders
Examples
Methods to minimize them

A

○ A third variable that is related to both the exposure and outcome
■ distorts observed relationship
■ can identify a spurious association
■ can mask/hide a real association
○ Can interfere with interpretation of observed exposure/outcome relationship
○ Confounders are often unequally distributed among the groups being compared

○ Examples
■ Instrumentation
● Change in calibration of measuring instruments over the course of the study
■ Regression to the Mean
● tendency of participants selected for extreme scores to be less extreme on retest
■ Selection
● any factor that creates groups that are not equal at the start of the study

○ Methods to minimize them:
■ Randomized Controlled Trials are less likely to be distorted by confounders
■ Can use special techniques to account for confounders
● multivariable logistic regression

27
Q

Case-control studies

A

○ initial investigation of cause-effect relationship
■ but does not give cause and effect
○ quick and cheap
○ don’t need many people
○ looks backward
○ cannot be used to determine incidence of disease
○ potential biases: recall and selection
○ potential confounders
○ subjects are defined by outcomes, not exposures
■ reverse of cohort study

cases (diseased) –> exposed or not exposed
controls (no disease) –> exposed or not exposed

28
Q

Cohort study

A

○ people were exposed to something, outcomes are measured
○ best way to determine risk relating to exposure to a harmful substance
○ looks forward
○ researcher may begin to work on a cohort study after the cohort has been assembled by someone else, even for another purpose
○ Important
■ assemble a cohort
■ Researcher may begin study at any time
● as long as the baseline information collected during a specific time period and
● exposure was determined before measuring outcome
■ Subjects defined by the exposure, NOT the outcome
● reverse of case-control

exposed –> disease or no disease
not exposed –> disease or no disease

29
Q

Similarities & differences b/t case-control & cohort studies

A

● Case control is easier than cohort study
● case control is ideal for rare diseases
○ cohort study would require too many patients - may not be feasible
● case control study ideal for diseases that take many years to develop
○ cohort study would require too long to complete

30
Q

Odds ratios for

            Disease +    Disease -
Exposure +  A            B
Exposure -  C            D
A

○ It is the ratio of odds of exposure in the diseased vs non-diseased
○ Odds Ratio = (odds that a case was exposed) / (odds that a control was exposed)
■ odds that a case was exposed = A/C
■ Odds that a control was exposed = B/D
■ OR = (A/C)/(B/D) = AD/BC
○ used in case control, which starts with the “disease” which has already happened when you start the study
■ we have no information on the incidence of disease in the exposure or non-exposure group
■ we can mathematically calculate RR but that is conceptually wrong to do
○ can also be used in cohort study
○ good estimate of the RR, but it is different
■ may overestimate RR, if disease is common
○ the odds of an event are defined as “the ratio of the number of ways the event can occur to the number of ways the event cannot occur”
■ Odds = Risk / (1- Risk)
■ Risk = Odds / (1+Odds)
○ Interpretation:
■ OR = 1: equal odds of the disease
■ OR = x
● if X>1: x fold greater the odds of the disease
● if X< 1: x fold decrease in the odds of the disease

31
Q

Relative risks

A

the ratio of the risk among the exposed group to the risk of the non-exposed group
RR = (A/(A+B))/(C/(C+D))
based upon the incidence of an event given that we already know the study participant’s exposure status
appropriate for a cohort study, NOT case-control
Interpretation:
RR = 1, the same risk of disease between exposure and non-exposure
RR=x
if x > 1: X fold increase in risk of disease
if x < 1: x fold decrease in risk of disease
measure of the strength of association and the possibility of a causal relationship
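
The two formulas from this card and the previous one can be sketched together. The 2x2 counts below are hypothetical, chosen so that the OR visibly overestimates the RR because the disease is fairly common.

```python
def odds_ratio(a, b, c, d):
    """OR = (A/C) / (B/D) = AD / BC for the 2x2 table
    (A, B = exposed with/without disease; C, D = unexposed)."""
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """RR = risk in exposed / risk in unexposed -- cohort studies only."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical counts: A=20, B=80, C=10, D=90
print(odds_ratio(20, 80, 10, 90))     # 2.25
print(relative_risk(20, 80, 10, 90))  # 2.0
```

With these counts the OR (2.25) exceeds the RR (2.0), illustrating the card's point that the OR overestimates the RR when the disease is common.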

32
Q

Kaplan-Meier Method

A

○ Also known as the product-limit method.
○ Used to answer the question “What is the probability of survival to a certain point?”
○ Relies upon fundamental probability theory
○ P[A and B] = P[A] x P[B], when A and B are independent
○ Conditional probability:
■ P[A | B]
■ Probability of surviving 24 months:
● P[surviving the first 23 months] x P[surviving the 24th month, conditional upon surviving 23 months]
● Allows us to predict and compare survival with respect to one factor only (univariate).
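
The product-limit calculation can be sketched with the standard library alone. The `kaplan_meier` helper and the six-subject data set are hypothetical; at each event time the running survival probability is multiplied by the conditional probability of surviving past that time.

```python
def kaplan_meier(event_times, censor_times):
    """Minimal product-limit (Kaplan-Meier) survival estimate.
    event_times: times at which events (e.g. deaths) occurred
    censor_times: times at which subjects were censored"""
    at_risk = len(event_times) + len(censor_times)
    survival, curve = 1.0, []
    for t in sorted(set(event_times) | set(censor_times)):
        deaths = event_times.count(t)
        if deaths:
            # conditional probability of surviving past t, given survival to t
            survival *= (at_risk - deaths) / at_risk
            curve.append((t, survival))
        # subjects who died or were censored at t leave the risk set
        at_risk -= deaths + censor_times.count(t)
    return curve

# Hypothetical cohort of 6: events at months 2, 4, 4; censored at months 3, 5, 6
print(kaplan_meier([2, 4, 4], [3, 5, 6]))
```

Note how the censored subject at month 3 shrinks the risk set before the events at month 4, without itself dropping the survival estimate.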

33
Q

Comparing Survival Curves

A

○ Logrank Test
■ single variable
■ relies on null hypothesis that there is no difference in the probability of death at any given point between the two groups
■ a form of the chi square test that compares observed to expected values of an event being studied in each group
○ Chi Square Test

34
Q

Censoring

A

● Patient doesn’t experience event of interest during study period
● Patient lost to follow-up
● Patient experiences event that makes follow-up impossible
● Left, right, or interval censoring
○ Right = the event of interest has not occurred by the end of the follow-up period (censoring occurs after follow-up ends)
○ Left = the censoring event occurs before the follow-up period begins
○ Interval = censoring due to study subjects who move in and out of observation, so follow-up data are missing at some point in the follow-up period
● Uninformative
○ Small # of subjects
○ Censoring not related to prognosis
● Informative
○ Larger # of subjects
○ Censoring related to prognosis

35
Q

Assumptions of the Kaplan Meier Method

A

● Uninformative censoring - data that was censored was not informative
● Survival probabilities the same for people recruited at different time points
● Events happened at time specified
○ Consider recurrence of disease

36
Q

Cox Proportional Hazard Model

  • hazard
  • hazard ratio
  • when to use
  • assumptions
A

● Hazard is the probability that an individual experiences an event at a specific time
● Survival describes the cumulative probability of an event not occurring, while hazard describes the instantaneous risk of an event occurring

● Hazard Ratio = (O1/E1) / (O2/E2)
○ O = observed number of events in groups 1 and 2
○ E = expected number of events in groups 1 and 2
○ An HR = 1.0 - indicates that the presence of a factor does not increase or decrease the hazard
○ a 95% CI that crosses 1.0 indicates the associated factor is not statistically significant

● When to use
○ Permits you to control for multiple factors, multivariate

● Proportionality assumption
○ Assumes the ratio of hazards remains the same over time for two different patients
○ for Cox model to be valid, the proportionality assumption must be valid for each independent variable
○ if there is crossing of two survival curves: the proportionality assumption does not hold true for the independent variable in question, since the ratio of hazard throughout time was clearly not constant
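
A trivial sketch of the card's hazard-ratio formula, with hypothetical observed and expected event counts; the helper name is illustrative.

```python
def hazard_ratio(obs1, exp1, obs2, exp2):
    """HR = (O1/E1) / (O2/E2): ratio of observed-to-expected
    event counts between the two groups."""
    return (obs1 / exp1) / (obs2 / exp2)

# Hypothetical: group 1 had 30 events vs 20 expected; group 2 had 15 vs 25 expected
print(hazard_ratio(30, 20, 15, 25))   # 2.5
```

An HR of 2.5 would mean group 1 carries 2.5 times the instantaneous risk of the event relative to group 2, assuming proportional hazards.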

37
Q

chi square

  • parametric?
  • when to use
  • degrees of freedom
A

Parametric

binomial distribution: variables that have only 2 possible values
(ex. has disease or does not)
expected values should be at least 5

(# rows - 1)(# columns -1)

38
Q

fisher exact test

  • parametric?
  • when to use
  • degrees of freedom
A

Parametric

binomial distribution
an expected value <5

39
Q

student’s t-test

  • parametric?
  • when to use
  • degrees of freedom
A

Parametric

compare 2 groups of continuous data (ex. height)
variable is normally distributed
variance of two samples should be similar

Both groups the same size: 2(n -1)
Different sizes: (n1 -1)+(n2 -1)

40
Q

anova

  • parametric?
  • when to use
  • degrees of freedom
A

Parametric

“t-test” for more than 2 groups
comparison of means, allowing a determination of the significance of the explained differences vs. unexplained differences.

DFTotal: total number of observations - 1

Between Groups:
DFBetween: number of groups - 1

Within Groups:
DFWithin: sum of degrees of freedom within each group
– or DFTotal = DFBetween + DFWithin

41
Q

mann whitney rank sum

  • parametric?
  • when to use
A

Non parametric

** Non parametric counterpart of t test that compares 2 groups

Assumptions:
independent observation
random sampling

42
Q

wilcoxon signed rank test

  • parametric?
  • when to use
A

non parametric

*** Non parametric counterpart of t-test to compare before and after (paired) differences between groups

Like Mann-Whitney, except data is paired
paired differences must be independent

Assumptions:
same assumption as Mann-Whitney
except lack of independence between each observation pairs

43
Q

kruskal-wallis test

  • parametric?
  • when to use
A

Non parametric

Non parametric counterpart of ANOVA
Greater than 2 groups

44
Q

pearson correlation

  • parametric?
  • when to use
  • degrees of freedom
A

Parametric

Two variables are normally distributed
Determines the relationship between groups

n-2

45
Q

spearman rank correlation

  • parametric?
  • when to use
A

Non parametric

Non parametric counterpart of the Pearson correlation coefficient
Determines the relationship between groups

degrees of freedom: n - 2