objectives - michael Flashcards
what are the three basic types of data?
nominal, ordinal, numerical
what is a nominal scale? what is it used to measure?
assigns individuals one (and only one) category without any specific ordering (e.g. positive/negative, alive/dead, A/B/AB/O, etc.). Reported as the percentage of a population that falls into a particular category.
what is an ordinal scale? what is it used to measure?
assigns individuals one (and only one) category with some specific order (e.g. stage I, II, III, IV Hodgkin’s lymphoma). Since the categories are still often qualitative in nature, it does not make sense to summarize the data in terms of an “average.” Reported as the percentage of a population that falls into a particular category.
what is a numerical scale? what is it used to measure?
assigns each individual a number, which can be discrete (e.g. a count) or continuous (e.g. a measurement); can be summarized as an average.
what is a distribution of data? how would you graph it?
a set of values of a variable over some population (e.g. pulse rates of smokers over the age of 60). Can be graphed as a histogram with the values of the variable on the x-axis and the frequency of occurrence of that value on the y-axis.
what’s a normal distribution?
a symmetric, bell-shaped distribution with some nice features: about 68% of the population falls within 1 standard deviation of the mean, and about 95% of the population falls within 2 standard deviations of the mean.
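The 68% and 95% figures can be checked numerically. The sketch below uses Python's standard `statistics.NormalDist`; the helper name `fraction_within` is made up for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def fraction_within(k: float) -> float:
    """Fraction of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_1_sd = fraction_within(1)  # roughly 0.68
within_2_sd = fraction_within(2)  # roughly 0.95
print(f"within 1 SD: {within_1_sd:.3f}, within 2 SD: {within_2_sd:.3f}")
```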
what is mean?
average of the values of a variable. Take the sum, divide by n, the number of values. If I have 3 stethoscopes, and you have 5 stethoscopes, the average number of stethoscopes between the two of us is (3 + 5)/2 = 4.
what is the median?
the midpoint, the 50th percentile. Out of 3 values, the 2nd value (sorted highest to lowest or lowest to highest). Out of 5 values, the 3rd value. And so on. If there are an even number of values, take the mean of the middle two values.
what is the range?
the highest value in your dataset minus the lowest value in your dataset
what is the variance?
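take every value from your dataset and subtract the mean from that value. Then square those numbers, add them all up, and divide by the number of values, n (this gives the population variance; when estimating the variance from a sample, divide by n - 1 instead).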
what is the standard deviation?
Square root of the variance.
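As a rough illustration, the summary statistics above can be computed with Python's standard `statistics` module. The data values are hypothetical. `pvariance` and `pstdev` divide by n, as described in the cards, while `variance` divides by n - 1.

```python
from statistics import mean, median, pvariance, pstdev, variance

values = [3, 5, 4, 8, 10]  # hypothetical example data

avg = mean(values)                      # sum of the values divided by n
mid = median(values)                    # middle value after sorting
value_range = max(values) - min(values) # highest minus lowest

# Variance as described in the cards: mean of squared deviations (divide by n).
var_by_hand = sum((x - avg) ** 2 for x in values) / len(values)
var_population = pvariance(values)      # same calculation, library version
sd_population = pstdev(values)          # square root of the variance

# Sample variance divides by n - 1 instead of n.
var_sample = variance(values)

print(avg, mid, value_range, var_population, round(sd_population, 2), var_sample)
```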
understand the concept of sampling and estimation of population parameters
a) Flip a coin 10 times. It comes up 6 heads and 4 tails. This is your sample. b) From this sample, you estimate that the coin comes up heads 60% of the time it is flipped. This is an estimate of a “population parameter.” For a fair coin the true value is 50%, so this estimate is off because of chance variation in a small sample (sampling error). As the number of coin flips increases, the estimate tends to move closer to the true value. c) You can do the same to estimate the number of people in a population with a certain disease: study a smaller group (the sample) and extrapolate from that sample to the entire population. Make sure your sample is representative of the population, however!
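The coin-flip example can be simulated. This is a minimal sketch that assumes a fair coin (true probability of heads 0.5); the function name and the fixed random seed are illustrative choices.

```python
import random

def estimate_heads_probability(n_flips: int, rng: random.Random) -> float:
    """Flip a simulated fair coin n_flips times and return the fraction of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

rng = random.Random(0)  # fixed seed so the run is repeatable
estimates = {n: estimate_heads_probability(n, rng) for n in (10, 100, 10_000)}

for n, estimate in estimates.items():
    print(f"{n:>6} flips: estimated P(heads) = {estimate:.3f}")
```

Small samples can land far from 0.5, while the estimate from 10,000 flips should sit close to it.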
define and contrast qualitative versus quantitative assessments of clinical uncertainty
a) Qualitative: likely/unlikely, probable/possible, suspicious/can’t rule out b) Quantitative: probabilities on a scale from 0 (impossible) to 1 (certain)
define prior probability
probability a patient has a disease based on prior clinical data before some additional test is conducted which will result in additional information
define posterior probability
the updated probability after the results of the test come back.
what is sensitivity?
probability of a positive test in a population of only persons who have the disease (true-positive rate; how many people who have the disease will test positive)
what is specificity?
probability of a negative test in a population of only persons who do not have the disease (true-negative rate; how many people who do not have the disease will test negative)
what is positive predictive value? what is the formula for it?
probability of disease in persons with a positive test
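PPV = (sens * p) / (sens * p + (1 - p) * (1 - spec))
where sens = sensitivity, spec = specificity, and p = prior probability (prevalence) of disease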
what is the negative predictive value? how do you calculate it?
probability of no disease in persons with a negative test
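NPV = (spec * (1 - p)) / (spec * (1 - p) + p * (1 - sens))
where sens = sensitivity, spec = specificity, and p = prior probability (prevalence) of disease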
find ppv and npv: You have a patient with a cough and a history of TB exposure. You know that a test for TB has a sensitivity of 0.75 and a specificity of 0.80, and the test is to be used in a population having a TB-prevalence of 20%
PPV = (0.75 * 0.20) / (0.75 * 0.20 + 0.80 * 0.20) = 0.15 / 0.31 ≈ 0.48
NPV = (0.80 * 0.80) / (0.80 * 0.80 + 0.20 * 0.25) = 0.64 / 0.69 ≈ 0.93
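The TB example can be worked through in a few lines of Python using the PPV and NPV formulas from the earlier cards. The function name `predictive_values` is made up for this sketch.

```python
def predictive_values(sens: float, spec: float, p: float) -> tuple:
    """Return (PPV, NPV) given sensitivity, specificity, and prevalence p."""
    ppv = sens * p / (sens * p + (1 - p) * (1 - spec))
    npv = spec * (1 - p) / (spec * (1 - p) + p * (1 - sens))
    return ppv, npv

# Values from the TB question: sensitivity 0.75, specificity 0.80, prevalence 20%.
ppv, npv = predictive_values(sens=0.75, spec=0.80, p=0.20)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")  # PPV = 0.48, NPV = 0.93
```

Even with a positive test, the probability of TB is under 50% here, because the disease is uncommon in the tested population.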
List considerations in choosing the right diagnostic test for a given clinical situation.
a) Typically, there is a trade-off between sensitivity and specificity
b) “Spin” (SpPIn): a highly Specific test, when Positive, rules In disease
c) “Snout” (SnNOut): a highly Sensitive test, when Negative, rules Out disease
d) For continuous measurements, apply a “cut-off point” (the effects below assume that higher values indicate disease)
i) Low cut-off: low specificity and high sensitivity
ii) High cut-off: high specificity and low sensitivity
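The cut-off trade-off can be illustrated with a toy dataset. All measurements below are hypothetical, and the sketch assumes that higher values indicate disease and that a result at or above the cut-off counts as a positive test.

```python
# Hypothetical measurements; higher values are assumed to indicate disease.
diseased = [6.1, 7.4, 5.2, 8.0, 6.8]
healthy = [4.0, 5.5, 3.2, 4.8, 6.3]

def sens_spec(cutoff: float) -> tuple:
    """Return (sensitivity, specificity) when 'positive' means value >= cutoff."""
    sensitivity = sum(x >= cutoff for x in diseased) / len(diseased)
    specificity = sum(x < cutoff for x in healthy) / len(healthy)
    return sensitivity, specificity

low_cutoff = sens_spec(4.5)   # catches every diseased patient, many false positives
high_cutoff = sens_spec(6.5)  # no false positives, but misses some diseased patients

print("low cut-off: ", low_cutoff)   # (1.0, 0.4)
print("high cut-off:", high_cutoff)  # (0.6, 1.0)
```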
what is decision analysis? what are the steps?
a) A quantitative approach to making trade-offs in clinical decisions (e.g. quality of life versus years lived or short-term vs. long-term risks)
b) Step 1: all potential outcomes of each strategy under consideration are represented in a decision tree
c) Step 2: Probabilities are assigned to each clinical outcome
d) Step 3: Each outcome is assigned some quantitative value
e) Step 4: Expected value of each strategy is calculated
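The four steps can be sketched as a tiny expected-value calculation. The strategies, probabilities, and outcome values below are entirely hypothetical (think of the values as quality-adjusted life years), and the decision tree is flattened to one level for brevity.

```python
# Steps 1-3: each strategy lists its outcomes as (probability, value) pairs.
# All numbers are hypothetical.
strategies = {
    "surgery": [(0.90, 10.0), (0.10, 0.0)],
    "medication": [(0.60, 9.0), (0.40, 6.0)],
}

def expected_value(outcomes) -> float:
    """Step 4: sum of probability * value over all outcomes of one strategy."""
    total_probability = sum(p for p, _ in outcomes)
    assert abs(total_probability - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * value for p, value in outcomes)

expected = {name: expected_value(outcomes) for name, outcomes in strategies.items()}
best = max(expected, key=expected.get)

for name, value in expected.items():
    print(f"{name}: expected value = {value:.1f}")
print("preferred strategy:", best)
```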
what is the standard error? how do you calculate it?
Standard error quantifies how much the sample mean would vary from one sample to the next
SE = s / sqrt(n), where s is the sample standard deviation and n is the sample size
what is the confidence interval? how do you calculate it?
quantifies the precision of the sample mean by providing an interval based on the sampling distribution
95% CI = sample mean ± 2 x SE (more precisely 1.96 x SE for large samples), where SE is the standard error of the mean
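A short sketch of the standard error and an approximate 95% confidence interval, using the usual formulas SE = s / sqrt(n) and mean ± 2 x SE. The pulse-rate data are hypothetical, and the multiplier 2 is the rule-of-thumb value rather than an exact critical value for a small sample.

```python
from math import sqrt
from statistics import mean, stdev

pulse_rates = [72, 80, 68, 75, 85, 70, 78, 74]  # hypothetical sample

n = len(pulse_rates)
sample_mean = mean(pulse_rates)
standard_error = stdev(pulse_rates) / sqrt(n)  # stdev divides by n - 1

# Approximate 95% confidence interval: mean plus or minus 2 standard errors.
ci_low = sample_mean - 2 * standard_error
ci_high = sample_mean + 2 * standard_error

print(f"mean = {sample_mean:.2f}, SE = {standard_error:.2f}")
print(f"approximate 95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```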
what are the three steps to hypothesis testing?
i) Define a null hypothesis: usually the null hypothesis is, paradoxically, the opposite of what you are trying to support (e.g. “there is no difference between the groups”)
ii) Compute a test statistic: some test to describe the difference between your observed data and the null hypothesis
iii) Draw a conclusion: as a rule of thumb, one rejects the null hypothesis if the test statistic t is less than -2 or greater than +2 (the observed effect is more than about 2 standard errors away from the null value).
what is type I error?
rejecting the null hypothesis when it is true; akin to a false positive
what is type II error?
failing to reject the null hypothesis when it is false; akin to a false negative
how do you interpret a p value for effect?
- convenient way to present the results of a statistical test; a number between 0 and 1 that represents the probability of seeing an effect at least as large as the one observed if chance alone were at work (i.e. if the null hypothesis were true)
- Usually 0.05 is the cut-off for presenting a p-value as statistically significant
what is the relationship between confidence intervals and hypothesis testing?
If a 95% confidence interval for an effect (e.g. a difference between groups) does not include 0 (no effect), then the p-value is less than 0.05.
what is statistical power? how can you increase it?
The power is the probability that the study will reject the null hypothesis if indeed the alternative hypothesis is true; it equals 1 minus the type II error rate. One can increase the power of a study by increasing the sample size.
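To see how sample size affects power, here is a rough sketch based on a normal approximation for comparing two group means. It assumes a known common standard deviation, equal group sizes, and a two-sided test at alpha = 0.05. The effect size, standard deviation, and function name are hypothetical and do not come from the cards.

```python
from math import sqrt
from statistics import NormalDist

def approximate_power(effect: float, sd: float, n_per_group: int,
                      alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test comparing two group means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    se_diff = sd * sqrt(2 / n_per_group) # standard error of the mean difference
    shift = effect / se_diff             # true effect in standard-error units
    # Probability that the test statistic lands in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

# Hypothetical scenario: true difference of 5 units, standard deviation 10.
powers = {n: approximate_power(effect=5, sd=10, n_per_group=n) for n in (10, 50, 100)}
for n, power in powers.items():
    print(f"n = {n:>3} per group: power = {power:.2f}")
```

Under these assumptions the power rises with the number of subjects per group.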
what are the advantages and disadvantages of larger studies
cost more but have less variable results
what is a contingency table?
relationship between two categorical variables in the form of a table
what are the steps for hypothesis testing for a 2x2 contingency table?
- Define null hypothesis
- test null hypothesis using the chi-squared test
- for each cell, take the square of the difference between the observed and expected value
- divide by the expected value
- add up the results for all cells to get the chi-squared statistic
- compare your chi-squared value to a standard value (gives you your p value)
what is the equation for degrees of freedom? what does it tell you?
(r - 1) x (c - 1), where r is the number of rows and c is the number of columns in the contingency table
If you know degrees of freedom and the chi-squared statistic, you can find a p-value.
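The chi-squared steps for a 2x2 table can be sketched as follows. The counts are hypothetical. Expected counts are computed as (row total x column total) / grand total, which is the usual approach under the null hypothesis of no association. No continuity correction is applied, and the p-value shortcut with `math.erfc` is valid only for 1 degree of freedom.

```python
from math import erfc, sqrt

# Hypothetical 2x2 table: rows = exposed / not exposed, columns = disease / no disease.
observed = [[30, 20],
            [10, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_squared = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if the two variables were unrelated.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_squared += (obs - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# With 1 degree of freedom, P(chi-squared > x) = erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi_squared / 2))

print(f"chi-squared = {chi_squared:.2f}, df = {degrees_of_freedom}, p = {p_value:.5f}")
```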
what is required about your data for the independent samples t-test to be used?
must be independent samples,
i.e. you must be comparing the means of two independent samples (not the same patients at different time intervals, etc.)
how do you calculate standard error of the mean difference?
one common form: SE(difference) = sqrt(s1^2 / n1 + s2^2 / n2), where s1 and s2 are the standard deviations and n1 and n2 the sizes of the two samples
how do you calculate the test statistic (t) when comparing two independent samples?
t = (mean of sample 1 - mean of sample 2) / (standard error of the mean difference)
for what values is a t test significant?
if t > 2 or t < -2
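A minimal sketch of the independent samples t statistic. The blood-pressure values are hypothetical. The standard error of the difference is computed as sqrt(s1^2/n1 + s2^2/n2), the unpooled form, which is an assumption here: a pooled-variance formula is also widely used.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical systolic blood pressures in two independent groups of patients.
treated = [120, 125, 130, 118, 122, 128]
control = [135, 140, 132, 138, 145, 136]

# Standard error of the mean difference (unpooled form; variance divides by n - 1).
se_difference = sqrt(variance(treated) / len(treated) + variance(control) / len(control))

t = (mean(treated) - mean(control)) / se_difference
significant = abs(t) > 2  # rule of thumb from the card above

print(f"t = {t:.2f}, significant by the |t| > 2 rule: {significant}")
```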
when would you use a paired samples t test?
when you have two data sets that aren’t independent - for example, patients before and after treatment
what are the steps for a paired samples t test?
Step 1: compute a difference score for each pair of observations
Step 2: compute a mean of the difference scores and a standard deviation of the difference scores
what is the equation for the test statistic (t) in a paired samples t test?
t = (mean of the difference scores) / (standard deviation of the difference scores / sqrt(n)), where n is the number of pairs
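The paired samples steps can be sketched in Python. The before and after measurements are hypothetical, and the t statistic is taken as the mean difference divided by its standard error (standard deviation of the differences over sqrt(n)), the usual textbook form.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical measurements on the same five patients before and after treatment.
before = [150, 142, 160, 155, 148]
after = [144, 140, 151, 150, 147]

# Step 1: difference score for each pair of observations.
differences = [a - b for a, b in zip(after, before)]

# Step 2: mean and standard deviation of the difference scores.
mean_difference = mean(differences)
sd_difference = stdev(differences)

# Test statistic: mean difference divided by its standard error.
t = mean_difference / (sd_difference / sqrt(len(differences)))

print(f"mean difference = {mean_difference:.1f}, t = {t:.2f}")
```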
what does survival analysis measure?
time until occurrence of some event (infection, disease relapse, death) after some initial observation period (initial therapy or treatment)
what are the potential problems with survival analysis methods? (2)
some patients may drop out of the study (these patients are said to be censored) and the distribution of the data is likely to be skewed (cannot use normal distribution methods)
how are survival functions displayed?
as a graph called a survival curve, which represents the probability that an individual will survive beyond time t
what is the median survival time?
the time such that 50% of the subjects will experience the event before the time and 50% of the subjects will experience the event after the time
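One common way to build a survival curve when some patients are censored is the Kaplan-Meier (product-limit) estimator. The cards above do not name it, so treat this as an illustrative sketch and not the method the course prescribes. The follow-up data and function names are hypothetical.

```python
# (time, event_observed) pairs; event_observed=False means the patient was
# censored at that time (e.g. dropped out before the event occurred).
patients = [(2, True), (3, False), (4, True), (5, True),
            (6, False), (8, True), (9, True), (12, False)]

def kaplan_meier(data):
    """Return [(time, survival probability)] at each time an event occurs."""
    curve = []
    survival = 1.0
    at_risk = len(data)
    for time in sorted({t for t, _ in data}):
        events = sum(1 for t, observed in data if t == time and observed)
        leaving = sum(1 for t, _ in data if t == time)  # events plus censored
        if events:
            survival *= 1 - events / at_risk
            curve.append((time, survival))
        at_risk -= leaving
    return curve

def median_survival(curve):
    """First time at which the estimated survival probability is 0.5 or lower."""
    for time, survival in curve:
        if survival <= 0.5:
            return time
    return None  # survival never drops to 50% during follow-up

curve = kaplan_meier(patients)
for time, survival in curve:
    print(f"t = {time:>2}: S(t) = {survival:.3f}")
print("median survival time:", median_survival(curve))
```

Censored patients count toward the number at risk until they leave the study, but they never count as events.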
what is log-rank test?
a statistical test to determine if two survival curves are different