DL2: Quiz 1 Flashcards

Question 1

Q

Define sensitivity?

Answer

A

TP/(TP + FN)
The proportion of patients with the disease who test positive (true positives over everyone who has the disease)

Question 2

Q

Define specificity?

Answer

A

TN/(TN + FP)
The proportion of patients without the disease who test negative (true negatives over everyone who does not have the disease)

Question 3

Q

Define PPV?

Answer

A

Probability that people who test positive have the disease

Question 4

Q

Define NPV?

Answer

A

Probability that people who test negative do not have the disease
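The four definitions above can be computed from a single 2x2 table. This is a minimal sketch that assumes the usual labels `tp`, `fp`, `fn`, `tn` (true/false positives/negatives); the function name and the counts in the example are hypothetical.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV from a 2x2 table (hypothetical helper)."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives / everyone WITH the disease
        "specificity": tn / (tn + fp),  # true negatives / everyone WITHOUT the disease
        "ppv": tp / (tp + fp),          # diseased / everyone who tested positive
        "npv": tn / (tn + fn),          # disease-free / everyone who tested negative
    }

# Hypothetical counts: 90 TP, 20 FP, 10 FN, 880 TN
print(diagnostic_metrics(tp=90, fp=20, fn=10, tn=880))
```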

Question 5

Q

Define population?

Answer

A

All possible subjects of interest to the study

Question 6

Q

Define sample?

Answer

A

A subset of the population that is meant to represent the population

Question 7

Q

Define statistic?

Answer

A

A number that represents a property of the sample

Question 8

Q

Define ratio?

Answer

A

One number divided by another

Question 9

Q

Define proportion?

Answer

A

A ratio in which a part is divided by the whole

Question 10

Q

Define probability?

Answer

A

The chance of an event occurring

Question 11

Q

Define risk?

Answer

A

Probability of an event occurring

Question 12

Q

Define rate?

Answer

A

A proportion measured over a specified period of time

Question 13

Q

Define incidence?

Answer

A

new cases that occurred/population at risk

Proportion of people who develop a condition during a time period

Question 14

Q

Define prevalence?

Answer

A

All existing cases (new and pre-existing)/total population

Proportion of people who have a condition at a specific point in time (or during a specified period)
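The difference between the two denominators and numerators can be seen in a small worked example. This sketch assumes a hypothetical town and ignores deaths, recoveries, and migration during the year; the function names are illustrative only.

```python
def incidence_proportion(new_cases: int, population_at_risk: int) -> float:
    """New cases during the period / people who were at risk at the start."""
    return new_cases / population_at_risk

def point_prevalence(existing_cases: int, total_population: int) -> float:
    """All current cases (new + pre-existing) / total population."""
    return existing_cases / total_population

# Hypothetical town of 10,000 people; 400 already have the condition,
# so 9,600 are at risk. Over one year, 96 of them develop it.
print(incidence_proportion(96, 9_600))      # annual incidence
print(point_prevalence(400 + 96, 10_000))   # prevalence at year end
```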

Question 15

Q

Qualitative data?

Answer

A

Categorical
Nominal: pertaining to names
Ordinal: categories have an order or rank

Question 16

Q

Quantitative data?

Answer

A

Numerical (discrete or continuous)
Interval: no absolute zero; addition and subtraction are meaningful
Ratio: has an absolute zero and no negative values; multiplication and division are meaningful

Question 17

Q

Independent variable?

Answer

A

The one we can manipulate

Question 18

Q

Dependent variable?

Answer

A

The one we measure

Question 19

Q

Covariates/Confounders?

Answer

A

Any variable other than the chosen independent variable that may affect the dependent variable

Question 20

Q

Mean?

Answer

A

Sum of all observations/number of observations

Question 21

Q

Median?

Answer

A

Middle number when observations are placed in numerical order (the average of the two middle numbers when the number of observations is even)

Question 22

Q

Mode?

Answer

A

Most frequent observation

Question 23

Q

Range?

Answer

A

Highest value minus lowest
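The four summaries in Questions 20 to 23 can be checked on a small, hypothetical data set using Python's standard `statistics` module (a sketch, not a prescribed method).

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical observations (even count, so the median averages two values)

mean = sum(data) / len(data)         # sum of observations / number of observations
median = statistics.median(data)     # averages the two middle values when n is even
mode = statistics.mode(data)         # most frequent observation
value_range = max(data) - min(data)  # highest value minus lowest

print(mean, median, mode, value_range)
```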

Question 24

Q

Variance?

Answer

A

Subtract the mean from each measurement, square each result, add up the squares, and divide by n - 1 (sample) or n (population)

Question 25

Q

Standard dev?

Answer

A

The square root of the variance
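The variance recipe and the square-root step can be written out directly. This sketch assumes the *sample* variance (dividing by n - 1), which is what `statistics.variance` and `statistics.stdev` compute; the data are hypothetical.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]        # (x - mean)^2 for each value
sample_variance = sum(squared_deviations) / (len(data) - 1)  # divide by n - 1 for a sample
sample_sd = math.sqrt(sample_variance)                       # SD = square root of the variance

# Cross-check against the standard library
assert math.isclose(sample_variance, statistics.variance(data))
assert math.isclose(sample_sd, statistics.stdev(data))
print(sample_variance, sample_sd)
```

Dividing by `len(data)` instead gives the population variance (`statistics.pvariance`).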

Question 26

Q

Answer

A

A: Lowest observation
B: lower quartile
C: Median
D: Upper quartile
E: Highest observation

Question 27

Q

Descriptive stats?

Answer

A

Organizes and summarizes data (skewness, mean, median, mode, standard dev, scatter plots)

Question 28

Q

Inferential stats?

Answer

A

Uses sample data to estimate population parameters and to determine how confident we can be in our conclusions

Question 29

Q

Simple random sampling?

Answer

A

Probability sampling

Every subject has an equal probability of being selected

Question 30

Q

Systematic random sampling?

Answer

A

Probability sampling

Select every nth subject from an ordered list, starting at a randomly chosen subject
Subjects are selected using a known, fixed sampling interval
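A minimal sketch contrasting simple random and systematic random sampling, assuming a hypothetical sampling frame of 100 numbered subjects; the helper names are illustrative, and `random.Random` is seeded only so the output is reproducible.

```python
import random

def simple_random_sample(population: list, n: int, rng: random.Random) -> list:
    """Every subject has the same chance of selection (sampling without replacement)."""
    return rng.sample(population, n)

def systematic_sample(population: list, n: int, rng: random.Random) -> list:
    """Take every k-th subject after a random start, where k = len(population) // n."""
    k = len(population) // n
    start = rng.randrange(k)  # random starting point within the first interval
    return population[start::k][:n]

subjects = list(range(1, 101))  # hypothetical sampling frame of 100 subject IDs
rng = random.Random(42)
print(simple_random_sample(subjects, 10, rng))
print(systematic_sample(subjects, 10, rng))
```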

Question 31

Q

Stratified sampling?

Answer

A

Probability sampling

Divide population into relevant strata and take random samples from each stratum

Question 32

Q

Cluster sampling?

Answer

A

Probability sampling

Divide population into clusters, randomly select a subset of the clusters, and collect data on every subject in the selected clusters

Question 33

Q

Convenience sampling?

Answer

A

Non-Probability sampling

Select subjects based on availability; the sample may not be representative of the population

Question 34

Q

Volunteer sampling?

Answer

A

Non-Probability sampling

Take all subjects who volunteer

Question 35

Q

Why is probability better than non-probability sampling?

Answer

A

Non-probability sampling is not based on random selection, so it is susceptible to selection bias and may not represent the population

Question 36

Q

Stratified vs cluster sampling

Answer

A

Stratified:
1. Partition population into mutually exclusive homogeneous groups based on a factor that may influence the measured variable
2. Obtain a simple random sample from each group
3. Collect data on each subject that was randomly sampled from each group
4. A heterogeneous population is split into homogeneous subpopulations (the strata are collectively exhaustive)

Cluster:
1. Divide population into groups
2. Obtain a simple random sample of clusters
3. Collect data on every subject in each of the randomly selected clusters (heterogeneous)
4. Useful when target of an intervention is a system rather than individual
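The key contrast (stratified: sample *within every* group; cluster: sample *whole groups*) can be sketched as follows. The strata (age groups), clusters (clinics), and helper names are all hypothetical.

```python
import random

def stratified_sample(strata: dict, n_per_stratum: int, rng: random.Random) -> list:
    """Draw a simple random sample from EVERY stratum."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters: dict, n_clusters: int, rng: random.Random) -> list:
    """Randomly pick whole clusters, then include EVERY subject in the chosen clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [subject for name in chosen for subject in clusters[name]]

# Hypothetical data: strata are age groups, clusters are clinics
strata = {"<40": list(range(0, 50)), "40-65": list(range(50, 100)), ">65": list(range(100, 150))}
clinics = {"A": [1, 2, 3], "B": [4, 5], "C": [6, 7, 8, 9], "D": [10]}

rng = random.Random(0)
print(stratified_sample(strata, 5, rng))
print(cluster_sample(clinics, 2, rng))
```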

Question 37

Q

What type of distribution?

Question 38

Q

What type of distribution?

Question 39

Q

Poisson distribution?

Answer

A

Discrete, quantitative data that occurs independently and randomly in time at some constant mean rate.

Primarily used to estimate the probability of rare events and predict the number of times an event occurs

Gives the probability that an outcome will occur a specified number of times when the number of trials is large and the probability of an occurrence is small

Ex: Used to calculate the expected number of deaths from lung cancer in a year in a town. The observed and expected values are then compared to decide whether the number of cancer deaths is higher or lower than expected
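The lung-cancer example can be sketched with the Poisson probability formula P(X = k) = λ^k e^(-λ) / k!. The expected and observed counts below are hypothetical, and the helper names are illustrative.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson variable with mean rate lam per interval."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def poisson_at_least(k: int, lam: float) -> float:
    """P(X >= k): chance of seeing k or more events when lam are expected."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))

# Hypothetical: a town expects 4 lung-cancer deaths per year but observes 9
expected, observed = 4.0, 9
print(poisson_pmf(observed, expected))       # probability of exactly 9 deaths
print(poisson_at_least(observed, expected))  # probability of 9 or more deaths
```

A small "9 or more" probability suggests the observed count is higher than expected by chance alone.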

Question 40

Q

What type of distribution?

Answer

A

Poisson distribution

Question 41

Q

What causes skewness?

Question 42

Q

Kurtosis?

Answer

A

A measure of the combined weight of the tails relative to the rest of the distribution

Question 43

Q

Answer

A

Mean
Median
Mode

Question 44

Q

What is the purpose of data transformation?

Answer

A

To transform skewed or unknown distributions toward a normal distribution so that parametric tests (and their p-values) can be used
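As an illustration, a log transform is one common choice for right-skewed data (the card does not name a specific transformation, so this is an assumption). The sketch uses hypothetical values spanning several orders of magnitude and a simple moment-based skewness helper.

```python
import math
import statistics

def skewness(data: list) -> float:
    """Moment-based skewness: mean of cubed z-scores (using the population SD)."""
    m = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(((x - m) / sd) ** 3 for x in data) / len(data)

# Hypothetical right-skewed measurements
raw = [1, 10, 10, 100, 100, 100, 1000, 1000, 10000]
logged = [math.log10(x) for x in raw]

print(skewness(raw))     # large and positive: long right tail
print(skewness(logged))  # close to 0 after the log transform
```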

Question 45

Q

What is central limit theorem?

Answer

A

When equally sized samples are drawn from a non-normal distribution (with finite variance), the distribution of the sample means approximates a normal distribution as the sample size grows; convergence is slower when the non-normality is due to heavy skew or extreme outliers

A sample size of 30 or more is generally considered sufficiently large
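A quick simulation illustrates the theorem. This sketch assumes a hypothetical, strongly right-skewed exponential population (mean 1, SD 1) and draws 2,000 samples of n = 30; the sample means should centre on 1 with a spread near 1/√30.

```python
import random
import statistics

rng = random.Random(1)  # seeded for reproducibility

def draw_sample(n: int) -> list:
    """One sample of size n from a right-skewed exponential population (mean 1, SD 1)."""
    return [rng.expovariate(1.0) for _ in range(n)]

# Means of 2,000 equally sized samples (n = 30 each)
sample_means = [statistics.fmean(draw_sample(30)) for _ in range(2000)]

print(statistics.fmean(sample_means))  # should be close to the population mean, 1.0
print(statistics.stdev(sample_means))  # should be close to 1 / sqrt(30), about 0.18
```

A histogram of `sample_means` looks roughly bell-shaped even though the population itself is skewed.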

Question 46

Q

What is p-value?

Answer

A

The probability of obtaining a measurement at least as extreme as the one obtained, assuming the null hypothesis is true.

Question 47

Q

What is null-hypothesis?

Answer

A

A hypothesis stating that there is no difference (no effect) between 2 sets of data.

Question 48

Q

Type 1 error?

Answer

A

Rejecting the null hypothesis when the null hypothesis is true

False positive

Question 49

Q

Type 2 error?

Answer

A

Failing to reject ("accepting") the null hypothesis when the null hypothesis is false

False negative

Question 50

Q

What is 𝛂?

Answer

A

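The significance level: the threshold the p-value is compared with to reject the null hypothesis, equal to the probability of a type I error (between 0 and 1, commonly 0.05)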

Question 51

Q

When would you reject the null?

Answer

A

P<𝛂
- a small p-value (i.e., less than alpha) is an “unlikely” result under the null hypothesis, allowing us to reject the null hypothesis (i.e., there is a statistically significant difference between the two groups).
- a large p-value (i.e., greater than or equal to alpha) is a “likely” result under the null hypothesis, so we fail to reject ("accept") the null hypothesis (i.e., there is no statistically significant difference between the two groups).

Question 52

Q

What is ß?

Answer

A

Probability of a type II error (FN)

Question 53

Q

What type of graph? What does it do?

Answer

A

Histogram

Presents data as frequency counts over some interval

Question 54

Q

What type of graph? What are its components?

Answer

A

Boxplot
1. The thin-lined box indicates the IQR: the 25th to the 75th percentiles of the data.
2. Within the box is the bolded line: the median.
3. Extending from both ends of the box are the tails (whiskers), which show the minimum and maximum data points up to 1.5 IQRs beyond the box edges.
4. The circle is an outlier, defined as a data point 1.5 to 3.0 IQRs beyond the box edges.
5. The asterisk is an extreme outlier, defined as a data point more than 3.0 IQRs beyond the box edges.

Question 55

Q

What type of graph? What are its components?

Answer

A

Scatterplot
Presents data from 2 variables both measured on a continuous scale

Useful for assessing the association between 2 variables and for checking assumptions of tests, such as linearity and absence of outliers

Question 56

Q

Confidence interval?

Answer

A

Range of values within which we have a stated level of confidence (e.g., 95%) that the true population value lies

A 95% CI corresponds to 𝛂 = 0.05 (5%)

Narrow CI: little variation and more precise
Wide CI: greater variation and less precise
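A minimal sketch of a 95% CI for a mean, assuming the normal approximation (mean ± 1.96 × SD/√n), which is reasonable for large samples; small samples would normally use a t critical value instead. The two data sets are hypothetical, with the same mean but different spread.

```python
import math
import statistics

def mean_ci_95(data: list) -> tuple:
    """Normal-approximation 95% CI for a mean: mean +/- z * SD / sqrt(n)."""
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96, matching alpha = 0.05 (two-sided)
    m = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(len(data))
    return m - z * se, m + z * se

# Hypothetical data sets with the same mean (100) but different spread
low_variation = [98, 99, 100, 101, 102] * 10
high_variation = [80, 90, 100, 110, 120] * 10

print(mean_ci_95(low_variation))   # narrow CI: more precise
print(mean_ci_95(high_variation))  # wide CI: less precise
```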

Question 57

Q

What does overlap of CI box plots mean?

Answer

A

Related to the p-value

Less overlap = larger difference and lower p-value
p < 𝛂 = reject the null (statistically significant)

More overlap = smaller difference and higher p-value
p ≥ 𝛂 = fail to reject the null (not statistically significant)

Question 58

Q

Calculate risk ratio?

Answer

A

Risk in people with risk factor/risk in people w/o risk factor

RR = (a/(a+b)) / (c/(c+d))
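The formula translates directly into code. This sketch assumes the 2x2 layout the formula implies (row 1: exposed, a = event, b = no event; row 2: unexposed, c = event, d = no event); the cohort counts are hypothetical.

```python
def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """RR = (a / (a + b)) / (c / (c + d))."""
    risk_exposed = a / (a + b)    # risk in people with the risk factor
    risk_unexposed = c / (c + d)  # risk in people without the risk factor
    return risk_exposed / risk_unexposed

# Hypothetical cohort: 30 of 100 exposed vs 10 of 100 unexposed develop the outcome
print(risk_ratio(30, 70, 10, 90))  # 0.30 / 0.10
```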

Question 59

Q

Calculate absolute risk reduction or increase?

Answer

A

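ARR (or ARI)

EER - CER (EER = experimental event rate, CER = control event rate)

Risk in the experimental group minus risk in the control group: a negative value is a risk reduction, a positive value a risk increase (ARR is often reported as CER - EER so that it is positive)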

Question 60

Q

Calculate relative risk reduction?

Answer

A

RRR
(Risk of experimental-risk of control)/ risk of control

(EER-CER)/CER
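A small sketch of both formulas, following the cards' signed convention (EER - CER, so a negative result means the treatment reduced risk). The event rates are hypothetical and the function names are illustrative.

```python
def absolute_risk_difference(eer: float, cer: float) -> float:
    """EER - CER: negative = risk reduction, positive = risk increase."""
    return eer - cer

def relative_risk_reduction(eer: float, cer: float) -> float:
    """(EER - CER) / CER: the risk difference relative to the control risk."""
    return (eer - cer) / cer

# Hypothetical trial: 10% of treated patients vs 15% of controls have the event
eer, cer = 0.10, 0.15
print(absolute_risk_difference(eer, cer))  # about -0.05: a 5-percentage-point reduction
print(relative_risk_reduction(eer, cer))   # about -0.33: risk reduced by one third
```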

Question 61

Q

Calculate number needed to treat?

Answer

A

NNT
1/ARR (absolute risk reduction)

Question 62

Q

Calculate number needed to harm?

Answer

A

NNH
1/ARI (Absolute risk increase)
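NNT and NNH are the reciprocal of the absolute risk difference. This sketch rounds up to a whole patient, a common convention for NNT (rounding conventions for NNH vary); the rates and the helper name are hypothetical.

```python
import math

def number_needed(eer: float, cer: float) -> int:
    """NNT if the treatment lowers risk, NNH if it raises it: 1 / |EER - CER|, rounded up."""
    risk_difference = abs(eer - cer)
    # round() first so floating-point noise (e.g. 20.000000000000004) does not push ceil() up by one
    return math.ceil(round(1 / risk_difference, 9))

print(number_needed(eer=0.10, cer=0.15))  # NNT: treat this many to prevent one event
print(number_needed(eer=0.08, cer=0.05))  # NNH: treat this many to cause one extra harm
```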

Question 63

Q

Calculate odds of risk factor in cases (with event)?

Question 64

Q

Calculate odds of risk factor in control (no event)?

Answer 60

A

(a/c)/(b/d) = ad/bc

Ratio of the odds of an exposure in the case group to the odds of an exposure in the control group
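The odds ratio formula can be sketched as below, assuming the same 2x2 layout as the risk ratio card: a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls. The counts are hypothetical.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = (a/c) / (b/d), which simplifies to ad / bc."""
    odds_exposure_cases = a / c     # odds of exposure among cases (with event)
    odds_exposure_controls = b / d  # odds of exposure among controls (no event)
    return odds_exposure_cases / odds_exposure_controls

# Hypothetical case-control study: 40 of 100 cases and 20 of 100 controls were exposed
print(odds_ratio(40, 20, 60, 80))
```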

Answer 61

A

Observes development of disease in exposed and unexposed groups

Answer 62

A

Select subjects with the event (cases) and compare the presence of the risk factor in cases with that in controls without the event

Answer 63

A

RR CI contains 1: no difference in risk. Do not reject H0.
RR entire CI > 1: risk in intervention group > risk in control group. Reject H0.
RR entire CI < 1: risk in intervention group < risk in control group. Reject H0.

Answer 64

A

OR CI contains 1: no difference in odds. Do not reject H0.
OR entire CI > 1: Odds in Case (or event) group > odds in control group. Reject H0.
OR entire CI < 1: Odds in Case (or event) group < odds in control group. Reject H0.
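The interpretation rules above amount to checking where the whole CI sits relative to 1. The cards do not say how the CI is computed, so this sketch assumes Woolf's log method for the odds ratio (SE of ln(OR) = √(1/a + 1/b + 1/c + 1/d)); the tables and helper names are hypothetical.

```python
import math

def odds_ratio_ci_95(a: int, b: int, c: int, d: int) -> tuple:
    """OR with a 95% CI from Woolf's log method (an assumed choice of method)."""
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    low = math.exp(math.log(or_) - 1.96 * se_log_or)
    high = math.exp(math.log(or_) + 1.96 * se_log_or)
    return or_, (low, high)

def interpret_ci(ci: tuple) -> str:
    """Compare the whole CI with the null value 1 (same rules for RR and OR)."""
    low, high = ci
    if low <= 1 <= high:
        return "CI contains 1: no difference, do not reject H0"
    if low > 1:
        return "entire CI > 1: higher in case/intervention group, reject H0"
    return "entire CI < 1: lower in case/intervention group, reject H0"

for table in [(40, 20, 60, 80), (10, 10, 10, 10), (20, 40, 80, 60)]:  # hypothetical 2x2 tables
    or_, ci = odds_ratio_ci_95(*table)
    print(round(or_, 2), ci, interpret_ci(ci))
```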

Answer 65

A

Parametric

Answer 66

A

Non-parametric

Answer 67

A

Data don’t seem to follow a normal (or other known) distribution
Assumptions underlying parametric tests are not met
Data appear to be very skewed
Data have significant outliers

Answer 68

A

Paired t-test
Unpaired t-test
Pearson correlation
One way ANOVA

Answer 69

A

Wilcoxon signed-rank test
Mann-Whitney U test (Wilcoxon rank-sum test)
Spearman correlation
Kruskal-Wallis test

Answer 70

A

Compares two related measurements (variables) taken on the same group, e.g., before and after

Answer 71

A

Compares outcomes on the same variable from 2 different groups

Answer 72

A

a: one-tailed (all 5% in one tail)
b: two-tailed (2.5% in each tail)

Answer 73

A

Tests for differences between means; the larger the t statistic (in absolute value), the greater the difference between the groups

Independent sample: compares means of 2 groups
Paired: compares means from same group at different times
One sample: compares the mean of one group to known mean
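The three t statistics can be sketched with the standard library. This computes only the statistic, not the p-value (which needs the t distribution, e.g. `scipy.stats.ttest_1samp`, `ttest_rel`, and `ttest_ind`); the independent version assumes equal variances (pooled, Student's t). Data and function names are hypothetical.

```python
import math
import statistics

def one_sample_t(data: list, known_mean: float) -> float:
    """t = (sample mean - known mean) / (s / sqrt(n)), df = n - 1."""
    return (statistics.fmean(data) - known_mean) / (statistics.stdev(data) / math.sqrt(len(data)))

def paired_t(before: list, after: list) -> float:
    """Paired t-test = one-sample t-test on the within-subject differences (vs 0)."""
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(diffs, 0.0)

def independent_t(x: list, y: list) -> float:
    """Student's t with pooled variance (assumes equal variances), df = n1 + n2 - 2."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.fmean(x) - statistics.fmean(y)) / se

print(one_sample_t([5.1, 4.9, 5.6, 5.8, 6.0], 5.0))  # hypothetical values vs a known mean of 5
print(paired_t([1, 2, 3], [2, 4, 6]))               # same subjects measured twice
print(independent_t([1, 2, 3], [4, 5, 6]))          # two separate groups
```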

Answer 74

A

A measure of the amount of independent data that can be used to estimate a parameter

Determines the shape of the probability distributions of the test statistics used in hypothesis tests (e.g., t and chi-square)

Number of data points which are free to vary

Answer 75

A

1. Number of groups compared
2. Number of parameters needed to estimate the standard deviation

Answer 76

A

Random samples
Categorical data (counts)
Non-Parametric
Tests whether a categorical variable is related to another

Answer 77

A

Random samples
Categorical data (counts)
Non-Parametric
Tests whether data is representative of the full population.
Compares observed data to a theoretical model

Answer 78

A

Compares observed frequency with expected frequency
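Both chi-square tests use the same statistic, Σ (O - E)² / E; they differ in where the expected counts come from. This sketch computes only the statistic (a p-value would need the chi-square distribution, e.g. `scipy.stats.chisquare` or `scipy.stats.chi2_contingency`); the counts and helper names are hypothetical.

```python
def chi_square_statistic(observed: list, expected: list) -> float:
    """Sum of (O - E)^2 / E over all categories or cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def expected_counts(table: list) -> list:
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

# Goodness of fit (hypothetical): 100 die rolls vs a fair-die model, df = 6 - 1 = 5
observed_rolls = [18, 22, 16, 14, 12, 18]
print(chi_square_statistic(observed_rolls, [100 / 6] * 6))

# Independence (hypothetical 2x2 table: exposure x outcome), df = (2 - 1) * (2 - 1) = 1
table = [[30, 70], [10, 90]]
flat_observed = [x for row in table for x in row]
flat_expected = [x for row in expected_counts(table) for x in row]
print(chi_square_statistic(flat_observed, flat_expected))
```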

Answer 79

A

Branch of stats for analyzing the expected duration of time until an event occurs

Must deal with censored data

Answer 80

A

event doesn’t occur during study period
subject lost to follow up
subject dies from something other than studied cause

Answer 81

A

Non-Parametric survival analysis method – no assumptions about how event probability changes over time.

Censoring is independent of event probability
Survival probabilities are comparable in early and later recruited subjects
Censoring is not more likely in one group than another
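The card's question is missing, but the answer matches the Kaplan-Meier estimator (an assumption). A minimal sketch of how it handles censoring: at each event time, survival is multiplied by (1 - events / number still at risk), and censored subjects simply leave the risk set. Follow-up data are hypothetical.

```python
def kaplan_meier(times: list, events: list) -> list:
    """Return (time, survival probability) at each event time.

    times  : follow-up time for each subject
    events : 1 if the event occurred at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:  # group subjects sharing this time
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored  # censored subjects leave the risk set without an event
    return curve

# Hypothetical follow-up in months; event 0 = censored
times = [2, 3, 3, 5, 6, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 0, 1]
print(kaplan_meier(times, events))
```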

Answer 82

A

The relative risk of complications based on comparison of event rates.

Answer 83

A

Every patient randomized enters the primary analysis

Answer 84

A

Analysis includes only those patients who strictly adhered to the protocol

Identifies effect under ideal conditions

Answer 85

A

A graph that summarizes data from multiple papers in a single image