Statistics Flashcards

Question 1

Q

Nominal data

Answer

A

Categories with no inherent order; non-parametric

Question 2

Q

Study Power?

Answer

A

The power of a study is the probability of detecting a significant difference between treatments or study groups when there really is one.
Low power increases the likelihood of failing to identify a statistically significant difference when a real difference does exist.
High power (80% or more) is desirable.
Power is affected by sample size, effect size, significance level (alpha), and the variability of the data.

Question 3

Q

Ordinal data?

Answer

A

Ordered categories with unequal intervals; non-parametric

Question 4

Q

Interval data?

Answer

A

Equal interval
No absolute zero
Cannot compute ratio
parametric

E.g. temperature in Celsius or Fahrenheit

Question 5

Q

Ratio data?

Answer

A

Equal interval
with absolute zero or true zero
Can calculate ratio
parametric

E.g. weight, height, temperature in Kelvin

Mnemonic “NOIR”: Nominal, Ordinal, Interval, Ratio

Question 6

Q

Measurement of central tendency?

Answer

A

Mean
Median
Mode

Question 7

Q

Mean= Median = Mode, what distribution?

Answer

A

Normal distribution

Question 8

Q

Relationship of mean, median and mode in a right-skewed (positively skewed) distribution?

Answer

A

Right skewed: tail on the right
Mean > Median > Mode

(Rule of thumb: mean always follows the tail)

Question 9

Q

The relationship of mean, median and mode in left skewed distribution?

Answer

A

Tail is on the left of the distribution

Mean < Median < Mode

Question 10

Q

For a normal distribution, which statistical method should be selected?

Answer

A

Select Parametric statistics test

Eg. Student t-test, chi-square, ANOVA, ANCOVA, regression analysis

Question 11

Q

For a non-normal distribution (e.g. bimodal, skewed), which test methods should be selected?

Answer

A

Non-parametric tests, e.g. Fisher’s exact test, McNemar test, Mann-Whitney U test, Wilcoxon’s rank sum test, Kruskal-Wallis test

Question 12

Q

Ways of obtaining random sample?

Answer

A

Simple random sampling
Systematic random sampling
Stratified random sampling
Cluster sampling

Question 13

Q

Bias?

Answer

A

Systematic error

Impacts internal validity

Question 14

Q

Chance

Question 15

Q

Confounder?

Answer

A

Associated with exposure (risk) and outcome
An independent risk factor for the outcome
Not in the causal pathway between the risk factor and disease

Question 16

Q

Power

Answer

A

The chance of finding an effect in your sample if it truly exists in the population.

Power is not in question in a study that shows a significant effect.

If a study fails to show a significant difference (p>0.05) between the two groups, one may wonder whether the study had sufficient power.

Question 17

Q

When applied to a population,
Given sensitivity and prevalence,
True positive =?
False negative =?

Answer

A

True Positive = Sensitivity x Prevalence

False negative = (1- Sensitivity) x Prevalence

Question 18

Q

When applied to a population, given specificity and prevalence,
True negative =?
False positive =?

Answer

A

True Negative = Specificity x (1- Prevalence)

False positive = (1- Specificity) x (1-Prevalence)
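
The four formulas on the two cards above can be checked together with a short sketch. The function name and the input values (sensitivity 0.90, specificity 0.80, prevalence 0.10) are illustrative assumptions, not taken from the cards. Each result is a fraction of the whole population, so the four cells should add up to 1.

```python
def population_fractions(sensitivity, specificity, prevalence):
    """Return the fraction of the whole population in each 2x2 cell."""
    return {
        "true_positive": sensitivity * prevalence,
        "false_negative": (1 - sensitivity) * prevalence,
        "true_negative": specificity * (1 - prevalence),
        "false_positive": (1 - specificity) * (1 - prevalence),
    }


# Illustrative values only (assumed for this example)
cells = population_fractions(sensitivity=0.90, specificity=0.80, prevalence=0.10)
for name, fraction in cells.items():
    print(f"{name}: {fraction:.2f}")
```

Multiplying each fraction by the population size gives the expected count in that cell.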

Question 19

Q

Regression toward the mean

Answer

A

In any group selected on a characteristic with substantial day-to-day variation, many will have values closer to the population mean when the measurement is repeated, so the patients with the worst initial values will tend to improve.

Question 20

Q

Baseline drift

Answer

A

Occurs with measurements on certain machines that require frequent calibration.

Question 21

Q

Hawthorne effect

Answer

A

A tendency among study subjects to change simply because they are being studied or watched.

Question 22

Q

1SD =? %
2SD =? %
3SD =? %

Answer

A

Within 1 SD of the mean = 68% (Z score = ±1)
Within 2 SD of the mean = 95% (Z score = ±2)
Within 3 SD of the mean = 99.7% (Z score = ±3)
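
These percentages can be reproduced from the standard normal distribution. The sketch below assumes normally distributed data and uses the error function from Python's standard library; the function name is made up for this example.

```python
import math


def coverage_within(k):
    """Fraction of a normal distribution lying within k SD of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {coverage_within(k) * 100:.1f}%")
```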

Question 23

Q

When two events are independent, what is the probability that either will occur?

Answer

A

The sum of their probabilities, minus the probability that both will occur.
P (A or B) = P (A) + P (B) - P (A and B)
For independent events, P (A and B) = P (A) x P (B)

Question 24

Q

When two conditions are mutually exclusive, the probability that either one will occur is

Answer

A

The sum of their probabilities: P (A or B) = P (A) + P (B)
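
The two addition rules on the cards above differ only in the overlap term. A minimal sketch, with probabilities (0.5 and 0.2) chosen purely for illustration: for independent events the overlap is the product of the two probabilities, and for mutually exclusive events it is zero.

```python
def p_either(p_a, p_b, p_both):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_both


p_a, p_b = 0.5, 0.2  # assumed example probabilities

# Independent events: P(A and B) = P(A) * P(B)
independent = p_either(p_a, p_b, p_a * p_b)

# Mutually exclusive events: P(A and B) = 0
exclusive = p_either(p_a, p_b, 0.0)

print(f"independent: {independent:.2f}, mutually exclusive: {exclusive:.2f}")
```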

Question 25

Q

Randomization

Answer

A

Assignment occurs by chance

Question 26

Q

ROC curve (receiver operating characteristic curve)

Answer

A

X axis: 1 - specificity, or the false-positive rate

Y axis: Sensitivity

Question 27

Q

ROC curve is used to determine

Answer

A

Optimal Cut-off point for the respective test.
In general, the point closest to the upper-left corner, where sensitivity is highest and the false-positive rate is lowest, is chosen as the cut-off.
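
One way to make "closest to the upper-left corner" concrete is the straight-line distance from each point to (false-positive rate 0, sensitivity 1). The sketch below takes that reading; the cut-off values, sensitivities and specificities are hypothetical, and other criteria for choosing a cut-off exist.

```python
import math


def closest_to_upper_left(points):
    """Pick the (cutoff, sensitivity, specificity) tuple nearest to (FPR=0, sens=1)."""
    return min(points, key=lambda p: math.hypot(1 - p[2], 1 - p[1]))


# Hypothetical (cutoff, sensitivity, specificity) values for one test
candidates = [
    (1.0, 0.95, 0.40),
    (2.0, 0.85, 0.75),
    (3.0, 0.60, 0.90),
    (4.0, 0.30, 0.98),
]

best = closest_to_upper_left(candidates)
print("chosen cut-off:", best[0])
```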

Question 28

Q

In an ROC curve, what is the Area Under the Curve (AUC) used for?

Answer

A

To summarize the diagnostic accuracy of the test across all cut-offs, that is, the probability of correctly identifying disease based on the result of the test.
The larger the area under the curve, the better the test.
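
The area can be approximated from a handful of ROC points with the trapezoidal rule. This is a sketch under assumed data: the (false-positive rate, sensitivity) pairs are invented, and the end points (0, 0) and (1, 1) are added by hand. A test no better than chance follows the diagonal and has an area of 0.5.

```python
def auc_trapezoid(roc_points):
    """Trapezoidal area under (false_positive_rate, sensitivity) points."""
    pts = sorted(roc_points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical ROC points, including the two corners
roc = [(0.0, 0.0), (0.02, 0.30), (0.10, 0.60), (0.25, 0.85), (0.60, 0.95), (1.0, 1.0)]
auc = auc_trapezoid(roc)
chance = auc_trapezoid([(0.0, 0.0), (1.0, 1.0)])
print(f"AUC = {auc:.3f}, chance line = {chance:.1f}")
```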

Question 29

Q

Kappa statistic

Answer

A

Used for reliability studies, e.g. to assess inter-rater or intra-rater reliability.
Used to assess the degree to which two or more raters, examining the same data, agree when assigning the data to categories.
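
For the two-rater case, a common form is Cohen's kappa: kappa = (observed agreement - expected chance agreement) / (1 - expected chance agreement). The card does not say which variant is meant, so the sketch below assumes two raters; the ratings are made up.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who classified the same items."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    categories = set(ratings_a) | set(ratings_b)
    # Agreement expected by chance if the raters were independent
    expected = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)


# Hypothetical readings: 1 = abnormal, 0 = normal
rater_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
rater_2 = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
kappa = cohen_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on 8 of 10 items, but half of that agreement would be expected by chance, so kappa is lower than the raw 80% agreement.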

Question 30

Q

Effect modification

Answer

A

Occurs when one factor modifies the effect of another factor on the outcome.

Question 31

Q

Confounder

Answer

A

Occurs when the association between two variables is distorted by the fact that both are associated with a third.
Eg. The association between coffee and lung cancer is distorted by smoking

Question 32

Q

CV (coefficient of variation)

Answer

A

CV = (SD / mean) x 100%

Used to compare the relative spread of data for 2 variables (e.g. height and weight)
Used to evaluate the precision of the measurement of a single variable (e.g. x-ray film reading by two physicians)
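
Because the CV is unit-free, it allows a comparison that raw standard deviations do not. A small sketch with invented height and weight values, using the sample standard deviation from Python's standard library:

```python
import statistics


def coefficient_of_variation(values):
    """CV as a percentage: sample SD divided by the mean, times 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100


heights_cm = [160, 170, 180]  # hypothetical data
weights_kg = [50, 70, 90]     # hypothetical data

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)
print(f"height CV = {cv_height:.1f}%, weight CV = {cv_weight:.1f}%")
```

In this made-up sample, weight varies far more than height relative to its mean, even though the two are measured in different units.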

Question 33

Q

Histogram

Answer

A

For continuous variables

Question 34

Q

Bar graph

Answer

A

For categorical data

Question 35

Q

Scatter plot

Answer

A

For association

Question 36

Q

Types of random samples

Answer

A

Simple random
Systematic random
Stratified random
Cluster random

Question 37

Q

Simple random sampling

Answer

A

Every unit in the population has the same probability of being selected; chance alone determines whether a particular unit is selected for the sample.

Question 38

Q

Systematic random sampling

Answer

A

Every kth member is selected from the population, starting from a randomly chosen point

Question 39

Q

Stratified random sampling

Answer

A

Population is divided into groups (strata) that differ from one another (e.g. Black, White, Hispanic, Asian) and a random sample is taken from within each group.
Ensures that each stratum is represented in the final sample (in equal or proportional numbers, depending on the design).

Question 40

Q

Cluster random sampling

Answer

A

Population is divided into groups (clusters) that are similar to one another, e.g. schools or communities, and a random sample of these groups is taken.
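
The four sampling schemes on the cards above can be compared side by side. This is only a sketch: the population of 100 units, the stratum labels A to D, the ten clusters, and all function names are invented for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 units, each with a stratum and a cluster label
population = [
    {"id": i, "stratum": "ABCD"[i % 4], "cluster": i // 10}
    for i in range(100)
]


def simple_random(units, n):
    """Every unit has the same chance of selection."""
    return rng.sample(units, n)


def systematic_random(units, n):
    """Random start, then every kth unit."""
    k = len(units) // n
    start = rng.randrange(k)
    return units[start::k][:n]


def stratified_random(units, n_per_stratum):
    """Random sample drawn separately within each stratum."""
    sample = []
    for label in sorted({u["stratum"] for u in units}):
        members = [u for u in units if u["stratum"] == label]
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


def cluster_random(units, n_clusters):
    """Randomly choose whole clusters and keep every unit in them."""
    labels = sorted({u["cluster"] for u in units})
    chosen = set(rng.sample(labels, n_clusters))
    return [u for u in units if u["cluster"] in chosen]


simple = simple_random(population, 10)
systematic = systematic_random(population, 10)
stratified = stratified_random(population, 3)
clustered = cluster_random(population, 2)
print(len(simple), len(systematic), len(stratified), len(clustered))
```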

Question 41

Q

Z score

Answer

A

Z = (X - μ)/σ, where μ is the population mean and σ is the population standard deviation

Any normal distribution can be transformed to the standard normal to get a Z score for a given value X
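
A worked example of the transformation, with an assumed population mean of 100 and standard deviation of 15 (values chosen only for illustration):

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


z = z_score(130, mean=100, sd=15)
print(f"Z = {z:.1f}")
```

A value of 130 sits 2 standard deviations above the mean, so roughly 95% of the population lies between -2 and +2 on the same scale.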

Question 42

Q

Wilcoxon’s signed rank test is a non-parametric equivalent of?

Answer

A

Paired t-test

Question 43

Q

One sample t-test

Answer

A

To compare the sample mean with the mean of the population

Question 44

Q

Two samples t-test

Answer

A

To compare the means of two independent groups

Question 45

Q

Paired t-test

Answer

A

To compare the means of paired measurements, e.g. before and after in the same subjects

Question 46

Q

ANOVA

Answer

A

Used to compare the means of more than two groups

Question 47

Q

Chi-square test

Answer

A

Compare two proportions

Question 48

Q

Fisher’s exact test

Answer

A

Is used if the expected count in a cell is less than 5

Question 49

Q

McNemar’s chi-square test

Answer

A

For paired proportions

Question 50

Q

Spearman’s rank correlation coefficient is a non-parametric equivalent to ?

Answer

A

Pearson’s correlation coefficient

Question 51

Q

Coefficient of determination

Answer

A

% of variation in Y explained by X
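
For a single predictor, the coefficient of determination is the square of Pearson's correlation coefficient r. The sketch below computes both by hand on made-up data, so it does not depend on any particular library version.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from sums of squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]  # hypothetical data

r = pearson_r(x, y)
r_squared = r ** 2
print(f"r = {r:.3f}, r squared = {r_squared:.2f}")
```

Here r squared is 0.60, so about 60% of the variation in Y is explained by X in this invented sample.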

Question 52

Q

Simple linear regression

Answer

A

Dependent variable is continuous

One independent variable

Question 53

Q

Multiple linear regression

Answer

A

Dependent variable is continuous

More than one independent variable

Question 54

Q

Logistic regression

Answer

A

Dependent variable is dichotomous

Odds ratio (OR) is used for estimation

Question 55

Q

Survival analysis

Answer

A

Time to the event

Hazard rate is used for estimation

Question 56

Q

Collinearity

Answer

A

Collinearity is a linear relationship between two explanatory variables.

Collinearity can result in unstable beta coefficient estimates.

Question 57

Q

Funnel plot

Answer

A

A graph designed to check for the existence of publication bias in systematic reviews and meta-analyses

Question 58

Q

When can Poisson distribution be used as a good approximation of a binomial distribution?

Answer

A

In general, p should be small and n should be large
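
The quality of the approximation can be seen by comparing the two probability functions directly, with the Poisson mean set to n x p. The values n = 100 and p = 0.02 are assumptions chosen for illustration, not thresholds from the card.

```python
import math


def binomial_pmf(k, n, p):
    """Exact binomial probability of k events in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson probability of k events with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


n, p = 100, 0.02  # assumed example: many trials, rare event
lam = n * p       # Poisson mean that matches the binomial mean

for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))

largest_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11))
```

With these inputs the two columns differ by well under 0.01 at every k from 0 to 10.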

Question 59

Q

Type 1 error

Or alpha

Answer

A

Reject H0 when it is true.

Question 60

Q

Type 2 error

Or beta

Answer

A

Fail to reject (accept) H0 when it is actually false.

Question 61

Q

For paired data (pre and post), what test to choose?

Answer

A

For parametric data, using
- Paired t-test

For non-parametric data, using
- Wilcoxon’s signed rank test

Question 62

Q

To compare 2 group means, what test to choose?

Answer

A

For parametric data, using
- Student t test

For non-parametric data, using
- Wilcoxon’s rank sum test (also termed the Mann-Whitney U test)

Question 63

Q

To compare two proportions, what test to choose?

Answer

A

For parametric data, using
- Chi-square

For non-parametric data, using
- Fisher exact probability test: used when at least 1 cell in a contingency table has an expected count of less than 5
- McNemar’s chi-square test: for paired proportions

Question 64

Q

More than two groups, what test to choose?

Answer

A

For parametric data, using
- ANOVA

For non-parametric data, using
- Kruskal-Wallis test

Answer 64

A

For parametric data, using
- Pearson’s correlation

For non-parametric data, using
- Spearman’s correlation

Multiple regression
- more than one independent variable

Answer 65

A

Kaplan-Meier analysis
Cox proportional hazards regression
- a combination of multiple logistic regression techniques with survival methods

Answer 66

A

Logistic regression

Answer 67

A

How scattered the data is.

Answer 68

A

Precision of the mean.

How precise the data is.