Statistics Flashcards

0
Q

Nominal data

A

Categories with no inherent order; non-parametric

1
Q

Study Power?

A

The power of a study is the probability of detecting a significant difference between treatments or study groups when there really is one.
Low power increases the likelihood of failing to identify a statistically significant difference when a real difference does exist.
High power (80% or more) is desirable.
Power is affected by sample size, effect size, variability of the data, and the chosen significance level.
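
To make the dependence on sample size concrete, here is a minimal sketch that approximates power for a two-sided, two-sample z-test with equal group sizes. The function name `approx_power` and its inputs (standardized effect size `d`, per-group size `n`, `alpha`) are illustrative assumptions, not part of the original card, and the normal approximation stands in for an exact t-based calculation.

```python
from statistics import NormalDist

def approx_power(d, n, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    d: assumed standardized effect size (difference in means / SD)
    n: assumed number of subjects per group
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    shift = d * (n / 2) ** 0.5         # expected test statistic if the effect is real
    # Probability that the statistic lands in either rejection region
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

small = approx_power(d=0.5, n=20)  # hypothetical small study
large = approx_power(d=0.5, n=64)  # hypothetical larger study
print(round(small, 2), round(large, 2))
```

With the same assumed effect size, the larger study has the higher power, which is the point of the card.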

2
Q

Ordinal data?

A

Ordered categories with unequal intervals; non-parametric

3
Q

Interval data?

A

Equal intervals
No absolute zero
Cannot compute meaningful ratios
Parametric

E.g. temperature in Celsius or Fahrenheit

4
Q

Ratio data?

A

Equal intervals
With an absolute (true) zero
Can calculate ratios
Parametric

E.g. weight, height, temperature in Kelvin

Mnemonic “NOIR”: Nominal, Ordinal, Interval, Ratio

5
Q

Measures of central tendency?

A

Mean
Median
Mode

6
Q

Mean = Median = Mode, what distribution?

A

Normal distribution (or any other symmetric, unimodal distribution)

7
Q

Relationship of mean, median and mode in a right (positively) skewed distribution?

A

Right skewed: tail on the right
Mean > Median > Mode

(Rule of thumb: the mean follows the tail)
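
As a quick check of this ordering, the sketch below computes the three measures on a small, made-up right-skewed data set (the values are hypothetical and chosen only for illustration).

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: most values are small, a few are large
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 20]

# The long right tail pulls the mean above the median, which sits above the mode
print(mean(data), median(data), mode(data))
```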

8
Q

The relationship of mean, median and mode in left skewed distribution?

A

Left skewed: tail is on the left of the distribution

Mean < Median < Mode

9
Q

For a normal distribution, which statistical method should be selected?

A

Select a parametric statistical test

E.g. Student t-test, chi-square, ANOVA, ANCOVA, regression analysis

10
Q

For a non-normal distribution (e.g. bimodal, skewed), which test methods should be selected?

A

Non-parametric tests, e.g. Fisher’s exact test, McNemar test, Mann-Whitney U test, Wilcoxon’s rank sum test, Kruskal-Wallis test

11
Q

Ways of obtaining a random sample?

A
  1. Simple random sampling
  2. Systematic random sampling
  3. Stratified random sampling
  4. Cluster sampling
12
Q

Bias?

A

Systematic error

Impacts internal validity

13
Q

Chance

A

Random error

14
Q

Confounder?

A

Associated with exposure (risk) and outcome
An independent risk factor for the outcome
Not in the causal pathway between the risk factor and disease

15
Q

Power

A

The chance of finding an effect in your sample if it truly exists in the population.

Power is not in question in a study that shows a significant effect.

If a study's results fail to show a significant difference (p > 0.05) between the two groups, one may wonder whether the study had sufficient power.

16
Q

When applied to a population,
Given sensitivity and prevalence,
True positive =?
False negative =?

A

True Positive = Sensitivity x Prevalence

False negative = (1- Sensitivity) x Prevalence

17
Q

When applied to a population, given specificity and prevalence,
True negative =?
False positive =?

A

True Negative = Specificity x (1- Prevalence)

False positive = (1- Specificity) x (1-Prevalence)
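
The four population fractions (true and false positives from sensitivity, true and false negatives from specificity) can be worked through together. The sketch below is a minimal illustration; the function name `population_fractions` and the input values are assumptions chosen for the example.

```python
def population_fractions(sensitivity, specificity, prevalence):
    """Return the fraction of the whole population in each 2x2 cell."""
    return {
        "TP": sensitivity * prevalence,
        "FN": (1 - sensitivity) * prevalence,
        "TN": specificity * (1 - prevalence),
        "FP": (1 - specificity) * (1 - prevalence),
    }

# Hypothetical test: 90% sensitive, 80% specific, disease prevalence 10%
cells = population_fractions(sensitivity=0.90, specificity=0.80, prevalence=0.10)
for name, fraction in cells.items():
    print(name, round(fraction, 2))
```

The four fractions always sum to 1, which is a useful sanity check when filling in a 2x2 table by hand.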

18
Q

Regression toward the mean

A

In any group selected on a characteristic with substantial day-to-day variation, many will have values closer to the population mean when the measurement is repeated, so the worst patients will appear to improve.
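
A small simulation can make this visible. The sketch below assumes a simple model (a stable true value per subject plus independent day-to-day noise); the numbers, the seed, and the 10% selection rule are all hypothetical choices made for illustration.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Assumed model: stable true value plus independent day-to-day noise
true_values = [rng.gauss(120, 10) for _ in range(10_000)]
first = [t + rng.gauss(0, 10) for t in true_values]
second = [t + rng.gauss(0, 10) for t in true_values]

# Select the "worst" 10% on the first measurement (highest readings)
cutoff = sorted(first)[int(0.9 * len(first))]
selected = [i for i, x in enumerate(first) if x >= cutoff]

first_mean = mean(first[i] for i in selected)
second_mean = mean(second[i] for i in selected)
print(round(first_mean, 1), round(second_mean, 1))
```

No treatment is applied between the two measurements, yet the selected group's second reading sits closer to the population mean of 120.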

19
Q

Baseline drift

A

Occurs with measurements on certain machines that require frequent calibration.

20
Q

Hawthorne effect

A

A tendency among study subjects to change simply because they are being studied or watched.

21
Q

1SD =? %
2SD =? %
3SD =? %

A

Mean ± 1 SD = about 68% (Z score = 1)
Mean ± 2 SD = about 95% (Z score = 2)
Mean ± 3 SD = about 99.7% (Z score = 3)
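
These percentages can be reproduced from the normal distribution itself. The sketch below uses the standard identity that the fraction within k SD of the mean is erf(k / sqrt(2)); the helper name `coverage_within` is an illustrative assumption.

```python
from math import erf, sqrt

def coverage_within(k):
    """Fraction of a normal distribution lying within k SD of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * coverage_within(k), 1))
```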

22
Q

When two events are independent, what is the probability that either will occur?

A

The sum of their probabilities, minus the probability that both will occur.
P (A or B) = P (A) + P (B) - P (A and B), where P (A and B) = P (A) x P (B) for independent events
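
A short worked example, assuming two hypothetical independent events with probabilities 0.30 and 0.20 (values chosen only for illustration), shows the arithmetic and checks it against the complement rule.

```python
# Hypothetical independent events
p_a = 0.30
p_b = 0.20

p_both = p_a * p_b              # independence: P(A and B) = P(A) x P(B)
p_either = p_a + p_b - p_both   # addition rule

# Cross-check: "either" is the complement of "neither"
p_neither = (1 - p_a) * (1 - p_b)
print(round(p_either, 2), round(1 - p_neither, 2))
```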

23
Q

When two conditions are mutually exclusive, the probability that either one will occur is

A

The sum of their probabilities: P (A or B) = P (A) + P (B)

24
Q

Randomization

A

Assignment occurs by chance

25
Q

ROC curve (receiver operating characteristic curve)

A

X axis: 1 - specificity, or the false-positive rate

Y axis: Sensitivity

26
Q

ROC curve is used to determine

A

Optimal Cut-off point for the respective test.
In general, the point closest to the upper-left corner, where sensitivity is highest and the false-positive rate is lowest, is chosen as the cut-off.
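
The "closest to the upper-left corner" rule can be sketched in a few lines. The scores below are made up, the helper names are hypothetical, and Euclidean distance to the corner (false-positive rate 0, sensitivity 1) is one common way to define "closest"; other criteria exist.

```python
from math import hypot

# Hypothetical test scores; higher scores suggest disease
diseased = [0.9, 0.8, 0.7, 0.6, 0.4]
healthy = [0.7, 0.5, 0.3, 0.2, 0.1]

def roc_point(cutoff):
    """Return (false-positive rate, sensitivity) when score >= cutoff is positive."""
    sensitivity = sum(s >= cutoff for s in diseased) / len(diseased)
    fpr = sum(s >= cutoff for s in healthy) / len(healthy)
    return fpr, sensitivity

def distance_to_corner(cutoff):
    """Distance from the ROC point to the upper-left corner (0, 1)."""
    fpr, sensitivity = roc_point(cutoff)
    return hypot(fpr, 1 - sensitivity)

# Try every observed score as a candidate cut-off and keep the closest one
candidates = sorted(set(diseased + healthy))
best = min(candidates, key=distance_to_corner)
print(best, roc_point(best))
```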

27
Q

In a ROC curve, what is the Area Under the Curve (AUC) used for?

A

To summarize the diagnostic accuracy (combined sensitivity and specificity) of the test, that is, the probability of correctly identifying disease based on the result of the test.
The larger the area under the curve, the better the test.

28
Q

Kappa statistic

A

Used for reliability studies, e.g. to assess inter-rater or intra-rater reliability.
Used in assessing the degree to which two or more raters, examining the same data, agree when assigning the data to categories.
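
For the two-rater case, a minimal sketch of Cohen's kappa (observed agreement corrected for the agreement expected by chance) might look like the following. The ratings are hypothetical, and the function `cohen_kappa` is written out by hand for illustration rather than taken from any library.

```python
from collections import Counter

# Hypothetical ratings of the same 10 films by two physicians
rater_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "neg", "pos"]
rater_2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "neg", "pos"]

def cohen_kappa(a, b):
    """Cohen's kappa for two raters assigning the same items to categories."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of each rater's marginal proportions per category
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    return (observed - expected) / (1 - expected)

kappa = cohen_kappa(rater_1, rater_2)
print(round(kappa, 2))
```

Here the raters agree on 8 of 10 films, but half of that agreement would be expected by chance alone, so kappa is lower than the raw agreement.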

29
Q

Effect modification

A

Occurs when one factor modifies the effect on outcome of another.

30
Q

Confounder

A

Occurs when the association between two variables is distorted by the fact that both are associated with a third.
Eg. The association between coffee and lung cancer is distorted by smoking

31
Q

CV (coefficient of variation)

A

CV = (SD / mean) x 100%

  1. Used to compare the relative spread of data for 2 variables (e.g. height and weight)
  2. Used to evaluate the precision of the measurement of a single variable (e.g. x-ray film reading by two physicians)
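
As an illustration of the first use, the sketch below compares the relative spread of two variables measured in different units. The data are made up and the helper name `cv_percent` is an assumption; the sample SD (n - 1 denominator) is used.

```python
from statistics import mean, stdev

# Hypothetical measurements on the same five people
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [55, 62, 70, 78, 85]

def cv_percent(values):
    """Coefficient of variation: sample SD as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

print(round(cv_percent(heights_cm), 1), round(cv_percent(weights_kg), 1))
```

Because the CV is unitless, the two results can be compared directly even though one variable is in centimetres and the other in kilograms.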
32
Q

Histogram

A

For continuous variables

33
Q

Bar graph

A

For categorical data

34
Q

Scatter plot

A

For association

35
Q

Types of random samples

A

Simple random
Systematic random
Stratified random
Cluster random

36
Q

Simple random sampling

A

Every unit in the population has the same probability of being selected; chance alone determines whether a particular unit is selected for the sample.

37
Q

Systematic random sampling

A

Every kth member is selected from the population
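
A minimal sketch of this selection rule is shown below. The random starting position within the first k units is an added assumption (a common way to make the sample random) and is not stated on the card; the function name and patient list are hypothetical.

```python
import random

def systematic_sample(population, k, rng=random):
    """Pick a random start among the first k units, then every kth unit after it."""
    start = rng.randrange(k)  # assumed random start, 0 <= start < k
    return population[start::k]

patients = list(range(1, 101))  # hypothetical sampling frame of 100 patient IDs
sample = systematic_sample(patients, k=10)
print(sample)
```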

38
Q

Stratified random sampling

A
  • Population is divided into distinct subgroups (strata) that differ from one another (e.g. Black, White, Hispanic, Asian), and a random sample is taken from within each stratum
  • Ensures each stratum is represented in the final sample.
39
Q

Cluster random sampling

A

Population is divided into naturally occurring groups (clusters) that are similar to one another, e.g. schools or communities, and a random sample of these groups is taken.

40
Q

Z score

A

Z = (X - μ) / σ, where μ is the population mean and σ is the standard deviation

Any normal distribution can be transformed to the standard normal to get a Z score for a given value X
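
A brief worked example of the transformation, assuming a hypothetical population mean of 100 and SD of 15 (values chosen for illustration only), then looking up the corresponding area under the standard normal curve:

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x given the population mean (mu) and SD (sigma)."""
    return (x - mu) / sigma

z = z_score(x=130, mu=100, sigma=15)
percentile = NormalDist().cdf(z)  # fraction of the population below x
print(z, round(percentile, 3))
```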

41
Q

Wilcoxon’s signed rank test is a non-parametric equivalent of?

A

Paired t-test

42
Q

One sample t-test

A

To compare the sample mean with the mean of the population

43
Q

Two-sample t-test

A

To compare the means of two independent groups

44
Q

Paired t-test

A

To compare the means of paired measurements, e.g. before and after in the same subjects

45
Q

ANOVA

A

Used to compare the means of more than two groups

46
Q

Chi-square test

A

Compare two proportions

47
Q

Fisher’s exact test

A

Used when the expected count in any cell is less than 5

48
Q

McNemar’s chi-square test

A

For paired proportions

49
Q

Spearman’s rank correlation coefficient is a non-parametric equivalent to ?

A

Pearson’s correlation coefficient

50
Q

Coefficient of determination

A

R squared: the percentage of variation in Y explained by X
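
For the single-predictor case, the coefficient of determination equals the square of Pearson's correlation coefficient. The sketch below computes it by hand on made-up data; the function name `r_squared` and the values are illustrative assumptions.

```python
from statistics import mean

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def r_squared(x, y):
    """Squared Pearson correlation: share of variation in y explained by x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

print(round(r_squared(x, y), 3))
```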

51
Q

Simple linear regression

A

Dependent variable is continuous

One independent variable

52
Q

Multiple linear regression

A

Dependent variable is continuous

More than one independent variable

53
Q

Logistic regression

A

Dependent variable is dichotomous

OR is used for estimation

54
Q

Survival analysis

A

Time to the event

Hazard rate is used for estimation

55
Q

Collinearity

A

Collinearity is a linear relationship between two explanatory variables.

Collinearity can result in unstable beta coefficient estimates.

56
Q

Funnel plot

A

A graph designed to check for the existence of publication bias in systematic reviews and meta-analyses

57
Q

When can Poisson distribution be used as a good approximation of a binomial distribution?

A

In general, p should be small and n should be large.

58
Q

Type 1 error

Or alpha

A

Reject H0 when it is true.

59
Q

Type 2 error

Or beta

A

Fail to reject (accept) H0 when it is actually false.

60
Q

For paired data (pre and post), what test to choose?

A

For parametric data, use
- Paired t-test

For non-parametric data, use
- Wilcoxon’s signed rank test

61
Q

To compare 2 group means, what test to choose?

A

For parametric data, use
- Student t-test

For non-parametric data, use
- Wilcoxon’s rank sum test (also termed the Mann-Whitney U test)

62
Q

To compare two proportions, what test to choose?

A

For parametric data, use
- Chi-square test

For non-parametric data, use

  • Fisher exact probability test
    - used when at least 1 cell in a contingency table has an expected count of less than 5
  • McNemar’s Chi-square test for paired proportions
63
Q

More than two groups, what test to choose?

A

For parametric data, use
- ANOVA

For non-parametric data, use
- Kruskal-Wallis test

64
Q

For correlation, what test to choose?

A

For parametric data, use
- Pearson’s correlation

For non-parametric data, use
- Spearman’s correlation

Multiple regression
- more than one independent variable

65
Q

Time to event analysis

A
  1. Kaplan-Meier analysis
  2. Cox proportional Hazard Regression
    • a combination of multiple logistic regression techniques with survival methods
66
Q

Dependent variable is categorical (binary, e.g. cured vs not cured), what test to choose?

A

Logistic regression

67
Q

SD

A

Standard deviation: how scattered the data are around the mean.

68
Q

SEM

A

Standard error of the mean: SEM = SD / √n

The precision of the sample mean as an estimate of the population mean.
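
A short sketch of the standard formula SEM = SD / sqrt(n) shows why the mean becomes more precise as the sample grows. The data and the helper name `sem` are hypothetical; the larger sample simply repeats the smaller one so that the spread stays similar while n increases.

```python
from math import sqrt
from statistics import stdev

def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / sqrt(len(values))

small_sample = [4, 8, 6, 5, 7]   # hypothetical measurements
large_sample = small_sample * 4  # same spread of values, four times as many

print(round(sem(small_sample), 3), round(sem(large_sample), 3))
```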