Stats exam Flashcards

1
Q

Confidence interval definition

A

A 95% confidence interval is a range of values that contains the true, unknown population mean with a probability of 0.95.
In repeated sampling, 95% of the confidence intervals calculated would include the true mean.
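
The repeated-sampling interpretation can be checked with a small simulation. This is only a sketch: the population values (mean 50, SD 10), the sample size and the function name `coverage` are made up for illustration, and the population SD is treated as known so that the Normal multiplier 1.96 applies.

```python
import random
import statistics

def coverage(true_mean=50.0, sd=10.0, n=30, repeats=2000, seed=1):
    """Fraction of 95% intervals (sample mean +/- 1.96 * SE) that contain true_mean."""
    rng = random.Random(seed)      # fixed seed so the run is reproducible
    se = sd / n ** 0.5             # standard error, population SD assumed known
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        if mean - 1.96 * se <= true_mean <= mean + 1.96 * se:
            hits += 1
    return hits / repeats

print(coverage())  # should be close to 0.95
```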

2
Q

Reference range definition

A

A reference range is the range we would expect to contain 95% of values that an individual measurement from the population could take.
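
A minimal sketch of one common way to compute a 95% reference range, assuming the measurements are roughly Normally distributed so that mean ± 1.96 × SD applies. That formula and the function name `reference_range` are assumptions for illustration, not part of the card.

```python
import statistics

def reference_range(values):
    """95% reference range as mean +/- 1.96 * SD (assumes approximate Normality)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return mean - 1.96 * sd, mean + 1.96 * sd

low, high = reference_range([1, 2, 3, 4, 5])
print(low, high)
```

Note that the range uses the standard deviation, not the standard error, because it describes individual values rather than the mean.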

3
Q

P value definition

A

The P value is the probability of having observed our data (or more extreme data) when the null hypothesis is true.

4
Q

Standard error vs standard deviation

A

The standard deviation is a measure of the variability of individual observations in the population, whereas the standard error is a measure of the uncertainty in the sample mean as an estimate of the population mean.

5
Q

Give three properties of standard error

A

-it is smaller for large samples than for small samples
-it is less than the standard deviation (i.e. the variability of the individual observations in the population)
-it will increase as the standard deviation increases (i.e. as the variability among the individual values in the population increases)
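
All three properties follow from the usual formula for the standard error of a mean, SE = SD / √n. A short sketch (the function name `standard_error` is illustrative):

```python
import math

def standard_error(sd, n):
    """Standard error of the sample mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

print(standard_error(10, 25))   # 2.0
print(standard_error(10, 100))  # 1.0  (larger sample, smaller SE)
print(standard_error(20, 100))  # 2.0  (larger SD, larger SE)
```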

6
Q

Correlation coefficient definition

A

A measure of the strength and direction (positive or negative) of the linear association between two continuous variables. It takes a value between -1 and +1; the closer the points lie to a straight line, the closer the value is to -1 or +1.
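
As an illustration, Pearson's correlation coefficient can be computed directly from its definition. The function name `pearson_r` and the toy data are made up for this sketch.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-products over the root of the sums of squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3], [2, 4, 6]))  # perfect positive linear association
print(pearson_r([1, 2, 3], [6, 4, 2]))  # perfect negative linear association
```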

7
Q

Three criteria of confounding

A
  1. A confounder is associated with the exposure of interest
  2. A confounder is independently associated with the outcome (i.e. a risk factor)
  3. A confounder is NOT on the causal pathway
8
Q

Association definition

A

whether the distribution of one variable varies according to the value of the other variable

9
Q

Categorical data that is independent should be analysed via… if the assumptions of this test are not met the data should be analysed by…

A

chi-squared test… Fisher’s exact test

10
Q

Categorical data that is paired should be analysed via… if the assumptions of this test are not met, the data should be analysed by…

A

McNemar’s test… binomial based exact test

11
Q

Format of test steps:

A
  1. set up a null hypothesis
  2. calculate a test statistic
  3. refer value of test statistic to the appropriate statistical table to obtain a p value
  4. calculate a confidence interval
12
Q

For chi-squared how do you calculate degrees of freedom?

A

(r-1)(c-1)= degrees of freedom
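
Here r is the number of rows and c the number of columns in the table. A minimal sketch that computes the chi-squared statistic together with its degrees of freedom; the function name `chi_squared` and the example table are illustrative.

```python
def chi_squared(table):
    """Return (statistic, degrees of freedom) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)  # (r-1)(c-1)
    return statistic, df

print(chi_squared([[10, 20], [20, 10]]))  # statistic about 6.67 on 1 degree of freedom
```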

13
Q

chi-squared assumptions

A
  1. each subject is independent of all other subjects
  2. All expected cell counts are ≥1. That is, no expected cell count is below 1.
  3. No more than 20% of expected cell counts are <5.

If these assumptions do not hold, you would do Fisher’s exact test.

NOTE THAT IT IS EXPECTED FREQUENCY COUNTS
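
The expected-count conditions can be checked as follows. This is a sketch with made-up function names (`expected_counts`, `chi_squared_ok`); expected counts are row total × column total / grand total.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_squared_ok(table):
    """True if all expected counts are >= 1 and at most 20% of them are < 5."""
    cells = [e for row in expected_counts(table) for e in row]
    all_at_least_one = all(e >= 1 for e in cells)
    share_below_five = sum(e < 5 for e in cells) / len(cells)
    return all_at_least_one and share_below_five <= 0.20

print(chi_squared_ok([[10, 20], [20, 10]]))  # True
print(chi_squared_ok([[1, 2], [3, 40]]))     # False: consider Fisher's exact test
```

Independence of subjects is a matter of study design and cannot be checked from the table itself.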

14
Q

What is required for McNemar’s test to be valid?

A

b + c must be at least 5, i.e. there must be at least 5 discordant pairs
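
A hedged sketch of how this rule might be applied, where b and c are the two discordant-pair counts. The McNemar statistic is taken as (b − c)² / (b + c) on 1 degree of freedom without a continuity correction, and the fallback is a two-sided exact binomial test with probability 0.5. The function name `mcnemar` and the choice to omit the correction are assumptions.

```python
import math

def mcnemar(b, c):
    """Return ('chi-squared', statistic) if b + c >= 5, else ('exact', two-sided p)."""
    n = b + c
    if n >= 5:
        # McNemar chi-squared statistic on 1 df (no continuity correction)
        return "chi-squared", (b - c) ** 2 / n
    # Exact test: under the null, each discordant pair falls either way with p = 0.5
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return "exact", min(1.0, 2 * tail)

print(mcnemar(15, 5))  # ('chi-squared', 5.0)
print(mcnemar(3, 1))   # ('exact', 0.625)
```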

15
Q

Drawbacks of chi-squared et al.

A

However the tests have drawbacks:
-If there is evidence against the null hypothesis of no association, they do not indicate the direction of the difference
-We cannot obtain an effect size when dealing with a binary outcome (e.g., 2x2 tables or more generally 2 x R tables), such as risk difference, risk ratio or odds ratio
-We obtain no measure of uncertainty when dealing with a binary outcome - confidence intervals cannot be obtained

16
Q

Requirements for binomial distribution

A
  1. multiple trials
  2. two outcomes
  3. p(success) is constant
  4. trials independent
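
When these four requirements hold, the probability of k successes in n trials is given by the binomial formula. A short sketch (the function name `binomial_pmf` is illustrative):

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

print(binomial_pmf(2, 4, 0.5))  # 0.375
```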
17
Q

Requirements for calculating risk differences in proportions

A

No cell in the 2x2 table is smaller than 10; if any cell is smaller than 10, exact methods should be used

18
Q

What test is used in logistic regression?

A

Wald’s test

19
Q

Type I error definition

A

Type I error – risk of a false positive. That is, we declare that a difference exists when in truth there is no difference. We call the probability of a type I error alpha (α).

20
Q

Type II error definition

A

Type II error – risk of a false negative. That is, we claim a difference does not exist when in truth it does exist. We call the probability of a type II error beta (β).

21
Q

Power definition

A

The power of the study is a measure of how likely it is that the hypothesis test (significance test) will produce evidence of an association, for a population effect of a given size, if an effect truly exists.
Put in probability terms, the power of a test is the probability of rejecting the null hypothesis when it is false (i.e. 1-β).
In general, the larger the study the greater the power.

22
Q

When will sample size increase? (four things)

A

The number of subjects required will increase as:
- the power increases (e.g. from 80% to 90%)
- the significance level decreases (e.g. from 5% to 1%)
- the size of the important difference to be detected decreases (i.e. trying to detect a smaller difference)
- in the case of continuous outcomes, the standard deviation increases (the more variable the outcome under investigation, the less precise our estimate of the difference)
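
These four effects can be seen in a standard approximate formula for comparing two means with equal group sizes: n per group = 2 × (SD × (z for α/2 + z for power) / difference)². The formula is not given on the card and is used here as an assumption; the function name `n_per_group` is illustrative.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate subjects per group to detect a difference in means of size delta."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (sd * (z_alpha + z_power) / delta) ** 2)

print(n_per_group(delta=5, sd=10))               # baseline
print(n_per_group(delta=5, sd=10, power=0.90))   # more power, more subjects
print(n_per_group(delta=5, sd=10, alpha=0.01))   # stricter significance, more subjects
print(n_per_group(delta=2.5, sd=10))             # smaller difference, more subjects
print(n_per_group(delta=5, sd=20))               # larger SD, more subjects
```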

23
Q

In practice, what else needs to be taken into account for sample size?

A

In practice we need to increase the sample size to allow for:
*Non-response and loss to follow up
*Clustering if applicable, e.g., cluster randomised trial, complex survey design
*Potential cross over and contamination of treatments
*Investigation of interactions
*Multiple outcomes

24
Q

Power definition

A

the probability of detecting a specified difference if one truly exists

25
Q

statistical significance vs clinical significance

A

Statistical significance confirms evidence for the existence of an effect (i.e. not due to chance), but makes no
reference to the size of that effect.
Clinical significance confirms the existence of an effect at least as large as that deemed to be clinically
important. Statistical significance does not imply clinical significance.

26
Q

when to use one sample t test vs two sample t test?

A

The one-sample t-test is used when comparing the mean of a single sample to a predefined standard or theoretical mean. The two-sample t-test, also referred to as the independent samples t-test, compares the means of two independent groups to see if there is a statistically significant difference between them.

27
Q

alternative for one sample t test

A

Wilcoxon signed rank test

28
Q

alternative for two sample t-test

A

Mann-Whitney U test

29
Q

alternative for paired t-test

A

Wilcoxon signed rank test

30
Q

linear regression assumptions

A
  1. Observations are independent
  2. The relationship between the outcome and explanatory variables is linear
  3. The residuals of the linear regression model:
    a. Follow a Normal distribution
    b. Have mean zero
    c. Have a constant variance
31
Q

Assumptions of two sample t test

A

a. That the LDL levels at year 1 follow a Normal distribution in both groups
b. That the variance (standard deviation) of the LDL measurements after 1 year is the same in both groups
c. That the LDL measurements after 1 year for each subject are independent of each other within each group
d. That LDL measurements after 1 year for each subject are independent between groups

32
Q

Power definition

A

This is the probability that we reject the null hypothesis when the alternative hypothesis is true

33
Q

List factors affecting sample size

A

a. The statistical power (or type II error)
b. The significance level (or type I error)
c. The minimum difference to be detected
d. The assumed SD in each group

34
Q

Z score equation

A

Z = (value - mean)/standard deviation, then look up Z in the standard normal table
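
A small sketch using the usual convention Z = (value − mean) / SD, with Python's `statistics.NormalDist` standing in for the printed table. The function names `z_score` and `two_sided_p` and the example numbers are illustrative.

```python
from statistics import NormalDist

def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

def two_sided_p(z):
    """Two-sided tail probability for a standard Normal Z value."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

z = z_score(130, 100, 15)
print(z)               # 2.0
print(two_sided_p(z))  # a little under 0.05
```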

35
Q

Write down the assumptions necessary when performing an analysis of the
means of two paired samples.

A
  1. The distribution of the population of differences, i.e., cholesterol
    measurement in 1990 minus the cholesterol measurement in 1985, follows
    a Normal distribution
  2. Each subject’s difference in cholesterol measurements is independent of
    every other subject’s difference in cholesterol measurements
36
Q

chi-sq degrees of freedom

A

DF = (number of rows – 1) × (number of columns – 1)

37
Q

conditions for a 95% CI for a proportion to be valid

A

In general the conditions required are that the sample size is large and the
probability of the event of interest is not small. In practical terms this typically
means that no cell in the table is smaller than 10. In more formal terms if the
sample size is n and the probability of the event of interest is π then we would
require nπ >10 and n ( 1−π ) >10. If this is not the case you should use a binomial exact solution.
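
A minimal sketch of the check, followed by the usual Normal-approximation interval p ± 1.96 × √(p(1 − p)/n). The interval formula is an assumption, since the card does not state it; the observed proportion stands in for π, and the function name `proportion_ci` is illustrative.

```python
import math

def proportion_ci(events, n, z=1.96):
    """95% CI for a proportion via the Normal approximation, if the conditions hold."""
    p = events / n
    if n * p <= 10 or n * (1 - p) <= 10:
        raise ValueError("conditions not met: use a binomial exact method instead")
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

print(proportion_ci(50, 100))  # roughly (0.402, 0.598)
```

Calling `proportion_ci(5, 100)` raises the error, because nπ is only 5.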

38
Q

normal distribution what % of values within 1SD or 2SD. P values fall between what…

A

Very Important Note: For a Standard Normal distribution you MUST know the
following facts:
1. Approximately 68% of values lie within 1 standard deviation of the mean
2. Approximately 95% of values lie within 2 standard deviations of the mean
It is also useful to know the following:
3. Approximately 99.7% of values lie within 3 standard deviations of the
mean
4. Approximately 99.99% of values lie within 4 standard deviations of the
mean
If you remember these then the following is approximately true (two-sided p values):
1. Z value between 2 and 3 (or -3 and -2) then 0.003 ≤ p ≤ 0.05
2. Z value between 3 and 4 (or -4 and -3) then 0.0001 ≤ p ≤ 0.003
3. Z value greater than 4 or smaller than -4 then p ≤ 0.0001
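
These percentages can be reproduced from the standard Normal distribution function. A short sketch (the function name `within` is illustrative):

```python
from statistics import NormalDist

def within(k):
    """Proportion of a Normal distribution lying within k standard deviations of the mean."""
    std_normal = NormalDist()  # mean 0, SD 1
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3, 4):
    # proportion inside k SDs, and the matching two-sided p value for Z = k
    print(k, round(within(k), 5), round(1 - within(k), 5))
```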

39
Q

what are the scores for Pearson coefficients?

A

-0.1 to 0.1: weak/no correlation
0.1 to 0.4: weak
0.4 to 0.7: moderate
0.7 to 0.99: strong
1.00: perfect positive
