Inferential statistics Flashcards

Question 1

Q

why is randomisation important

Answer

A

avoids selection bias and balances confounders (known and unknown) across groups

Question 2

Q

sampling methods

Answer

A

simple random sampling
stratified sampling
convenience sampling

Question 3

Q

what is data provenance

Answer

A

Data provenance is the history of a dataset
– Data cleaning processes
– Imputations for missing data
– How it was collected and by whom
– Access to previous versions etc

Question 4

Q

standard error of the mean

Answer

A

SE = s/√n
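
As a minimal sketch of the formula above, assuming s is the sample standard deviation (n - 1 denominator) and using made-up numbers, the standard error could be computed like this. The function name `standard_error` and the data are illustrative only.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the mean: s / sqrt(n), with s the sample SD."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(n)

# Hypothetical measurements, for illustration only.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
se = standard_error(values)
print(f"SE = {se:.4f}")
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.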

Question 5

Q

Hypothesis testing

Answer

A

A hypothesis should be something that is testable and falsifiable

Question 6

Q

procedure for hypothesis test

Answer

A

• a random sample is drawn from a population
• a null hypothesis is formulated
• a test-statistic is calculated, of which we
know the probability distribution
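• a p-value is obtained by comparing the observed value of the statistic
with that distribution: it is the probability, assuming the null hypothesis
is true, of a value at least as extreme as the one observed
• if the p-value < 0.05 (the conventional significance level, alpha), reject the null hypothesis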
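
The steps above can be sketched with a one-sample z-test, chosen here only because its test statistic has a known (standard normal) distribution. The sample, the hypothesised mean `mu0`, and the known population standard deviation `sigma` are all assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean

def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: population mean == mu0, with sigma assumed known."""
    n = len(sample)
    # Test statistic: distance of the sample mean from mu0 in standard errors.
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    # p-value: probability under H0 of a statistic at least this extreme.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha

# Hypothetical random sample drawn from the population of interest.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7, 5.9, 5.5]
z, p_value, reject = one_sample_z_test(sample, mu0=5.0, sigma=0.6)
print(f"z = {z:.3f}, p = {p_value:.4f}, reject H0: {reject}")
```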

Question 7

Q

normality testing

Answer

A

Shapiro-Wilk test

H0 (null hypothesis): the data are normally distributed
H1 (alternative hypothesis): the data are not normally distributed
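
A minimal sketch of a normality check, assuming SciPy is installed and using `scipy.stats.shapiro`. Both samples are simulated for illustration, and the helper name `looks_normal` is hypothetical.

```python
import random
from scipy import stats

rng = random.Random(0)
# Simulated data: one roughly normal sample, one strongly right-skewed sample.
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
skewed_sample = [rng.expovariate(1.0) for _ in range(200)]

def looks_normal(sample, alpha=0.05):
    """Return (fail_to_reject_H0, W statistic, p-value) for the Shapiro-Wilk test."""
    statistic, p_value = stats.shapiro(sample)
    # A small p-value is evidence against H0 (normality).
    return bool(p_value >= alpha), float(statistic), float(p_value)

for name, data in [("normal", normal_sample), ("skewed", skewed_sample)]:
    ok, w, p = looks_normal(data)
    print(f"{name}: W = {w:.3f}, p = {p:.4g}, consistent with normality: {ok}")
```

Note that failing to reject H0 does not prove the data are normal; it only means the test found no strong evidence against normality.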

Question 8

Q

T-test & p value

Answer

A

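The t-test compares the means of two samples. The p-value is the probability
of seeing a difference in means at least as large as the observed one if both
samples came from populations with the same mean. It is not the probability
that the two samples are from the same population.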
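
As a sketch of a two-sample comparison, assuming SciPy is available, `scipy.stats.ttest_ind` returns the t statistic and its p-value. The two groups below are invented, and Welch's variant (`equal_var=False`) is used as an assumption so that equal variances need not hold.

```python
from scipy import stats

# Hypothetical measurements from two independent groups.
group_a = [5.1, 4.9, 5.0, 5.2, 5.1]
group_b = [6.0, 6.2, 5.9, 6.1, 6.3]

# Welch's t-test: H0 is that both groups come from populations with equal means.
result = stats.ttest_ind(group_a, group_b, equal_var=False)
t_stat, p_value = float(result.statistic), float(result.pvalue)

print(f"t = {t_stat:.3f}, p = {p_value:.4g}")
if p_value < 0.05:
    print("Reject H0: the means differ at the 5% level.")
else:
    print("Fail to reject H0.")
```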

Question 9

Q

z test and t test definition

Answer

A

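• Both test whether two means are consistent with the same population (H0)
or point to different populations (H1)
• z-test: the population standard deviation is known, or the sample is large
• t-test: the standard deviation is estimated from the sample, typically a
small sample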

Question 10

Q

Type I Error

Answer

A

– errors where the result is statistically significant
despite the fact that the null hypothesis is true (a false positive)
– e.g., a diagnosis of cancer (“positive”) for a healthy
subject

Mitigation: lower the alpha value, e.g. from 5% to 1%, at the cost of more Type II errors
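
A small simulation can illustrate why lowering alpha reduces Type I errors. This is only a sketch: it repeatedly samples from a population where the null hypothesis really is true and counts how often a z-test rejects it anyway. The sample size, number of repetitions, and seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, mean

def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value of a one-sample z-test with known sigma."""
    z = (mean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(alpha, p_values):
    """Fraction of tests that reject H0 at the given alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)

rng = random.Random(42)
# H0 is true in every simulated experiment: the population mean really is 0.
p_values = [
    z_test_p_value([rng.gauss(0.0, 1.0) for _ in range(30)], mu0=0.0, sigma=1.0)
    for _ in range(2000)
]
rate_5 = false_positive_rate(0.05, p_values)
rate_1 = false_positive_rate(0.01, p_values)
print(f"Type I error rate at alpha = 5%: {rate_5:.3f}")
print(f"Type I error rate at alpha = 1%: {rate_1:.3f}")
```

The observed rates should sit close to the chosen alpha values, since alpha is by definition the Type I error rate when the null hypothesis holds.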

Question 11

Q

Type II Errors

Answer

A

– errors where the result is NOT significant despite
the fact that the alternative hypothesis is true (a false negative)
– e.g., a diagnosis of healthy for a subject who has
cancer

Question 12

Q

Sensitivity and Specificity

Answer

A

• Sensitivity (power): proportion of the
positives that are correctly identified by a
test as being positive
• Specificity: proportion of negatives that are
correctly identified by a test as being
negative
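
The two definitions above can be sketched from true and predicted labels. The counts below are invented for illustration, and the function name `sensitivity_specificity` is hypothetical.

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) from paired boolean labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)              # true positives
    fn = sum(a and not p for a, p in pairs)          # false negatives (Type II errors)
    tn = sum((not a) and (not p) for a, p in pairs)  # true negatives
    fp = sum((not a) and p for a, p in pairs)        # false positives (Type I errors)
    sensitivity = tp / (tp + fn)  # share of actual positives detected
    specificity = tn / (tn + fp)  # share of actual negatives cleared
    return sensitivity, specificity

# Hypothetical screening results: 10 diseased and 90 healthy subjects.
actual = [True] * 10 + [False] * 90
predicted = [True] * 8 + [False] * 2 + [True] * 5 + [False] * 85
sens, spec = sensitivity_specificity(actual, predicted)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.3f}")
```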