Hypothesis Testing Flashcards by Rodwan Halimi

How are outliers detected?

Lower and upper cutoffs are set by calculating Q1 (25th percentile) and Q3 (75th percentile) and then finding the interquartile range (IQR = Q3 - Q1).

The lower cutoff is calculated with the following formula: Q1 - 1.5 * IQR. Anything below this number is an outlier.

The upper cutoff is calculated with the following formula: Q3 + 1.5 * IQR. Anything above this number is an outlier.
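
The cutoff rule above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names and the data are made up for the example, and different quartile conventions give slightly different cutoffs.

```python
import statistics

def iqr_fences(values, k=1.5):
    """Return (lower, upper) cutoffs: Q1 - k*IQR and Q3 + k*IQR."""
    # method="inclusive" interpolates between observed data points;
    # other quartile conventions shift the cutoffs slightly.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, k=1.5):
    """Return the values that fall outside the two cutoffs."""
    lower, upper = iqr_fences(values, k)
    return [v for v in values if v < lower or v > upper]

# Hypothetical data with one obviously extreme value
data = [10, 12, 12, 13, 13, 14, 15, 15, 16, 40]
print(iqr_fences(data))
print(find_outliers(data))  # -> [40]
```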

What are the steps to hypothesis testing?

Step 1: Form the hypothesis (state the null and alternative hypotheses)

Step 2: Collect the data, calculate mean and standard error of the sample

Step 3: Compare the sample mean to the hypothesised mean (using a test statistic)

Step 4: Make a conclusion by comparing the p-value with a significance level, conventionally 0.05.
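
The four steps can be sketched as a small function. This is only an illustration under stated assumptions: it uses a normal (z) approximation for the p-value, whereas a t distribution is normally used for small samples; the function name, the hypothesised mean of 27 and the sample values are all invented for the example.

```python
import math
import statistics
from statistics import NormalDist

def one_sample_z_test(sample, hypothesised_mean, alpha=0.05):
    """Two-sided test of a sample mean using a normal approximation."""
    # Step 2: summarise the data
    n = len(sample)
    mean = statistics.mean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(n)
    # Step 3: how many standard errors is the sample mean from the hypothesis?
    z = (mean - hypothesised_mean) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 4: compare the p-value with the chosen cutoff
    return {"mean": mean, "std_error": std_error, "z": z,
            "p_value": p_value, "reject_null": p_value < alpha}

# Step 1: null hypothesis "population mean = 27" (hypothetical)
sample = [27.5, 28.1, 26.9, 28.4, 27.8, 28.0, 27.2, 28.6, 27.9, 28.3]
result = one_sample_z_test(sample, hypothesised_mean=27.0)
print(result)
```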

What should be expected if a hypothesis is true or false?

True: Sample mean close to hypothesis mean

False: Sample mean far away from hypothesis mean

What would p-value tell us about a sample mean?

Tells us the probability, assuming the null hypothesis is true, of getting a sample mean at least this far from the hypothesised mean just due to random variation in sampling.

If the hypothesised population mean is 27 and the sample mean is 28, then the (one-sided) p-value would tell us the probability of a sample mean of 28 or higher.

If the sample mean is 26, then the p-value would tell us the probability of having a sample mean of 26 or smaller.
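
As a rough worked example of the two cases above, the tail probabilities can be computed from a normal sampling distribution centred on 27. The standard error of 0.6 is an assumed value chosen only for illustration.

```python
from statistics import NormalDist

hypothesised_mean = 27.0
std_error = 0.6  # assumed for illustration; normally estimated from the sample

# Distribution of sample means if the null hypothesis is true
sampling_dist = NormalDist(mu=hypothesised_mean, sigma=std_error)

# Sample mean of 28: probability of 28 or higher (upper tail)
p_upper = 1 - sampling_dist.cdf(28.0)
# Sample mean of 26: probability of 26 or smaller (lower tail)
p_lower = sampling_dist.cdf(26.0)

print(round(p_upper, 4), round(p_lower, 4))
```

Because 26 and 28 are equally far from 27, the two tail probabilities are the same.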

What is a type 1 error?

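Even when a result is statistically significant (e.g. a p-value of 0.03), the conclusion can still be incorrect: if the null hypothesis were true, a result at least this extreme would still occur about 3% of the time by chance. Errors of this kind are type 1 errors.

I.e. rejecting the null hypothesis when it is actually true. The probability of doing so is set by the significance level (e.g. 0.05).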
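
A small simulation can make this concrete. The sketch below repeatedly samples from a population where the null hypothesis really is true and counts how often it is rejected anyway. The population values (mean 27, SD 3) and the function name are hypothetical, and a z-test with known SD is assumed for simplicity.

```python
import random
from statistics import NormalDist

def type_1_error_rate(n_trials=2000, n=30, alpha=0.05, seed=0):
    """Fraction of trials that reject a null hypothesis that is true."""
    rng = random.Random(seed)
    true_mean, sigma = 27.0, 3.0      # hypothetical population
    std_error = sigma / n ** 0.5      # SD treated as known (z-test)
    rejections = 0
    for _ in range(n_trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (sum(sample) / n - true_mean) / std_error
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        if p_value < alpha:
            rejections += 1           # a type 1 error
    return rejections / n_trials

rate = type_1_error_rate()
print(rate)  # expected to land close to alpha (0.05)
```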

What is a type 2 error?

Failing to reject the null hypothesis when it is actually false
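
The probability of a type 2 error can be calculated directly in a simple case. The sketch below assumes a two-sided z-test with a known population SD; the function name and the numbers (hypothesised mean 27, true mean 28, SD 3) are illustrative assumptions.

```python
from statistics import NormalDist

def type_2_error_probability(mu0, mu1, sigma, n, alpha=0.05):
    """P(fail to reject H0: mean = mu0) when the true mean is mu1."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-sided cutoff
    # How far the true mean sits from mu0, in standard errors
    shift = (mu1 - mu0) / (sigma / n ** 0.5)
    # Probability that the test statistic stays inside (-z_crit, z_crit)
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)

beta_small = type_2_error_probability(27.0, 28.0, sigma=3.0, n=30)
beta_large = type_2_error_probability(27.0, 28.0, sigma=3.0, n=100)
print(beta_small, beta_large)
```

A larger sample gives a smaller type 2 error probability for the same true difference.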

What are the types of tests used for hypothesis testing?

Independent samples t-test

Paired t-test

Mann-Whitney U test

Wilcoxon signed-rank test

Chi-square test (and Fisher’s exact test)

Which tests are used for normally distributed data and which for non-normally distributed data?

Normal (parametric):

Independent samples and paired t-tests

Non-normal (non-parametric):

Mann-Whitney U test

Wilcoxon signed-rank test

Chi-square test

What is an independent samples t-test used for?

2 independent categorical groups.

A continuous outcome.

Observations are independent (no overlap between groups, e.g. male vs female groups).

Normally distributed outcome (approximately).

No extreme outliers.
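
A minimal sketch of the calculation behind the test, assuming the classic pooled-variance (equal variances) form. The function name and data are hypothetical. It returns only the t statistic and degrees of freedom; the p-value would come from a t distribution, for example via a library such as SciPy (`scipy.stats.ttest_ind`).

```python
import math
import statistics

def independent_t_statistic(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_error, df

# Hypothetical continuous outcome measured in two separate groups
group_1 = [13.1, 14.2, 12.8, 13.9, 14.5, 13.3]
group_2 = [12.0, 12.9, 11.8, 12.5, 13.1, 12.2]
print(independent_t_statistic(group_1, group_2))
```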

What is a paired sample t-test used for?

2 related groups (e.g. twins, or the same subjects before and after)

A continuous outcome (age, Hb, etc.)

Observations are independent within groups

The outcome (strictly, the difference within each pair) is approximately normally distributed

No extreme outliers
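
The paired test works on the difference within each pair, which can be sketched as follows. The function name and the before/after values are hypothetical; the p-value would come from a t distribution with n - 1 degrees of freedom (for example via SciPy's `scipy.stats.ttest_rel`).

```python
import math
import statistics

def paired_t_statistic(before, after):
    """t statistic and degrees of freedom for paired measurements."""
    if len(before) != len(after):
        raise ValueError("paired data must have equal lengths")
    # Work on the within-pair differences
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_error, n - 1

# Hypothetical measurements on the same 5 subjects
before = [140, 152, 138, 147, 160]
after = [135, 150, 136, 140, 155]
print(paired_t_statistic(before, after))
```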

When is the Mann-Whitney U test used?

It is a non-parametric test

2 independent categorical groups (Male vs female for example)

A continuous outcome

Observations are independent

Normal distribution is not assumed.

Very similar to the independent samples t-test, except we don’t assume a normal distribution
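
The U statistic itself is simple to compute by comparing every value in one group with every value in the other. The sketch below is illustrative only (hypothetical function name and data); it stops at the statistic, and a p-value would come from tables, a normal approximation, or a library such as SciPy (`scipy.stats.mannwhitneyu`).

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b); U_a counts pairs where the group_a value is larger.

    Tied pairs count as half.
    """
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    # The two U values always add up to len(group_a) * len(group_b)
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b

# Hypothetical skewed outcome in two independent groups
group_1 = [2, 3, 3, 4, 5, 14]
group_2 = [4, 6, 7, 9, 10, 21]
print(mann_whitney_u(group_1, group_2))
```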

When is a Wilcoxon signed-rank test used?

2 related groups

A continuous outcome

Observations are independent within groups

Does not assume a normal distribution.

Same as the paired t-test but does not assume a normal distribution
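
A rough sketch of how the signed-rank statistic is built: rank the absolute differences within pairs, then sum the ranks of the positive and negative differences separately. The function name and data are hypothetical, zero differences are dropped (one common convention), and the p-value step is left to tables or a library such as SciPy (`scipy.stats.wilcoxon`).

```python
def wilcoxon_signed_rank(before, after):
    """Return (W_plus, W_minus): rank sums of positive and negative differences."""
    # Drop pairs with no change
    diffs = [a - b for b, a in zip(before, after) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        # Find the run of tied absolute differences starting at i
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + j) / 2 + 1      # ranks are 1-based; ties share the average
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus

# Hypothetical scores for the same subjects at two time points
before = [3, 5, 2, 4, 4, 1]
after = [4, 7, 2, 8, 3, 6]
print(wilcoxon_signed_rank(before, after))
```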

What is a chi-squared test used for?

To assess distribution of a categorical variable between 2 or more groups.

When is a chi-squared test used?

It is non-parametric

Assumes the expected frequency is at least 5 in each cell (if this assumption is not valid we can use Fisher’s exact test)

Null hypothesis is that the distribution of observations between columns is independent of the rows.
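
The expected cell frequencies and the test statistic can be sketched as follows. The function names and the 2x2 counts are hypothetical, no continuity correction is applied, and the p-value helper is only valid for 1 degree of freedom (a 2x2 table). Libraries such as SciPy provide general versions (`scipy.stats.chi2_contingency`, `scipy.stats.fisher_exact`).

```python
import math

def chi_square_statistic(table):
    """Return (chi2, df, expected) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum((obs - exp) ** 2 / exp
               for obs_row, exp_row in zip(table, expected)
               for obs, exp in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected

def p_value_df1(chi2):
    """p-value for 1 degree of freedom only (chi2 is a squared standard normal)."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts: rows = 2 groups, columns = outcome yes / no
table = [[20, 10],
         [10, 20]]
chi2, df, expected = chi_square_statistic(table)
smallest_expected = min(min(row) for row in expected)
print(chi2, df, smallest_expected >= 5, p_value_df1(chi2))
```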

A study tests patients’ BP before and after taking a magnesium supplement. Patients are independent of each other. Outcome is found to be normally distributed. We wish to test if magnesium has a significant effect on BP. Which test should be used?

Paired t-test

A study tests a new chemotherapy drug. One group has the new drug, one has a current standard treatment. The outcome is “cancer detected” vs “cancer not detected” 4 weeks after treatment. Which test should be used?

Chi-square test.

A study compares “length of hospital stay” for patients on oral antibiotics vs IV antibiotics. The 2 groups are independent. The outcome is found to be non-normally distributed. Which test should we use?

Mann-Whitney U test

Before and after taking an introductory statistics course, university students were asked to answer questions. Data is not normally distributed. The researchers would like to know whether or not the students’ attitudes changed during this time period. What test should be used?

Wilcoxon signed-rank test.

What are the types of categorical data?

Nominal (2 or more categories but no intrinsic order, e.g. houses, apartments, holiday houses, etc.)

Dichotomous (variables with only 2 categorical levels, e.g. “yes” and “no”)

Ordinal (has an order to it which allows it to be ranked, but the categories can’t have a numerical value placed on them)

What are the types of continuous variables?

Interval: Measured along a continuum and have a numerical value.

Ratio: Interval variables with the added condition that 0 means none of that variable (e.g. distance, height, etc.)

What does zero correlation on a scatter plot look like?

No discernible relationship and points are all over the graph without following a trend.

What does a correlation of 1 and -1 indicate?

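A perfect linear relationship: 1 means Y increases at a constant rate as X increases, and -1 means Y decreases at a constant rate as X increases.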

What are the 3 types of correlation?

Pearson’s correlation: Most widely known. Measures if X and Y are linearly related.

Kendall’s and Spearman’s correlation: Both rank correlations (is the highest-ranking X paired with the highest-ranking Y?)
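
To illustrate the difference between a linear and a rank correlation, here is a minimal sketch of Pearson’s and Spearman’s coefficients (Kendall’s is left out for brevity). The function names and data are hypothetical; Spearman’s is computed here as Pearson’s correlation applied to the ranks, with tied values sharing an average rank.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: Pearson's correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: Y always rises with X, but not in a straight line
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson(x, y), spearman(x, y))
```

With this data the rank correlation is 1 because the ordering matches perfectly, while Pearson’s value is slightly below 1 because the relationship is not a straight line.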

What are the assumptions made by Pearson’s correlation?

Both variables should be continuous

Both variables should be normally distributed

Errors are normally distributed around the regression line.

What assumptions do Kendall's and Spearman's correlations have?

They do not assume X and Y are linearly related. They are non-parametric correlations.

What types of variables use Kendall and Spearman's correlation?

Ordinal variables

Which is more common, Kendall's or Spearman's correlation?

Spearman's is probably more common

What is simple linear regression?

Involves estimating an equation which describes the relationship between 2 variables: a dependent variable and an explanatory (independent) variable. The result is a straight line that follows the trend in the data.
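
A minimal sketch of how such a line can be estimated, assuming the usual least-squares fit. The function name and the data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Slope: how x and y vary together, divided by how much x varies
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    # The fitted line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: explanatory variable x, dependent variable y
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(x, y)
print(slope, intercept)
```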

What is R squared (R^2)?

It is a "goodness of fit" measure. It describes how much of the variability in y is explained by the regression line.
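
One common way to compute it is 1 minus the ratio of the leftover (residual) variation to the total variation in y. The sketch below assumes that definition; the function name, observed values and fitted values are hypothetical.

```python
def r_squared(y, y_pred):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y) / len(y)
    # Variation left over after using the fitted line
    ss_res = sum((obs - pred) ** 2 for obs, pred in zip(y, y_pred))
    # Total variation of y around its own mean
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1 - ss_res / ss_tot

# Hypothetical observed values and values predicted by a fitted line
observed = [2.0, 4.1, 5.9, 8.2]
fitted = [2.0, 4.0, 6.0, 8.0]
print(r_squared(observed, fitted))
```

A value near 1 means the points sit close to the line; a value near 0 means the line explains little more than the mean of y would.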