Data Distributions and Introduction to Inferential Statistics Flashcards

Question

Why is the significance level in hypothesis testing important?

Answer 1

It is important because it helps us control the probability of making a Type I error, which is rejecting the null hypothesis when it is actually true.

Answer 2

If p > 0.05, the test is considered non-significant and we fail to reject the null hypothesis.

Answer 3

If p < 0.05, the test is considered significant; we can reject the null hypothesis and report the trends in the data.

Answer 4

α is the chosen Type I error rate, also called the significance level.

Answer 5

For a test to be significant, the calculated test statistic must exceed the critical value (in absolute value for a two-tailed test). In practice, however, we use the p-values that tests output to determine significance.

Answer 6

- graphically, using histograms and QQ plots
- statistically, using tests such as the Shapiro-Wilk test

Answer 7

- checking the histogram to see if it looks normal (bell-shaped)
- checking descriptive statistics (mean, median & mode)
- checking if approximately 68% of the data fall within +/- one standard deviation of the mean
- producing a QQ plot or running a Shapiro-Wilk test for normality if the sample size is greater than 30
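
The "within one standard deviation" check can be sketched in a few lines. This is a minimal illustration in Python using only the standard library; the function name and the sample values are made up for the example and do not come from the course material.

```python
import statistics

def fraction_within_one_sd(values):
    """Return the share of values lying within one sample SD of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    inside = [v for v in values if abs(v - mean) <= sd]
    return len(inside) / len(values)

# Hypothetical measurements, invented for illustration only.
sample = [4.1, 4.8, 5.0, 5.2, 5.3, 5.5, 5.6, 5.9, 6.2, 7.0]
share = fraction_within_one_sd(sample)
print(f"{share:.0%} of values fall within one SD of the mean")
```

For roughly normal data the share should be close to 68%; with small samples it can differ noticeably, so treat it as a rough check rather than a test.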

Answer 8

Using two measures:
- kurtosis
- skewness

Answer 9

Peakedness or flatness of a distribution.
- positive kurtosis = a more peaked distribution
- negative kurtosis = a flatter distribution

Answer 10

Measure of asymmetry of a probability distribution.
- positive skewness = a distribution with a longer tail on the right side
- negative skewness = a distribution with a longer tail on the left side
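
Both measures can be computed from the central moments of the data. The sketch below uses the simple moment-based definitions (skewness = m3 / m2^1.5, excess kurtosis = m4 / m2^2 - 3); this is an assumption, since statistical packages often apply small-sample corrections and may report slightly different numbers. The function name and data are hypothetical.

```python
import statistics

def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return skew, excess_kurtosis

# Hypothetical data with a long right tail, for illustration only.
right_tailed = [1, 1, 1, 2, 10]
skew, kurt = skewness_and_excess_kurtosis(right_tailed)
print(f"skewness = {skew:.2f}, excess kurtosis = {kurt:.2f}")
```

A positive skewness here reflects the longer right tail; a symmetric sample such as 1, 2, 3, 4, 5 gives a skewness of zero.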

Answer 11

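A QQ (quantile-quantile) plot is a graphical tool that plots the quantiles of the dataset against the quantiles of a theoretical distribution, typically the normal distribution. Points lying close to a straight line suggest the data follow that distribution.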
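
To show what a QQ plot actually draws, the sketch below computes the (theoretical quantile, observed value) pairs for a normal QQ plot without plotting them. It assumes plotting positions of (i - 0.5) / n and a reference normal distribution fitted with the sample mean and standard deviation; plotting software may use slightly different conventions. The function name and data are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Return (theoretical quantile, observed value) pairs for a normal QQ plot."""
    ordered = sorted(values)
    n = len(ordered)
    # Reference normal distribution with the sample's own mean and SD.
    reference = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i - 0.5) / n are one common convention (an assumption here).
    theoretical = [reference.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical data, for illustration only.
data = [4.9, 5.1, 5.6, 6.0, 6.4]
for expected, observed in qq_points(data):
    print(f"expected {expected:.2f}  observed {observed:.2f}")
```

If the data are approximately normal, the observed values track the expected ones closely, which is what a straight line on the plot means.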

Answer 12

Used to determine whether a set of data comes from a normal distribution or not

Answer 13

- null hypothesis = observed data comes from a normal distribution
- alternative hypothesis = observed data does not come from a normal distribution

Answer 14

No, the Shapiro-Wilk test needs to be performed on one numeric dependent variable at a time. If there are different levels of a categorical variable, then the test needs to be performed for each level of the categorical variable separately.

Answer 15

Includes the:
- test statistic (W)
- p-value
- a statement indicating whether the data is normally distributed: p-value > 0.05 (fail to reject the null hypothesis), p-value < 0.05 (reject the null hypothesis)

Example:
    Shapiro-Wilk normality test
    data: mydata
    W = 0.935, p-value = 0.002345
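
The decision rule for reading a normality-test p-value can be written out explicitly. This is only a sketch: the function name is hypothetical, α = 0.05 is the conventional threshold used throughout these cards, and treating a p-value exactly equal to α as non-significant is an assumption.

```python
def interpret_normality_p_value(p_value, alpha=0.05):
    """Turn a normality-test p-value into a decision about the null hypothesis."""
    if p_value < alpha:
        return "reject the null hypothesis: the data deviate from a normal distribution"
    # p equal to alpha is treated as non-significant here (an assumption).
    return "fail to reject the null hypothesis: no evidence against normality"

# p-value taken from the example output in the card above.
print(interpret_normality_p_value(0.002345))
```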

Answer 16

Try transforming the data to achieve normality. If this is not possible, non-parametric tests can be used instead of parametric tests that require normality.

Answer 17

T-tests are used when the data is normally distributed and we want to test whether two means are significantly different - e.g. control vs treatment

Answer 18

The two samples are drawn from the same statistical population and will have the same mean, i.e. there is no significant difference between means.

Answer 19

The two samples are drawn from different statistical populations and have different means.

Answer 20

- T-test: Paired Two Sample for Means
- T-test: Unpaired Two Sample for Means

Answer 21

Used for repeat measures on the same individuals - e.g. before and after a treatment
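
A paired t-test works on the per-individual differences: t is the mean difference divided by its standard error, with n - 1 degrees of freedom. The sketch below computes only the statistic and df (the p-value would come from a t table or statistical software). The function name and the before/after values are hypothetical.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (t, df) for a paired t-test on matched before/after measurements."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]  # one difference per individual
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    t = statistics.mean(diffs) / standard_error
    return t, n - 1

# Hypothetical measurements on the same five individuals, for illustration only.
before = [10, 12, 9, 11, 13]
after = [12, 13, 11, 12, 16]
t_stat, df = paired_t_statistic(before, after)
print(f"t = {t_stat:.3f}, df = {df}")
```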

Answer 22

Comparing the means of two independent groups (e.g. a control group vs a treatment group).

Answer 23

- mean
- df
- t stat
- P(T<=t) two-tail

Answer 24

“There was a significant difference (t = __, df = __, p = __); ... ”

Answer 25

An F-Test to test for equality of variances

Answer 26

- df
- F
- P(F<=f) one-tail

Answer 27

If the calculated F-value > the critical F-value (for p = 0.05), then the variances are significantly different.
- null hypothesis = variances of the two populations are equal
- alternative hypothesis = variances of the two populations are not equal
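
The F statistic itself is just a ratio of the two sample variances. The sketch below puts the larger variance in the numerator so that F is at least 1, which is one common convention and an assumption here; the critical F-value or p-value still has to come from an F table or statistical software. The function name and data are hypothetical.

```python
import statistics

def variance_ratio(sample_a, sample_b):
    """Return (F, numerator df, denominator df) with the larger variance on top."""
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1

# Hypothetical groups, for illustration only.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10, 12]
f_value, df_num, df_den = variance_ratio(group_a, group_b)
print(f"F = {f_value:.2f}, df = ({df_num}, {df_den})")
```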

Answer 28

“There was no significant difference between variances (F = __, p=___), therefore a t-test with equal variances was performed.”

Answer 29

The Mann-Whitney U test is a non-parametric statistical test that serves as the alternative to an unpaired t-test when the data are not normally distributed.

Answer 30

Raw data is first converted to ranks before calculating the test statistic.
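
The rank conversion can be sketched as follows: pool both groups, rank the pooled values (tied values share the average of their ranks), then derive U from the rank sum of one group. This computes only the U statistics, not a p-value, and the function names and data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """Return (U for group_a, U for group_b) from the pooled ranks."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a

# Hypothetical groups with no overlap, for illustration only.
u_a, u_b = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(f"U_a = {u_a}, U_b = {u_b}")
```

Because only the ranks enter the calculation, extreme values have far less influence than they would on a t-test.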

Answer 31

The wilcox.test function in R (it performs the Mann-Whitney U test, also known as the Wilcoxon rank-sum test)

Answer 32

- takes two sample vectors as input
- returns the test statistic, p-value, and alternative hypothesis

Answer 33

1: Identify whether you want to check if the means for a numerical variable are different between two groups of a categorical variable.
2: If the categorical variable is paired, proceed to Step 3a. Otherwise, proceed to Step 3b.
3a: Check if the numerical variable is normally distributed within both groups of the categorical variable. If it is, perform a paired t-test. Otherwise, perform a Wilcoxon signed-rank test.
3b: Check if the numerical variable is normally distributed within both groups of the categorical variable. If it is, proceed to Step 4. Otherwise, perform a Mann-Whitney U test.
4: Do an F-test. If variances are equal, perform an unpaired t-test assuming equal variances. Otherwise, perform an unpaired t-test assuming unequal variances.
Finally, make conclusions and STOP.
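
The decision steps above can be sketched as a small function. It only encodes the branching; the inputs (whether the design is paired, whether both groups passed a normality check, and whether the F-test found equal variances) are assumed to have been established beforehand. The function name and parameter names are hypothetical.

```python
def choose_two_group_test(paired, normal_in_both_groups, equal_variances=None):
    """Pick a two-group test by following the paired/normality/variance steps."""
    if paired:
        # Step 3a: paired design.
        if normal_in_both_groups:
            return "paired t-test"
        return "Wilcoxon signed-rank test"
    # Step 3b: unpaired design.
    if not normal_in_both_groups:
        return "Mann-Whitney U test"
    # Step 4: the F-test result is needed to pick the t-test variant.
    if equal_variances is None:
        raise ValueError("run an F-test first to decide on equal or unequal variances")
    if equal_variances:
        return "unpaired t-test assuming equal variances"
    return "unpaired t-test assuming unequal variances"

print(choose_two_group_test(paired=False, normal_in_both_groups=True, equal_variances=False))
```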

Answer 34

- the tests used and their purposes
- the software used to implement those tests
- any citation required for the software used (e.g., R and RStudio, not Excel)

Answer 35

- describe the outcome of each test result
- report test statistics (e.g. t or F or Wilks' lambda, a measure of effect size)
- df (an indicator of sample size)
- p-value and your decision (can you reject the null hypothesis or not)
- a statement of the biological meaning of your result

(59 cards)