E-module 2 - Choosing statistics Flashcards

1
Q

What are the 2 types of analysis of data?

A

Correlation

Comparison

2
Q

Definitions of correlation and comparison as types of data analysis.

A

Correlation
- hypothesis tests to evaluate relationship between variables

Comparison
- hypothesis tests to evaluate differences between groups/populations

3
Q

What are the 2 types and 4 subtypes of data?

A

Types

  • Quantitative - numeric information
  • Qualitative/Categorical - non-numeric information that describes categories or qualities rather than measured amounts

Subtypes

  • Quantitative gives rise to CONTINUOUS and DISCRETE (counted) data
  • Qualitative/categorical gives rise to NOMINAL (unordered) and ORDINAL (ordered) data
4
Q

Which subtypes of data are PARAMETRIC and NON-PARAMETRIC?

A

PARAMETRIC
- continuous quantitative (provided it follows a normal distribution)

NON-PARAMETRIC

  • discrete quantitative
  • nominal categorical
  • ordinal categorical
5
Q

What is the principle segregating continuous and discrete data?

A

Continuous data can (potentially) be subdivided infinitely, whereas discrete data cannot
- e.g. age is continuous if measured exactly (in months, days, hours etc.) but discrete if counted in whole years, so the same variable can fall into either subtype depending on how it is measured

6
Q

What is important to remember about continuous vs discrete data?

A

Means and rates are always continuous data. They were most likely calculated from discrete data, BUT they themselves are continuous

e.g. heart rate is continuous but the number of heartbeats counted in a minute is discrete
- this is because continuous data can take ANY value within a range (e.g. 2, 2.5, 3), whereas discrete data can only take certain values (e.g. 2 or 3 but NOT 2.5)

7
Q

When do you check for normality in data?

A

When you have CONTINUOUS data. Discrete and qualitative data are ALWAYS treated as NON-PARAMETRIC

8
Q

What does normality measure?

A

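Normality describes whether the data follow a normal (Gaussian) distribution, which is characterised by its central tendency (mean) and dispersion (SD)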

9
Q

What are the 2 tests used for testing normality and their conditions?

A

Shapiro-Wilk test - n<50

Kolmogorov-Smirnov test - n>50

10
Q

When can the data be considered normal or not normal?

A

Use the p-value of the normality test

if p<0.05 the data is NOT normal; otherwise there is no evidence against normality and the data is treated as normal
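
The two normality cards can be read as a pair of small decision rules. The sketch below is only an illustration: the function names are hypothetical, the 0.05 threshold is the one quoted on the card, and sending n = 50 to Kolmogorov-Smirnov is an assumption because the cards do not cover that case.

```python
def normality_test_for(n: int) -> str:
    """Pick a normality test from the sample size (rule of thumb from the cards)."""
    # Assumption: a sample of exactly 50 is sent to Kolmogorov-Smirnov.
    return "Shapiro-Wilk" if n < 50 else "Kolmogorov-Smirnov"


def treated_as_normal(p_value: float, alpha: float = 0.05) -> bool:
    """True when the normality test gives no evidence against normality."""
    return p_value >= alpha


print(normality_test_for(30), treated_as_normal(0.20))
print(normality_test_for(120), treated_as_normal(0.01))
```

A p-value at or above the threshold does not prove normality; it only means the test found no evidence against it.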

11
Q

What are the 3 outcomes and subsequent distributions of data after normality testing?

A

YES - Gaussian/Normal distribution
NO - Skewed distribution
NO - Distribution with kurtosis (heavy- or light-tailed)

12
Q

What are the 2 main features of normal distributions?

A
  • 68-95-99.7 rule - 68% (roughly 2/3rds) of data lies within 1 SD of the mean, 95% within 2 SDs, 99.7% within 3 SDs
    AND
  • Distribution is symmetrical
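
The 68-95-99.7 figures can be checked against the standard normal distribution. This is a minimal sketch using Python's standard library; the helper name `coverage_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def coverage_within(k: float) -> float:
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {coverage_within(k):.1%}")
```

Because the distribution is symmetrical, half of each proportion lies on either side of the mean.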
13
Q

What are the features of skewed distributions?

A

ASYMMETRICAL

  • mean, median and mode all separated (usually found together in normal dist.) (mode at top of curve, median just downslope from top, mean just further downslope from the median)
  • skew is named according to which direction has the long tail e.g. right/positive skew = long positive/right tail and vice versa
  • uneven tails with many data points at high/low end of range
14
Q

What are the features of Kurtosis?

A

Kurtosis is where data is heavy or light-tailed with respect to a normal distribution

  • heavy-tailed = more extreme values (outliers) than a normal distribution, so the tails are fatter
  • light-tailed = fewer extreme values (outliers) than a normal distribution, so the tails are thinner
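
Tail weight is usually quantified as excess kurtosis: the fourth standardised moment minus 3, so that a normal distribution scores 0. The sketch below assumes the simple population formula (no small-sample correction) and uses made-up data; the function and variable names are hypothetical.

```python
from statistics import fmean


def excess_kurtosis(values):
    """Population excess kurtosis: fourth standardised moment minus 3."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0


light_tailed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # evenly spread, no outliers
heavy_tailed = [5, 5, 5, 5, 5, 5, 5, 5, -20, 30]  # two extreme outliers

print(excess_kurtosis(light_tailed))  # negative: lighter tails than normal
print(excess_kurtosis(heavy_tailed))  # positive: heavier tails than normal
```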
15
Q

Definition of paired/dependent data/groups and an example of a study employing this?

A

Paired/dependent = when two (or more) sets of data come from the same individuals, e.g. the same subjects measured at different points of the day
- longitudinal study

16
Q

Definition of unpaired/independent data/groups and an example of a study employing this?

A

Unpaired/independent = comparing data from two groups with no common factors (two independent groups)
- cross-sectional study

17
Q

Which statistical tests will you use for parametric data and what are their individual constraints?

A

Paired t-test = parametric, 2 groups, paired

Unpaired t-test = parametric, 2 groups, independent

Repeated-measures one-way ANOVA = parametric, 3 or more groups, paired

One-way ANOVA = parametric, 3 groups or more, unpaired

18
Q

Which statistical tests will you use for non-parametric data?

A

Wilcoxon (signed rank) test = non-parametric, 2 groups, paired

Mann-Whitney U test = non-parametric, 2 groups, unpaired

Kruskal-Wallis test = non-parametric, 3 or more groups, unpaired

(Friedman test = non-parametric, 3 or more groups, paired)
- medlearn tree only has K-W test for 3 or more groups
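
The parametric and non-parametric cards together form a decision tree with three questions: parametric or not, 2 groups or more, paired or unpaired. The lookup below is a sketch of that tree, not the module's own tool; the function name is hypothetical. It follows the standard assignment in which Kruskal-Wallis is used for independent groups and Friedman for paired (repeated) measurements.

```python
def choose_comparison_test(parametric: bool, n_groups: int, paired: bool) -> str:
    """Return the comparison test matching the three decision-tree answers."""
    if n_groups < 2:
        raise ValueError("a comparison needs at least 2 groups")
    two_groups = n_groups == 2
    # Keys are (parametric, exactly two groups, paired).
    table = {
        (True, True, True): "Paired t-test",
        (True, True, False): "Unpaired t-test",
        (True, False, True): "Repeated-measures one-way ANOVA",
        (True, False, False): "One-way ANOVA",
        (False, True, True): "Wilcoxon signed-rank test",
        (False, True, False): "Mann-Whitney U test",
        (False, False, True): "Friedman test",
        (False, False, False): "Kruskal-Wallis test",
    }
    return table[(parametric, two_groups, paired)]


print(choose_comparison_test(parametric=True, n_groups=2, paired=True))
print(choose_comparison_test(parametric=False, n_groups=4, paired=False))
```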

19
Q

When testing for correlation, which statistical tests would you use and when?

A

Pearson’s test - data is continuous and follows a normal distribution

Spearman’s rank test - data is continuous but does not follow a normal distribution

Chi-squared test - data is discrete/categorical (counts in categories)

20
Q

What is the range of resulting values from Pearson’s/Spearman’s rank tests and what symbol indicates them?

A

Pearson’s - r - from -1 to 1: perfect negative to perfect positive correlation, with 0 meaning no correlation (weak and strong in between)

Spearman’s rank - rho (ρ, looks like p) - from -1 to 1: perfect negative to perfect positive correlation, with 0 meaning no correlation (weak and strong in between)
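
Both coefficients can be computed by hand to see how they differ: Pearson's r measures linear association on the raw values, while Spearman's rho is Pearson's r applied to the ranks. The sketch below uses only the standard library; the function names and the example data are made up for illustration.

```python
from math import sqrt
from statistics import fmean


def pearson_r(x, y):
    """Pearson's r: covariance scaled by the spread of each variable."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / spread


def ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # rises with x, but not in a straight line

print(pearson_r(x, y))     # below 1: the relationship is not linear
print(spearman_rho(x, y))  # 1.0: the ordering agrees perfectly
```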

21
Q

Which type of data, parametric or non-parametric, is better to use and why?

A

PARAMETRIC

  • easier to understand
  • more powerful (when its assumptions are met), i.e. LESS LIKELY TO fail to reject a false null hypothesis
22
Q

What are descriptive statistics?

A

Descriptive statistics are used to summarise large data-sets in a manageable format
Raw data is usually presented in the form of descriptive statistics, e.g. the mean +/- SD/SEM of a collection of data points

23
Q

What are measures of central tendency?

A

Mean, mode, median

24
Q

What are measures of data dispersion?

A

Variance, standard deviation (SD), standard error/standard error of the mean (SE/SEM)

25
Q

How do you calculate variance and standard deviation of a sample and a population?

A

Variance - sum the squared differences between each data value and the mean, then divide by the number of values
- for a population, divide by n; for a sample, divide by n-1

Standard deviation - for both, this is the square root of the variance so calculate it as such
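
The calculation can be written out step by step and checked against Python's `statistics` module, where `pvariance`/`pstdev` divide by n (population) and `variance`/`stdev` divide by n-1 (sample). The helper `variance_by_hand` and the data values are made up for illustration.

```python
from math import sqrt
from statistics import pstdev, pvariance, stdev, variance


def variance_by_hand(values, sample: bool) -> float:
    """Sum of squared differences from the mean, divided by n-1 or n."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return sum_of_squares / (n - 1 if sample else n)


data = [2, 4, 4, 4, 5, 5, 7, 9]

population_variance = variance_by_hand(data, sample=False)
sample_variance = variance_by_hand(data, sample=True)

print(population_variance, pvariance(data))  # both divide by n
print(sample_variance, variance(data))       # both divide by n-1
print(sqrt(population_variance), pstdev(data))
print(sqrt(sample_variance), stdev(data))
```

The sample variance is always slightly larger than the population variance for the same values, because the divisor is smaller.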

26
Q

How do you calculate the standard error of the mean?

A

This is the standard deviation divided by the square root of n (the number of values)
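
A minimal sketch of the same calculation, assuming the sample standard deviation (n-1 divisor) is the one intended; the function name and data are hypothetical.

```python
from math import sqrt
from statistics import stdev


def standard_error_of_mean(sample) -> float:
    """Sample standard deviation divided by the square root of n."""
    return stdev(sample) / sqrt(len(sample))


sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_error_of_mean(sample))
```

For a fixed standard deviation, a larger n gives a smaller standard error.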

27
Q

Which measure of central tendency should you use in cases where the data is normally distributed and not normally distributed?

A

Normally distributed = mean

Not normally distributed = median
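
The reason for switching to the median shows up as soon as a long tail is present: extreme values pull the mean towards the tail, while the median stays close to the bulk of the data. The data below are made up for illustration.

```python
from statistics import mean, median

symmetric = [48, 49, 50, 51, 52]
right_skewed = [1, 2, 2, 3, 3, 4, 40]  # one extreme value in the right tail

print(mean(symmetric), median(symmetric))        # mean and median agree
print(mean(right_skewed), median(right_skewed))  # mean is dragged upwards
```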

28
Q

What is the standard error used for and hence, when can it only be used?

A

Standard error is used as a measure of how well the sample mean estimates the population mean
- can only, therefore, be used with the sample SD/variance