Lecture 7 - Non-parametric statistics and measures of association Flashcards

1
Q

Parametric vs Non-parametric

A

• Parametric data:
– Assumes a normal distribution and homogeneous variance; data are typically on a ratio or interval scale.
– Can draw more conclusions.
• Non-Parametric data:
– Makes no assumption about the distribution or the variance; data are typically ordinal or nominal.
– Simpler and less affected by outliers.
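
A small sketch of the "less affected by outliers" point, using the mean (typical of parametric methods) and the median (typical of non-parametric methods). The sample values and the helper name `summarize` are made up for illustration.

```python
from statistics import mean, median


def summarize(values):
    """Return (mean, median) for a list of numbers."""
    return mean(values), median(values)


# Hypothetical sample, once as-is and once with a single extreme value.
clean = [4, 5, 5, 6, 7]
with_outlier = [4, 5, 5, 6, 70]

clean_mean, clean_median = summarize(clean)
out_mean, out_median = summarize(with_outlier)

# The mean moves a lot when the outlier is added; the median does not move.
print("mean:  ", clean_mean, "->", out_mean)
print("median:", clean_median, "->", out_median)
```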

2
Q

Correlation and Correlation Coefficient

A

• Technique for investigating the relationship between two numerical variables
• A correlation coefficient is a measure of the relationship between two numerical measurements
– Magnitude of relationship
– Direction of relationship
– Bivariate distribution

3
Q

Positive and negative correlation

A

• Positive Correlation (Direct)
– Present when high values in one variable are associated with high values of another variable or vice versa
• Negative Correlation (Indirect)
– Present when high values of one variable are associated with low values of the other variable, or vice versa

4
Q

Correlation: Scatterplot

A

• Scatterplot
– A two-dimensional graph displaying the relationship between two numerical variables
• It shows:
– Whether there is an association between the variables
– What the association looks like (linear? nonlinear?)
– The trend of the association (positive, negative)

5
Q

Pearson correlation coefficient (r)

A

• Measures the strength of linear association between two quantitative variables
– The r value has no units
• The data for both variables must be measured on an interval or ratio scale
Interpretation of r (r ranges from -1 to +1; values near 0 indicate little or no linear association):
Negative correlation gets stronger as it approaches -1
Positive correlation gets stronger as it approaches 1
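
A minimal sketch of how r can be computed from paired data, using the usual definition (sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` and the sample values are illustrative assumptions, not part of the lecture.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variability")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: one perfectly increasing and one perfectly decreasing line.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]
y_down = [10, 8, 6, 4, 2]

print(pearson_r(x, y_up))    # strongest possible positive correlation
print(pearson_r(x, y_down))  # strongest possible negative correlation
```

Because the deviations are divided by quantities in the same units, the result is unit-free, which matches the note that r has no units.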

6
Q

Usefulness of scatterplot

A
  • We learn the truth by simply looking at the graphs:
  • The upper-left graph looks like what we may have expected from the regression output: a straight-line relationship with some scatter about the best line.
  • The upper-right graph shows a strong relationship between x and y, but it is NOT linear.
  • In the lower-right graph, it doesn’t make any sense to fit a line since there is essentially no variability in the x values.
  • In the lower-left graph, there is a strong linear relationship with the exception of one outlier.
  • The moral of this example is: ALWAYS FIRST GRAPH YOUR DATA and don’t rely solely on summary output.
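
The four graphs are not reproduced in these notes. Their descriptions match the pattern of the well-known Anscombe's quartet, so, as an assumption, the sketch below uses the first two of Anscombe's data sets (the linear one and the curved one) to show why summary output alone can mislead: both give nearly the same r. The helper name `pearson_r` is illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Anscombe's quartet, data sets I and II (assumed to be the lecture's example).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y_linear = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y_curved = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)

# The two coefficients are almost identical even though only the first
# relationship is a straight line; a scatterplot reveals the difference.
print(round(r_linear, 3), round(r_curved, 3))
```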
7
Q

Spearman Correlation Coefficient (rs)

A
  • The non-parametric equivalent of the Pearson product-moment correlation
  • Measures the strength of association between two ranked variables
  • The Spearman correlation can be used when the assumptions of the Pearson correlation are markedly violated.
  • It assumes that there is a monotonic relationship between the variables.
  • It is calculated by first ranking the data for each quantitative variable and then applying the linear correlation coefficient formula on the ranked data.
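
The two-step calculation described above (rank each variable, then apply the Pearson formula to the ranks) can be sketched as follows. Tied values receive the average of their ranks, which is a common convention and an assumption here; the function names and sample data are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rs(x, y):
    """Spearman correlation: Pearson's formula applied to the ranked data."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical monotonic but non-linear data: y is the cube of x.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

print(spearman_rs(x, y))  # ranks agree perfectly
print(pearson_r(x, y))    # raw values are not on a straight line
```

On monotonic but curved data like this, the ranks line up exactly even though the raw values do not, which is why the monotonic assumption matters more for Spearman than linearity.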
8
Q

Correlation and causality

A

Correlation does not imply causation

• Examples where an association should not be read as cause and effect:
– MMR vaccination and autism spectrum disorders
– Gender and IQ
– Alcohol and lung cancer

9
Q

Regression analysis

A

• It is a common way of estimating the relationship among variables
– E.g.: Given the age of an individual, can we estimate their income levels?
– Also, can we use the age of the individual to predict their income levels?
• Linear regression is the most basic and common type of predictive analysis
– At the centre of the regression analysis is the task of fitting a single straight line through a scatter plot
• Regression line
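
A minimal sketch of fitting a single straight line through a scatter of points, assuming ordinary least squares as the fitting method (the notes do not name one). The ages, incomes, and function names are invented for illustration only.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for the line y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0:
        raise ValueError("cannot fit a line when x has no variability")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x_new, slope, intercept):
    """Value of the fitted regression line at x_new."""
    return intercept + slope * x_new


# Hypothetical data: age in years and income in thousands.
ages = [25, 30, 35, 40, 45]
incomes = [30, 36, 41, 47, 51]

slope, intercept = fit_line(ages, incomes)
print("slope:", slope, "intercept:", intercept)
print("estimated income at age 50:", predict(50, slope, intercept))
```

The guard against zero variability in x mirrors the scatterplot card: a line cannot sensibly be fitted when all x values are the same.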

10
Q

Non-parametric statistics for hypothesis testing

A

• Hypotheses concern the population median η instead of the population mean μ

11
Q

Sign test (+,-)

A

• Tests a hypothesis concerning the median

H0: η = η0
H1: η ≠ η0

• If the null hypothesis is true, approximately equal numbers of observations fall above (+) and below (-) the hypothesized median η0
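
A sketch of the sign test under some stated assumptions: observations equal to the hypothesized median are dropped (a common convention), and the two-sided p-value is computed exactly from a Binomial(n, 0.5) distribution of the smaller sign count. The function name `sign_test` and the sample values are hypothetical.

```python
from math import comb


def sign_test(sample, median0):
    """Two-sided sign test of H0: median = median0.

    Returns (number of + signs, number of - signs, exact p-value).
    """
    plus = sum(1 for v in sample if v > median0)
    minus = sum(1 for v in sample if v < median0)
    n = plus + minus  # observations equal to median0 are dropped
    if n == 0:
        raise ValueError("all observations equal the hypothesized median")
    k = min(plus, minus)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    return plus, minus, p_value


# Hypothetical sample tested against a hypothesized median of 12.
sample = [12, 15, 11, 18, 14, 16, 19, 13, 17, 20]
plus, minus, p_value = sign_test(sample, 12)
print(plus, minus, p_value)
```

A very uneven split between + and - signs gives a small p-value, which is evidence against H0; a roughly even split gives a p-value near 1.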
