Stats relationships Flashcards

1
Q

What is the Chi-square test?

A

It is used to test whether there is an association between categorical variables.
It is used to analyse contingency tables.
The most common form is Pearson’s chi-square test.

2
Q

What are the assumptions of the Chi-square test?

A

Variables should be categorical - ordinal or nominal.
Should compare two or more groups.

3
Q

What is the Chi-square test for differences?

A

Tests whether observed values differ from expected values.
Used to measure the difference between observed and expected frequencies.

4
Q

What is the equation of the chi-square test for differences?

A

See picture.
O = observed values.
E = expected values.
k = number of categories.
Degrees of freedom (df): k-1
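
The picture is not reproduced here. Written with the symbols defined above, the standard form of Pearson's chi-square statistic is given below (the index i over the k categories is added here for illustration):

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad \mathrm{df} = k - 1
```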

5
Q

What is the null hypothesis for the chi-square test for differences?

A

There is no difference between the groups.

6
Q

What is the method for chi-square test for differences?

A

First calculate the expected values.
E.g. if the expected ratio is 3:1, multiply the total observed by 3/4 and 1/4 to get the expected values (E).
Then calculate observed (O) - E for each category, square it, divide by E, and sum the results to give the chi-square statistic.
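
A minimal sketch of the 3:1 calculation in Python. The observed counts are hypothetical numbers chosen for illustration, and the statistic is the usual sum of (O - E)^2 / E:

```python
# Hypothetical observed counts for two categories expected in a 3:1 ratio.
observed = [290, 110]
ratio = [3, 1]

# Expected values: share of the total observed count implied by the ratio.
total = sum(observed)
expected = [total * part / sum(ratio) for part in ratio]

# Chi-square statistic: sum of (O - E)^2 / E over the categories.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom: number of categories minus one.
df = len(observed) - 1

print(expected, chi_square, df)
```

Here the total of 400 is split into 300 and 100, so the two categories contribute 100/300 and 100/100 to the statistic.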

7
Q

What is the analysis of the chi-square test for differences?

A

Compare the calculated statistic with the chi-square distribution for the relevant degrees of freedom.
If the p-value is more than 0.05, there is insufficient evidence to reject the null hypothesis (a difference between observed and expected values cannot be assumed).
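
As a rough illustration, the p-value can be computed with the standard library alone in the special case of 1 degree of freedom (two categories). The function name and the example statistic are hypothetical; for other degrees of freedom a statistics package or a chi-square table would be needed:

```python
import math


def chi_square_p_value_df1(chi_square):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # With df = 1 the statistic is a squared standard normal variable,
    # so P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_square / 2))


alpha = 0.05                                # threshold used in the card above
p_value = chi_square_p_value_df1(1.3333)    # hypothetical statistic, df = 1
reject_null = p_value < alpha

print(p_value, reject_null)
```

With this example statistic the p-value is well above 0.05, so the null hypothesis is not rejected.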

8
Q

What is the chi-square test for associations?

A

Tests whether the frequencies of two or more groups are associated in any way.
Uses the same methodology as the test for differences once the expected values have been determined.

9
Q

How are the expected values calculated under the assumption of no association in the test for association?

A

E = (row total × column total) / grand total.
See picture example.
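
A short sketch of this calculation for a hypothetical 2 x 2 contingency table (the counts are invented for illustration). The degrees of freedom rule for a table, (rows - 1) x (columns - 1), is the standard one and is not stated in these cards:

```python
# Hypothetical observed counts (rows = groups, columns = outcome categories).
observed = [
    [30, 10],
    [20, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell if there is no association.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Same statistic as the test for differences, summed over every cell.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Standard rule for a contingency table (not from the cards).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected, chi_square, df)
```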

10
Q

What is testing for relationships?

A

Determines if there is a true relationship between measures, or whether the pattern observed is due to random variation.
Methods: correlation measures (e.g. Pearson) and linear regression.
There can be a lot of scatter due to natural variability.

11
Q

What are the variables in tests for relationships?

A

Predictor variable - explanatory or independent variable.
Outcome variable - dependent or response variable.
The predictor variable may be related to the outcome variable.

12
Q

What is the method of testing for relationships?

A

Draw a scatter plot - visual inspection is informative.
Plot the predictor variable on the x-axis, and the outcome variable on the y-axis.

13
Q

What are the types of relationships?

A

Random
Positive
Negative
Complex
Linear

14
Q

What is a random scatter?

A

No trend or relationship
See picture.

15
Q

What is a positive relationship?

A

As the predictor variable increases, the outcome variable increases.
See picture

16
Q

What are negative relationships?

A

As the predictor variable increases, the outcome variable decreases.
See picture

17
Q

What are complex relationships?

A

The outcome variable rises and falls as the predictor variable increases.
This may require data transformation prior to statistical analysis.
See picture.

18
Q

What are linear relationships?

A

The outcome variable (y) is related to the predictor variable (x) by the linear equation:
y = a + bx
b is the gradient
a is the intercept
See picture

19
Q

What is linear regression?

A

Quantifies the linear relationship between two sets of paired measurements.
Relation defined by the equation y = a + bx
Regression analysis estimates the line of best fit through the scattered plotted points.
See picture.

20
Q

What is the aim of the linear regression?

A

The line of best fit minimises the sum of the squared vertical distances between the data points and the line.
If the remaining variability around the line is small, the regression is more likely to be significant.
If the variability is high, the regression is more likely to be non-significant.
See picture.
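
A minimal least-squares sketch for y = a + bx, using invented data points. The function names are illustrative, and the closing check only shows that nudging the fitted line increases the sum of squared distances:

```python
def fit_line(x, y):
    """Return intercept a and gradient b of the least-squares line y = a + bx."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def residual_sum_of_squares(x, y, a, b):
    """Sum of squared vertical distances between the points and the line."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

a, b = fit_line(x, y)
rss = residual_sum_of_squares(x, y, a, b)

# Any other line gives a larger sum of squares than the fitted one.
rss_shifted = residual_sum_of_squares(x, y, a + 0.1, b)
rss_tilted = residual_sum_of_squares(x, y, a, b + 0.1)

print(a, b, rss, rss_shifted, rss_tilted)
```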

21
Q

How is linear regression interpreted?

A

Coefficient b gives the change in the outcome variable (y) for a unit change in the predictor variable (x).
The intercept gives the value of y when x=0.
The line gives the mean of y for each value of x.

22
Q

What are outliers?

A

Extreme values found in a data set - might represent artefacts.
They can skew data and affect statistical measures and linear regression models.
Visualise them in plots - box, scatter, histograms.

23
Q

What is important about outliers?

A

Are the outliers plausible actual values, or are they due to measurement error?
Outliers need looking at in the context of the experiment and data collection.
Outliers can be removed from analysis if you conclude they are artefacts that can affect data interpretation.

24
Q

What is Pearson’s correlation test?

A

Tests to determine whether there is a linear association between two sets of paired measurements.
Examines the strength with which two sets of measurements show positive or negative linear association.
It measures the extent of association by calculating a single statistic - the Pearson’s correlation coefficient r.

25
Q

What are the assumptions of Pearson’s correlation?

A

Linear relationship between variables.
At least one variable is approximately normally distributed.

26
Q

What is the interpretation of Pearson correlation?

A

The correlation coefficient ranges between -1 and +1.
A value of +1 indicates a perfect positive correlation.
A value of -1 indicates a perfect negative correlation.
A value of 0 indicates no correlation at all.

The further r is from 0 and the larger the sample size, the less likely a correlation could have occurred by chance.
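
To make the range of r concrete, here is a small sketch that computes the coefficient from its usual definition (covariation divided by the product of the spreads). The data sets are invented for illustration:

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])   # perfect positive line
r_negative = pearson_r(x, [10, 8, 6, 4, 2])   # perfect negative line
r_none = pearson_r(x, [1, 2, 3, 2, 1])        # symmetric rise and fall

print(r_positive, r_negative, r_none)
```

The last data set shows that r close to 0 means no linear correlation; the outcome still rises and falls with the predictor, which is the complex relationship described earlier.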

27
Q

What do Pearson correlation graphs look like?

A

see picture

28
Q

What is Spearman’s correlation?

A

Nonparametric measure of rank correlation.
Same as Pearson’s, but uses the ranks of the data rather than actual data.
Indicates direction of association - Y tends to decrease/increase when X increases.
Used if Pearson’s assumptions cannot be met.
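
A sketch of the rank-based idea: replace each value by its rank, then apply Pearson's formula to the ranks. Giving tied values the mean of their ranks is the common convention and is an assumption here, as are the function names and data:

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y always increases with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

rho = spearman_rho(x, y)
r = pearson_r(x, y)

print(rho, r)
```

Because only the ordering matters, the rank correlation is perfect for this data even though Pearson's r on the raw values is below 1.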

29
Q

What is the significance of correlations?

A

Is the observed correlation real or just a coincidence?
Need to test for significance - using p-value.
E.g. there is a positive correlation and the p-value is less than the threshold, so the correlation is significant.
Or there is a positive correlation but the p-value is greater than the threshold, so the correlation is not significant.
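
The cards do not say how the p-value is obtained. One way to illustrate the idea without a statistics package is an exact permutation test, used here as a stand-in for the usual t-based test: it asks how often a correlation at least as strong as the observed one appears when the y values are paired with x in every possible order. The data, function names, and threshold are assumptions for illustration, and this approach is only practical for small samples:

```python
import itertools
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y):
    """Two-sided exact permutation p-value for Pearson's r (small n only)."""
    observed = abs(pearson_r(x, y))
    at_least_as_extreme = 0
    total = 0
    for shuffled in itertools.permutations(y):
        total += 1
        # Small tolerance so floating-point rounding does not hide exact ties.
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Hypothetical paired measurements (6 points, so 720 orderings).
x = [1, 2, 3, 4, 5, 6]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]

threshold = 0.05
p_value = permutation_p_value(x, y)
significant = p_value < threshold

print(pearson_r(x, y), p_value, significant)
```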

30
Q

What is important about relationship tests?

A

Relationships do not imply causation.
Relationships between variables do not imply changes in one variable cause the changes in the other.

31
Q

What is the summary of relationships and associations?

A

Association/difference tests - compare observations with expected values.
Correlation tests - evaluate relationships between variables.
Pearson’s correlation - tests for linear relationship between variables. Linear regression models this relationship.