Data analysis Flashcards
Conditions for chi square
- Independence
- Sample size: expected count of at least 5 in every cell
- > 1 degree of freedom
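A minimal sketch of checking the sample-size condition, assuming it refers to the expected counts in a two-way table. The observed counts are made up for illustration.

```python
# Hypothetical 2x3 table of observed counts (made-up numbers).
observed = [[20, 30, 50],
            [30, 30, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell = row total x column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Sample-size condition: every expected count is at least 5.
sample_size_ok = all(e >= 5 for row in expected for e in row)

# Degrees of freedom for a two-way table: (rows - 1) x (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected, sample_size_ok, df)
```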
What do rows in data represent?
Observations
What do columns in data represent?
Variables
Experimental Study
Manipulation of an independent variable to measure the impact on a dependent variable (control group etc.)
Observational study
Study of the relationship between variables with no manipulation/intervention
Where does the explanatory variable belong?
Horizontal axis
Where does the response variable belong?
Vertical axis
What is sample standard deviation?
The square root of the sample variance: s = √( Σ(x - x̄)² / (n - 1) )
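A short sketch of how the sample standard deviation relates to the sample variance, using made-up data. The result is compared against the standard library's statistics.stdev.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations

mean = sum(data) / len(data)

# Sample variance divides the squared deviations by n - 1, not n.
sample_variance = sum((x - mean) ** 2 for x in data) / (len(data) - 1)

# Sample standard deviation is the square root of the sample variance.
sample_sd = math.sqrt(sample_variance)

# Should agree with the standard library (up to rounding).
print(sample_sd, statistics.stdev(data))
```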
Calculate the probability (A AND B)
P(A) x P(B) if A and B are independent
In general: P(A) x P(B given A)
Calculate the probability (A OR B)
P(A) + P(B) - P(A AND B)
- e.g. for independent events with P(A) = 0.3 and P(B) = 0.7: (0.3 + 0.7) - (0.7 x 0.3) = 0.79
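The AND and OR rules can be sketched in a few lines. This reuses the 0.3 and 0.7 from the example and assumes the two events are independent, so P(A AND B) is just the product.

```python
p_a = 0.3
p_b = 0.7

# Assumes A and B are independent, so the multiplication rule applies.
p_a_and_b = p_a * p_b

# General addition rule: subtract the overlap so it is not counted twice.
p_a_or_b = p_a + p_b - p_a_and_b

print(round(p_a_and_b, 2))  # 0.21
print(round(p_a_or_b, 2))   # 0.79
```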
Calculate the Probability A given B
If the events are independent, the probability is just P(A) - the occurrence of B does not affect the probability of A
A given B - conditional probability
P(A | B) = P(A and B) / P(B)
Bayes Theorem
P(A | B) = P(B | A) x P(A) / P(B)
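A small sketch of Bayes' theorem with made-up probabilities. The function name bayes is hypothetical. The result is checked against the conditional probability definition P(A and B) / P(B).

```python
def bayes(p_b_given_a, p_a, p_b):
    """Return P(A given B) = P(B given A) x P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be greater than 0")
    return p_b_given_a * p_a / p_b

# Made-up probabilities for illustration.
p_a = 0.2
p_b = 0.5
p_b_given_a = 0.75

p_a_given_b = bayes(p_b_given_a, p_a, p_b)

# Same answer via the definition: P(A and B) / P(B),
# where P(A and B) = P(B given A) x P(A).
p_a_and_b = p_b_given_a * p_a
via_definition = p_a_and_b / p_b

print(p_a_given_b, via_definition)
```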
Z score formula
(x - mean) / standard deviation
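The Z score formula as a one-line function, with made-up numbers. The name z_score is just for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Made-up example: a score of 85 where the mean is 70 and the SD is 10.
z = z_score(85, 70, 10)
print(z)  # 1.5
```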
Binomial distribution
- Fixed number of independent trials
- 2 possible outcomes ‘success’ and ‘failure’
- Same probability of success on every trial
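When these conditions hold, the probability of exactly k successes in n trials can be computed as below. The formula itself is not on the cards; it is the standard binomial probability, and the coin-flip numbers are made up.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Made-up example: exactly 3 heads in 5 fair coin flips.
p_three_heads = binomial_pmf(3, 5, 0.5)
print(p_three_heads)  # 0.3125

# The probabilities over all possible k add up to 1.
total = sum(binomial_pmf(k, 5, 0.5) for k in range(6))
```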
T-distribution
t = (x̄ - μ) / (s / √n)
x̄ - sample mean
μ - population mean
s - sample SD
n - Sample size
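A quick sketch of the t statistic, dividing the difference in means by the standard error s / √n. The numbers and the function name t_statistic are made up for illustration.

```python
import math

def t_statistic(sample_mean, pop_mean, sample_sd, n):
    """t = (sample mean - population mean) / (sample SD / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error

# Made-up example: sample mean 52, hypothesised mean 50, s = 4, n = 16.
t = t_statistic(52, 50, 4, 16)
df = 16 - 1  # degrees of freedom for a one-sample t-test

print(t, df)  # 2.0 15
```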
How to calculate degrees of freedom?
Sample size (n) - 1 (for a one-sample t-test)
What does mutually exclusive mean?
When P(A and B) = 0
Events are mutually exclusive when they cannot happen at the same time, and share no basic outcomes
Events are independent if…
The occurrence of one has no impact on the probability of the other occurring
Two tailed or One tailed test
- One tailed - testing for a difference in one direction only (either lower or higher)
- Two tailed - testing for a difference in either direction
Standard error
- Standard deviation / √(sample size)
Key notation
x̄- sample mean
p̂- sample proportion
𝜇- population mean
𝑝- population proportion
𝑠- sample standard deviation
𝜎- population standard deviation
𝑛- sample size
𝛽- population regression coefficient
𝑏- sample regression coefficient
Σ- summation operator
Extrapolation
- Making a prediction for a value of the predictor variable that lies outside the range of the data the model was built on
- e.g. plugging an adult's weight into a model built only on babies' weights
Residual
Observed value - predicted value
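A tiny sketch of a residual, assuming a hypothetical fitted line with made-up intercept and slope. A positive residual means the observed value sits above the prediction.

```python
# Hypothetical fitted line: predicted = intercept + slope * x (made-up values).
intercept = 2.0
slope = 0.5

def predict(x):
    return intercept + slope * x

x = 10
observed = 8.0

predicted = predict(x)           # 2.0 + 0.5 * 10 = 7.0
residual = observed - predicted  # observed value - predicted value

print(residual)  # 1.0
```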