Week 7 Descriptive & inferential stats Flashcards
What is statistics?
Practice or science of collecting & analysing numerical data in large quantities, especially to make inferences about a population based on a representative sample
Descriptive stats
Describe & summarise data (a sample or a population) through numbers & graphs: central tendency, data spread, count, proportion, skewness etc
Inferential stats
- Provide meaningful inferences/conclusions about a population, based on sample data
- Make generalisations & predictions
What are the types of statistics?
- Descriptive
- Inferential
What are the measures of central tendency?
Mean, Median, Mode
When is mean usually used?
Suitable for symmetric distributions; usually reported with SD
When is median usually used?
Suitable for skewed distributions; usually reported with IQR
Why is median most used in skewed distribution?
It is less sensitive to extreme values, whereas the mean is pulled in the direction of the skew (towards the long tail)
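A minimal sketch of why the median suits skewed data, using Python's standard `statistics` module; the income values are invented for illustration:

```python
from statistics import mean, median

# Hypothetical right-skewed data: most values are small, one is very large
incomes = [20, 22, 25, 27, 30, 31, 35, 200]

print(mean(incomes))    # 48.75: pulled towards the long right tail
print(median(incomes))  # 28.5: stays with the bulk of the data
```

Removing the single extreme value (200) would move the mean a lot but barely change the median.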
What is variance?
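Average of the squared differences of each data point from the mean; expressed in squared units of the data (sample variance divides by n - 1 instead of n)
What is standard deviation?
Square root of the variance; expressed in the same units as the data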
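A short sketch tying the variance and SD definitions to Python's `statistics` module; the scores are hypothetical. The `p`-prefixed functions divide by n (population), the others by n - 1 (sample):

```python
from statistics import mean, pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5

# Variance from the definition: average squared deviation from the mean
m = mean(data)
by_hand = sum((x - m) ** 2 for x in data) / len(data)

print(by_hand, pvariance(data))  # population variance (divide by n)
print(pstdev(data))              # SD = square root of the variance, same units as data

# Sample versions divide by n - 1 when estimating a population from a sample
print(variance(data), stdev(data))
```

Here the squared deviations sum to 32, so the population variance is 32 / 8 = 4 and the SD is 2.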
What does a small & large SD mean?
Small - data points are closer around the mean
Large - data points are more spread out from the mean
What does small & large variance mean?
Small - data are close to mean & each other
Large - data are far from mean & each other
What is the empirical rule?
For a normal distribution, approximately:
68% - within 1 SD of the mean
95% - within 2 SD of the mean
99.7% - within 3 SD of the mean
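The rule can be checked against the exact normal distribution with `statistics.NormalDist` from Python's standard library (a sketch; the standard normal is used because the proportions do not depend on the mean or SD):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of a normal distribution lying within k SDs of the mean
coverage = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f"within {k} SD: {share:.1%}")
```

This prints about 68.3%, 95.4% and 99.7%, matching the rounded figures of the rule.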
What is the purpose of inferential stats?
- To generalise sample statistics to population parameters; these are only estimates, so inaccuracy & error must be accounted for using confidence intervals (CI)
- Test hypotheses
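What is a confidence interval?
A range of values that, at a stated confidence level (e.g. 95%), is likely to contain the true population parameter (e.g. the population mean); if sampling were repeated many times, about 95% of such 95% intervals would contain the true value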
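A minimal sketch of a confidence interval for a mean, assuming the normal (z) approximation, mean ± z × standard error. The function name `mean_ci` and the sample values are hypothetical:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_ci(sample, confidence=0.95):
    """Approximate CI for a population mean using the normal (z) approximation."""
    m = mean(sample)
    se = stdev(sample) / sqrt(len(sample))          # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return m - z * se, m + z * se

# Hypothetical sample of measurements
sample = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.1, 5.0, 4.9, 5.2]
low, high = mean_ci(sample)
print(f"95% CI for the mean: {low:.3f} to {high:.3f}")
```

With a sample this small, a t critical value (about 2.26 for 9 degrees of freedom) would give a slightly wider, more appropriate interval; the z version is shown only because it needs nothing beyond the standard library. A higher confidence level always gives a wider interval.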
What is the distribution assumed for parametric tests?
Normal (Gaussian) distribution
What are the limitations of inferential stats?
- Can never be fully accurate because sample data are used to estimate population values
- Interpretation of data is subject to the researcher's reasoning
What are non-parametric tests based on?
They do not require data to follow a normal distribution; they are mostly based on the rank order of the data or how often values occur (frequencies)
What central tendency does parametric & non parametric measure?
Para - mean
Non para - median
What type of variables does parametric measure?
Continuous
What type of variables does non parametric measure?
Continuous and discrete
What are the assumptions for parametric tests?
- DV is continuous
- DV follows normal distribution
- Homogeneity of variance between groups (variances are similar)
- Comparison groups are independent
- Preferably no significant outliers
How to check for normality?
- Visualisation - Q-Q plot or histogram
- Statistical hypothesis testing: Shapiro-Wilk test
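A sketch of the numbers behind a Q-Q plot: each sorted observation is paired with the quantile a normal distribution (with the sample's mean & SD) would predict. If the points lie close to the line y = x, the data are roughly normal. The function name `qq_points` is hypothetical, and the (i - 0.5) / n plotting positions are one common convention among several. The Shapiro-Wilk test itself is available in SciPy as `scipy.stats.shapiro` and is not reproduced here.

```python
from statistics import NormalDist, mean, stdev

def qq_points(data):
    """Pair each sorted observation with its expected normal quantile."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))  # normal with the sample's mean & SD
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention
    return [(fitted.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

for theoretical, observed in qq_points([4.1, 5.0, 5.2, 4.8, 6.3, 5.5, 4.9]):
    print(f"{theoretical:.2f}  {observed:.2f}")
```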
T-test
Determines whether there is a significant difference between the means of two groups. Widely used in hypothesis testing to compare sample means & make inferences about population means
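A minimal sketch of Student's independent-samples t statistic, which assumes equal variances (homogeneity of variance). It computes only t and its degrees of freedom; turning t into a p-value needs the t distribution, e.g. via SciPy's `scipy.stats.ttest_ind`. The function name `pooled_t` and the data are hypothetical:

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Student's independent-samples t statistic (assumes equal variances)."""
    n1, n2 = len(group_a), len(group_b)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of the difference in means
    t = (mean(group_a) - mean(group_b)) / se
    return t, n1 + n2 - 2  # t statistic and degrees of freedom

t, df = pooled_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(t, df)
```

Here the means differ by 2 and the standard error is 1, so t = -2 with 8 degrees of freedom.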
F-test: ANOVA
The F-test compares variances; ANOVA uses the F statistic to test equality of means among 3 or more groups/conditions
What is analysis of variance (ANOVA) for?
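Tests whether the set of group means differs overall by comparing the variance between group means with the variance within groups (F ratio). A significant result shows that at least one mean differs, but not which groups (post hoc tests are needed for that)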
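A minimal sketch of the one-way ANOVA F ratio computed from its definition (between-group mean square divided by within-group mean square). The function name and the three groups are hypothetical, and only the F statistic and its degrees of freedom are returned, not a p-value:

```python
from statistics import mean

def one_way_anova_f(*groups):
    """F ratio = between-group mean square / within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)

f_ratio, dfs = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f_ratio, dfs)  # 27.0 (2, 6)
```

A large F means the group means are spread out much more than the scores within each group, which is evidence that not all population means are equal.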
What does a measure of linear association mean?
Produces a coefficient that quantifies the strength & direction of a relationship/association between 2 or more variables; the coefficient ranges from -1 to +1
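A minimal sketch of one such coefficient, Pearson's r, computed from its definition (covariance scaled by both spreads). The function name is hypothetical; on Python 3.10+ `statistics.correlation` gives the same value:

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient, always between -1 and +1."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfect positive association
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # perfect negative association
```

The sign gives the direction and the size (distance from 0) gives the strength of the linear association.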
Examples of non parametric tests
Wilcoxon signed rank test
Mann-Whitney U test
Kruskal-Wallis test
Friedman’s test
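To show what "based on rank order" means in practice, here is a minimal sketch of the Mann-Whitney U statistic: all values are pooled and ranked (ties get the average rank), and U is derived from one group's rank sum. Function names and data are hypothetical; reporting the smaller of U1 and U2 is one common convention, and SciPy's `scipy.stats.mannwhitneyu` also provides p-values.

```python
def rank(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Smaller of the two U statistics for independent groups a and b."""
    ranks = rank(list(a) + list(b))
    r1 = sum(ranks[:len(a)])              # rank sum of group a
    u1 = r1 - len(a) * (len(a) + 1) / 2
    u2 = len(a) * len(b) - u1
    return min(u1, u2)

print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # 0.0: the groups do not overlap at all
```

Because only ranks are used, an extreme value changes the result no more than any other value in the same position would.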
Bivariate correlation
Focuses on the relationship between 2 variables; for continuous variables it assumes a straight best-fit line, so it is not applicable to curvilinear or discontinuous relationships
Chi-square test of independence
Tests whether there is a significant association between 2 categorical variables (i.e. categories with no numerical meaning)
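A minimal sketch of the Pearson chi-square statistic for a contingency table: each observed count is compared with the count expected if the two variables were independent (row total × column total / grand total). The function name and the counts are hypothetical. No continuity correction is applied; SciPy's `scipy.stats.chi2_contingency` applies Yates' correction to 2x2 tables by default and also returns a p-value.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Hypothetical 2x2 table: rows = group A / group B, columns = yes / no
print(chi_square_independence([[10, 20], [20, 10]]))
```

A table whose rows are in the same proportions (e.g. [[10, 20], [20, 40]]) gives a statistic of 0, i.e. no association.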
Linear regression
An estimation of the association between a continuous DV and one or more IVs
What are the measures of linear association?
- Bivariate correlation
- Chi-square test
- Regression
Assumptions of linear regression
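- Linear relationship between the IV(s) and the DV
- Independence of observations
- Homoscedasticity - variance of the residuals is similar across all values of the IV(s)
- Normality of the residuals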
What are the types of regression?
- Simple linear
- Multiple linear
- Nonlinear
Simple linear
Models the relationship between 2 variables only (1 IV, 1 DV): y = a + bx + e, where a = intercept, b = slope, e = error (residual)
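A minimal sketch of fitting a straight line y = a + bx by ordinary least squares, where the slope is the covariance of x and y divided by the variance of x. The function name and data are hypothetical; Python 3.10+ also offers `statistics.linear_regression`, which returns the slope and intercept.

```python
from statistics import mean

def fit_simple_linear(xs, ys):
    """Least-squares intercept a and slope b for y = a + bx."""
    mx, my = mean(xs), mean(ys)
    # Slope: how much y changes on average for a 1-unit increase in x
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx  # the best-fit line always passes through (mean x, mean y)
    return a, b

a, b = fit_simple_linear([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(a, b)  # intercept 1.0, slope 2.0
```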
Multiple linear regression
Multiple IVs predicting one continuous DV; fits a best-fit line (plane): y = a + b1x1 + b2x2 + ... + e
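A hand-rolled sketch of multiple linear regression for illustration only: it solves the least-squares normal equations (XᵀX)β = Xᵀy by Gaussian elimination. The function name and data are hypothetical; in practice a library routine such as NumPy's `numpy.linalg.lstsq` is the more robust choice.

```python
def fit_multiple_linear(rows, ys):
    """Least-squares coefficients [a, b1, b2, ...] for y = a + b1*x1 + b2*x2 + ...

    rows: one list of IV values per observation, e.g. [[x1, x2], ...].
    """
    X = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept a
    n, p = len(X), len(X[0])
    # Augmented matrix [X^T X | X^T y] of the normal equations
    A = [
        [sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
        + [sum(X[k][i] * ys[k] for k in range(n))]
        for i in range(p)
    ]
    # Forward elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (A[i][p] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return beta

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
ys = [1, 3, 4, 6, 8]
print(fit_multiple_linear(rows, ys))  # approximately [1, 2, 3]
```

Each coefficient b is the expected change in y for a 1-unit increase in that IV while the other IVs are held constant.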
Logistic regression
An estimation of the association between a binary DV (yes/no) and one or more IVs
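A minimal sketch of how logistic regression turns a linear predictor a + bx into a probability between 0 and 1 via the logistic (sigmoid) function. The coefficients and the "hours studied vs pass/fail" framing are hypothetical; estimating a and b from data (maximum likelihood) is not shown and would normally be done with a statistics library.

```python
from math import exp

def predicted_probability(a, b, x):
    """P(DV = 1 | x) under a logistic model: 1 / (1 + e^-(a + bx))."""
    return 1 / (1 + exp(-(a + b * x)))

# Hypothetical fitted coefficients, e.g. hours studied predicting pass (1) vs fail (0)
a, b = -4.0, 0.8
for hours in (0, 5, 10):
    print(hours, round(predicted_probability(a, b, hours), 3))
```

The probability is exactly 0.5 where a + bx = 0 (here at 5 hours) and approaches 0 or 1 at the extremes, which is why a binary DV suits this model better than a straight line.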