First year Statistics Flashcards
To recap what was learnt about statistics in my first year.
Give the types of data.
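Discrete (categorical):
Nominal - lowest level of measurement, categorical, no order between categories
Ordinal - categorical but ranked e.g. Likert scales
Continuous:
Interval - equal intervals, ranked, no absolute 0 e.g. temperature in °C, time of day
Ratio - highest level of measurement, equal intervals, ranked, absolute 0 e.g. height, weight, percentages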
What are the assumptions for the different types of tests?
Sample size, data type, skewness and peakedness (kurtosis) of the distribution
Non-parametric: n < 30, all discrete (nominal and ordinal) data, or continuous data that is not normally distributed
Parametric: n > 30 and continuous data that is normally distributed
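The rule of thumb on this card can be written as a small decision function. This is a sketch only: `choose_test_family` is a hypothetical name, normality is passed in as a yes/no judgement rather than tested, and a sample of exactly 30 is treated as parametric (an assumption, since the card only gives <30 and >30).

```python
# Hypothetical helper encoding the card's rule of thumb; not a standard API.
LEVELS = {"nominal", "ordinal", "interval", "ratio"}


def choose_test_family(n: int, level: str, is_normal: bool) -> str:
    """Return 'parametric' or 'non-parametric'.

    n         -- sample size (n == 30 treated as parametric: an assumption)
    level     -- level of measurement: nominal, ordinal, interval or ratio
    is_normal -- whether the data look normally distributed, judged
                 separately (e.g. from skewness and kurtosis)
    """
    if level not in LEVELS:
        raise ValueError(f"unknown level of measurement: {level!r}")
    if level in ("nominal", "ordinal"):
        return "non-parametric"  # discrete data: always non-parametric
    if n < 30 or not is_normal:
        return "non-parametric"  # small or non-normal continuous sample
    return "parametric"


print(choose_test_family(50, "ratio", True))
print(choose_test_family(50, "ordinal", True))
print(choose_test_family(20, "interval", True))
```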
Name the inferential statistical tests for both types.
Non-parametric: Chi-Squared, Mann-Whitney, Kruskal-Wallis
Parametric: t-tests (one- or two-tailed), ANOVA
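To show how the two families differ, here is a hedged sketch that computes one test statistic from each: the pooled independent-samples t statistic (parametric, works on raw values and means) and the Mann-Whitney U statistic (non-parametric, works on ranks). The function names and the example data are hypothetical, and only the statistics are computed; p-values would come from tables or a library such as SciPy (`scipy.stats.ttest_ind`, `scipy.stats.mannwhitneyu`).

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Independent-samples (Student's) t statistic using a pooled variance."""
    na, nb = len(a), len(b)
    # Pool the two sample variances, weighting each by its degrees of freedom
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Mann-Whitney U for sample a, computed from the ranks of both samples."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[: len(a)])
    return rank_sum_a - len(a) * (len(a) + 1) / 2


# Made-up example data for two independent groups
group_a = [12, 15, 11, 18, 14]
group_b = [20, 17, 22, 19, 16]
print("t =", round(pooled_t(group_a, group_b), 3))
print("U =", mann_whitney_u(group_a, group_b))
```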
Name the types of relational statistics.
Measure correlation (not causation); the coefficient runs from -1 to +1 and the strength of the relationship is judged by its distance from 0.
Non-parametric: Spearman’s Rank
Parametric: Pearson’s r
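Both coefficients can be computed by hand. This is a minimal sketch that assumes no tied values for Spearman's Rank, so the classic 1 - 6Σd²/(n(n² - 1)) formula applies; the function names are illustrative. The example data are monotonic but curved, so Spearman's rho is exactly 1 while Pearson's r falls just short of 1.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r: co-variation of x and y scaled by their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation via 1 - 6*sum(d^2) / (n(n^2 - 1)).

    Assumes no tied values; with ties, rank using averages and apply
    Pearson's r to the ranks instead.
    """
    n = len(x)
    rank_x = {v: i + 1 for i, v in enumerate(sorted(x))}
    rank_y = {v: i + 1 for i, v in enumerate(sorted(y))}
    d2 = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic but not a straight line
print(round(pearson_r(x, y), 3))  # below 1: the relationship is not linear
print(spearman_rho(x, y))         # ranks agree perfectly
```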
What are the different significance levels and their confidence levels?
- 0.05 = 95%
- 0.01 = 99%
- 0.001 = 99.9%
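Each significance level corresponds to a critical value that a test statistic must exceed. As a sketch, assuming a two-tailed test on the standard normal (z) distribution, Python's standard-library `statistics.NormalDist` gives these critical values (about 1.96, 2.58 and 3.29); `critical_z` is a hypothetical helper name.

```python
from statistics import NormalDist


def critical_z(alpha):
    """Two-tailed critical z value for significance level alpha.

    The middle (1 - alpha) of the standard normal distribution lies
    between -z and +z; that share is the confidence level.
    """
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: {1 - alpha:.1%} confidence, z = {critical_z(alpha):.3f}")
```

A stricter significance level (smaller alpha) pushes the critical value further out, so stronger evidence is needed to reject the null hypothesis.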
What is Simple Linear Regression? What does it permit us to do?
Permits us to make numerical predictions of one variable by reference to another
Involves the comparison of an independent variable (changed, X) and a dependent variable (measured, Y)
Attempts to predict changes in Y based on changes observed in X, and measures the proportion of the changes in Y explained by X
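The least-squares line behind simple linear regression can be fitted with a few sums. A minimal sketch: `fit_line` is a hypothetical helper, and the revision-hours and exam-mark values are made-up illustration data, not real results.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx    # slope: change in Y per unit change in X
    a = my - b * mx  # intercept: the line passes through (mean of X, mean of Y)
    return a, b


# Made-up data: hours of revision (X) and exam mark (Y)
hours = [2, 4, 6, 8, 10]
marks = [52, 58, 61, 70, 74]
a, b = fit_line(hours, marks)
print(f"mark = {a:.1f} + {b:.2f} * hours")
print("predicted mark for 7 hours:", round(a + b * 7, 1))
```

For these numbers the fitted line is mark = 46.2 + 2.8 * hours, so 7 hours of revision predicts a mark of 65.8.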
What is variance?
Sum of squares/degrees of freedom
Explained = difference between the mean and the predicted values
Unexplained = difference between observed and predicted (a.k.a. RESIDUAL)
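Using these definitions, the two sums of squares and their variances (mean squares) can be computed directly. A sketch, assuming the predicted values are already known: the data are made up, and the predictions come from the least-squares line Y = 46.2 + 2.8X for those data. `variance_components` is a hypothetical name.

```python
def variance_components(observed, predicted, k=1):
    """Split the variation in Y into explained and unexplained parts.

    k is the number of predictors (1 in simple linear regression).
    Returns (ss_explained, ss_unexplained, ms_explained, ms_unexplained).
    """
    n = len(observed)
    mean_y = sum(observed) / n
    # Explained: how far each prediction sits from the mean of Y
    ss_exp = sum((p - mean_y) ** 2 for p in predicted)
    # Unexplained (residual): how far each observation sits from its prediction
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Variance (mean square) = sum of squares / degrees of freedom
    return ss_exp, ss_res, ss_exp / k, ss_res / (n - k - 1)


observed = [52, 58, 61, 70, 74]
predicted = [51.8, 57.4, 63.0, 68.6, 74.2]  # from Y = 46.2 + 2.8X
print(variance_components(observed, predicted))
```

Here the explained sum of squares is 313.6 and the unexplained 6.4; together they make up the total sum of squares around the mean, 320.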
What are the outputs of a simple linear regression model? Explain their significance.
R-square = proportion of the variance in Y explained by X, from 0 (none) to 1 (all); the correlation r gives the type and strength of the relationship, +1 is strong positive, -1 strong negative
F-ratio = explained / unexplained variance; the further above 1, the better the model at explaining changes in Y
P-value (significance of F) = if P < 0.05, reject the null hypothesis in favour of the alternative and conclude that the model explains a significant proportion of changes in Y
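These outputs follow from the two sums of squares. A minimal sketch with hypothetical helper names: `regression_outputs` turns the explained and unexplained sums of squares into R-square and the F-ratio (degrees of freedom 1 and n - 2 for simple regression), and `decide` applies the P < 0.05 rule to a p-value read from software output. The p-value itself is not computed here; SciPy's `scipy.stats.f.sf(F, 1, n - 2)` is one way to obtain it.

```python
def regression_outputs(ss_explained, ss_unexplained, n):
    """R-square and F-ratio for a simple linear regression with n points."""
    r_square = ss_explained / (ss_explained + ss_unexplained)
    ms_explained = ss_explained / 1            # df = 1 (one predictor)
    ms_unexplained = ss_unexplained / (n - 2)  # df = n - 2
    f_ratio = ms_explained / ms_unexplained
    return r_square, f_ratio


def decide(p_value, alpha=0.05):
    """Hypothetical helper: compare a p-value with the significance level."""
    if p_value < alpha:
        return "reject H0: model explains a significant proportion of Y"
    return "fail to reject H0"


# Made-up example: SS explained 313.6, SS unexplained 6.4, n = 5
r2, f = regression_outputs(313.6, 6.4, 5)
print(f"R-square = {r2:.2f}, F = {f:.1f}")
print(decide(0.001))
```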
Summarise the overall aim of simple linear regression.
To maximise explained variance, which is indicated by a large F-ratio. The higher the R-square, the more of the variance in Y is explained by X, i.e. a higher proportion of variance is explained by the model. This also means the data points lie closer to the regression line.