Statistics 1 Flashcards
Plotting data, trendlines, residuals
What colours should you plot data in?
Black and white (simple as possible)
What is the benefit of plotting data before analysis of results?
To spot powerful trends.
What is categorical/nominal data?
Variables existing in a small number of alternative states (the categories typically cannot be ordered) e.g. blood groups.
What is ordinal/ranked data?
Variables existing in a small number of alternative states that have a clear order e.g. 1st/2nd/3rd place in a race.
Distance between categories is undefined.
What is discrete data?
Variables that can take only a limited set of distinct values.
Whole numbers that cannot be divided into fractions.
Values convey order and size of difference.
e.g. number of students in a class.
What is continuous data?
Variables that can take any value within a range e.g. height.
What is derived data?
Variables created from other variables using an expression.
Converting original variables into more convenient form that is more easily understood e.g. ratios/proportions/percentages/indices.
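As a minimal sketch of derived data, the snippet below turns raw counts into percentages. The blood group counts are made-up illustrative numbers, not real data.

```python
# Hypothetical counts of people per blood group (illustrative only).
counts = {"A": 42, "B": 10, "AB": 4, "O": 44}

# Derived variable: each count expressed as a percentage of the total.
total = sum(counts.values())
percentages = {group: 100 * count / total for group, count in counts.items()}

for group, pct in percentages.items():
    print(f"{group}: {pct:.1f}%")
```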
How does the type of variable impact a write up for a practical?
Can influence how to plot data/restrict statistical analysis.
What is an R squared value?
Represents the proportion of variation explained by the trendline.
Helps to choose which trendline fits data.
R^2 = coefficient of determination.
What kind of R squared value is normally better?
A higher one (but not the only factor).
What is Occam’s Razor/The Parsimony principle?
When faced with competing hypotheses, select the one that makes the fewest assumptions.
What is a residual?
The vertical distance between an observed value and the fitted line (observed value - fitted value).
What is least squares regression?
Method of finding the line of best fit by minimising the sum of squared residuals.
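A minimal sketch of least squares regression for a straight line, using the standard closed-form slope and intercept. The function names fit_line and residuals and the sample data are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Return observed y minus fitted y for each point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Made-up example data (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.0]

slope, intercept = fit_line(xs, ys)
res = residuals(xs, ys, slope, intercept)
print(slope, intercept, res)
```

For a least squares line with an intercept, the residuals sum to zero (up to rounding), which is a quick sanity check on the fit.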
R^2 =
1 - (SSres/SStot)
What is SStot?
Total sum of squares.
Sum of ((each value of y - mean of y)^2).
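The formula R^2 = 1 - (SSres/SStot) can be sketched as below, taking SSres to be the sum of squared residuals (observed - fitted). The function name r_squared and the sample values are hypothetical, for illustration only.

```python
def r_squared(observed, fitted):
    """Return R^2 = 1 - SSres/SStot for observed and fitted y values."""
    mean_y = sum(observed) / len(observed)
    # SSres: sum of squared residuals (observed - fitted).
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    # SStot: sum of squared deviations of observed y from its mean.
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


# Made-up observed values and the values predicted by a fitted line.
observed = [2.0, 4.1, 5.9, 8.0]
fitted = [2.03, 4.01, 5.99, 7.97]

r2 = r_squared(observed, fitted)
print(round(r2, 4))
```

A trendline that passes exactly through every point has SSres = 0 and so R^2 = 1.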