Statistics 1 Flashcards

Plotting data, trendlines, residuals

1
Q

What colours should you plot data in?

A

Black and white (keep it as simple as possible).

2
Q

What is the benefit of plotting data before analysis of results?

A

To spot strong trends and patterns in the data before any formal analysis.

3
Q

What is categorical/nominal data?

A

Variables that exist in a small number of alternative states, which typically cannot be ordered, e.g. blood groups.

4
Q

What is ordinal/ranked data?

A

Variables that exist in a small number of alternative states with a clear order, e.g. 1st/2nd/3rd place in a race.

The distance between categories is undefined.

5
Q

What is discrete data?

A

Variables that can take only a limited (countable) set of values.
Usually whole numbers that cannot be divided into fractions.
Values convey both order and the size of the difference.
e.g. number of students in a class.

6
Q

What is continuous data?

A

Variables that can take any value within a range, e.g. height.

7
Q

What is derived data?

A

Variables created from other variables using an expression.
They convert the original variables into a more convenient form that is easier to understand, e.g. ratios/proportions/percentages/indices.
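A minimal sketch of deriving variables from raw counts, using hypothetical seed-germination numbers (illustrative only, not from these notes):

```python
# Hypothetical raw counts (illustrative only, not real data)
germinated = 18
sown = 24

proportion = germinated / sown                # fraction that germinated
percentage = 100 * proportion                 # same information as a percentage
ratio = germinated / (sown - germinated)      # germinated : not germinated

print(f"proportion = {proportion:.2f}")
print(f"percentage = {percentage:.1f}%")
print(f"ratio = {ratio:.1f} : 1")
```

All three derived values carry the same information as the two raw counts, but each is easier to compare across groups of different sizes.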

8
Q

How does the type of variable impact a write up for a practical?

A

It can determine how the data should be plotted and restrict which statistical analyses are appropriate.

9
Q

What is an R squared value?

A

Represents the proportion of the variation in y that is explained by the trendline.
Helps to choose which trendline best fits the data.

R² = coefficient of determination.

10
Q

What kind of R squared value is normally better?

A

A higher one (but R² is not the only factor to consider).

11
Q

What is Occam’s Razor/The Parsimony principle?

A

When faced with competing hypotheses, select the one that makes the fewest assumptions.
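A higher R² is not the only factor when choosing a trendline. A minimal pure-Python sketch on hypothetical data (the numbers and function names are illustrative assumptions): a polynomial passing through every point always reaches R² = 1, but it needs one parameter per data point, whereas a straight line explains almost as much of the variation with only two.

```python
# Hypothetical, roughly linear data (illustrative only, not real measurements)
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]


def mean(values):
    return sum(values) / len(values)


def r_squared(observed, predicted):
    """R^2 = 1 - SSres / SStot."""
    y_bar = mean(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - y_bar) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


def straight_line(xs, ys):
    """Least-squares straight line (2 parameters: intercept and slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def curve_through_every_point(xs, ys):
    """Lagrange interpolating polynomial: one parameter per data point."""
    def f(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return f


line = straight_line(xs, ys)
curve = curve_through_every_point(xs, ys)

r2_line = r_squared(ys, [line(x) for x in xs])
r2_curve = r_squared(ys, [curve(x) for x in xs])

print(f"straight line (2 parameters):       R^2 = {r2_line:.4f}")
print(f"degree-5 polynomial (6 parameters): R^2 = {r2_curve:.4f}")
```

The line reaches R² ≈ 0.998 and the polynomial reaches 1, yet the polynomial makes far more assumptions about the shape of the relationship, so parsimony favours the straight line.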

12
Q

What is a residual?

A

The vertical distance between an observed value and the value predicted by the fitted line (observed y - predicted y).
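A minimal sketch, assuming hypothetical observations and a hypothetical fitted line y = 1 + 2x; each residual is the observed value minus the predicted value:

```python
# Hypothetical observations (illustrative only)
xs = [1, 2, 3, 4]
ys = [3.2, 4.8, 7.1, 9.0]


def fitted(x):
    # Hypothetical fitted line y = 1 + 2x
    return 1 + 2 * x


# residual = observed - predicted (positive: point lies above the line)
residuals = [y - fitted(x) for x, y in zip(xs, ys)]
print([round(r, 2) for r in residuals])
```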

13
Q

What is least squares regression?

A

A method of finding the line of best fit by minimising the sum of squared residuals.
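A minimal pure-Python sketch of simple (one-predictor) least squares, using the standard closed-form solution slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept = ȳ - slope·x̄. The data and function names are hypothetical. Nudging the slope away from the least-squares value increases the sum of squared residuals:

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) that minimise the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def sum_sq_residuals(xs, ys, slope, intercept):
    """Sum of (observed y - predicted y)^2 for a given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data (illustrative only)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(xs, ys)
best = sum_sq_residuals(xs, ys, slope, intercept)
nudged = sum_sq_residuals(xs, ys, slope + 0.1, intercept)

print(f"y = {intercept:.2f} + {slope:.2f}x")
print(f"sum of squared residuals: best = {best:.2f}, nudged slope = {nudged:.2f}")
```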

14
Q

R^2 =

A

1 - (SSres/SStot)
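The formula can be checked by hand on a small hypothetical data set: observations y = 2, 4, 5, 4, 5 at x = 1 to 5, with the least-squares line y = 2.2 + 0.6x for those points (an assumed example, not from these notes):

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]                       # hypothetical observed values
predicted = [2.2 + 0.6 * x for x in xs]    # least-squares line for these points

y_bar = sum(ys) / len(ys)
ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
ss_tot = sum((y - y_bar) ** 2 for y in ys)
r2 = 1 - ss_res / ss_tot

print(f"SSres = {ss_res:.2f}, SStot = {ss_tot:.2f}, R^2 = {r2:.2f}")
```

Here SSres = 2.4 and SStot = 6, so R² = 1 - 2.4/6 = 0.6: the line explains 60% of the variation in y.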

15
Q

What is SStot?

A

Total sum of squares.

Sum of ((each value of y - mean of y)^2)
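A minimal worked sketch with hypothetical observed values:

```python
ys = [2, 4, 5, 4, 5]          # hypothetical observed values
y_bar = sum(ys) / len(ys)     # mean of y = 4.0

# square each deviation from the mean, then add them up
ss_tot = sum((y - y_bar) ** 2 for y in ys)
print(ss_tot)  # 6.0
```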

16
Q

What is SSres?

A

Sum of squared residuals.

Sum of ((each value of y - value predicted by the regression model for that data point)^2)
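A minimal worked sketch, assuming hypothetical observations and a hypothetical fitted line y = 2.2 + 0.6x:

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]                        # hypothetical observed values
predicted = [2.2 + 0.6 * x for x in xs]     # hypothetical fitted line

# residual = observed - predicted; square each one, then add them up
residuals = [y - p for y, p in zip(ys, predicted)]
ss_res = sum(r ** 2 for r in residuals)
print(round(ss_res, 2))  # 2.4
```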

17
Q

What does it mean to extrapolate a trendline?

A

To extend a trendline (and the relationship it describes) beyond the range of the observed data.

18
Q

Why shouldn’t you extrapolate a trendline beyond the range of the data?

A

The trendline is based only on patterns observed within the range of the data, and the relationship may not hold outside that range, so predictions made by extrapolation are generally unreliable.
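A minimal sketch of why this matters, assuming (purely for illustration) that the true relationship is curved, y = x², but data were only collected between x = 1 and x = 3. A straight line fitted by least squares looks reasonable inside that range and fails badly far outside it:

```python
def true_relationship(x):
    # Assumption for illustration: the real process is curved (y = x^2)
    return x ** 2


xs = [1.0, 1.5, 2.0, 2.5, 3.0]            # data only cover x = 1 to 3
ys = [true_relationship(x) for x in xs]

# least-squares straight line fitted to the observed range only
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
intercept = y_bar - slope * x_bar

for x in [2.0, 10.0]:   # inside vs far outside the data range
    print(f"x = {x}: trendline predicts {intercept + slope * x}, true value {true_relationship(x)}")
```

Inside the data range the line is off by 0.5 (4.5 vs 4.0); at x = 10 it predicts 36.5 against a true value of 100.0.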