Descriptive Statistics Flashcards

1
Q

Name 3 types of data
(CDC)

A

Categorical

Discrete

Continuous

2
Q

Categorical
Binary or Nominal

A

Has two or more categories with no ordering to them.

E.g. Hair colour, Job title

3
Q

Continuous
Ratio or Interval variables

A

Can take any value within a range, including fractional values
E.g. Reaction times

3
Q

Discrete
Ordinal, Ratio, or Interval variables

A

Takes only separate, fixed values (nothing in between) with a logical order
E.g. Shoe size, Score out of 10

4
Q

Median Cons?

A

Ignores a lot of the data

Difficult to calculate without a computer

Can’t use this with NOMINAL data

4
Q

Median Pros?

A

Insensitive to outliers

Often gives a real, meaningful data value

Useful for ordinal data, and skewed interval/ratio data

4
Q

What are the 3 measures of central tendency?

A

Mean- sum of data points ÷ number of data points
Median- middle score in the sorted data set
Mode- most frequent score in a data set
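
As a quick illustration of the three measures, here is a minimal sketch using Python's standard `statistics` module. The scores in `data` are made up for the example and are not from the original deck.

```python
from statistics import mean, median, mode

data = [2, 3, 3, 5, 7]  # hypothetical scores, chosen only for illustration

data_mean = mean(data)      # sum of the values divided by how many there are
data_median = median(data)  # middle value once the data are sorted
data_mode = mode(data)      # most frequently occurring value

print(data_mean, data_median, data_mode)
```

The same `mode` call also works on categorical values such as strings, which is why the mode is the usual choice for nominal data.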

5
Q

Median:
The middle value in a sorted dataset, or the mean of the middle two values, can be found as:

A

Odd number of values: the median is the value at position (n+1)÷2 in the sorted data

Even number of values: add the middle two values (positions n÷2 and n÷2+1), then ÷2
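
The odd and even cases can be sketched in Python as follows. The helper name `median_by_position` and the sample lists are assumptions made for this example, not part of the original deck.

```python
def median_by_position(values):
    """Return the median using the position rules for odd and even n."""
    ordered = sorted(values)  # the data must be sorted before picking the middle
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    if n % 2 == 1:
        # Odd n: the median sits at 1-based position (n + 1) / 2
        position = (n + 1) // 2
        return ordered[position - 1]  # convert to a 0-based index
    # Even n: mean of the two middle values (1-based positions n/2 and n/2 + 1)
    lower_middle = ordered[n // 2 - 1]
    upper_middle = ordered[n // 2]
    return (lower_middle + upper_middle) / 2


odd_result = median_by_position([7, 3, 9, 1, 5])  # sorted: 1 3 5 7 9
even_result = median_by_position([4, 8, 6, 2])    # sorted: 2 4 6 8
print(odd_result, even_result)
```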

5
Q

What is the equation for calculating the mean?

A

Sum of individual data points
÷
sample size

6
Q

Mean Pros?

A

Uses all of the data

Is most effective for normally distributed datasets

7
Q

Mean Cons?

A

Sensitive to outliers

Values are not always meaningful (we can't get a score of 6.74 out of 10!).

Only meaningful for RATIO and INTERVAL data
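
To see the outlier sensitivity in practice, the sketch below compares the mean and median on two made-up datasets that differ in a single extreme score. The data are hypothetical and exist only to show the effect.

```python
from statistics import mean, median

typical = [4, 5, 5, 6, 7]
with_outlier = [4, 5, 5, 6, 100]  # hypothetical: one score replaced by an extreme value

mean_typical = mean(typical)            # 27 / 5
mean_outlier = mean(with_outlier)       # 120 / 5, dragged upward by the outlier
median_typical = median(typical)        # middle value of the sorted data
median_outlier = median(with_outlier)   # unchanged: the middle value is still 5

print(mean_typical, mean_outlier, median_typical, median_outlier)
```

One extreme score moves the mean a long way, while the median stays where it was.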

7
Q

Measures of spread:
Mode

A

no measures of spread

8
Q

Measures of spread:
Median

A

‘distance-based’ measures such as range and interquartile range

9
Q

Measures of spread:
Mean

A

‘centre-based’ measures of spread such as variance and standard deviation

10
Q

Interquartile range (IQR):

Pros and cons are identical to the median

A

Like the range (highest score - lowest score),
but ignores the most extreme values

Lower quartile= median of lower half of the data

Upper quartile= median of upper half of the data

Interquartile range = Upper quartile-Lower quartile
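
A minimal Python sketch of the median-of-halves method described above. The function name `iqr_median_of_halves` and the sample data are assumptions made for illustration. The sketch also assumes that, for an odd number of values, the middle value is left out of both halves. Other quartile conventions exist and can give slightly different answers.

```python
from statistics import median


def iqr_median_of_halves(values):
    """Return (lower quartile, upper quartile, IQR) via medians of each half."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values to form two halves")
    half = n // 2
    lower_half = ordered[:half]
    # Assumption: with an odd n, the middle value belongs to neither half
    upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    lower_q = median(lower_half)
    upper_q = median(upper_half)
    return lower_q, upper_q, upper_q - lower_q


# Hypothetical data with one extreme value (30)
lower_q, upper_q, iqr = iqr_median_of_halves([1, 3, 4, 5, 5, 6, 7, 8, 9, 30])
print(lower_q, upper_q, iqr)
```

The full range of this sample is 30 - 1 = 29, yet the IQR is much smaller because the extreme value never enters the calculation.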

10
Q

IQR example:

A

Lower quartile = 14th/15th value
= 5 (five times a day)

Upper quartile = 43rd/44th value
= 7 (seven times a day)

IQR = 7 - 5 = 2

10
Q

Variance Pros?

A

Uses all of the data

Forms the basis of several other tests

10
Q

Deviance-

A

take each score and subtract the mean from it (score - mean)

10
Q

Variance Cons?

A

Most informative when the data are roughly normally distributed

Sensitive to outliers

Units are not sensible (can we explain variance as scores²?)

10
Q

Sum of squared errors-

A

total the squared errors

10
Q

Squared errors-

A

take each deviance score and square it

10
Q

Variance-

A

average of the squared errors (sum of squared errors ÷ number of scores, or ÷ (n-1) when estimating from a sample)
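
The chain from deviance to variance can be sketched step by step in Python. The scores are made up for the example, and deviance is taken here as score minus mean (the sign disappears once the values are squared). Both the ÷ n and ÷ (n-1) versions are shown, since the deck does not say which one is intended.

```python
scores = [2, 4, 4, 6, 9]  # hypothetical scores, for illustration only
n = len(scores)

mean_score = sum(scores) / n                      # 25 / 5
deviances = [x - mean_score for x in scores]      # each score minus the mean
squared_errors = [d ** 2 for d in deviances]      # square each deviance
sum_of_squares = sum(squared_errors)              # total the squared errors

population_variance = sum_of_squares / n          # average over all n scores
sample_variance = sum_of_squares / (n - 1)        # sample estimate, divides by n - 1

print(deviances, squared_errors, sum_of_squares)
print(population_variance, sample_variance)
```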

10
Q

What is the measure of spread that is expressed in the same units as the dependent variable?

A

Standard Deviation (SD)

11
Q

How is SD calculated?

A

Calculated as the square root of the variance.
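
As a brief sketch, the square-root step looks like this in Python. The scores are hypothetical, and the example assumes the sample (n-1) version of the variance, which is what `statistics.variance` computes.

```python
import math
import statistics

scores = [2, 4, 4, 6, 9]  # hypothetical scores, for illustration only

sample_variance = statistics.variance(scores)  # sample variance, divides by n - 1
sample_sd = math.sqrt(sample_variance)         # SD is the square root of the variance

print(sample_variance, sample_sd)
```

Taking the square root returns the measure to the original units (scores rather than scores²), which is why the SD is easier to interpret than the variance.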

11
Q

What does ordinal data mostly use?

A

Median and IQR

11
Q

What does categorical data mostly use?

A

Mode

12
lmao you should literally know all of this from A level!
FRFR