Descriptive Statistics Flashcards

1
Q

Name 3 types of data
(CDC)

A

Categorical

Discrete

Continuous

2
Q

Categorical
Binary or Nominal

A

Has two or more categories with no ordering to them.

E.g. Hair colour, Job title

3
Q

Continuous
Ratio or Interval variables

A

Can take any fractional value
E.g. Reaction times

3
Q

Discrete
Ordinal, Ratio, or Interval variables

A

Takes only fixed, separate values that have a logical order
E.g. Shoe size, Score out of 10

4
Q

Median Cons?

A

Ignores a lot of the data

Tedious to calculate by hand for large datasets (the data must be sorted first)

Can’t use this with NOMINAL data

4
Q

Median Pros?

A

Insensitive to outliers

Often gives a real, meaningful data value

Useful for ordinal data, and skewed interval/ratio data

4
Q

What are the 3 measures of central tendency?

A

Mean- sum of the data points ÷ number of data points
Median- middle score in the sorted data set
Mode- most frequent value in a data set

5
Q

Median:
The middle value in a sorted dataset, or the mean of the middle two values. It can be found as:

A

Odd number of values: the median is the value at position (n+1)÷2 in the sorted data

Even number of values: add the middle two sorted values, then ÷2
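A minimal Python sketch of the two cases above. The function name `median` and the example numbers are made up for illustration; the data are sorted first because the positions only make sense on ordered values.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)   # positions refer to the sorted data
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2
        return ordered[middle]
    # Even n: mean of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_result = median([7, 3, 9, 1, 5])   # sorted: 1 3 5 7 9 -> 5
even_result = median([7, 3, 9, 1])     # sorted: 1 3 7 9 -> (3 + 7) / 2 = 5.0
```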

5
Q

What is the equation for calculating the mean?

A

Sum of individual data points
÷
sample size
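The same equation as a short Python sketch. The name `mean` and the scores are hypothetical, chosen only to show the arithmetic.

```python
def mean(values):
    """Sum of the individual data points divided by the sample size."""
    return sum(values) / len(values)


scores = [4, 8, 6, 5, 7]     # made-up scores out of 10
mean_score = mean(scores)    # 30 / 5 = 6.0
```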

6
Q

Mean Pros?

A

Uses all of the data

Is most effective for normally distributed datasets

7
Q

Mean Cons?

A

Sensitive to outliers

Values are not always meaningful (we can’t get a score of 6.74 out of 10!).

Only meaningful for RATIO and INTERVAL data
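To see what "sensitive to outliers" looks like in practice, here is a small sketch using Python's standard `statistics` module. The reaction times are invented for illustration: one extreme value drags the mean a long way while the median barely moves.

```python
import statistics

reaction_times = [310, 320, 330, 340, 350]   # hypothetical times in ms
with_outlier = reaction_times + [2000]       # one very slow trial added

mean_before = statistics.mean(reaction_times)      # 330
median_before = statistics.median(reaction_times)  # 330

mean_after = statistics.mean(with_outlier)         # 3650 / 6, about 608
median_after = statistics.median(with_outlier)     # (330 + 340) / 2 = 335.0
```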

7
Q

Measures of spread:
Mode

A

None: the mode has no associated measure of spread

8
Q

Measures of spread:
Median

A

‘distance-based’ measures such as range and interquartile range

9
Q

Measures of spread:
Mean

A

‘centre-based’ measures of spread such as variance and standard deviation

10
Q

Interquartile range (IQR):

Pros and cons are identical to the median

A

Like the range (highest score - lowest score), but ignores the most extreme values by covering only the middle 50% of the data

Lower quartile = median of lower half of the data

Upper quartile = median of upper half of the data

Interquartile range = Upper quartile - Lower quartile
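A sketch of the "median of each half" method in Python. The function name `iqr` and the data are hypothetical. One assumption to flag: when the number of values is odd, this version leaves the overall median out of both halves, which is one common convention; other textbooks and software use slightly different quartile rules.

```python
import statistics


def iqr(values):
    """Return (lower quartile, upper quartile, IQR) via medians of halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]           # values below the median position
    upper_half = ordered[half + n % 2:]   # skips the median itself if n is odd
    lower_q = statistics.median(lower_half)
    upper_q = statistics.median(upper_half)
    return lower_q, upper_q, upper_q - lower_q


data = [1, 3, 4, 6, 8, 9, 12, 15]          # made-up values
lower_q, upper_q, spread = iqr(data)       # 3.5, 10.5, 7.0
```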

10
Q

IQR example:

A

Lower quartile =14th/15th value
=5 (Five times a day)

Upper quartile = 43rd/44th value
= 7 (Seven times a day)

IQR = 7-5 = 2

10
Q

Variance Pros?

A

Uses all of the data

Forms the basis of several other tests

10
Q

Deviance-

A

take each score and subtract the mean from it (score - mean)

10
Q

Variance Cons?

A

Most meaningful when the data are roughly normally distributed

Sensitive to outliers

Units are not sensible (can we explain variance as scores²?)

10
Q

Sum of squared errors-

A

total the squared errors

10
Q

Squared errors-

A

take each deviance score and square it

10
Q

Variance-

A

the average of the squared errors: sum of squared errors ÷ number of scores (÷ (n - 1) when estimating from a sample)
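The deviance, squared errors, sum of squared errors and variance cards chain together into one calculation. A minimal Python sketch with made-up scores follows; it assumes deviance means score minus mean, and it shows both the divide-by-n version and the divide-by-(n - 1) sample version, since the cards do not say which one is intended.

```python
scores = [2, 4, 4, 6, 9]                      # hypothetical data
n = len(scores)
mean = sum(scores) / n                        # 25 / 5 = 5.0

# Deviance: each score minus the mean
deviances = [x - mean for x in scores]        # [-3.0, -1.0, -1.0, 1.0, 4.0]

# Squared errors: each deviance squared
squared_errors = [d ** 2 for d in deviances]  # [9.0, 1.0, 1.0, 1.0, 16.0]

# Sum of squared errors: total them
sse = sum(squared_errors)                     # 28.0

# Variance: average squared error
population_variance = sse / n                 # 28 / 5 = 5.6
sample_variance = sse / (n - 1)               # 28 / 4 = 7.0
```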

10
Q

Which measure of spread is expressed in the same units as the dependent variable?

A

Standard Deviation (SD)

11
Q

How is SD calculated?

A

Calculated using the square root of variance.
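As a quick check, a Python sketch with invented scores: taking the square root of the variance gives the same number as the standard library's `statistics.pstdev`. The divide-by-n (population) variance is assumed here; the sample version would use `statistics.variance` and `statistics.stdev` instead.

```python
import math
import statistics

scores = [2, 4, 4, 6, 9]                    # hypothetical data
variance = statistics.pvariance(scores)     # population variance (divide by n)
sd = math.sqrt(variance)                    # back in the original units

library_sd = statistics.pstdev(scores)      # standard library equivalent
```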

11
Q

What does Ordinal data mostly use?

A

Median and IQR

11
Q

What does Categorical data mostly use?

A

Mode
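For categorical data the mode is just the most frequent category, so counting is all that is needed. A small Python sketch with made-up hair colours:

```python
from collections import Counter

hair_colours = ["brown", "black", "brown", "blonde", "brown", "black"]

counts = Counter(hair_colours)                        # frequency of each category
modal_colour, modal_count = counts.most_common(1)[0]  # most frequent category
```

Here `modal_colour` is "brown", which appears 3 times. A mean or median would be meaningless for these values because the categories have no order.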

12
Q

lmao you should literally know all of this from A level!

A

FRFR