descriptive statistics (w4) Flashcards

1
Q

what things can be used to describe data

A

histograms, central tendency, spread, shape, outliers, box plots

2
Q

what are central tendencies

A

mode, median, mean

3
Q

what are spreads of data

A

quantile/quartile/percentile
variance and standard deviation
z-score

4
Q

what is the shape in terms of describing data

A

skewness, kurtosis

5
Q

purpose of a histogram

A

to visualise how data are distributed

6
Q

what is the mode, what types of variables can it be used for

A

most frequently occurring value (tallest bar in the histogram), there can be more than one mode
all types of variables (nominal, ordinal, interval, ratio)

7
Q

what is the median, what types of variables can it be used for

A

the middle value of the ordered data, dividing it into 2 groups with the same number of data points
only ordinal, interval, ratio (ordered variables)

8
Q

what is the mean, what types of variables can it be used for

A

mean = ∑ coin value / number of coins
(sum of all values / total number of values)
only interval and ratio
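
The three central tendencies on the last few cards can be checked on a small data set. This is a minimal sketch using Python's standard `statistics` module; the coin values are made up for illustration and are not from the lecture.

```python
import statistics

# Hypothetical coin values, chosen only for illustration.
coins = [1, 2, 2, 5, 10, 10, 50]

modes = statistics.multimode(coins)   # every most-frequent value (there can be several)
median = statistics.median(coins)     # middle value of the sorted data
mean = sum(coins) / len(coins)        # sum of all values / number of values

print("modes:", modes)
print("median:", median)
print("mean:", round(mean, 2))
```

Here 2 and 10 both occur twice, so the data set has two modes, and the single large coin pulls the mean well above the median.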

9
Q

which central tendency (mean, median, mode) would an outlier affect most

A

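the mean, because it depends on the actual values (the median depends only on the order of the values and the mode only on how often they occur)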
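
A quick way to see this is to add one extreme value to a small sample and compare how far each measure moves. The numbers below are invented for illustration.

```python
import statistics

# Hypothetical sample, then the same sample with one extreme value added.
data = [2, 3, 3, 4, 5]
with_outlier = data + [100]

mean_shift = statistics.mean(with_outlier) - statistics.mean(data)
median_shift = statistics.median(with_outlier) - statistics.median(data)
mode_unchanged = statistics.mode(with_outlier) == statistics.mode(data)

print("mean moved by:", round(mean_shift, 2))
print("median moved by:", median_shift)
print("mode unchanged:", mode_unchanged)
```

In this sample the mean moves by far more than the median, and the mode does not move at all.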

10
Q

why do we need the spread of data

A

distributions can have same mean/median but one may be much more spread

11
Q

how to calculate spread

A

divide the ordered data into sections containing the same number of data points

12
Q

what are quantiles

A

cut-off points dividing the data into equal sections, for N sections they are called N-quantiles (N-1 values)

13
Q

what are quartiles

A

when there are 4 sections in total, they are called quartiles (1st-3rd), median is 2nd quartile

14
Q

what are percentiles

A

when there are 100 sections in total, they are called percentiles (1st-99th), median is 50th percentile
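
The quantile cards can be tried out with `statistics.quantiles`, which returns the N-1 cut-off points for N sections. This is only a sketch: the data are made up, and different tools interpolate quantiles slightly differently, so values may not match other software exactly.

```python
import statistics

# Hypothetical data: the integers 1 to 11.
data = list(range(1, 12))

quartiles = statistics.quantiles(data, n=4)      # 3 cut-off points
percentiles = statistics.quantiles(data, n=100)  # 99 cut-off points
median = statistics.median(data)

print("quartiles:", quartiles)
print("2nd quartile equals median:", quartiles[1] == median)
print("50th percentile equals median:", percentiles[49] == median)
```

The list index is one less than the quantile number, so `quartiles[1]` is the 2nd quartile and `percentiles[49]` is the 50th percentile.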

15
Q

what is the 2nd moment

A

how hard it is to spin the data around the mean
= ∑ [distance from mean to each data point]^2 / number of data points
= variance

16
Q

what is standard deviation

A

square root of variance, the standard distance from the mean
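
Variance and standard deviation as defined on these cards can be computed directly. The sketch below divides by the number of data points, as the cards do (the population form); many tools divide by one less than that for a sample. The data are invented for illustration.

```python
import math

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
# 2nd moment around the mean: average squared distance from the mean.
variance = sum((x - mean) ** 2 for x in data) / len(data)
sd = math.sqrt(variance)

print("mean:", mean)          # 5.0
print("variance:", variance)  # 4.0
print("SD:", sd)              # 2.0
```

`statistics.pvariance` and `statistics.pstdev` from the standard library use the same divide-by-N definition.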

17
Q

what does mean +- SD show

A

where the centre is and how spread data points are around it

18
Q

what is the z-score and what’s it for

A

given the SD, a distance from the mean can be described as a ratio with respect to the SD, which is the z-score: z = (value - mean) / SD
enables fair comparisons of deviations

19
Q

whose height deviates more from the mean:
female: 5.3 +- 0.3ft, male: 5.8 +- 0.4ft (mean +- SD)
mary is 5.6ft, dave is 6.1ft

A

mary: (5.6-5.3)/0.3 = 1
dave: (6.1-5.8)/0.4 = 0.75
mary’s height deviates more from the mean than dave’s
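
The same calculation as a short script, using the numbers from the card. The function name `z_score` is just a label chosen here.

```python
def z_score(value, mean, sd):
    """Distance from the mean, expressed in units of SD."""
    return (value - mean) / sd

mary = z_score(5.6, mean=5.3, sd=0.3)
dave = z_score(6.1, mean=5.8, sd=0.4)

# Rounded because floating-point subtraction is not exact.
print("mary:", round(mary, 2))
print("dave:", round(dave, 2))
print("mary deviates more:", mary > dave)
```

Both are 0.3ft above their group mean, but the z-score shows that this is a larger deviation for the group with the smaller SD.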

20
Q

what is skewness

A

measures degree of asymmetry

21
Q

what is 3rd moment, how do you make it dimensionless

A

= ∑ [distance from mean to each data point]^3 / number of data points
divide it by SD^3
ie: skewness = 3rd moment/SD^3
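
The formula can be sketched in a few lines. The helper names `central_moment` and `skewness` and both data sets are made up for illustration. Statistical packages may apply small-sample corrections that this plain version does not.

```python
def central_moment(data, k):
    """k-th moment around the mean: average of (distance from mean)^k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    """3rd moment divided by SD^3, which makes it dimensionless."""
    sd = central_moment(data, 2) ** 0.5
    return central_moment(data, 3) / sd ** 3

symmetric = [1, 2, 3, 4, 5]
long_right_tail = [1, 1, 2, 2, 3, 10]
long_left_tail = [-x for x in long_right_tail]

print("symmetric:", skewness(symmetric))
print("long right tail:", round(skewness(long_right_tail), 2))
print("long left tail:", round(skewness(long_left_tail), 2))
```

The symmetric sample gives 0, the sample with a long tail towards large values gives a positive number, and mirroring it flips the sign.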

22
Q

what does: 0 and high skewness mean

A

0 = data symmetrically distributed
high (large + or - value) = distribution highly asymmetrical

23
Q

what does + and - skewness mean

A

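+ = positively skewed (right-skewed): long tail on the right, bulk of the data on the left
- = negatively skewed (left-skewed): long tail on the left, bulk of the data on the right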

24
Q

what is kurtosis

A

how sharply peaked and heavy-tailed the distribution is

25
Q

what is the 4th moment, how do you make it dimensionless

A

= ∑ [distance from mean to each data point]^4 / number of data points
divide it by SD^4
kurtosis = 4th moment/SD^4

26
Q

what is excess kurtosis

A

kurtosis is always +ve and equals 3 for a normal distribution, so 3 is normally subtracted: excess kurtosis = kurtosis - 3
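
Kurtosis and excess kurtosis can be sketched the same way as the other moments. The helper names and the data are made up for illustration. The value 3 is the kurtosis of a normal distribution, which is why it is the usual amount to subtract.

```python
def central_moment(data, k):
    """k-th moment around the mean: average of (distance from mean)^k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def kurtosis(data):
    """4th moment divided by SD^4 (SD^4 is the variance squared)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

def excess_kurtosis(data):
    """Kurtosis relative to a normal distribution."""
    return kurtosis(data) - 3

data = [1, 2, 3, 4, 5]  # hypothetical flat, evenly spread sample
print("kurtosis:", round(kurtosis(data), 2))
print("excess kurtosis:", round(excess_kurtosis(data), 2))
```

For this flat sample the kurtosis is below 3, so the excess kurtosis is negative.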

27
Q

what are the 1st, 2nd, 3rd and 4th moment around the mean

A

1st - mean (taken around zero, around the mean itself it is always 0)
2nd - variance (how spread)
3rd - skewness (how skewed/distorted)
4th - kurtosis (how thin)

28
Q

what are outliers

A

extreme values relative to bulk of values in a data set

29
Q

what can outliers be due to

A

inaccuracies in data processing, problems with methodology (measures, instruments, participants not following instructions), an actual extreme value from an unusual participant

30
Q

how to detect outliers (2 ways)

A

based on z-score
based on IQR (interquartile range: the width between the 1st and 3rd quartiles)

31
Q

how to detect outlier based on z-score

A

outlier if z-score is more than 3 or less than -3
ie: distance from mean is more than 3x SD
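
A minimal sketch of the z-score rule, with made-up data and the SD computed by dividing by the number of data points. Note that a very small data set can never produce a z-score beyond 3, so the rule needs a reasonable number of points.

```python
import math

# Hypothetical data: 21 ordinary values plus one extreme value.
data = [9, 10, 11] * 7 + [50]

mean = sum(data) / len(data)
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))

# Flag values whose distance from the mean is more than 3x SD.
outliers = [x for x in data if abs((x - mean) / sd) > 3]

print("outliers:", outliers)
```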

32
Q

how to detect outlier based on IQR

A

outlier if value is greater than 1.5 IQR above the 3rd quartile or smaller than 1.5 IQR below the 1st quartile
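
A minimal sketch of the IQR rule, assuming the usual convention of cut-offs at Q1 - 1.5 IQR and Q3 + 1.5 IQR. The data are made up, and the quartiles come from `statistics.quantiles` with its default method, so other software may give slightly different cut-offs.

```python
import statistics

# Hypothetical data with one extreme value.
data = [2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 40]

q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                  # width between 1st and 3rd quartile
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < low_fence or x > high_fence]

print("Q1, Q3, IQR:", q1, q3, iqr)
print("fences:", low_fence, high_fence)
print("outliers:", outliers)
```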

33
Q

what is a box plot

A

plot summarising quartile-based statistics of a data set

34
Q

what does a box plot include

A

location of quartiles
range of data excluding outliers
outliers detected using the quartiles (IQR rule)
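
The numbers a box plot draws can be computed without plotting anything. This sketch assumes the common convention that whiskers extend to the most extreme values that are not outliers under the 1.5 IQR rule. The data and the `box_plot` dictionary layout are made up for illustration.

```python
import statistics

# Hypothetical data with one extreme value.
data = [2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 40]

q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
inside = [x for x in data if low_fence <= x <= high_fence]

box_plot = {
    "quartiles": (q1, q2, q3),                # the box and its middle line
    "whiskers": (min(inside), max(inside)),   # range of data excluding outliers
    "outliers": [x for x in data if x < low_fence or x > high_fence],
}

for name, value in box_plot.items():
    print(name, value)
```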