descriptive statistics (w4) Flashcards

1
Q

what things can be used to describe data

A

histograms, central tendency, spread, shape, outliers, box plots

2
Q

what are central tendencies

A

mode, median, mean

3
Q

what are the measures of spread of data

A

quantile/quartile/percentile
variance and standard deviation
z-score

4
Q

what is the shape in terms of describing data

A

skewness, kurtosis

5
Q

purpose of a histogram

A

to visualise how data are distributed

6
Q

what is the mode, what types of variables can it be used for

A

the most frequently occurring value (the tallest bar in a histogram); there can be multiple modes
all types of variables (nominal, ordinal, interval, ratio)

7
Q

what is the median, what types of variables can it be used for

A

the middle value of the ordered data, dividing it into 2 groups with the same number of values
only ordinal, interval, ratio (variables that can be ordered)

8
Q

what is the mean, what types of variables can it be used for

A

$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$
(sum of all values / total number of values, e.g. total coin value / number of coins)
only interval and ratio
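A minimal sketch of the three central tendencies using Python's standard-library `statistics` module; the coin values and colours are made up for illustration.

```python
import statistics

# Hypothetical coin values (pence): interval/ratio data, so all three apply.
coins = [1, 2, 2, 5, 10, 20, 2, 50]

mean = sum(coins) / len(coins)        # sum of all values / total number
median = statistics.median(coins)     # middle of the sorted data (average of the 2 middle values here)
modes = statistics.multimode(coins)   # most frequent value(s); may return several

# Nominal data (hypothetical): only the mode is meaningful.
colours = statistics.multimode(["red", "blue", "red", "green"])

print(mean, median, modes, colours)   # 11.5 3.5 [2] ['red']
```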

9
Q

which central tendency (mean, median, mode) would an outlier affect most

A

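the mean, because it is computed from the actual value of every data point; the median and mode depend only on order and frequency, so one outlier shifts them little or not at all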
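A small illustration, assuming a made-up set of scores: adding one extreme value moves the mean a long way, the median only slightly, and leaves the mode unchanged.

```python
import statistics

scores = [4, 5, 5, 6, 7]        # hypothetical data
with_outlier = scores + [60]    # same data plus one extreme value

# How far each central tendency moves when the outlier is added.
mean_shift = statistics.mean(with_outlier) - statistics.mean(scores)        # 5.4 -> 14.5
median_shift = statistics.median(with_outlier) - statistics.median(scores)  # 5 -> 5.5
mode_before = statistics.multimode(scores)
mode_after = statistics.multimode(with_outlier)

print(round(mean_shift, 2), median_shift, mode_before, mode_after)
```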

10
Q

why do we need the spread of data

A

distributions can have the same mean/median, but one may be much more spread out than the other
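A quick check of this point with two made-up data sets: the mean and median agree, but the (population) standard deviation shows how differently spread they are.

```python
import statistics

narrow = [9, 10, 10, 11]   # hypothetical data, tightly packed around 10
wide = [1, 10, 10, 19]     # same mean and median, far more spread out

for name, data in (("narrow", narrow), ("wide", wide)):
    # pstdev = population SD (divides by N), matching the variance card.
    print(name, statistics.mean(data), statistics.median(data),
          round(statistics.pstdev(data), 3))
```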

11
Q

how to calculate spread

A

divide the ordered data into sections each containing the same number of data points (quantiles)

12
Q

what are quantiles

A

cut-off points dividing the ordered data into sections of equal size; for N sections there are N-1 cut-off points, called N-quantiles
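A minimal sketch using Python's `statistics.quantiles` on made-up data. It returns the N-1 cut-off points for N sections; it interpolates with its default "exclusive" method, so other software using a different convention may give slightly different cut points.

```python
import statistics

data = list(range(1, 21))   # hypothetical data: 1, 2, ..., 20

quartiles = statistics.quantiles(data, n=4)      # 4 sections -> 3 cut-off points
percentiles = statistics.quantiles(data, n=100)  # 100 sections -> 99 cut-off points

print(quartiles)                                 # [5.25, 10.5, 15.75]
print(len(quartiles), len(percentiles))          # 3 99
print(quartiles[1] == statistics.median(data))   # True: 2nd quartile is the median
```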

13
Q

what are quartiles

A

when there are 4 sections in total, the 3 cut-off points are called quartiles (1st-3rd); the median is the 2nd quartile

14
Q

what are percentiles

A

when there are 100 sections in total, the 99 cut-off points are called percentiles (1st-99th); the median is the 50th percentile

15
Q

what is the 2nd moment

A

how hard it is to spin the data around the mean
= variance = $\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2$
(sum of the squared distances from the mean to each data point / number of data points)
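A minimal sketch of the 2nd moment about the mean and the standard deviation, assuming made-up data; `central_moment` is a hypothetical helper name. It divides by N as in the card (population variance); Python's `statistics.variance`, by contrast, is the sample version that divides by N-1.

```python
import math
import statistics

def central_moment(data, k):
    """k-th moment about the mean: average of (distance from mean) ** k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

data = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical data, mean = 5
variance = central_moment(data, 2)  # 2nd moment about the mean
sd = math.sqrt(variance)            # standard deviation

print(variance, sd)                          # 4.0 2.0
print(variance == statistics.pvariance(data))  # True: same as the library's population variance
```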

16
Q

what is standard deviation

A

the square root of the variance; a typical ("standard") distance of the data points from the mean

17
Q

what does mean ± SD show

A

where the centre is and how spread out the data points are around it

18
Q

what is the z-score and what’s it for

A

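given the SD, a data point's distance from the mean can be described as a ratio to the SD, z = (value - mean) / SD, which is the z-score
enables fair comparisons of deviations between distributions with different means and SDs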

19
Q

whose height deviates more from the mean of their group (mean ± SD):
female: 5.3 ± 0.3 ft, male: 5.8 ± 0.4 ft
Mary is 5.6 ft, Dave is 6.1 ft

A

Mary: (5.6 - 5.3) / 0.3 = 1
Dave: (6.1 - 5.8) / 0.4 = 0.75
Mary's height deviates more from her group's mean than Dave's does from his
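The same calculation as a small sketch; `z_score` is a hypothetical helper name, and the numbers are the ones from this card. The results are rounded because decimal feet are not represented exactly in binary floating point.

```python
def z_score(x, mean, sd):
    """How many SDs the value x lies from the mean (the sign shows the direction)."""
    return (x - mean) / sd

mary = z_score(5.6, mean=5.3, sd=0.3)   # female heights: 5.3 ± 0.3 ft
dave = z_score(6.1, mean=5.8, sd=0.4)   # male heights:   5.8 ± 0.4 ft

print(round(mary, 2), round(dave, 2))   # 1.0 0.75
print("Mary" if abs(mary) > abs(dave) else "Dave", "deviates more")  # Mary deviates more
```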

20
Q

what is skewness

A

measures the degree of asymmetry of a distribution

21
Q

what is the 3rd moment, and how do you make it dimensionless

A

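3rd moment = $\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^3$
(sum of the cubed distances from the mean to each data point / number of data points)
divide it by SD^3 to make it dimensionless
i.e. skewness = 3rd moment / SD^3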
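A minimal sketch of skewness as the 3rd moment about the mean divided by SD^3, on made-up data; `skewness` is a hypothetical helper. This is the plain moment form used in these cards; statistical software may also offer a small-sample-corrected version with slightly different values.

```python
import math

def skewness(data):
    """Population skewness: 3rd moment about the mean divided by SD ** 3."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # variance (2nd moment)
    m3 = sum((x - mean) ** 3 for x in data) / n   # 3rd moment
    return m3 / math.sqrt(m2) ** 3                # dividing by SD**3 removes the units

symmetric = [1, 2, 3, 4, 5]         # hypothetical data, mirror-image around 3
right_tail = [1, 1, 2, 2, 3, 10]    # hypothetical data with a long tail to the right

print(skewness(symmetric))          # 0.0
print(skewness(right_tail) > 0)     # True
```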

22
Q

what do 0 and high skewness mean

A

0 = data symmetrically distributed
high (large positive or negative) = distribution highly asymmetrical

23
Q

what do + and - skewness mean

A

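+ = data skewed right (long tail on the right; bulk of the data on the left)
- = data skewed left (long tail on the left; bulk of the data on the right)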

24
Q

what is kurtosis

A

the sharpness of the distribution's peak (more precisely, how heavy its tails are)

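25
Q

what is the 4th moment, and how do you make it dimensionless

A

4th moment = $\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^4$
(sum of the distances from the mean to each data point raised to the 4th power / number of data points)
divide it by SD^4 to make it dimensionless
i.e. kurtosis = 4th moment / SD^4
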
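26
Q

what is excess kurtosis

A

excess kurtosis = kurtosis - 3
kurtosis is always positive; subtracting 3 (the kurtosis of a normal distribution) makes a normal distribution's excess kurtosis 0
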
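A minimal sketch of kurtosis (4th moment about the mean / SD^4) and excess kurtosis on made-up data; `kurtosis` and `excess_kurtosis` are hypothetical helper names. Evenly spread data like this is flatter-tailed than a normal distribution, so its excess kurtosis comes out negative.

```python
import math

def kurtosis(data):
    """Population kurtosis: 4th moment about the mean divided by SD ** 4."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # variance (2nd moment)
    m4 = sum((x - mean) ** 4 for x in data) / n   # 4th moment
    return m4 / m2 ** 2                           # SD**4 equals variance**2

def excess_kurtosis(data):
    """Kurtosis minus 3, so a normal distribution scores about 0."""
    return kurtosis(data) - 3

data = [1, 2, 3, 4, 5]                          # hypothetical, evenly spread data
print(kurtosis(data), excess_kurtosis(data))    # 1.7 -1.3
```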
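27
Q

what are the 1st, 2nd, 3rd and 4th moments around the mean

A

1st - mean (where the centre is; strictly the 1st raw moment, since the 1st moment about the mean is always 0)
2nd - variance (how spread out)
3rd - skewness (how skewed/distorted), once divided by SD^3
4th - kurtosis (how thin/sharp the peak is), once divided by SD^4
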
28
Q

what are outliers

A

extreme values relative to the bulk of values in a data set

29
Q

what can outliers be due to

A

inaccuracies in data processing
problems with methodology (measures, instruments, participants not following instructions)
an actual extreme value from an unusual participant

30
Q

how to detect outliers (2 ways)

A

based on z-score
based on IQR (interquartile range: the width between the 1st and 3rd quartiles)

31
Q

how to detect outliers based on z-score

A

outlier if the z-score is more than 3 or less than -3, i.e. the distance from the mean is more than 3 × SD

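A minimal sketch of the z-score rule on made-up data; `z_outliers` is a hypothetical helper and uses the population SD. Because an outlier itself inflates the mean and SD, this rule can miss outliers in small samples.

```python
import statistics

def z_outliers(data, threshold=3.0):
    """Return the values whose z-score is above +threshold or below -threshold."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)   # population SD, as in the variance card
    return [x for x in data if abs((x - mean) / sd) > threshold]

# Hypothetical scores: 15 ordinary values plus one extreme value.
scores = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 10, 9, 10, 11, 10, 50]
print(z_outliers(scores))   # [50]
```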
32
Q

how to detect outliers based on IQR

A

outlier if a value is more than 1.5 × IQR above the 3rd quartile or more than 1.5 × IQR below the 1st quartile

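A minimal sketch of the 1.5 × IQR rule on made-up data; `iqr_outliers` is a hypothetical helper. The quartiles come from `statistics.quantiles` with its default "exclusive" method; other quartile conventions can shift the fences slightly.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Return the values more than k * IQR above Q3 or below Q1."""
    q1, _, q3 = statistics.quantiles(data, n=4)   # the three quartiles
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr     # the "fences"
    return [x for x in data if x < lower or x > upper]

data = [2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 30]   # hypothetical data
print(iqr_outliers(data))                    # [30]
```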
33
Q

what is a box plot

A

a plot summarising the quartile-based statistics of a data set

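34
Q

what does a box plot include

A

location of the quartiles (the box spans the 1st to 3rd quartile, with a line at the median)
range of the data excluding outliers (the whiskers)
outliers detected by the quartile-based (IQR) rule, drawn as individual points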
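Plotting libraries draw box plots for you (for example `matplotlib.pyplot.boxplot`), but a small sketch of the numbers behind one may help. `box_plot_summary` is a hypothetical helper; it assumes the common 1.5 × IQR whisker convention and the default "exclusive" quartile method of `statistics.quantiles`.

```python
import statistics

def box_plot_summary(data, k=1.5):
    """Numbers a box plot draws: quartiles, whisker ends and outliers."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if lower <= x <= upper]
    return {
        "q1": q1, "median": median, "q3": q3,   # the box and its middle line
        "whisker_low": min(inside),             # range of the data excluding outliers
        "whisker_high": max(inside),
        "outliers": [x for x in data if x < lower or x > upper],  # drawn as points
    }

data = [2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 30]   # hypothetical data
print(box_plot_summary(data))
```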