descriptive statistics (w4) Flashcards by sarah pettigrew

what things can be used to describe data

histograms, central tendency, spread, shape, outliers, box plots

what are the measures of central tendency

mode, median, mean

what are the measures of spread of data

quantile/quartile/percentile
variance and standard deviation
z-score

what describes the shape of data

skewness, kurtosis

purpose of a histogram

to visualise how data are distributed

what is the mode, what types of variables can it be used for

the most frequently occurring value (the highest stack in a histogram); there can be more than one mode
can be used for all types of variables

what is the median, what types of variables can it be used for

the middle value, dividing the ordered data into 2 groups with the same number of data points
only ordinal, interval and ratio variables (variables that can be ordered)

what is the mean, what types of variables can it be used for

$\text{mean} = \dfrac{\sum \text{coin value}}{\text{number of coins}}$
(sum of all values / total number of values)
only interval and ratio variables

which central tendency (mean, median, mode) would an outlier affect most

the mean, since it is computed from the actual values
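
A minimal sketch of this effect, using Python's standard `statistics` module. The coin values and the helper name `summarise` are made up for illustration and are not from the original cards.

```python
import statistics

# Hypothetical coin values (illustrative only)
coins = [1, 2, 2, 5, 10]
with_outlier = coins + [200]  # same data plus one extreme value

def summarise(values):
    """Return (list of modes, median, mean) for a list of numbers."""
    return (
        statistics.multimode(values),  # all most-frequent values
        statistics.median(values),     # middle value of the sorted data
        statistics.mean(values),       # sum of values / number of values
    )

print(summarise(coins))
print(summarise(with_outlier))
```

With these numbers the mean moves from 4 to about 36.7 once the outlier is added, the median only moves from 2 to 3.5, and the mode stays at 2.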

why is a measure of spread needed

distributions can have the same mean/median but one may be much more spread out

how to calculate spread

divide the ordered data into sections containing the same number of data points

what are quantiles

cut-off points dividing the data into equal-sized sections; for N sections they are called N-quantiles (there are N-1 cut-off values)

what are quartiles

when there are 4 sections in total, they are called quartiles (1st-3rd), median is 2nd quartile

what are percentiles

when there are 100 sections in total, they are called percentiles (1st-99th), median is 50th percentile
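
A small sketch of quartiles and percentiles, assuming Python's `statistics.quantiles` with its default interpolation method. The scores are hypothetical, and other software may interpolate slightly differently, so cut points can vary a little between tools.

```python
import statistics

# Hypothetical sample of 11 scores (illustrative only)
scores = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 15]

quartiles = statistics.quantiles(scores, n=4)      # 4 sections -> 3 cut points
percentiles = statistics.quantiles(scores, n=100)  # 100 sections -> 99 cut points
median = statistics.median(scores)

print(len(quartiles), len(percentiles))
print(quartiles)
print(quartiles[1] == median)     # 2nd quartile is the median
print(percentiles[49] == median)  # 50th percentile is the median
```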

what is the 2nd moment

how hard to spin data around mean
$\text{2nd moment} = \dfrac{\sum (\text{distance from mean})^2}{\text{number of data points}} = \text{variance}$

what is standard deviation

square root of variance, the standard distance from the mean
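
A rough sketch of variance and standard deviation as defined on these cards (dividing by the number of data points). The two data sets and the helper name `second_moment` are hypothetical; they share a mean of 5 but differ in spread.

```python
import math
import statistics

# Two hypothetical data sets with the same mean but different spread
narrow = [4, 5, 5, 5, 6]
wide = [1, 3, 5, 7, 9]

def second_moment(values):
    """Mean of squared distances from the mean (population variance)."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

for data in (narrow, wide):
    var = second_moment(data)
    sd = math.sqrt(var)  # standard deviation = square root of variance
    print(statistics.mean(data), var, round(sd, 3))
```

Note that `statistics.pvariance` uses the same divide-by-N definition, whereas `statistics.variance` divides by N-1 (the sample variance), so the two give different numbers on small data sets.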

what does mean ± SD show

where the centre is and how spread data points are around it

what is the z-score and what’s it for

given the SD, a distance from the mean can be described as a ratio with respect to the SD; this ratio is the z-score: z = (value - mean) / SD
it enables fair comparisons of deviations

whose height deviates more from the mean:
female: 5.3 ± 0.3 ft, male: 5.8 ± 0.4 ft (mean ± SD)
Mary is 5.6 ft, Dave is 6.1 ft

Mary: (5.6 - 5.3)/0.3 = 1
Dave: (6.1 - 5.8)/0.4 = 0.75
Mary's height deviates more from the mean than Dave's

what is skewness

measures degree of asymmetry

what is the 3rd moment, how do you make it dimensionless

$\text{3rd moment} = \dfrac{\sum (\text{distance from mean})^3}{\text{number of data points}}$
to make it dimensionless, divide it by SD^3
i.e. $\text{skewness} = \dfrac{\text{3rd moment}}{\text{SD}^3}$

what does: 0 and high skewness mean

0 = data are symmetrically distributed
high = distribution is highly asymmetrical

what does + and - skewness mean

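+ : longer tail on the right, bulk of the data on the left (positively skewed, also called right-skewed)
- : longer tail on the left, bulk of the data on the right (negatively skewed, also called left-skewed)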

what is kurtosis

the sharpness of the peak of the distribution

what is the 4th moment, how do you make it dimensionless

$\text{4th moment} = \dfrac{\sum (\text{distance from mean})^4}{\text{number of data points}}$
to make it dimensionless, divide it by SD^4
i.e. $\text{kurtosis} = \dfrac{\text{4th moment}}{\text{SD}^4}$

what is excess kurtosis

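kurtosis is always positive, but 3 is normally subtracted from it: excess kurtosis = kurtosis - 3 (a normal distribution has kurtosis 3, so its excess kurtosis is 0)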
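
A short sketch of kurtosis and excess kurtosis, again assuming divide-by-N moments. The helper names and the two small data sets are hypothetical.

```python
def central_moment(values, k):
    """k-th moment around the mean: average of (distance from mean)**k."""
    m = sum(values) / len(values)
    return sum((x - m) ** k for x in values) / len(values)

def kurtosis(values):
    """4th moment divided by SD^4 (SD^4 is the variance squared)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

def excess_kurtosis(values):
    """Kurtosis minus 3, so a normal distribution would score 0."""
    return kurtosis(values) - 3

flat = [1, 2, 3, 4, 5]                   # evenly spread values
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 10]    # most values at the centre, two far away

print(round(kurtosis(flat), 3), round(excess_kurtosis(flat), 3))
print(round(kurtosis(peaked), 3), round(excess_kurtosis(peaked), 3))
```

With these numbers the flat data set has kurtosis 1.7 (excess -1.3) and the peaked one has kurtosis 4.5 (excess 1.5).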

what are the 1st, 2nd, 3rd and 4th moment around the mean

1st - mean
2nd - variance (how spread)
3rd - skewness (how skewed/distorted)
4th - kurtosis (how thin)

what are outliers

extreme values relative to bulk of values in a data set

what can outliers be due to

inaccuracies in data processing, problems with methodology (measures, instruments, participants not following instructions), an actual extreme value from an unusual participant

how to detect outliers (2 ways)

based on z-score
based on IQR (interquartile range: the width between the 1st and 3rd quartiles)

how to detect outlier based on z-score

outlier if the z-score is more than 3 or less than -3
i.e. the distance from the mean is more than 3 × SD
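
A rough sketch of the z-score rule, assuming the population SD (divide by N) via `statistics.pstdev`. The readings and the function name `z_outliers` are hypothetical.

```python
import statistics

# Hypothetical readings: 19 values near 10 and one extreme value
readings = [9, 10, 10, 11, 10, 9, 11, 10, 10, 9,
            11, 10, 10, 11, 9, 10, 10, 10, 10, 50]

def z_outliers(values, threshold=3.0):
    """Return values whose distance from the mean exceeds threshold * SD."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD, an assumption of this sketch
    return [x for x in values if abs((x - mean) / sd) > threshold]

print(z_outliers(readings))
```

Here only the value 50 is flagged. In very small data sets no point can reach a z-score of 3, because the extreme value itself inflates the SD.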

how to detect outlier based on IQR

outlier if the value is more than 1.5 × IQR above the 3rd quartile or more than 1.5 × IQR below the 1st quartile
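
A sketch of the IQR rule, taking the lower fence from the 1st quartile and the upper fence from the 3rd quartile. It assumes the default interpolation of `statistics.quantiles`; the data and the function name `iqr_fences` are hypothetical.

```python
import statistics

# Hypothetical sample with one extreme value
values = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 40]

def iqr_fences(data, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1  # width between the 1st and 3rd quartiles
    return q1 - k * iqr, q3 + k * iqr

low, high = iqr_fences(values)
outliers = [x for x in values if x < low or x > high]

print(low, high)
print(outliers)
```

For this sample the quartiles are 4 and 10, so the fences sit at -5 and 19 and only the value 40 is flagged. These fences are also the limits a box plot commonly uses for its whiskers.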

what is a box plot

a plot summarising the quartile-based statistics of a data set

what does a box plot include

location of the quartiles
range of the data excluding outliers
outliers detected using the quartiles

(34 cards)