Topic 3 (statistics) Flashcards

1
Q

measures of central tendency: advantages and disadvantages

A

mode:
pros: can be used with qualitative data; not affected by outliers, errors or omissions; always an observed data value
cons: doesn't use all the data; not representative if its frequency is low or if other values have similar frequencies

median:
pros: not affected by outliers, and not significantly affected by errors or omissions
cons: doesn't make use of all the data

mean:
pros: uses all values in the set; in a large set, an outlier has little impact
cons: in a small set, outliers have a big impact
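
A minimal sketch of the points above, using Python's standard `statistics` module. The data values are made up for illustration, and 95 is a hypothetical outlier added to a small set.

```python
from statistics import mean, median, mode

# Hypothetical small data set; 95 is an illustrative outlier.
small = [4, 5, 5, 6, 7]
small_with_outlier = small + [95]

def summary(values):
    """Return (mean, median, mode) for a list of numbers."""
    return mean(values), median(values), mode(values)

print(summary(small))               # mean 5.4, median 5, mode 5
print(summary(small_with_outlier))  # mean jumps to about 20.3; median only moves to 5.5; mode stays 5
```

In this small set the single outlier drags the mean well away from the bulk of the data, while the median and mode barely change.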

2
Q

measures of dispersion/spread: advantages and disadvantages

A

range:
pros: reflects the full data set
cons: distorted by outliers

IQR:
pros: not distorted by outliers
cons: doesn't reflect the full data set (half is disregarded)

standard deviation:
pros: in a large data set, a few outliers have a negligible impact
cons: in a small data set, outliers have a big impact
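
A rough sketch comparing the three measures on a made-up data set, with and without a hypothetical outlier (95). It assumes the population standard deviation (`pstdev`) and the default quartile rule of `statistics.quantiles`; a textbook or exam board may define quartiles slightly differently.

```python
from statistics import pstdev, quantiles

def spread(values):
    """Return (range, IQR, standard deviation) for a list of numbers."""
    # quantiles(..., n=4) returns the three quartiles Q1, Q2, Q3.
    q1, _q2, q3 = quantiles(values, n=4)
    return max(values) - min(values), q3 - q1, pstdev(values)

# Hypothetical data; 95 is an illustrative outlier.
data = [4, 5, 5, 6, 7, 8, 9]
data_with_outlier = data + [95]

print(spread(data))
print(spread(data_with_outlier))
```

With the outlier added, the range and standard deviation grow sharply, while the IQR changes only a little.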

3
Q

outliers

A

a value that lies significantly outside the rest of the values of a variable

due to:

  • errors in measuring/recording data
  • natural variation

what to do with them:

  • clean the data (remove the value) if it is incorrect
  • include the value if it is a genuine result of natural variation
4
Q

ways outliers are defined

A

anything bigger than Q3 + k(Q3 - Q1)
anything smaller than Q1 - k(Q3 - Q1)
k is a constant, typically 1.5

OR

anything more than a given number of standard deviations from the mean
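
Both definitions can be sketched as below. The function names and the data are hypothetical; `k=1.5` follows the card, while the default of 2 standard deviations is only an assumption, since the number is normally given in the question. Quartiles use the default rule of `statistics.quantiles`, which may differ slightly from a textbook method.

```python
from statistics import mean, pstdev, quantiles

def fence_outliers(values, k=1.5):
    """Values below Q1 - k(Q3 - Q1) or above Q3 + k(Q3 - Q1)."""
    q1, _q2, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper]

def sd_outliers(values, num_sd=2):
    """Values more than num_sd standard deviations from the mean.

    num_sd=2 is an assumed default, not a fixed rule.
    """
    m, s = mean(values), pstdev(values)
    return [x for x in values if abs(x - m) > num_sd * s]

# Hypothetical data with one extreme value.
data = [4, 5, 5, 6, 7, 8, 9, 95]
print(fence_outliers(data))  # [95]
print(sd_outliers(data))     # [95]
```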

5
Q

advantages of a stem and leaf diagram

A

the individual data values stay visible, so clusters and outliers are easy to spot
convenient for finding the median, mode and range
two data sets can be compared easily using a back-to-back diagram

6
Q

cumulative frequency graphs

A

cumulative frequency on the y axis
variable on the x axis

plot each cumulative frequency against the upper bound of its class
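
A small sketch of how the points to plot could be built from a grouped frequency table. The class boundaries and frequencies are invented for illustration.

```python
from itertools import accumulate

# Hypothetical grouped data: (lower bound, upper bound, frequency)
classes = [(0, 10, 3), (10, 20, 7), (20, 30, 12), (30, 40, 5)]

def cumulative_points(classes):
    """Return (upper bound, cumulative frequency) pairs to plot."""
    uppers = [upper for _lower, upper, _freq in classes]
    running = accumulate(freq for _lower, _upper, freq in classes)
    return list(zip(uppers, running))

points = cumulative_points(classes)
print(points)  # [(10, 3), (20, 10), (30, 22), (40, 27)]
```

The x value of each point is the upper bound, and the final cumulative frequency equals the total frequency.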

7
Q

Frequency density

A

Frequency divided by class width

8
Q

Histograms

A
Continuous data
No gaps between bars
Height = frequency density
Area of bars proportional to frequency
Bars drawn between the class bounds
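
A minimal sketch linking bar height and bar area, using made-up classes of unequal width. Here the area equals the frequency exactly, which is the simplest case of "proportional to frequency".

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency)
classes = [(0, 10, 5), (10, 15, 10), (15, 20, 8), (20, 40, 4)]

def frequency_density(lower, upper, freq):
    """Frequency divided by class width."""
    return freq / (upper - lower)

# Bar heights are the frequency densities.
heights = [frequency_density(lower, upper, freq) for lower, upper, freq in classes]

# Bar area = class width x height, which recovers the frequency.
areas = [(upper - lower) * h for (lower, upper, _freq), h in zip(classes, heights)]

print(heights)
print(areas)
```

The widest class (20 to 40) has the lowest bar even though its frequency is not much smaller than the others, which is why height alone should not be read as frequency.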
9
Q

no skew / symmetrical distribution

A

Q2 - Q1 = Q3 - Q2
mean = median = mode

use median and IQR instead when the data are skewed

10
Q

positive skew

A

most of the data is on the left, with a longer tail to the right
Q2 - Q1 < Q3 - Q2
mean > median > mode

11
Q

negative skew

A

most of the data is on the right, with a longer tail to the left
Q2 - Q1 > Q3 - Q2
mean < median < mode
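
The quartile comparisons for symmetrical, positively skewed and negatively skewed data can be sketched as a single check. The function name is hypothetical, and real data rarely gives exact equality, so "symmetrical" here means the two gaps are exactly equal.

```python
def quartile_skew(q1, q2, q3):
    """Classify skew by comparing Q2 - Q1 with Q3 - Q2."""
    lower_gap = q2 - q1
    upper_gap = q3 - q2
    if lower_gap < upper_gap:
        return "positive skew"
    if lower_gap > upper_gap:
        return "negative skew"
    return "symmetrical"

print(quartile_skew(10, 12, 20))  # positive skew
print(quartile_skew(10, 18, 20))  # negative skew
print(quartile_skew(10, 15, 20))  # symmetrical
```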

12
Q

3(mean - median) divided by standard deviation

A

positive for positive skew
negative for negative skew
0 for a symmetrical distribution
larger magnitude = stronger skew
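
A short sketch of the coefficient 3(mean - median) / standard deviation. The function name and data sets are made up, and the population standard deviation (`pstdev`) is an assumption; a course may use the sample version instead.

```python
from statistics import mean, median, pstdev

def skew_coefficient(values):
    """Return 3(mean - median) / standard deviation."""
    return 3 * (mean(values) - median(values)) / pstdev(values)

# Hypothetical data sets.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]  # long tail to the right
symmetric = [1, 2, 3, 4, 5]
left_tailed = [1, 7, 8, 8, 9, 9, 9, 10]   # long tail to the left

print(skew_coefficient(right_tailed))  # positive
print(skew_coefficient(symmetric))     # zero
print(skew_coefficient(left_tailed))   # negative
```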

13
Q

comparing data:

A

comment on measure of location (usually mean or median)
comment on measure of spread
make comparison in context

compare median and IQR (used when there are extreme values, since these measures are not affected by them, or when the data are skewed)
compare mean and standard deviation (used when the data are fairly symmetrical)
