Numerical Summaries Flashcards

1
Q

Histogram

A

Visualises quantitative data.
Highlights the frequency of data in one class interval compared with another.

2
Q

density scale

A

Height of each block = (proportion of data in the block) / (length of the class interval)
The area of the whole histogram on the density scale is 1 (or, as a percentage, 100%).
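
As a rough illustration of the height formula above, the sketch below bins some made-up ages and checks that the block areas add up to 1. The helper name `density_heights`, the data, and the class intervals are assumptions chosen for the example, not part of the original cards.

```python
from bisect import bisect_right

def density_heights(data, edges):
    """Height of each block = (proportion in block) / (interval length)."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        # bisect_right sends a value equal to an edge into the interval on its right
        i = bisect_right(edges, x) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    n = sum(counts)
    return [c / n / (hi - lo) for c, lo, hi in zip(counts, edges, edges[1:])]

ages = [3, 15, 18, 21, 24, 30, 41, 55]   # made-up data
edges = [0, 18, 25, 65]                  # intervals [0,18), [18,25), [25,65)

heights = density_heights(ages, edges)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
total_area = sum(h * w for h, w in zip(heights, widths))
print(heights, total_area)
```

Because each height is divided by its interval length, a narrow interval with the same share of the data gets a taller block than a wide one.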

3
Q

Simple box plot

A

Graphic display of numerical summaries

Shows the 5-number summary of a data set: the box holds the middle 50% of the data (Q1 to Q3, with the median marked inside), the whiskers extend to the expected minimum and maximum, and any points beyond the whiskers are flagged as outliers.

4
Q

comparative box plot

A

splits up a quantitative variable by a qualitative variable.

5
Q

Scatter plot

A

Examines the relationship between 2 quantitative variables.

6
Q

Heat map

A

Useful when a contingency table is impractical because the variables take too many different values.

7
Q

end point convention

A

If each interval includes its left endpoint but excludes its right endpoint, then an 18-year-old is counted in [18, 25), not in [0, 18).
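
The left-closed, right-open convention can be sketched with a small lookup. The function name `interval_for` and the class edges are hypothetical and only serve the example.

```python
from bisect import bisect_right

EDGES = [0, 18, 25, 65]   # assumed class intervals: [0,18), [18,25), [25,65)

def interval_for(age):
    """Return the (left, right) interval that contains age, left endpoint included."""
    i = bisect_right(EDGES, age) - 1
    if not 0 <= i < len(EDGES) - 1:
        raise ValueError(f"{age} is outside every class interval")
    return EDGES[i], EDGES[i + 1]

print(interval_for(17.9))   # falls in [0, 18)
print(interval_for(18))     # falls in [18, 25), not [0, 18)
```

Under this convention the last right endpoint (65 here) belongs to no interval, so the sketch raises an error for it rather than guessing.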

8
Q

crowding

A

high density within a class interval

9
Q

Advantages of numerical summaries

A

A numerical summary reduces all the data to one simple number (“statistic”)

It gives a precise number, so there is less room for disagreement than with a visual impression.

10
Q

Sample mean

A

unique point at which the data is balanced.

i.e. the numbers to the left of the mean are balanced by the numbers to the right of the mean.
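
One way to see the balance idea is to add up the distances from the mean on each side. The data below are made up for the illustration.

```python
from statistics import mean

data = [1, 2, 3, 4, 10]   # made-up observations
m = mean(data)

# Total distance from the mean on each side of it
left_pull = sum(m - x for x in data if x < m)
right_pull = sum(x - m for x in data if x > m)

print(m, left_pull, right_pull)
```

The two totals agree: three small gaps on the left balance one large gap on the right, which is why a single extreme value can drag the mean towards it.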

11
Q

Sample median

A

The middle data point when the observations are ordered from smallest to largest. With an even number of observations, it is the average of the two middle points.

12
Q

Robust

A

The sample median is said to be robust: it is barely affected by outliers, which makes it a good summary for skewed data.
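
A small sketch of robustness, using made-up numbers: replacing the largest value with an extreme one moves the mean a long way but leaves the median where it was.

```python
from statistics import mean, median

clean = [2, 3, 5, 7, 8]            # made-up data
with_outlier = [2, 3, 5, 7, 800]   # same data, largest value replaced by an outlier

print(mean(clean), median(clean))
print(mean(with_outlier), median(with_outlier))
```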

13
Q

Comparing the sample mean and median

A

The difference between the sample mean and the sample median can be an indication of the shape of the data.

14
Q

For symmetric data, we expect the sample mean

A

to be approximately the same as the sample median

15
Q

For left skewed data, we expect the sample mean

A

to be smaller than the sample median

16
Q

For right skewed data, we expect the sample mean

A

to be larger than the sample median
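
The three cases (symmetric, left skewed, right skewed) can be checked on small made-up data sets. The helper `mean_minus_median` is a hypothetical name, and the sign of the difference is only an indication of shape, not a test for it.

```python
from statistics import mean, median

def mean_minus_median(data):
    """Positive suggests a right skew, negative a left skew, near zero symmetry."""
    return mean(data) - median(data)

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]     # long tail to the right
left_skewed = [-x for x in right_skewed]     # mirror image: long tail to the left

for name, data in [("symmetric", symmetric),
                   ("left skewed", left_skewed),
                   ("right skewed", right_skewed)]:
    print(name, mean_minus_median(data))
```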

17
Q

Limitations of sample mean and median

A

They describe only the centre of the data, so they need to be paired with a measure of spread.

18
Q

Root Mean Square (RMS).

A

Measures the average size of a set of numbers, regardless of their signs: square the numbers, take the mean of the squares, then take the square root.
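
A minimal sketch of the RMS calculation (square, mean, root) on made-up numbers. The function name `rms` is chosen for the example.

```python
from math import sqrt

def rms(values):
    """Root mean square: square each value, average the squares, take the root."""
    return sqrt(sum(v * v for v in values) / len(values))

values = [3, -3, 3, -3]             # made-up numbers whose ordinary mean is 0
print(sum(values) / len(values))    # the signs cancel in the ordinary mean
print(rms(values))                  # the RMS ignores the signs
```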

19
Q

Standard deviation

A

Measures the spread of the data
(the typical size of the gaps between the data points and the mean)

20
Q

Population standard deviation

A

RMS of the gaps from the mean (divides by n; the sample standard deviation divides by n - 1 instead)
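
The "RMS of the gaps" description can be checked against Python's `statistics.pstdev`. The data are made up, and `rms` is a helper defined only for this sketch.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

def rms(values):
    return sqrt(sum(v * v for v in values) / len(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data
m = mean(data)
gaps = [x - m for x in data]      # gap = data point - mean

sd_pop = rms(gaps)                # population SD: divides by n
print(sd_pop, pstdev(data))       # the two should agree
print(stdev(data))                # sample SD divides by n - 1, so it is slightly larger
```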

21
Q

Standard units

A

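Standard units = (data point - mean) / SD, i.e. how many standard deviations a data point lies above (+) or below (-) the mean.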
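
A short sketch of the conversion, assuming the population SD is the one intended. The function name `standard_units` and the data are made up for the example.

```python
from statistics import mean, pstdev

def standard_units(data):
    """Convert each data point to (data point - mean) / SD, using the population SD."""
    m = mean(data)
    sd = pstdev(data)
    return [(x - m) / sd for x in data]

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data
z = standard_units(data)
print(z)
```

After conversion the values have mean 0 and SD 1, so points from data sets on different scales can be compared directly.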

22
Q

IQR

A

Range of the middle 50% of the data
Q3 - Q1

23
Q

1st quartile

A

25th percentile

24
Q

3rd quartile

A

75th percentile

25
Q

Lower threshold on boxplot

A

Q1 - 1.5(IQR)

26
Q

Upper threshold on boxplot

A

Q3 + 1.5(IQR)
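
Putting the quartiles, IQR and the two thresholds together, a box plot's outlier rule might be sketched as below. Quartile conventions differ between textbooks and software; this example assumes the `inclusive` method of `statistics.quantiles` (Python 3.8+), and the data are made up.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]   # made-up data with one suspicious value

# quantiles(..., n=4) returns the three cut points Q1, median, Q3
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

lower = q1 - 1.5 * iqr   # lower threshold
upper = q3 + 1.5 * iqr   # upper threshold

outliers = [x for x in data if x < lower or x > upper]
print(q1, q3, iqr, lower, upper, outliers)
```

Points outside the thresholds are drawn individually, and the whiskers stop at the most extreme data points that still lie inside them.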

27
Q

Coefficient of variation (CV)

A

combines the mean and standard deviation into one summary (SD/mean)
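
Because the CV is a ratio, it does not depend on the units of measurement. The sketch below uses made-up heights and assumes the population SD; `coefficient_of_variation` is a name chosen for the example.

```python
from statistics import mean, pstdev

def coefficient_of_variation(data):
    """CV = SD / mean (population SD assumed here)."""
    return pstdev(data) / mean(data)

heights_cm = [150, 160, 170, 180, 190]      # made-up heights
heights_m = [h / 100 for h in heights_cm]   # same heights in metres

print(coefficient_of_variation(heights_cm))
print(coefficient_of_variation(heights_m))
```

The CV is only meaningful when the mean is not close to zero, since it divides by the mean.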

28
Q

quartile

A

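Q1, the median (Q2) and Q3 split the ordered data into four parts, each holding about 25% of the data.
Together with the min and max they form the 5-number summary: min, Q1, median, Q3, max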

29
Q

quantile

A

points in a distribution that relate to the rank order of values in that distribution

The set of q-quantiles divides the data into q equal size sets (in terms of percentage of data).
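
As a sketch of the idea, `statistics.quantiles` (Python 3.8+) returns the q - 1 cut points that split the data into q equal-sized groups. The data are made up, and the default interpolation method is used, so other software may give slightly different cut points.

```python
from statistics import median, quantiles

data = list(range(1, 101))            # made-up data: 1, 2, ..., 100

quartiles = quantiles(data, n=4)      # 3 cut points -> 4 groups
deciles = quantiles(data, n=10)       # 9 cut points -> 10 groups
percentiles = quantiles(data, n=100)  # 99 cut points -> 100 groups

print(len(quartiles), len(deciles), len(percentiles))
print(quartiles[1], median(data))     # the middle quartile is the median
```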

30
Q

percentile

A

One of the 100-quantiles: the pth percentile is the value below which about p% of the data fall.