Numerical Summaries Flashcards

Question 1

Q

Histogram

Answer

A

Visualises quantitative data.
Highlights the frequency of data in one class interval compared with another.

Question 2

Q

density scale

Answer

A

Height of each block = proportion in the block/length of the class interval
The area of the whole histogram on the density scale is one (or, as a percentage, 100%).
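
As a rough illustration of the density scale, the sketch below computes block heights and checks that the total area is 1. The ages, the class intervals and the helper name `density_heights` are made up for this example and are not part of the original notes.

```python
def density_heights(data, edges):
    """Return the height of each block on the density scale.

    Each class interval is left-closed, right-open: [edges[i], edges[i+1]).
    """
    n = len(data)
    heights = []
    for left, right in zip(edges, edges[1:]):
        count = sum(1 for x in data if left <= x < right)
        proportion = count / n
        # height = proportion in the block / length of the class interval
        heights.append(proportion / (right - left))
    return heights


# Hypothetical ages and class intervals, for illustration only.
ages = [3, 12, 17, 18, 21, 24, 30, 41, 44, 59]
edges = [0, 18, 25, 65]

heights = density_heights(ages, edges)

# Area of each block = height * width; the areas should add up to 1.
areas = [h * (right - left) for h, left, right in zip(heights, edges, edges[1:])]
total_area = sum(areas)
```

Note that the widest interval, [25,65), holds the most data points but has the lowest block, because its proportion is spread over a longer interval.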

Question 3

Q

Simple box plot

Answer

A

Graphic display of numerical summaries

Shows the five-number summary of a data set: the box spans the middle 50% of the data, the whiskers extend to the expected minimum and maximum, and points beyond the whiskers are flagged as outliers.

Question 4

Q

comparative box plot

Answer

A

Splits up a quantitative variable by a qualitative variable, giving one box plot per category.

Question 5

Q

Scatter plot

Answer

A

Examines the relationship between 2 quantitative variables.

Question 6

Q

Heat map

Answer

A

useful when a contingency table is not practical due to too many different values.

Question 7

Q

end point convention

Answer

A

If each interval contains its left endpoint but excludes its right endpoint, then an 18-year-old is counted in [18,25), not [0,18).
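
One way to apply this convention in code is sketched below. The helper name `interval_index` and the interval edges are assumptions chosen to match the age example above.

```python
from bisect import bisect_right


def interval_index(value, edges):
    """Index i of the left-closed, right-open interval [edges[i], edges[i+1])
    that contains value."""
    if not edges[0] <= value < edges[-1]:
        raise ValueError("value lies outside all class intervals")
    # bisect_right places a value equal to an edge to the right of that edge,
    # so an endpoint value is counted in the interval that starts at it.
    return bisect_right(edges, value) - 1


edges = [0, 18, 25, 65]
labels = [f"[{left},{right})" for left, right in zip(edges, edges[1:])]

label_for_18 = labels[interval_index(18, edges)]
label_for_17 = labels[interval_index(17, edges)]
```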

Question 8

Q

crowding

Answer

A

high density within a class interval

Question 9

Q

Advantages of numerical summaries

Answer

A

A numerical summary reduces all the data to one simple number (“statistic”)

Precise number, less disagreement

Question 10

Q

Sample mean

Answer

A

The unique point at which the data are balanced.

That is, the total of the gaps to the left of the mean equals the total of the gaps to the right of the mean.
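
The balancing property can be checked numerically. This is a small sketch on made-up data, not an example from the original notes.

```python
from statistics import mean

data = [1, 2, 4, 9]  # hypothetical data
m = mean(data)

# Total distance from the mean on each side.
left_total = sum(m - x for x in data if x < m)
right_total = sum(x - m for x in data if x > m)

# Equivalently, the signed gaps from the mean sum to zero.
signed_total = sum(x - m for x in data)
```

Here the mean is 4, the gaps on the left are 3 and 2, and the single gap on the right is 5, so both sides total 5.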

Question 11

Q

Sample median

Answer

A

the middle data point, when the observations are ordered from smallest to largest.
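
A minimal sketch of this definition follows. For an even number of observations there is no single middle point; averaging the two middle values is the usual convention and is an assumption here, as is the helper name `sample_median`.

```python
def sample_median(data):
    """Middle value of the ordered data.

    With an even number of observations, return the mean of the two
    middle values (a common convention, assumed here).
    """
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_median = sample_median([7, 1, 5])       # ordered: 1, 5, 7
even_median = sample_median([7, 1, 5, 3])   # ordered: 1, 3, 5, 7
```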

Question 12

Q

Robust

Answer

A

The sample median is said to be robust: it is barely affected by outliers, which makes it a good summary for skewed data.
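
To see robustness in action, the sketch below adds one extreme value to a small data set and compares how far the mean and the median move. The income figures are hypothetical.

```python
from statistics import mean, median

incomes = [30, 32, 35, 38, 40]       # hypothetical values, in $1000s
with_outlier = incomes + [1000]      # add one extreme observation

mean_shift = mean(with_outlier) - mean(incomes)
median_shift = median(with_outlier) - median(incomes)
```

The single outlier drags the mean up by more than 160, while the median moves by only 1.5.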

Question 13

Q

Comparing sample mean and median

Answer

A

The difference between the sample mean and the sample median can be an indication of the shape of the data.

Question 14

Q

For symmetric data, we expect the sample mean

Answer

A

to be approximately the same as the sample median

Question 15

Q

For left skewed data, we expect the sample mean

Answer

A

to be smaller than the sample median

Question 16

Q

For right skewed data, we expect the sample mean

Answer

A

to be larger than the sample median
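
The comparisons for symmetric, left skewed and right skewed data can be sketched as a rule of thumb. The function name `shape_hint` and its `tolerance` parameter are hypothetical, and the result is only a hint: a histogram or box plot should still be checked.

```python
from statistics import mean, median


def shape_hint(data, tolerance=0.0):
    """Guess the shape of the data from the sign of (mean - median)."""
    gap = mean(data) - median(data)
    if gap > tolerance:
        return "right skewed"
    if gap < -tolerance:
        return "left skewed"
    return "roughly symmetric"


# Made-up data sets, one for each case.
symmetric = shape_hint([1, 2, 3, 4, 5])        # mean 3, median 3
right = shape_hint([1, 2, 3, 4, 20])           # mean 6, median 3
left = shape_hint([1, 17, 18, 19, 20])         # mean 15, median 18
```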

Question 17

Q

Limitations of sample mean and median

Answer

A

need to be paired with a measure of spread.

Question 18

Q

Root Mean Square (RMS).

Answer

A

Measures the average size of a set of numbers, regardless of their signs: square the numbers, take the mean of the squares, then take the square root.
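
A short sketch of the RMS calculation on made-up numbers, showing why it is used instead of the plain mean when signs would cancel. The helper name `rms` is an assumption.

```python
from math import sqrt
from statistics import mean


def rms(values):
    """Root mean square: square, take the mean, then take the square root."""
    return sqrt(sum(v * v for v in values) / len(values))


values = [-1, 1, -1, 1]        # hypothetical numbers with mixed signs
plain_mean = mean(values)      # positives and negatives cancel out
rms_value = rms(values)        # the signs are removed by squaring
```

The plain mean of these numbers is 0, which hides their size, while the RMS is 1.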

Question 19

Q

Standard deviation

Answer

A

Measures the spread of the data
(roughly, the average size of the gaps between the data points and the mean)

Question 20

Q

Population standard deviation

Answer

A

RMS of the gaps from the mean: the square root of the average squared gap.
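
The definition can be written out step by step and compared with `statistics.pstdev` from the Python standard library. The data and the helper name `population_sd` are made up for this sketch.

```python
from math import sqrt
from statistics import mean, pstdev


def population_sd(data):
    """RMS of the gaps between each data point and the mean."""
    m = mean(data)
    gaps = [x - m for x in data]
    return sqrt(sum(g * g for g in gaps) / len(gaps))


data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical data with mean 5
sd = population_sd(data)
library_sd = pstdev(data)          # standard-library population SD
```

For this data the squared gaps sum to 32, their mean is 4, and the SD is 2.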

Question 22

Q

Standard units

Answer

A

= (data point - mean) / SD
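
A minimal sketch of converting data to standard units. It assumes the population SD is the one intended, and the data and the helper name `standard_units` are made up for illustration.

```python
from statistics import mean, pstdev


def standard_units(data):
    """Convert each data point to (data point - mean) / SD."""
    m = mean(data)
    sd = pstdev(data)  # population SD, an assumption in this sketch
    return [(x - m) / sd for x in data]


data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical data: mean 5, SD 2
z = standard_units(data)
```

The smallest value, 2, is 1.5 SDs below the mean, and the largest, 9, is 2 SDs above it.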

Question 23

Q

IQR

Answer

A

Range of the middle 50% of the data:
IQR = Q3 - Q1

Question 24

Q

1st quartile

Answer

A

25th percentile

Question 25

Q

3rd quartile

Answer

A

75th percentile

Question 26

Q

Lower threshold on boxplot

Answer

A

Q1 - 1.5(IQR)

Question 27

Q

Upper threshold on boxplot

Answer

A

Q3 + 1.5(IQR)
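
The two box plot thresholds can be sketched together as follows. Software packages compute quartiles in slightly different ways; this example assumes the "inclusive" method of `statistics.quantiles`, and the data and the helper name `boxplot_thresholds` are made up.

```python
from statistics import quantiles


def boxplot_thresholds(data):
    """Return (lower, upper) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


data = [1, 2, 3, 4, 5, 6, 7, 8, 40]    # hypothetical data
lower, upper = boxplot_thresholds(data)

# Points outside the thresholds are drawn as outliers on the box plot.
outliers = [x for x in data if x < lower or x > upper]
```

With this quartile method Q1 = 3 and Q3 = 7, so IQR = 4, the thresholds are -3 and 13, and only 40 is flagged.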

Question 28

Q

Coefficient of variation (CV)

Answer

A

combines the mean and standard deviation into one summary (SD/mean)
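
A small sketch of the CV on made-up data. It assumes the population SD, and the helper name `coefficient_of_variation` is hypothetical. It also shows that rescaling the data (for example, changing units) leaves the CV unchanged.

```python
from statistics import mean, pstdev


def coefficient_of_variation(data):
    """SD divided by mean (population SD assumed in this sketch)."""
    return pstdev(data) / mean(data)


data = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical data: mean 5, SD 2
rescaled = [10 * x for x in data]      # same data in different units

cv = coefficient_of_variation(data)
cv_rescaled = coefficient_of_variation(rescaled)
```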

Question 29

Q

quartile

Answer

A

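The quartiles Q1, median (Q2) and Q3 split the ordered data into four equal parts.
Together with the minimum and maximum they give the five-number summary: min, Q1, median, Q3, max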

Question 30

Q

quantile

Answer

A

points in a distribution that relate to the rank order of values in that distribution

The set of q-quantiles divides the data into q equal size sets (in terms of percentage of data).
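
As a sketch, `statistics.quantiles` from the Python standard library returns the cut points for any choice of q (its parameter `n`). Dividing the data into q groups takes q - 1 cut points. The data and the choice of the "inclusive" method are assumptions made for this example.

```python
from statistics import median, quantiles

data = list(range(1, 101))    # hypothetical data: the integers 1 to 100

quartiles = quantiles(data, n=4, method="inclusive")      # 4-quantiles
deciles = quantiles(data, n=10, method="inclusive")       # 10-quantiles
percentiles = quantiles(data, n=100, method="inclusive")  # 100-quantiles

# q groups need q - 1 cut points.
cut_point_counts = (len(quartiles), len(deciles), len(percentiles))

# The middle quartile is the median.
middle_quartile = quartiles[1]
```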

Question 31

Q

percentile

Answer

A

The percentiles are the 100-quantiles: they divide the data into 100 equal-sized groups.