Presentation / Display of data Flashcards

1
Q

What is another word for Nominal data?

A

Categorical

2
Q

What is Nominal (Categorical) data?

A

Categories into which observations fall, without any quantitative element, e.g. eye colour, ABO blood group.

3
Q

What is binary data?

A

Data where there are just TWO categories, e.g. sex, HIV status.

4
Q

What is Ordinal data?

A

Ordinal data have a quantitative element in the categories, but no well-defined scale of measurement, e.g. the Apgar score of the condition of a newborn (from 0 to 10), stages of cancer (I, II, III, IV).

5
Q

What is discrete numerical data?

A

Discrete numerical data have a well-defined numerical scale of measurement confined to whole numbers, e.g. the number of living children a woman has, the number of times a patient has been admitted to hospital.

6
Q

What is continuous numerical data?

A

Data where the observations may take any value over some continuous numerical range, e.g. age, height, body temperature, blood pressure.

7
Q

What are frequencies?

A

How often each value occurs.

8
Q

How do you express relative frequencies?

A

As percentages of the total number of observations.
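
As a small illustration, the sketch below tabulates frequencies and relative frequencies (in percent) with Python's `collections.Counter`. The blood-group sample and the `frequency_table` helper are hypothetical.

```python
from collections import Counter

def frequency_table(observations):
    """Return (value, frequency, relative frequency in %) rows, most common first."""
    counts = Counter(observations)
    total = len(observations)
    return [(value, freq, 100 * freq / total) for value, freq in counts.most_common()]

# Hypothetical sample of ABO blood groups
blood_groups = ["O", "A", "O", "B", "A", "O", "AB", "O"]
for value, freq, pct in frequency_table(blood_groups):
    print(f"{value:>2}  {freq}  {pct:.1f}%")
```

The relative frequencies always add up to 100%, which is what lets them be compared across samples of different sizes.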

9
Q

What are the two most popular ways to display frequencies?

A

Bar chart and pie chart.

10
Q

How does a pie chart work?

A

In a pie chart the frequency is represented by area, i.e. the proportion of each ‘slice’ equals the relative frequency.
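
Since the area of a slice is proportional to its central angle, each slice's angle is 360° multiplied by the relative frequency. A minimal sketch, with hypothetical eye-colour counts and a hypothetical helper `pie_angles`:

```python
def pie_angles(frequencies):
    """Map each category to its slice angle in degrees (360 * relative frequency)."""
    total = sum(frequencies.values())
    return {category: 360 * f / total for category, f in frequencies.items()}

# Hypothetical eye-colour counts
eye_colour = {"brown": 10, "blue": 6, "green": 4}
print(pie_angles(eye_colour))  # {'brown': 180.0, 'blue': 108.0, 'green': 72.0}
```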

11
Q

How does a bar chart work?

A

The height of each bar represents the corresponding frequency.

12
Q

What are the two most popular ways to display numerical data?

A

Dotplots and histograms

13
Q

How does a dotplot work?

A

A dotplot represents numerical data as dots lying on the real line, with repeated values stacked on top of each other.
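
A rough text-only sketch of a dotplot, assuming small integer data; the helper `text_dotplot` and the admissions data are hypothetical, and a real plot would normally be drawn with a plotting library:

```python
from collections import Counter

def text_dotplot(values):
    """Draw integer data as a text dotplot: one '*' per observation,
    with repeated values stacked on top of each other."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    rows = []
    # Build rows from the top of the tallest stack down to the base.
    for level in range(max(counts.values()), 0, -1):
        row = " ".join("*" if counts[v] >= level else " " for v in range(lo, hi + 1))
        rows.append(row.rstrip())
    # Number line underneath (last digit only, so each label is one character wide).
    rows.append(" ".join(str(v % 10) for v in range(lo, hi + 1)))
    return "\n".join(rows)

# Hypothetical data: number of hospital admissions per patient
admissions = [0, 1, 1, 1, 2, 2, 4]
print(text_dotplot(admissions))
```

The three patients with one admission form the tallest stack above 1, and the empty position above 3 shows a value that never occurred.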

14
Q

How does a histogram work?

A

A histogram is a vertical bar chart. The range of the data is split into disjoint intervals (bins), and a bar is drawn on each bin so that the area of the bar (in general, not its height!) represents the frequency of data in that bin.
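
To make the bar areas (rather than the heights) equal the frequencies, each bar's height is the frequency divided by the bin width (the frequency density). The sketch below assumes half-open bins [left, right) with the last bin also including its right edge; the helper name, the ages and the bin edges are hypothetical.

```python
def histogram_heights(data, edges):
    """Bar heights for a histogram in which bar AREA equals the bin frequency.

    edges are consecutive bin boundaries; each bin is [left, right),
    except the last, which also includes its right edge (assumed convention).
    """
    heights = []
    for i, (left, right) in enumerate(zip(edges, edges[1:])):
        last = i == len(edges) - 2
        freq = sum(1 for x in data if left <= x < right or (last and x == right))
        heights.append(freq / (right - left))  # frequency density
    return heights

# Hypothetical ages, with one wider bin at the top
ages = [21, 23, 25, 34, 38, 41, 45, 47, 52, 68]
edges = [20, 30, 40, 50, 80]
print(histogram_heights(ages, edges))
```

The last bin is three times as wide as the others, so its two observations give a height of 2/30 ≈ 0.067 rather than 0.2; with equal bin widths the heights are simply proportional to the frequencies.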

15
Q

What is the midpoint of classes/bins called?

A

class marks
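
A class mark is just the midpoint of its bin, (lower limit + upper limit) / 2. A one-line sketch, with hypothetical bin edges and a hypothetical helper name:

```python
def class_marks(edges):
    """Midpoint (class mark) of each bin, given consecutive bin boundaries."""
    return [(left + right) / 2 for left, right in zip(edges, edges[1:])]

print(class_marks([20, 30, 40, 50, 80]))  # [25.0, 35.0, 45.0, 65.0]
```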

16
Q

How many bins should you choose?

A

Between 6 and 12.

17
Q

What is the difference between unimodal, bimodal and multimodal distributions?

A

A unimodal distribution has one distinctive peak, a bimodal distribution has two, and a multimodal distribution has more than two.

18
Q

Explain what skew is.

A

One tail of the distribution is longer/heavier than the other.

19
Q

What is the shape of a distribution?

A

[Figure: distribution shapes for comparison]

20
Q

What are outliers?

A

Observations which appear to be different from the rest of the data. We might want to exclude them from the analysis.

21
Q

What is location?

A

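Where the values of the data are located on the scale of measurement, i.e. their typical (central) value, as measured by e.g. the mean or the median.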

22
Q

If a distribution is skewed to the right, what does its curve look like?

A

The “top” (peak) is located to the left and the long “tail” extends to the right.

23
Q

What is the mean value?

A

The average: the sum of all observations divided by the number of observations.

24
Q

What is the median value?

A

The middle value of the ordered (sorted) data:
If N is odd, median = the middle value.
If N is even, median = the average of the two middle values.
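
Both definitions can be sketched directly in Python. The helpers `mean` and `median` are hypothetical names for illustration; the standard library's `statistics.mean` and `statistics.median` compute the same quantities.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their number."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted data; average of the two middle values if N is even."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

print(mean([8, 1, 3]), median([8, 1, 3]))          # 4.0 3    (N odd)
print(mean([11, 2, 8, 3]), median([11, 2, 8, 3]))  # 6.0 5.5  (N even: (3 + 8) / 2)
```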

25
Q

How do the median and mean react to outliers?

A

The mean is sensitive to outliers, i.e. its value is greatly affected by their presence, whereas the median is robust against the effects of extreme values. Hence the median is a more reliable measure of location in the presence of outliers or, more generally, for unreliable data.
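
A quick illustration with hypothetical length-of-stay data: adding a single extreme value moves the mean a lot but the median only slightly. This uses the standard library's `statistics` module.

```python
from statistics import mean, median

# Hypothetical lengths of hospital stay (days)
stay = [2, 3, 3, 4, 5]
stay_with_outlier = stay + [60]  # one extreme observation added

for data in (stay, stay_with_outlier):
    print(f"mean = {mean(data):.2f}, median = {median(data)}")
# mean = 3.40, median = 3
# mean = 12.83, median = 3.5
```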

26
Q

Which two measures of variability are the most common and most useful?

A

The sample variance and the inter-quartile range (IQR).

27
Q

How do you define the lower (or the first) quartile?

A

You may define the lower (or first) quartile Q_1 to be the value such that one quarter of the data lie below it and three quarters above it.

28
Q

How do you define the upper (or the third) quartile?

A

The upper (or third) quartile Q_3 is the value such that three quarters of the data lie below it and one quarter above it.

29
Q

How do you define Inter-quartile range?

A

IQR = Q_3 − Q_1, where Q_1 and Q_3 are the lower and upper quartiles, respectively.
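
One way to compute the quartiles and the IQR is `statistics.quantiles` (Python 3.8+). Several slightly different conventions for quartiles exist; this sketch assumes Python's default `"exclusive"` method, so other software may give slightly different values. The blood-pressure readings are hypothetical.

```python
from statistics import quantiles

# Hypothetical systolic blood pressures (mmHg)
bp = [118, 120, 122, 125, 128, 130, 133, 135, 140, 145, 170]

# quantiles(..., n=4) returns the three cut points Q_1, median, Q_3
# (default "exclusive" method; other conventions may differ slightly).
q1, m, q3 = quantiles(bp, n=4)
iqr = q3 - q1
print(q1, m, q3, iqr)  # 122.0 130.0 140.0 18.0
```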

30
Q

How does the IQR react to outliers?

A

IQR is robust (insensitive) to outliers.
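
With hypothetical blood-pressure readings, making the largest reading far more extreme leaves the IQR unchanged, while the range (largest minus smallest observation) changes a lot. A sketch, assuming the default quartile method of `statistics.quantiles`; the helper `iqr` is a hypothetical name.

```python
from statistics import quantiles

def iqr(values):
    """Inter-quartile range Q_3 - Q_1 (default "exclusive" quartile method)."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

bp = [118, 120, 122, 125, 128, 130, 133, 135, 140, 145, 170]
bp_extreme = bp[:-1] + [250]  # largest reading made far more extreme

print(iqr(bp), iqr(bp_extreme))                                # 18.0 18.0
print(max(bp) - min(bp), max(bp_extreme) - min(bp_extreme))    # 52 132
```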

31
Q

What does the five-figure summary consist of?

A

L, Q_1, m, Q_3, U, where m is the median, L is the maximum of (the smallest observation, Q_1 − 1.5·IQR) and U is the minimum of (the largest observation, Q_3 + 1.5·IQR).

32
Q

If U = Q_3 + 1.5·IQR and/or L = Q_1 − 1.5·IQR, what happens to the values above U or below L?

A

They get noted as outliers.
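
A sketch that computes the five-figure summary as defined above (L as the larger of the smallest observation and Q_1 − 1.5·IQR, U as the smaller of the largest observation and Q_3 + 1.5·IQR) and flags the observations outside [L, U] as outliers. The quartiles assume Python's default `statistics.quantiles` method; the function name and data are hypothetical.

```python
from statistics import quantiles

def five_figure_summary(values):
    """Return (L, Q_1, m, Q_3, U) and the observations flagged as outliers."""
    q1, m, q3 = quantiles(values, n=4)  # default "exclusive" method (assumption)
    iqr = q3 - q1
    lower = max(min(values), q1 - 1.5 * iqr)
    upper = min(max(values), q3 + 1.5 * iqr)
    outliers = [x for x in values if x < lower or x > upper]
    return (lower, q1, m, q3, upper), outliers

# Hypothetical systolic blood pressures (mmHg)
bp = [118, 120, 122, 125, 128, 130, 133, 135, 140, 145, 170]
summary, outliers = five_figure_summary(bp)
print(summary)   # (118, 122.0, 130.0, 140.0, 167.0)
print(outliers)  # [170]
```

Many plotting programs instead end each whisker at the most extreme observation inside the fence (here 145 rather than 167); this sketch follows the definition on the card.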

33
Q

When do you use a boxplot?

A

When you want to display a five-figure summary.

34
Q

What do the box, whiskers and * mark in a boxplot?

A

A box is drawn with edges at Q_1 and Q_3, and the line inside the box shows the median m. Additional lines (whiskers) represent the range of the lowest 25% and of the highest 25% of the data, excluding outliers. The outliers are marked separately with *. In principle, each section of the plot corresponds to 1/4 of the data, so the plot also gives immediate information about the shape of the distribution.

35
Q

When the distribution is skewed, which is better to use, the mean or the median?

A

The median is a much better typical value than the mean.

36
Q

What is the relation between the sample variance and the standard deviation?

A

(Standard deviation)^2 = sample variance, i.e. the standard deviation is the square root of the sample variance.
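
A sketch of both quantities, assuming the usual sample-variance divisor n − 1. The data are hypothetical, and `statistics.variance` / `statistics.stdev` are used only as a cross-check.

```python
import math
from statistics import stdev, variance

# Hypothetical data: number of hospital admissions for five patients
x = [1, 1, 3, 5, 5]

n = len(x)
xbar = sum(x) / n
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)  # sample variance, divisor n - 1
s = math.sqrt(s2)                                 # standard deviation

print(s2, s)                  # 4.0 2.0
print(variance(x), stdev(x))  # library versions of the same quantities
```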