Presentation / Display of data Flashcards
What is another word for Nominal data?
Categorical
What is Nominal (Categorical) data?
Categories into which observations fall, without any quantitative element, e.g. eye colour, ABO blood group.
What is binary data?
Data where there are just two categories, e.g. sex, HIV status.
What is Ordinal data?
Ordinal data have a quantitative element in the categories, but no well-defined scale of measurement, e.g. the Apgar scale of the condition of the newborn (from 0 to 10), stages of cancer (I, II, III, IV).
What is discrete numerical data?
Discrete numerical data have a well-defined numerical scale of measurement confined to whole numbers, e.g. the number of living children a woman has, the number of times a patient has been admitted to hospital.
What is continuous numerical data?
Data where the observations may vary over some continuous numerical range, e.g. age, height, body temperature, blood pressure.
What are frequencies?
How often each value occurs.
How do you express Relative frequencies?
As percentages.
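As a rough illustration of frequencies and relative frequencies, here is a minimal Python sketch. The blood-group sample is made up for the example and is not taken from these cards.

```python
from collections import Counter

# Hypothetical sample of ABO blood groups (made-up data for illustration).
blood_groups = ["A", "O", "O", "B", "A", "O", "AB", "O", "A", "O"]

# Frequency: how often each value occurs.
frequencies = Counter(blood_groups)
n = len(blood_groups)

# Relative frequency: frequency divided by the total, here as a percentage.
relative = {group: 100 * count / n for group, count in frequencies.items()}

for group, count in frequencies.items():
    print(f"{group}: frequency {count}, relative frequency {relative[group]:.0f}%")
```

The relative frequencies always add up to 100%, which is why they map directly onto the slices of a pie chart.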
Which two charts are most commonly used to display frequencies?
Bar chart and pie chart.
How does the Pie chart work?
In a pie chart the frequency is represented by area, i.e. the proportion of each 'slice' equals the relative frequency.
How does the bar chart work?
The height of each bar represents the corresponding frequency.
Which two plots are most commonly used to display numerical data?
Dotplots and histograms
How does a dotplot work?
A dotplot represents numerical data as dots on the real line, with repeated values stacked on top of each other.
How does a Histogram work?
A histogram is a vertical bar chart. The range of the data is split into disjoint intervals (bins), and a bar is drawn on each bin so that the area of the bar (not, in general, its height) represents the frequency of the data in that bin.
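The "area, not height" point matters when bins have unequal widths. The sketch below assumes made-up ages and hand-picked bin edges (neither comes from these cards) and sets each bar's height to frequency divided by bin width, so that width times height gives back the frequency.

```python
# Hypothetical ages (made-up data) and hand-picked bins of unequal width.
ages = [3, 7, 12, 15, 18, 22, 25, 31, 38, 44, 52, 67]
bin_edges = [0, 10, 20, 40, 80]  # bins are [0,10), [10,20), [20,40), [40,80)

bars = []
for left, right in zip(bin_edges[:-1], bin_edges[1:]):
    count = sum(left <= x < right for x in ages)  # frequency in this bin
    width = right - left
    height = count / width           # chosen so that area = width * height = count
    class_mark = (left + right) / 2  # midpoint of the bin
    bars.append((class_mark, width, height, count))

for class_mark, width, height, count in bars:
    print(f"class mark {class_mark}: width {width}, height {height:.3f}, area {width * height:.0f}")
```

With equal-width bins the heights are proportional to the frequencies, which is why the distinction is easy to overlook.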
What is the midpoint of classes/bins called?
class marks
How many bins should you choose?
Between 6 and 12.
What is the difference between unimodal, bimodal and multi-modal?
A unimodal distribution has one distinct peak, a bimodal distribution has two, and a multi-modal distribution has more than two.
Explain what skew is
One tail of the distribution is longer/heavier than the other.
What is the shape of a distribution?
Shapes for comparison
What are outliers?
Observations that appear to be different from the rest of the data. We might want to exclude them from the analysis.
What is the location?
Where the values of the data are centred, i.e. a typical value. The mean and the median are measures of location.
If a curve is skewed to the right, what does it look like?
The peak is located to the left and the long tail extends to the right.
What is the mean value?
The average: the sum of the observations divided by the number of observations.
What is the median value?
The middle value of the sorted data.
If N is odd, median = the middle value.
If N is even, median = the average of the two middle values.
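The odd/even rule can be written out directly. This is a minimal sketch; the function name `median` and the sample lists are illustrative choices, not something given on these cards.

```python
def median(values):
    """Return the middle value of the sorted data (odd/even rule)."""
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # N odd: the single middle value.
        return ordered[mid]
    # N even: the average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 1, 3]))     # N = 3, odd
print(median([4, 1, 3, 2]))  # N = 4, even
```

Python's standard library offers the same rule as `statistics.median`.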
How do the median and the mean react to outliers?
The mean is sensitive to outliers, i.e. its value is greatly affected by their presence, whereas the median is robust against the effects of extreme values. Hence the median is a more reliable measure of location in the presence of outliers or, more generally, for unreliable data.
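A small made-up example shows the difference. The numbers are chosen only for illustration, and the functions come from Python's standard `statistics` module.

```python
import statistics

# Hypothetical data (made up): five ordinary values, then one extreme value added.
clean = [2, 3, 4, 5, 6]
with_outlier = clean + [100]

mean_clean = statistics.mean(clean)
median_clean = statistics.median(clean)
mean_outlier = statistics.mean(with_outlier)
median_outlier = statistics.median(with_outlier)

print("without outlier: mean", mean_clean, "median", median_clean)
print("with outlier:    mean", mean_outlier, "median", median_outlier)
```

Adding the single value 100 moves the mean from 4 to 20, while the median only moves from 4 to 4.5.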
Which two measures of variability are the most common and most useful?
Sample variance and Inter-Quartile Range
How do you define the lower (or the first) quartile?
The lower (or first) quartile Q_1 is the value such that one quarter of the data lie below it and three quarters lie above it.
How do you define the upper (or the third) quartile?
The upper (or third) quartile Q_3 is the value such that three quarters of the data lie below it and one quarter lies above it.
How do you define Inter-quartile range?
IQR = Q_3 − Q_1, where Q_1 and Q_3 are the lower and upper quartiles, respectively.
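As a sketch, the quartiles and the IQR can be computed with `statistics.quantiles` from Python's standard library. Textbooks and software use slightly different conventions for interpolating quartiles, so the values may differ a little from a hand calculation; the default ("exclusive") method is assumed here, and the data are made up.

```python
import statistics

# Hypothetical sample (made-up data).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 splits the data into four equal parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print("Q_1 =", q1, " median =", q2, " Q_3 =", q3, " IQR =", iqr)
```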
How does the IQR react to outliers?
The IQR is robust (insensitive) to outliers.
Which values make up the five-figure summary?
L, Q_1, m, Q_3, U, where m is the median, L is the maximum of the smallest observation and Q_1 − 1.5 × IQR, and U is the minimum of the largest observation and Q_3 + 1.5 × IQR.
If U = Q_3 + 1.5 × IQR and/or L = Q_1 − 1.5 × IQR, what happens to the values above U or below L?
They are marked as outliers.
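The definitions of L and U can be followed step by step in a short sketch. The data are made up, the variable names are illustrative, and the quartiles use the default convention of `statistics.quantiles`, which may differ slightly from other textbook conventions.

```python
import statistics

# Hypothetical sample (made-up data) with one unusually large value.
data = sorted([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 40])

q1, m, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# L and U as defined above: they never extend past the fences.
L = max(min(data), lower_fence)
U = min(max(data), upper_fence)

# Values beyond the fences are marked as outliers.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
five_figure_summary = (L, q1, m, q3, U)

print("five-figure summary:", five_figure_summary)
print("outliers:", outliers)
```

Here the largest observation (40) lies beyond Q_3 + 1.5 × IQR = 18, so U is 18 and 40 is reported as an outlier, while L is simply the smallest observation.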
When do you use a boxplot?
When you want to display a five figure summary
What do the box, the whiskers and the * marks represent in a boxplot?
A box is drawn with edges at Q_1 and Q_3. The segment inside shows the median m. Additional lines (whiskers) represent the range of the lower 25% of the data and of the upper 25% of the data, excluding outliers. The outliers are marked separately with *. In principle, each section of the plot corresponds to 1/4 of the data, so the plot also gives immediate information about the shape of the distribution.
When the distribution is skewed, which is better to use, the mean or the median?
The median is a much better typical value than the mean.
What is the relation between sample variance and standard deviation?
(Standard deviation)^2 = Sample variance
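The relation can be checked numerically. The sketch assumes the usual definition of the sample variance, with divisor N − 1 (the cards do not state the formula), and uses made-up data.

```python
import math
import statistics

# Hypothetical sample (made-up data).
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Sample variance: sum of squared deviations from the mean, divided by N - 1 (assumed definition).
sample_variance = sum((x - mean) ** 2 for x in data) / (n - 1)

# Standard deviation: the square root of the sample variance.
standard_deviation = math.sqrt(sample_variance)

# Cross-check against the standard library's sample variance and standard deviation.
assert math.isclose(statistics.variance(data), sample_variance)
assert math.isclose(statistics.stdev(data), standard_deviation)

print("sample variance:", sample_variance)
print("standard deviation:", standard_deviation)
```

The standard deviation is in the same units as the data, which is why it is often quoted instead of the variance.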