Summarising and displaying data Flashcards

1
Q

—- are scales with an underlying defined unit.
Examples:
– A count (number of children)
– An accepted unit
* Years
* Metres
* Euros
These scales can be —- or —-

A

numeric scales
continuous or discrete

2
Q

True or false:
- Many things cannot have a defined unit, such as depression, satisfaction and pain
- We recognise that people can be satisfied, or in pain, to a greater or lesser extent
- The problem is measuring these concepts without a defined unit

A

true

3
Q

—- are used to measure relative quantity

A

ordinal scales

4
Q

Years (as in age measured in years) and days are examples of —-

A

defined units

5
Q

– Severity of pain: mild, moderate, severe
– Alcohol consumption: none, low, high
– Quality of life score: 0, 1, 2, …, 10
are examples of:

A

ordinal scales (see slide 12)

6
Q

Numeric and ordinal scales are labels that tell us —-. A more basic kind of label tells us —-, in which case —- is the basis of measurement

A

how much
what
classification

7
Q

Labelling schemes that classify people, things or events are —-
Examples are:

A

nominal measurement scales
– Disease classification schemes, e.g. ICD-10 (International Classification of Diseases)
– Eye color: Blue, green, brown, hazel, gray
– Types of activity: sitting, walking, cycling, swimming, other

8
Q

Nominal measurement scales tell us —- of thing something is, and they are based on an —-

A

what kind
agreed classification

9
Q

Some scales have only two labels; these are called —-

A

dichotomous scales

10
Q

– Eye color: Blue, green, brown, hazel, gray
– Types of activity: sitting, walking, cycling, swimming, other
are examples of

A

nominal measurement scales (blood group types are another example)

11
Q

– Disease status: Presence or absence of disease
– Lab test result: Positive or negative
– Mortality : alive or dead status
– Exam result: Pass or fail
are examples of

A

dichotomous scales (the simplest sort)

12
Q

Types of variables summary:
1- —- variables
– Defined units, tell us how much in an absolute sense
– Can be continuous or discrete
2- Categorical variables
* —- scales
– Tell us how much, but in a relative rather than absolute sense
* —- scales
– Classify. Tell us what rather than how much
– Called a —- scale when there are only two values

A

numeric
ordinal
nominal
dichotomous

13
Q

Knowing the measurement scale of data informs us as to how we should —- and — it

A

display and summarise it

14
Q

Summaries are —- than the original because of what they leave out
* So any summary is a —- of the original
Things can go wrong in two ways:
1- We present aspects of the data that lead to the wrong conclusion
2- We leave out some important aspect of the data, leading the reader to draw the wrong conclusion
- In practice, data analysts examine the data in —- ways to avoid these pitfalls when reporting on them

A

smaller
simplification
different

15
Q

The most basic summary statistic is a —-

A

frequency, given as a count or a percentage (see the stacked histogram graph); frequencies can be laid out in a frequency table
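
As a rough illustration of a frequency table, the sketch below counts how often each category occurs and converts the counts to percentages. The pain ratings and the `frequency_table` helper are made up for this example; they are not from the slides.

```python
from collections import Counter

# Hypothetical responses on an ordinal pain scale (invented for illustration).
pain = ["mild", "severe", "moderate", "mild", "mild", "moderate", "severe", "mild"]


def frequency_table(values, order):
    """Return (category, count, percent) rows in the given category order."""
    counts = Counter(values)
    total = len(values)
    return [(cat, counts[cat], 100 * counts[cat] / total) for cat in order]


table = frequency_table(pain, ["mild", "moderate", "severe"])
for category, count, percent in table:
    # One row per category: label, count, percentage of all responses.
    print(f"{category:<10}{count:>5}{percent:>8.1f}%")
```

Passing the category order explicitly keeps an ordinal scale in its natural order rather than sorting by frequency.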

16
Q

Rule of thumb:
—- for precise information
—- for patterns and understanding

A

numbers
graphs

17
Q

A simple graph displaying the frequencies of categories is a —-
– —- bars are preferable, but often they are presented as —- bars

A

bar graph
horizontal
vertical

18
Q

When the data are measured on a continuous scale but we have a relatively small amount of data, we can display the data as —-

A

dots, aka a dot plot; this can be used, for example, for heights of women and men from a small study
– For men or women with the same height, the dots are shown beside each other

19
Q

With —- amounts of data, we don’t need to rely on the summaries, we can simply show all the data in a plot
* But with —- datasets, the dots become too numerous and we rely more and more on summaries

A

small
larger

20
Q

Death in the intensive care unit:
* Patients had their risk of death calculated using —- scores
* These scores combine —- to produce an —- of the —- of death
* The study also looked at length of stay
* These two variables, length of stay and APACHE-II scores, can be shown as dots; the dots shown will be the —-

A

APACHE-II
risk indicators
overall prediction
chance
predicted risk of death (APACHE-II scores)
(see slides 27 and 28)

21
Q

Summarising the risk scores using % cut-offs:
- These summaries don’t show us —- the data, but they give us a good idea of —-
- they show the —-
- and give some idea of how scores —- around that

A

all
key markers
middle point/halfway
vary

22
Q

A —- is a value representing a cut-off for a specified percentage of the data

A

percentile, also called a quantile (see graph 29)
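
A minimal sketch of one way to compute a percentile, assuming the "nearest-rank" convention (the smallest value with at least p% of the data at or below it). Other conventions interpolate between values and can give slightly different answers. The scores and the `percentile` helper are hypothetical.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical risk scores (invented for illustration).
scores = [12, 35, 8, 50, 21, 66, 17, 44, 90, 73]

q25 = percentile(scores, 25)
q50 = percentile(scores, 50)  # the median under this convention
q75 = percentile(scores, 75)
print(q25, q50, q75)
```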

23
Q

The —- is the half-way point of the data values.
– Strictly speaking, half of the values lie —- the median
– The —- percentile!

A

median
at or below
50th
(see slide 31)

24
Q

The —- is the average; it indicates approximately where the data are located on the number line.
It is calculated as:

A

mean
“Sum up the individual values then divide by the number of them”
The mean can be misleading, though (see bar graph 35)

25
Q

—-: a “tail” of exceptionally long stay times pushes the mean up
—-: statistics that maintain their properties even if the underlying distributional assumptions are incorrect

A

outliers (see slide 36 for more info)
robust summary statistics

26
Q

The mean is sensitive to —-
while the median is barely affected by them because it is a —-

A

outliers
robust summary measure (outliers have little effect on the median; it is a robust statistic because it has a breakdown point of 50%)
– Omitting the four highest values moves the median from 3.55 to 3.50; that’s a change of about an hour (which is very small in comparison with the effect on the mean)
* This explains the differences we see when we look at medians instead of means
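
To see the effect in miniature, the sketch below compares the mean and median with and without a single very long stay. The lengths of stay are invented for illustration and are not the study data.

```python
from statistics import mean, median

# Hypothetical lengths of stay in days; the last value is an exceptionally long stay.
stays = [1, 2, 2, 3, 3, 4, 5, 60]

with_outlier = (mean(stays), median(stays))
without_outlier = (mean(stays[:-1]), median(stays[:-1]))

# The mean drops sharply once the long stay is removed; the median does not move.
print("with outlier:    mean=%.2f median=%.2f" % with_outlier)
print("without outlier: mean=%.2f median=%.2f" % without_outlier)
```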

27
Q

The range gives us an idea of —-, but it is not always a good measure. It needs 2 pieces of info: the —- and —-
- these two values are the ones most likely to be —- cases or —-
- the range is not —-; it will be affected by —-

A

variability
biggest and smallest values
atypical cases and errors
robust
outliers
(The range of length of stay is 82 days, but it’s only 38 days if we ignore the
longest-staying patient, and 19 days if we ignore the three longest-staying
patients)

28
Q

  • A quarter of all patients scored 17 or less, and three quarters scored 66 or less
  • So the middle 50% of patients scored between 17 and 66
    – That’s a range of 49 (66 – 17)
  • This is called the —-, which is robust to outliers because they occur at the —- and not in the —-

A

interquartile range (abbreviated as IQR)
extremes
middle
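
The sketch below contrasts the range with the interquartile range when one extreme value is changed. The scores are hypothetical, chosen only so that the quartiles come out at 17 and 66 as on the card, and the quartile convention (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption; other conventions give slightly different quartiles.

```python
from statistics import quantiles


def spread(values):
    """Return the range and interquartile range of a list of numbers."""
    ordered = sorted(values)
    q1, _, q3 = quantiles(ordered, n=4, method="inclusive")
    return {"range": ordered[-1] - ordered[0], "iqr": q3 - q1}


# Hypothetical scores (invented so that the quartiles are 17 and 66).
scores = [5, 12, 17, 20, 31, 40, 66, 70, 90]
# Same data, but the largest value is replaced by an implausible one, e.g. a recording error.
scores_with_error = scores[:-1] + [900]

clean = spread(scores)
distorted = spread(scores_with_error)
print(clean)
print(distorted)
```

Only the range changes between the two runs, because the quartiles sit in the middle of the data and the altered value sits at the extreme.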

29
Q

—- is the average of the squared differences from the mean, a measure of how far a set of numbers is —-
– No one apart from professional statisticians understands it fully
- Example: fasting blood sugar was checked for 10 employees
– The results are : 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10 mmol/l
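Mean = 7.75
$$\text{Variance} = \frac{(5.5-7.75)^2 + (6-7.75)^2 + (6.5-7.75)^2 + \dots + (10-7.75)^2}{10} = \frac{20.625}{10} \approx 2.063$$
(Dividing by 9, i.e. n − 1, would give the sample variance instead, about 2.292.)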

A

variance
spread out

30
Q

  • The square root of the variance is the —-
    – It is in the same units as the —-
  • A —- SD indicates that data points tend to be very close to the mean (and to each other)
  • A —- SD indicates that the data points are very spread out from the mean and from each other.
  • Blood sugar example:

A

standard deviation (SD)
original values
small
large
SD = √2.063 ≈ 1.44
(see slides 46, 37)
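
The blood sugar figures can be checked with Python's standard `statistics` module. This is only a sketch: `pvariance` and `pstdev` divide by the number of values (10), which reproduces 2.063 and 1.44, while `variance` divides by n − 1 (9) and gives a slightly larger number.

```python
from statistics import mean, pstdev, pvariance, variance

# Fasting blood sugar for the 10 employees (mmol/l), as listed on the card.
sugar = [5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10]

m = mean(sugar)                  # 7.75
population_var = pvariance(sugar)  # sum of squared differences divided by n = 10
sample_var = variance(sugar)       # same sum divided by n - 1 = 9
sd = pstdev(sugar)                 # square root of the population variance

print(f"mean={m:.2f} variance(n)={population_var:.3f} "
      f"variance(n-1)={sample_var:.3f} SD={sd:.2f}")
```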

31
Q

Box plots are useful for —- and present —- key summary statistics for each group, which are: —-
- these are shown in a —-
- they display the —- by building a box between the —- and —- percentiles

A

comparing groups
5
The minimum, 25th percentile, 50th percentile (median), 75th percentile and maximum
simple visual display
interquartile range
25th and 75th
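
The five statistics behind a box plot can be sketched as below. The heights are invented for illustration, and the quartile convention (`method="inclusive"`) is an assumption; plotting software may use a slightly different rule.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, 25th percentile, median, 75th percentile, maximum)."""
    ordered = sorted(values)
    q1, med, q3 = quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, med, q3, ordered[-1])


# Hypothetical heights in cm for two groups (invented for illustration).
heights = {
    "men": [168, 172, 175, 178, 180, 183, 185, 188, 191],
    "women": [155, 158, 160, 163, 165, 167, 170, 172, 176],
}

summaries = {group: five_number_summary(vals) for group, vals in heights.items()}
for group, summary in summaries.items():
    print(group, summary)
```

Putting the two summaries side by side is what a grouped box plot does visually: the box spans the second to fourth numbers, and the whiskers reach the first and last.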

32
Q

Biomedical example:
- Mass spectrometry experiments where proteins are —- in —- samples from patients
- Prior to identifying biomarkers of interest:
– Boxplots for each sample can be used to identify —- with sample preparation or with calibration of the mass spectrometer
– Based on this, samples may then be —- or —-
– Note the whiskers, extending to min & max

A

quantified
biological
problems
excluded
re-aligned (normalization)

33
Q

An —- is a data point which is abnormally distant from the rest of the data
- We can modify a —- to show outliers:
– using a —- that is based on the IQR, we change the length of the whiskers
– Individual points —- the whiskers are shown as outliers
- We can then further investigate the nature of the outliers:
– Often they are valid observations: reporting —- is recommended

A

outlier
box plot
detection rule
outside
robust summary statistics
(see slides 51, 52 and 53)
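
The card does not say which IQR-based detection rule is meant. A common choice, assumed here, is to flag points more than 1.5 × IQR below the 25th percentile or above the 75th percentile. The `iqr_outliers` helper and the data are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Flag values lying more than k * IQR outside the quartiles (assumed rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical lengths of stay in days (invented for illustration).
stays = [1, 2, 2, 3, 3, 4, 5, 6, 60]

flagged = iqr_outliers(stays)
print(flagged)
```

In a modified box plot the whiskers would stop at the most extreme values that are not flagged, and the flagged points would be drawn individually.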

34
Q

  • In the examples for length of stay in ICU and BMI, there appeared to be an excess of high values
    – An excess of low or high values is called —-; it may be visualised with —-
  • A special case of data without skewness is the —-

A

skewness
dotplots, boxplots and histograms
normal distribution

35
Q

True or false:
The normal distribution is important because it fits many natural phenomena
* Many things we measure are approximately normal
– e.g. blood pressure & height
* But nothing is truly normal (it is a mathematical concept)