Week 9 Kuracloud: Measuring and Summarising Data Flashcards

1
Q

Statistics

=

(Kirkwood & Sterne. Essential Medical Statistics, 2nd ed., 2010)

A

= “the science of collecting, summarising, presenting and interpreting data, and of using them to estimate the magnitude of associations and test hypotheses”

(Kirkwood & Sterne. Essential Medical Statistics, 2nd ed., 2010)

2
Q

Descriptive Statistics

A

= describes features of data sample
“summarising, presenting and interpreting data”

3
Q

Inferential Statistics

A

= infer findings of sample to target population
“estimate the magnitude of associations and test hypotheses”

4
Q

Data

=

A

= “a set of values of subjects with respect to qualitative or quantitative variables”

5
Q

Raw Data

=

A

= observations

6
Q

Data set

=

A

= collection of information regarding a group of people or other items

7
Q

Variables

=, 2

A

= characteristics that you can measure or observe and may take any one of a specified set of values
- Numerical (quantitative) (or interval/ratio data)
- Categorical (qualitative)

8
Q

Categorical Variables

2,1

A
  • ordered/ordinal = place observations in categories that have a natural order (rank)
  • unordered/nominal = place observations in named, unordered groups
    • dichotomous/binary
9
Q

Numerical Variables

2

A
  • continuous = on a continuous scale, can take any value in a range
  • discrete = can take only distinct, separate values, usually counts
10
Q

Derived variable

=,

A

= new variable created from existing variable
e.g. variable measured as numerical –> grouped into categorical
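
A minimal sketch of deriving a categorical variable from a numerical one. The function name `bmi_category`, the example values, and the choice of BMI with the commonly used adult cut-offs (18.5, 25, 30) are illustrative assumptions, not part of the original card.

```python
def bmi_category(bmi: float) -> str:
    """Derive a categorical variable from a numerical BMI value.

    The cut-offs below are assumed for illustration only.
    """
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"


# Made-up numerical observations (existing variable).
bmi_values = [17.2, 22.4, 27.9, 31.5]

# New (derived) categorical variable, one value per observation.
bmi_groups = [bmi_category(b) for b in bmi_values]
print(bmi_groups)
```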

11
Q

Spreadsheets of datasets

3

A
  • Columns: each represents 1 variable (first usually identifier)
  • Rows: each represents data for 1 person (record)
  • Cells: value of 1 variable for 1 person = observation
12
Q

Outcome variable

=, (3)

A

= focus of attention, we try to explain its variation
(dependent variable/response variable/y-variable)

13
Q

Exposure Variable

=, (3)

A

= influences variation of outcome variable
(independent variable/predictor variable/x-variable)

14
Q

Operationalising Variables

=,

A

= deciding which category designates an individual as having the outcome or as being exposed
this choice dictates interpretation of results

15
Q

Nominal (unordered categorical) variable measurement

2

A
  • frequencies (no. observations in each category)
  • proportions (relative frequencies)
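
A small sketch of how frequencies and proportions could be computed for a nominal variable. The blood group data are made up for illustration.

```python
from collections import Counter

# Made-up observations of a nominal variable.
blood_groups = ["A", "O", "O", "B", "A", "O", "AB", "O"]

# Frequencies: number of observations in each category.
frequencies = Counter(blood_groups)

# Proportions (relative frequencies): frequency / total number of observations.
total = sum(frequencies.values())
proportions = {group: count / total for group, count in frequencies.items()}

for group, count in frequencies.items():
    print(f"{group}: n = {count}, proportion = {proportions[group]:.2f}")
```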
16
Q

Ordinal (ordered categorical) measurement

2

A
  • frequencies
  • proportions
  • sometimes means and medians
17
Q

Numerical (interval/ratio) measurement

3

A
  • mean
  • median
  • standard deviation
18
Q

Nominal (unordered categorical) graphical representation

3

A
  • pie chart
  • column/bar graph
  • stacked column/bar graph
19
Q

Ordinal (ordered categorical) graphical representation

1

A
  • column/bar graph
20
Q

Numerical (interval/ratio) graphical representation

4

A
  • bar graph (data grouped)
  • histogram (data grouped)
  • box and whisker plot (summary statistics)
  • line graph (over time)
21
Q

Relative frequencies

=, 3

A

= proportion/percentage of total number
presented in:
- table
- bar graph
- pie chart

22
Q

Epidemiological prevalence or cumulative incidence

2

A

Presentation: proportion/percentage
Type: dichotomous categorical variables

23
Q

Frequency distribution

=, 2, 2

A

= distribution of values of a numerical variable
- first step in analysing numerical data
- displayed in a histogram
- for discrete: individual frequencies displayed
- for continuous: frequencies of formed groups/ranges
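
A rough sketch of building a frequency distribution for a continuous variable by forming groups. The ages, the bin width of 10, and the helper name `bin_label` are assumptions for illustration; the text bars stand in for a histogram.

```python
from collections import Counter

# Made-up ages in whole years.
ages = [23, 35, 31, 47, 52, 38, 29, 41, 36, 58]
bin_width = 10  # assumed group width


def bin_label(value: int, width: int = bin_width) -> str:
    """Return the label of the group (range) that a value falls into."""
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"


# Frequency of observations in each formed group.
grouped = Counter(bin_label(age) for age in ages)

# Crude text histogram: one '#' per observation in the group.
for label in sorted(grouped):
    print(label, "#" * grouped[label])
```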

24
Q

Histogram vs Bar graph

A

histogram has no gaps between bars because it displays continuous data; bar graph has gaps between bars because its categories are separate

25
Q

Histograms show us:

5

A
  • spread
  • skew
  • mode
  • gaps
  • unusual values
26
Q

Histogram Shapes

A
  • positively skewed
  • symmetrical
  • negatively skewed
27
Q

Positively Skewed

=,

A

= asymmetrical distribution in which “upper tail is longer than lower tail” (higher frequency at left/lower values)
^\__
mean > median
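
As an illustration of why the mean exceeds the median here, consider a made-up sample with a long upper tail (the variable name and values are hypothetical). The one large value pulls the mean up but barely moves the median.

```python
from statistics import mean, median

# Made-up lengths of hospital stay in days: most values low, one long upper tail value.
length_of_stay = [1, 2, 2, 3, 3, 4, 5, 21]

sample_mean = mean(length_of_stay)
sample_median = median(length_of_stay)

print("mean:", sample_mean)
print("median:", sample_median)
print("mean > median:", sample_mean > sample_median)
```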

28
Q

Symmetrical

=,

A

= symmetrical distribution around centre, bell curve, normal distribution, Gaussian distribution
_/^_
mean, median, mode almost equal

29
Q

Negatively Skewed

=,

A

= asymmetrical distribution in which “lower tail is longer than upper tail” (higher frequency at higher/right values)
__/^
mean < median

30
Q

Measures of Central Tendency

3

A
  • mean
  • median
  • mode
31
Q

Measures of Variability

3

A
  • range
  • interquartile range/IQR (difference between 1st and 3rd quartiles)
  • standard deviation
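
A short sketch of the three measures using Python's standard `statistics` module on made-up data. Quartiles can be calculated by several slightly different conventions; this example simply uses the module's default method, so other software may give slightly different quartile values.

```python
from statistics import quantiles, stdev

# Made-up numerical observations.
data = [2, 4, 4, 5, 7, 9, 11]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Interquartile range: 3rd quartile minus 1st quartile.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Sample standard deviation (divides by n - 1).
sd = stdev(data)

print("range:", data_range)
print("IQR:", iqr)
print("SD:", round(sd, 2))
```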
32
Q

Standard deviation (SD)

A

= measure of spread about mean
calculation:
1. Take the difference of each observation from the mean (deviations)
2. Square each deviation
3. Add the squared deviations together
4. Divide by no. observations - 1 (= variance = SD squared)
5. Take the square root
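
The calculation steps above can be sketched as follows, using made-up observations. The result is compared against `statistics.stdev`, which also divides by n - 1.

```python
from math import sqrt
from statistics import stdev

# Made-up observations.
observations = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(observations)
sample_mean = sum(observations) / n

# Step 1: deviation of each observation from the mean.
deviations = [x - sample_mean for x in observations]

# Step 2: square each deviation.
squared_deviations = [d ** 2 for d in deviations]

# Step 3: add the squared deviations together.
sum_of_squares = sum(squared_deviations)

# Step 4: divide by (number of observations - 1) to get the variance.
variance = sum_of_squares / (n - 1)

# Step 5: square root of the variance gives the SD.
sd = sqrt(variance)

print("variance:", round(variance, 3))
print("SD:", round(sd, 3))
print("matches statistics.stdev:", abs(sd - stdev(observations)) < 1e-12)
```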

33
Q

Theoretical Frequency Distribution/Standard Normal Distribution properties
(or PDF = probability density function)

8

A
  • symmetrical about mean (bell curve)
  • mean = 0, SD = 1
  • tall and narrow for small SD, short and wide for large SD
  • 68% lie within 1 SD of mean
  • 95% lie within 2 (actually 1.96) SDs of mean
  • 99.7% lie within 3 SDs of mean
  • use mean and SD to find proportion lying between any two values
  • probability of any specific value is 0
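
The proportions listed above can be checked with `statistics.NormalDist` from the Python standard library. This is a sketch only; the helper name `proportion_within` is introduced here for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean = 0, SD = 1.
standard_normal = NormalDist(mu=0, sigma=1)


def proportion_within(k: float) -> float:
    """Proportion of the distribution lying within k SDs either side of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 1.96, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.4f}")

# Proportion lying between any two values, here between -0.5 and 1.5.
between = standard_normal.cdf(1.5) - standard_normal.cdf(-0.5)
print(f"between -0.5 and 1.5: {between:.4f}")

# The multiplier that encloses the central 95% (2.5% in each tail).
print(f"z for central 95%: {standard_normal.inv_cdf(0.975):.2f}")
```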
34
Q

95% reference range/central reference range

=

A

= range of expected normal values in a population, values that enclose 95% of the population (1.96, or approximately 2, SDs either side of mean)
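
A minimal sketch of the reference range calculation, assuming the variable is normally distributed. The function name `reference_range_95` and both sets of mean and SD values are hypothetical. The second call shows how a large SD relative to the mean produces an impossible (negative) lower limit, which hints that the distribution may be skewed.

```python
def reference_range_95(mean: float, sd: float) -> tuple[float, float]:
    """Return (lower, upper) limits enclosing the central 95% of a normal distribution."""
    half_width = 1.96 * sd
    return mean - half_width, mean + half_width


# Hypothetical roughly normal variable: mean 140, SD 10.
lower, upper = reference_range_95(mean=140.0, sd=10.0)
print(f"95% reference range: {lower:.1f} to {upper:.1f}")

# Hypothetical variable that cannot be negative, with an SD larger than its mean.
skew_lower, skew_upper = reference_range_95(mean=2.0, sd=3.0)
print(f"lower limit {skew_lower:.2f} is negative, so normality is doubtful")
```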

35
Q

Assumption of Normality

=, 2

A

= assuming values of a continuous variable are normally distributed before calculations
Distribution may be skewed if:
1. Mean and median are very different
2. SD is very large relative to the mean, so the 95% reference range falls outside of possible values (e.g. is negative)

36
Q

Aggregated Data

=

A

= units of observation are combined (group level), not individual level

37
Q

Univariate analysis

=

A

= describes single variable

38
Q

Bivariate analysis

=,

A

= relationship between 2 variables
- exposure –> outcome, test hypothesis

39
Q

When both variables categorical:

4

A

display relationship by cross-tabulating in a contingency table
- rows: exposure
- columns: outcomes (the "no outcome" column can be omitted if percentages are shown)
used to calculate odds ratios
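
A small sketch of cross-tabulating two categorical variables into a contingency table, with exposure as rows and outcome as columns. The records and category labels are made up for illustration.

```python
from collections import Counter

# Made-up individual records: (exposure, outcome).
records = (
    [("exposed", "outcome")] * 3
    + [("exposed", "no outcome")] * 7
    + [("unexposed", "outcome")] * 1
    + [("unexposed", "no outcome")] * 9
)

# Cell counts of the contingency table.
table = Counter(records)

# Row totals, used to express each cell as a row percentage.
row_totals = Counter(exposure for exposure, _ in records)
row_percent = {
    (exposure, outcome): 100 * count / row_totals[exposure]
    for (exposure, outcome), count in table.items()
}

for exposure in ("exposed", "unexposed"):
    for outcome in ("outcome", "no outcome"):
        cell = (exposure, outcome)
        print(f"{exposure:10} {outcome:11} n = {table[cell]:2}  ({row_percent[cell]:.0f}%)")
```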

40
Q

Categorical Measures of association

3

A
  • odds ratio = strength of association between two yes/no variables (odds of outcome in exposed / odds of outcome in unexposed)
  • risk ratio (only in longitudinal)
  • prevalence ratio (good for cross-sectional)
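
A hedged sketch of the odds ratio and risk ratio from a 2x2 table. The cell labels a, b, c, d and the counts are assumptions for illustration: a and b are the exposed with and without the outcome, c and d the unexposed with and without the outcome. A prevalence ratio uses the same arithmetic as the risk ratio, applied to cross-sectional data.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds of outcome in the exposed divided by odds of outcome in the unexposed."""
    return (a / b) / (c / d)


def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """Risk of outcome in the exposed divided by risk of outcome in the unexposed."""
    return (a / (a + b)) / (c / (c + d))


# Made-up counts: exposed 30 with outcome, 70 without; unexposed 10 with, 90 without.
a, b, c, d = 30, 70, 10, 90

print("odds ratio:", round(odds_ratio(a, b, c, d), 2))
print("risk ratio:", round(risk_ratio(a, b, c, d), 2))
```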
41
Q

When both variables numerical

A

Scatterplot
- x-axis: exposure
- y-axis: outcome

42
Q

Numerical Measures of Association

===,4

A

r = correlation coefficient = strength of linear association between two continuous variables = number of SDs that the outcome changes for a 1 SD change in the exposure
- always between -1 and 1
- r < 0: inverse (negative) correlation
- r = 0: no linear association
- r > 0: positive correlation
- r = 1: perfect positive correlation, straight line
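
A minimal sketch of calculating the (Pearson) correlation coefficient by hand. The function name `pearson_r` and the data are made up for illustration.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between paired values x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of paired deviations from the means.
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / sqrt(sum_xx * sum_yy)


exposure = [1, 2, 3, 4, 5]

# Outcome rises exactly in step with exposure: straight line, r = 1.
print(round(pearson_r(exposure, [2, 4, 6, 8, 10]), 3))

# Outcome falls exactly as exposure rises: inverse correlation, r = -1.
print(round(pearson_r(exposure, [10, 8, 6, 4, 2]), 3))

# Noisy made-up outcome: r lies somewhere between -1 and 1.
print(round(pearson_r(exposure, [2, 1, 4, 3, 5]), 3))
```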