Week 9 Kuracloud: Measuring and Summarising Data Flashcards by Unknown Unknown

Statistics

(Kirkwood & Sterne. Essential Medical Statistics, 2nd ed., 2010)

= “the science of collecting, summarising, presenting and interpreting data, and of using them to estimate the magnitude of associations and test hypotheses”

Descriptive Statistics

= describes features of data sample
“summarising, presenting and interpreting data”

Inferential Statistics

= generalise (infer) findings from a sample to the target population
“estimate the magnitude of associations and test hypotheses”

Data

= “a set of values of subjects with respect to qualitative or quantitative variables”

Raw Data

= observations

Data set

= collection of information regarding a group of people or other items

Variables

=, 2

= characteristics that you can measure or observe and may take any one of a specified set of values
- Numerical (quantitative) (or interval/ratio data)
- Categorical (qualitative)

Categorical Variables

2,1

ordered/ordinal = categories have a natural order (can be ranked)
unordered/nominal = place observations in named, unordered groups
- dichotomous/binary (only 2 categories)

Numerical Variables

continuous = on a continuous scale, can take any value in a range
discrete = can only take certain separate values, usually countable

Derived variable

= new variable created from an existing variable
e.g. a variable measured as numerical recoded as categorical (numerical --> categorical)
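
A minimal sketch of deriving a categorical variable from a numerical one. The helper `age_group` and its cut-points (<18, 18-64, 65+) are hypothetical; a real study would choose cut-points to suit its question.

```python
def age_group(age_years: float) -> str:
    """Derive a categorical age band from a numerical age (hypothetical cut-points)."""
    if age_years < 18:
        return "<18"
    if age_years < 65:
        return "18-64"
    return "65+"


ages = [12, 34, 70, 18, 64.5]              # original numerical variable (made-up values)
age_groups = [age_group(a) for a in ages]  # derived categorical variable
print(age_groups)  # ['<18', '18-64', '65+', '18-64', '18-64']
```

Recoding loses information (34 and 64.5 fall in the same band), which is one reason the choice of categories affects interpretation.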

Spreadsheets of datasets

Columns: each represents 1 variable (first usually identifier)
Rows: each represents data for 1 person (record)
Cells: value of 1 variable for 1 person = observation
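
The same row/column layout can be sketched in Python as a list of records. The variable names (`id`, `age`, `sex`, `smoker`) and values are made up for illustration; in practice a spreadsheet or data-frame tool would hold this.

```python
# Each dict is one row (record) for one person; each key is one column (variable).
dataset = [
    {"id": 1, "age": 34, "sex": "F", "smoker": True},
    {"id": 2, "age": 58, "sex": "M", "smoker": False},
    {"id": 3, "age": 21, "sex": "F", "smoker": False},
]

# One column = all values of one variable across the records.
ages = [record["age"] for record in dataset]

# One cell = the value of one variable for one person (an observation).
observation = dataset[1]["sex"]

print(ages)         # [34, 58, 21]
print(observation)  # M
```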

Outcome variable

=, (3)

= focus of attention, we try to explain its variation
(dependent variable/response variable/y-variable)

Exposure Variable

=, (3)

= influences variation of outcome variable
(independent variable/predictor variable/x-variable)

Operationalising Variables

= deciding which category designates an individual as having the outcome/being exposed
this choice dictates how results are interpreted

Nominal (unordered categorical) variable measurement

frequencies (no. observations in each category)
proportions (relative frequencies)
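
A minimal sketch of frequencies and proportions for a nominal variable, using made-up blood-group data; the same counting works for ordinal categories.

```python
from collections import Counter

# Hypothetical nominal variable: blood group of 8 people.
blood_groups = ["A", "O", "B", "O", "AB", "A", "O", "O"]

frequencies = Counter(blood_groups)  # no. observations in each category
n = len(blood_groups)
proportions = {group: count / n for group, count in frequencies.items()}  # relative frequencies

for group, count in frequencies.most_common():
    print(f"{group:>2}: {count} ({proportions[group]:.1%})")
```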

Ordinal (ordered categorical) measurement

frequencies
proportions
sometimes means and medians

Numerical (interval/ratio) measurement

mean
median
standard deviation

Nominal (unordered categorical) graphical representation

pie chart
column/bar graph
stacked column/bar graph

Ordinal (ordered categorical) graphical representation

column/bar graph

Numerical (interval/ratio) graphical representation

bar graph (data grouped)
histogram (data grouped)
box and whisker plot (summary statistics)
line graph (over time)

Relative frequencies

=, 3

= proportion/percentage of total number
presented in:
- table
- bar graph
- pie chart

Epidemiological prevalence or cumulative incidence

Presentation: proportion/percentage
Type: dichotomous categorical variables
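
A sketch with made-up data: prevalence is the proportion of a dichotomous (yes/no) variable coded "yes" at one point in time. Cumulative incidence is calculated the same way, but counts new cases over a period among people initially free of the outcome.

```python
# Hypothetical dichotomous variable: does each person currently have the condition?
has_condition = [True, False, False, True, False, False, False, False, False, False]

cases = sum(has_condition)  # True counts as 1, False as 0
prevalence = cases / len(has_condition)
print(f"prevalence = {cases}/{len(has_condition)} = {prevalence:.0%}")  # prevalence = 2/10 = 20%
```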

Frequency distribution

=, 2, 2

= distribution of values of a numerical variable
- first step in analysing numerical data
- displayed in a histogram
- for discrete: individual frequencies displayed
- for continuous: frequencies of formed groups/ranges
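
A minimal sketch of both kinds of frequency distribution. The helper names, the made-up data, and the grouping (start 150 cm, width 10 cm) are assumptions; choosing group widths is a judgment call.

```python
from collections import Counter


def discrete_frequencies(values):
    """Frequency of each individual value (discrete variable)."""
    return dict(sorted(Counter(values).items()))


def grouped_frequencies(values, start, width):
    """Frequencies of a continuous variable in groups [start + k*width, start + (k+1)*width).

    Empty groups are kept with count 0 so gaps show up in a histogram.
    """
    counts = Counter(int((v - start) // width) for v in values)
    return {
        (start + k * width, start + (k + 1) * width): counts[k]
        for k in range(min(counts), max(counts) + 1)
    }


children = [0, 2, 1, 2, 3, 0, 2]  # hypothetical: number of children per person
heights = [151.2, 158.0, 162.5, 165.1, 169.9, 170.0, 174.3, 191.8]  # hypothetical, cm

print(discrete_frequencies(children))  # {0: 2, 1: 1, 2: 3, 3: 1}
print(grouped_frequencies(heights, start=150, width=10))
```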

Histogram vs Bar graph

histogram has no gaps between bars because the data are continuous; a bar graph of categories has gaps between bars

Histograms show us: | 5

- spread
- skew
- mode
- gaps
- unusual values

Histogram Shapes

- positively skewed
- symmetrical
- negatively skewed

Positively Skewed | =,

= asymmetrical distribution in which the "upper tail is longer than lower tail" (higher frequency at left/lower values)
shape: ^\__
mean > median

Symmetrical | =,

= distribution symmetrical around its centre, e.g. the bell-shaped normal (Gaussian) distribution
shape: _/^\_
mean, median and mode almost equal

Negatively Skewed | =,

= asymmetrical distribution in which the "lower tail is longer than upper tail" (higher frequency at right/higher values)
shape: /^
mean < median

Measures of Central Tendency | 3

- mean
- median
- mode
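
A sketch of the three measures using Python's standard `statistics` module on made-up, positively skewed data (length of hospital stay in days); it also shows the long upper tail pulling the mean above the median.

```python
import statistics

# Hypothetical, positively skewed data: length of hospital stay (days).
stay_days = [2, 3, 3, 3, 4, 5, 6, 9, 15]

mean = statistics.mean(stay_days)      # sum / n
median = statistics.median(stay_days)  # middle value of the sorted data
mode = statistics.mode(stay_days)      # most frequent value

print(f"mean={mean:.2f}, median={median}, mode={mode}")  # mean=5.56, median=4, mode=3
# Positive skew: the long upper tail pulls the mean above the median.
```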

Measures of Variability | 3

- range
- interquartile range/IQR (difference between 1st and 3rd quartiles, Q3 - Q1)
- standard deviation
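
A sketch of the three measures of variability on the same made-up data, using `statistics.quantiles` and `statistics.stdev` (Python 3.8+). Note that software packages define quartiles slightly differently, so Q1, Q3 and the IQR may not match other tools exactly.

```python
import statistics

stay_days = [2, 3, 3, 3, 4, 5, 6, 9, 15]  # hypothetical length of stay (days)

data_range = max(stay_days) - min(stay_days)

# statistics.quantiles returns the three cut points Q1, Q2, Q3;
# its default method="exclusive" is only one of several quartile definitions.
q1, q2, q3 = statistics.quantiles(stay_days, n=4)
iqr = q3 - q1

sd = statistics.stdev(stay_days)  # sample SD (divides by n - 1)

print(f"range={data_range}, Q1={q1}, Q3={q3}, IQR={iqr}, SD={sd:.2f}")
# range=13, Q1=3.0, Q3=7.5, IQR=4.5, SD=4.13
```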

Standard deviation (SD)

= measure of spread about the mean
calculation:
1. take the difference of each observation from the mean (deviations)
2. square the deviations
3. add the squared deviations together
4. divide by no. observations - 1 (= variance = SD squared)
5. take the square root (= SD)
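
The five steps above, sketched directly in Python. `sample_sd` is a hypothetical helper and the blood-pressure values are made up; it needs at least 2 observations because of the n - 1 divisor.

```python
import math


def sample_sd(values):
    """Standard deviation following the five steps on the card (needs n >= 2)."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]  # 1. deviations from the mean
    squared = [d ** 2 for d in deviations]   # 2. square them
    sum_of_squares = sum(squared)            # 3. add the squared deviations
    variance = sum_of_squares / (n - 1)      # 4. divide by n - 1 (variance)
    return math.sqrt(variance)               # 5. square root = SD


sbp = [120, 124, 128, 132, 136]  # hypothetical systolic BP (mmHg)
print(round(sample_sd(sbp), 2))  # 6.32
```

Here the squared deviations are 64, 16, 0, 16, 64 (sum 160), so the variance is 160 / 4 = 40 and the SD is the square root of 40.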

Theoretical Frequency Distribution/Standard Normal Distribution properties (or PDF = probability density function) | 8

- symmetrical about mean (bell curve)
- mean = 0, SD = 1 (standard normal)
- tall and narrow for small SD, short and wide for large SD (normal distributions in general)
- 68% lie within 1 SD of mean
- 95% lie within 2 (more precisely 1.96) SDs of mean
- 99.7% lie within 3 SDs of mean
- use mean and SD to find proportion lying between any two values
- probability of any single exact value is 0 (continuous distribution)
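
A sketch checking the 68/95/99.7 figures with `statistics.NormalDist` (Python 3.8+). `proportion_within` is a hypothetical helper, and the blood-pressure distribution (mean 128, SD 6.3) is made up to show the "proportion between two values" calculation.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def proportion_within(k_sd):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k_sd) - standard_normal.cdf(-k_sd)


for k in (1, 1.96, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.4f}")
# approximately 0.6827, 0.9500, 0.9545, 0.9973

# Proportion between two values for a hypothetical variable (mean 128, SD 6.3):
bp = NormalDist(mu=128, sigma=6.3)
print(f"{bp.cdf(135) - bp.cdf(120):.3f}")
```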

95% reference range/central reference range | =

= range of expected normal values in a population: the values that enclose the central 95% of the population (mean ± 1.96, or roughly 2, SDs)
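
A minimal sketch of the mean ± 1.96 SD calculation, assuming the variable is normally distributed. `reference_range_95` is a hypothetical helper, and 5 made-up readings are far too few for a real reference range; they only keep the arithmetic easy to follow.

```python
import statistics


def reference_range_95(values):
    """Central 95% reference range, assuming the variable is normally distributed."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean - 1.96 * sd, mean + 1.96 * sd


sbp = [120, 124, 128, 132, 136]  # hypothetical systolic BP (mmHg)
low, high = reference_range_95(sbp)
print(f"95% reference range: {low:.1f} to {high:.1f} mmHg")  # 95% reference range: 115.6 to 140.4 mmHg
```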

Assumption of Normality | =, 2

= assuming values of a continuous variable are normally distributed before calculations
Distribution may be skewed if:
1. mean and median are very different
2. SD is very large, so the 95% reference range includes impossible values (e.g. negative values for a variable that cannot be negative)
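
A rough screening sketch of the two warning signs, not a formal test of normality. `skewness_flags` and the `min_possible` parameter are hypothetical; there is no fixed threshold for "very different" mean and median, so the helper only reports them side by side.

```python
import statistics


def skewness_flags(values, min_possible=0.0):
    """Report mean vs median, and whether the normal-based 95% reference range
    extends below the smallest possible value (a sign of skew)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)
    lower = mean - 1.96 * sd
    return {
        "mean": mean,
        "median": median,
        "sd": sd,
        "lower_limit": lower,
        "impossible_lower_limit": lower < min_possible,
    }


stay_days = [2, 3, 3, 3, 4, 5, 6, 9, 15]  # hypothetical; length of stay cannot be negative
flags = skewness_flags(stay_days, min_possible=0)
print(flags["mean"], flags["median"], flags["lower_limit"], flags["impossible_lower_limit"])
```

For these data the lower limit is about -2.5 days, an impossible value, which suggests the normal assumption does not hold.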

Aggregated Data | =

= units of observation are groups (data combined), not individuals

Univariate analysis | =

= describes single variable

Bivariate analysis | =,

= relationship between 2 variables
- exposure --> outcome, test hypothesis

When both variables categorical: | 4

display relationship by cross-tabulating in a contingency table
- rows: exposure
- columns: outcome
- the "no outcome" column can be dropped when percentages are shown
- used to calculate odds ratios
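
A minimal sketch of cross-tabulation with row percentages. `cross_tabulate` is a hypothetical helper and the smoking/cough data are made up.

```python
from collections import Counter


def cross_tabulate(exposure, outcome):
    """Contingency table: rows = exposure categories, columns = outcome categories."""
    counts = Counter(zip(exposure, outcome))
    rows = sorted(set(exposure))
    cols = sorted(set(outcome))
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}


# Hypothetical data for 8 people.
smoking = ["smoker", "smoker", "smoker", "non-smoker",
           "non-smoker", "non-smoker", "non-smoker", "smoker"]
cough = ["cough", "cough", "no cough", "no cough",
         "cough", "no cough", "no cough", "no cough"]

table = cross_tabulate(smoking, cough)
for row, cells in table.items():
    total = sum(cells.values())
    pct_cough = 100 * cells["cough"] / total  # row percentage with the outcome
    print(row, cells, f"{pct_cough:.0f}% with cough")
```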

Categorical Measures of association | 3

- odds ratio = strength of association between variables (odds = yes/no; OR = odds of outcome in exposed / odds of outcome in unexposed)
- risk ratio (only in longitudinal studies)
- prevalence ratio (good for cross-sectional studies)
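
A sketch of the ratios from a 2x2 table. The a/b/c/d layout, the helper `two_by_two_measures`, and the counts are assumptions for illustration; the risk ratio and prevalence ratio share one formula and differ in the kind of data (new cases followed over time vs existing cases at one time point).

```python
def two_by_two_measures(a, b, c, d):
    """Measures of association from a 2x2 table laid out as:

                    outcome   no outcome
        exposed        a          b
        unexposed      c          d
    """
    odds_exposed = a / b               # odds = yes / no
    odds_unexposed = c / d
    risk_exposed = a / (a + b)         # proportion with the outcome among exposed
    risk_unexposed = c / (c + d)
    return {
        "odds_ratio": odds_exposed / odds_unexposed,  # equals (a*d) / (b*c)
        # Risk ratio with longitudinal data, prevalence ratio with cross-sectional data.
        "risk_or_prevalence_ratio": risk_exposed / risk_unexposed,
    }


print(two_by_two_measures(a=20, b=80, c=10, d=90))
```

With these counts the odds ratio (2.25) is further from 1 than the risk ratio (2.0); the two are close only when the outcome is rare.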

When both variables numerical

Scatterplot
- x-axis: exposure
- y-axis: outcome

Numerical Measures of Association | ===,4

r = correlation coefficient
= strength of linear association between two continuous variables
= number of SDs the outcome changes for a 1 SD change in the exposure
- always between -1 and 1
- r < 0: inverse (negative) correlation
- r = 0: no linear association
- r > 0: positive correlation
- r = 1 (or -1): perfect correlation, points lie on a straight line
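
A sketch of Pearson's r computed from deviations about the means. `pearson_r` is a hypothetical helper (Python 3.10+ also offers `statistics.correlation`), the data are made up, and the function fails if either variable is constant (zero SD).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for paired numerical observations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


exposure = [1, 2, 3, 4, 5]  # hypothetical
outcome = [2, 4, 5, 4, 5]
print(round(pearson_r(exposure, outcome), 3))                        # 0.775
print(round(pearson_r(exposure, [2 * v + 1 for v in exposure]), 3))  # 1.0 (straight line)
print(round(pearson_r(exposure, [10 - v for v in exposure]), 3))     # -1.0 (inverse straight line)
```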

(42 cards)