Summarising data Flashcards by Kayleigh Swainson

What are the two types of variables?

Categorical

Numerical

What is a categorical variable?

Give 3 subtypes:

Non-numerical (each value is associated with a category)

Ordered categorical (ordinal) e.g. social class = assigned a number
Unordered categorical (nominal) e.g. blood group = named group
Dichotomous/binary e.g. gender = two categories only

Give two subtypes of numerical variable:

Continuous e.g. height = infinite no. of distinct values
Discrete/counts e.g. number of siblings = only specific values are possible

What type of variable is severity (low, moderate or severe) of dental erosion?

Ordinal

What type of study is ALSPAC (Avon Longitudinal Study of Parents and Children)?

Prospective cohort study

Recruited pregnant women living in Avon with a due date between April 1991 and December 1992

How can one categorical variable be shown?

Bar chart

Pie chart

Frequency table

How can one continuous variable be presented?

Histogram

Bar chart (only after grouping the values into categories)

Pie chart (only after grouping the values into categories)

How can a categorical outcome and categorical exposure be presented?

Contingency (2 way) table

Outcome = columns

Exposure = rows

Each cell usually shows count and % within exposure = gives some idea of relationship between outcome and exposure
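A minimal sketch of such a table, assuming made-up records: each cell holds the count and the % within exposure (row percentage). The function name `contingency_table` and the data are hypothetical, used only for illustration.

```python
from collections import Counter

# Hypothetical (exposure, outcome) pairs, invented purely for illustration.
records = [
    ("exposed", "outcome"), ("exposed", "outcome"), ("exposed", "no outcome"),
    ("unexposed", "outcome"), ("unexposed", "no outcome"),
    ("unexposed", "no outcome"), ("unexposed", "no outcome"),
]

def contingency_table(pairs):
    """Return {exposure: {outcome: (count, percent within exposure)}}."""
    cell_counts = Counter(pairs)
    row_totals = Counter(exposure for exposure, _ in pairs)
    outcomes = sorted({outcome for _, outcome in pairs})
    table = {}
    for exposure in sorted(row_totals):
        table[exposure] = {
            outcome: (
                cell_counts[(exposure, outcome)],
                100 * cell_counts[(exposure, outcome)] / row_totals[exposure],
            )
            for outcome in outcomes
        }
    return table

table = contingency_table(records)
# One printed line per exposure row, one cell per outcome column.
for exposure, row in table.items():
    cells = ", ".join(f"{o}: {n} ({p:.0f}%)" for o, (n, p) in row.items())
    print(f"{exposure}: {cells}")
```

Comparing the row percentages (rather than the raw counts) is what gives the first idea of a relationship, because the exposure groups may differ in size.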

How can a numerical outcome and categorical exposure be presented?

Box and whisker plot (n.b. different people use the whiskers to represent different measures)

Can compare distributions in >2 groups

How can a numerical outcome and numerical exposure be presented?

Scatter plot (may include a regression line)

What is the most appropriate graph for displaying adult height according to social class?

Box and whisker plot

(adult height = outcome = continuous; social class = exposure = categorical)

What are the 3 different measures of central tendency?

Mean

Median (more useful if there are extreme/outlying values or data is not symmetrically distributed)

Mode (depends on precision of data -> if sufficiently precise, each reading can be distinguished from the others = mode won't exist)

N.B. these three measures can give different values for the same data!
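A small sketch of how the three measures behave, using Python's standard `statistics` module. The height values are hypothetical, and `summarise_centre` is an illustrative helper name, not something from the deck.

```python
import statistics

def summarise_centre(values):
    """Return the mean, median and list of modes for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),
    }

# Hypothetical heights in cm, invented for illustration.
heights = [150, 152, 152, 155, 158]
with_outlier = heights + [210]                        # one extreme value added
precise = [150.12, 152.37, 152.41, 155.08, 158.63]    # every reading distinct

for label, data in [("heights", heights),
                    ("with outlier", with_outlier),
                    ("precise", precise)]:
    print(label, summarise_centre(data))
```

The extreme value pulls the mean up far more than the median, and in the precise data every reading is returned as a "mode", which is the sense in which the mode stops being useful.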

What are the 4 measures of variability (extent of spread around the centre)?

Range (depends solely on the two extreme values = may be unrepresentative of the whole set)

Interquartile range (often more useful when the median is used instead of the mean)

Standard deviation (data must be approximately symmetrically distributed for it to be meaningful)

Variance (SD²)
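The four measures can be sketched with the standard `statistics` module. The data are hypothetical, `spread` is an illustrative helper name, and the quartiles use the module's default method (other software may use a slightly different quartile convention).

```python
import statistics

def spread(values):
    """Return range, interquartile range, sample SD and sample variance."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "sd": statistics.stdev(values),
        "variance": statistics.variance(values),
    }

# Hypothetical heights in cm, invented for illustration.
heights = [150, 152, 152, 155, 158]
with_outlier = heights + [210]

print(spread(heights))
print(spread(with_outlier))
```

Adding the single extreme value changes the range much more than the interquartile range, and the variance is the square of the standard deviation in both cases.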

What are the 3 types of distributions?

Normal (symmetrical)

Positively skewed (long tail to right)

Negatively skewed (long tail to left)

What do we do to positively skewed data to convert it to approximate normality?

Log transformation (must remember to transform any means and SDs back to the original units for comparison)

n.b. other transformations may be required for negatively skewed data

What is the geometric mean?

The exponential of the mean calculated from the log-transformed data
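As a sketch of that definition: take natural logs, average them, then exponentiate to get back to the original units. The values are hypothetical and `geometric_mean_via_logs` is an illustrative name.

```python
import math
import statistics

def geometric_mean_via_logs(values):
    """Exponential of the mean of the logged values (values must be > 0)."""
    logged = [math.log(v) for v in values]
    return math.exp(statistics.mean(logged))

# Hypothetical positively skewed data, invented for illustration.
values = [1, 10, 100]

print("arithmetic mean:", statistics.mean(values))
print("geometric mean: ", geometric_mean_via_logs(values))
```

For these values the geometric mean sits at about 10, while the arithmetic mean is pulled up to 37 by the largest value.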

What is a normal distribution?

95% of observations enclosed within mean +/- 1.96 SD

Mean & median = identical

SD determines shape (small = tall and narrow, large = shorter and fatter)

What is a reference range?

A further measure of variability = amount of variation between individual observations of data = used in clinic to determine whether a patient is clinically normal or not

e.g. 95% reference range = mean +/- 1.96 SD

Can also have 90 & 99% reference ranges etc.

Defined using properties of normal distribution
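A minimal sketch of the calculation, assuming the variable is normally distributed. The mean and SD below are hypothetical, `reference_range` is an illustrative name, and the multiplier is taken from the standard normal distribution (about 1.96 for 95%).

```python
from statistics import NormalDist

def reference_range(mean, sd, coverage=0.95):
    """Return (lower, upper) limits enclosing `coverage` of a normal distribution."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # roughly 1.96 when coverage is 0.95
    return mean - z * sd, mean + z * sd

# Hypothetical mean and SD, invented for illustration.
for coverage in (0.90, 0.95, 0.99):
    low, high = reference_range(100, 10, coverage)
    print(f"{coverage:.0%} reference range: {low:.1f} to {high:.1f}")
```

The higher the coverage, the wider the range, because a larger multiple of the SD is needed to enclose more of the distribution.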

The curve of a normal distribution with a large standard deviation is ________ than that of one with a small standard deviation

Shorter and wider

What are the 3 measures of outcome occurrence?

(other types of summary measures)

Prevalence

Incidence

Incidence rate

What do prevalence and incidence calculations exclude?

Cure

Death

Emigration of ill people

What is the link between prevalence and incidence?

Prevalence = incidence × average outcome duration

(holds only if prevalence < 0.1 and prevalence/incidence is constant)
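A sketch of the approximation with hypothetical numbers. The function name `approximate_prevalence` is illustrative, the incidence is assumed to be expressed per person per year, and the steady-state condition is simply assumed rather than checked.

```python
def approximate_prevalence(incidence_per_year, mean_duration_years):
    """Prevalence ~ incidence x average duration, valid only when prevalence < 0.1."""
    prevalence = incidence_per_year * mean_duration_years
    if prevalence >= 0.1:
        # Outside the range in which the approximation is stated to hold.
        raise ValueError("approximation only holds for prevalence < 0.1")
    return prevalence

# Hypothetical: 5 new cases per 1000 people per year, lasting 4 years on average.
print(approximate_prevalence(0.005, 4))
```

With incidence fixed, doubling the average duration doubles the approximate prevalence, which is why longer survival raises prevalence without any change in incidence.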

What tells us how many new cases have occurred in a particular time period?

Incidence

How do we determine an association between two continuous variables?

Examine data graphically (scatter plot = initial feeling of relationship)

Statistical quantification of linear association = correlation (closely related to linear regression)

Pearson's correlation coefficient quantifies the linear association between two variables in terms of what?

Direction and strength

What values does Pearson's correlation coefficient range between?

+1 = variables tend to have higher or lower values together (closer association = closer to +1)
0 = variables not linearly associated
-1 = high values of one variable tend to be associated with low values of the other (closer association = closer to -1)

What does a perfect correlation look like?

What is an absent correlation and what does it look like?

No linear association between variables (may be quadratic though)
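A short sketch of Pearson's correlation coefficient computed from its textbook formula, using three hypothetical data sets: a perfect positive line, a perfect negative line, and a quadratic relationship with no linear association. The name `pearson_r` is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for illustration.
xs = [-2, -1, 0, 1, 2]
rising = [2 * x + 1 for x in xs]   # perfect positive linear relationship
falling = [-x for x in xs]         # perfect negative linear relationship
curved = [x ** 2 for x in xs]      # strong but purely quadratic relationship

print(pearson_r(xs, rising), pearson_r(xs, falling), pearson_r(xs, curved))
```

The quadratic case shows why a coefficient near 0 only rules out a linear association: the variables are clearly related, just not along a straight line.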

What type of variable is sex, and how is it best presented graphically?

Categorical (binary); bar chart or pie chart

What type of variable is age, and how is it best presented graphically?

Numerical (continuous); histogram

What type of variable is ethnicity, and how is it best presented graphically?

Categorical (nominal); bar chart or pie chart

What type of variable is height, and how is it best presented graphically?

Numerical (continuous); histogram

What type of variable is social class group, and how is it best presented graphically?

Categorical (ordinal); bar chart or pie chart

What type of variable is number of fillings, and how is it best presented graphically?

Numerical (discrete); bar chart

What type of variable is fat mass, and how is it best presented graphically?

Numerical (continuous); histogram

If asked to describe what each of the summary statistics tells us about two variables make sure to:

Write the actual values out from the table and explain what the summary statistic is, e.g. the middle value of the ranked data (the median) is 150.7 cm

How can you tell from summary statistics if data is normally distributed or not?

If the mean and median are very similar = roughly symmetrical, so consistent with a normal distribution

Which gives a better representation of the average, the arithmetic mean or the geometric mean?

Geometric -> not overly influenced by the very large values in a skewed distribution

What does age standardisation mean? And when is it appropriate?

Adjusting the rates to minimise the effects of differences in age composition when comparing across different populations.

Appropriate when comparing a statistic across populations with differing age distributions (otherwise results could be misleading)
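One common way to do this is direct standardisation, which is an assumption here since the card does not name a method: apply each population's age-specific rates to a shared standard population. All names and numbers below are hypothetical.

```python
def crude_rate(age_specific_rates, population):
    """Overall rate implied by age-specific rates and a population's age structure."""
    cases = sum(age_specific_rates[group] * population[group] for group in population)
    return cases / sum(population.values())

def directly_standardised_rate(age_specific_rates, standard_population):
    """Rate the population would show if it had the standard age structure."""
    return crude_rate(age_specific_rates, standard_population)

# Hypothetical: identical age-specific rates, very different age structures.
rates = {"under 50": 0.01, "50 and over": 0.10}
town_a = {"under 50": 900, "50 and over": 100}
town_b = {"under 50": 100, "50 and over": 900}
standard = {"under 50": 500, "50 and over": 500}

print("crude:", crude_rate(rates, town_a), crude_rate(rates, town_b))
print("standardised:", directly_standardised_rate(rates, standard))
```

The crude rates differ several-fold purely because one town is older, whereas the standardised rate is the same for both, which is the misleading comparison that standardisation is meant to prevent.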

What two reasons may cause an increase in prevalence of disease in a population while the incidence remains fairly constant?

* Average duration of disease increases due to improvements in treatment that prolong life
* Average duration of disease increases due to improvements in diagnosis, i.e. earlier diagnosis (prolongs the period in which people know they have the disease)

What type of variable is number of siblings?

Numerical (discrete) = a count: you cannot have 0.4 siblings, so it is not a continuum

How could you display numerical exposure and outcome data?

Scatter plot

Which of the following is NOT a measure of variability in the population?
* Interquartile range
* Standard error
* Standard deviation

Standard error = used to make inferences outside of the people we are measuring
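To illustrate the difference, the sketch below uses the usual formula for the standard error of a mean, SD divided by the square root of the sample size. That formula is not stated on the card and is added here as an assumption; the data and the helper name are hypothetical.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Standard error of the sample mean: sample SD / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical heights in cm, invented for illustration.
sample = [150, 152, 152, 155, 158]

print("SD:", statistics.stdev(sample))        # spread of individual observations
print("SE:", standard_error_of_mean(sample))  # uncertainty in the estimated mean
```

The SD describes how much individuals vary, whereas the SE shrinks as the sample grows because it describes how precisely the mean has been estimated.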

The location and variability of a normally distributed variable are usually summarised by which of the following:
* Median & IQR
* Mean & SD
* Mode & range

Mean & SD n.b. mode, median and mean are equivalent in a normal distribution

Which of the following is a false statement about reference ranges?
1. They can be interpreted as likely values for an individual in the population
2. 90, 95 and 99% reference ranges can all be calculated
3. They are a measure of location

3. They are a measure of location (false: they are a measure of variability)

In a population of adults, if the number of teeth remaining has mean 30 and SD 4, could these data be normally distributed?

No -> cannot have more than 32 teeth! If it were a normal distribution we would assume that over 32 teeth could exist
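The argument can be checked numerically with a sketch that takes the mean of 30 and SD of 4 from the question and treats the data as if they really were normal. The variable names are illustrative.

```python
from statistics import NormalDist

# Mean and SD from the question; normality is the assumption being tested.
teeth = NormalDist(mu=30, sigma=4)

share_above_32 = 1 - teeth.cdf(32)   # proportion a normal model puts above 32
upper_limit = 30 + 1.96 * 4          # upper end of the 95% reference range

print(f"share above 32 teeth: {share_above_32:.1%}")
print(f"upper 95% limit: {upper_limit:.2f} teeth")
```

A normal model would place roughly three in ten adults above the maximum possible number of teeth, so the data must instead be negatively skewed.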

For a particular outcome, what is the definition of prevalence?

Proportion with the outcome at a particular point in time

Should pairs of continuous variables always be examined graphically before analysis to check for non-linear associations?

Yes

(48 cards)