Summarising data Flashcards

1
Q

What are the two types of variables?

A

Categorical

Numerical

2
Q

What is a categorical variable?

Give 3 subtypes:

A

Non-numerical (each value is associated with a category)

  • Ordered categorical (ordinal) e.g. social class = assigned a number
  • Unordered categorical (nominal) e.g. blood group = named group
  • Dichotomous/binary e.g. gender = two categories only
3
Q

Give two subtypes of numerical variable:

A
  • Continuous e.g. height = infinite no. of distinct values possible
  • Discrete/counts e.g. number of siblings = only specific values possible
4
Q

What type of variable is severity (low, moderate or severe) of dental erosion?

A

Ordinal

5
Q

What type of study is ALSPAC (Avon Longitudinal Study of Parents and Children)?

A

Prospective cohort study

Recruited pregnant women living in Avon with a due date between April 1991 and December 1992

6
Q

How can one categorical variable be shown?

A

Bar chart

Pie chart

Frequency table

7
Q

How can one continuous variable be presented?

A

Histogram

Bar chart or pie chart (only after grouping the values into categories)

8
Q

How can a categorical outcome and categorical exposure be presented?

A

Contingency (2 way) table

Outcome = columns

Exposure = rows

Each cell usually shows the count and the % within exposure = gives some idea of the relationship between outcome and exposure
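
As a minimal sketch of such a table, the Python below builds one from made-up records. The exposure labels ("smoker"/"non-smoker"), the outcome labels and the helper name `contingency_table` are all hypothetical, chosen only for illustration. Rows are exposure groups, columns are outcome categories, and each cell holds the count and the % within its exposure row.

```python
from collections import Counter

# Hypothetical (exposure, outcome) records, invented for illustration only.
records = [
    ("smoker", "disease"), ("smoker", "disease"),
    ("smoker", "no disease"), ("smoker", "no disease"),
    ("non-smoker", "disease"), ("non-smoker", "no disease"),
    ("non-smoker", "no disease"), ("non-smoker", "no disease"),
]


def contingency_table(pairs):
    """Return {exposure: {outcome: (count, percent_within_exposure)}}."""
    cell_counts = Counter(pairs)
    row_totals = Counter(exposure for exposure, _ in pairs)
    outcomes = sorted({outcome for _, outcome in pairs})
    table = {}
    for exposure, total in row_totals.items():
        table[exposure] = {
            outcome: (
                cell_counts[(exposure, outcome)],
                100 * cell_counts[(exposure, outcome)] / total,
            )
            for outcome in outcomes
        }
    return table


table = contingency_table(records)
for exposure, row in table.items():
    cells = ", ".join(f"{o}: {n} ({p:.0f}%)" for o, (n, p) in row.items())
    print(f"{exposure:<12}{cells}")
```

Comparing the row percentages (here 50% vs 25% with the outcome) is what gives the first impression of an association.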

9
Q

How can a numerical outcome and categorical exposure be presented?

A

Box and whisker plot (n.b. different people use the whiskers to represent different measures)

Can compare distributions in >2 groups

10
Q

How can a numerical outcome and numerical exposure be presented?

A

Scatter plot (may include a regression line)

11
Q

What is the most appropriate graph for displaying adult height according to social class?

A

Box and whisker plot

(adult height = outcome = continuous; social class = exposure = categorical)

12
Q

What are the 3 different measures of central tendency?

A

Mean

Median (more useful if there are extreme/outlying values or data is not symmetrically distributed)

Mode (depends on precision of data -> if sufficiently precise, each reading can be distinguished from the others = mode won't exist)

N.B. these three measures can give different values for the same data!
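
A small sketch of the three measures using Python's standard `statistics` module. The sample is hypothetical and deliberately includes one extreme value to show the mean being pulled away from the median.

```python
import statistics

# Hypothetical positively skewed sample (one extreme value), for illustration.
values = [2, 3, 3, 4, 5, 6, 40]

mean = statistics.mean(values)        # pulled upwards by the outlier 40
median = statistics.median(values)    # middle value of the ranked data
modes = statistics.multimode(values)  # most frequent value(s)

print(f"mean={mean}, median={median}, modes={modes}")
```

With these numbers the mean (9) is more than double the median (4), which is why the median is often preferred when there are outlying values.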

13
Q

What are the 4 measures of variability (extent of spread around the centre)?

A

Range (depends solely on the two extreme values = may be unrepresentative of the whole set)

Interquartile range (often more useful if use the median instead of the mean)

Standard deviation (data must be approximately symmetrically distributed for it to be meaningful)

Variance (SD²)
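
The four measures can be sketched with Python's standard `statistics` module. The data are hypothetical, and the quartiles use the module's default "exclusive" method; other packages may use slightly different quartile conventions and give a slightly different IQR.

```python
import statistics

# Hypothetical sample, invented for illustration.
values = [2, 4, 4, 5, 7, 9, 11]

value_range = max(values) - min(values)         # uses only the two extremes
q1, q2, q3 = statistics.quantiles(values, n=4)  # quartiles (default method)
iqr = q3 - q1                                   # spread of the middle 50%
sd = statistics.stdev(values)                   # sample standard deviation
variance = statistics.variance(values)          # sample variance = SD squared

print(f"range={value_range}, IQR={iqr}, SD={sd:.2f}, variance={variance}")
```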

14
Q

What are the 3 types of distributions?

A

Normal (symmetrical)

Positively skewed (long tail to right)

Negatively skewed (long tail to left)

15
Q

What do we do to positively skewed data to convert it to approximate normality?

A

Log transformation (must remember to transform any means and SDs back to the original units for comparison)

n.b. other transformations may be required for negatively skewed data

16
Q

What is the geometric mean?

A

The exponential of the mean of the log-transformed data
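
A minimal sketch of that calculation in Python, using a hypothetical skewed sample: log the data, take the mean, then exponentiate to return to the original units. The standard library's `statistics.geometric_mean` is used only as a cross-check.

```python
import math
import statistics

# Hypothetical positively skewed data, invented for illustration.
values = [1, 10, 100]

logged = [math.log(v) for v in values]               # log transformation
geometric_mean = math.exp(statistics.mean(logged))   # back-transform the mean
arithmetic_mean = statistics.mean(values)            # for comparison

print(f"geometric={geometric_mean:.1f}, arithmetic={arithmetic_mean}")
```

Here the geometric mean is about 10 while the arithmetic mean is 37, because the arithmetic mean is dominated by the single large value.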

17
Q

What is a normal distribution?

A

95% of observations enclosed within mean +/- 1.96 SD

Mean & median = identical

SD determines shape (small = tall and narrow, large = shorter and fatter)

18
Q

What is a reference range?

A

A further measure of variability = amount of variation between individual observations of data = used in clinic to determine if a patient is clinically normal or not

e.g. 95% reference range = mean +/- 1.96 SD

Can also have 90 & 99% reference ranges etc.

Defined using properties of normal distribution
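
As a sketch, a reference range can be computed from the sample mean and SD, with the multiplier taken from the standard normal distribution. The function name `reference_range` and the data are hypothetical, and the calculation assumes the variable is approximately normally distributed.

```python
from statistics import NormalDist, mean, stdev


def reference_range(values, level=0.95):
    """Return (lower, upper) limits of mean +/- z * SD for the given level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level=0.95
    centre, sd = mean(values), stdev(values)
    return centre - z * sd, centre + z * sd


# Hypothetical measurements (mean 5, SD 1), invented for illustration.
values = [4.0, 5.0, 6.0]

low95, high95 = reference_range(values)              # 95% reference range
low99, high99 = reference_range(values, level=0.99)  # wider 99% range

print(f"95%: {low95:.2f} to {high95:.2f}")
print(f"99%: {low99:.2f} to {high99:.2f}")
```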

19
Q

The curve of a normal distribution that has a large standard deviation is ________ than one with a small standard deviation

A

Shorter/wider

20
Q

What are the 3 measures of outcome occurrence?

A

(other types of summary measures)

Prevalence

Incidence

Incidence rate

21
Q

What do prevalence and incidence calculations exclude?

A

Cure

Death

Emigration of ill people

22
Q

What is the link between prevalence and incidence?

A

Prevalence = incidence × average outcome duration

Holds only if prevalence < 0.1 and prevalence/incidence is constant
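
A worked example of the relationship, as a sketch. The incidence rate and duration are made-up numbers and the function name `approximate_prevalence` is hypothetical.

```python
def approximate_prevalence(incidence_rate, mean_duration):
    """Approximate prevalence as incidence rate x average outcome duration.

    Only reasonable when the resulting prevalence is low (below about 0.1)
    and the population is in a steady state.
    """
    return incidence_rate * mean_duration


# Hypothetical: 2 new cases per 1000 person-years, average duration 5 years.
prevalence = approximate_prevalence(2 / 1000, 5)

print(f"approximate prevalence = {prevalence:.3f}")  # about 1% of people
```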

23
Q

What tells us how many new cases have occurred in a particular time period?

A

Incidence

24
Q

How do we determine an association between two continuous variables?

A

Examine data graphically (scatter plot = initial feeling of relationship)

Statistical quantification of linear association = correlation (closely related to linear regression)

25
Q

Pearson’s correlation coefficient quantifies the linear association between two variables in terms of what?

A

Direction and strength

26
Q

What values does Pearson’s correlation coefficient range from?

A

+1 = variables tend to have higher or lower values together (closer association = closer to +1)

0 = variables not linearly associated

-1 = high values of one variable tend to be associated with low values of the other (closer association = closer to -1)
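
The three reference points can be illustrated with a hand-written Pearson's r in Python. The helper name `pearson_r` and all the data are hypothetical. The last data set follows y = x², showing that r = 0 rules out a linear association only, not a curved one.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
r_positive = pearson_r(xs, [2, 4, 6, 8, 10])   # values rise together
r_negative = pearson_r(xs, [10, 8, 6, 4, 2])   # one rises as the other falls
r_quadratic = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x squared

print(r_positive, r_negative, r_quadratic)
```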

27
Q

What does a perfect correlation look like?

A

All points lie exactly on a straight line (r = +1 or -1)

28
Q

What is an absent correlation and what does it look like?

A

No linear association between variables (though there may be a non-linear one, e.g. quadratic)

29
Q

What type of variable is Sex?

and how is it best graphically presented?

A

Categorical (binary)

Bar chart or pie chart

30
Q

What type of variable is Age?

and how is it best graphically presented?

A

Numerical (continuous)

Histogram

31
Q

What type of variable is Ethnicity?

and how is it best graphically presented?

A

Categorical (nominal)

Bar chart/pie chart

32
Q

What type of variable is Height?

and how is it best graphically presented?

A

Numerical (continuous)

Histogram

33
Q

What type of variable is Social class group?

and how is it best graphically presented?

A

Categorical (ordinal)

Bar chart/ Pie chart

34
Q

What type of variable is number of fillings?

and how is it best graphically presented?

A

Numerical (discrete)

Bar chart

35
Q

What type of variable is Fat mass?

and how is it best graphically presented?

A

Numerical (continuous)

Histogram

36
Q

If asked to describe what each of the summary statistics tells us about two variables make sure to:

A

Write the actual values out from the table and explain what the summary statistic is e.g. the median (middle value of the ranked data) is 150.7 cm

37
Q

How can you tell from summary statistics if data is normally distributed or not?

A

If mean and median are very similar = approximately symmetric, so consistent with a normal distribution

38
Q

Which gives a better representation of the average? Arithmetic mean or geometric mean?

A

Geometric -> not overly influenced by the very large values in a skewed distribution

39
Q

What does age standardisation mean?

And when is it appropriate?

A

Adjusting the rates to minimise the effects of differences in age composition when comparing across different populations

Appropriate when comparing a statistic across populations with differing age distributions (otherwise results could be misleading)
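
One common way to do this is direct standardisation, sketched below. The choice of method is an assumption (the card does not name one), and the age bands, rates, populations and the helper name `weighted_rate` are all hypothetical. Two populations share identical age-specific rates but differ in age structure, so their crude rates differ while their standardised rates agree.

```python
def weighted_rate(age_specific_rates, population):
    """Average the age-specific rates, weighted by the given age structure."""
    total = sum(population.values())
    return sum(age_specific_rates[age] * population[age] for age in population) / total


# Hypothetical age-specific rates, shared by both populations.
rates = {"0-39": 0.001, "40-64": 0.005, "65+": 0.020}

# Hypothetical age structures (numbers of people in each band).
young_population = {"0-39": 800, "40-64": 150, "65+": 50}
old_population = {"0-39": 300, "40-64": 300, "65+": 400}
standard_population = {"0-39": 600, "40-64": 300, "65+": 100}

crude_young = weighted_rate(rates, young_population)    # own age structure
crude_old = weighted_rate(rates, old_population)        # own age structure
standardised = weighted_rate(rates, standard_population)  # same for both

print(f"crude: {crude_young:.5f} vs {crude_old:.5f}")
print(f"age-standardised (both populations): {standardised:.5f}")
```

The crude rates suggest the older population is far worse off, but after standardising to a common age structure the two are identical, which is the misleading comparison that standardisation is meant to avoid.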

40
Q

What two reasons may cause an increase in prevalence of disease in a population while the incidence remains fairly constant?

A
  • Average duration of disease increased due to improvements in treatment to prolong life
  • Average duration of disease increases due to improvements in diagnosis i.e. earlier diagnosis (prolong the period people know they have the disease)
41
Q

What type of variable is number of siblings?

A

Numerical (discrete) = a count; cannot have 0.4 siblings = not a continuum

42
Q

How could you display numerical exposure and outcome data?

A

Scatter plot

43
Q

Which of the following is NOT a measure of variability in the population?

  • Interquartile range
  • Standard error
  • Standard deviation
A

Standard error = used to make inferences outside of the people we are measuring

44
Q

The location and variability of a normally distributed variable are usually summarised by which of the following:

  • Median & IQR
  • Mean & SD
  • Mode & range
A

Mean & SD

n.b. mode, median and mean are equivalent in a normal distribution

45
Q

Which of the following is a false statement about reference ranges?

  1. They can be interpreted as likely values for an individual in the population
  2. 90, 95 and 99% reference ranges can all be calculated
  3. They are a measure of location
A
  3. They are a measure of location

= false: reference ranges are a measure of variability, not location!

46
Q

In a population of adults, if the number of teeth remaining has mean 30 and SD 4, could these data be normally distributed?

A

No -> cannot have more than 32 teeth!

If it were a normal distribution we would assume that over 32 teeth could exist
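
The argument can be checked numerically with `statistics.NormalDist` from the Python standard library: under a normal distribution with mean 30 and SD 4, a sizeable share of adults would be predicted to have more than 32 teeth.

```python
from statistics import NormalDist

# Normal distribution with the mean and SD stated in the question.
teeth = NormalDist(mu=30, sigma=4)

# Proportion predicted to have more than 32 teeth (32 is 0.5 SD above the mean).
p_above_32 = 1 - teeth.cdf(32)

print(f"predicted proportion with more than 32 teeth: {p_above_32:.2f}")
```

Roughly 31% of the distribution would lie above the maximum possible value, so the data cannot be normally distributed.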

47
Q

For a particular outcome, what is the definition of prevalence?

A

Proportion with the outcome at a particular point in time

48
Q

Should pairs of continuous variables always be examined graphically before analysis to check for non-linear associations?

A

Yes