Summarising Data Flashcards
Why should data be summarised?
- Data quality monitoring
- Data checking + cleaning - check for invalid/missing entries
- Baseline data in a study - describe characteristics of participants, e.g. the 1st table in many research articles, to set the study + results in context
- Before doing a complex analysis - so that the analysis and its results make sense
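A minimal sketch of the data-checking step, assuming a small list of participant records. The field name weight_kg, the plausible range and the values themselves are hypothetical, chosen only for illustration.

```python
# Hypothetical participant records; None marks a missing entry.
records = [
    {"id": 1, "weight_kg": 72.5},
    {"id": 2, "weight_kg": None},
    {"id": 3, "weight_kg": -4.0},
    {"id": 4, "weight_kg": 81.0},
]


def check_weights(rows, low=0.0, high=300.0):
    """Return (ids with missing weight, ids with weight outside (low, high])."""
    missing = [r["id"] for r in rows if r["weight_kg"] is None]
    invalid = [
        r["id"]
        for r in rows
        if r["weight_kg"] is not None and not (low < r["weight_kg"] <= high)
    ]
    return missing, invalid


missing, invalid = check_weights(records)
print("missing:", missing)
print("invalid:", invalid)
```

The range limits are an assumption; in practice they would come from what is plausible for the study population.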
What is quantitative data and the 2 types?
- data which can be measured numerically
- continuous or discrete
What is continuous data and give e.g.s?
- data lie on a continuum
- can take any value between 2 limits
- e.g. weight, height
What is a limitation of continuous data?
- the accuracy of the data depends on the accuracy of the method of measurement, so some continuous data may be recorded as integers even though that is only an approximation to the true value
What is discrete data and e.g?
- data do not lie on a continuum
- can only take certain values, usually counts (integers)
- e.g. no. of children in a family
Why is weight a continuous variable and what is the limitation?
- it is measured using weighing scales
- lies on a continuum
- limitation is the accuracy of the scales
Why is the number of previous pregnancies in a pregnant woman discrete data?
- it is counted
- only whole numbers are possible
What is ordinal data and which type of data is always ordinal?
- the data values can be arranged in numerical order from the smallest to the largest
- Quantitative data are always ordinal
What are e.g.s of ordinal data?
- Questionnaire scale data - often counts, e.g. when adding the no. of +ve responses to a set of questions to get a total score.
- Categorical data may also have an inherent order, such as stage of disease.
What is an e.g. where continuous data can look discrete?
- because of the way they are measured and/or reported
- e.g. gestational age of babies is often reported in whole weeks, e.g. 38 weeks, so it appears to be discrete
- It is however continuous - could be reported to a greater degree of accuracy, e.g. as a decimal, such as 38.5 weeks
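A small sketch of the gestational age example, assuming age was originally recorded in days. The values are hypothetical. Reporting in whole weeks makes a continuous measurement look like a count.

```python
# Hypothetical gestational ages in days.
gestation_days = [266, 270, 273, 277]

# Reported in completed whole weeks: looks discrete.
whole_weeks = [d // 7 for d in gestation_days]

# Reported to one decimal place: the underlying continuum is visible again.
decimal_weeks = [round(d / 7, 1) for d in gestation_days]

print(whole_weeks)
print(decimal_weeks)
```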
What are all continuous measurements limited by?
- the accuracy of the instrument used to measure them
- many quantities are reported in whole numbers for convenience, such as age and height
What is categorical data?
- data where individuals fall into a number of separate categories or classes
Give e.g.s of categorical data
- gender: male or female = 2 classes
- disease status: alive or dead = 2 classes
- stage of cancer: I, II, III or IV = 4 classes
- marital status: married, single, divorced, widowed or legally separated = 5 classes
Give e.g.s of when categorical data can be ordinal?
- Different categories of categorical data may be assigned a number for coding purposes
- if there are several categories there may be an implied ordering, such as with stage of cancer where stage I is the least advanced and stage IV the most advanced.
What is dichotomous data and give e.g.?
- only 2 classes
- all individuals fall into one or other of the classes
- aka binary data
Is it possible to categorise continuous data?
- possible to re-classify continuous data into groups, for ease of reporting.
- e.g. it is common to report birthweight in bands, giving the numbers of babies who fall into each birthweight band.
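A sketch of reporting birthweight in bands. The birthweights, cut points and band labels are all assumptions made for illustration, not standard definitions from the source.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical birthweights in grams.
birthweights_g = [2450, 3100, 3620, 1980, 4050, 3300, 2890]

# Assumed cut points; a weight equal to a cut point goes in the higher band.
cut_points = [2500, 3000, 3500, 4000]
labels = ["<2500", "2500-2999", "3000-3499", "3500-3999", ">=4000"]


def band(weight_g):
    """Return the label of the band that weight_g falls into."""
    return labels[bisect_right(cut_points, weight_g)]


# Number of babies in each birthweight band.
counts = Counter(band(w) for w in birthweights_g)
for label in labels:
    print(label, counts[label])
```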
What are the consequences of dichotomising?
- lots of info + statistical power lost in the analysis.
- nature of any relationship may be masked, e.g. if the relationship is curved, it may appear weaker once the data are categorised
- if the relationship is U-shaped, categorisation may totally obscure it
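A toy illustration of the U-shaped case, using made-up data where the outcome y is exactly x squared. Splitting x into two groups at zero is an assumed cut point. After the split, the two groups have the same mean outcome, so the relationship disappears.

```python
from statistics import mean

# Made-up data with a perfect U-shaped relationship: y = x squared.
x = [-3, -2, -1, 1, 2, 3]
y = [value ** 2 for value in x]

# Dichotomise x at 0 (assumed cut point) and compare mean y in each group.
low_mean = mean(yi for xi, yi in zip(x, y) if xi < 0)
high_mean = mean(yi for xi, yi in zip(x, y) if xi >= 0)

print(low_mean, high_mean)  # identical: the U-shape is hidden
```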
Why is it better if continuous data are re-classified into several groups?
- effect on statistical power is less than when dichotomising.
- Grouping causes no problem if the re-classification is done simply to present summary statistics and the original data are used in the analysis
- Sometimes can be useful when examining a non-linear relationship. The analysis may be more straightforward and more meaningful
How can continuous data be summarised?
- a measure of the centre of the data distribution
- measure of the variability of the data.
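A short sketch using Python's statistics module on hypothetical birthweights. The mean and median are measures of centre. The sample standard deviation is used here as one possible measure of variability, which is an assumption since this card does not name a specific measure.

```python
from statistics import mean, median, stdev

# Hypothetical birthweights in grams; the last value pulls the mean upwards.
birthweights_g = [2900, 3100, 3400, 3600, 4500]

centre_mean = mean(birthweights_g)      # arithmetic mean
centre_median = median(birthweights_g)  # middle value of the sorted data
spread_sd = stdev(birthweights_g)       # sample standard deviation

print(centre_mean, centre_median, round(spread_sd, 1))
```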
What are measures of centre of data?
- Mean
- Median