Exploring quantitative data Flashcards
What is the research process?
A process that uses the scientific method to establish, confirm and/or reaffirm certain pieces of knowledge supported by strong evidence
Why do we use the research process?
- Create theories
- Find solutions to problems
- Find problems to solutions
- Find some sort of truth
What does the research design encompass?
Plan of sampling, data collection, measurement and analysis
- Methodology
- Study design
- Data analysis
Types of study designs in medicine
- Basic studies
- Observational studies
- Experimental (interventional) studies
- Economic evaluations
- Meta-analysis/systematic review
What is an observational study?
- Non-interventional
- Variables are not manipulated by researchers
- Researchers observe natural relationships between factors and outcomes
Types of observational studies
- Cross-sectional studies
- Longitudinal studies
- Case-control studies
- Cohort studies
- Survey studies
What is a Cross-sectional study?
A study that assesses a population, as represented by the study sample, at a single point in time
–> they reflect the situation of a disease or clinical outcome at a particular moment in a particular population
Cross-sectional study example
- Enrolling current smokers or never smokers and assessing whether or not they have decreased lung function
How is a cross-sectional study conducted?
- Participants recruited based on inclusion and exclusion criteria
- Study the exposure and outcome at the same time
- Estimate the prevalence of the outcome (and of the exposure) –> calculate the odds ratio
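The prevalence and odds-ratio step can be sketched from a 2x2 table of exposure against outcome. This is a minimal illustration: the class and function names are our own, and the counts are made up purely to show the arithmetic (smokers vs never smokers, with or without decreased lung function).

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from a cross-sectional sample, classified by exposure and outcome."""
    exposed_with_outcome: int        # a
    exposed_without_outcome: int     # b
    unexposed_with_outcome: int      # c
    unexposed_without_outcome: int   # d

    def total(self) -> int:
        return (self.exposed_with_outcome + self.exposed_without_outcome
                + self.unexposed_with_outcome + self.unexposed_without_outcome)


def prevalence_of_outcome(t: TwoByTwo) -> float:
    """Proportion of the whole sample with the outcome at the time of measurement."""
    return (t.exposed_with_outcome + t.unexposed_with_outcome) / t.total()


def prevalence_of_exposure(t: TwoByTwo) -> float:
    """Proportion of the whole sample that is exposed."""
    return (t.exposed_with_outcome + t.exposed_without_outcome) / t.total()


def odds_ratio(t: TwoByTwo) -> float:
    """Odds of the outcome in the exposed over odds in the unexposed: (a/b)/(c/d) = ad/bc."""
    return (t.exposed_with_outcome * t.unexposed_without_outcome) / (
        t.exposed_without_outcome * t.unexposed_with_outcome
    )


# Hypothetical counts, not real study data
table = TwoByTwo(exposed_with_outcome=40, exposed_without_outcome=60,
                 unexposed_with_outcome=20, unexposed_without_outcome=80)
print(f"Prevalence of decreased lung function: {prevalence_of_outcome(table):.2f}")
print(f"Prevalence of smoking: {prevalence_of_exposure(table):.2f}")
print(f"Odds ratio: {odds_ratio(table):.2f}")
```

With these invented counts the sketch prints prevalences of 0.30 (outcome) and 0.50 (exposure) and an odds ratio of 2.67: the odds of decreased lung function would be about 2.7 times higher in smokers.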
What is a longitudinal study?
They use continuous or repeated measures to follow particular individuals over prolonged periods of time (years or decades)
–> observational in nature
–> collect quantitative and qualitative data without any external influence being applied
Longitudinal study example
The longitudinal study of the Office of Population Censuses and Surveys
What is a case-control study?
Researchers identify study participants based on their case status (i.e. diseased or not diseased)
–> Quantification of the number of individuals among the cases and the controls who are exposed allows for statistical associations between exposure and outcomes to be established
Case-control study example
Analysing the relationship between obesity and knee replacement surgery
–> Cases are participants who have had knee replacement surgery and controls are a random sample of those who have not; the comparison is the odds of being obese among those who have had surgery relative to the odds among those who have not
What are cohort studies?
involve identifying study participants based on their exposure status and following them through time to identify which participants develop the outcome(s) of interest
Cohort study example
A cohort of 5766 men aged 35–64 at the time of examination who were recruited from workplaces in the west of Scotland between 1970 and 1973. The study investigated the association between socioeconomic position in early life (when the participants were children) and cause-specific mortality. Age-adjusted relative rates of mortality for men whose fathers had manual versus non-manual occupations were 1.52 (95% confidence interval 1.24 to 1.87) for coronary heart disease, 1.83 (1.13 to 2.94) for stroke, 1.65 (1.12 to 2.43) for lung cancer, 2.06 (0.93 to 4.57) for stomach cancer, and 2.01 (1.17 to 3.48) for respiratory disease.
What are survey studies?
Research based on surveys
Benefit of surveys
enables the researcher to describe the characteristics of the sample being studied and to make generalisations to the larger population of interest. Surveys are particularly useful for collecting information about research phenomena that are not directly observable or measurable. They are also useful for collecting data from people who are widely distributed geographically, since direct contact between researcher and research participant is not necessary.
Types of surveys
- Epidemiological
- Surveys on attitudes to a health service intervention
- Questionnaires assessing knowledge on a particular topic or issue
What are interventional studies?
Studies in which the researcher intervenes or manipulates the variable(s) at some point during the study.
Types of interventional studies
- RCT (randomised controlled trial)
- Pre-post studies
- Non-randomised controlled trials
What is an RCT?
a trial in which subjects are randomly assigned to one of two (or more) groups: one (the experimental group) receiving the intervention that is being tested, and the other (the comparison group or control) receiving an alternative (conventional) treatment.
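Random assignment can be sketched in a few lines. This is a minimal illustration, not a trial-grade procedure: the function name `allocate` is hypothetical, and it simply shuffles the participant list and deals participants to the groups in turn, so group sizes differ by at most one. In a real trial the allocation sequence would be generated and concealed independently of the people recruiting participants.

```python
import random


def allocate(participant_ids, groups=("intervention", "control"), seed=None):
    """Randomly allocate participants so that group sizes differ by at most one.

    The list is shuffled, then participants are dealt to the groups in turn,
    so a participant's group depends only on the random shuffle.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return {pid: groups[i % len(groups)] for i, pid in enumerate(shuffled)}


allocation = allocate(range(1, 11), seed=2024)
for pid in sorted(allocation):
    print(pid, allocation[pid])
```

Fixing the seed makes the sequence reproducible (useful for auditing); leaving it as `None` gives a different allocation each run.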
Features of an RCT
- The sample to be studied will be appropriate to the hypothesis being tested so that any results are appropriately generalisable. The study will recruit sufficient patients to allow it to have a high probability of detecting a clinically important difference between treatments
- There will be effective (concealed) randomisation to eliminate bias
- Both groups will be treated identically in all respects except for the intervention being tested and, to this end, patients and investigators will ideally be blinded to which group an individual is assigned
- The investigator assessing outcome will be blinded to treatment allocation
- Patients are analysed within the group to which they were allocated, irrespective of whether they received the intended intervention (intention-to-treat analysis)
- The analysis focuses on testing the research question that initially led to the trial
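Intention-to-treat grouping can be illustrated with a few made-up trial records. In this sketch (hypothetical data and function names) each record holds the arm a participant was allocated to, the treatment they actually received and an outcome score. A per-protocol analysis, which is not part of the card, is included only for contrast: it drops participants who did not receive their allocated treatment.

```python
from statistics import mean

# Hypothetical records: (arm allocated, treatment actually received, outcome score)
records = [
    ("intervention", "intervention", 8),
    ("intervention", "intervention", 7),
    ("intervention", "control", 3),     # allocated to the intervention but did not receive it
    ("control", "control", 4),
    ("control", "control", 5),
    ("control", "intervention", 9),     # allocated to control but received the intervention
]


def intention_to_treat_means(records):
    """Mean outcome by the arm each participant was ALLOCATED to, whatever they received."""
    groups = {}
    for allocated, _received, outcome in records:
        groups.setdefault(allocated, []).append(outcome)
    return {arm: mean(values) for arm, values in groups.items()}


def per_protocol_means(records):
    """For contrast only: keep participants who received the treatment they were allocated."""
    groups = {}
    for allocated, received, outcome in records:
        if allocated == received:
            groups.setdefault(allocated, []).append(outcome)
    return {arm: mean(values) for arm, values in groups.items()}


for label, result in [("ITT", intention_to_treat_means(records)),
                      ("Per-protocol", per_protocol_means(records))]:
    print(label, {arm: f"{value:.1f}" for arm, value in result.items()})
```

With these invented numbers the ITT means are equal (6.0 in both arms), while the per-protocol means differ by 3 points. Excluding participants after randomisation can change the comparison, and ITT avoids this by keeping everyone in the group they were allocated to.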
What are pre-post studies?
A pre-post study measures the occurrence of an outcome before and again after a particular intervention is implemented.
Example of pre-post study
comparing deaths from motor vehicle crashes before and after the enforcement of a seat-belt law.
What are non-randomised controlled trials?
Interventional study designs that compare a group that received the intervention with a group that did not –> they can suggest possible relationships between the intervention and the outcome.
Negative of non-randomised controlled trials
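Often subject to bias or error, because participants are not randomly allocated, so the groups may differ in ways other than the intervention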
Why do we want to visualise data?
to visualise the distributions of the variables and relationships between them. This allows you to become familiar with the data before carrying out analysis, reveal possible data entry errors, and discover unexpected patterns in the data.
3 aspects of distribution
- Location
- Spread
- Shape
What is the location of a distribution?
A typical value taken by the variable
What is the spread of a distribution?
How far the values extend from the centre
What is the shape of a distribution?
Whether or not values are spread symmetrically on either side of the centre
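One numerical summary can be attached to each of the three aspects. This sketch makes its own choices: the median for location, the sample standard deviation for spread and the moment coefficient of skewness for shape. Other measures (e.g. the mean, the IQR) would serve equally well. The function name and data are hypothetical.

```python
from statistics import median, stdev


def describe(values):
    """Summarise the location, spread and shape of a continuous variable."""
    n = len(values)
    mean = sum(values) / n
    # Second and third central moments, used for the moment coefficient of skewness
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return {
        "median": median(values),       # location: a typical value
        "sd": stdev(values),            # spread: sample standard deviation (n - 1 denominator)
        "skewness": m3 / m2 ** 1.5,     # shape: about 0 if symmetric, > 0 if longer right tail
    }


right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]   # hypothetical data with one large value
print(describe(right_skewed))
print(describe([1, 2, 3, 4, 5]))           # symmetric data: skewness is 0
```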
Continuous variables can be explored using plots such as…
- Dot plots
- Histograms
- Density plots
What are dot plots?
- Draw an axis
- Put a dot for each response
What is alpha blending?
giving dots transparency so that you can see where points are overlapping
What is a jittered dot plot?
- Each dot is given a small random vertical offset; the vertical axis has no meaning except as a way to separate the dots –> you can see more detail and clusters
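Jittering just means pairing each observed value with a random vertical position. This sketch computes the coordinates only. The helper name `jittered_points` and the `spread` parameter are our own choices, and the data are hypothetical.

```python
import random


def jittered_points(values, spread=0.1, seed=None):
    """Pair each value with a random vertical offset in [-spread, spread].

    The x-coordinate is the observed value; the y-coordinate carries no
    information and only separates dots that would otherwise overlap.
    """
    rng = random.Random(seed)
    return [(value, rng.uniform(-spread, spread)) for value in values]


scores = [3, 3, 3, 4, 4, 5, 5, 5, 5, 7]  # hypothetical data with many tied values
points = jittered_points(scores, spread=0.1, seed=1)
for x, y in points:
    print(f"x = {x}, y = {y:+.3f}")
```

To draw the plot, the coordinates could be passed to matplotlib, e.g. `plt.scatter(xs, ys, alpha=0.5)`, which combines jittering with semi-transparent (alpha-blended) dots.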
What are dot plots useful for?
- Comparing the distribution between groups
Negative of dot plots
- Hard to read when there are lots of points, because the dots overlap
What are histograms useful for?
- Showing shape of the distribution
- Visualising large numbers of observations
- Provide a good picture of the location, spread and shape
How to make a histogram
- Divide the range of the variable (e.g. height) into bins, then count how many of the values fall into each bin
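The binning-and-counting step can be sketched with equal-width bins. This is a minimal version under our own conventions: each bin includes its left edge, and the maximum value goes in the last bin. The function name and the heights are hypothetical. Running it with two different bin counts shows how the choice of bins changes the picture.

```python
def histogram_counts(values, n_bins):
    """Split [min, max] into n_bins equal-width bins and count the values in each."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Bin index from the distance to the minimum; the maximum is clamped into the last bin
        i = int((v - lo) / width) if width > 0 else 0
        counts[min(i, n_bins - 1)] += 1
    edges = [lo + k * width for k in range(n_bins + 1)]
    return edges, counts


heights = [150, 155, 158, 160, 162, 165, 167, 170, 172, 180]  # hypothetical heights (cm)
for n_bins in (3, 2):
    edges, counts = histogram_counts(heights, n_bins)
    print(f"{n_bins} bins:")
    for left, right, c in zip(edges, edges[1:], counts):
        print(f"  {left:6.1f}-{right:6.1f} {'#' * c}")
```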
Details of skewed data
- Look at the tail of the data
- Can be skewed to the right (positively skewed) or to the left (negatively skewed)
Negative of histograms
- You don’t have much control over them, since you only have discrete choices for the number of bins
- Changing the number of bins can give quite a different picture
- They are somewhat subjective (based on the choice of bins)
- Difficult to compare more than two groups
What is a bimodal distribution?
When the distribution has two peaks
What are density plots?
- An alternative to histograms that shows a smooth, continuous estimate of the density
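One common way to get a continuous density estimate is a Gaussian kernel density estimate: place a small normal curve on every observation and average them. This sketch assumes that technique and a hand-picked `bandwidth`, which plays a role similar to bin width in a histogram. Software such as `scipy.stats.gaussian_kde` chooses the bandwidth automatically. The data are hypothetical.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function giving a kernel density estimate at any point x.

    Each observation contributes a Gaussian 'bump' with standard deviation
    `bandwidth`; the bumps are averaged so that the curve integrates to 1.
    """
    n = len(data)
    scale = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return scale * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


heights = [150, 155, 158, 160, 162, 165, 167, 170, 172, 180]  # hypothetical heights (cm)
estimate = gaussian_kde(heights, bandwidth=5.0)  # bandwidth chosen by hand
for x in range(145, 190, 5):
    print(x, f"{estimate(x):.4f}")
```

Because each group's curve is just a function, several groups can be evaluated on the same grid and overlaid.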
What are density plots useful for?
- Comparing distributions between groups since it is easy to overlay them
Negative of density plots
Need software to create them (whereas histograms can be calculated and drawn by hand if needed)
What is the median?
Line up all the values in order from smallest to largest and take the middle value (with an even number of values, the mean of the two middle values)
Other names for the median
50th percentile
0.5 quantile
What is the 25th percentile (0.25 quantile) called?
the FIRST QUARTILE
What is the 75th percentile (0.75 quantile) called?
the THIRD QUARTILE
What is the 0th percentile (0.00 quantile) called?
the MINIMUM
What is the 100th percentile (1.00 quantile) called?
the MAXIMUM
How to estimate first quartile
Median of values BELOW the median
How to estimate third quartile
Median of values ABOVE the median
What is the five-number summary?
Minimum, first quartile, median, third quartile, maximum
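The five-number summary can be computed with the quartile method from the cards above (median of the values below / above the median). One assumption is made here: when the number of values is odd, the middle value is left out of both halves. Other quartile conventions exist and can give slightly different Q1 and Q3. The data are hypothetical.

```python
def median(sorted_values):
    """Middle value of an already-sorted list (mean of the two middle values if even)."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum); needs at least two values."""
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]          # values below the median
    upper = s[(n + 1) // 2 :]    # values above the median
    return (s[0], median(lower), median(s), median(upper), s[-1])


scores = [58, 40, 66, 52, 80, 55, 64, 50, 62]  # hypothetical data
print(five_number_summary(scores))
```

For these values the summary is (40, 51.0, 58, 65.0, 80).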
What is the interquartile range?
Distance between the first and third quartiles
E.g. first quartile = 51, third quartile = 65, so the interquartile range = 65 - 51 = 14
Positive of IQR
It is not affected by the tails of the distribution
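What is a box-and-whisker plot?
Shows the location, spread and shape of the distribution as well as flagging unusual observations
Box = from Q1 to Q3, with a line at the median (M); Whiskers = extend to the min & max, or, when unusual observations are flagged, to the most extreme values that are not flagged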
How can data be flagged as unusual?
If they lie more than 1.5 × IQR above Q3 or more than 1.5 × IQR below Q1
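The 1.5 × IQR rule can be sketched as a pair of "fences". This version gets Q1 and Q3 from Python's `statistics.quantiles` (default "exclusive" method). For the hypothetical data below this matches the "median of each half" method, but quartile conventions differ, so other software may place the fences slightly differently. The function name `flag_unusual` and the multiplier parameter `k` are our own.

```python
from statistics import quantiles


def flag_unusual(values, k=1.5):
    """Flag values more than k * IQR below Q1 or above Q3."""
    q1, _median, q3 = quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    unusual = [v for v in values if v < lower_fence or v > upper_fence]
    return lower_fence, upper_fence, unusual


scores = [40, 50, 52, 55, 58, 62, 64, 66, 90]  # hypothetical data
low, high, unusual = flag_unusual(scores)
print(f"Fences: {low} to {high}; unusual values: {unusual}")
```

Here Q1 = 51 and Q3 = 65, so IQR = 14 and the fences are 30.0 and 86.0; only 90 is flagged.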
What are boxplots useful for?
- Comparing distributions between groups
What is an outlier?
A data value that does not seem to match the overall distribution observed
- Can be genuine, or the result of experimental errors or mistakes in data entry
Categorical variables (nominal or ordinal) such as gender or degree program are most often graphed using…
a BAR CHART
What do bar charts show?
frequencies in each category or the percentage of participants in each category by using different bar heights (for a vertical bar graph) or lengths (for a horizontal bar graph)
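The frequencies and percentages behind a bar chart can be tabulated with `collections.Counter`. This sketch also prints a rough text version of a horizontal bar chart. The function names and the degree-programme responses are hypothetical.

```python
from collections import Counter


def category_table(responses):
    """Return (category, count, percentage) rows, most frequent first."""
    counts = Counter(responses)
    total = sum(counts.values())
    return [(category, n, 100 * n / total) for category, n in counts.most_common()]


def text_bar_chart(responses, width=20):
    """Print a horizontal bar chart whose bar lengths are proportional to the counts."""
    rows = category_table(responses)
    largest = rows[0][1]
    for category, n, pct in rows:
        bar = "#" * round(width * n / largest)
        print(f"{category:<12}{bar} {n} ({pct:.0f}%)")


programs = ["Medicine", "Nursing", "Medicine", "Pharmacy", "Medicine",
            "Nursing", "Medicine", "Medicine", "Nursing", "Medicine"]  # hypothetical responses
text_bar_chart(programs)
```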
What are bar charts useful for?
comparing frequencies across a range of categories