Topic 2 - Data and Graphical Summaries Flashcards
LO
LO3 Produce, interpret and compare graphical and numerical summaries, using base R and ggplot.
IDA
Initial Data Analysis
- The first general look at the data, without formal statistical analysis
**Involves:**
- Data background
- Data structure
- Data wrangling
- Data summaries
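A minimal Python sketch of the "data structure" and "data summaries" steps on a small, made-up dataset. The column names and values are hypothetical. In R, the same first look would typically use `dim()`, `str()` and `summary()`. Plain Python is used here only so the example runs without extra packages.

```python
# Hypothetical dataset: one record (dict) per subject.
data = [
    {"gender": "F", "height_cm": 165.2, "siblings": 2},
    {"gender": "M", "height_cm": 180.1, "siblings": 0},
    {"gender": "F", "height_cm": 158.7, "siblings": 1},
    {"gender": "M", "height_cm": 175.4, "siblings": 3},
]

def structure(rows):
    """Return (number of rows, number of columns) and each column's type (taken from the first row)."""
    columns = list(rows[0].keys())
    types = {col: type(rows[0][col]).__name__ for col in columns}
    return (len(rows), len(columns)), types

def numeric_ranges(rows):
    """Return (min, max) for every numeric column: a very small data summary."""
    ranges = {}
    for col, value in rows[0].items():
        if isinstance(value, (int, float)):
            col_values = [row[col] for row in rows]
            ranges[col] = (min(col_values), max(col_values))
    return ranges

dims, types = structure(data)
print(dims)                  # (4, 3)
print(types)                 # {'gender': 'str', 'height_cm': 'float', 'siblings': 'int'}
print(numeric_ranges(data))  # {'height_cm': (158.7, 180.1), 'siblings': (0, 3)}
```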
Variables
- Measures or describes some attribute of the subjects
Qualitative (categorical)
Ordinal
- Has a natural order (e.g. clothing sizes: S, M, L)
Nominal
- Has no natural order (e.g. colours)
Quantitative (Numerical)
Discrete
- Takes distinct, countable values with clear gaps between them (e.g. number of siblings)
Continuous
- Can take any value within a range (e.g. height, time)
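A short Python illustration of the four variable types, using made-up values. The main point is that an ordinal variable's order has to be stated explicitly, roughly what `factor(x, levels = ..., ordered = TRUE)` does in R. A nominal variable has no order to state.

```python
# Ordinal (qualitative): categories with a natural order, given explicitly.
SIZE_ORDER = ["S", "M", "L", "XL"]
sizes = ["L", "S", "XL", "M", "S"]
sizes_in_order = sorted(sizes, key=SIZE_ORDER.index)  # sort by position in SIZE_ORDER
print(sizes_in_order)  # ['S', 'S', 'M', 'L', 'XL']

# Nominal (qualitative): no natural order; any sorting (e.g. alphabetical) is arbitrary.
colours = ["red", "blue", "green"]

# Discrete (quantitative): countable values with gaps between them.
siblings = [0, 2, 1, 3]

# Continuous (quantitative): any value within a range, limited only by measurement precision.
heights_cm = [165.2, 180.1, 158.7]

print(all(isinstance(n, int) for n in siblings))  # True
```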
Simple barplot
Summarizes 1 qualitative variable (one bar per category, height = count)
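The numbers behind a simple barplot are just category counts. Below is a minimal Python sketch with a hypothetical eye-colour variable. In R, `table()` produces these counts, and `barplot(table(x))` or ggplot's `geom_bar()` draws the bars. The text bars printed here only stand in for a real plot.

```python
from collections import Counter

# Hypothetical qualitative variable: eye colour of 7 subjects.
eye_colour = ["brown", "blue", "brown", "green", "blue", "brown", "hazel"]

# One bar per category, with height equal to the number of subjects in it.
counts = Counter(eye_colour)
for category, n in counts.most_common():
    print(f"{category:6} {'#' * n} ({n})")
```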
Double barplot
Summarizes 2 qualitative variables
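A double barplot is drawn from a two-way table of counts. The Python sketch below builds that table for two made-up variables. In R, the same table comes from `table(x, y)`, and `barplot(..., beside = TRUE)` or ggplot's `geom_bar(position = "dodge")` draws the bars side by side.

```python
from collections import Counter

# Hypothetical paired observations of two qualitative variables.
gender = ["F", "M", "F", "F", "M", "M", "F"]
smoker = ["no", "yes", "no", "yes", "no", "no", "no"]

# Count each (gender, smoker) combination: these are the bar heights.
table = Counter(zip(gender, smoker))
for g in sorted(set(gender)):
    row = {s: table[(g, s)] for s in sorted(set(smoker))}
    print(g, row)  # F {'no': 3, 'yes': 1} then M {'no': 2, 'yes': 1}
```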
Data cleaning
Involves changing the format of the data, but not the essence of the data
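To illustrate "changing the format but not the essence", here is a minimal Python sketch with hypothetical messy entries. Each value means the same thing after cleaning. Only its representation changes, from inconsistent labels and unit-laden text to consistent labels and plain numbers.

```python
# Hypothetical raw entries: same information, inconsistent format.
raw_gender = [" Female", "male", "FEMALE ", "Male"]
raw_weight = ["70 kg", "65.5kg", " 80 kg"]

# Standardise labels: trim surrounding whitespace and use one capitalisation.
gender = [g.strip().capitalize() for g in raw_gender]

# Convert text such as "65.5kg" into numbers (in kg) by removing the unit.
weight_kg = [float(w.strip().replace("kg", "").strip()) for w in raw_weight]

print(gender)     # ['Female', 'Male', 'Female', 'Male']
print(weight_kg)  # [70.0, 65.5, 80.0]
```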
Simple Histogram
Used for quantitative data to see how a variable is distributed across different ‘bins’
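A histogram's bar heights are the counts of values falling in each bin. This Python sketch counts them using right-closed bins (a, b], with the lowest value also included in the first bin. That rule is intended to mirror the default of R's `hist()` (`right = TRUE`, `include.lowest = TRUE`). The heights and break points are made up.

```python
def bin_counts(values, breaks):
    """Count values per bin using right-closed bins (a, b]; the first bin also includes its left edge."""
    counts = [0] * (len(breaks) - 1)
    for v in values:
        for i in range(len(breaks) - 1):
            lo, hi = breaks[i], breaks[i + 1]
            if lo < v <= hi or (i == 0 and v == lo):
                counts[i] += 1
                break  # each value belongs to exactly one bin
    return counts  # values outside [breaks[0], breaks[-1]] are not counted

heights_cm = [152, 158, 160, 163, 167, 170, 171, 175, 182, 188]  # hypothetical
breaks = [150, 160, 170, 180, 190]
counts = bin_counts(heights_cm, breaks)
print(counts)  # [3, 3, 2, 2]
for lo, hi, n in zip(breaks, breaks[1:], counts):
    print(f"({lo}, {hi}]: {'#' * n}")
```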
Standard Histogram and Probability/Density Histogram
Density histogram
- The total area of the bars = 100%
- The height of each bar = (% of subjects in the bin) / (width of the bin)
- Density histograms don't need a y-axis, because the percentages are read from the bar areas
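A minimal Python sketch of the density-histogram rule, using made-up counts and deliberately unequal bin widths so that the effect of dividing by the width is visible. Note that R's `hist(x, freq = FALSE)` uses proportions rather than percentages, so its bar areas sum to 1 rather than 100.

```python
def density_heights(counts, breaks):
    """Height of each bar = (% of subjects in the bin) / (bin width)."""
    n = sum(counts)
    return [(100 * c / n) / (hi - lo) for c, lo, hi in zip(counts, breaks, breaks[1:])]

counts = [3, 3, 4]             # hypothetical counts for 10 subjects
breaks = [150, 160, 170, 190]  # the last bin is twice as wide as the others
heights = density_heights(counts, breaks)
area = sum(h * (hi - lo) for h, lo, hi in zip(heights, breaks, breaks[1:]))
print(heights)  # [3.0, 3.0, 2.0]  (% per cm)
print(area)     # 100.0
```

The last bin holds the most subjects (40%) but has the shortest bar, because its 40% is spread over a 20 cm width. This is why the areas, not the heights, carry the percentages.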
Sliced Histogram
- Slicing by another variable
- Allows the addition of a qualitative variable
- Shows 2 histograms on one graph
Simple Box Plot
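- Shows the distribution of a single quantitative variable, based on its percentiles
- ‘Box’ = the middle 50% of the data (25th to 75th percentile)
- Lines outside the box (whiskers) = the remaining 50% of the data, 25% on each side (apart from outliers)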
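IQR (interquartile range) = Q3 - Q1 = 75th percentile - 25th percentile, i.e. the length of the box
Upper threshold = Q3 + 1.5 x IQR
Lower threshold = Q1 - 1.5 x IQR
Points beyond these thresholds are plotted individually as outliers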
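A minimal Python sketch of the box plot quantities, using hypothetical data. The quartiles come from `statistics.quantiles(..., method="inclusive")`, which interpolates like R's default `quantile()` (type 7, also used by ggplot's `geom_boxplot()`). Base R's `boxplot()` uses Tukey's hinges instead, so its box edges can differ slightly on some datasets.

```python
import statistics

def boxplot_summary(values):
    """Quartiles, IQR, outlier thresholds and outliers for one quantitative variable."""
    # method="inclusive" interpolates like R's default quantile() (type 7)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr  # lower threshold
    upper = q3 + 1.5 * iqr  # upper threshold
    outliers = [v for v in values if v < lower or v > upper]
    return {"Q1": q1, "median": median, "Q3": q3, "IQR": iqr,
            "lower": lower, "upper": upper, "outliers": outliers}

data = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # hypothetical values
summary = boxplot_summary(data)
print(summary)
# Q1 = 4.0, Q3 = 8.0, IQR = 4.0, thresholds -2.0 and 14.0, so 30 is an outlier
```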
Comparative boxplots
Adds a qualitative variable, giving one boxplot per group
- e.g. gender
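Comparative boxplots apply the same quartile calculation separately to each group. A minimal Python sketch with made-up heights split by gender follows. In R, this corresponds roughly to `boxplot(height ~ gender)` or ggplot's `geom_boxplot()` with gender on the x-axis. Quartiles again use the type-7-style `method="inclusive"`.

```python
import statistics
from collections import defaultdict

# Hypothetical data: a quantitative variable (height) and a qualitative one (gender).
heights = [165, 158, 170, 162, 180, 175, 183, 178]
gender = ["F", "F", "F", "F", "M", "M", "M", "M"]

# Split the quantitative values by group.
groups = defaultdict(list)
for h, g in zip(heights, gender):
    groups[g].append(h)

# One (Q1, median, Q3, IQR) summary per group: one box each.
summary = {}
for g, values in sorted(groups.items()):
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    summary[g] = (q1, med, q3, q3 - q1)
    print(g, "Q1 =", q1, "median =", med, "Q3 =", q3, "IQR =", q3 - q1)
```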
Simple scatterplot
Used when there are 2 quantitative variables
- One on the x-axis, the other on the y-axis
Filtered scatterplot
Adds a qualitative variable to a simple scatterplot; more variables can be added if needed