Section A.2: Univariate analysis Flashcards
Univariate Analysis
Univariate analysis is the simplest form of data analysis. It involves analysing a single variable in a dataset, on its own, to understand its characteristics.
Dataset
A dataset is a multidimensional (multiple columns), heterogeneous (columns can hold different data types) data structure (a format for organising data).
Data structure
A specialised format for organising, processing, retrieving and storing data
Univariate data
Univariate data consists of observations on a single characteristic or attribute. Because only one variable is involved, it says nothing about causes or relationships between variables.
Univariate data classifications (3)
1) ID - Used to uniquely identify a subject.
2) Numerical - Number based. Can be discrete or continuous
3) Categorical - Based on characteristics, can be ordinal or nominal.
Univariate analysis techniques (4)
Graphical
Tables
Descriptive statistics
Inferential statistics
Univariate data - Graphical analysis (3)
Histogram
Boxplot
Density curve
Histogram
A histogram displays the frequency of each value, or of each group (bin) of values, in numerical data.
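A minimal sketch of the counting behind a histogram, using only the Python standard library. The `ages` sample, the bin width of 10, and the helper name `histogram_counts` are made up for illustration.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    # Floor division maps each value to the lower edge of its bin.
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

ages = [21, 23, 25, 31, 34, 35, 38, 42, 47, 51]  # illustrative data only
hist = histogram_counts(ages, 10)

# Text rendering: one '#' per observation in the bin.
for start, n in hist.items():
    print(f"{start}-{start + 9}: {'#' * n}")
```

Each printed row plays the role of one bar; a plotting library would draw the same counts as bar heights.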
Boxplot
A boxplot summarises data using the 5-number summary: Minimum, First quartile, Median, Third quartile, and Maximum.
It is useful for identifying outliers in data.
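A small sketch of the 5-number summary and an outlier check, assuming the common 1.5 × IQR convention for flagging outliers (the card does not name a rule, so that threshold is an assumption). The sample data and the choice of the "inclusive" quartile method are also illustrative; other quartile methods give slightly different values.

```python
import statistics

data = sorted([2, 4, 4, 5, 6, 7, 8, 9, 30])  # illustrative data only

# Quartiles via the standard library; "inclusive" treats data as the full population.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
summary = (min(data), q1, median, q3, max(data))

# Assumed convention: points beyond 1.5 * IQR from the quartiles are outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("5-number summary:", summary)
print("outliers:", outliers)
```

On a drawn boxplot, the box spans Q1 to Q3, the line inside it marks the median, and flagged points are usually plotted individually beyond the whiskers.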
Bar chart
A frequency bar chart is a univariate chart that shows the frequency distribution of the categories in categorical data, with one bar per category.
Pie charts
A frequency pie chart is a univariate chart that shows the frequency distribution of categorical data as “slices”, each indicating the share of one category.
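Both bar charts and pie charts are drawn from the same frequency table. A minimal sketch of building one, with a made-up `colours` sample: the counts would be the bar heights and the shares the slice sizes.

```python
from collections import Counter

colours = ["red", "blue", "red", "green", "blue", "red", "red", "green"]  # illustrative

# Frequency of each category (bar heights in a frequency bar chart).
freq = Counter(colours)

# Share of each category (slice sizes in a pie chart); shares sum to 1.
total = sum(freq.values())
shares = {category: count / total for category, count in freq.items()}

for category, count in freq.most_common():
    print(f"{category}: count={count}, share={shares[category]:.0%}")
```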
What are the two types of descriptive statistics?
Measures of central tendency - mean, median, mode
Measures of variability - range, IQR, variance, standard deviation, etc.
What is descriptive statistics?
Descriptive statistics involves generating summary statistics from a sample of data in order to describe, and gain insight into, the features of the data set as a whole.
Define measures of central tendency
Measures of central tendency are statistical measures that use a single value to represent the central or typical value of a data set or probability distribution. The three most common are mean, median, and mode.
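A short sketch of the three measures using Python's standard `statistics` module; the `scores` sample is made up for illustration.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # illustrative data only

mean = statistics.mean(scores)      # arithmetic average: sum / count
median = statistics.median(scores)  # middle value; averages the two middle values here
mode = statistics.mode(scores)      # most frequent value

print(f"mean={mean}, median={median}, mode={mode}")
```

The three values differ here (5, 4.0 and 3) because the large value 10 pulls the mean upward while leaving the median and mode unchanged.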
Define measures of variability/dispersion
Measures of variability or dispersion are statistical measures that use a single value to represent how far the values in a data set are spread out from the central point. Common summary statistics for this include the range, quartiles, interquartile range (IQR), variance, and standard deviation.
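A sketch of these measures with the standard `statistics` module. The `values` sample is made up, and both the population and sample forms of variance are shown because the cards do not say which is intended; the "inclusive" quartile method is likewise an assumption.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data only

# Range: distance between the largest and smallest value.
value_range = max(values) - min(values)

# Interquartile range: spread of the middle 50% of the data.
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Population forms divide by n; sample forms divide by n - 1.
pop_variance = statistics.pvariance(values)
pop_std = statistics.pstdev(values)
sample_variance = statistics.variance(values)
sample_std = statistics.stdev(values)

print(f"range={value_range}, IQR={iqr}")
print(f"population variance={pop_variance}, std={pop_std}")
print(f"sample variance={sample_variance:.3f}, std={sample_std:.3f}")
```

The standard deviation is the square root of the variance, which puts it back in the same units as the data.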