Section A.2: Univariate analysis Flashcards

1
Q

Univariate Analysis

A

Univariate analysis is the simplest form of data analysis. It involves analysing one variable in a dataset at a time to understand its characteristics.

2
Q

Dataset

A

A dataset is a multidimensional (multiple columns), heterogeneous (able to hold many data types) data structure (a format for organising data).

3
Q

Data structure

A

A specialised format for organising, processing, retrieving and storing data

4
Q

Univariate data

A

Univariate data consists of observations on only a single characteristic or attribute. Because only one variable is involved, it does not describe causes of, or relationships between, variables.

5
Q

Univariate data classifications (3)

A

1) ID - Used to uniquely identify a subject.
2) Numerical - Number-based; can be discrete or continuous.
3) Categorical - Based on characteristics; can be ordinal or nominal.

6
Q

Univariate analysis techniques (4)

A

Graphical
Tables
Descriptive statistics
Inferential statistics

7
Q

Univariate data - Graphical analysis (3)

A

Histogram
Boxplot
Density curve

8
Q

Histogram

A

A histogram displays the frequency of each value, or of each group of values (bin), in numerical data.
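
The counting behind a histogram can be sketched in a few lines. This is a minimal illustration that assumes equal-width bins; the function name `histogram_counts`, its parameters, and the sample ages are hypothetical and not part of the original card.

```python
def histogram_counts(values, bin_width, start):
    """Count how many values fall in each equal-width bin [lo, lo + bin_width)."""
    counts = {}
    for v in values:
        # Index of the bin that contains v, counted from `start`.
        index = int((v - start) // bin_width)
        lo = start + index * bin_width
        counts[lo] = counts.get(lo, 0) + 1
    # Sort by the lower edge of each bin; empty bins are simply absent.
    return dict(sorted(counts.items()))


ages = [21, 23, 25, 31, 34, 38, 39, 45]  # illustrative data only
bins = histogram_counts(ages, bin_width=10, start=20)

# Text rendering: one '#' per observation in the bin.
for lo, count in bins.items():
    print(f"{lo}-{lo + 10}: {'#' * count}")
```

A plotting library would normally draw the bars; the text output here only shows that bar height equals the count in each bin.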

9
Q

Boxplot

A

A boxplot summarises data using the five-number summary: minimum, first quartile, median, third quartile, and maximum.
It is useful for identifying outliers in data.
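
As a rough sketch, the five-number summary and a simple outlier check could be computed as follows. The 1.5 x IQR rule used here is a common convention rather than something stated on the card, and the function names and sample scores are hypothetical. Quartile conventions differ between tools, so Q1 and Q3 may vary slightly elsewhere.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # "inclusive" is one of the two quartile methods in the standard library;
    # other conventions can give slightly different Q1 and Q3.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def outliers(values, k=1.5):
    """Flag values more than k * IQR below Q1 or above Q3 (assumed convention)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


scores = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # illustrative data only
summary = five_number_summary(scores)
flagged = outliers(scores)
print(summary, flagged)
```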

10
Q

Bar chart

A

A frequency bar chart is a univariate chart that shows the frequency distribution of the categories in categorical data.
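
The frequency distribution that such a chart plots can be sketched with the standard library. The sample data and variable names below are hypothetical; the `shares` dictionary is the proportion of each category, which is also what a pie chart's slices would represent.

```python
from collections import Counter

colours = ["red", "blue", "red", "green", "blue", "red"]  # illustrative data only

# Frequency of each category: this is the height of each bar.
frequencies = Counter(colours)

# Share of each category out of the total number of observations.
total = sum(frequencies.values())
shares = {category: count / total for category, count in frequencies.items()}

# Text rendering, most frequent category first.
for category, count in frequencies.most_common():
    print(f"{category:<6}{'#' * count} ({shares[category]:.0%})")
```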

11
Q

Pie charts

A

A frequency pie chart is a univariate chart that shows the frequency distribution of categorical data as “slices”, each indicating the share of one category.

12
Q

What are the two types of descriptive statistics?

A

Measures of central tendency - mean, median, mode
Measures of variability - range, IQR, variance, standard deviation, etc.

13
Q

What is descriptive statistics?

A

Descriptive statistics involves the generation of summary statistics from a sample of data, used to describe and gain insight into the features of the data set overall.

14
Q

Define measures of central tendency

A

Measures of central tendency are statistical measures that use a single value to represent the central or typical value for a probability distribution. The three most common are mean, median, and mode.

15
Q

Define measures of variability/dispersion

A

Measures of variability or dispersion are statistical measures that use a single value to represent how far the values in a data set spread out from the central point. Common univariate summary statistics for this include the range, interquartile range (IQR), quartiles, variance, and standard deviation.

16
Q

Define: Mean

A

The mean is a univariate summary statistic (measure of central tendency) calculated as the sum of all data points divided by the number of data points.

17
Q

Define: Median

A

The median is a univariate summary statistic (measure of central tendency) that is the middle value when the data points are ordered by magnitude (lowest to highest, or highest to lowest). With an even number of data points, it is conventionally taken as the mean of the two middle values.

18
Q

Define: Mode

A

The mode is a univariate summary statistic (measure of central tendency) that is the most commonly observed value in a distribution. A distribution of data can have zero, one, or several modes.
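
The three measures of central tendency defined above can be checked with Python's standard `statistics` module. This is only a small sketch on made-up values. Note that `statistics.multimode` returns every value tied for most frequent, so data with no repeated values returns all of its values.

```python
import statistics

values = [3, 5, 5, 8, 9]  # illustrative data only

mean = statistics.mean(values)        # sum of the values / number of values
median = statistics.median(values)    # middle value of the sorted data
modes = statistics.multimode(values)  # list of the most frequent value(s)

print(mean, median, modes)

# Two values tied for most frequent gives two modes (bimodal data).
print(statistics.multimode([1, 1, 2, 2]))
```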

19
Q

Define: Range

A

The range is a univariate summary statistic (measure of variability) that is the difference between the largest and smallest values in the data.

20
Q

Define: Interquartile range

A

The interquartile range is the difference between the 75th percentile (Q3) and the 25th percentile (Q1) of the data.

21
Q

Define: Quartile

A

Quartiles are the quantiles that divide the data points, ordered from lowest to highest, into four parts of equal size. The three cut points are Q1, Q2 (the median), and Q3.
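
A short sketch of how quartiles split data into four parts, and how the interquartile range follows from them. The sample heights are made up, and the "inclusive" method is just one of several quartile conventions, chosen here as an assumption.

```python
import statistics

heights = [150, 155, 160, 162, 165, 170, 175, 180]  # illustrative data only

# The three quartile cut points: Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(heights, n=4, method="inclusive")

# Interquartile range: the spread of the middle half of the data.
iqr = q3 - q1

# Count how many observations land in each of the four parts.
group_sizes = [
    sum(1 for h in heights if h <= q1),
    sum(1 for h in heights if q1 < h <= q2),
    sum(1 for h in heights if q2 < h <= q3),
    sum(1 for h in heights if h > q3),
]
print(q1, q2, q3, iqr, group_sizes)
```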

22
Q

Define: Standard deviation

A

The standard deviation is the square root of the variance. It expresses how much the data points typically differ from the mean, in the same units as the data.

23
Q

Define: Variance

A

The variance of the data is the average of the squared deviations from the mean.
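
The definition above can be followed step by step in code. This sketch divides by the number of data points, which gives the population variance; dividing by one less than that gives the sample variance, which is what `statistics.variance` computes. The values are made up for illustration.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data only

# Step 1: the mean of the data.
mean = sum(values) / len(values)

# Step 2: squared deviation of each data point from the mean.
squared_deviations = [(v - mean) ** 2 for v in values]

# Step 3: average the squared deviations (population variance).
variance = sum(squared_deviations) / len(values)

# The standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)

# Cross-check against the standard library's population versions.
print(statistics.pvariance(values), statistics.pstdev(values))
```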