Module 2A Visualising Variability Flashcards

1
Q

What is variation in the context of data analysis?

A

The spread or difference between data points in a dataset, showing how much they differ from each other.

2
Q

What is a random variable?

A

A quantity whose value is uncertain and can vary based on chance.

3
Q

What does a frequency distribution describe?

A

The values of a variable and how often each one appears in the data.

4
Q

What is a categorical variable?

A

Data that consists of labels or names, for which arithmetic operations are not meaningful.

Examples include gender, color, or brand names.

5
Q

Define a quantitative variable.

A

Data that consists of numerical values, for which arithmetic operations are meaningful.

Examples include age, height, or income.

6
Q

What is a sample in statistics?

A

A subset of the population that makes data collection feasible

Samples are used to infer characteristics about larger populations.

7
Q

What is relative frequency?

A

The proportion of times a value occurs in a dataset, calculated as: Frequency of a value / Total number of values

8
Q

How is percent frequency calculated?

A

(Frequency of a value / Total number of values) * 100
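
As a rough illustration of the two formulas above, the sketch below builds a small frequency table. The function name frequency_table and the list of colors are made up for this example and are not part of the course material.

```python
from collections import Counter

def frequency_table(values):
    """Return {value: (count, relative_frequency, percent_frequency)}."""
    counts = Counter(values)
    total = len(values)
    return {
        value: (count, count / total, count / total * 100)
        for value, count in counts.items()
    }

# Hypothetical categorical data, invented for illustration only
colors = ["red", "blue", "red", "green", "red", "blue", "red", "blue"]
table = frequency_table(colors)

for value, (count, relative, percent) in table.items():
    print(f"{value}: count={count}, relative={relative:.3f}, percent={percent:.1f}%")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick sanity check on any frequency table.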

9
Q

What is a probability distribution?

A

It shows how the possible values of a random variable are distributed and the likelihood of each value occurring.

10
Q

What does Benford’s Law state?

A

In many real-world data sets, the first digits 1 through 9 are not equally likely: a leading digit d occurs with proportion log10(1 + 1/d), so 1 leads about 30.1% of the time and 9 only about 4.6%.

This law is often used in fraud detection and data analysis.
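
A minimal sketch of how the Benford proportions could be compared with observed first digits. The helper names are hypothetical, and the powers of two are used only because they are a convenient sequence known to follow the law closely; they are not data from the course.

```python
import math
from collections import Counter

def benford_expected():
    """Expected proportion of each leading digit d: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(n):
    """First digit of a nonzero integer."""
    return int(str(abs(n))[0])

def observed_proportions(values):
    """Share of values whose leading digit is 1, 2, ..., 9."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}

# Illustrative data set: the first 100 powers of two
powers_of_two = [2 ** k for k in range(100)]
expected = benford_expected()
observed = observed_proportions(powers_of_two)

for d in range(1, 10):
    print(f"{d}: expected {expected[d]:.3f}, observed {observed[d]:.3f}")
```

In a fraud-detection setting, a large gap between the observed and expected columns would be a prompt for closer inspection rather than proof of manipulation.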

11
Q

What does skewness represent in a quantitative distribution?

A

The lack of symmetry in a quantitative distribution

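It indicates the direction and degree to which one tail of the distribution is longer than the other; a symmetric distribution, such as the normal distribution, has zero skewness.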

12
Q

What is a frequency polygon?

A

A line graph that shows the distribution of data by plotting the midpoints of each class interval and connecting them with straight lines

13
Q

What is a Trellis Display?

A

A grid of small graphs that shows how data patterns change across different categories or conditions. Each panel uses the same format and axes but shows a different subset of the data.

14
Q

What is the first quartile?

A

The 25th percentile: the value below which about 25% of the data fall.

Quartiles divide the data set into four equal parts.

15
Q

What is the second quartile also known as?

A

The median

It represents the middle value of the data set.

16
Q

How is the interquartile range calculated?

A

3rd quartile minus 1st quartile

It measures the spread of the middle 50% of the data.
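
The quartile cards above can be sketched with Python's standard statistics module. The data values are invented for illustration, and the "inclusive" method is only one of several quartile conventions, so other tools may report slightly different values.

```python
import statistics

# Hypothetical data, invented for illustration only
data = [2, 4, 4, 5, 7, 8, 9, 10, 12]

# n=4 splits the data into four parts and returns the three cut points.
# method="inclusive" treats the data as including its own minimum and maximum.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(f"Q1 (25th percentile): {q1}")
print(f"Q2 (median):          {q2}")
print(f"Q3 (75th percentile): {q3}")
print(f"Interquartile range:  {iqr}")
```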

17
Q

What does the mean represent?

A

The sum of the values divided by the sample size

It is a measure of central tendency.

18
Q

How is the median defined?

A

The middle value of the sorted data; if the sample size is even, the average of the two middle values

It is less affected by outliers compared to the mean.

19
Q

What is the mode?

A

The most frequent value(s) in the data set

A data set can have multiple modes or none at all.

20
Q

How is the range calculated?

A

Largest value minus smallest value in the set

It gives a measure of the spread of the data.
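
A short sketch of the mean, median, mode, and range on one small data set, using the standard statistics module. The numbers are invented, with one deliberately large value to show why the median is less affected by outliers than the mean.

```python
import statistics

# Hypothetical data with one large outlier (40), invented for illustration
data = [3, 5, 5, 7, 8, 9, 40]

mean = statistics.mean(data)         # sum of the values / sample size
median = statistics.median(data)     # middle value of the sorted data
modes = statistics.multimode(data)   # list of the most frequent value(s)
data_range = max(data) - min(data)   # largest value minus smallest value

print(f"mean={mean}, median={median}, modes={modes}, range={data_range}")

# Dropping the outlier moves the mean far more than the median
trimmed = data[:-1]
print(f"without outlier: mean={statistics.mean(trimmed):.2f}, "
      f"median={statistics.median(trimmed)}")
```

multimode returns a list, so a data set with several equally frequent values yields several modes.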

21
Q

What does standard deviation measure?

A

Roughly the typical distance of the values from the mean, computed as the square root of the average squared deviation from the mean

It quantifies the amount of variation or dispersion in a set of values.
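
To make the calculation concrete, the sketch below computes a standard deviation step by step and compares it with the standard library. The data are invented; the manual version divides by n (the population form), while statistics.stdev divides by n - 1 (the sample form).

```python
import math
import statistics

# Hypothetical data, invented for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]

# Population standard deviation: square root of the mean squared deviation
manual_sd = math.sqrt(sum(squared_deviations) / len(data))

print(f"mean = {mean}")
print(f"manual population SD = {manual_sd}")
print(f"statistics.pstdev    = {statistics.pstdev(data)}")
print(f"statistics.stdev     = {statistics.stdev(data):.4f}  (sample form, n - 1)")
```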

22
Q

What is the Empirical Rule for bell-shaped distributions regarding data values within one standard deviation?

A

About 68% of the data values lie within one standard deviation of the mean

This rule provides a quick estimation of data spread.

23
Q

What percentage of data values lie within two standard deviations of the mean according to the Empirical Rule?

A

About 95%

This helps in understanding the distribution of data points.

24
Q

What percentage of data values lie within three standard deviations of the mean according to the Empirical Rule?

A

About 99.7%

This is known as the 68-95-99.7 rule.
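
The three percentages in the Empirical Rule can be checked against an exact normal distribution. This is a sketch using statistics.NormalDist from the standard library; the helper name share_within is made up for the example.

```python
from statistics import NormalDist

def share_within(k, mu=0.0, sigma=1.0):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} SD of the mean: {share:.1%}")
```

The result does not depend on the particular mean or standard deviation, which is why the rule applies to any bell-shaped distribution as an approximation.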

25
Q

What does a Box-and-Whisker Plot use to display data?

A

It uses the five-number summary (minimum, first quartile, median, third quartile, and maximum) to display data.

It shows the median, quartiles, and potential outliers.

26
Q

What is a Violin Chart?

A

An advanced visualization that combines a box and whisker chart with a rotated and mirrored kernel density chart

It provides a richer representation of data distribution.

27
Q

What is statistical inference?

A

The process of using data from a sample to make conclusions or predictions about a larger population.

28
Q

What is a confidence interval?

A

A range of values, computed from sample data, within which the true population parameter is expected to lie at a stated confidence level (for example, 95%).

29
Q

How is a confidence interval on a mean calculated?

A

Sample mean ± margin of error

It reflects the uncertainty associated with the sample mean.

30
Q

What does the margin of error represent?

A

The maximum expected difference, at a given confidence level, between the sample estimate and the true population value. (Uncertainty on the parameter)
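
A minimal sketch of "sample mean ± margin of error", assuming a normal approximation: the margin is taken as a z critical value times the standard error of the mean. That choice is an assumption made for this example (for small samples a t critical value is the more usual choice), and the function name and measurements are invented.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Return (mean, margin_of_error, (lower, upper)) using a normal approximation."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * std_err
    return mean, margin, (mean - margin, mean + margin)

# Hypothetical measurements, invented for illustration only
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
mean, margin, (lower, upper) = mean_confidence_interval(sample)

print(f"sample mean     = {mean:.3f}")
print(f"margin of error = {margin:.3f}")
print(f"95% interval    = ({lower:.3f}, {upper:.3f})")
```

A higher confidence level or a smaller sample widens the interval, because both increase the margin of error.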

31
Q

What is time series data?

A

Data collected or recorded at regular time intervals, showing how values change over time.

It is often used for forecasting and trend analysis.

32
Q

What is a time series chart?

A

A line graph that shows how data points change over time, with time on the x-axis and the measured values on the y-axis.