Descriptive Statistics Flashcards

1
Q

What is the mean?

A

The arithmetic average of a set of numbers: the sum of the values divided by how many values there are.

2
Q

How do you calculate the median?

A

Sort the data. If the number of observations is odd, the median is the middle value; if it is even, the median is the average of the two middle values.
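
The odd/even rule on this card can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function name `median` and the sample lists are made up for the example.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)      # the median is defined on ordered data
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[mid]
    # Even count: the average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_result = median([7, 1, 5])       # middle of [1, 5, 7]
even_result = median([7, 1, 5, 3])   # average of 3 and 5 in [1, 3, 5, 7]
```

Python's standard library offers `statistics.median`, which follows the same rule.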

3
Q

What does the mode represent in a dataset?

A

The most frequently occurring value in a dataset.

4
Q

Define the range in statistics.

A

The difference between the highest and lowest values.

5
Q

How is variance calculated in a dataset?

A

The average of the squared differences from the mean (dividing by n for a population, or by n - 1 for a sample).
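
As a rough sketch of that calculation, the Python below computes the variance by hand and with the standard `statistics` module. The dataset is made up for illustration. Dividing by n gives the population variance; dividing by n - 1 gives the sample variance.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values only

mean = sum(data) / len(data)
squared_diffs = [(x - mean) ** 2 for x in data]

# Population variance: plain average of the squared differences.
population_variance = sum(squared_diffs) / len(data)

# Sample variance: divide by n - 1 instead of n.
sample_variance = sum(squared_diffs) / (len(data) - 1)

# The standard library provides both versions.
library_population = statistics.pvariance(data)
library_sample = statistics.variance(data)
```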

6
Q

Explain how standard deviation is used in data analysis.

A

It measures the amount of variation or dispersion of a set of values around the mean, in the same units as the data.

7
Q

What does a high variance indicate about a dataset?

A

It suggests a wider spread of data points in the dataset.

8
Q

How do you find the interquartile range?

A

The difference between the 75th and 25th percentiles.
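
A small Python sketch of this subtraction, using made-up data. Quartile conventions differ between textbooks and software; the `method="inclusive"` setting of `statistics.quantiles` is just one of them and is an assumption of this example.

```python
import statistics

data = [1, 3, 5, 7, 9, 11, 13, 15, 17]   # illustrative values only

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Interquartile range: 75th percentile minus 25th percentile.
iqr = q3 - q1
```

With other quartile conventions the cut points, and therefore the IQR, can come out slightly different for the same data.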

9
Q

What is a boxplot and what does it show?

A

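A graph that summarizes a distribution through its five-number summary: minimum, first quartile, median, third quartile, and maximum. Outliers are often drawn as separate points.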

10
Q

Why is it important to know the shape of the distribution?

A

It provides insights into the symmetry and spread of data.

11
Q

What is skewness in statistical terms?

A

A measure of how much data deviates from being symmetrical.
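
One common way to put a number on this is the moment coefficient of skewness, the third central moment divided by the second central moment raised to the power 1.5. The sketch below assumes that definition (others exist), and the function name and datasets are made up for illustration.

```python
def skewness(values):
    """Moment coefficient of skewness (population form).

    Undefined when all values are equal, because the variance is zero.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5


symmetric = skewness([1, 2, 3, 4, 5])       # symmetric data: zero skew
right_skewed = skewness([1, 1, 1, 2, 10])   # long right tail: positive skew
```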

12
Q

Explain kurtosis in a dataset.

A

A measure of the “tailedness” of the probability distribution.

13
Q

How does one identify outliers in data?

A

By flagging data points that differ markedly from the other observations, for example values more than 1.5 times the IQR below the first quartile or above the third quartile.
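
One widely used convention for "significantly different" is the 1.5 x IQR rule. The Python sketch below assumes that rule and the "inclusive" quartile method; both are choices made for this example, and the data are invented.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


readings = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]   # illustrative values
flagged = iqr_outliers(readings)
```

Whether a flagged point is an error or a genuine extreme value still has to be judged from context.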

14
Q

What is a frequency distribution?

A

A table or graph that shows how often each value, or each class of values, occurs in the data.

15
Q

How can a histogram help in understanding data?

A

It visually shows the distribution of the data by grouping values into bins and plotting how many observations fall into each bin.

16
Q

What is a scatter plot used for?

A

To display the relationship between two numerical variables, with each point representing one observation.

17
Q

How do quartiles divide a dataset?

A

They divide the ordered dataset into four parts, each containing about 25% of the observations.

18
Q

What is the difference between absolute deviation and mean deviation?

A

Absolute deviation is the absolute difference between a single value and a central point (usually the mean); mean deviation is the average of these absolute differences across the dataset.

19
Q

How do you calculate a percentile rank?

A

Count the values that fall below (or, in some conventions, at or below) the given value, divide by the total number of data points, and multiply by 100.
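
A minimal Python sketch of a percentile rank, assuming the "at or below" convention. Other definitions count only values strictly below, or split ties, so results can differ. The function name and scores are hypothetical.

```python
def percentile_rank(values, x):
    """Percentage of values that are less than or equal to x."""
    if not values:
        raise ValueError("percentile rank needs at least one value")
    at_or_below = sum(1 for v in values if v <= x)
    return 100 * at_or_below / len(values)


scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]   # illustrative values
rank_of_80 = percentile_rank(scores, 80)   # 6 of the 10 scores are <= 80
```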

20
Q

What is a cumulative frequency distribution?

A

A running total of frequencies (counts or relative frequencies) up to and including each value or class in a dataset.
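
The running total can be sketched in Python with `collections.Counter` and `itertools.accumulate`. The data are made up, and the example shows both cumulative counts and cumulative relative frequencies.

```python
from collections import Counter
from itertools import accumulate

data = [1, 2, 2, 3, 3, 3, 4]   # illustrative values only

counts = Counter(data)
values = sorted(counts)                      # distinct values in order
frequencies = [counts[v] for v in values]    # how often each value occurs

# Running total of the counts, value by value.
cumulative = list(accumulate(frequencies))

# The same totals expressed as a share of all observations.
cumulative_relative = [c / len(data) for c in cumulative]
```

The last cumulative count always equals the number of observations, and the last cumulative relative frequency equals 1.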

21
Q

Explain the concept of a relative frequency distribution.

A

It shows the proportion of each class relative to the total number of cases.
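
As a small sketch, relative frequencies can be computed in Python by dividing each class count by the total. The category labels are invented for the example.

```python
from collections import Counter

colors = ["red", "blue", "red", "green", "red", "blue"]   # made-up data

counts = Counter(colors)
total = sum(counts.values())

# Proportion of each class relative to the total number of cases.
relative = {category: count / total for category, count in counts.items()}
```

The proportions always add up to 1 (up to floating-point rounding).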

22
Q

What role does the mean play in symmetrical distributions?

A

It represents the balance point of the distribution.

23
Q

What is the best measure of central tendency for skewed data?

A

Median.

24
Q

Why might one use the median instead of the mean?

A

It is less affected by outliers and skewed data.
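
A quick Python sketch of that effect, using invented salary figures: adding one extreme value moves the mean a long way but barely changes the median.

```python
import statistics

salaries = [30, 32, 35, 38, 40]    # made-up values, in thousands
with_outlier = salaries + [400]    # one extreme salary added

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)

mean_after = statistics.mean(with_outlier)       # pulled toward 400
median_after = statistics.median(with_outlier)   # average of 35 and 38
```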

25
Q

How is the mode different from the mean and median?

A

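The mode is the most frequent value, so it can be used with categorical data, whereas the mean and median need numerical (or at least ordered) data. A dataset can also have more than one mode, or none.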

26
Q

When is the range not a good measure of dispersion?

A

When the dataset contains outliers.

27
Q

What is the significance of a high standard deviation?

A

The data points are, on average, spread far from the mean.

28
Q

How do variance and standard deviation relate?

A

Standard deviation is the square root of variance.
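
This relationship can be checked directly with Python's `statistics` module. The dataset is illustrative, and the population versions of both measures are used.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values only

variance = statistics.pvariance(data)    # population variance
std_dev = statistics.pstdev(data)        # population standard deviation

# Taking the square root of the variance gives the standard deviation.
root_of_variance = math.sqrt(variance)
```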

29
Q

What are the limitations of using the range in data analysis?

A

It doesn’t account for the distribution between the highest and lowest values.

30
Q

What statistical measure can help compare data sets with different units?

A

Coefficient of variation.
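
A rough Python sketch of such a comparison, with made-up heights and weights. The helper name `coefficient_of_variation` is hypothetical, and the sample standard deviation is assumed.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unit-free)."""
    return statistics.stdev(values) / statistics.mean(values)


heights_cm = [160, 170, 180]   # made-up values
weights_kg = [50, 70, 90]      # made-up values

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)
```

Because the units cancel, the two ratios can be compared directly; here the weights vary more relative to their mean than the heights do. The ratio is only meaningful for data with a non-zero mean measured on a ratio scale.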

31
Q

How does one interpret the standard error of the mean?

A

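It is the standard deviation of the sampling distribution of the mean: it estimates how much the sample mean would vary from one sample to another, so a smaller value indicates a more precise estimate of the population mean.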
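
The standard error of the mean is commonly estimated as the sample standard deviation divided by the square root of the sample size. A minimal Python sketch under that assumption, with an invented sample:

```python
import math
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values only

s = statistics.stdev(sample)        # sample standard deviation (n - 1)
n = len(sample)

# Estimated standard error of the mean: s / sqrt(n).
sem = s / math.sqrt(n)
```

Quadrupling the sample size halves the standard error if the spread of the data stays the same.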

32
Q

What does a low interquartile range indicate?

A

The middle 50% of the values are closely packed around the median.

33
Q

Why is the mean sensitive to outliers?

A

Every value enters the sum, so a single extreme value can pull the mean toward it.

34
Q

What type of data is best summarized by the mode?

A

Categorical (nominal) data, or data in which certain values repeat often.

35
Q

How does one decide between using standard deviation and variance?

A

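It depends on the analysis. Standard deviation is more intuitive because it is in the same units as the data, while variance is often more convenient in mathematical derivations.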

36
Q

What is the benefit of using the median in real-world data?

A

It gives a more representative measure of the center for skewed data.

37
Q

How do you handle outliers before calculating statistical measures?

A

By either removing them or adjusting them based on the context.

38
Q

What insights can the coefficient of variation provide?

A

It shows the ratio of the standard deviation to the mean, giving a unit-free measure of relative variability.

39
Q

Why might a bimodal distribution be significant?

A

It indicates two dominant groups within the dataset.

40
Q

How can measures of central tendency mislead if not used properly?

A

They can mislead when the nature of the data is ignored, for example when the mean is reported for skewed data or for data with outliers.