Chapter 2 - Describing the distribution of a variable Flashcards

1
Q

What are the measures of central tendency (the measures of the most typical value)?

A
  • Mean
  • Median
  • Mode
2
Q

What is the mean, and how is it calculated?

A

The mean is the average of all the values in a dataset. It represents a typical value by summing all the observations and dividing by the number of observations. There are two types of means, depending on whether the dataset represents a sample or the entire population:
* Sample mean – denoted as X̄ (X-bar): Used when the data represents only a sample of the population.
* Population mean – denoted as μ (Greek letter “mu”): Used when the data represents the entire population.
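
As a small illustration of "sum the observations and divide by their count", here is a minimal Python sketch. The salary figures are made up for the example; the same arithmetic applies whether the values are treated as a sample or as a whole population.

```python
from statistics import mean

# Hypothetical salaries in $1000s; the values are invented for illustration.
salaries = [40, 45, 50, 55, 60]

# Mean = sum of all observations / number of observations.
manual_mean = sum(salaries) / len(salaries)

print(manual_mean)  # 50.0

# The standard library gives the same value.
assert mean(salaries) == manual_mean
```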

3
Q

What is the median?

A

The median is the middle value when the data is arranged in ascending order. It represents the point where half the values are below and half are above. The median works slightly differently depending on whether the number of observations is odd or even:
* Odd number of observations: The median is the middle value.
  * Example: If there are 9 values, the median is the 5th value.
* Even number of observations: The median is the average of the two middle values.
  * Example: If there are 10 values, the median is the average of the 5th and 6th values.
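
The odd and even cases can be checked with a short Python sketch. The two lists are arbitrary example data.

```python
from statistics import median

odd_values = [3, 1, 4, 1, 5, 9, 2, 6, 5]   # 9 observations
even_values = [3, 1, 4, 1, 5, 9, 2, 6]     # 8 observations

print(sorted(odd_values))    # [1, 1, 2, 3, 4, 5, 5, 6, 9]
print(median(odd_values))    # 4   (the 5th sorted value)

print(sorted(even_values))   # [1, 1, 2, 3, 4, 5, 6, 9]
print(median(even_values))   # 3.5 (average of the 4th and 5th sorted values)
```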

4
Q

What makes the median different from the mean?

A

The median is not affected by extremely high or low values, while the mean is.
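
A quick way to see this is to replace one value with an extreme one and compare both measures. The numbers below are invented for the example.

```python
from statistics import mean, median

salaries = [40, 45, 50, 55, 60]
with_outlier = [40, 45, 50, 55, 1000]   # largest value replaced by an extreme one

print(mean(salaries), median(salaries))          # 50 50
print(mean(with_outlier), median(with_outlier))  # 238 50
```

The mean moves from 50 to 238, while the median stays at 50.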

5
Q

When is the median more appropriate than the mean?

A

The median is often better than the mean when the data is skewed or contains outliers (extremely high or low values) that would distort the mean.

In those cases, the median remains a more representative measure of the “typical” value.

6
Q

What is the mode, and how is it calculated?

A

The mode is the value that appears most often in a dataset. In some cases, there may be no mode (if no value repeats), or there can be multiple modes if several values occur with equal frequency.
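
The three cases (one mode, several modes, no mode) can be sketched in Python. The helper name `modes` is hypothetical, and returning an empty list when no value repeats is a choice made here to match the convention on this card.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list if nothing repeats."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats, so this card's convention says "no mode"
    return [v for v, c in counts.items() if c == top]

print(modes([1, 2, 2, 3]))     # [2]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
print(modes([1, 2, 3]))        # []
```

Python's built-in `statistics.multimode` follows a different convention: when no value repeats, it returns every value rather than an empty list.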

7
Q

When is the mode useful?

A

The mode is particularly useful in cases where you want to know the most frequent or common value.

8
Q

Which is a better measure of central tendency: mean, median, or mode?

A
  • Mean is best when the data is symmetrical and there are no extreme outliers.
  • Median is better when there are outliers or skewed data because it isn’t influenced by extreme values.
  • Mode is useful when you’re interested in the most frequent occurrence, such as finding the most common salary or the most popular choice.
9
Q

What is the five-number summary?

A
  1. Minimum
  2. Q1
  3. Q2 (median)
  4. Q3
  5. Maximum
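
A minimal Python sketch of the five values, assuming Python 3.8+ for `statistics.quantiles`. The helper name `five_number_summary` is hypothetical, and the "inclusive" quartile method is one of several conventions, so other tools may report slightly different Q1 and Q3 values.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using inclusive quartiles."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

data = [7, 1, 9, 3, 5, 2, 8, 4, 6]   # arbitrary example data
print(five_number_summary(data))     # (1, 3.0, 5.0, 7.0, 9)
```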
10
Q

What is a percentile?

A

For a given percentage p, the pth percentile is the value such that p% of all the data points are below (or equal to) this value.
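
The definition can be checked directly by counting how much of the data lies at or below a candidate value. The helper name `percent_at_or_below` and the scores are hypothetical.

```python
def percent_at_or_below(values, x):
    """Percentage of data points that are less than or equal to x."""
    return 100 * sum(v <= x for v in values) / len(values)

scores = list(range(1, 21))   # 20 made-up scores: 1, 2, ..., 20

# 19 of the 20 scores are at or below 19, so 19 sits at the 95th percentile.
print(percent_at_or_below(scores, 19))  # 95.0
```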

11
Q

What is a quartile, and how does it relate to percentiles?

A

Quartiles are specific percentiles that divide the data into four equal parts. There are three key quartiles:
* 1st Quartile (Q1) = 25th percentile
* 2nd Quartile (Q2) = 50th percentile (This is also called the median.)
* 3rd Quartile (Q3) = 75th percentile
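
The correspondence can be verified in Python by computing the same cut points two ways: as quartiles (4 groups) and as percentiles (100 groups). This is a sketch assuming Python 3.8+ and the "inclusive" interpolation method; the data is arbitrary.

```python
from statistics import quantiles

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

quartiles = quantiles(data, n=4, method="inclusive")      # [Q1, Q2, Q3]
percentiles = quantiles(data, n=100, method="inclusive")  # 1st..99th percentile

print(quartiles)  # [32.5, 55.0, 77.5]

# percentiles[k - 1] is the k-th percentile.
print(percentiles[24], percentiles[49], percentiles[74])  # 32.5 55.0 77.5
```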

12
Q

How do you calculate percentiles and quartiles in Excel?

A
  • PERCENTILE Function: This takes two arguments:
    1. The range of your data (for example, all the salaries).
    2. A value p between 0 and 1, representing the desired percentile. For example, to calculate the 95th percentile, you would use PERCENTILE(data_range, 0.95).
  • QUARTILE Function: This function also takes two arguments:
    1. The data range.
    2. A number specifying the quartile: 1, 2, or 3 (QUARTILE also accepts 0 and 4, which return the minimum and maximum). For example, QUARTILE(data_range, 1) will give you the 1st quartile (25th percentile), and QUARTILE(data_range, 3) will give you the 3rd quartile (75th percentile).
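
As a rough cross-check outside Excel, the two-argument pattern above can be sketched in Python. The function names `percentile_inc` and `quartile_inc` are hypothetical and the data is made up. The rule shown (linear interpolation at position p × (n − 1) in the sorted data) is the one usually documented for the inclusive method; treat it as an illustration rather than Excel's actual implementation.

```python
import math

def percentile_inc(values, p):
    """Inclusive percentile with linear interpolation; p is between 0 and 1."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    data = sorted(values)
    pos = p * (len(data) - 1)   # zero-based fractional position
    lower = math.floor(pos)
    frac = pos - lower
    if frac == 0:
        return float(data[lower])
    return data[lower] + frac * (data[lower + 1] - data[lower])

def quartile_inc(values, q):
    """Quartile q (0 to 4) expressed as the percentile q / 4."""
    return percentile_inc(values, q / 4)

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(percentile_inc(data, 0.25))            # 32.5
print(quartile_inc(data, 3))                 # 77.5
print(round(percentile_inc(data, 0.95), 2))  # 95.5
```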
13
Q

What are the new functions Microsoft introduced in Excel 2010?

A
  • PERCENTILE.EXC: Exclusive percentile function.
  • PERCENTILE.INC: Inclusive percentile function.
  • QUARTILE.EXC: Exclusive quartile function.
  • QUARTILE.INC: Inclusive quartile function.
14
Q

What does the Inclusive function do?

A

PERCENTILE.INC and QUARTILE.INC: These functions work the same as the older PERCENTILE and QUARTILE functions. “Inclusive” means the percentile argument can be anywhere from 0 to 1, endpoints included, so the smallest data value is treated as the 0th percentile and the largest as the 100th.

Example: PERCENTILE.INC(data_range, 0) returns the minimum of the data and PERCENTILE.INC(data_range, 1) returns the maximum; a 90th percentile is interpolated between those bounds.

15
Q

What does the Exclusive function do?

A

PERCENTILE.EXC and QUARTILE.EXC: These functions are recommended for smaller datasets. “Exclusive” means the percentile argument must be strictly between 0 and 1, so the 0th and 100th percentiles cannot be requested. No data values are dropped; the functions simply do not treat the smallest and largest observations as the 0th and 100th percentiles.

Example: With the five values 10, 20, 30, 40, 50, QUARTILE.INC returns 20 for the 1st quartile, while QUARTILE.EXC returns 15.

16
Q

Why are the EXC (Exclusive) versions recommended for small datasets?

A

In smaller datasets, the choice of method matters more: the sample minimum and maximum are unlikely to be the true extremes of the population, yet the INC functions treat them as the 0th and 100th percentiles. The EXC functions avoid that assumption, which is why they are recommended for small sample sizes. For large datasets, the two versions give nearly identical results.
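
To see how much the two conventions differ on a small dataset, here is a sketch using Python's `statistics.quantiles` (Python 3.8+), whose "inclusive" and "exclusive" methods are documented with analogous definitions. That they match Excel's INC and EXC results exactly is an assumption here, not something the source states; the data is made up.

```python
from statistics import quantiles

small = [10, 20, 30, 40, 50]

inc = quantiles(small, n=4, method="inclusive")
exc = quantiles(small, n=4, method="exclusive")

print(inc)  # [20.0, 30.0, 40.0]
print(exc)  # [15.0, 30.0, 45.0]

# With many observations the two methods nearly agree.
large = list(range(1, 1001))
q1_inc = quantiles(large, n=4, method="inclusive")[0]
q1_exc = quantiles(large, n=4, method="exclusive")[0]
print(q1_inc, q1_exc)  # 250.75 250.25
```

On five values the first quartile differs by 5 (out of a spread of 40); on a thousand values it differs by 0.5 (out of a spread of 999).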

17
Q

How are these functions typically used?

A
  • PERCENTILE.EXC (or PERCENTILE.INC): Used to find the value below which a given percentage of data falls.
    Example: If you want to find the 90th percentile salary in a dataset, PERCENTILE will return the salary below which 90% of the salaries lie.
  • QUARTILE.EXC (or QUARTILE.INC): Used to divide your dataset into quartiles. Quartiles are essentially percentiles at 25%, 50%, and 75%.
    Example: If you calculate the 1st quartile, it gives you the value below which 25% of your data lies.
18
Q

What are the measures of variability?

A
  • Range
  • IQR
  • Variance
  • Standard deviation
  • Mean absolute deviation
19
Q

What is variability in data?

A

Variability tells us how spread out the values in a dataset are. If the values are close together, the variability is low. If they are far apart, the variability is high. It gives us an idea of the consistency or fluctuation in the data.

20
Q

What is the range?

A

The range is the simplest measure of variability. It is calculated as:
Range = Maximum value − Minimum value

21
Q

What is the disadvantage of using the range?

A

The range is too sensitive to extreme values (outliers). For example, if the highest-paid player’s salary increases by $10 million, the range jumps by $10 million, even though no other salary has changed.
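
The sensitivity is easy to reproduce in Python. The helper name `value_range` and the salary figures (in $ millions) are hypothetical.

```python
def value_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)

salaries = [1, 2, 3, 4, 20]   # made-up salaries in $ millions
raised = [1, 2, 3, 4, 30]     # only the top salary goes up by 10

print(value_range(salaries))  # 19
print(value_range(raised))    # 29
```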

22
Q

What is the interquartile range (IQR)?

A

The interquartile range (IQR) is a more robust measure of variability because it focuses on the middle 50% of the data and ignores the extreme values. It is calculated as:
IQR = Q3 − Q1
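
A short Python sketch comparing the IQR with the range when one extreme value is introduced. It assumes Python 3.8+ and the "inclusive" quartile convention; the helper name `iqr` and the data are hypothetical.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range: Q3 - Q1, using inclusive quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 100]

print(iqr(data), max(data) - min(data))                          # 4.0 8
print(iqr(with_outlier), max(with_outlier) - min(with_outlier))  # 4.0 99
```

The range grows from 8 to 99, while the IQR is unchanged at 4.0.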

23
Q

What is variance?

A

Variance is a measure of how spread out the data is. Instead of just looking at how far each number is from the average (mean), it squares those differences and then averages them. Squaring makes every difference positive, so values above and below the mean cannot cancel each other out, and it gives larger differences more weight.
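
The role of squaring can be seen step by step in Python. The data is invented for the example. The card does not say whether to divide by n (population variance) or by n − 1 (sample variance), so this sketch shows both as an assumption.

```python
import math
from statistics import mean, pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]
m = mean(data)                      # 5

# Raw deviations from the mean cancel out.
deviations = [x - m for x in data]
print(sum(deviations))              # 0

# Squared deviations are all non-negative, so they accumulate.
squared = [d ** 2 for d in deviations]
population_var = sum(squared) / len(data)        # divide by n
sample_var = sum(squared) / (len(data) - 1)      # divide by n - 1

print(population_var)               # 4.0

# Both agree with the standard library.
assert population_var == pvariance(data)
assert math.isclose(sample_var, variance(data))
```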