Chapter 2 - Describing the distribution of a variable Flashcards
What measures the most typical value?
Measures of central tendency
- Mean
- Median
- Mode
What is the mean, and how is it calculated?
The mean is the average of all the values in a dataset. It is calculated by summing all the observations and dividing by the number of observations, and it represents a typical value of the data. The arithmetic is the same in every case, but the notation depends on whether the dataset represents a sample or the entire population:
* Sample mean, denoted X̄ (X-bar): used when the data represents only a sample of the population.
* Population mean, denoted μ (Greek letter "mu"): used when the data represents the entire population.
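A minimal Python sketch of the calculation, using hypothetical salary figures; the standard-library `statistics.mean` is used only as a cross-check.

```python
from statistics import mean

# Hypothetical data: five salaries, in thousands of dollars
salaries = [42, 48, 51, 55, 64]

# Mean = (sum of all observations) / (number of observations)
manual_mean = sum(salaries) / len(salaries)

print(manual_mean)                    # 52.0
print(manual_mean == mean(salaries))  # True: matches the standard-library mean
```

Whether `salaries` is treated as a sample (X̄) or as the whole population (μ) changes only the symbol, not the arithmetic.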
What is the median?
The median is the middle value when the data is arranged in ascending order. It represents the point where half the values are below and half are above. The median works slightly differently depending on whether the number of observations is odd or even:
* Odd number of observations: The median is the middle value.
* Example: If there are 9 values, the median is the 5th value.
* Even number of observations: The median is the average of the two middle values.
* Example: If there are 10 values, the median is the average of the 5th and 6th values.
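The odd/even rule can be sketched in a few lines of Python. The helper name `median_value` and the sample numbers are hypothetical, chosen only for illustration.

```python
def median_value(values):
    """Median: the middle of the sorted data, or the mean of the two middle values when n is even."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                      # odd n: e.g. n = 9 -> index 4, the 5th value
    return (data[mid - 1] + data[mid]) / 2    # even n: average of the two middle values

print(median_value([7, 3, 9, 1, 5]))          # 5    sorted: 1 3 5 7 9
print(median_value([7, 3, 9, 1, 5, 11]))      # 6.0  sorted: 1 3 5 7 9 11 -> (5 + 7) / 2
```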
What makes the median different from the mean?
The median depends only on the middle of the sorted data, so it is not affected by how extreme the highest or lowest values are, while the mean is pulled toward them.
When is the median more appropriate than the mean?
The median is often better than the mean when the data is skewed or contains outliers (extremely high or low values) that would distort the mean. In such cases, the median remains a more representative measure of the "typical" value.
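To see the effect of a single outlier, the sketch below uses hypothetical salaries (in thousands of dollars) and Python's standard `statistics` module. Adding one extreme value roughly triples the mean while moving the median by only one unit.

```python
from statistics import mean, median

# Hypothetical salaries in thousands of dollars
salaries = [40, 42, 45, 47, 50]
# Same data plus one extreme value (e.g. an executive's salary)
with_outlier = salaries + [500]

print(f"without outlier: mean={mean(salaries):.1f}, median={median(salaries):.1f}")
print(f"with outlier:    mean={mean(with_outlier):.1f}, median={median(with_outlier):.1f}")
# without outlier: mean=44.8, median=45.0
# with outlier:    mean=120.7, median=46.0
```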
What is the mode, and how is it calculated?
The mode is the value that appears most often in a dataset. In some cases, there may be no mode (if no value repeats), or there can be multiple modes if several values occur with equal frequency.
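A small Python sketch that follows this convention: it returns every value tied for the highest count, and an empty list when nothing repeats. The helper name `modes` is hypothetical. Python's built-in `statistics.multimode` is similar, but for data with no repeats it returns every value instead of reporting "no mode".

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats, so the data has no mode
    return [value for value, count in counts.items() if count == top]

print(modes([3, 5, 5, 8, 9]))   # [5]     one mode
print(modes([2, 2, 4, 4, 7]))   # [2, 4]  two modes
print(modes([1, 2, 3, 4]))      # []      no mode
```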
When is the mode useful?
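The mode is particularly useful when you want to know the most frequent or common value. It is also the only one of the three measures that can be used with categorical (non-numeric) data, such as the most popular product choice.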
Which is a better measure of central tendency: mean, median, or mode?
- Mean is best when the data is symmetrical and there are no extreme outliers.
- Median is better when there are outliers or skewed data because it isn’t influenced by extreme values.
- Mode is useful when you’re interested in the most frequent occurrence, such as finding the most common salary or the most popular choice.
What is the five-figure (five-number) summary?
- Minimum
- Q1
- Q2 (Median)
- Q3
- Maximum
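One way to produce the summary in Python (3.8 or later) is with `statistics.quantiles`. Here `method="inclusive"` is chosen because, as far as we know, it interpolates the same way as Excel's QUARTILE.INC. Other tools use other conventions, so Q1 and Q3 can differ slightly. The helper name and data are hypothetical.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    data = sorted(values)
    # method="inclusive" places the cut points on the range from the minimum to the maximum
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]

print(five_number_summary([12, 4, 7, 2, 9, 5, 10, 4, 8]))
# (2, 4.0, 7.0, 9.0, 12)
```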
What is a percentile?
For a given percentage p, the pth percentile is the value such that p% of all the data points are below (or equal to) this value.
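The definition translates directly into the "nearest-rank" method: take the smallest data value that has at least p% of the data at or below it. This is a swapped-in textbook method, not what Excel does (Excel interpolates between values), so results can differ. The function name and scores are hypothetical.

```python
import math

def nearest_rank_percentile(values, p):
    """Smallest data value with at least p% of the data at or below it (nearest-rank method)."""
    if not values or not 0 <= p <= 100:
        raise ValueError("need at least one value and 0 <= p <= 100")
    data = sorted(values)
    rank = max(1, math.ceil(p * len(data) / 100))  # 1-based position in the sorted data
    return data[rank - 1]

scores = [55, 61, 67, 70, 72, 78, 81, 85, 90, 94]  # hypothetical test scores

p90 = nearest_rank_percentile(scores, 90)
print(p90)                                          # 90
print(sum(s <= p90 for s in scores) / len(scores))  # 0.9 -> 90% of scores are <= 90
print(nearest_rank_percentile(scores, 25))          # 67
```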
What is a quartile, and how does it relate to percentiles?
Quartiles are specific percentiles that divide the sorted data into four equal parts. There are three key quartiles:
* 1st Quartile (Q1) = 25th percentile
* 2nd Quartile (Q2) = 50th percentile (This is also called the median.)
* 3rd Quartile (Q3) = 75th percentile
How do you calculate percentiles and quartiles in Excel?
- PERCENTILE function: This takes two arguments:
1. The range of your data (for example, all the salaries).
2. A value p between 0 and 1, representing the desired percentile. For example, to calculate the 95th percentile, you would use PERCENTILE(data_range, 0.95).
- QUARTILE function: This function also takes two arguments:
1. The data range.
2. A number (1, 2, or 3) to specify the quartile. For example, QUARTILE(data_range, 1) will give you the 1st quartile (25th percentile), and QUARTILE(data_range, 3) will give you the 3rd quartile (75th percentile). QUARTILE(data_range, 2) returns the median, and the function also accepts 0 (minimum) and 4 (maximum), so it can produce the whole five-number summary.
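As a sketch of what PERCENTILE and QUARTILE compute, the hypothetical Python helpers below reproduce the rank rule we understand these functions to use. The pth percentile sits at position p × (n − 1) + 1 in the sorted data. When that position is not a whole number, the result is linearly interpolated between the neighbouring values, and QUARTILE(range, k) is treated as PERCENTILE(range, k/4). The function names and data are illustrative, not an Excel API.

```python
import math

def percentile(values, p):
    """Mirrors the rank rule of Excel's PERCENTILE / PERCENTILE.INC (0 <= p <= 1)."""
    if not values or not 0 <= p <= 1:
        raise ValueError("need at least one value and 0 <= p <= 1")
    data = sorted(values)
    rank = p * (len(data) - 1) + 1           # 1-based position, may be fractional
    lower = math.floor(rank)
    if lower >= len(data):                   # p == 1 (or a single value) -> the maximum
        return data[-1]
    frac = rank - lower
    # linear interpolation between the two neighbouring sorted values
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

def quartile(values, quart):
    """Mirrors Excel's QUARTILE / QUARTILE.INC: quart 0..4 means percentile quart/4."""
    if quart not in (0, 1, 2, 3, 4):
        raise ValueError("quart must be 0, 1, 2, 3 or 4")
    return percentile(values, quart / 4)

data = [2, 4, 4, 5, 7, 8, 9, 10, 12]      # hypothetical data_range, n = 9

print([quartile(data, q) for q in range(5)])   # [2.0, 4.0, 7.0, 9.0, 12]
print(round(percentile(data, 0.95), 6))        # 11.2
```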
What are the new functions Microsoft introduced in Excel 2010?
- PERCENTILE.EXC: Exclusive percentile function.
- PERCENTILE.INC: Inclusive percentile function.
- QUARTILE.EXC: Exclusive quartile function.
- QUARTILE.INC: Inclusive quartile function.
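What do the inclusive (.INC) functions do?
PERCENTILE.INC and QUARTILE.INC: These functions work the same as the older PERCENTILE and QUARTILE functions. They include the endpoints 0 and 1 as valid values of p, so the minimum counts as the 0th percentile and the maximum as the 100th percentile. For a dataset with n values, the pth percentile is found at rank p × (n − 1) + 1 in the sorted data, interpolating between neighbouring values when the rank is not a whole number.
Example: For the 90th percentile of 11 values, PERCENTILE.INC(data_range, 0.9) looks at rank 0.9 × (11 − 1) + 1 = 10, which is the 10th smallest value.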
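What do the exclusive (.EXC) functions do?
PERCENTILE.EXC and QUARTILE.EXC: These functions exclude the endpoints 0 and 1. As a result, p must lie strictly between 0 and 1, and the minimum and maximum are never treated as the 0th or 100th percentile (QUARTILE.EXC accepts only 1, 2, or 3). They locate the pth percentile at rank p × (n + 1) in the sorted data, which places percentiles slightly further toward the extremes than the inclusive versions do. They are aimed at smaller datasets, where treating the observed minimum and maximum as the true extremes of the population can introduce bias.
Example: For the same 11 values, PERCENTILE.EXC(data_range, 0.9) looks at rank 0.9 × (11 + 1) = 10.8, which lies 80% of the way from the 10th to the 11th smallest value. The EXC functions do not discard the highest and lowest values. Instead, a p that is too close to 0 or 1 (giving a rank below 1 or above n) returns a #NUM! error.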
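The difference can be sketched in Python. The hypothetical helpers below mimic the exclusive rank rule p × (n + 1), raising an error where Excel would show #NUM!. For comparison, the inclusive quartiles come from `statistics.quantiles(..., method="inclusive")`. The helper names, the error text, and the data are assumptions for illustration only.

```python
import math
from statistics import quantiles

def percentile_exc(values, p):
    """Same rank rule as Excel's PERCENTILE.EXC: rank = p * (n + 1), 0 < p < 1."""
    data = sorted(values)
    n = len(data)
    rank = p * (n + 1)                       # 1-based position, may be fractional
    # Excel returns #NUM! when the rank falls before the 1st or after the nth value
    if not 0 < p < 1 or rank < 1 or rank > n:
        raise ValueError("#NUM!: p is too close to 0 or 1 for this many values")
    lower = math.floor(rank)
    if lower == n:                           # rank exactly n -> the maximum
        return data[-1]
    frac = rank - lower
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

def quartile_exc(values, quart):
    """QUARTILE.EXC equivalent: only quart 1, 2 or 3 is allowed."""
    if quart not in (1, 2, 3):
        raise ValueError("#NUM!: quart must be 1, 2 or 3")
    return percentile_exc(values, quart / 4)

data = [2, 4, 4, 5, 7, 8, 9, 10, 12]   # hypothetical sample, n = 9

print([quartile_exc(data, q) for q in (1, 2, 3)])  # [4.0, 7.0, 9.5]  exclusive
print(quantiles(data, n=4, method="inclusive"))    # [4.0, 7.0, 9.0]  inclusive, for comparison
print(percentile_exc(data, 0.9))                   # 12  (rank 0.9 * 10 = 9, the maximum)

try:
    percentile_exc(data, 0.05)                     # rank 0.05 * 10 = 0.5 < 1
except ValueError as err:
    print(err)
```

Both versions use the same data. Only the rank formula changes, which is why the exclusive Q3 (9.5) sits further from the centre than the inclusive Q3 (9.0).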