Descriptive Statistics Flashcards
What is the mean?
The arithmetic mean: the sum of all values divided by the number of values.
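A minimal Python sketch of this definition. The function name arithmetic_mean and the sample data are illustrative; the result is cross-checked against the standard library's statistics.mean.

```python
import statistics

def arithmetic_mean(values):
    """Sum of the values divided by how many values there are."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(arithmetic_mean(data))                           # 5.0
print(arithmetic_mean(data) == statistics.mean(data))  # True
```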
How do you calculate the median?
Sort the data in order. If the number of observations is odd, the median is the middle value; if it is even, the median is the average of the two middle values.
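The odd/even rule above can be sketched in Python as follows; the function name median is an illustrative choice (the standard library's statistics.median applies the same rule).

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value
        return ordered[mid]
    # Even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))      # 3
print(median([7, 1, 3, 10]))  # 5.0
```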
What does the mode represent in a dataset?
The most frequently occurring value in a dataset.
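A dataset can have more than one mode, so this sketch returns every value tied for the highest count. The function name modes is hypothetical; Python 3.8+ also offers statistics.multimode for the same purpose.

```python
from collections import Counter

def modes(values):
    """Return all values that share the highest frequency."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

print(modes([1, 2, 2, 3, 3, 4]))      # [2, 3]  (bimodal)
print(modes(["red", "blue", "red"]))  # ['red'] (works for categories too)
```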
Define the range in statistics.
The difference between the highest and lowest values.
How is variance calculated in a dataset?
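Compute the mean, square each value's difference from it, and average those squares. For a population, divide the sum of squares by n; for a sample, divide by n - 1.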
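A minimal sketch of the calculation, assuming a hypothetical variance helper whose sample flag switches the divisor from n (population) to n - 1 (sample, Bessel's correction). It is cross-checked against statistics.pvariance and statistics.variance.

```python
import math
import statistics

def variance(values, sample=False):
    """Average squared deviation from the mean.

    sample=False divides by n (population variance);
    sample=True divides by n - 1 (sample variance).
    """
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough data points")
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    return sum_sq / (n - 1 if sample else n)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data))                         # 4.0
print(round(variance(data, sample=True), 3))  # 4.571

# Cross-check against the standard library
print(math.isclose(variance(data), statistics.pvariance(data)))             # True
print(math.isclose(variance(data, sample=True), statistics.variance(data)))  # True
```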
Explain how standard deviation is used in data analysis.
It measures how far values typically lie from the mean, in the same units as the data, summarizing the amount of variation or dispersion in a set of values.
What does a high variance indicate about a dataset?
It indicates that the data points are spread widely around the mean.
How do you find the interquartile range?
Subtract the first quartile (25th percentile) from the third quartile (75th percentile): IQR = Q3 - Q1.
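Quartile conventions differ between textbooks and software, so this sketch assumes one common choice: linear interpolation between the closest ranks (the approach statistics.quantiles uses with method="inclusive"). The helper names percentile and iqr are illustrative, and other tools may return slightly different quartiles.

```python
def percentile(values, p):
    """p-th percentile (0 <= p <= 100) by linear interpolation between closest ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("empty data")
    pos = (len(ordered) - 1) * p / 100  # fractional index into the sorted data
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac

def iqr(values):
    """Interquartile range: Q3 - Q1."""
    return percentile(values, 75) - percentile(values, 25)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(percentile(data, 25), percentile(data, 75))  # 3.0 7.0
print(iqr(data))                                   # 4.0
```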
What is a boxplot and what does it show?
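A chart of the five-number summary (minimum, first quartile, median, third quartile, maximum). The box spans the interquartile range, a line inside it marks the median, and the whiskers show the spread of the remaining data; many boxplots draw outliers as individual points.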
Why is it important to know the shape of the distribution?
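It shows whether the data are symmetric or skewed and how spread out they are, which determines which summary statistics (for example, mean or median) describe the data well.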
What is skewness in statistical terms?
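A measure of the asymmetry of a distribution: positive skewness means a longer right tail, negative skewness a longer left tail, and a symmetric distribution has zero skewness.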
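A minimal sketch using the moment-based (population) skewness m3 / m2^1.5, where m2 and m3 are the second and third central moments. The function name is illustrative, and many statistical packages apply a small-sample correction, so their values can differ slightly.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    if n == 0:
        raise ValueError("empty data")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

print(skewness([1, 2, 3, 4, 5]))       # 0.0  (symmetric)
print(skewness([1, 1, 1, 2, 10]) > 0)  # True (long right tail)
```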
Explain kurtosis in a dataset.
A measure of the “tailedness” of the probability distribution.
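One common way to quantify tailedness is excess kurtosis, m4 / m2^2 - 3, which is about 0 for a normal distribution. This sketch uses the uncorrected moment formula; the function name is illustrative, and packages that apply sample corrections may report different values.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3.

    Positive means heavier tails than a normal distribution,
    negative means lighter tails.
    """
    n = len(values)
    if n == 0:
        raise ValueError("empty data")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2 - 3

print(round(excess_kurtosis([1, 2, 3, 4, 5]), 2))  # -1.3 (light tails)
print(excess_kurtosis([0] * 8 + [-10, 10]))        # 2.0  (heavy tails)
```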
How does one identify outliers in data?
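Look for points that differ markedly from the rest of the data, for example values more than 1.5 × IQR below the first quartile or above the third quartile, or values with a z-score beyond ±3.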
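The 1.5 × IQR rule (Tukey's fences) can be sketched with statistics.quantiles. The helper name iqr_outliers and the multiplier parameter k are illustrative, and the "inclusive" quartile method is an assumption; other quartile conventions can move the fences slightly.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [x for x in values if x < low or x > high]

data = [10, 12, 11, 13, 12, 11, 14, 13, 45]
print(iqr_outliers(data))  # [45]
```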
What is a frequency distribution?
A table or chart showing how often each value (or each class interval) occurs in a dataset.
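A frequency table for discrete or categorical values can be built with collections.Counter; the function name frequency_table and the grade data are illustrative.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count) pairs sorted by value."""
    return sorted(Counter(values).items())

grades = ["B", "A", "C", "B", "B", "A", "C", "B"]
for value, count in frequency_table(grades):
    print(value, count)
# A 2
# B 4
# C 2
```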
How can a histogram help in understanding data?
It groups values into bins and shows how many fall in each, revealing the shape, center, and spread of the distribution.
What is a scatter plot used for?
To display paired values of two numerical variables, revealing any relationship, trend, or correlation between them.
How do quartiles divide a dataset?
They split the ordered data into four equal parts, each containing about 25% of the observations.
What is the difference between absolute deviation and mean deviation?
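Absolute deviation is the absolute difference between a single value and a central value (usually the mean); mean deviation (mean absolute deviation) is the average of these absolute differences across the dataset.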
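A minimal sketch contrasting the two ideas, assuming the mean as the central value; the helper names absolute_deviations and mean_absolute_deviation are illustrative.

```python
def absolute_deviations(values):
    """Absolute difference between each value and the mean."""
    if not values:
        raise ValueError("empty data")
    mean = sum(values) / len(values)
    return [abs(x - mean) for x in values]

def mean_absolute_deviation(values):
    """Average of the absolute deviations from the mean."""
    devs = absolute_deviations(values)
    return sum(devs) / len(devs)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(absolute_deviations(data))      # [3.0, 1.0, 1.0, 1.0, 0.0, 0.0, 2.0, 4.0]
print(mean_absolute_deviation(data))  # 1.5
```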
How do you calculate a percentile rank?
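Using one common formula, count the values below the given value, add half the number of values equal to it, divide by the total number of data points, and multiply by 100. The result is the percentage of the dataset that falls below that value.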
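A sketch of one common percentile-rank formula, 100 × (B + 0.5E) / N, where B counts values below x, E counts values equal to x, and N is the dataset size. Other definitions exist (for example, counting only values strictly below x); the function name and scores are illustrative.

```python
def percentile_rank(values, x):
    """Percentage of values below x, counting ties at x as half."""
    if not values:
        raise ValueError("empty data")
    below = sum(1 for v in values if v < x)
    equal = sum(1 for v in values if v == x)
    return 100 * (below + 0.5 * equal) / len(values)

scores = [55, 60, 65, 70, 70, 75, 80, 85, 90, 95]
print(percentile_rank(scores, 70))  # 40.0
```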
What is a cumulative frequency distribution?
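A running total of frequencies showing how many observations fall at or below each value or class; dividing by the total number of observations gives the cumulative relative frequency.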
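A sketch of cumulative frequencies for discrete values using Counter and itertools.accumulate; the function name is illustrative. The last running total always equals the number of observations.

```python
from collections import Counter
from itertools import accumulate

def cumulative_frequencies(values):
    """Return (value, running count of observations <= value) pairs."""
    counts = Counter(values)
    keys = sorted(counts)
    running = accumulate(counts[k] for k in keys)
    return list(zip(keys, running))

print(cumulative_frequencies([3, 1, 2, 3, 3, 1]))  # [(1, 2), (2, 3), (3, 6)]
```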
Explain the concept of a relative frequency distribution.
It shows each class's frequency as a proportion of the total number of observations, so the proportions sum to 1.
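Relative frequencies are each count divided by the total; this sketch uses an illustrative helper named relative_frequencies.

```python
from collections import Counter

def relative_frequencies(values):
    """Map each value to its share of all observations."""
    n = len(values)
    if n == 0:
        raise ValueError("empty data")
    return {value: count / n for value, count in sorted(Counter(values).items())}

print(relative_frequencies(["x", "y", "x", "x"]))  # {'x': 0.75, 'y': 0.25}
```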
What role does the mean play in symmetrical distributions?
It is the balance point of the distribution, and in a symmetric distribution it coincides with the median.
What is the best measure of central tendency for skewed data?
Median.
Why might one use the median instead of the mean?
It is less affected by outliers and skewed data.
How is the mode different from the mean and median?
The mode is the most frequent value and can be used with categorical data, whereas the mean requires numerical data and the median requires data that can be ordered.
When is the range not a good measure of dispersion?
When the dataset contains outliers.
What is the significance of a high standard deviation?
It means the data points are, on average, far from the mean, so the values vary widely.
How do variance and standard deviation relate?
Standard deviation is the square root of variance.
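A short check of this relationship with the standard library's population functions. The helper std_from_variance is hypothetical, and the data are illustrative values imagined in centimetres: the variance is then in cm², while the standard deviation returns to cm.

```python
import math
import statistics

def std_from_variance(values):
    """Population standard deviation computed as the square root of the variance."""
    return math.sqrt(statistics.pvariance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # e.g. lengths in cm (illustrative)

print(std_from_variance(data))    # 2.0
print(statistics.pstdev(data))    # 2.0
print(math.isclose(std_from_variance(data), statistics.pstdev(data)))  # True
```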
What are the limitations of using the range in data analysis?
It doesn’t account for the distribution between the highest and lowest values.
What statistical measure can help compare data sets with different units?
Coefficient of variation.
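A sketch comparing spread across different units with the coefficient of variation (standard deviation / mean). Using the population standard deviation is an assumption (a sample standard deviation is also common), the data are illustrative, and the ratio is only meaningful when the mean is positive and well away from zero.

```python
import statistics

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (unitless)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(values) / mean

heights_cm = [160, 170, 180]   # illustrative data
weights_kg = [50, 70, 90]

print(round(coefficient_of_variation(heights_cm), 3))  # 0.048
print(round(coefficient_of_variation(weights_kg), 3))  # 0.233
```

Here the weights vary far more relative to their mean than the heights do, even though the two datasets use different units.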
How does one interpret the standard error of the mean?
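It is the standard deviation of the sampling distribution of the mean, estimated as s / √n; it indicates how much the sample mean would vary from sample to sample.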
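A minimal sketch of the usual estimate, SEM = s / √n, where s is the sample standard deviation (statistics.stdev, which divides by n - 1). The function name and data are illustrative.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Estimated standard error: sample standard deviation / sqrt(n)."""
    if len(sample) < 2:
        raise ValueError("need at least two observations")
    return statistics.stdev(sample) / math.sqrt(len(sample))

sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(standard_error_of_mean(sample), 3))  # 0.756
```

Because of the √n in the denominator, quadrupling the sample size (with a similar spread) roughly halves the standard error.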
What does a low interquartile range indicate?
The middle 50% of the values are closely packed around the median.
Why is the mean sensitive to outliers?
Every value contributes to the sum, so a single extreme value can pull the mean strongly toward it.
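A small illustration of how one extreme value moves the mean far more than the median. The income figures are hypothetical, and the helper summarize is illustrative.

```python
import statistics

def summarize(values):
    """Return (mean, median) so the two can be compared."""
    return sum(values) / len(values), statistics.median(values)

incomes = [30, 32, 35, 38, 40]   # hypothetical incomes, in thousands
with_outlier = incomes + [1000]  # one extreme value added

print(summarize(incomes))        # (35.0, 35)
print(summarize(with_outlier))   # mean jumps to about 195.8; median only moves to 36.5
```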
What type of data is best summarized by the mode?
Categorical (nominal) data, or discrete data in which values repeat often.
How does one decide between using standard deviation and variance?
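It depends on the analysis: standard deviation is easier to interpret because it is in the same units as the data, while variance is often more convenient in calculations because the variances of independent variables add.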
What is the benefit of using the median in real-world data?
It gives a more representative typical value for skewed data, such as incomes or house prices.
How do you handle outliers before calculating statistical measures?
Depending on the context, remove them (for example, if they are data-entry errors), cap or transform them, or use robust measures such as the median and IQR.
What insights can the coefficient of variation provide?
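It expresses the standard deviation relative to the mean (CV = standard deviation / mean), showing relative variability and allowing datasets with different units or scales to be compared.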
Why might a bimodal distribution be significant?
It indicates two dominant groups within the dataset.
How can measures of central tendency mislead if not used properly?
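They mislead when the nature of the data is ignored: for example, the mean of skewed data or of data with outliers may not represent a typical value, and a single average can hide a bimodal distribution.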