1 Data Exploration and Summary Statistics Flashcards
What is central tendency?
A measure of the ‘center’ or ‘typical’ value in a dataset.
What is the formula for calculating the mean?
Mean = \( \frac{\text{Sum of all values}}{\text{Total number of values}} \)
What is the mean of the ages (39, 34, 37, 35, 33)?
Mean = 35.6
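As a quick check of this card, here is a minimal Python sketch that applies the formula to the same ages. The variable names are illustrative only.

```python
from statistics import mean

ages = [39, 34, 37, 35, 33]

# Mean = sum of all values / total number of values
total = sum(ages)          # 178
count = len(ages)          # 5
manual_mean = total / count

print(manual_mean)         # 35.6
print(mean(ages))          # library result, should match the manual one
```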
What Excel function is used to calculate the mean?
=AVERAGE(range)
What is the median?
Middle value when data is arranged in ascending order.
What is the median of the ages (39, 34, 37, 35, 33)?
Median = 35
How is the median calculated for an even number of values?
Average of the two middle values.
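The odd and even cases can be sketched in Python as follows. The helper `manual_median` and the six-value list are hypothetical, added only to illustrate the even case; the five-value list is the one from the earlier card.

```python
from statistics import median

def manual_median(values):
    """Return the median by sorting and picking the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value
        return ordered[mid]
    # Even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_ages = [39, 34, 37, 35, 33]
even_ages = [39, 34, 37, 35, 33, 36]  # hypothetical: one extra age added

print(manual_median(odd_ages))   # 35
print(manual_median(even_ages))  # 35.5
print(median(even_ages))         # library result, should match
```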
What Excel function is used to calculate the median?
=MEDIAN(range)
What is the mode?
Most frequent value in the dataset.
What is the mode of the ages (25, 30, 35, 35, 40, 45)?
Mode = 35
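One way to find the mode in Python is to count how often each value occurs. This is only a sketch using the ages from this card; the variable names are illustrative.

```python
from collections import Counter
from statistics import mode

ages = [25, 30, 35, 35, 40, 45]

# Count how often each value occurs, then take the most frequent one
counts = Counter(ages)
most_common_value, frequency = counts.most_common(1)[0]

print(most_common_value, frequency)  # 35 2
print(mode(ages))                    # 35
```

If several values tie for the highest frequency, `statistics.multimode` (Python 3.8+) returns all of them.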
What Excel function is used to calculate the mode?
=MODE(range) (or =MODE.SNGL(range) in Excel 2010 and later)
What do summary statistics provide?
An overview of the dataset’s main characteristics.
What does standard deviation measure?
How dispersed values are around the mean.
What is the first step in calculating standard deviation?
Find the mean.
What does a smaller standard deviation indicate?
Data points are closer to the mean.
What Excel function is used to calculate standard deviation?
=STDEV(range)
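The calculation can be sketched step by step in Python. Only the first step (finding the mean) comes from these cards; the remaining steps follow the standard definition, and dividing by n - 1 is an assumption chosen to match the sample-based result that Excel's STDEV returns.

```python
import math
from statistics import stdev

ages = [39, 34, 37, 35, 33]

# Step 1: find the mean
mean_age = sum(ages) / len(ages)

# Step 2: squared deviation of each value from the mean
squared_deviations = [(x - mean_age) ** 2 for x in ages]

# Step 3: average the squared deviations
# (assumption: divide by n - 1, the sample version)
sample_variance = sum(squared_deviations) / (len(ages) - 1)

# Step 4: take the square root
sample_sd = math.sqrt(sample_variance)

print(round(sample_sd, 3))     # 2.408
print(round(stdev(ages), 3))   # library result, should match
```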
What is the definition of range?
Difference between maximum and minimum values.
What is the formula for calculating range?
Range = Maximum - Minimum
What is variance related to?
Standard deviation: variance is based on the squared deviations from the mean, and the standard deviation is its square root.
What is the formula for variance?
Variance = Average of the squared deviations from the mean.
What Excel function is used to calculate variance?
=VAR(range) (sample variance; =VAR.P(range) for a whole population)
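The "average of squared deviations" can be taken in two ways, which the following Python sketch contrasts using the ages from the earlier cards. Dividing by n gives the population variance; dividing by n - 1 gives the sample variance, which is what Excel's VAR computes. The variable names are illustrative.

```python
from statistics import pvariance, variance

ages = [39, 34, 37, 35, 33]
mean_age = sum(ages) / len(ages)
squared_deviations = [(x - mean_age) ** 2 for x in ages]

# Population variance: plain average of the squared deviations
population_var = sum(squared_deviations) / len(ages)

# Sample variance: divide by n - 1 instead of n
sample_var = sum(squared_deviations) / (len(ages) - 1)

print(round(population_var, 2))  # 4.64
print(round(sample_var, 2))      # 5.8

# Library equivalents, which should match the manual results
print(round(pvariance(ages), 2), round(variance(ages), 2))
```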
What are outliers?
Extreme values significantly different from other data points.
What methods can be used to detect outliers?
- IQR (Interquartile Range)
- Z-scores
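Both methods can be sketched in a few lines of Python. The cards do not give cutoffs, so the 1.5 × IQR fences and the z-score threshold are assumptions based on common convention. The function names and the data, which contain one deliberately extreme value, are hypothetical.

```python
from statistics import mean, quantiles, stdev

def iqr_outliers(values, k=1.5):
    """Flag values more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper]

def z_score_outliers(values, threshold=3.0):
    """Flag values whose absolute z-score exceeds the threshold."""
    m, s = mean(values), stdev(values)
    return [x for x in values if abs((x - m) / s) > threshold]

# Hypothetical ages with one extreme value
ages = [33, 34, 35, 37, 39, 34, 36, 35, 38, 90]

print(iqr_outliers(ages))                     # [90]
print(z_score_outliers(ages, threshold=2.0))  # [90]
```

The example passes a threshold of 2.0 because, in a sample this small, a single extreme value inflates the standard deviation enough that no z-score can reach 3.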
What chart can be used to visualize the distribution of data?
Histograms
What can box plots help identify?
Outliers and the spread of the data.
What are the common summary statistics?
- Mean
- Median
- Standard Deviation
- Minimum
- Maximum
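The statistics listed above can be gathered in one pass. The sketch below uses a hypothetical helper, `summarize`, and assumes the sample standard deviation is the one wanted.

```python
from statistics import mean, median, stdev

def summarize(values):
    """Return the common summary statistics for a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "std_dev": stdev(values),  # sample standard deviation (assumption)
        "min": min(values),
        "max": max(values),
    }

ages = [39, 34, 37, 35, 33]
for name, value in summarize(ages).items():
    print(f"{name}: {value:.2f}")
```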