BA 1 - Describing and Summarizing Data Flashcards
Axes of a histogram?
X-axis - bins corresponding to ranges of data;
Y-axis - frequency of observations falling into each bin.
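A minimal sketch of how the bar heights are computed, assuming bins are given by a hypothetical list of edges. Each bin covers [left edge, right edge) and the last bin also includes its right edge (the same convention NumPy's histogram uses).

```python
def histogram_counts(data, bin_edges):
    """Return the frequency (bar height) for each bin defined by bin_edges."""
    counts = [0] * (len(bin_edges) - 1)
    for x in data:
        for i in range(len(counts)):
            lo, hi = bin_edges[i], bin_edges[i + 1]
            is_last_bin = i == len(counts) - 1
            # Half-open bins [lo, hi), except the last bin, which is closed [lo, hi].
            if lo <= x < hi or (is_last_bin and x == hi):
                counts[i] += 1
                break
    return counts

data = [1, 2, 2, 3, 5, 6, 6, 6, 9]
print(histogram_counts(data, [0, 3, 6, 9]))  # [3, 2, 4]
```

The bin edges go on the X-axis and the returned counts are the Y-axis values.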
What’s an outlier?
An outlier is a value that falls far from the rest of the data.
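The card does not say how far is "far". One common rule of thumb, assumed here and not the only convention, is Tukey's fences: flag values more than 1.5 × IQR below the first quartile or above the third quartile.

```python
import statistics

def iqr_outliers(data, k=1.5):
    # Assumption: Tukey's fences with k = 1.5; other cutoffs are also used.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

print(iqr_outliers([10, 12, 12, 13, 14, 15, 16, 95]))  # [95]
```

A flagged value is only a candidate; it still needs the validity checks on the next card.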
How do you examine the validity of an outlier?
i. Check if it’s valid, though unusual;
ii. Check for a data entry error; and
iii. Check if it was collected under different circumstances than the rest of the data.
What do you do about an outlier?
Leave it; change it to its corrected value; or in extreme cases, delete it.
Skewness
Skewness measures the degree of asymmetry in a distribution, i.e., how much one tail of its graph stretches farther than the other.
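A sketch of one common formula, the population moment coefficient of skewness (third central moment divided by the standard deviation cubed). Under this formula, a positive value indicates a longer right tail, a negative value a longer left tail, and 0 a symmetric distribution. Other software may apply a small-sample correction, so values can differ slightly.

```python
def skewness(data):
    # Population moment coefficient of skewness: m3 / m2 ** 1.5.
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # average squared deviation
    m3 = sum((x - mean) ** 3 for x in data) / n  # average cubed deviation
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

print(skewness([1, 2, 3]))           # 0.0 (symmetric)
print(skewness([1, 2, 2, 3, 10]) > 0)  # True (long right tail)
```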
What are descriptive statistics?
Summary measures that provide an overview of the data set without showing every data point.
Mean
Sum of all data points divided by the number of data points
The mean is sensitive to outliers: a single extreme value can pull it far from the typical values.
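A small illustration with made-up numbers: replacing one value with an extreme one moves the mean a long way.

```python
def mean(data):
    # Sum of all data points divided by the number of data points.
    return sum(data) / len(data)

typical = [3, 4, 5, 6, 7]
with_outlier = [3, 4, 5, 6, 70]  # the 7 replaced by an extreme value

print(mean(typical))       # 5.0
print(mean(with_outlier))  # 17.6
```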
Median
Middle value of the sorted data set, i.e., the 50th percentile.
When the number of values is even, it’s the average of the middle two values.
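A minimal sketch covering both the odd and even cases. Python's built-in `statistics.median` follows the same rule.

```python
def median(data):
    s = sorted(data)  # the median is defined on sorted data
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                   # odd count: the single middle value
    return (s[mid - 1] + s[mid]) / 2    # even count: average of the middle two

print(median([7, 1, 5]))         # 5
print(median([7, 1, 5, 3]))      # 4.0
print(median([3, 4, 5, 6, 70]))  # 5 (unaffected by the outlier)
```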
Mode
Value that occurs most frequently
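A sketch that counts occurrences and returns every value tied for the highest count, since a data set can have more than one mode.

```python
from collections import Counter

def modes(data):
    counts = Counter(data)                 # value -> number of occurrences
    top = max(counts.values())
    return [value for value, c in counts.items() if c == top]

print(modes([2, 3, 3, 5, 5, 5, 8]))  # [5]
print(modes([1, 1, 2, 2, 3]))        # [1, 2]
```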
Conditional mean
The mean of a subset of the data that includes all values satisfying a certain condition.
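A sketch using hypothetical (region, sales) records: filter to the records that satisfy the condition, then take the ordinary mean of that subset. The function and data names are illustrative only.

```python
def conditional_mean(records, condition, value):
    # Keep only the records that satisfy the condition, then average their values.
    subset = [value(r) for r in records if condition(r)]
    if not subset:
        raise ValueError("no records satisfy the condition")
    return sum(subset) / len(subset)

# Hypothetical data: (region, sales)
sales = [("East", 120), ("West", 80), ("East", 100), ("West", 60)]

print(conditional_mean(sales, lambda r: r[0] == "East", lambda r: r[1]))  # 110.0
```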
Percentile value
Value beneath which a certain percentage of the data lie
e.g., the 25th percentile is the smallest value that is greater than or equal to 25% of the data points.
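A sketch that applies the definition on this card directly (often called the nearest-rank method): walk through the sorted data and return the first value that is at least as large as p% of the points. Spreadsheets and libraries such as NumPy often interpolate between values instead, so their answers can differ slightly.

```python
def percentile(data, p):
    """Smallest data value that is >= at least p% of the data points."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    s = sorted(data)
    n = len(s)
    for i, x in enumerate(s):
        # At index i, at least i + 1 of the n values are <= x.
        # Integer comparison avoids floating-point rounding: (i + 1) / n >= p / 100.
        if (i + 1) * 100 >= p * n:
            return x

data = [15, 20, 35, 40, 50]
print(percentile(data, 25))  # 20
print(percentile(data, 50))  # 35
```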
Range
Maximum value - Minimum value
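A one-line sketch. It is named `data_range` here to avoid shadowing Python's built-in `range`.

```python
def data_range(data):
    # Maximum value minus minimum value.
    return max(data) - min(data)

print(data_range([4, 9, 1, 7]))  # 8
```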
Relationship between standard deviation and variance?
SD = √(Variance)
What does variance measure?
Variance measures how spread out the data are around the mean: it is the average of the squared distances of the data points from the mean.
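A sketch of the population variance (denominator N) computed by hand, with the standard deviation as its square root.

```python
import math

def population_variance(data):
    # Average of the squared deviations from the mean (denominator N).
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n

def population_sd(data):
    # The standard deviation is the square root of the variance.
    return math.sqrt(population_variance(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean = 5; squared deviations sum to 32
print(population_variance(data))  # 4.0  (32 / 8)
print(population_sd(data))        # 2.0
```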
Difference between population and sample variance/SD?
For a population, the denominator is N; for a sample, the denominator is n − 1.
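Python's standard `statistics` module provides both versions. Dividing by n − 1 (Bessel's correction) makes the sample variance an unbiased estimate of the population variance, so it comes out slightly larger for the same numbers.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # squared deviations from the mean sum to 32

pop_var = statistics.pvariance(data)  # denominator N = 8:      32 / 8 = 4
samp_var = statistics.variance(data)  # denominator n - 1 = 7:  32 / 7, about 4.57
pop_sd = statistics.pstdev(data)      # square root of pop_var
samp_sd = statistics.stdev(data)      # square root of samp_var

print(pop_var, samp_var)
print(pop_sd, samp_sd)
```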