Chapter 3 Flashcards
Characteristic or measure obtained by using the data values from a sample
Statistic
Characteristic or measure obtained by using all the data values from a specific population
Parameter
Rounding is the last step
Round to one more decimal place than in the original data
General Rounding Rule
The sum of the total X values, divided by the total number of values
Mean
Calculated by using sample data. This is a statistic
Sample Mean
Calculated by using all the values in the population. This is a parameter
Population Mean
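A minimal sketch of the mean calculation described on the cards above. The data values are hypothetical, chosen only for illustration; the arithmetic is the same for a sample mean and a population mean, and only the interpretation (statistic vs. parameter) differs.

```python
from statistics import mean

# Hypothetical data values, for illustration only.
values = [2, 4, 6, 8, 10]

# Mean = sum of the values divided by the number of values.
manual_mean = sum(values) / len(values)

# The standard library gives the same result.
library_mean = mean(values)

print(manual_mean, library_mean)
```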
Data set arranged in numeric order
Data Array
Midpoint of the data array. Symbol is MD
Median
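One way to sketch the median as the midpoint of the data array. The function name is an assumption made for this example; with an even number of values, the two middle values are averaged.

```python
def median_from_array(values):
    # Build the data array: the values in numeric order.
    data_array = sorted(values)
    n = len(data_array)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the median is the single middle value.
        return data_array[mid]
    # Even count: the median is halfway between the two middle values.
    return (data_array[mid - 1] + data_array[mid]) / 2

print(median_from_array([7, 1, 3]))     # odd number of values
print(median_from_array([4, 1, 3, 2]))  # even number of values
```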
The value that occurs most often in a data set
Mode
Data set that only has one value that occurs with the greatest frequency
Unimodal
Data set that has two values that occur with the same greatest frequency
Bimodal
A data set that has more than two values that occur with the same greatest frequency
Multimodal
No data value occurs more than once
No mode
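The mode cards above can be sketched as a small classifier. The function name and return shape are assumptions for illustration, and the input is assumed to be non-empty.

```python
from collections import Counter

def classify_modes(values):
    # Count how often each data value occurs.
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        # No data value occurs more than once.
        return [], "no mode"
    # Every value that shares the greatest frequency is a mode.
    modes = sorted(v for v, c in counts.items() if c == top)
    labels = {1: "unimodal", 2: "bimodal"}
    return modes, labels.get(len(modes), "multimodal")

print(classify_modes([1, 2, 2, 3]))
print(classify_modes([1, 1, 2, 2, 3]))
print(classify_modes([1, 2, 3]))
```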
Class with the largest frequency
Modal Class
Extremely low data values or extremely high data values in a data set
Outliers
Rough estimate of the middle. Affected by outliers.
(Defined as the sum of the lowest and highest values in the data set divided by 2)
Midrange (MR)
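A short sketch of the midrange, with hypothetical data showing how a single outlier shifts it. The function name is an assumption for this example.

```python
def midrange(values):
    # Midrange = (lowest value + highest value) / 2.
    return (min(values) + max(values)) / 2

# Hypothetical data, for illustration only.
typical = [2, 3, 6, 8, 4, 1]
with_outlier = [2, 3, 6, 8, 4, 100]

print(midrange(typical))
print(midrange(with_outlier))  # pulled far upward by the single extreme value
```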
Mean that considers an additional factor. Used when the values are not all equally represented
Weighted Mean
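A minimal sketch of a weighted mean, assuming the usual form: the sum of each value times its weight, divided by the sum of the weights. The grade-point values and credit-hour weights below are hypothetical.

```python
def weighted_mean(values, weights):
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    # Each value counts in proportion to its weight.
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical grade points weighted by credit hours.
grade_points = [4.0, 3.0, 2.0]
credits = [3, 4, 2]
print(round(weighted_mean(grade_points, credits), 2))
```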
Majority of the data values fall to the left of the mean and cluster at the lower end of the distribution; tail is to the right.
Mean is to the right of the median, and mode is to the left of the median.
Positively skewed or right-skewed distribution
Data values are evenly distributed on both sides of the mean (when the distribution is unimodal, the mean, median, and mode are the same and at the center of the distribution)
Symmetric Distribution
Majority of the data values fall to the right of the mean and cluster at the upper end of the distribution; tail is to the left. Mean is to the left of the median, and mode is to the right of the median.
Negatively skewed or left-skewed
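The skewness cards place the mean to the right of the median for a right-skewed distribution and to the left for a left-skewed one. The sketch below turns that into a rough check; it is only a heuristic under that assumption, not a formal skewness measure, and the function name and data are hypothetical.

```python
from statistics import mean, median

def skew_direction(values):
    m, md = mean(values), median(values)
    if m > md:
        # Mean pulled to the right of the median by a right tail.
        return "positively skewed (right-skewed)"
    if m < md:
        # Mean pulled to the left of the median by a left tail.
        return "negatively skewed (left-skewed)"
    return "roughly symmetric"

print(skew_direction([1, 2, 2, 3, 12]))
print(skew_direction([1, 10, 11, 11, 12]))
print(skew_direction([1, 2, 3, 4, 5]))
```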
Highest value minus the lowest value
Range (R)
Measure based on the difference or distance of each data value from the mean
Data Variation
The difference or distance of a data value from the mean is called
Deviation
Average of the squares of the distance each value is from the mean
Population Variance
Square root of the variance
Population Standard Deviation
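The deviation, variance, and standard deviation cards can be sketched step by step. The data are hypothetical, and the function names are assumptions for this example; the results are cross-checked against the standard library's `pvariance` and `pstdev`.

```python
from math import sqrt
from statistics import pvariance, pstdev

def population_variance(values):
    mu = sum(values) / len(values)
    # Deviation: the distance of each value from the mean.
    deviations = [x - mu for x in values]
    # Variance: the average of the squared deviations.
    return sum(d ** 2 for d in deviations) / len(values)

def population_std_dev(values):
    # Standard deviation: the square root of the variance.
    return sqrt(population_variance(values))

# Hypothetical data, for illustration only.
values = [10, 60, 50, 30, 40, 20]
print(round(population_variance(values), 1), round(population_std_dev(values), 1))
print(pvariance(values), pstdev(values))  # library equivalents
```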
Standard deviation divided by the mean. Results are represented as a percentage
Coefficient of Variation (CVar)
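A small sketch of the coefficient of variation, expressed as a percentage. The function name and the input numbers are hypothetical.

```python
def coefficient_of_variation(std_dev, mean_value):
    # CVar = (standard deviation / mean) * 100, reported as a percentage.
    return 100 * std_dev / mean_value

# Hypothetical summary statistics for two data sets on different scales.
print(round(coefficient_of_variation(5, 87), 1))
print(round(coefficient_of_variation(773, 5225), 1))
```

Because the result is unit-free, it allows the variation of data sets measured in different units to be compared.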
The range can be used to approximate the standard deviation (s ≈ range / 4). It is only an approximation and should be used when the distribution is unimodal and roughly symmetric
Range Rule of Thumb
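A minimal sketch of the range rule of thumb, assuming the common form in which the standard deviation is estimated as the range divided by 4. The function name and data are hypothetical.

```python
def range_rule_estimate(values):
    # Rough estimate: standard deviation is about range / 4.
    return (max(values) - min(values)) / 4

# Hypothetical data, for illustration only.
print(range_rule_estimate([5, 8, 12, 17, 25]))
```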
Specifies the proportions of the spread in terms of the standard deviation. The proportion of values from a data set that will fall within k standard deviations of the mean will be at least 1 - 1/k^2, where k is a number greater than 1. Can be used to find the minimum percentage of data values that will fall between any two given values
Chebyshev’s Theorem
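The bound in Chebyshev's theorem can be sketched directly. The function name is an assumption for this example, and k is required to be greater than 1, since the bound is not useful otherwise.

```python
def chebyshev_minimum_proportion(k):
    if k <= 1:
        raise ValueError("k must be greater than 1")
    # At least 1 - 1/k^2 of the values lie within k standard deviations of the mean.
    return 1 - 1 / k ** 2

for k in (2, 3):
    print(k, round(100 * chebyshev_minimum_proportion(k), 1), "% at minimum")
```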
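Applies when the distribution is bell-shaped (normal): approximately 68% of the data values fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3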
Empirical Rule
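A sketch of the empirical rule using its commonly stated percentages (about 68%, 95%, and 99.7% within 1, 2, and 3 standard deviations of the mean for a bell-shaped distribution). The function name and the mean and standard deviation below are hypothetical.

```python
def empirical_rule_intervals(mean_value, std_dev):
    # Approximate percentages for a bell-shaped (normal) distribution.
    percentages = {1: 68, 2: 95, 3: 99.7}
    return {
        k: (mean_value - k * std_dev, mean_value + k * std_dev, pct)
        for k, pct in percentages.items()
    }

# Hypothetical distribution with mean 100 and standard deviation 15.
for k, (low, high, pct) in empirical_rule_intervals(100, 15).items():
    print(f"about {pct}% of values fall between {low} and {high}")
```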
Changing the data values to a different scale, e.g., converting temperature values to the Fahrenheit scale
Linear Transformation of The Data
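A sketch of a linear transformation, assuming the original values are Celsius temperatures (the card names only the Fahrenheit scale). Under a transformation of the form y = a*x + b, the mean transforms the same way as each value, while the standard deviation is multiplied by |a| only. The data are hypothetical.

```python
from statistics import mean, pstdev

def to_fahrenheit(celsius):
    # Linear transformation y = a*x + b with a = 1.8 and b = 32.
    return 1.8 * celsius + 32

# Hypothetical Celsius readings, for illustration only.
celsius_values = [0, 10, 20, 30]
fahrenheit_values = [to_fahrenheit(c) for c in celsius_values]

# The mean transforms like a single value: a * mean + b.
print(mean(fahrenheit_values), to_fahrenheit(mean(celsius_values)))

# The standard deviation is scaled by a only; adding b does not change the spread.
print(pstdev(fahrenheit_values), 1.8 * pstdev(celsius_values))
```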
Z score or Standard Score
A value obtained by subtracting the mean from the value and dividing the result by the standard deviation. The symbol is Z.
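The z score definition translates directly into code. The function name and the test-score numbers are hypothetical.

```python
def z_score(value, mean_value, std_dev):
    # Subtract the mean from the value, then divide by the standard deviation.
    return (value - mean_value) / std_dev

# Hypothetical scores on a test with mean 50 and standard deviation 10.
print(z_score(65, 50, 10))  # positive: above the mean
print(z_score(30, 50, 10))  # negative: below the mean
```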
Percentiles
Divide the data set into 100 equal groups.
Quartiles
Divide the distribution into four equal groups
Interquartile Range (IQR)
The difference between the third and first quartiles
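A sketch of quartiles and the IQR using the median-of-halves convention (Q1 is the median of the lower half, Q3 the median of the upper half, with the middle value left out when the count is odd). This is one of several conventions, so other textbooks and libraries may give slightly different quartiles. The function name and data are hypothetical, and at least two values are assumed.

```python
from statistics import median

def quartiles(values):
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # For an odd count, leave the middle value out of both halves.
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)

# Hypothetical data, for illustration only.
q1, q2, q3 = quartiles([5, 6, 12, 13, 15, 18, 22, 50])
iqr = q3 - q1  # interquartile range = Q3 - Q1
print(q1, q2, q3, iqr)
```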
Outlier
Extremely high or extremely low data value when compared with the rest of the data values
Exploratory data analysis (EDA)
In exploratory data analysis, data can be organized using a stem and leaf plot. The measure of central tendency used is the median.
Boxplot
Graph of a data set obtained by drawing a horizontal line from the minimum data value to Q1, drawing a horizontal line from Q3 to the maximum data value, and drawing a box whose vertical sides pass through Q1 and Q3 with a vertical line inside the box passing through the median or Q2.
Modified boxplot
Can be drawn and used to check for outliers
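A sketch of the numbers behind a boxplot and an outlier check. It assumes the median-of-halves convention for quartiles and the widely used 1.5 × IQR fences for flagging outliers; neither detail is stated on the cards, and the function names and data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    # Minimum, Q1, median (Q2), Q3, maximum: the values a boxplot displays.
    return data[0], median(lower), median(data), median(upper), data[-1]

def find_outliers(values):
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    # Assumed fences: 1.5 * IQR below Q1 and above Q3.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

# Hypothetical data, for illustration only.
data = [5, 6, 12, 13, 15, 18, 22, 50]
print(five_number_summary(data))
print(find_outliers(data))
```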