Chapter 3 (3.1-3.4): Descriptive Statistics and Analytics--Numerical Methods Flashcards
In addition to describing the shape of a distribution, we want to describe the data set’s central tendency. This includes what 3 things?
Mean, median, and mode
A measure of central tendency represents the ________ (or middle) of the data.
center
average of the population measurements
population mean
a number calculated from all the population measurements that describes some aspect of the population
population parameter
(any number we calculate using all the population measurements is called a (population) parameter)
a number calculated using the sample measurements that describes some aspect of the sample
sample statistic
(when we calculate the mean, median, and mode from a sample, the results are called (sample) statistics)
What are the 3 measures of central tendency?
Mean, median, and mode
the average or expected value
mean
the value of the middle point of the ordered measurements
median (Md)
the most frequent value
mode (Mo)
What is the symbol for population mean? What about sample mean?
- μ (the Greek letter mu)
(the population mean is the value to expect, on average, in the long run)
- x̄ (x-bar, an x with a line over it)
(the sample mean is a point estimate of the population mean)
How do you calculate mean?
Add all the measurements and then divide that total by the number of measurements
Mean is also called what other two things (these words are interchangeable)?
- Average
- expected value
For median, if the number of measurements is (odd/even), the median is the middlemost measurement in the ordering.
For median, if the number of measurements is (odd/even), the median is the average of the two middlemost measurements in the ordering.
odd; even
What do you have to do before calculating median?
Arrange the numbers in numerical (increasing) order
T or F: Modes are the values that are observed “most typically”.
True
If there are two modes, the data is ________.
bimodal
(ex: 3,4,5,5,5,6,6,6,7,8,9… 5 and 6 are the two modes, so the data is bimodal)
If there are more than two modes, the data is _________.
multimodal
(ex: 1,1,1,2,2,2,3,3,3)
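A quick way to check the mean, median, and mode cards above is Python's standard `statistics` module. This is only a sketch: the data list reuses the bimodal example from the card, and the variable names are made up for illustration.

```python
from statistics import mean, median, multimode

# Data reused from the bimodal example card.
data = [3, 4, 5, 5, 5, 6, 6, 6, 7, 8, 9]

data_mean = mean(data)        # sum of the measurements / number of measurements
data_median = median(data)    # orders the data first; n = 11 is odd, so the middle value
data_modes = multimode(data)  # every value tied for the highest frequency

print(data_mean, data_median, data_modes)
```

`multimode` (Python 3.8+) returns a list, so bimodal and multimodal data show up as a list with two or more values.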
When data are in classes, the class with the (highest/lowest) frequency is the modal class.
highest
(the tallest box in the histogram)
Mean, median, and mode are _________.
descriptive statistics (descriptives)
What are the 13 descriptive statistics?
- Mean
- Median
- Mode
- Standard Error
- Standard Deviation
- Sample Variance
- Kurtosis
- Skewness
- Range
- Minimum
- Maximum
- Sum
- Count
If mean=median=mode, then the curve will be:
a. skewed to the right
b. symmetrical
c. skewed to the left
b. symmetrical
If mode < median < mean, then the curve will be:
a. skewed to the right
b. symmetrical
c. skewed to the left
a. skewed to the right
If mean < median < mode, then the data will be:
a. skewed to the right
b. symmetrical
c. skewed to the left
c. skewed to the left
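The skewness cards above can be checked on a small made-up data set. The sketch below assumes hypothetical values with one large measurement on the right. The mode < median < mean ordering is a rule of thumb for single-peaked data, not a guarantee for every data set.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: one large value pulls the mean to the right.
values = [1, 2, 2, 3, 4, 5, 20]

value_mode = mode(values)      # most frequent value
value_median = median(values)  # middle of the ordered measurements
value_mean = mean(values)      # pulled toward the long right tail

# mode < median < mean is the pattern the cards call "skewed to the right".
is_right_skewed_pattern = value_mode < value_median < value_mean
print(value_mode, value_median, value_mean, is_right_skewed_pattern)
```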
T or F: A population parameter describes some aspect of the population and is a number calculated using all population measurements. Its point estimate is calculated from a sample of measurements rather than all the population measurements.
True
What 3 things tell us the variation in our data (what are the 3 measures of variation)?
- Range
- Standard Deviation
- Variance
How do you calculate range?
Highest number - smallest number (in our data)
the average of the squared deviations of all the population measurements from the population mean
Variance
the square root of the population variance
standard deviation
What two measures of variation measure the spread of data from the mean (how far our data is spread from the mean)?
Variance and standard deviation
What are the 2 steps to calculate variance?
- Calculate mean
- Subtract the mean from each data point separately (one data point at a time) and square each result, then add all of these squares together and divide by the total number of observations (this gives the population variance; for a sample variance, divide by n - 1 instead)
(look at camera roll for example of this)
How do you calculate standard deviation?
Just take the square root of the variance
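The variance and standard deviation steps above can be traced by hand in a few lines. This is a sketch with made-up data. It divides by N for the population version, as the cards describe, and shows the n - 1 divisor for the sample version as a comparison.

```python
from math import sqrt

# Hypothetical data set, treated as a whole population.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

# Step 1: calculate the mean.
mu = sum(data) / n

# Step 2: squared deviation of each data point from the mean, then add them up.
sum_squared_devs = sum((x - mu) ** 2 for x in data)

pop_variance = sum_squared_devs / n           # divide by N (population)
pop_sd = sqrt(pop_variance)                   # standard deviation = square root of variance

sample_variance = sum_squared_devs / (n - 1)  # divide by n - 1 (sample)
sample_sd = sqrt(sample_variance)

print(mu, pop_variance, pop_sd, sample_variance, sample_sd)
```

The standard library functions `statistics.pvariance` and `statistics.variance` compute the same two quantities and can be used to check the hand calculation.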
What is the standard deviation symbol?
What is the variance symbol?
- σ (lowercase Greek sigma) for a population; s for a sample
- σ² (sigma squared) for a population; s² for a sample
(look at camera roll for what population standard deviation and sample standard deviation looks like)
The Empirical Rule for Normal Populations:
If a population has mean μ and standard deviation σ and is described by a normal curve, then:
1. _______ of the population measurements lie within one standard deviation of the mean: [μ - σ, μ + σ]
2. _______ lie within two standard deviations of the mean: [μ - 2σ, μ + 2σ]
3. ________ lie within three standard deviations of the mean: [μ - 3σ, μ + 3σ]
- 68.26%
- 95.44%
- 99.73%
(make sure you know these percentages; picture of this explained in camera roll)
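The Empirical Rule percentages can be reproduced from the normal curve itself. The sketch below uses `statistics.NormalDist` from the standard library. The helper name `share_within` is made up for illustration. The computed values round to 68.27%, 95.45%, and 99.73%; the slightly different figures on the cards (68.26%, 95.44%) most likely come from rounded table values, so memorize the version your course uses.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.2%}")
```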
What is the formula for calculating z-scores?
z = (x - mean) / standard deviation
(For any x in a population or sample, the associated z score is z = (x-mean) / standard deviation)
(example of calculating z-scores in camera roll)
the number of standard deviations that x is from the mean; indicates the relative location of a value within a population or sample
z scores (standardized value)
If z score is positive, then our number is (greater than/less than/equal to) the mean.
greater than
If z score is negative, then our number is (greater than/less than/equal to) the mean.
less than
If z-score is 0, then….
our number (x) is equal to the mean
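The z-score formula and the three sign cases above fit in a short sketch. The mean of 70 and standard deviation of 10 are made-up numbers, and `z_score` is a hypothetical helper name.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Hypothetical exam scores with mean 70 and standard deviation 10.
mean, sd = 70, 10

z_above = z_score(85, mean, sd)  # positive: 85 is greater than the mean
z_below = z_score(60, mean, sd)  # negative: 60 is less than the mean
z_equal = z_score(70, mean, sd)  # zero: 70 is equal to the mean

print(z_above, z_below, z_equal)
```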
measures the size of the standard deviation relative to the size of the mean
coefficient of variation
What is the formula for calculating the coefficient of variation?
(Standard deviation / mean) x 100%
(example of calculating this in camera roll)
What is the coefficient of variation used for?
To measure risk
(as well as compare the relative variabilities of values about the mean, and compare the relative variability of populations or samples with different means and different standard deviations)
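As a rough illustration of comparing risk with the coefficient of variation, the sketch below uses two made-up investments with different means and standard deviations. The function name and all numbers are assumptions, not values from the course.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return (sd / mean) * 100

# Hypothetical investments: (standard deviation of returns, mean return).
cv_fund_a = coefficient_of_variation(sd=4, mean=20)
cv_fund_b = coefficient_of_variation(sd=6, mean=50)

# Fund B has the larger standard deviation, but relative to its mean it
# varies less, so by this measure it is the less risky of the two.
riskier = "A" if cv_fund_a > cv_fund_b else "B"
print(round(cv_fund_a, 2), round(cv_fund_b, 2), riskier)
```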
If standard deviation is high, our data (is/is not) spread widely around the mean, which means it has (high/low) risk.
is; high
If standard deviation is small, our data is (spread out/near) the mean, which means it has (high/low) risk.
near; low (or less)
pth percentile:
p% of the measurements are (above/below) the pth percentile and (100-p)% are (above/below) it.
below; above
(ex: if your score is at the 90th percentile, that means 90% of scores are below yours and (100-90) = 10% of scores are above yours)
- The first quartile (Q1) is the _____ percentile.
- The second quartile (Q2) (median) is the ______ percentile.
- The third quartile (Q3) is the _____ percentile.
- The interquartile range (IQR) is ______.
- 25th
- 50th (denoted Md)
- 75th
- Q3-Q1
What are the 3 steps for calculating percentiles?
- Arrange the measurements in increasing (lowest to highest) order.
- Calculate the index i= (p/100) x n where p is the percentile to find. (n= count of data points)
- (a) if i is not an integer (whole number), round up: the next integer greater than i gives the position of the pth percentile in the ordering
(b) if i is an integer, the pth percentile is the average of the measurements in the i and i+1 ordered positions. (ex: i=2, so average the 2nd and 3rd ordered measurements: (2nd + 3rd)/2 = pth percentile)
What is the formula for calculating the index when calculating percentiles?
i = (p/100) x n
(example of calculating percentiles in camera roll)
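The three percentile steps above translate fairly directly into code. This is a sketch of the rule exactly as the cards state it; other textbooks and software use slightly different percentile rules. The function name `percentile` and the scores are made up for illustration. It assumes 0 < p < 100.

```python
import math

def percentile(values, p):
    """pth percentile using the index rule i = (p/100) x n. Assumes 0 < p < 100."""
    ordered = sorted(values)       # step 1: arrange in increasing order
    n = len(ordered)
    i = p * n / 100                # step 2: index (same as (p/100) x n)
    if not i.is_integer():
        position = math.ceil(i)    # step 3a: round up to the next position
        return ordered[position - 1]      # positions are 1-based, lists are 0-based
    i = int(i)
    return (ordered[i - 1] + ordered[i]) / 2  # step 3b: average positions i and i+1

# Hypothetical scores (n = 12), deliberately unordered.
scores = [7, 1, 3, 10, 5, 8, 2, 12, 9, 4, 6, 11]

p25 = percentile(scores, 25)  # i = 3, an integer: average the 3rd and 4th values
p50 = percentile(scores, 50)  # i = 6, an integer: average the 6th and 7th values
p90 = percentile(scores, 90)  # i = 10.8, not an integer: take the 11th value
print(p25, p50, p90)
```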
The 5 Number Summary is used to create what type of graph?
Box and whisker plot
What is the 5 Number Summary?
- The smallest measurement
- The first quartile, Q1
- The median, Md (Q2)
- The third quartile (Q3)
- The largest measurement
(Once you have these numbers you can display this info visually using a box-and-whiskers plot)
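A five-number summary can be assembled from the smallest value, the three quartiles, and the largest value. The sketch below uses the index rule i = (p/100) x n from the percentile cards. The helper names and data are assumptions for illustration, and software packages may report slightly different quartiles.

```python
import math

def percentile(values, p):
    """pth percentile using the index rule i = (p/100) x n. Assumes 0 < p < 100."""
    ordered = sorted(values)
    n = len(ordered)
    i = p * n / 100
    if not i.is_integer():
        return ordered[math.ceil(i) - 1]      # round up to the next position
    i = int(i)
    return (ordered[i - 1] + ordered[i]) / 2  # average positions i and i+1

def five_number_summary(values):
    """(smallest, Q1, median, Q3, largest)."""
    return (
        min(values),
        percentile(values, 25),
        percentile(values, 50),
        percentile(values, 75),
        max(values),
    )

# Hypothetical data (n = 10).
data = [1, 3, 4, 6, 7, 8, 10, 12, 15, 40]
summary = five_number_summary(data)
print(summary)
```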
a convenient way of visually displaying the data through quartiles, and is easy to read and summarize
box and whisker plot
The inner fences of a box-and-whiskers plot are located ____x_____ away from the quartiles
1.5 x IQR
(Q1 - (1.5 x IQR) and Q3 + (1.5 x IQR))
What is the formula to calculate the lower limit for a box and whiskers plot?
Q1-1.5(IQR)
What is the formula to calculate the upper limit for a box and whiskers plot?
Q3 + 1.5(IQR)
T or F: If there is a long whisker (line) on right side, it is left-skewed.
False; right-skewed
T or F: If there is a long whisker (line) on left side, it is left-skewed.
True
measurements that are very different from other measurements; they are either much larger or much smaller than most of the other measurements
Outliers
________ lie beyond the limits of the box-and-whiskers plot; measurements less than the lower limit or greater than the upper limit
outliers
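The lower and upper limits and the outlier check can be sketched as below. The data are made up, and Q1 = 4 and Q3 = 12 are assumed to be the quartiles of this data under the index rule from the percentile cards. `find_outliers` is a hypothetical helper name.

```python
def find_outliers(values, q1, q3):
    """Return (lower limit, upper limit, outliers) using the 1.5 x IQR rule."""
    iqr = q3 - q1                    # interquartile range
    lower_limit = q1 - 1.5 * iqr     # lower limit: Q1 - 1.5(IQR)
    upper_limit = q3 + 1.5 * iqr     # upper limit: Q3 + 1.5(IQR)
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    return lower_limit, upper_limit, outliers

# Hypothetical data with one unusually large measurement.
data = [1, 3, 4, 6, 7, 8, 10, 12, 15, 40]

lower_limit, upper_limit, outliers = find_outliers(data, q1=4, q3=12)
print(lower_limit, upper_limit, outliers)
```

Here the value 40 lies above the upper limit, which also matches the long right whisker and right skew described in the cards.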
T or F: Outliers skew our data.
True
the length of the interval that contains the middle 50% of the data; is a single number, not a range nor an interval of numbers
interquartile range (Q3-Q1)
a value below which lie the specified percentage of the measurements in the population or in the sample
percentile
When points on a scatter plot seem to fluctuate around a straight line, there is a _______ relationship between x and y.
linear
T or F: A positive covariance indicates a positive linear relationship between x and y.
True (as x increases, y increases)
T or F: A negative covariance indicates a negative linear relationship between x and y
True (as x increases, y decreases)
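The sign of the covariance can be checked on small made-up data sets. The cards do not give the formula. The sketch below assumes the usual sample covariance, which sums the products (x - x̄)(y - ȳ) and divides by n - 1. Function and variable names are hypothetical.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of (x - x_bar)(y - y_bar), divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 5, 4, 5]    # tends to increase as x increases
y_falling = [5, 4, 3, 2, 1]   # decreases as x increases

cov_rising = sample_covariance(x, y_rising)    # positive covariance
cov_falling = sample_covariance(x, y_falling)  # negative covariance
print(cov_rising, cov_falling)
```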
T or F: A box and whiskers plot is used to study the relationships between 2 quantitative variables.
False; scatter plot
What is the correlation coefficient called?
r
When r > 0, this indicates a (positive/negative) relationship.
positive
When r < 0, this indicates a (positive/negative) relationship
negative
When r = 0, this indicates (positive/negative/no) linear relationship
no
What does the correlation coefficient tell us?
How strong the linear relationship is between 2 variables (r has no units, so the strength of the relationship does NOT depend on the magnitude of the data)
a. sample correlation coefficient (r) is always between…
b. values near ___ show strong negative correlation.
c. values near ____ show no correlation
d. values near _____ show strong positive correlation
a. -1 and 1
b. -1
c. 0
d. 1
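The range of r can be seen by computing it on made-up data. The cards do not state the formula. The sketch below assumes the standard definition: the sum of (x - x̄)(y - ȳ) divided by the square root of the product of the two sums of squared deviations. This is the same as covariance divided by the product of the two standard deviations. Names are hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation coefficient r, always between -1 and 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return ss_xy / sqrt(ss_xx * ss_yy)

# Hypothetical data.
x = [1, 2, 3, 4, 5]

r_perfect_pos = correlation(x, [2, 4, 6, 8, 10])  # points on a rising line: r near 1
r_perfect_neg = correlation(x, [10, 8, 6, 4, 2])  # points on a falling line: r near -1
r_moderate = correlation(x, [2, 4, 5, 4, 5])      # scattered around a rising line

print(r_perfect_pos, r_perfect_neg, round(r_moderate, 4))
```

Multiplying every y value by 100 leaves r unchanged, which is one way to see that r does not depend on the magnitude of the data.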
If there is a linear relationship between x and y, you might wish to predict y on the basis of x. This requires the equation of a line describing the linear relationship. The line used is the _____ _____ line. What is the formula for this?
- least squares
- y = b0 +b1(x)
(b0 = y-intercept and b1 = slope)
(example of this in camera roll; will be on exam!)
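The least squares line y = b0 + b1(x) can be computed by hand on a small made-up data set. The cards do not give formulas for b0 and b1. The sketch assumes the standard least squares point estimates: the slope b1 is the sum of (x - x̄)(y - ȳ) divided by the sum of (x - x̄)², and the intercept is b0 = ȳ - b1(x̄). All names and data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the least squares line y = b0 + b1(x)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = ss_xy / ss_xx        # slope
    b0 = y_bar - b1 * x_bar   # y-intercept
    return b0, b1

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares_line(x, y)

# Predict y for a new x value using the fitted line.
predicted_y = b0 + b1 * 6
print(round(b0, 2), round(b1, 2), round(predicted_y, 2))
```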