Module 2: Descriptive Statistics: Tabular, Graphical and Numerical Methods Flashcards
Frequency Distribution
A tabular representation of the summary data that shows the numerical count of items in each class in the data set.
TRUE or FALSE:
In a frequency distribution, classes can overlap.
FALSE: For any frequency distribution, the classes must not overlap. An overlap will result in double counting an item and will yield erroneous results.
Relative Frequency
Relative frequency is a calculated value that represents the proportion of the items in each class.
What is the Relative Frequency equation?
$\text{Relative Frequency of a Class} = \dfrac{\text{Frequency of the Class}}{n}$
(where n is the total count of items across all the classes being compared)
How can you convert a relative frequency to a relative percentage?
Multiply the relative frequency by 100
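As a minimal sketch of the two calculations above, the snippet below counts a small made-up list of class labels, divides each count by n, and multiplies by 100. The data and variable names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical sample data: one class label per observed item.
items = ["A", "B", "A", "C", "A", "B", "A", "C", "B", "A"]

frequency = Counter(items)       # number of items in each class
n = sum(frequency.values())      # total count of items across all classes

# Relative frequency = frequency of the class / n
relative_frequency = {cls: count / n for cls, count in frequency.items()}

# Relative percentage = relative frequency * 100
percent_frequency = {cls: rf * 100 for cls, rf in relative_frequency.items()}
```

The relative frequencies of all classes always add up to 1, which is a quick check that no item was missed or double counted.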
Frequency distribution tables are a way to help us understand data
a) numerically
b) visually
c) philosophically
a) numerically
Charting is a means to represent frequencies
a) numerically
b) visually
c) auditorily
b) visually
TRUE OR FALSE:
You must arrange charts in descending order
FALSE
You do not have to arrange your information in descending order; however, it can be very helpful to do so.
Pie charts are most helpful when representing distributions focused on what?
Proportions
TRUE OR FALSE:
There are usually many ways to define classes for numerical data.
TRUE
There are usually many ways to define classes for numerical data.
What are the three steps to produce an effective frequency distribution table?
Step One: Determine the number of classes to be evaluated.
Step Two: Determine the width of the classes.
Step Three: Determine each class’s limits.
When creating a frequency distribution table, how many classes should you create?
Use the fewest number of classes possible to effectively explain your data.
What is the formula to calculate the approximate width of your classes?
$\text{Approx. width} = \dfrac{\text{Largest data value} - \text{Smallest data value}}{\text{Number of classes}}$
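The three steps and the width formula can be sketched as follows. The data set, the choice of five classes, and the decision to round the width up to a whole number are all assumptions made for illustration; other rounding choices are equally valid.

```python
import math

# Hypothetical data set and class count, chosen only for illustration.
data = [12, 15, 17, 21, 22, 24, 28, 30, 33, 39]
number_of_classes = 5                                    # Step One

# Step Two: approx. width = (largest value - smallest value) / number of classes
approx_width = (max(data) - min(data)) / number_of_classes
class_width = math.ceil(approx_width)   # assumption: round up to a convenient whole number

# Step Three: build non-overlapping class limits [lower, upper),
# starting at the smallest value and continuing until the largest value is covered.
lower = min(data)
classes = []
while lower <= max(data):
    classes.append((lower, lower + class_width))
    lower += class_width

# Each value falls in exactly one class because the classes do not overlap.
frequency = {c: sum(c[0] <= x < c[1] for x in data) for c in classes}
```

Because every upper limit is excluded from its class and included in the next one, no item can be counted twice.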
TRUE OR FALSE:
Cross-tabulations can only be used to compare qualitative-to-quantitative data.
FALSE
The use of cross-tabulations is an effective method for comparing qualitative-to-qualitative data, qualitative-to-quantitative data, and quantitative-to-quantitative data.
Scatter Diagram
A two-dimensional plot, or graphical display, of data. It helps to determine whether there is a relationship between two variables.
On a scatter diagram, what indicates a POSITIVE RELATIONSHIP between variables?
When data points have an increasing slope as you move from left to right.
It indicates that as one variable increases, the other also increases; likewise, if one variable decreases, the other also decreases.
What 3 types of relationships do we see in scatter diagrams?
1) Positive relationship
2) Negative relationship
3) No relationship
On a scatter diagram, what signifies NO RELATIONSHIP between variables?
Usually indicated by a horizontal pattern (or close to horizontal pattern) on the scatter diagram
On a scatter diagram, what indicates a NEGATIVE RELATIONSHIP between variables?
When the graphed data points have a downward slope, as you move left to right.
As one variable increases in value, the other variable decreases in value, and vice versa.
Measure of Central Tendency
“typical” or “average” value of a data set
TRUE OR FALSE:
There are only 3 measures of central tendency.
FALSE
There are many measures of central tendency.
The 3 we examine in this section are:
1) Mean
2) Median
3) Mode
Mean
Average
What are the 2 types of means?
1) Sample Mean
2) Population Mean
What is the mathematical formula for
SAMPLE MEAN?
$\bar{x} = \dfrac{\sum x_i}{n}$
What does x̄ (x-bar) represent?
x̄ stands for the “sample mean”
What does xi (x subscript i) represent?
xᵢ means “the i-th x-value”; Σ xᵢ means “add up all of the x-values”
The subscript i runs from 1 to n; therefore:
x₁ = the first term in the set,
x₂ = the second term in the set,
and so on until the last term, xₙ, is entered.
What does n represent?
n means “the total number of items in the sample”
What does μ (“mu”) represent?
μ (“mu”) stands for population mean
What is the mathematical formula for
POPULATION MEAN?
$\mu = \dfrac{\sum x_i}{N}$
What does N represent?
N means “the total number in the population”
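A minimal sketch of the mean calculation, using made-up values. The arithmetic is the same for a sample mean and a population mean; only the symbols (x̄ and n versus μ and N) and what the data represent differ.

```python
# Hypothetical sample values, chosen only for illustration.
sample = [4, 8, 15, 16, 23, 42]

n = len(sample)                 # total number of items in the sample
sample_mean = sum(sample) / n   # sum of all x-values divided by n
```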
Median
The midpoint of all the points in the data set.
How do you find the median of a data set?
FIRST, arrange the data in ascending order.
Find the midpoint:
a) If there are an ODD number of data points in the set, the median is the middle point in the set.
The median is the point that has an equal number of points below and above it.
b) If there are an EVEN number of data points in the set, the median is the mean (or average) of the middle two points in the data set. Add the two middle points together, then divide by 2.
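The odd and even cases above can be sketched as a small function. The function name `median` and the example lists are hypothetical; the sketch assumes the list is not empty.

```python
def median(values):
    """Return the median using the odd/even rule described above."""
    ordered = sorted(values)      # first, arrange the data in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd count: the single middle point
        return ordered[mid]
    # even count: the mean of the two middle points
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median([7, 1, 5])       # middle of [1, 5, 7]
even_case = median([7, 1, 5, 3])   # mean of 3 and 5 in [1, 3, 5, 7]
```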
Mode
The value that appears most frequently in the data set.
TRUE OR FALSE:
Usually the mode will be the only measure of central tendency for qualitative data
TRUE
In many cases, the mode will be the only measure of central tendency for qualitative data.
Bimodal
When a data set has two modes
Multimodal
When a data set has more than two modes
No Mode
When no data point occurs more than once
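The mode cases above (one mode, bimodal, multimodal, no mode) can be told apart by counting how often each value occurs. This is a sketch with a hypothetical function name, `modes`, and it assumes a non-empty list.

```python
from collections import Counter


def modes(values):
    """Return every value that ties for the highest count, or [] if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                 # no data point occurs more than once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


single = modes([1, 2, 2, 3])        # one mode
bimodal = modes([1, 1, 2, 2, 3])    # two modes
no_mode = modes([1, 2, 3])          # no mode
```

Because it only counts occurrences, the same approach works for qualitative data such as category labels.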
What is the formula to determine the LOCATION OF A SPECIFIC PERCENTILE?
$i = \left(\dfrac{p}{100}\right) n$
What do all the letters represent in the formula for determining the location of a specific percentile?
i = index
p = the desired percentage
n = the number of observations in the data set
How do you find the location of a specific percentile?
FIRST arrange all data points in ascending order
Then use the formula:
$i = \left(\dfrac{p}{100}\right) n$
i will give you the numerical location within your ascending list of data points.
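A minimal sketch of the index calculation, taking the index to be the fraction p/100 of the n observations, that is i = (p / 100) × n. The function name `percentile_index` is hypothetical. How a non-whole value of i is turned into a position varies between textbooks, so the sketch stops at i.

```python
def percentile_index(p, n):
    """Index i = (p / 100) * n for the p-th percentile of n sorted observations."""
    return (p / 100) * n


# Hypothetical example: the 25th percentile in a data set of 20 observations.
i = percentile_index(25, 20)
```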
Quartiles
Dividing up the data into four equal parts
When dividing data into quartiles, what percentile does each quartile represent?
The first quartile (Q1) is at the 25th percentile
The second quartile (Q2) is at the 50th percentile
Note: Q2 is also the median.
The third quartile (Q3) is at the 75th percentile
How do you find quartiles?
- Put the data in ascending order.
- Find the median, M, of the data. This median is the second quartile Q2.
- Separate the data into two halves. (That is, data below the median and data above the median.)
- Find the median of the data below the median. This is Q1.
- Find the median of the data above the median. This is Q3.
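The steps above can be sketched as follows. The function names are hypothetical, and the sketch assumes at least four data points. When the number of points is odd, the median itself is left out of both halves. Statistical software often uses other quartile conventions, so its results may differ slightly from this method.

```python
def median(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves steps described above."""
    ordered = sorted(values)                # put the data in ascending order
    n = len(ordered)
    q2 = median(ordered)                    # the median is Q2
    lower_half = ordered[: n // 2]          # data below the median
    upper_half = ordered[(n + 1) // 2:]     # data above the median
    return median(lower_half), q2, median(upper_half)


q1, q2, q3 = quartiles([1, 3, 5, 7, 9, 11, 13])
```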
What are “Measures of Dispersion”?
Measure the extent to which data is spread out or dispersed.
Range
The span between the smallest value in the set and the largest
What is the formula for RANGE?
Range = Largest Value - Smallest Value
What is the crudest way to measure dispersion?
Range
Interquartile Range
This calculation focuses on the middle 50% of the data.
What is the INTERQUARTILE RANGE formula?
IQR = Q3 - Q1
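A short sketch of the range and interquartile range calculations on made-up data. It assumes Q1 and Q3 are found as the medians of the lower and upper halves of the ordered data; other quartile conventions would give slightly different values.

```python
# Hypothetical data, already in ascending order.
data = [2, 4, 4, 5, 7, 9, 10, 12]

# Range = largest value - smallest value
data_range = max(data) - min(data)

# Assumption: Q1 and Q3 are the medians of the lower and upper halves.
q1 = (4 + 4) / 2      # median of the lower half [2, 4, 4, 5]
q3 = (9 + 10) / 2     # median of the upper half [7, 9, 10, 12]

# Interquartile range = Q3 - Q1, the spread of the middle 50% of the data
iqr = q3 - q1
```

The range depends only on the two extreme values, while the interquartile range ignores them, which is why the range is the cruder of the two measures.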
Variance
A measure of dispersion that uses all of the data in the set and is based on the difference between each data point in a data set and the mean of the data set.
What are the two different types of variance?
Sample Variance
and
Population Variance
σ²
The square of the Greek lowercase letter sigma (σ).
It represents population variance.
s²
Represents sample variance.
What is the formula for POPULATION VARIANCE?
$\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$
What is the formula for SAMPLE VARIANCE?
$s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$
What is “deviation about the mean”?
The calculated difference between the data value and the mean.
TRUE OR FALSE:
The larger the variance, the less dispersed (or spread out) the data.
FALSE:
The larger the variance, the MORE dispersed (or spread out) the data.
Why do we use n - 1 as the divisor in the sample variance formula?
The n - 1 value assists the formula in providing an unbiased estimate of the population variance.
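The two variance formulas differ only in the divisor, which the following sketch makes explicit. The function names and the data are hypothetical; the sample version assumes at least two data points.

```python
def population_variance(values):
    """Sum of squared deviations about the mean, divided by N."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


def sample_variance(values):
    """Sum of squared deviations about the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# Hypothetical data: the mean is 5 and the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
pop_var = population_variance(data)    # 32 / 8
samp_var = sample_variance(data)       # 32 / 7
```

For the same data, dividing by n - 1 always gives a slightly larger value than dividing by N.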
Standard Deviation
The positive square root of the variance
Population Standard Deviation
$\sigma = \sqrt{\sigma^2}$
Sample Standard Deviation
$s = \sqrt{s^2}$
Coefficient of Variation
Shows how the size of the standard deviation compares to the mean of the data set
The coefficient of variation may be used to compare the dispersion of one set of data to the dispersion of another set of data.
What is the formula for COEFFICIENT OF VARIATION?
$\text{Coefficient of Variation} = \dfrac{\text{Standard Deviation}}{\text{Mean}} \times 100$
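A minimal sketch of the standard deviation and coefficient of variation, using Python's standard `statistics` module. The data are made up, and the sketch treats them as a full population (`pstdev`); `statistics.stdev` would be the sample counterpart.

```python
import statistics

# Hypothetical data, treated here as a full population.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
std_dev = statistics.pstdev(data)    # positive square root of the population variance

# Coefficient of variation = (standard deviation / mean) * 100
coefficient_of_variation = std_dev / mean * 100
```

Because the result is a percentage of the mean, it can be compared across data sets measured in different units or on different scales.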
z-score
aka, the standardized value.
It tells the researcher how many standard deviations an item in the data set lies from the mean of the data set.
What is the formula for Z-SCORE?
$z = \dfrac{x - \mu}{\sigma}$
What does everything stand for in the z-score formula?
z = the z-score for x
μ = the mean
σ = the standard deviation
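A minimal sketch of the z-score calculation. The function name `z_score` and the numbers are hypothetical, and the sketch assumes the standard deviation is not zero.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


# Hypothetical example: mean 5 and standard deviation 2.
above = z_score(9, 5, 2)    # 9 lies two standard deviations above the mean
below = z_score(4, 5, 2)    # 4 lies half a standard deviation below the mean
```

A positive z-score means the item is above the mean and a negative z-score means it is below.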