Reading Quiz 1 Flashcards
Distribution
Distribution of a variable indicates what values the variable takes on and the frequency at which it takes on these values
Key features of a histogram
Center, spread, shape, outliers
Three basic shapes
Symmetric, skewed right, skewed left
Shape of distribution can also be described
By referring to number of modes
Unimodal, bimodal, multimodal, or uniform
Measures of center
Mean and median
Sample mean
Arithmetic average or arithmetic mean, average of a set of data values
Median
Middle number of the ordered data set
Median position formula
Indicates where the median will lie
(n+1)/2
n = number of observations in the data set
The formula only indicates where the median is, not what the median is
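A minimal sketch of the position rule, assuming the data are sorted first and that a fractional position means averaging the two neighboring values. The function name `median_by_position` and the sample lists are illustrative only.

```python
def median_by_position(values):
    """Return (position, median) using the (n + 1) / 2 position rule."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2  # 1-based position in the ordered list

    if position.is_integer():
        # Odd n: the median is the single value at that position.
        return position, ordered[int(position) - 1]

    # Even n: the position falls halfway between two values, so average them.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return position, (lower + upper) / 2


odd_position, odd_median = median_by_position([7, 1, 5, 3, 9])
even_position, even_median = median_by_position([7, 1, 5, 3, 9, 11])
print(odd_position, odd_median)    # 3.0 5
print(even_position, even_median)  # 3.5 6.0
```

The position (3.0 or 3.5) is not itself the median; it only says where in the ordered list to look.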
Perfectly symmetric vs skewed
If perfectly symmetric, mean equals median
If skewed, the mean lies farther out in the long tail than the median
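A small illustration of this comparison using Python's standard `statistics` module. The two data sets are made up for the example; the second has one large value that creates a long right tail.

```python
from statistics import mean, median

symmetric = [1, 2, 3, 4, 5]      # hypothetical, perfectly symmetric
skewed_right = [1, 2, 3, 4, 40]  # hypothetical, long right tail

sym_mean, sym_median = mean(symmetric), median(symmetric)
skew_mean, skew_median = mean(skewed_right), median(skewed_right)

print(sym_mean, sym_median)    # 3 3
print(skew_mean, skew_median)  # 10 3
```

In the skewed set the mean is pulled toward the tail while the median stays put.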
Measures of spread
Range, interquartile range, five number summary, variance and sample standard deviation
Range
Largest number minus smallest number
Interquartile range
Q3 - Q1
Five number summary
Minimum, Q1, median, Q3, maximum
Most commonly used measure of spread
Standard deviation
Variance
s^2 = (Σ(xi - xbar)^2)/(n - 1)
Standard deviation
The square root of the variance, represented by s
Measures how the numbers are spread out from the mean
s = sqrt((Σ(xi - xbar)^2)/(n - 1))
Nonresistant
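The variance and standard deviation formulas can be sketched directly in Python. The helper name `sample_variance` and the data are assumptions for illustration; the result is compared against `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
import math
from statistics import stdev, variance


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, mean is 5
s2 = sample_variance(data)       # 32 / 7
s = math.sqrt(s2)

# The hand-written formula should agree with the standard library.
matches_library = math.isclose(s2, variance(data)) and math.isclose(s, stdev(data))
print(round(s2, 4), round(s, 4), matches_library)
```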
Deviation of xi from the mean
xi - xbar
Sum of all deviations from the mean equals zero
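A quick check of the zero-sum property on a made-up data set. With other data, floating-point rounding can leave a sum that is very close to zero rather than exactly zero.

```python
data = [3, 8, 10, 15]          # hypothetical observations
xbar = sum(data) / len(data)   # 9.0

deviations = [x - xbar for x in data]
total = sum(deviations)

print(deviations)  # [-6.0, -1.0, 1.0, 6.0]
print(total)       # 0.0
```

This is why the deviations are squared before averaging in the variance formula: without squaring, they always cancel.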
Degrees of freedom
Quantity n - 1
Appears in the denominator of the formulas for variance and standard deviation
Measures to use for symmetric distributions
Mean and standard deviation
Measures to use for skewed distributions
Median and five number summary
Outlier
An individual observation that falls outside the overall pattern of the graph
striking deviations
Outlier test
Data point is outlier if it lies more than 1.5 interquartile ranges below Q1 or above Q3
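A sketch of the 1.5 × IQR test. It assumes quartiles are computed as the medians of the lower and upper halves of the ordered data (other quartile conventions exist and can give slightly different fences). The function names and data are illustrative.

```python
from statistics import median


def quartiles(values):
    """Q1 and Q3 as medians of the halves on either side of the median."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]              # positions left of the median
    upper = ordered[half + (n % 2):]    # positions right of the median
    return median(lower), median(upper)


def find_outliers(values):
    """Return values more than 1.5 IQRs below Q1 or above Q3."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]


data = [10, 12, 13, 14, 15, 16, 18, 40]  # hypothetical data
q1, q3 = quartiles(data)                  # 12.5 and 17.0
outliers = find_outliers(data)
print(q1, q3, outliers)                   # 12.5 17.0 [40]
```

Here the IQR is 4.5, so the fences sit at 5.75 and 23.75, and only 40 falls outside them.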
two types of graphs most appropriate for categorical data
pie charts and bar graphs
graph that is inappropriate when the percentages don’t represent portions of the same whole
pie chart
graph to use when you want to see raw data values, center, shape, and spread, but there are too many values for a dot plot
stemplot
histogram
breaks the range of values of a variable into classes and displays only the count or percent of the observations that fall into each class
the most common graph of the distribution of a quantitative variable
ogive
relative cumulative frequency graph
horizontal axis: values of variable
vertical axis: relative cumulative frequency
how to find center of ogive
draw a horizontal line from 50% on the vertical axis to the graph; the value on the horizontal axis at that point is the center
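Reading the center off an ogive can be sketched numerically. The class boundaries and counts below are invented, the first class is assumed to start at 0, and the ogive is assumed to be drawn with straight segments between points, so the 50% crossing is found by linear interpolation.

```python
from itertools import accumulate

upper_bounds = [10, 20, 30, 40, 50]  # hypothetical class upper boundaries
counts = [2, 5, 8, 4, 1]             # hypothetical class counts

total = sum(counts)
cumulative = list(accumulate(counts))                  # [2, 7, 15, 19, 20]
relative_cumulative = [c / total for c in cumulative]  # ends at 1.0


def ogive_center(bounds, rel_cum, lower_start=0):
    """Value on the horizontal axis where the ogive crosses 50%."""
    prev_x, prev_y = lower_start, 0.0
    for x, y in zip(bounds, rel_cum):
        if y >= 0.5:
            # Interpolate along the segment that crosses the 50% line.
            return prev_x + (0.5 - prev_y) / (y - prev_y) * (x - prev_x)
        prev_x, prev_y = x, y


center = ogive_center(upper_bounds, relative_cumulative)
print(round(center, 2))  # 23.75
```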
time plot axes
time is on horizontal axis
trend
on time plot, overall upward or downward slope
seasonal variation
on time plot, shorter-term, regularly occurring rise-and-fall variations
resistant measure
measure of center or spread that is relatively unaffected by extreme observations
two resistant measures
median and interquartile range
first quartile
the median of the subset of observations whose position in the ordered list is to the left of the overall median
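A sketch of this quartile definition applied to a full five number summary. With an odd number of observations the overall median is left out of both halves. The function name and data are illustrative, and other software may use a different quartile convention.

```python
from statistics import median


def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum using medians of the two halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]             # left of the overall median
    upper_half = ordered[half + (n % 2):]   # right of the overall median
    return (
        ordered[0],
        median(lower_half),
        median(ordered),
        median(upper_half),
        ordered[-1],
    )


summary = five_number_summary([1, 3, 5, 7, 9, 11, 13])  # hypothetical data
print(summary)  # (1, 3, 7, 11, 13)
```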
graph that gives picture of five number summary
boxplot
IQR
Q3-Q1
difference between regular and modified boxplot
regular is graph of five number summary
modified plots suspected outliers individually
measures of spread
standard deviation and IQR
when is standard deviation 0
when there is no spread aka all observations are the same value
adding the same number a to each observation
adds a to measures of center and to quartiles but does not change measures of spread
multiplying each observation by the same number b
multiplies both measures of center (mean and median) and measures of spread (IQR and standard deviation) by b (for the measures of spread, by |b| if b is negative)
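Both transformation rules can be checked on a small made-up data set. The values of `a` and `b` are arbitrary choices for the example, with `b` positive.

```python
import math
from statistics import mean, median, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
a, b = 10, 3                     # arbitrary shift and (positive) scale

shifted = [x + a for x in data]
scaled = [x * b for x in data]

# Adding a moves the center by a but leaves the spread alone.
shift_moves_center = (
    math.isclose(mean(shifted), mean(data) + a)
    and math.isclose(median(shifted), median(data) + a)
)
shift_keeps_spread = math.isclose(stdev(shifted), stdev(data))

# Multiplying by b scales both the center and the spread by b.
scale_moves_center = (
    math.isclose(mean(scaled), mean(data) * b)
    and math.isclose(median(scaled), median(data) * b)
)
scale_moves_spread = math.isclose(stdev(scaled), stdev(data) * b)

print(shift_moves_center, shift_keeps_spread, scale_moves_center, scale_moves_spread)
```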
three graphical methods for comparing distributions
bar charts, back to back stemplots, and side by side boxplots
categorical variables
place individuals into groups or categories (qualitative)
quantitative variables
numeric measures, makes sense to perform arithmetic operations such as adding or averaging
most appropriate displays of categorical data
pie charts, dot plots, bar graphs
best displays for quantitative data
dot plots, stem plots, histograms
bins
the classes (piles) that values are grouped into in a histogram; they need to be equal in width, both physically and numerically
rule of thumb bin number
square root of number of observations
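A rough sketch of the rule of thumb together with equal-width bin edges. Rounding the square root to the nearest whole number is an assumption, since the rule does not say how to round, and both function names are made up for the example.

```python
import math


def rule_of_thumb_bins(n_observations):
    """Number of bins: square root of the number of observations, rounded."""
    return round(math.sqrt(n_observations))


def equal_width_edges(values, n_bins):
    """Bin edges that split the range of the data into equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    return [low + i * width for i in range(n_bins + 1)]


bins_for_100 = rule_of_thumb_bins(100)
edges = equal_width_edges([0, 5, 10, 20], 4)  # hypothetical data, 4 bins
print(bins_for_100)  # 10
print(edges)         # [0.0, 5.0, 10.0, 15.0, 20.0]
```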
spread
level of variability; the range is also a measure of this
standard deviation
measure of the typical (roughly average) distance of the observations from the mean
box plots
not ideal indicators of shape and should not be used if there are other options