2: Exploratory Data Analysis: Single Variable Flashcards
cases
objects described by a set of data (companies, subjects, customers)
label
variable used in some data sets to distinguish different cases
variable
characteristic of a case
distribution
of a variable tells us what values it takes and how often it takes these values
distribution of categorical variable
lists the categories and gives either the count or the percent of cases that fall in each category
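A minimal sketch of tabulating such a distribution in Python. The function name `categorical_distribution` and the customer data are hypothetical, made up for illustration only.

```python
from collections import Counter

def categorical_distribution(values):
    """Return {category: (count, percent)} for a list of category labels."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

# Hypothetical data: the customer type recorded for 8 cases
customers = ["retail", "wholesale", "retail", "online",
             "retail", "online", "retail", "wholesale"]
dist = categorical_distribution(customers)
for cat, (n, pct) in sorted(dist.items()):
    print(f"{cat}: {n} cases ({pct:.1f}%)")
```

The percents sum to 100 because every case falls in exactly one category.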
stemplot
stem-and-leaf plot. gives a quick picture of the distribution's shape while including the actual numerical values in the graph. separate each observation into a stem (all but the final, rightmost digit) and a leaf (the final digit). write the stems in a vertical column with the smallest at the top and draw a vertical line to the right of the column. write each leaf in the row to the right of its stem, in increasing order out from the stem
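The steps can be sketched in Python as follows. This assumes non-negative whole-number observations, so the leaf is simply the ones digit; the function name `stemplot` and the data are hypothetical.

```python
from collections import defaultdict

def stemplot(data):
    """Return the rows of a stemplot for non-negative integers."""
    leaves = defaultdict(list)
    for x in sorted(data):            # sorting puts leaves in increasing order
        stem, leaf = divmod(x, 10)    # stem = all but the final digit, leaf = final digit
        leaves[stem].append(leaf)
    rows = []
    # Include every stem from smallest to largest, even ones with no leaves,
    # so gaps in the distribution stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem:>2} | {row}".rstrip())
    return rows

# Hypothetical observations
lines = stemplot([12, 15, 21, 23, 23, 30, 47, 8])
print("\n".join(lines))
```

Each row shows one stem, the vertical line, and then that stem's leaves.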
histogram
breaks range of values of a variable into classes and displays only the count or percent of the observations that fall into each class. classes = equal width.
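A rough sketch of the counting behind a histogram, in Python. The function name `histogram_counts`, the class boundaries, and the scores are hypothetical choices for illustration; each class here includes its left endpoint and excludes its right endpoint.

```python
def histogram_counts(data, start, width, n_classes):
    """Count observations in equal-width classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_classes
    for x in data:
        i = int((x - start) // width)   # index of the class containing x
        if 0 <= i < n_classes:          # values outside the classes are ignored
            counts[i] += 1
    return counts

# Hypothetical data: 10 exam scores, classes of width 10 starting at 50
scores = [52, 67, 71, 74, 78, 81, 83, 85, 88, 94]
counts = histogram_counts(scores, start=50, width=10, n_classes=5)
for i, c in enumerate(counts):
    lo = 50 + 10 * i
    pct = 100 * c / len(scores)
    print(f"[{lo}, {lo + 10}): {'*' * c} ({c} obs, {pct:.0f}%)")
```

Unlike a stemplot, only the counts survive; the individual values are no longer visible.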
tails
extreme values of a distribution
modes
major peaks in a distribution
time plot
of a variable plots each observation against the time at which it was measured. time is on the horizontal scale of the plot and the variable measured is on the vertical scale
mean vs. median
mean is average value.
(x1 + x2 + ... + xn) / n
median is middle value.
(1) if number of observations is odd: median's location can be found by counting (n+1)/2 observations up from the bottom of the ordered list
(2) if even: median is the mean of the two center observations in the ordered list. location is still (n+1)/2 up from the bottom of the list, which falls halfway between the two center observations
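The two rules can be sketched in Python as below. The function names and the data are hypothetical; the example uses one extreme value to show that the mean moves with it while the median does not.

```python
def mean(xs):
    """Sum of the observations divided by how many there are."""
    return sum(xs) / len(xs)

def median(xs):
    """Middle value of the ordered list, following the odd/even rules."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # odd n: position (n+1)/2 counting from 1 is index mid counting from 0
        return ordered[mid]
    # even n: mean of the two center observations
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical data with one extreme observation
data = [3, 5, 7, 9, 100]
print(mean(data), median(data))          # the mean is pulled toward 100, the median is not
print(mean(data[:4]), median(data[:4]))  # even number of observations
```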
quartile
upper (third) quartile, Q3 = median of the upper half of the ordered data. lower (first) quartile, Q1 = median of the lower half of the ordered data
pth percentile
the value such that p percent of the observations fall at or below it
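One way to turn this definition into a calculation is the nearest-rank rule, sketched below in Python. This is an assumed convention, not the only one; statistical software often interpolates between observations and can give slightly different answers. The function name `percentile` and the scores are hypothetical.

```python
import math

def percentile(xs, p):
    """Nearest-rank pth percentile: the smallest observation that has
    at least p percent of the observations at or below it."""
    if not 0 < p <= 100:
        raise ValueError("p must be greater than 0 and at most 100")
    ordered = sorted(xs)
    rank = math.ceil(p * len(ordered) / 100)   # position counting from 1
    return ordered[rank - 1]

# Hypothetical data: 10 exam scores
scores = [52, 67, 71, 74, 78, 81, 83, 85, 88, 94]
for p in (25, 50, 90):
    print(f"{p}th percentile: {percentile(scores, p)}")
```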
five number summary
of a set of observations consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest.
Min Q1 M Q3 Max
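A minimal Python sketch of the summary, assuming at least two observations and assuming that, when the number of observations is odd, the overall median is left out of both halves before the quartiles are taken (other conventions exist). The function name `five_number_summary` and the data are hypothetical.

```python
from statistics import median

def five_number_summary(xs):
    """Return (Min, Q1, M, Q3, Max), with quartiles as medians of the two halves."""
    ordered = sorted(xs)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # observations below the median's location
    upper = ordered[half + n % 2:]    # observations above the median's location
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Hypothetical data: 9 observations
data = [2, 4, 4, 5, 7, 8, 9, 11, 15]
summary = five_number_summary(data)
print("Min Q1 M Q3 Max")
print(*summary)
```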
boxplot
graph of five-number summary