STAT MOD 1: Chapter 2 Flashcards
Graphical Methods for Describing Data Distributions
Identifying the type of data (know for type of graphical display):
What is univariate?
data that consists of observations on a single variable made on individuals
- One variable collected on each subject
Ex: height
Identifying the type of data (know for type of graphical display):
What is bivariate?
data that consists of pairs of numbers from two variables for each individual
- Two variables collected on each subject
Ex: height and weight
Identifying the type of data (know for type of graphical display):
What is multivariate?
data that consist of observations on two or more variables
- Many variables for each subject
Ex: height, weight, sit-ups, mile time, distance
Identifying the type of variable:
What is categorical (qualitative)?
categorical or non-numerical responses
Ex: eye color, major, neighborhood; numbers used as labels (room number, section number) are also qualitative
Identifying the type of variable:
What is numeric (quantitative)?
responses that take on numerical values (counts or measurements)
Ex: number of units, hours, minutes
What are the types of numeric variables?
1) Discrete
2) Continuous
Identifying the type of variable:
What is discrete?
variable whose possible values are isolated points along a number line, usually counts of items
Ex: stop lights, units, number of jobs
Identifying the type of variable:
What is continuous?
variable that can take any value in a given interval
Ex: time, height, weight, age
Know how to choose appropriate type of display for given data/variable type
Refer to graphical display chart with notes
What is distribution?
describes how often the possible responses occur
- How often the variable takes on a value
What is frequency?
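the count of observations in a category; a frequency distribution is a listing of all categories along with their frequencies (counts)
- How many individuals fall in each category (e.g., the counts from a pivot table)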
What is relative frequency?
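the frequency of a category divided by the total number of observations; a relative frequency distribution is a listing of all categories along with their relative frequencies, expressed as either
- a proportion (between 0 and 1)
- a percent (between 0 and 100)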
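A minimal Python sketch of building a frequency and relative frequency table for a categorical variable. The eye color sample and the function name `frequency_table` are made up for illustration.

```python
from collections import Counter

# Hypothetical sample: eye color recorded for 10 students
eye_colors = ["brown", "blue", "brown", "green", "brown",
              "blue", "brown", "hazel", "blue", "brown"]

def frequency_table(values):
    """Return {category: (frequency, relative frequency as a proportion)}."""
    counts = Counter(values)   # frequency = count per category
    n = len(values)            # size of the data set
    return {cat: (count, count / n) for cat, count in counts.items()}

table = frequency_table(eye_colors)
for cat, (freq, rel) in table.items():
    # The same relative frequency shown as a proportion and as a percent
    print(f"{cat:<6} frequency={freq}  proportion={rel:.2f}  percent={rel * 100:.0f}%")
```

The proportions always sum to 1 (the percents to 100), which is a quick check that no category was missed.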
What charts do you use for categorical variables?
bar charts, comparative bar charts, pie charts
How do you get the frequency and relative frequency from a bar chart?
the height of a category's bar is its frequency; if asked for the relative frequency of a certain category, take the height of its bar and divide by the size of the data set
What are characteristics of pie chart?
- the area of each slice is proportional to the fraction of the data in that category
- the slices need to add up to 100%
What graphical displays do you use for small data sets?
- dot plots
- stem and leaf displays
- comparative stem and leaf
- split stem and leaf (used when there are a lot of leaves on one stem)
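A rough Python sketch of a basic stem and leaf display. It assumes non-negative whole numbers with the ones digit as the leaf; the scores and the function name `stem_and_leaf` are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit and up) to its sorted leaves (ones digit)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 83 -> stem 8, leaf 3
        stems[stem].append(leaf)
    return dict(stems)

# Hypothetical small data set of exam scores
scores = [62, 75, 78, 81, 83, 84, 88, 90, 91, 97]
display = stem_and_leaf(scores)

# Print every stem in range, including empty ones, so gaps stay visible
for stem in range(min(display), max(display) + 1):
    leaves = " ".join(str(leaf) for leaf in display.get(stem, []))
    print(f"{stem} | {leaves}")
```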
What graphical displays do you use for large data sets?
- histogram
- comparative histograms
What must you be able to interpret from these graphical displays (dot plots, stem and leaf displays, histograms)?
center, spread, shape, gaps, outliers
How do you construct a histogram (general)?
- Create bins
- Count how many of your data observations fall into each bin
- Plot the frequencies or relative frequencies on the y-axis of the histogram
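The steps above can be sketched in Python as follows. This is only an illustration: the heights, the bin start, and the bin width are made-up choices, and each bin is assumed to include its left edge but not its right edge.

```python
def histogram_counts(data, start, width, num_bins):
    """Count observations in bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * num_bins
    for x in data:
        index = int((x - start) // width)   # which bin x falls into
        if 0 <= index < num_bins:           # ignore values outside the bins
            counts[index] += 1
    return counts

# Hypothetical heights in inches
heights = [61.5, 63.0, 64.2, 65.0, 66.1, 66.8, 67.3, 68.0, 69.5, 72.4]

# Step 1: create bins 60-64, 64-68, 68-72, 72-76
# Step 2: count the observations in each bin
counts = histogram_counts(heights, start=60, width=4, num_bins=4)

# Step 3: frequencies (or relative frequencies) become the bar heights
relative = [c / len(heights) for c in counts]
for i, (c, r) in enumerate(zip(counts, relative)):
    low = 60 + 4 * i
    print(f"{low}-{low + 4}: {'#' * c}  frequency={c}  relative={r:.1f}")
```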
Shape of distribution:
What is symmetric?
distribution in which the left and right sides look like mirror images of each other
- Ex: bell shape
Shape of distribution:
What is skewed right?
more data on the left side of the distribution, so the right tail is longer
- Positive skew (has longer tail or outliers on right side)
Shape of distribution:
What is skewed left?
more data on the right side of the distribution, so the left tail is longer
- Negative skew (has longer tail or outliers on left side)
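The names "positive skew" and "negative skew" match the sign of a numerical skewness measure. The sketch below uses the moment coefficient of skewness, which is one common choice and is not defined in these notes, so treat it as an assumption; the data sets are hypothetical.

```python
def skewness(data):
    """Moment coefficient of skewness: average cubed deviation / sd cubed."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # average squared deviation
    m3 = sum((x - mean) ** 3 for x in data) / n   # average cubed deviation
    return m3 / m2 ** 1.5

# Hypothetical data sets
incomes = [20, 22, 25, 27, 30, 32, 35, 40, 60, 150]   # long right tail
grades = [35, 70, 78, 82, 85, 88, 90, 92, 95, 98]     # long left tail
balanced = [1, 2, 3, 4, 5]                            # symmetric

for name, data in [("incomes", incomes), ("grades", grades), ("balanced", balanced)]:
    s = skewness(data)
    label = "right (positive)" if s > 0 else "left (negative)" if s < 0 else "none"
    print(f"{name}: skew = {label}")
```

A few large values on the right pull the measure above zero, and a few small values on the left pull it below zero.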
What is mode? What are the types?
the most frequent value of a data set
unimodal - shape in which there is one prominent peak
bimodal - shape in which there are two prominent peaks
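A small Python sketch of both ideas: the most frequent value via the standard `statistics` module, and a simple way to count peaks in histogram bar heights. The helper `count_peaks` is hypothetical and only counts bars strictly taller than both neighbors, so it is a rough check and not a rule for what counts as a "prominent" peak.

```python
import statistics

# Mode as the most frequent value (hypothetical data)
values = [2, 3, 3, 4, 3, 5]
most_frequent = statistics.mode(values)

def count_peaks(counts):
    """Count bars strictly taller than both neighbors (edges compare to 0)."""
    padded = [0] + list(counts) + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )

# Hypothetical histogram bar heights
one_peak = [1, 4, 9, 4, 1]
two_peaks = [2, 8, 3, 1, 3, 7, 2]

print(most_frequent)
print(count_peaks(one_peak), count_peaks(two_peaks))
```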
Give examples of variables you would expect to have each shape.
- symmetric (heights, standardized test scores)
- right skewed (income)
- left skewed (grades on an easy exam)