Chapter 1: Data Analysis Flashcards
Define Categorical Variable
assigns labels that place individuals into categories (usually non-numeric)
ex. hair color, eye color, zip code, birthday, etc.
Define Quantitative Variable
takes numerical values that can be used in calculations
ex. height, weight, number of siblings, how long it takes, etc.
How to distinguish between Categorical and Quantitative Variables?
Ask: does it make sense to find the average of the data? If yes, the variable is quantitative; if no, it is categorical.
Define Discrete Quantitative Variable
takes a fixed set of values with gaps in between (dots on a number line)
ex. number of pets you have
Define Continuous Quantitative Variable
any value on an interval (line segment)
ex. how long it takes to finish homework
How to distinguish between Discrete and Continuous Quantitative Variables?
Ask: does it make sense to have 0.5 of something? ==> yes for height (continuous), but no for number of pets you have (discrete).
Measurements are usually continuous; counts are usually discrete.
What is a distribution?
the values a variable takes and how often it takes those values.
ex. rolling a die several times,
[1, 2, 3, 4, 5, 6] --> “values a variable takes”
[got 2 two times, 4 one time, and 3 two times] --> “how often it takes those values”
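The die example can be sketched in a few lines of Python. This is only an illustration: the list `rolls` is hypothetical data chosen to match the counts on the card (2 twice, 4 once, 3 twice), and the name `distribution` is not from the cards.

```python
from collections import Counter

# Hypothetical rolls matching the card: 2 twice, 4 once, 3 twice.
rolls = [2, 4, 3, 2, 3]

# "Values a variable takes": the faces of a six-sided die.
possible_values = [1, 2, 3, 4, 5, 6]

# "How often it takes those values": count each face (0 if never rolled).
counts = Counter(rolls)
distribution = {face: counts[face] for face in possible_values}

print(distribution)  # {1: 0, 2: 2, 3: 2, 4: 1, 5: 0, 6: 0}
```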
Frequency Table
“bare bones”
how many times each value occurred
x:      1    2    3
freq:   3    6    4
Relative Frequency Table
“more info”
each frequency written as a fraction or percentage of the total
x:          1      2      3
rel. freq:  3/20   6/20   4/20
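As a rough sketch of how a frequency table becomes a relative frequency table, the snippet below divides each count by the total. The `pets` data set is made up for illustration and is not the data behind the table on the card.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical raw data: number of pets for 10 students.
pets = [0, 1, 1, 2, 0, 1, 3, 1, 2, 0]

freq = Counter(pets)        # frequency table: value -> count
total = sum(freq.values())  # total number of observations

# Relative frequency: each count divided by the total.
rel_freq = {value: Fraction(count, total) for value, count in sorted(freq.items())}

for value, share in rel_freq.items():
    print(value, freq[value], share)
```

The relative frequencies always add up to 1 (100%), which is a quick way to check the table.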
When is it better to use relative frequency?
when comparing groups of different sizes
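A small sketch of why group size matters, using invented counts for two hypothetical classes: the raw counts and the relative frequencies point in opposite directions.

```python
# Hypothetical counts of pet owners in two classes of different sizes.
class_a = {"owns_pet": 12, "total": 20}
class_b = {"owns_pet": 15, "total": 50}

# By raw count, class B has more pet owners.
more_by_count = "B" if class_b["owns_pet"] > class_a["owns_pet"] else "A"

# By relative frequency, pet ownership is more common in class A.
rate_a = class_a["owns_pet"] / class_a["total"]
rate_b = class_b["owns_pet"] / class_b["total"]
more_by_rate = "A" if rate_a > rate_b else "B"

print(more_by_count, more_by_rate)
```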
marginal relative frequency
turns a row or column total into a relative frequency
(in a two-way table: A = a single cell count, B = a row or column total, C = the grand total)
B/C --> interpretation: the proportion of all who are ___________ is ____#____.
joint relative frequency
A/C --> interpretation: the proportion of all who are __________ and __________ is ____#____.
conditional relative frequency
A/B --> interpretation: the proportion of __________s who are __________ is ____#____.
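The three ratios can be worked through on a small two-way table. This sketch assumes that A is a single cell count, B is a row total, and C is the grand total (the cards do not define the letters), and the table itself is hypothetical.

```python
# Hypothetical two-way table: rows = grade level, columns = pet ownership.
table = {
    "junior": {"pet": 30, "no_pet": 20},
    "senior": {"pet": 10, "no_pet": 40},
}

cell = table["junior"]["pet"]                                   # A (assumed: one cell)
row_total = sum(table["junior"].values())                       # B (assumed: row total)
grand_total = sum(sum(row.values()) for row in table.values())  # C (assumed: grand total)

marginal = row_total / grand_total  # B/C: proportion of all who are juniors
joint = cell / grand_total          # A/C: proportion of all who are juniors AND have a pet
conditional = cell / row_total      # A/B: proportion of juniors who have a pet

print(marginal, joint, conditional)
```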
What does it mean for two variables to have an association?
knowing the value of one variable helps you predict the value of the other variable.
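One informal way to look for an association between two categorical variables is to compare conditional relative frequencies across groups. The table and the helper `conditional_share` below are hypothetical and only sketch the idea; this is not a formal test.

```python
# Hypothetical two-way table: rows = grade level, columns = pet ownership.
table = {
    "junior": {"pet": 30, "no_pet": 20},
    "senior": {"pet": 10, "no_pet": 40},
}

def conditional_share(row, category):
    """Proportion of one group (row) that falls in the given category."""
    return row[category] / sum(row.values())

# Proportion of each grade level that has a pet.
shares = {group: conditional_share(row, "pet") for group, row in table.items()}

# If the shares differ a lot between groups, knowing the grade level
# helps predict pet ownership, which suggests an association.
print(shares)
```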
Graphs of Categorical Variables
- Side-by-side bar graph
- segmented bar graph
- mosaic plot
- pie chart
- pictograph
Graphs of Quantitative Variables
- histogram
- stem-and-leaf plot (KEY AND LABEL)
- dot plot
- box plot
- scatter plot
- ogive
What does it mean for two variables to not have an association?
knowing the value of one variable does not help you predict the value of the other variable.
Names of distribution shapes
- symmetric
- skewed left/right
- unimodal (single-peaked)
- bimodal (double-peaked)
- uniform (no peaks)
DESCRIBING A DISTRIBUTION
SOCS + context.
Use comparative language when comparing two or more distributions.
- Shape
- Outliers
- Center
- Spread
Define statistic
a number that was calculated from a SAMPLE
Define parameter
a number that was calculated from a POPULATION
Define resistant measure (with examples)
a measure that outliers/extreme values do not strongly affect
Resistant measure: Median, IQR
Not a Resistant measure: Mean, Range, Standard Deviation, Variance
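A quick sketch of resistance, using made-up data: adding one extreme value moves the mean a lot but barely moves the median.

```python
import statistics

# Hypothetical data set, then the same data with one extreme value added.
data = [2, 3, 4, 5, 6]
with_outlier = data + [100]

mean_before = statistics.mean(data)              # 4
mean_after = statistics.mean(with_outlier)       # 20

median_before = statistics.median(data)          # 4
median_after = statistics.median(with_outlier)   # 4.5

print(mean_before, mean_after)      # mean jumps: not resistant
print(median_before, median_after)  # median barely moves: resistant
```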
Define Range and explain its problem
Max − Min
- not a resistant measure
- ignores all values in the data set except the max and min
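To illustrate the second problem, here are two hypothetical data sets with the same range but clearly different spread; the population standard deviation from Python's `statistics` module is used only as a second measure for comparison.

```python
import statistics

# Two hypothetical data sets with the same max and min.
a = [0, 5, 5, 5, 5, 10]     # most values sit at the center
b = [0, 0, 0, 10, 10, 10]   # all values sit at the extremes

range_a = max(a) - min(a)
range_b = max(b) - min(b)

# The range is identical even though b is far more spread out.
sd_a = statistics.pstdev(a)
sd_b = statistics.pstdev(b)

print(range_a, range_b)
print(sd_a, sd_b)
```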
Define Standard Deviation
measures how far the values of the distribution typically are from the mean (roughly, the average distance of the data from the mean)
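The "distance from the mean" idea can be followed step by step. This sketch uses a made-up data set and computes the population version of the standard deviation (dividing by the number of values), then checks it against `statistics.pstdev`.

```python
import math
import statistics

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)                    # 5.0
deviations = [x - mean for x in data]           # distance of each value from the mean
squared = [d ** 2 for d in deviations]          # square so negatives do not cancel
variance = sum(squared) / len(data)             # average squared distance: 4.0
std_dev = math.sqrt(variance)                   # back to the original units: 2.0

print(std_dev, statistics.pstdev(data))
```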
The formulas for standard deviation are different for a parameter (population) and a statistic (sample). MEMORIZE them or know where to find them on the equation sheet.
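For reference, the two standard formulas are written out below. The symbols are the usual textbook notation and are an assumption here, since the cards do not define them: $\mu$ and $N$ are the population mean and size, $\bar{x}$ and $n$ are the sample mean and size. Check them against your own equation sheet.

```latex
% Population standard deviation (parameter)
\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}

% Sample standard deviation (statistic)
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```

The key difference is the denominator: $N$ for a population, $n - 1$ for a sample.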