Chapter 1 and 3.1 Flashcards
representing and describing data
Quantitative data
numerical, makes sense to find an average
Categorical data
categories or groups with labels, doesn’t make sense to find an average
Mean
measure of center, the average (sum of the values divided by the number of values), use for roughly symmetric data without outliers
Median
measure of center, middle value of the ordered data (middle position = (n+1)/2), use for skewed data and/or outliers
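A short Python sketch of the two measures of center. It uses a small made-up data set (not from these notes) with one high outlier, so the mean and median visibly disagree. The function names `mean` and `median` are our own; Python's built-in `statistics` module offers equivalents.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted data (position (n+1)/2, counting from 1)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n+1)/2 corresponds to zero-based index n // 2.
        return ordered[mid]
    # Even n: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [2, 3, 3, 4, 5, 6, 30]  # hypothetical data with one high outlier
print(mean(data))    # about 7.57, pulled upward by the outlier 30
print(median(data))  # 4, the 4th of 7 ordered values
```

The median ignores how far out the 30 lies, which is why it is the preferred center for skewed data or data with outliers.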
Range
measure of spread, maximum minus minimum
Interquartile range (IQR)
measure of spread, range of middle 50% of data, Q3-Q1
Standard deviation
measure of spread, roughly the typical distance of the data values from the mean
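A Python sketch of the three measures of spread on the same made-up data. Quartile conventions differ between textbooks, calculators, and software; this version assumes Q1 and Q3 are the medians of the lower and upper halves, with the overall median left out of both halves when n is odd, so other tools may report slightly different quartiles. The helper names are hypothetical. `statistics.stdev` computes the sample standard deviation, which divides by n - 1.

```python
import statistics

def data_range(values):
    """Range: maximum minus minimum."""
    return max(values) - min(values)

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves.
    Assumption: the overall median is excluded from both halves when n is odd."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]
    upper = ordered[(n + 1) // 2 :]
    return statistics.median(lower), statistics.median(upper)

def iqr(values):
    """Interquartile range: Q3 - Q1, the spread of the middle 50% of the data."""
    q1, q3 = quartiles(values)
    return q3 - q1

data = [2, 3, 3, 4, 5, 6, 30]  # hypothetical data
print(data_range(data))        # 28
print(quartiles(data))         # (3, 6)
print(iqr(data))               # 3
print(statistics.stdev(data))  # sample standard deviation (divides by n - 1)
```

The single outlier inflates the range and standard deviation far more than the IQR, which only looks at the middle half of the data.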
Outliers
Data values that are unusually low or high; by the 1.5 × IQR rule, a value is an outlier if it is greater than Q3 + 1.5 × IQR or less than Q1 - 1.5 × IQR
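The 1.5 × IQR rule as a Python sketch. It repeats the quartile assumption used for the IQR (median excluded from both halves for odd n) and uses made-up data; `quartiles` and `outliers` are hypothetical helper names.

```python
import statistics

def quartiles(values):
    """Q1, Q3 as medians of the lower/upper halves (median excluded for odd n)."""
    ordered = sorted(values)
    n = len(ordered)
    return statistics.median(ordered[: n // 2]), statistics.median(ordered[(n + 1) // 2 :])

def outliers(values):
    """Return the values beyond the 1.5 x IQR fences."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

data = [2, 3, 3, 4, 5, 6, 30]  # hypothetical data
# Q1 = 3, Q3 = 6, IQR = 3, so the fences are -1.5 and 10.5.
print(outliers(data))  # [30]
```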
CSOCS
Context, shape (symmetric/skewed, number of modes), outliers, center, spread; address all of these points when asked about the distribution of a data set
Explanatory variable
independent or x-variable, the variable that may explain or influence changes in the other variable
Response variable
dependent or y-variable, measures an outcome that may depend on the explanatory variable
Positive correlation
as x increases, y tends to increase
Negative correlation
as x increases, y tends to decrease
CDOFS
Context, direction (+/-), outliers, form (linear/nonlinear), strength (strong/moderate/weak), use to describe scatterplots (variation of CSOCS)
Correlation coefficient (r)
r always falls between -1 and 1
r close to 0 = weak linear correlation
r close to 1 or -1 = strong linear correlation
positive r = positive correlation
negative r = negative correlation
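A sketch of how r can be computed: add up the products of each point's x and y z-scores and divide by n - 1. The study-hours data below is invented for illustration, and `correlation` is our own helper name (Python 3.10+ also ships `statistics.correlation`, which returns Pearson's r).

```python
import statistics

def correlation(xs, ys):
    """Pearson's r: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)

hours = [1, 2, 3, 4, 5]        # hypothetical explanatory variable
score = [52, 60, 63, 71, 80]   # hypothetical response variable
print(round(correlation(hours, score), 3))  # close to +1: strong positive linear correlation
print(correlation([1, 2, 3], [6, 4, 2]))    # a perfect decreasing line gives r = -1 (up to rounding)
```

Because r is built from z-scores, it has no units and does not change if either variable is rescaled.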
Individuals
the objects/people described by a data set
Variables
characteristics of individuals
Distribution
values of a variable and their frequencies, pattern of variation
Displaying numerical data
Dot plot, stem plot, histogram
Displaying categorical data
Frequency table (counts), relative frequency table (%), pie charts, bar graphs
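A minimal sketch of building a frequency table (counts) and a relative frequency table (proportions) from categorical responses with `collections.Counter`. The color responses are made up, and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

def frequency_table(labels):
    """Map each category to (count, relative frequency), most common first."""
    counts = Counter(labels)
    total = len(labels)
    return {category: (n, n / total) for category, n in counts.most_common()}

responses = ["red", "blue", "blue", "green", "red", "blue"]  # hypothetical survey
table = frequency_table(responses)
for category, (count, share) in table.items():
    print(f"{category:<6} {count:>3} {100 * share:6.1f}%")
```

The relative frequencies are exactly the proportions a pie chart or a percentage bar graph would display.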
Misleading graphs may…
have a vertical scale that doesn’t start at zero, or use pictures that grow in both width and height (exaggerating differences) instead of simple bars
Marginal distribution
distribution of a single variable for all individuals, the totals/percentages for each group of one variable (found in the margins of a two-way table), does not address relationships between variables
Conditional distribution
the distribution of one variable among only the individuals who have a certain value of another variable (*given*), found within one row or one column of a two-way table, comparing conditional distributions shows whether the two variables are related
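A Python sketch contrasting the two kinds of distribution in a two-way table. The grade-level and snack counts are entirely hypothetical, as are the function names. The marginal distribution pools every individual; each conditional distribution uses only one row.

```python
# Hypothetical two-way table: rows = grade level, columns = preferred snack.
table = {
    "9th":  {"fruit": 12, "chips": 18},
    "12th": {"fruit": 20, "chips": 10},
}

def marginal_distribution(table):
    """Column-variable proportions for all individuals (rows are pooled)."""
    col_totals = {}
    for row in table.values():
        for col, n in row.items():
            col_totals[col] = col_totals.get(col, 0) + n
    grand_total = sum(col_totals.values())
    return {col: n / grand_total for col, n in col_totals.items()}

def conditional_distribution(table, given_row):
    """Column-variable proportions *given* one value of the row variable."""
    row = table[given_row]
    row_total = sum(row.values())
    return {col: n / row_total for col, n in row.items()}

print(marginal_distribution(table))             # fruit 32/60, chips 28/60
print(conditional_distribution(table, "9th"))   # fruit 0.4, chips 0.6
print(conditional_distribution(table, "12th"))  # fruit 2/3, chips 1/3
```

In this made-up table the two conditional distributions differ noticeably, which would suggest an association between grade level and snack preference; if they were nearly identical, knowing the grade would not help predict the snack.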
Association
knowing the value of one variable helps to predict the value of the other variable