Module 1 Flashcards
Before gathering and analyzing data, we should always ______ the question we wish to answer.
identify
Graphs are very useful for examining a data set, as they often reveal ______ and ______ and help us detect outliers.
patterns; trends
One useful graph is a _______, a type of bar chart that shows how the data are distributed.
histogram
A histogram’s _____ represents _____ corresponding to ranges of data.
x-axis; bins
A histogram’s _______ indicates the _______ of observations falling into each bin.
y-axis; frequency
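As a rough illustration of the two histogram cards above, the sketch below counts how many observations fall into each bin. The data values and the bin width of 5 are made up for the example; they are assumptions, not part of the module.

```python
from collections import Counter

# Hypothetical observations (not from the module).
data = [1, 2, 2, 3, 5, 6, 6, 7, 8, 12]
bin_width = 5  # assumed bin width; each bin covers [start, start + bin_width)

# Map every value to the lower edge of its bin, then count per bin.
counts = Counter((value // bin_width) * bin_width for value in data)

# x-axis: bins (ranges of data); y-axis: frequency in each bin.
for start in sorted(counts):
    label = f"{start}-{start + bin_width - 1}"
    print(f"{label:>6} | {'#' * counts[start]} ({counts[start]})")
```

With so few observations, the isolated value in the last bin stands out, which is how a histogram can hint at a possible outlier.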
An ______ is a value that falls far from the rest of the data.
outlier
Graphing ___________ on a scatter plot can reveal relationships between ___________.
two variables; two variables (two data sets)
Although there may be a relationship between two variables, we cannot conclude that one variable ______ the other. This point is best summarized in the admonition, “_________ does not imply causation.”
“causes”; correlation
Be alert to the possibility of _________, which may be responsible for patterns we see when graphing or examining relationships between two data sets.
hidden variables
To summarize a data set numerically, we often use __________, also known as summary statistics.
descriptive statistics
Three values describe the center, or central tendency, of the data set:
- mean
- median
- mode
The _____ is equal to the sum of all data points in the set divided by the number of data points.
mean
The ______ is the middle value of the data set.
median - half of the data set’s values lie below the median, and half lie above the median.
The _____ is the value that occurs most frequently in the data set.
mode - A data set may have multiple modes.
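The three measures of central tendency can be checked on a small example. This is a minimal sketch using Python's standard `statistics` module; the data set is hypothetical.

```python
import statistics

# Hypothetical data set (not from the module).
data = [2, 3, 3, 5, 7, 10]

# Mean: sum of all data points divided by the number of data points.
mean = sum(data) / len(data)

# Median: the middle value; with an even count, the average of the two middle values.
median = statistics.median(data)

# Mode: the most frequent value(s); multimode returns a list because
# a data set may have more than one mode.
modes = statistics.multimode(data)

print(mean, median, modes)
```

Here the mean (5.0) is pulled above the median (4.0) by the largest value, 10.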
The ______, _______, and ________ measure the spread of the data.
range; variance; standard deviation
The standard deviation is equal to the _______of the variance.
square root
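The measures of spread can be sketched the same way. The example below assumes the population formulas (dividing by n); the cards do not say whether the population or sample version is intended, so that choice is an assumption. The data set is hypothetical.

```python
import math
import statistics

# Hypothetical data set (not from the module).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Population variance: mean of squared deviations from the mean (assumed convention).
variance = statistics.pvariance(data)

# Standard deviation: the square root of the variance.
std_dev = math.sqrt(variance)

print(data_range, variance, std_dev)
```

Using `statistics.variance` and `statistics.stdev` instead would give the sample versions, which divide by n - 1.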
To compare variation in different data sets, we calculate the ___________.
coefficient of variation
The coefficient of variation measures the size of the ________ relative to the size of the ______.
standard deviation; mean
coefficient of variation = standard deviation/mean
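A short sketch of why the coefficient of variation is useful for comparing data sets: two hypothetical data sets measured on different scales have very different standard deviations but the same relative variation. The helper name `coefficient_of_variation` and the use of the population standard deviation are assumptions made for this illustration.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population std dev assumed)."""
    return statistics.pstdev(values) / statistics.mean(values)

# Hypothetical data: the second set is the first set scaled by 10.
small_scale = [2, 4, 4, 4, 5, 5, 7, 9]
large_scale = [value * 10 for value in small_scale]

cv_small = coefficient_of_variation(small_scale)
cv_large = coefficient_of_variation(large_scale)

print(statistics.pstdev(small_scale), statistics.pstdev(large_scale))
print(cv_small, cv_large)
```

The standard deviations differ by a factor of 10, while the two coefficients of variation match.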
A ________ is the mean of a subset of the data that includes all values satisfying a certain condition.
conditional mean
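A conditional mean can be illustrated by filtering before averaging. The records below (region, sales) and the condition "region is East" are hypothetical and chosen only for the example.

```python
import statistics

# Hypothetical (region, sales) records, not from the module.
records = [
    ("East", 10),
    ("West", 20),
    ("East", 30),
    ("West", 60),
    ("East", 50),
]

# Overall mean uses every value.
overall_mean = statistics.mean(sales for _, sales in records)

# Conditional mean uses only the values that satisfy the condition.
east_sales = [sales for region, sales in records if region == "East"]
east_mean = statistics.mean(east_sales)

print(overall_mean, east_mean)
```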
60% of the observations are _____ or ______ to the 60th percentile.
less than; equal
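The percentile card can be checked numerically. The sketch below uses the nearest-rank convention, which is only one of several ways to compute percentiles and is an assumption here; spreadsheet software and statistics libraries often interpolate instead and can return slightly different values. The helper name `nearest_rank_percentile` and the data are hypothetical.

```python
import math

def nearest_rank_percentile(values, p):
    """Smallest value such that at least p% of observations are <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]

# Hypothetical data set (not from the module).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

p60 = nearest_rank_percentile(data, 60)

# Share of observations that are less than or equal to the 60th percentile.
share_at_or_below = sum(1 for value in data if value <= p60) / len(data)

print(p60, share_at_or_below)
```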
The median is by definition the _____ percentile of a data set.
50th
We can quantify the strength of a linear relationship between two variables by calculating the _______.
correlation coefficient
The value of the correlation coefficient ranges between __ and ___.
-1; +1
A correlation coefficient near ____ indicates a weak or nonexistent linear relationship.
zero
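The three correlation cards can be tied together with a small sketch that computes the Pearson correlation coefficient by hand. The helper name `correlation_coefficient` and all data values are hypothetical; the formula is the standard Pearson definition.

```python
import math

def correlation_coefficient(xs, ys):
    """Pearson correlation: co-variation of x and y scaled to lie in [-1, +1]."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

# Hypothetical data sets (not from the module).
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # moves exactly with x
falling = [10, 8, 6, 4, 2]   # moves exactly against x
unrelated = [2, 1, 3, 1, 2]  # no linear trend with x

r_rising = correlation_coefficient(x, rising)
r_falling = correlation_coefficient(x, falling)
r_unrelated = correlation_coefficient(x, unrelated)

print(r_rising, r_falling, r_unrelated)
```

The three results sit at roughly +1, -1, and 0. A value near zero rules out only a linear relationship; the variables could still be related in a nonlinear way.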