Module 1 Flashcards
Before gathering and analyzing data, we should always ______ the question we wish to answer.
identify
Graphs are very useful for examining a data set, as they often reveal ______ and ______ and help us detect outliers.
patterns; trends
One useful graph is a _______, which resembles a bar chart.
histogram
A histogram’s _____ represents _____ corresponding to ranges of data.
x-axis; bins
A histogram’s _______ indicates the _______ of observations falling into each bin.
y-axis; frequency
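To make the x-axis/y-axis idea concrete, here is a minimal Python sketch that sorts observations into bins and counts the frequency in each. The data values, the bin width of 10, and the helper name `bin_counts` are hypothetical, chosen only for illustration.

```python
from collections import Counter

def bin_counts(values, bin_width):
    """Count observations per bin; each bin covers [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical data set of whole-number scores
scores = [52, 57, 61, 64, 68, 71, 73, 75, 79, 95]
frequencies = bin_counts(scores, 10)

# Text histogram: bin range (x-axis) followed by one '#' per observation (y-axis)
for start, freq in frequencies.items():
    print(f"{start}-{start + 9}: {'#' * freq}")
```

Bins with no observations (80-89 here) are simply absent from the result, and the lone value in the 90-99 bin stands apart from the rest, which is how a histogram can hint at an outlier.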
An ______ is a value that falls far from the rest of the data.
outlier
Graphing ___________ on a scatter plot can reveal relationships between ___________.
two variables; two variables (two data sets)
Although there may be a relationship between two variables, we cannot conclude that one variable ______ the other. This point is best summarized in the admonition, “_________ does not imply causation.”
“causes”; correlation
Be alert to the possibility of _________, which may be responsible for patterns we see when graphing or examining relationships between two data sets.
hidden variables
To summarize a data set numerically, we often use __________, also known as summary statistics.
descriptive statistics
Three values describe the center, or central tendency, of the data set:
- mean
- median
- mode
The _____ is equal to the sum of all data points in the set divided by the number of data points.
mean
The ______ is the middle value of the data set.
median - half of the data set’s values lie below the median, and half lie above the median.
The _____ is the value that occurs most frequently in the data set.
mode - A data set may have multiple modes.
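The three measures of central tendency can be checked with Python's standard `statistics` module. This is a small sketch on a hypothetical data set, not part of the original cards.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical data set

mean = statistics.mean(data)        # sum of the values divided by their count
median = statistics.median(data)    # middle value; averages the two middle values when the count is even
modes = statistics.multimode(data)  # list of every most-frequent value

print(mean, median, modes)
```

`multimode` returns a list because a data set may have more than one mode; for example, `statistics.multimode([1, 1, 2, 2])` returns both 1 and 2.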
The ______, _______, and ________ measure the spread of the data.
range; variance; standard deviation
The standard deviation is equal to the _______of the variance.
square root
To compare variation in different data sets, we calculate the ___________.
coefficient of variation
The coefficient of variation measures the size of the ________ relative to the size of the ______.
standard deviation; mean
coefficient of variation = standard deviation/mean
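As a worked example of the spread measures and the coefficient of variation, the following sketch uses Python's `statistics` module on a hypothetical data set. It assumes the sample versions of variance and standard deviation.

```python
import statistics

data = [4, 8, 6, 5, 7]  # hypothetical data set

data_range = max(data) - min(data)    # range: largest value minus smallest value
variance = statistics.variance(data)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)      # sample standard deviation = square root of the variance
cv = std_dev / statistics.mean(data)  # coefficient of variation = standard deviation / mean

print(data_range, variance, std_dev, cv)
```

Because the coefficient of variation is a ratio, it has no units, which is what makes it suitable for comparing variation across data sets measured on different scales.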
A ________ is the mean of a subset of the data that includes all values satisfying a certain condition.
conditional mean
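A conditional mean can be sketched in Python by filtering the data before averaging. The function name `conditional_mean`, the sales figures, and the "greater than 100" condition are all hypothetical.

```python
def conditional_mean(values, condition):
    """Mean of only those values for which condition(value) is true."""
    selected = [v for v in values if condition(v)]
    if not selected:
        raise ValueError("no values satisfy the condition")
    return sum(selected) / len(selected)

sales = [120, 80, 200, 150, 60]  # hypothetical data set

# Mean of the subset of sales greater than 100 (120, 200, and 150)
mean_over_100 = conditional_mean(sales, lambda v: v > 100)
print(mean_over_100)
```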
60% of the observations are _____ or ______ to the 60th percentile.
less than; equal
The median is by definition the _____ percentile of a data set.
50th
We can quantify the strength of a linear relationship between two variables by calculating the _______.
correlation coefficient
The value of the correlation coefficient ranges between __ and ___.
-1; +1
A correlation coefficient near ____ indicates a weak or nonexistent linear relationship.
zero
A correlation coefficient near zero does not mean there is no relationship between the two variables; it indicates only that any relationship that does exist is _____.
not linear
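To illustrate why a correlation coefficient near zero only rules out a linear relationship, the sketch below computes the coefficient by hand for two hypothetical data sets: one perfectly linear, and one perfectly determined by x but curved. The helper name `correlation` is an illustrative choice.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(spread_x * spread_y)

xs = [-2, -1, 0, 1, 2]
linear = [2 * x + 1 for x in xs]  # y = 2x + 1, a perfect linear relationship
curved = [x ** 2 for x in xs]     # y = x^2, fully determined by x but not linear

r_linear = correlation(xs, linear)
r_curved = correlation(xs, curved)
print(r_linear, r_curved)
```

The curved data set yields a coefficient of zero even though y depends entirely on x, so a scatter plot remains worth examining alongside the number.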
When one of the variables is time, the relationship is known as a ______.
time series
_________ provide a snapshot of data across multiple groups at a given point in time.
Cross-sectional data
What Excel function finds the average?
=AVERAGE(number1, [number2], …)
What Excel function finds the median?
=MEDIAN(number1, [number2], …)
What Excel function finds the mode?
=MODE.SNGL(number1, [number2], …)
What Excel function returns the conditional mean, or average of the cells in a specified range that meet the given criteria?
=AVERAGEIF(range, criteria, [average_range])
- range contains the cell or cells to which we wish to apply the criteria or condition.
- criteria is the condition that is to be applied to the range.
- [average_range] is the range of cells containing the data we wish to average. It is optional; if it is omitted, the cells in range are averaged.
What Excel function returns the k-th percentile of the values in the specified array? For example, if we want to know the 95th percentile for an array of data, k would be ____.
=PERCENTILE.INC(array, k); 0.95
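For intuition about what a percentile function computes, here is a Python sketch of an inclusive percentile with linear interpolation between neighboring sorted values. The function name `percentile_inc` is hypothetical, and the interpolation rule (position k × (n − 1) in the sorted data) is an assumption about how the inclusive method behaves, not something stated in these cards.

```python
def percentile_inc(values, k):
    """k-th percentile (0 <= k <= 1), interpolating linearly between sorted values."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= k <= 1:
        raise ValueError("k must be between 0 and 1 inclusive")
    ordered = sorted(values)
    rank = k * (len(ordered) - 1)             # fractional position in the sorted data
    lower = int(rank)                         # index of the value at or below that position
    upper = min(lower + 1, len(ordered) - 1)  # index of the next value, capped at the end
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

data = [10, 20, 30, 40, 50]  # hypothetical data set
p50 = percentile_inc(data, 0.5)   # the median
p95 = percentile_inc(data, 0.95)  # note that k is 0.95, not 95
print(p50, p95)
```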
What Excel function finds the variance?
=VAR.S(number1, [number2], …)
What Excel function finds the standard deviation?
=STDEV.S(number1, [number2], …)
What Excel function finds the square root?
=SQRT(number)
What Excel function finds the number of data points?
=COUNT(value1, [value2], …)
What Excel function finds the minimum number?
=MIN(number1, [number2], …)
What Excel function finds the maximum number?
=MAX(number1, [number2], …)
What Excel function finds the sum of all numbers?
=SUM(number1, [number2], …)
What does the S stand for in VAR.S and STDEV.S?
S=Sample
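The practical difference behind the "S" is the divisor: a sample variance divides the sum of squared deviations by n − 1, while a population variance divides by n. The sketch below shows the two side by side using Python's `statistics` module on a hypothetical data set.

```python
import statistics

data = [4, 8, 6, 5, 7]  # hypothetical data set, n = 5

sample_var = statistics.variance(data)       # divides by n - 1, the sample calculation
population_var = statistics.pvariance(data)  # divides by n, the population calculation

sample_std = statistics.stdev(data)          # square root of the sample variance
population_std = statistics.pstdev(data)     # square root of the population variance

print(sample_var, population_var, sample_std, population_std)
```

The sample versions are always at least as large as the population versions, and the gap shrinks as the number of data points grows.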
What Excel function finds the correlation coefficient?
=CORREL(array1, array2)