Modules 1-2 Flashcards
What should we identify before gathering and analyzing data?
The question we wish to answer
What type of graph is useful for examining a data set to reveal patterns and trends?
Histogram
In a histogram, what does the x-axis represent?
Bins corresponding to ranges of data
In a histogram, what does the y-axis indicate?
The frequency of observations falling into each bin
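To make the two axes concrete, here is a minimal sketch of how histogram counts could be tallied. The data, the bin width of 5, and the helper name `histogram_counts` are all made up for illustration, and we assume each bin includes its lower edge but not its upper edge.

```python
from collections import Counter

# Hypothetical sample data and bin width, chosen only for illustration.
data = [2, 3, 5, 7, 8, 11, 12, 13, 14, 18]
bin_width = 5

def histogram_counts(values, width):
    """Count how many values fall into each bin [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

counts = histogram_counts(data, bin_width)
for start, freq in counts.items():
    # One row per bar: the bin range (x-axis) and its frequency (y-axis).
    # The "start to start + width - 1" label assumes whole-number data.
    print(f"{start:>2}-{start + bin_width - 1:<2} | {'#' * freq} ({freq})")
```

Each printed row stands in for one bar: the label is the bin on the x-axis and the count is the height on the y-axis.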
What is an outlier?
A value that falls far from the rest of the data
What should we do before deciding how to treat an outlier?
Carefully investigate it
What can graphing two variables on a scatter plot reveal?
Relationships between two variables (two data sets)
What is a key point regarding correlation and causation?
Correlation does not imply causation
What should we be alert to when examining relationships between two data sets?
The possibility of hidden variables
What are descriptive statistics also known as?
Summary statistics
What three values describe the center of a data set?
- Mean
- Median
- Mode
How is the mean calculated?
Sum of all data points divided by the number of data points
What is the median?
The middle value of the data set when the values are sorted; with an even number of values, the average of the two middle values
What does the mode represent in a data set?
The value that occurs most frequently
Can a data set have multiple modes?
Yes
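The three measures of center can be checked quickly with Python's standard `statistics` module. This is only a sketch on a made-up data set, chosen so that it has an even number of values and two modes.

```python
import statistics

# Hypothetical data set with an even count and two modes (5 and 9).
data = [3, 5, 5, 7, 9, 9, 11, 15]

mean = statistics.mean(data)        # sum of the values divided by their count
median = statistics.median(data)    # average of the two middle values (7 and 9)
modes = statistics.multimode(data)  # every value tied for most frequent

print(mean, median, modes)
```

`multimode` returns a list, which is how this sketch reflects the fact that a data set can have more than one mode.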
What measures the spread of the data?
- Range
- Variance
- Standard deviation
How is the standard deviation calculated?
The square root of the variance
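As a small worked sketch of the three spread measures, the snippet below uses a made-up data set. It assumes the data are a sample, so the variance divides by n - 1, which is the convention behind VAR.S and STDEV.S later in this deck.

```python
import math
import statistics

# Hypothetical sample; the mean is 6.
data = [4, 8, 6, 2, 10]

data_range = max(data) - min(data)            # largest value minus smallest value
sample_variance = statistics.variance(data)   # squared deviations summed, divided by n - 1
sample_std = math.sqrt(sample_variance)       # standard deviation = square root of variance

print(data_range, sample_variance, round(sample_std, 4))
```

Here the squared deviations from the mean sum to 40, so the sample variance is 40 / 4 = 10 and the standard deviation is the square root of 10.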
What is a conditional mean?
A conditional mean is the mean of a subset of the data that includes all values satisfying a certain condition.
What is a percentile?
A percentile is a value at or below which a certain percentage of observations fall. For example, 60% of the observations are less than or equal to the 60th percentile.
What is the median in terms of percentiles?
The median is by definition the 50th percentile of a data set.
What is the coefficient of variation?
The coefficient of variation is the standard deviation divided by the mean. It measures the size of the standard deviation relative to the size of the mean.
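A short sketch may help show why the coefficient of variation is useful for comparing data sets on different scales. The helper name `coefficient_of_variation` and both data sets are hypothetical, and we assume the sample standard deviation is the one intended.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined when the mean is 0")
    return statistics.stdev(values) / mean

# Two made-up data sets on very different scales.
small = [9, 10, 11]          # mean 10, standard deviation 1
large = [900, 1000, 1100]    # mean 1000, standard deviation 100

cv_small = coefficient_of_variation(small)
cv_large = coefficient_of_variation(large)
print(cv_small, cv_large)
```

The standard deviations differ by a factor of 100, yet both coefficients of variation come out to about 0.1, so the two data sets are equally spread out relative to their means.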
What does the correlation coefficient measure?
The correlation coefficient quantifies the strength and direction of a linear relationship between two variables.
What is the range of the correlation coefficient?
The value of the correlation coefficient ranges between -1 and +1.
What does a correlation coefficient near zero indicate?
A correlation coefficient near zero indicates a weak or nonexistent linear relationship.
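The sketch below computes the correlation coefficient by hand for three made-up data sets, to illustrate the two ends of the range and the near-zero case. The function name `correlation` is our own, and the function assumes neither series is constant (otherwise it would divide by zero).

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]     # rises in a straight line with x
y_down = [10, 8, 6, 4, 2]   # falls in a straight line with x
y_curve = [4, 1, 0, 1, 4]   # equals (x - 3) squared: related to x, but not linearly

r_up = correlation(x, y_up)
r_down = correlation(x, y_down)
r_curve = correlation(x, y_curve)
print(r_up, r_down, r_curve)
```

The third case is worth noting: the coefficient is 0 even though y depends entirely on x, because the relationship is curved rather than linear.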
What is a time series?
A time series is data in which one of the variables is time, showing how the other variable changes over time.
What is cross-sectional data?
Cross-sectional data provide a snapshot of data across multiple groups at a given point in time.
What should you recall about Excel functions and analyses?
Familiarize yourself with all of the necessary steps, syntax, and arguments for the Excel functions covered in this course.
What does the AVERAGEIF function do?
The AVERAGEIF function returns the conditional mean, or average of the cells in a specified range that meet the given criteria.
What is criteria in the AVERAGEIF function?
Criteria is the condition applied to the range to select which cells are averaged.
What does [average_range] refer to?
[average_range] is the range of cells containing the data we wish to average. It is optional; if it is omitted, the cells in range itself are averaged.
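To see how the three AVERAGEIF arguments fit together, here is a rough Python analogue. It is a sketch, not Excel's implementation: the name `average_if` is hypothetical, the criteria are passed as a Python function rather than an Excel criteria string such as ">130", and the region and sales figures are made up.

```python
def average_if(criteria_range, condition, average_range=None):
    """Rough analogue of AVERAGEIF(range, criteria, [average_range])."""
    if average_range is None:
        # With no separate average range, the tested cells are averaged.
        average_range = criteria_range
    if len(criteria_range) != len(average_range):
        raise ValueError("both ranges must have the same length")
    selected = [
        value
        for tested, value in zip(criteria_range, average_range)
        if condition(tested)
    ]
    if not selected:
        # Excel reports #DIV/0! in this situation.
        raise ValueError("no cells meet the criteria")
    return sum(selected) / len(selected)

# Hypothetical data: one row per sale.
regions = ["East", "West", "East", "West", "East"]
sales = [100, 250, 140, 150, 120]

east_mean = average_if(regions, lambda r: r == "East", sales)
large_sale_mean = average_if(sales, lambda s: s > 130)
print(east_mean, large_sale_mean)
```

The first call tests one column and averages another, as with [average_range]; the second call tests and averages the same column, as when [average_range] is left out.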
What does the function PERCENTILE.INC(array, k) do?
Returns the k-th percentile of the values in the specified array, where k is between 0 and 1 inclusive.
For example, if we want to know the 95th percentile for an array of data, k would be 0.95.
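The sketch below shows one way to reproduce this calculation, assuming the documented PERCENTILE.INC approach of ranking the sorted values from 0 to n - 1 and interpolating linearly between neighbors. The function name `percentile_inc` and the data are hypothetical.

```python
def percentile_inc(values, k):
    """Inclusive percentile with linear interpolation between sorted values."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= k <= 1:
        raise ValueError("k must be between 0 and 1 inclusive")
    ordered = sorted(values)
    position = k * (len(ordered) - 1)         # fractional index into the sorted data
    lower = int(position)                     # index of the value at or below the position
    upper = min(lower + 1, len(ordered) - 1)  # next value up, capped at the last index
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

data = [15, 20, 35, 40, 50]  # made-up data
p50 = percentile_inc(data, 0.5)
p95 = percentile_inc(data, 0.95)
print(p50, p95)
```

With five values, k = 0.95 lands at position 3.8, which is 80% of the way from 40 to 50, so the 95th percentile is about 48. The 50th percentile lands exactly on the middle value, 35, matching the median.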
What is the syntax for calculating the variance of a sample?
=VAR.S(number1, [number2], …)
What is the syntax for calculating the standard deviation of a sample?
=STDEV.S(number1, [number2], …)
What does the function SQRT(number) calculate?
Calculates the square root of a number.
What is the syntax for counting the cells that contain numbers?
=COUNT(value1, [value2], …)
What does the function MIN(number1, [number2], …) return?
Returns the minimum value from the specified numbers.
What does the function MAX(number1, [number2], …) return?
Returns the maximum value from the specified numbers.
What is the syntax for summing values?
=SUM(number1, [number2], …)
What does the function CORREL(array1, array2) calculate?
Calculates the correlation coefficient between two arrays.