Organizing, Describing and Visualizing Data Flashcards
Values that can be counted or measured are called _____ data.
Numerical (or Quantitative)
Discrete and Continuous data are types of _____ data.
Numerical (or Quantitative)
Data that is countable, such as the months, days, or hours in a year, is called _____ data.
Discrete
Data that can take any fractional value is called _____ data.
Continuous
Data that consists of labels that can be used to classify a set of data into groups is called _____ data.
Categorical (or Qualitative)
Nominal and ordinal data are types of _____ data.
Categorical (or Qualitative)
Data consisting of labels that cannot be placed in a logical order is called _____ data.
Nominal
Data that can be ranked in a logical order is called _____ data.
Ordinal
A set of observations taken periodically, most often at equal intervals over time, is called _____.
Time series
A set of comparable observations all taken at one specific point in time is called _____.
Cross-sectional data
The combination of time series and cross-sectional data, often presented in tables, is called _____.
Panel data
Time series, cross-sectional, and panel data, organized in a defined way, are examples of _____ data.
Structured data (ex: market data, fundamental data, etc.)
Information that is presented in a form with no defined structure is referred to as _____ data.
Unstructured data (ex: management commentaries; it must be transformed into structured data before it can be analyzed)
A time series is an example of a _____ array.
One-dimensional
For any frequency distribution, the interval with the greatest frequency is referred to as the _____ interval.
Modal
The _____ frequency is the percentage of total observations falling within each interval.
Relative
The _____ frequency is the number of observations falling within an interval.
Absolute
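As an illustration of the frequency cards above, here is a minimal Python sketch that builds a frequency distribution and reads off the absolute frequency, relative frequency, and modal interval. The sample values, the interval width, and the helper name interval_start are assumptions made up for this example.

```python
from collections import Counter

# Hypothetical sample of returns (in percent), made up for illustration.
returns = [-3.2, -1.5, 0.4, 0.9, 1.1, 1.8, 2.3, 2.7, 4.6, 6.1]
width = 2.0  # assumed interval width


def interval_start(x, width):
    """Lower bound of the interval [k*width, (k+1)*width) that contains x."""
    return (x // width) * width


# Absolute frequency: number of observations in each interval.
absolute = Counter(interval_start(x, width) for x in returns)

# Relative frequency: share of all observations in each interval.
n = len(returns)
relative = {lower: count / n for lower, count in absolute.items()}

# Modal interval: the interval with the greatest absolute frequency.
modal_interval = max(absolute, key=absolute.get)

for lower in sorted(absolute):
    print(f"[{lower}, {lower + width}): {absolute[lower]} obs, {relative[lower]:.0%}")
print("Modal interval starts at", modal_interval)
```

Here the interval starting at 0.0 holds 4 of the 10 observations, so it is the modal interval with a relative frequency of 40%.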
A _____ is a two-dimensional array with which we can analyze two variables at the same time.
Contingency table (ex: Accidents by intersection and day of week)
One kind of contingency table is a 2-by-2 array called a _____.
Confusion matrix
To analyze three variables at the same time, an analyst can create a _____.
Scatter plot matrix
The most effective chart types for visualizing RELATIONSHIPS are _____.
Scatter plots, scatter plot matrices, and heat maps
The most effective chart types for COMPARING CATEGORIES are _____.
Bar charts, tree maps, and heat maps
The most effective chart types for COMPARING OVER TIME are _____.
Line charts, dual-scale line charts, and bubble line charts
The most effective chart types for visualizing DISTRIBUTIONS of NUMERICAL DATA are _____.
Histograms, frequency polygons, and cumulative distribution charts
The most effective chart types for visualizing DISTRIBUTIONS of CATEGORICAL DATA are _____.
Bar charts, tree maps, and heat maps
The mean that excludes a stated percentage of the most extreme observations (ex: discard the lowest 0.5% and the highest 0.5% of the observations) is called the _____ mean.
Trimmed
The mean that substitutes a value for the highest and lowest observations is called the _____ mean.
Winsorized
The trimmed and winsorized means are used to control for _____.
Outliers
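To make the difference between discarding and substituting extreme observations concrete, here is a minimal sketch. It assumes a count k of observations to treat in each tail (rather than a percentage) and a made-up data set; the function names are hypothetical, and the code assumes 0 < 2k < n.

```python
def trimmed_mean(values, k):
    """Mean after discarding the k lowest and k highest observations."""
    s = sorted(values)
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)


def winsorized_mean(values, k):
    """Mean after replacing the k lowest and k highest observations
    with the nearest remaining value in each tail."""
    s = sorted(values)
    low, high = s[k], s[len(s) - k - 1]
    clipped = [min(max(x, low), high) for x in s]
    return sum(clipped) / len(clipped)


# Hypothetical data with one extreme value in each tail.
data = [-40, 1, 2, 3, 4, 5, 6, 7, 9, 60]

print(sum(data) / len(data))      # arithmetic mean: 5.7
print(trimmed_mean(data, 1))      # 4.625
print(winsorized_mean(data, 1))   # 4.7
```

The trimmed mean averages the 8 remaining values, while the winsorized mean keeps all 10 observations but pulls -40 up to 1 and 60 down to 9.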
The midpoint of a data set when the data is arranged in ascending or descending order is called the _____.
Median
The value that occurs most frequently in a data set is called the _____.
Mode
The mean to use for estimating the next observation, or the expected value of a distribution, is the _____ mean.
Arithmetic
The mean to use for finding the compound rate of return over multiple periods is the _____ mean.
Geometric
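A short sketch of why the geometric mean is the one that matches compounding. The three periodic returns and the function name are assumptions chosen for illustration.

```python
import math


def geometric_mean_return(returns):
    """Compound (geometric) mean of periodic returns given as decimals."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1


# Hypothetical returns for three periods: +10%, -5%, +20%.
returns = [0.10, -0.05, 0.20]

arithmetic = sum(returns) / len(returns)
geometric = geometric_mean_return(returns)

print(f"arithmetic mean: {arithmetic:.4f}")
print(f"geometric mean:  {geometric:.4f}")

# Compounding the geometric mean over three periods reproduces the
# actual ending value of 1.10 * 0.95 * 1.20 = 1.254.
print((1 + geometric) ** 3)
```

For returns that vary from period to period, the geometric mean is below the arithmetic mean, which is why the arithmetic mean overstates the compound growth rate.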
The mean to use for estimating the mean without the effects of a given percentage of outliers is the _____ mean.
Trimmed
The mean to use for estimating the mean while decreasing the effects of a given percentage of outliers is the _____ mean.
Winsorized
The mean to use to calculate the average share cost from periodic purchases in a fixed dollar amount is the _____ mean.
Harmonic
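The link between fixed-dollar purchases and the harmonic mean can be checked numerically. In this sketch the prices and the budget per purchase are made-up assumptions; statistics.harmonic_mean is part of the Python standard library.

```python
import statistics

# Hypothetical share prices at three purchase dates.
prices = [10.0, 20.0, 40.0]
budget = 1000.0  # assumed fixed dollar amount invested each time

# A fixed budget buys more shares when the price is low.
shares_bought = sum(budget / p for p in prices)
average_cost = budget * len(prices) / shares_bought

print(average_cost)                       # total spent / total shares
print(statistics.harmonic_mean(prices))   # same value
print(statistics.mean(prices))            # arithmetic mean is higher
```

The average cost per share equals the harmonic mean of the prices (about 17.14 here), not their arithmetic mean (about 23.33).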
The difference between the third quartile (75th percentile) and the first quartile (25th percentile) is known as the _____.
Interquartile range
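A minimal sketch of the interquartile range using the standard library. The data set is made up, and the quartile values depend on the interpolation convention: this example relies on the default "exclusive" method of statistics.quantiles, and other conventions can give slightly different quartiles.

```python
import statistics

# Hypothetical data set with 11 observations.
data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 15, 18]

# n=4 returns the three cut points that split the data into quartiles.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print("Q1:", q1, "median:", q2, "Q3:", q3)
print("Interquartile range:", iqr)
```

With this convention the quartiles are 4, 8, and 12, so the interquartile range is 8.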
To visualize a data set based on quantiles, we can create a _____ plot.
Box and whisker
The _____ is the distance between the largest and the smallest value in the data set.
Range
The sum of the absolute values of the deviations of individual observations from the arithmetic mean, divided by the sample size, is called the _____.
Mean absolute deviation (MAD)
The coefficient of variation (CV) is computed as the _____ of X divided by the _____ of X.
Standard deviation, Average value
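The two dispersion measures above can be computed in a few lines. This is a sketch on made-up data; it assumes the sample standard deviation (divisor n - 1) for the coefficient of variation.

```python
import statistics

# Hypothetical observations.
data = [2, 4, 6, 8, 10]

mean = statistics.mean(data)

# Mean absolute deviation: average absolute distance from the mean.
mad = sum(abs(x - mean) for x in data) / len(data)

# Coefficient of variation: standard deviation relative to the mean.
cv = statistics.stdev(data) / mean

print("MAD:", mad)
print("CV:", round(cv, 4))
```

Because the CV is a ratio, it has no units, which makes it usable for comparing dispersion across data sets with different means or scales.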
A measure of downside risk that involves choosing a target value against which to measure each outcome, and including only deviations below the target value, is called _____.
Target downside deviation (or Target semideviation)
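A hedged sketch of the calculation. It assumes the convention in which squared shortfalls of observations at or below the target are summed and divided by n - 1, where n is the full sample size; other texts may use a different divisor. The returns, the target, and the function name are made up for illustration.

```python
import math


def target_semideviation(values, target):
    """Square root of the sum of squared shortfalls below the target,
    divided by (n - 1), with n the total number of observations."""
    shortfalls = [(x - target) ** 2 for x in values if x <= target]
    return math.sqrt(sum(shortfalls) / (len(values) - 1))


# Hypothetical periodic returns and a target return of 0%.
returns = [0.05, -0.02, 0.08, -0.06, 0.03]
target = 0.0

print(round(target_semideviation(returns, target), 6))
```

Only the two negative returns contribute to the sum, so upside variability does not increase the measured risk.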
_____ refers to the extent to which a distribution is not symmetrical.
Skewness (or Skew)
For a _____ distribution, the mean, median, and mode are equal.
Symmetrical
For a positively skewed, unimodal distribution, the _____ is less than the _____, which is less than the _____.
Mode, median, mean
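The ordering can be seen on a small right-skewed sample. The data are made up for illustration: a few large values in the right tail pull the mean up more than the median, while the mode stays at the most common value.

```python
import statistics

# Hypothetical positively skewed sample (long right tail).
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]

mode = statistics.mode(data)
median = statistics.median(data)
mean = statistics.mean(data)

print("mode:", mode, "median:", median, "mean:", mean)
```

Here the mode is 2, the median is 3, and the mean is 4.5.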
Among median, mean, and mode, the _____ is the most affected by skewness.
Mean
_____ is a measure of the degree to which a distribution is more or less peaked than a normal distribution.
Kurtosis
LEPTOKURTIC describes a distribution that is _____ peaked than a normal distribution, whereas PLATYKURTIC refers to a distribution that is _____ peaked than a normal distribution.
(Fill in the blanks with "more" or "less")
More, less
A LEPTOKURTIC return distribution will have _____ returns clustered around the mean and _____ returns with large deviations from the mean.
(Fill in the blanks with "more" or "less")
More, more
A distribution is said to exhibit _____ if it has either more or less kurtosis than the normal distribution.
Excess kurtosis
Excess kurtosis = Sample kurtosis − X
Find "X".
3
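A minimal sketch of the calculation. It assumes the simple moment-based estimator (fourth central moment divided by the squared variance, both with divisor n) and omits the small-sample corrections that some textbooks and software apply. The sample and the function name are made up.

```python
def excess_kurtosis(values):
    """Moment-based kurtosis minus 3 (the kurtosis of a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance (divisor n)
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


# Hypothetical sample: most values near the mean, two far from it.
sample = [-6, -1, -1, 0, 0, 0, 0, 1, 1, 6]

print(round(excess_kurtosis(sample), 4))
```

A positive result, as for this sample, indicates fatter tails than the normal distribution (leptokurtic); a negative result indicates thinner tails (platykurtic).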
_____ is a measure of HOW two variables move together.
Covariance
_____ measures the STRENGTH of the linear relationship between two random variables.
Correlation
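The relationship between the two measures can be sketched as follows: correlation is covariance divided by the product of the two standard deviations, which rescales it to lie between -1 and +1. The data and function names are assumptions for illustration, and sample statistics (divisor n - 1) are used throughout.

```python
import math


def sample_covariance(x, y):
    """Sample covariance of two equal-length sequences (divisor n - 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def sample_correlation(x, y):
    """Covariance scaled by both standard deviations."""
    std_x = math.sqrt(sample_covariance(x, x))  # cov(x, x) is the variance
    std_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (std_x * std_y)


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print("covariance:", sample_covariance(x, y))
print("correlation:", round(sample_correlation(x, y), 4))
```

The covariance (1.5 here) depends on the units of both variables, while the correlation (about 0.77) is unit-free and therefore comparable across pairs of variables.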
_____ correlation refers to correlation that is either the result of chance or arises because both variables change over time through their association with a third variable.
Spurious