Organizing, Describing and Visualizing Data Flashcards by Nicolas Arguin-Malo

Values that can be counted or measured are called _____ data.

Numerical (or Quantitative)

Discrete and Continuous data are types of _____ data.

Numerical (or Quantitative)

Data that is countable, such as the months, days, or hours in a year, is called _____ data.

Discrete

Data that can take any value within a range, including fractional values, is called _____ data.

Continuous

Data that consists of labels that can be used to classify a set of data into groups is called _____ data.

Categorical (or Qualitative)

Nominal and ordinal data are types of _____ data.

Categorical (or Qualitative)

Labels that cannot be placed in a logical order are called _____ data.

Nominal

Data that can be ranked in a logical order is called _____ data.

Ordinal

A set of observations taken periodically, most often at equal intervals over time, is called a _____.

Time series

A set of comparable observations all taken at one specific point in time is called _____.

Cross-sectional data

The combination of time series and cross-sectional data, often presented in tables, is called _____.

Panel data

Time series, cross-sectional, and panel data, organized in a defined way, are examples of _____ data.

Structured data (ex: market data, fundamental data, etc.)

Information that is presented in a form with no defined structure is referred to as _____ data.

Unstructured data (ex: management commentaries, must be transformed into structured data to be analyzed)

A time series is an example of a _____ array.

One-dimensional

For any frequency distribution, the interval with the greatest frequency is referred to as the _____ interval.

Modal

The _____ frequency is the percentage of total observations falling within each interval.

Relative

The _____ frequency is the number of observations falling within an interval.

Absolute
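
The three frequency cards above (modal interval, relative frequency, absolute frequency) can be tied together in one small sketch. The return figures, the interval width, and the helper name frequency_distribution are all hypothetical, chosen only for illustration; the sketch also assumes every value lies within the stated intervals.

```python
from collections import Counter

def frequency_distribution(values, lower, width, n_intervals):
    """Return (interval, absolute, relative) rows for equal-width intervals.

    Assumes every value lies between lower and lower + width * n_intervals.
    """
    counts = Counter()
    for v in values:
        # Index of the interval containing v; the top edge falls in the last interval.
        idx = min(int((v - lower) // width), n_intervals - 1)
        counts[idx] += 1
    total = len(values)
    rows = []
    for i in range(n_intervals):
        lo = lower + i * width
        rows.append(((lo, lo + width), counts[i], counts[i] / total))
    return rows

# Hypothetical returns in percent.
returns = [-4.0, -1.5, 0.5, 1.0, 1.2, 2.5, 3.0, 3.1, 4.9, 6.0]
table = frequency_distribution(returns, lower=-5.0, width=5.0, n_intervals=3)

# The modal interval is the one with the greatest absolute frequency.
modal_interval = max(table, key=lambda row: row[1])[0]

for interval, absolute, relative in table:
    print(interval, absolute, f"{relative:.0%}")
print("modal interval:", modal_interval)
```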

A _____ is a two-dimensional array with which we can analyze two variables at the same time.

Contingency table (ex: Accidents by intersection and day of week)

One kind of contingency table is a 2-by-2 array called a _____.

Confusion matrix
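
As a rough illustration of a 2-by-2 contingency table, the sketch below tallies pairs of (actual, predicted) labels, which is the usual layout of a confusion matrix. The labels and observations are made up for the example.

```python
from collections import Counter

# Hypothetical (actual, predicted) outcomes from a default-prediction model.
observations = [
    ("default", "default"),
    ("default", "no default"),
    ("no default", "no default"),
    ("no default", "no default"),
    ("no default", "default"),
    ("default", "default"),
]

# Each cell of the table is the joint frequency of one (actual, predicted) pair.
table = Counter(observations)

labels = ["default", "no default"]
print("rows = actual, columns = predicted:", labels)
for actual in labels:
    row = [table[(actual, predicted)] for predicted in labels]
    print(f"{actual:>10}", row)
```

The same counting approach works for any two categorical variables, for example accidents by intersection and day of week.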

To analyze three variables at the same time, an analyst can create a _____.

Scatter plot matrix

The most effective chart types for visualizing RELATIONSHIPS are _____.

Scatter plots, scatter plot matrices, and heat maps

The most effective chart types for COMPARING CATEGORIES are _____.

Bar charts, tree maps, and heat maps

The most effective chart types for COMPARING OVER TIME are _____.

Line charts, dual-scale line charts, and bubble line charts

The most effective chart types for visualizing DISTRIBUTIONS of NUMERICAL DATA are _____.

Histograms, frequency polygons, and cumulative distribution charts

The most effective chart types for visualizing DISTRIBUTIONS of CATEGORICAL DATA are _____.

Bar charts, tree maps, and heat maps

The mean that excludes a stated percentage of the most extreme observations (ex: discard the lowest 0.5% and the highest 0.5% of the observations) is called the _____ mean.

Trimmed

The mean that substitutes a value for the highest and lowest observations is called the _____ mean.

Winsorized

The trimmed and winsorized means are used to control for _____.

Outliers
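
A minimal sketch of the difference between the two means, assuming the stated percentage is applied to each tail and rounded down to a whole number of observations (conventions vary). The function names and the sample are hypothetical.

```python
def trimmed_mean(values, pct):
    """Mean after discarding the lowest and highest pct (a fraction) of observations."""
    data = sorted(values)
    k = int(len(data) * pct)  # number of observations cut from each tail
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

def winsorized_mean(values, pct):
    """Mean after replacing each tail's extreme values with the nearest retained value."""
    data = sorted(values)
    k = int(len(data) * pct)
    if k > 0:
        low, high = data[k], data[-k - 1]
        data = [low] * k + data[k:len(data) - k] + [high] * k
    return sum(data) / len(data)

# Hypothetical sample with one large outlier.
sample = [0, 2, 3, 4, 5, 6, 7, 8, 12, 100]
plain = sum(sample) / len(sample)
trimmed = trimmed_mean(sample, 0.10)
winsorized = winsorized_mean(sample, 0.10)
print(plain, trimmed, winsorized)
```

Trimming drops the extreme observations entirely, so the sample size shrinks; winsorizing keeps the sample size and only pulls the extremes inward.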

The midpoint of a data set when the data is arranged in ascending or descending order is called the _____.

Median

The value that occurs most frequently in a data set is called the _____.

Mode

The mean to use for estimating the next observation or the expected value of a distribution is the _____ mean.

Arithmetic

The mean to use to find the compound rate of return over multiple periods is the _____ mean.

Geometric

The mean to use for estimating the mean without the effects of a given percentage of outliers is the _____ mean.

Trimmed

The mean to use for estimating the mean while decreasing the effects of a given percentage of outliers is the _____ mean.

Winsorized

The mean to use to calculate the average share cost from periodic purchases in a fixed dollar amount is the _____ mean.

Harmonic
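
The arithmetic, geometric, and harmonic means from the cards above can be compared with a short sketch. The returns and share prices are hypothetical, and returns are assumed to be expressed as decimals.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean_return(returns):
    """Compound per-period rate for returns given as decimals (0.05 means 5%)."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1

def harmonic_mean(xs):
    return len(xs) / sum(1 / x for x in xs)

# Hypothetical two-period returns: +50% followed by -50%.
returns = [0.50, -0.50]
avg_return = arithmetic_mean(returns)             # simple average of the two returns
compound_return = geometric_mean_return(returns)  # rate that reproduces the ending wealth

# Hypothetical prices paid when investing the same dollar amount in each period.
prices = [10.0, 20.0]
avg_cost = harmonic_mean(prices)

print(avg_return, compound_return, avg_cost)
```

Investing 1,000 at each price buys 100 and then 50 shares, so the average cost is 2,000 / 150, which is what the harmonic mean of the two prices gives.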

The difference between the third quartile (75th percentile) and the first quartile (25th percentile) is known as the _____.

Interquartile range

To visualize a data set based on quantiles, we can create a _____ plot.

Box and whisker

The _____ is the distance between the largest and the smallest value in the data set.

Range
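
A small sketch of the interquartile range and the range, using the standard library's statistics.quantiles. Its default "exclusive" method places quartiles at positions based on n + 1 with linear interpolation; other quantile conventions can give slightly different values. The data set is hypothetical.

```python
import statistics

# Hypothetical data set.
data = [2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 40]

# n=4 returns the three cut points that split the data into quartiles.
q1, q2, q3 = statistics.quantiles(data, n=4)

interquartile_range = q3 - q1
data_range = max(data) - min(data)

print("Q1:", q1, "median:", q2, "Q3:", q3)
print("IQR:", interquartile_range, "range:", data_range)
```

A box and whisker plot draws the box from Q1 to Q3, so its height is the interquartile range computed here.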

The sum of the absolute values of the deviations of individual observations from the arithmetic mean, divided by the sample size, is called the _____.

Mean absolute deviation (MAD)

The coefficient of variation (CV) is computed as the _____ of X divided by the _____ of X.

Standard deviation, Average value
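
The mean absolute deviation and the coefficient of variation can be sketched as follows. The data are hypothetical, and the sample standard deviation (n - 1 divisor) is assumed for the CV.

```python
import statistics

def mean_absolute_deviation(xs):
    """Average absolute distance of each observation from the arithmetic mean."""
    mean = sum(xs) / len(xs)
    return sum(abs(x - mean) for x in xs) / len(xs)

def coefficient_of_variation(xs):
    """Sample standard deviation divided by the arithmetic mean."""
    return statistics.stdev(xs) / statistics.mean(xs)

# Hypothetical observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mad = mean_absolute_deviation(data)
cv = coefficient_of_variation(data)
print(mad, cv)
```

Because the CV divides dispersion by the mean, it is unit-free and lets data sets with different scales be compared.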

One measure of downside risk that involves choosing a target value against which to measure each outcome and including only the deviations that fall below the target value is called _____.

Target downside deviation (or Target semideviation)
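
A minimal sketch of target downside deviation, assuming the common sample convention: square only the deviations at or below the target, sum them, divide by n - 1 where n is the full sample size, and take the square root. The returns and the target are hypothetical.

```python
import math

def target_downside_deviation(xs, target):
    """Square root of the sum of squared shortfalls below target, divided by n - 1."""
    shortfalls = [(x - target) ** 2 for x in xs if x <= target]
    return math.sqrt(sum(shortfalls) / (len(xs) - 1))

# Hypothetical returns in percent, measured against a 2% target.
returns = [5.0, -2.0, 3.0, -4.0, 1.0]
downside = target_downside_deviation(returns, target=2.0)
print(round(downside, 4))
```

Outcomes above the target contribute nothing to the sum, but they still count toward n in the divisor.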

_____ refers to the extent to which a distribution is not symmetrical.

Skewness (or Skew)

For a _____ distribution, the mean, median, and mode are equal.

Symmetrical

For a positively skewed, unimodal distribution, the _____ is less than the _____, which is less than the _____.

Mode, median, mean

Among median, mean, and mode, the _____ is the most affected by skewness.

Mean
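
The ordering of mode, median, and mean under positive skew can be checked on a small hypothetical sample. The skewness measure here is the simple moment-based version (third central moment divided by the cubed population standard deviation); sample-adjusted formulas differ by a scaling factor.

```python
import statistics

def skewness(xs):
    """Third central moment divided by the cube of the population standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Hypothetical sample with a long right tail.
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

mode = statistics.mode(data)
median = statistics.median(data)
mean = statistics.mean(data)
skew = skewness(data)

print("mode:", mode, "median:", median, "mean:", mean)
print("skewness is positive:", skew > 0)
```

The two large observations pull the mean upward far more than the median, which is why the mean is the measure most affected by skewness.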

_____ is a measure of the degree to which a distribution is more or less peaked than a normal distribution.

Kurtosis

LEPTOKURTIC describes a distribution that is _____ peaked than a normal distribution, whereas PLATYKURTIC refers to a distribution that is _____ peaked than a normal distribution. (Fill the blanks with "more" or "less")

More, less

A LEPTOKURTIC return distribution will have _____ returns clustered around the mean and _____ returns with large deviations from the mean. (Fill the blanks with "more" or "less")

More, more

A distribution is said to exhibit _____ if it has either more or less kurtosis than the normal distribution.

Excess kurtosis

Excess kurtosis = Sample kurtosis − X. Find "X".

3 (the kurtosis of a normal distribution)
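
As a sketch of the calculation, the code below computes a simple moment-based kurtosis (fourth central moment divided by the squared population variance) and subtracts 3, the kurtosis of a normal distribution. Sample-adjusted kurtosis formulas exist and give somewhat different numbers; the data are hypothetical.

```python
def kurtosis(xs):
    """Fourth central moment divided by the squared population variance."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2

def excess_kurtosis(xs):
    # A normal distribution has kurtosis 3, so this is zero for normal data.
    return kurtosis(xs) - 3

# Hypothetical, evenly spread sample (flatter than a normal distribution).
data = [1, 2, 3, 4, 5]
excess = excess_kurtosis(data)
print("platykurtic" if excess < 0 else "leptokurtic or normal")
```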

_____ is a measure of HOW two variables move together.

Covariance

_____ measures the STRENGTH of the linear relationship between two random variables.

Correlation
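
Covariance and correlation can be sketched side by side, assuming the sample versions with an n - 1 divisor. The two series are hypothetical.

```python
import math

def sample_covariance(xs, ys):
    """Average product of paired deviations from the means, using n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Covariance scaled by both standard deviations; always between -1 and 1."""
    std_x = math.sqrt(sample_covariance(xs, xs))
    std_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (std_x * std_y)

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(x, y)
corr_xy = correlation(x, y)
print(cov_xy, round(corr_xy, 4))
```

Covariance depends on the units of both variables, so its size is hard to interpret; dividing by the standard deviations gives a unit-free measure of linear strength.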

_____ correlation refers to correlation that is either the result of chance or present due to changes in both variables over time that are caused by their association with a third variable.

Spurious

Organizing, Describing and Visualizing Data Flashcards

(53 cards)