Organizing, Visualizing, and Describing Data Flashcards
Absolute dispersion
The amount of variability present without comparison to any reference point or benchmark.
Absolute frequency
The actual number of observations counted for each unique value of the variable (also called raw frequency).
Arithmetic mean
The sum of the observations divided by the number of observations.
Bar chart
A chart for plotting the frequency distribution of categorical data, where each bar represents a distinct category and each bar’s height is proportional to the frequency of the corresponding category. In technical analysis, a bar chart that plots four bits of data for each time interval—the high, low, opening, and closing prices. A vertical line connects the high and low prices. A cross-hatch left indicates the opening price and a cross-hatch right indicates the closing price.
Bimodal
A distribution that has two most frequently occurring values.
Box and whisker plot
A graphic for visualizing the dispersion of data across quartiles. It consists of a “box” with “whiskers” connected to the box.
Bubble line chart
A line chart that uses varying-sized bubbles to represent a third dimension of the data. The bubbles are sometimes color-coded to present additional information.
Categorical data
Values that describe a quality or characteristic of a group of observations and therefore can be used as labels to divide a dataset into groups to summarize and visualize (also called qualitative data).
Chi-square test of independence
A statistical test for detecting a potential association between categorical variables.
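As a rough illustration of the test statistic behind this card, the sketch below computes the chi-square statistic from a small contingency table. The counts and the function name `chi_square_statistic` are made up for this example; expected frequencies use the usual (row total × column total) / overall total rule.

```python
def chi_square_statistic(table):
    """Return the chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical 2x2 table of observed counts (illustrative only).
table = [[20, 30], [30, 20]]
stat = chi_square_statistic(table)  # every expected count is 25, so stat is 4.0
```

With (2 - 1) × (2 - 1) = 1 degree of freedom, a statistic of 4.0 exceeds the 5% critical value of about 3.84, which would point to an association in this invented table.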
Clustered bar chart
A bar chart for showing joint frequencies for two categorical variables (also known as a grouped bar chart).
Coefficient of variation
The ratio of a set of observations’ standard deviation to the observations’ mean value.
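A minimal sketch of this ratio, assuming the sample standard deviation (n - 1 divisor) is intended; the observations are invented for the example.

```python
import statistics

# Hypothetical observations (e.g., periodic returns in percent).
observations = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(observations)  # 5.0
sd = statistics.stdev(observations)   # sample standard deviation (n - 1 divisor)
cv = sd / mean                        # dispersion per unit of mean
```

Because the result is unit-free, it can be used to compare the relative dispersion of datasets with different means or scales.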
Confusion matrix
A grid used for error analysis in classification problems. It presents values for four evaluation metrics: true positive (TP), false positive (FP), true negative (TN), and false negative (FN).
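The four cells can be tallied directly from actual and predicted labels. The sketch below assumes binary labels coded as 1 (positive) and 0 (negative); the label lists and the helper name `confusion_counts` are hypothetical.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count TP, FP, TN, and FN for binary labels (1 = positive, 0 = negative)."""
    counts = Counter({"TP": 0, "FP": 0, "TN": 0, "FN": 0})
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        elif a == 0 and p == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts


# Illustrative labels only.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(actual, predicted)
# TP = 3, FP = 1, TN = 3, FN = 1
```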
Contingency table
A table of the frequency distribution of observations classified on the basis of two discrete variables.
Continuous data
Data that can be measured and can take on any numerical value in a specified range of values.
Correlation
A measure of the linear relationship between two random variables.
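One common way to compute this measure is the Pearson correlation coefficient, sketched below from first principles. The function name and the data are assumptions for illustration.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation: co-movement of x and y scaled by their dispersions."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    co_dev = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sum((a - mean_x) ** 2 for a in x)
    dev_y = sum((b - mean_y) ** 2 for b in y)
    return co_dev / math.sqrt(dev_x * dev_y)


# Hypothetical data: y is an exact linear function of x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r = pearson_correlation(x, y)             # 1.0 (perfect positive linear relationship)
r_neg = pearson_correlation(x, y[::-1])   # -1.0 (perfect negative linear relationship)
```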
Cost averaging
The periodic investment of a fixed amount of money.
Cross-sectional data
A list of the observations of a specific variable from multiple observational units at a given point in time. The observational units can be individuals, groups, companies, trading markets, regions, etc.
Cumulative absolute frequency
The running total of the absolute frequencies in a frequency distribution, accumulated (i.e., added up) as one moves from the first bin to the last bin.
Cumulative frequency distribution chart
A chart that plots either the cumulative absolute frequency or the cumulative relative frequency on the y-axis against the upper limit of the interval and allows one to see the number or the percentage of the observations that lie below a certain value.
Cumulative relative frequency
A sequence of partial sums of the relative frequencies in a frequency distribution.
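The partial sums for both cumulative absolute and cumulative relative frequency can be sketched with a running total. The bin counts below are invented for the example.

```python
from itertools import accumulate

# Hypothetical absolute frequencies for four ordered bins.
absolute = [3, 5, 8, 4]
total = sum(absolute)  # 20 observations

# Running total of counts from the first bin to the last.
cumulative_absolute = list(accumulate(absolute))  # [3, 8, 16, 20]

# The same running total expressed as a share of all observations.
cumulative_relative = [c / total for c in cumulative_absolute]  # ends at 1.0
```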
Data
A collection of numbers, characters, words, and text—as well as images, audio, and video—in a raw or organized format to represent facts or information.
Data table
See two-dimensional rectangular array.
Deciles
Quantiles that divide a distribution into 10 equal parts.
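As a sketch, Python's standard library can return the nine cut points that separate the 10 parts. The data are hypothetical, and the exact values depend on the interpolation convention; `statistics.quantiles` uses its default "exclusive" method here, which may differ from the convention used in a given textbook.

```python
import statistics

# Hypothetical dataset: the integers 1 through 100.
data = list(range(1, 101))

# n=10 requests deciles; the result holds the 9 cut points between the 10 parts.
deciles = statistics.quantiles(data, n=10)
fifth_decile = deciles[4]  # the fifth decile coincides with the median
```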
Descriptive statistics
The study of how data can be summarized effectively.
Discrete data
Numerical values that result from a counting process; therefore, practically speaking, the data are limited to a finite number of values.
Dispersion
The variability of a population or sample of observations around the central tendency.
Downside risk
Risk of incurring returns below a specified value.
Excess kurtosis
Degree of kurtosis (fatness of tails) relative to the kurtosis of the normal distribution.
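A minimal sketch, assuming the simple population-moment version of kurtosis with no small-sample bias correction: kurtosis is the fourth central moment divided by the squared variance, and the normal distribution's kurtosis of 3 is subtracted to get the excess. The function name and data are illustrative.

```python
def excess_kurtosis(values):
    """Population-moment kurtosis minus 3 (the kurtosis of a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment (variance)
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0


# Hypothetical two-point data with very thin tails.
thin_tailed = [-1.0, 1.0, -1.0, 1.0]
result = excess_kurtosis(thin_tailed)  # -2.0, i.e., thinner tails than the normal
```

A positive result would indicate fatter tails than the normal distribution; a negative result, thinner tails.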
Fat-tailed
Describes a distribution that has fatter tails than a normal distribution (also called leptokurtic).
Fractile
A value at or below which a stated fraction of the data lies. Also called quantile.
Frequency distribution
A tabular display of data constructed either by counting the observations of a variable by distinct values or groups or by tallying the values of a numerical variable into a set of numerically ordered bins (also called a one-way table).
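Counting observations by distinct value can be sketched with a counter. The analyst ratings below are invented for the example.

```python
from collections import Counter

# Hypothetical categorical observations.
ratings = ["Buy", "Hold", "Buy", "Sell", "Buy", "Hold"]

# Absolute frequency: number of observations for each distinct value.
absolute = Counter(ratings)  # Buy: 3, Hold: 2, Sell: 1

# Relative frequency: each count as a share of all observations.
total = sum(absolute.values())
relative = {category: count / total for category, count in absolute.items()}
```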
Frequency polygon
A graph of a frequency distribution obtained by drawing straight lines joining successive points representing the class frequencies.
Geometric mean
A measure of central tendency computed by taking the nth root of the product of n non-negative values.
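The definition translates almost directly into code. This sketch assumes Python 3.8 or later (for `math.prod`) and uses made-up values.

```python
import math

# Hypothetical non-negative values.
values = [1.0, 4.0, 16.0]

# nth root of the product of the n values.
n = len(values)
geometric_mean = math.prod(values) ** (1 / n)  # approximately 4, since 1 * 4 * 16 = 64
```

For long series, averaging the logarithms and exponentiating is a common alternative that avoids overflow in the product.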
Grouped bar chart
A bar chart for showing joint frequencies for two categorical variables (also known as a clustered bar chart).
Harmonic mean
A type of weighted mean computed as the reciprocal of the arithmetic average of the reciprocals of the observations.
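A short sketch connecting this card to cost averaging: when a fixed amount is invested each period, the average cost per share equals the harmonic mean of the purchase prices. The prices and the amount invested are hypothetical.

```python
import statistics

# Hypothetical share prices in two periods and a fixed amount invested each period.
prices = [10.0, 20.0]
amount_per_period = 100.0

# Reciprocal of the arithmetic average of the reciprocals.
harmonic = statistics.harmonic_mean(prices)  # about 13.33

# Cost averaging: total money spent divided by total shares bought.
shares_bought = sum(amount_per_period / p for p in prices)       # 10 + 5 = 15 shares
average_cost = amount_per_period * len(prices) / shares_bought   # about 13.33
```

The arithmetic mean of the two prices is 15, so the harmonic mean is lower, reflecting that more shares are bought when the price is low.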
Heat map
A type of graphic that organizes and summarizes data in a tabular format and represents it using a color spectrum.
Histogram
A chart that presents the distribution of numerical data by using the height of a bar or column to represent the absolute frequency of each bin or interval in the distribution.
Interquartile range
The difference between the third and first quartiles of a dataset.
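A minimal sketch using the standard library's quartile cut points. The data are hypothetical, and quartile values depend on the interpolation convention; the default "exclusive" method of `statistics.quantiles` is assumed here.

```python
import statistics

# Hypothetical dataset: the integers 1 through 11.
data = list(range(1, 12))

# n=4 returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # spread of the middle 50% of the observations
```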
Interval
With reference to grouped data, a set of values within which an observation falls.
Joint frequencies
The entries in the cells of a contingency table, each of which counts the observations that share one category of the row variable and one category of the column variable.
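As an illustration, joint frequencies can be tallied by counting each (row category, column category) pair. The sector and size labels are invented for the example.

```python
from collections import Counter

# Hypothetical observations classified by two categorical variables: (sector, size).
observations = [
    ("Tech", "Large"),
    ("Tech", "Small"),
    ("Energy", "Large"),
    ("Tech", "Large"),
    ("Energy", "Small"),
    ("Energy", "Small"),
]

# Each key is one cell of the contingency table; each value is its joint frequency.
joint = Counter(observations)

# Summing a row's cells gives that row's total (its marginal frequency).
tech_total = sum(count for (sector, _), count in joint.items() if sector == "Tech")
```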