Organizing, Visualizing, and Describing Data Flashcards

1
Q

Absolute dispersion

A

The amount of variability present without comparison to any reference point or benchmark.

2
Q

Absolute frequency

A

The actual number of observations counted for each unique value of the variable (also called raw frequency).

3
Q

Arithmetic mean

A

The sum of the observations divided by the number of observations.

4
Q

Bar chart

A

A chart for plotting the frequency distribution of categorical data, where each bar represents a distinct category and each bar’s height is proportional to the frequency of the corresponding category. In technical analysis, a bar chart plots four pieces of data for each time interval: the high, low, opening, and closing prices. A vertical line connects the high and low prices; a cross-hatch to the left indicates the opening price and a cross-hatch to the right indicates the closing price.

5
Q

Bimodal

A

A distribution that has two most frequently occurring values.

6
Q

Box and whisker plot

A

A graphic for visualizing the dispersion of data across quartiles. It consists of a “box” with “whiskers” connected to the box.

7
Q

Bubble line chart

A

A line chart that uses varying-sized bubbles to represent a third dimension of the data. The bubbles are sometimes color-coded to present additional information.

8
Q

Categorical data

A

Values that describe a quality or characteristic of a group of observations and therefore can be used as labels to divide a dataset into groups to summarize and visualize (also called qualitative data).

9
Q

Chi-square test of independence

A

A statistical test for detecting a potential association between categorical variables.
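
A minimal sketch of how the test statistic can be computed from a contingency table, assuming the usual expected count of (row total × column total) / grand total under independence. The function name and the counts are hypothetical, and the p-value lookup is left out.

```python
def chi_square_statistic(table):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical 2x2 table of joint frequencies (made-up counts).
observed_counts = [[10, 20], [20, 10]]
chi2 = chi_square_statistic(observed_counts)
# Degrees of freedom: (rows - 1) * (columns - 1).
degrees_of_freedom = (len(observed_counts) - 1) * (len(observed_counts[0]) - 1)
```

The statistic is then compared with a chi-square critical value for the computed degrees of freedom; a larger statistic points toward an association.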

10
Q

Clustered bar chart

A

A bar chart for showing joint frequencies for two categorical variables (also known as a grouped bar chart).

11
Q

Coefficient of variation

A

The ratio of a set of observations’ standard deviation to the observations’ mean value.
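
A short sketch of the ratio, assuming the sample standard deviation (divisor n - 1) is the one intended. The function name and data are illustrative only.

```python
import statistics


def coefficient_of_variation(observations):
    """Sample standard deviation divided by the sample mean."""
    mean = statistics.mean(observations)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(observations) / mean


returns = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values
cv = coefficient_of_variation(returns)
```

Because the result is unit-free, it can be used to compare dispersion across datasets measured on different scales.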

12
Q

Confusion matrix

A

A grid used for error analysis in classification problems. It presents values for four evaluation metrics: true positive (TP), false positive (FP), true negative (TN), and false negative (FN).
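
As an illustration, the four cells can be tallied from paired actual and predicted labels. This is a sketch for a binary problem; the function name, the choice of 1 as the positive class, and the labels are assumptions.

```python
def confusion_counts(actual, predicted, positive=1):
    """Tally TP, FP, TN, and FN for a binary classification problem."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct if the actual label is also positive.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: a miss if the actual label was positive.
            counts["FN" if a == positive else "TN"] += 1
    return counts


actual_labels = [1, 0, 1, 1, 0, 0]     # made-up labels
predicted_labels = [1, 0, 0, 1, 1, 0]  # made-up predictions
counts = confusion_counts(actual_labels, predicted_labels)
```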

13
Q

Contingency table

A

A table of the frequency distribution of observations classified on the basis of two discrete variables.

14
Q

Continuous data

A

Data that can be measured and can take on any numerical value in a specified range of values.

15
Q

Correlation

A

A measure of the linear relationship between two random variables.

16
Q

Cost averaging

A

The periodic investment of a fixed amount of money.

17
Q

Cross-sectional data

A

A list of the observations of a specific variable from multiple observational units at a given point in time. The observational units can be individuals, groups, companies, trading markets, regions, etc.

18
Q

Cumulative absolute frequency

A

The running total of the absolute frequencies in a frequency distribution, cumulated (i.e., added up) as one moves from the first bin to the last bin.

19
Q

Cumulative frequency distribution chart

A

A chart that plots either the cumulative absolute frequency or the cumulative relative frequency on the y-axis against the upper limit of the interval and allows one to see the number or the percentage of the observations that lie below a certain value.

20
Q

Cumulative relative frequency

A

A sequence of partial sums of the relative frequencies in a frequency distribution.
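
A small sketch of the partial sums, assuming the bins are already in order and each has an absolute frequency. The counts and the function name are made up for illustration.

```python
from itertools import accumulate


def cumulative_relative_frequencies(absolute_frequencies):
    """Running share of observations at or below each bin."""
    total = sum(absolute_frequencies)
    # Accumulate the counts first, then scale by the total number of observations.
    return [running / total for running in accumulate(absolute_frequencies)]


bin_counts = [2, 3, 5]  # made-up absolute frequencies for three ordered bins
cumulative = cumulative_relative_frequencies(bin_counts)
```

The last entry is always 1 because every observation falls at or below the final bin.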

21
Q

Data

A

A collection of numbers, characters, words, and text—as well as images, audio, and video—in a raw or organized format to represent facts or information.

22
Q

Data table

A

See two-dimensional rectangular array.

23
Q

Deciles

A

Quantiles that divide a distribution into 10 equal parts.

24
Q

Descriptive statistics

A

The study of how data can be summarized effectively.

25
Q

Discrete data

A

Numerical values that result from a counting process; therefore, practically speaking, the data are limited to a finite number of values.

26
Q

Dispersion

A

The variability of a population or sample of observations around the central tendency.

27
Q

Downside risk

A

Risk of incurring returns below a specified value.

28
Q

Excess kurtosis

A

Degree of kurtosis (fatness of tails) relative to the kurtosis of the normal distribution.

29
Q

Fat-Tailed

A

Describes a distribution that has fatter tails than a normal distribution (also called leptokurtic).

30
Q

Fractile

A

A value at or below which a stated fraction of the data lies. Also called quantile.

31
Q

Frequency distribution

A

A tabular display of data constructed either by counting the observations of a variable by distinct values or groups or by tallying the values of a numerical variable into a set of numerically ordered bins (also called a one-way table).

32
Q

Frequency polygon

A

A graph of a frequency distribution obtained by drawing straight lines joining successive points representing the class frequencies.

33
Q

Geometric mean

A

A measure of central tendency computed by taking the nth root of the product of n non-negative values.
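
A minimal sketch of the nth-root calculation. The function name is hypothetical, and the inputs are made-up growth factors (1 + return), which is one common use.

```python
import math


def geometric_mean(values):
    """nth root of the product of n non-negative values."""
    if not values or any(v < 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of non-negative values")
    return math.prod(values) ** (1.0 / len(values))


growth_factors = [1.10, 0.95, 1.20]  # made-up values of (1 + periodic return)
average_growth = geometric_mean(growth_factors)
```

For long series, summing logarithms and exponentiating the average is a common alternative that avoids overflow in the product.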

34
Q

Grouped bar chart

A

A bar chart for showing joint frequencies for two categorical variables (also known as a clustered bar chart).

35
Q

Harmonic mean

A

A type of weighted mean computed as the reciprocal of the arithmetic average of the reciprocals.
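
A sketch of the reciprocal-of-reciprocals calculation. The example assumes a cost-averaging setting in which the same amount of money is invested at each of two made-up prices; the function name is hypothetical.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic average of the reciprocals."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs a non-empty list of positive values")
    return len(values) / sum(1.0 / v for v in values)


purchase_prices = [10.0, 20.0]  # made-up prices paid in two equal-amount purchases
average_cost_per_share = harmonic_mean(purchase_prices)
```

With equal amounts invested, more shares are bought at the lower price, so the average cost per share (about 13.33 here) sits below the arithmetic mean of 15.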

36
Q

Heat map

A

A type of graphic that organizes and summarizes data in a tabular format and represents it using a color spectrum.

37
Q

Histogram

A

A chart that presents the distribution of numerical data by using the height of a bar or column to represent the absolute frequency of each bin or interval in the distribution.

38
Q

Interquartile range

A

The difference between the third and first quartiles of a dataset.
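
A short sketch using Python's standard library. It assumes quartiles are located with the (n + 1) position rule plus linear interpolation, which is what `statistics.quantiles` does with `method="exclusive"`; other quartile conventions give slightly different answers.

```python
import statistics


def interquartile_range(observations):
    """Third quartile minus first quartile, using the (n + 1) position rule."""
    q1, _median, q3 = statistics.quantiles(observations, n=4, method="exclusive")
    return q3 - q1


data = [1, 2, 3, 4, 5, 6, 7]  # made-up values
iqr = interquartile_range(data)
```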

39
Q

Interval

A

With reference to grouped data, a set of values within which an observation falls.

40
Q

Joint frequencies

A

The entries in the cells of a contingency table, each of which counts the observations that join one category of the row variable with one category of the column variable.

41
Q

Leptokurtic

A

Describes a distribution that has fatter tails than a normal distribution (also called fat-tailed).

42
Q

Line chart

A

A type of graph used to visualize ordered observations. In technical analysis, a plot of price data, typically closing prices, with a line connecting the points.

43
Q

Linear interpolation

A

The estimation of an unknown value on the basis of two known values that bracket it, using a straight line between the two known values.
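
The straight-line estimate can be sketched as follows. The function and argument names are hypothetical; (x0, y0) and (x1, y1) are the two known points that bracket x.

```python
def linear_interpolate(x, x0, y0, x1, y1):
    """Estimate y at x from the straight line through (x0, y0) and (x1, y1)."""
    if x0 == x1:
        raise ValueError("the two known points must have different x values")
    weight = (x - x0) / (x1 - x0)  # share of the distance from x0 to x1
    return y0 + weight * (y1 - y0)


# Made-up example: a location of 2.5 falls halfway between the 2nd and 3rd
# sorted observations, whose values are 10.0 and 20.0.
estimate = linear_interpolate(2.5, 2, 10.0, 3, 20.0)
```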

44
Q

Marginal frequencies

A

The sums determined by adding joint frequencies across rows or across columns in a contingency table.
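
A sketch of how joint and marginal frequencies relate, using made-up observations classified by two hypothetical categorical variables (sector and size).

```python
from collections import Counter

# Each observation is a (sector, size) pair; the values are made up.
observations = [
    ("Tech", "Large"),
    ("Tech", "Small"),
    ("Energy", "Large"),
    ("Tech", "Large"),
    ("Energy", "Small"),
]

# Joint frequencies: one count per cell of the contingency table.
joint = Counter(observations)

# Marginal frequencies: joint frequencies summed across each row or column.
row_marginals = Counter(sector for sector, _size in observations)
col_marginals = Counter(size for _sector, size in observations)
```

Either set of marginal frequencies adds up to the total number of observations.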

45
Q

Mean absolute deviation

A

With reference to a sample, the mean of the absolute values of deviations from the sample mean.
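
A minimal sketch of the calculation; the function name and data are illustrative.

```python
def mean_absolute_deviation(sample):
    """Mean of the absolute deviations from the sample mean."""
    n = len(sample)
    mean = sum(sample) / n
    return sum(abs(x - mean) for x in sample) / n


values = [1, 2, 3, 4, 5]  # made-up sample
mad = mean_absolute_deviation(values)
```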

46
Q

Measure of central tendency

A

A quantitative measure that specifies where data are centered.

47
Q

Measures of location

A

Quantitative measures that describe the location or distribution of data. They include not only measures of central tendency but also other measures, such as percentiles.

48
Q

Median

A

The value of the middle item of a set of items that has been sorted into ascending or descending order (i.e., the 50th percentile).

49
Q

Mesokurtic

A

Describes a distribution with kurtosis equal to that of the normal distribution, namely, kurtosis equal to three.

50
Q

Modal interval

A

With reference to grouped data, the interval containing the greatest number of observations (i.e., highest frequency).

51
Q

Mode

A

The most frequently occurring value in a distribution.

52
Q

Nominal data

A

Categorical values that are not amenable to being organized in a logical order. An example of nominal data is the classification of publicly listed stocks into sectors.

53
Q

Numerical data

A

Values that represent measured or counted quantities as a number. Also called quantitative data.

54
Q

Observation

A

The value of a specific variable collected at a point in time or over a specified period of time.

55
Q

One-dimensional array

A

The simplest format for representing a collection of data of the same data type.

56
Q

Ordinal data

A

Categorical values that can be logically ordered or ranked.

57
Q

Panel data

A

A mix of time-series and cross-sectional data that contains observations through time on characteristics across multiple observational units.

58
Q

Percentiles

A

Quantiles that divide a distribution into 100 equal parts that sum to 100.

59
Q

Platykurtic

A

Describes a distribution that has relatively less weight in the tails than the normal distribution (also called thin-tailed).

60
Q

Population

A

All members of a specified group.

61
Q

Qualitative data

A

Values that describe a quality or characteristic of a group of observations and therefore can be used as labels to divide a dataset into groups to summarize and visualize (also called categorical data).

62
Q

Quantile

A

A value at or below which a stated fraction of the data lies. Also referred to as a fractile.

63
Q

Quantitative data

A

Values that represent measured or counted quantities as a number. Also called numerical data.

64
Q

Quartiles

A

Quantiles that divide a distribution into four equal parts (the 25th, 50th, and 75th percentiles).

65
Q

Quintiles

A

Quantiles that divide a distribution into five equal parts (the 20th, 40th, 60th, and 80th percentiles).

66
Q

Range

A

The difference between the maximum and minimum values in a dataset.

67
Q

Raw data

A

Data available in their original form as collected.

68
Q

Relative dispersion

A

The amount of dispersion relative to a reference value or benchmark.

69
Q

Relative frequency

A

The absolute frequency of each unique value of the variable divided by the total number of observations of the variable.

70
Q

Sample

A

A subset of a population.

71
Q

Sample correlation coefficient

A

A standardized measure of how two variables in a sample move together. It is the ratio of the sample covariance to the product of the two variables’ standard deviations.
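
A sketch of the ratio, assuming both the covariance and the standard deviations use the n - 1 divisor. The function names and data are hypothetical.

```python
import math


def sample_covariance(x, y):
    """Average cross-product of deviations from the means, with divisor n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def sample_correlation(x, y):
    """Sample covariance divided by the product of the sample standard deviations."""
    std_x = math.sqrt(sample_covariance(x, x))  # covariance of x with itself is its variance
    std_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (std_x * std_y)


x_values = [1, 2, 3, 4]  # made-up data
y_values = [2, 4, 6, 8]  # exactly twice x, so the relationship is perfectly linear
r = sample_correlation(x_values, y_values)
```

The result always lies between -1 and +1, which is what makes it easier to interpret than the raw covariance.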

72
Q

Sample covariance

A

A measure of how two variables in a sample move together.

73
Q

Sample excess kurtosis

A

A sample measure of the degree of a distribution’s kurtosis in excess of the normal distribution’s kurtosis.

74
Q

Sample mean

A

The sum of the sample observations divided by the sample size.

75
Q

Sample skewness

A

A sample measure of the degree of asymmetry of a distribution.

76
Q

Sample standard deviation

A

The positive square root of the sample variance.

77
Q

Sample statistic

A

A quantity computed from or used to describe a sample.

78
Q

Sample variance

A

The sum of squared deviations around the mean divided by the degrees of freedom.
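
A minimal sketch, taking the degrees of freedom to be n - 1 for a sample of n observations. The function names and data are illustrative.

```python
import math


def sample_variance(sample):
    """Sum of squared deviations around the sample mean, divided by n - 1."""
    n = len(sample)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / (n - 1)


def sample_standard_deviation(sample):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(sample))


values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample with mean 5
variance = sample_variance(values)
std_dev = sample_standard_deviation(values)
```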

79
Q

Scatter plot matrix

A

A tool for organizing scatter plots between pairs of variables, making it easy to inspect all pairwise relationships in one combined visual.

80
Q

Skewed

A

Not symmetrical.

81
Q

Spurious correlation

A

Refers to: 1) correlation between two variables that reflects chance relationships in a particular dataset; 2) correlation induced by a calculation that mixes each of two variables with a third variable; and 3) correlation between two variables arising not from a direct relation between them but from their relation to a third variable.

82
Q

Stacked bar chart

A

An alternative form for presenting the frequency distribution of two categorical variables, where bars representing the sub-groups are placed on top of each other to form a single bar. Each sub-section is shown in a different color to represent the contribution of each sub-group, and the overall height of the stacked bar represents the marginal frequency for the category.

83
Q

Standard deviation

A

The positive square root of the variance; a measure of dispersion in the same units as the original data.

84
Q

Statistic

A

A summary measure of a sample of observations.

85
Q

Structured data

A

Data that are highly organized in a pre-defined manner, usually with repeating patterns.

86
Q

Tag cloud / Word Cloud

A

A visual device for representing textual data, which consists of words extracted from a source of textual data. The size of each distinct word is proportional to the frequency with which it appears in the given text (also known as word cloud).

87
Q

Target semi-deviation / Target downside deviation

A

A measure of downside risk, calculated as the square root of the average of the squared deviations of observations below the target (also called target downside deviation).
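
A sketch of one common convention, which is an assumption here: squared deviations are summed only for observations at or below the target, and the sum is divided by n - 1, where n is the total number of observations. Other sources divide by n or by the count of below-target observations. The function name and data are hypothetical.

```python
import math


def target_semideviation(observations, target):
    """Square root of the averaged squared shortfalls below the target."""
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    # Only observations at or below the target contribute to the sum.
    squared_shortfalls = sum((x - target) ** 2 for x in observations if x <= target)
    return math.sqrt(squared_shortfalls / (n - 1))  # assumed divisor: n - 1


monthly_returns = [0.05, -0.02, 0.01, -0.04]  # made-up returns
downside = target_semideviation(monthly_returns, target=0.0)
```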

87
Q

Thin-Tailed

A

Describes a distribution that has relatively less weight in the tails than the normal distribution (also called platykurtic).

88
Q

Time-series data

A

A sequence of observations for a single observational unit of a specific variable collected over time and at discrete and typically equally spaced intervals of time (such as daily, weekly, monthly, annually, or quarterly).

89
Q

Tree-Map

A

Another graphical tool for displaying categorical data. It consists of a set of colored rectangles to represent distinct groups, and the area of each rectangle is proportional to the value of the corresponding group.

90
Q

Trimmed mean

A

A mean computed after excluding a stated small percentage of the lowest and highest observations.
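
A minimal sketch, assuming the stated percentage is applied to each tail separately and the number of observations to drop is rounded down. The function name and data are made up.

```python
def trimmed_mean(observations, fraction_per_tail):
    """Mean after dropping the given fraction of observations from each tail."""
    ordered = sorted(observations)
    k = int(len(ordered) * fraction_per_tail)  # observations cut from each end
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("trimming removed every observation")
    return sum(kept) / len(kept)


data = [1, 2, 3, 4, 5, 6, 7, 8, 10, 100]  # made-up values with one outlier
result = trimmed_mean(data, 0.10)  # drops the lowest and the highest value
```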

91
Q

Trimodal

A

A distribution that has three most frequently occurring values.

92
Q

Two-dimensional rectangular array

A

A popular form for organizing data for processing by computers or for presenting data visually. It is composed of columns and rows that hold multiple variables and multiple observations, respectively (also called a data table).

93
Q

Unimodal

A

A distribution with a single value that is most frequently occurring.

94
Q

Unstructured data

A

Data that do not follow any conventionally organized forms.

95
Q

Variable

A

A characteristic or quantity that can be measured, counted, or categorized and that is subject to change (also called a field, an attribute, or a feature).

96
Q

Variance

A

The expected value (the probability-weighted average) of squared deviations from a random variable’s expected value.

97
Q

Visualization

A

The presentation of data in a pictorial or graphical format for the purpose of increasing understanding and for gaining insights into the data.

98
Q

Weighted mean

A

An average in which each observation is weighted by an index of its relative importance.

99
Q

Winsorized mean

A

A mean computed after assigning a stated percentage of the lowest values equal to one specified low value and a stated percentage of the highest values equal to one specified high value.
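
A sketch of the calculation, assuming the same percentage is applied to each tail and the replacement values are the nearest observations that remain unaltered. The function name and data are hypothetical.

```python
def winsorized_mean(observations, fraction_per_tail):
    """Mean after pulling the extreme values in to the nearest retained values."""
    ordered = sorted(observations)
    n = len(ordered)
    k = int(n * fraction_per_tail)  # observations replaced in each tail
    if 2 * k >= n:
        raise ValueError("fraction_per_tail is too large for this sample")
    low, high = ordered[k], ordered[n - k - 1]
    # Values below `low` become `low`; values above `high` become `high`.
    clipped = [min(max(x, low), high) for x in ordered]
    return sum(clipped) / n


data = [1, 2, 3, 4, 5, 6, 7, 8, 10, 100]  # made-up values with one outlier
result = winsorized_mean(data, 0.10)  # 1 becomes 2 and 100 becomes 10
```

Unlike a trimmed mean, every observation still counts toward the divisor; only the extreme values are changed.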

100
Q

Word cloud

A

A visual device for representing textual data, which consists of words extracted from a source of textual data. The size of each distinct word is proportional to the frequency with which it appears in the given text (also known as tag cloud).