Organizing, Visualizing, and Describing Data Flashcards

Question

Discrete data

Answer 1

Numerical values that result from a counting process; therefore, practically speaking, the data are limited to a finite number of values.

Answer 2

The variability of a population or sample of observations around the central tendency.

Answer 3

Risk of incurring returns below a specified value.

Answer 4

Degree of kurtosis (fatness of tails) relative to the kurtosis of the normal distribution.

Answer 5

Describes a distribution that has fatter tails than a normal distribution (also called leptokurtic).

Answer 6

A value at or below which a stated fraction of the data lies. Also called quantile.

Answer 7

A tabular display of data constructed either by counting the observations of a variable by distinct values or groups or by tallying the values of a numerical variable into a set of numerically ordered bins (also called a one-way table).

Answer 8

A graph of a frequency distribution obtained by drawing straight lines joining successive points representing the class frequencies.

Answer 9

A measure of central tendency computed by taking the nth root of the product of n non-negative values.
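
A minimal Python sketch of the calculation this answer describes (commonly called the geometric mean). The function name and the sample values are illustrative assumptions, not part of the flashcards.

```python
import math

def geometric_mean(values):
    """Return the nth root of the product of n non-negative values."""
    if not values:
        raise ValueError("need at least one value")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    return math.prod(values) ** (1.0 / len(values))

# Illustrative inputs (assumed): the product is 64, so the cube root is about 4.
gm = geometric_mean([1.0, 4.0, 16.0])
print(gm)
```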

Answer 10

A bar chart for showing joint frequencies for two categorical variables (also known as a clustered bar chart).

Answer 11

A type of weighted mean computed as the reciprocal of the arithmetic average of the reciprocals.
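
As a rough sketch, the "reciprocal of the arithmetic average of the reciprocals" (commonly called the harmonic mean) can be computed as follows. The function name and inputs are assumptions chosen for illustration; the sketch requires strictly positive values so that every reciprocal exists.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    mean_of_reciprocals = sum(1.0 / v for v in values) / len(values)
    return 1.0 / mean_of_reciprocals

# Illustrative inputs (assumed): reciprocals are 1, 0.5, 0.25, averaging 1.75 / 3.
hm = harmonic_mean([1.0, 2.0, 4.0])
print(hm)
```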

Answer 12

A type of graphic that organizes and summarizes data in a tabular format and represents it using a color spectrum.

Answer 13

A chart that presents the distribution of numerical data by using the height of a bar or column to represent the absolute frequency of each bin or interval in the distribution.

Answer 14

The difference between the third and first quartiles of a dataset.
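
One possible way to compute this difference (the interquartile range) in Python uses statistics.quantiles from the standard library. The dataset is made up, and the quartile values depend on the interpolation convention; this sketch relies on the function's default "exclusive" method, so other conventions may give slightly different answers.

```python
import statistics

def interquartile_range(data):
    """Third quartile minus first quartile, using the default 'exclusive' method."""
    q1, _q2, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

# Illustrative dataset (assumed): with 7 sorted values, Q1 falls on the
# 2nd value and Q3 on the 6th value under the exclusive method.
iqr = interquartile_range([1, 2, 3, 4, 5, 6, 7])
print(iqr)
```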

Answer 15

With reference to grouped data, a set of values within which an observation falls.

Answer 16

The entry in a cell of the contingency table that represents the joining of one variable from a row and the other variable from a column to count observations.

Answer 17

Describes a distribution that has fatter tails than a normal distribution (also called fat-tailed).

Answer 18

A type of graph used to visualize ordered observations. In technical analysis, a plot of price data, typically closing prices, with a line connecting the points.

Answer 19

The estimation of an unknown value on the basis of two known values that bracket it, using a straight line between the two known values.
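
The straight-line estimate described in this answer (linear interpolation) can be sketched as below. The function name, argument names, and numbers are hypothetical and serve only to illustrate the idea.

```python
def linear_interpolate(x, x0, y0, x1, y1):
    """Estimate y at x from the straight line through (x0, y0) and (x1, y1)."""
    if x0 == x1:
        raise ValueError("the two known points must have different x values")
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Illustrative values (assumed): x = 2.5 lies halfway between the known
# points at x = 2 and x = 3, so the estimate is halfway between 10 and 20.
estimate = linear_interpolate(2.5, 2, 10.0, 3, 20.0)
print(estimate)
```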

Answer 20

The sums determined by adding joint frequencies across rows or across columns in a contingency table.
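
A small sketch of how these row and column sums (marginal frequencies) might be computed from a contingency table stored as a list of rows. The counts are invented for illustration.

```python
# Hypothetical 2x2 contingency table of joint frequencies (assumed data):
# rows are categories of one variable, columns are categories of the other.
joint = [
    [10, 5],
    [20, 15],
]

# Marginal frequencies: add the joint frequencies across each row and column.
row_totals = [sum(row) for row in joint]
col_totals = [sum(col) for col in zip(*joint)]

print(row_totals, col_totals)
```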

Answer 21

With reference to a sample, the mean of the absolute values of deviations from the sample mean.
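
A minimal sketch of this measure (mean absolute deviation), assuming a plain list of numbers; the function name and data are illustrative only.

```python
def mean_absolute_deviation(sample):
    """Mean of the absolute deviations from the sample mean."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    mean = sum(sample) / n
    return sum(abs(x - mean) for x in sample) / n

# Illustrative sample (assumed): the mean is 5, and the absolute
# deviations 3, 1, 1, 3 average to 2.
mad = mean_absolute_deviation([2, 4, 6, 8])
print(mad)
```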

Answer 22

A quantitative measure that specifies where data are centered.

Answer 23

Quantitative measures that describe the location or distribution of data. They include not only measures of central tendency but also other measures, such as percentiles.

Answer 24

The value of the middle item of a set of items that has been sorted into ascending or descending order (i.e., the 50th percentile).

Answer 25

Describes a distribution with kurtosis equal to that of the normal distribution, namely, kurtosis equal to three.

Answer 26

With reference to grouped data, the interval containing the greatest number of observations (i.e., highest frequency).

Answer 27

The most frequently occurring value in a distribution.

Answer 28

Categorical values that are not amenable to being organized in a logical order. An example of nominal data is the classification of publicly listed stocks into sectors.

Answer 29

Values that represent measured or counted quantities as a number. Also called quantitative data.

Answer 30

The value of a specific variable collected at a point in time or over a specified period of time.

Answer 31

The simplest format for representing a collection of data of the same data type.

Answer 32

Categorical values that can be logically ordered or ranked.

Answer 33

A mix of time-series and cross-sectional data that contains observations through time on characteristics across multiple observational units.

Answer 34

Quantiles that divide a distribution into 100 equal parts that sum to 100.

Answer 35

Describes a distribution that has relatively less weight in the tails than the normal distribution (also called thin-tailed).

Answer 36

All members of a specified group.

Answer 37

Values that describe a quality or characteristic of a group of observations and therefore can be used as labels to divide a dataset into groups to summarize and visualize (also called categorical data).

Answer 38

A value at or below which a stated fraction of the data lies. Also referred to as a fractile.

Answer 39

Values that represent measured or counted quantities as a number. Also called numerical data.

Answer 40

Quantiles that divide a distribution into four equal parts, with cut points at every 25th percentile.

Answer 41

Quantiles that divide a distribution into five equal parts, with cut points at every 20th percentile.

Answer 42

The difference between the maximum and minimum values in a dataset.

Answer 43

Data available in their original form as collected.

Answer 44

The amount of dispersion relative to a reference value or benchmark.

Answer 45

The absolute frequency of each unique value of the variable divided by the total number of observations of the variable.
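
As a sketch, this ratio (relative frequency) can be computed with collections.Counter. The category labels below are invented for illustration.

```python
from collections import Counter

def relative_frequencies(observations):
    """Absolute frequency of each unique value divided by the number of observations."""
    counts = Counter(observations)
    total = len(observations)
    return {value: count / total for value, count in counts.items()}

# Illustrative observations (assumed): "A" appears 3 times out of 6.
rel = relative_frequencies(["A", "B", "A", "C", "A", "B"])
print(rel)
```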

Answer 46

A subset of a population.

Answer 47

A standardized measure of how two variables in a sample move together. It is the ratio of the sample covariance to the product of the two variables' standard deviations.

Answer 48

A measure of how two variables in a sample move together.
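
The two preceding answers (the standardized ratio and the covariance it is built from) can be sketched together. This assumes the usual sample convention of dividing by n - 1, which the cards do not state; function names and data are illustrative.

```python
import statistics

def sample_covariance(xs, ys):
    """Average cross-product of deviations from the means, with an n - 1 divisor (assumed)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def sample_correlation(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    return sample_covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))

# Illustrative data (assumed): y is exactly 2 * x, so the correlation is about 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
cov = sample_covariance(xs, ys)
corr = sample_correlation(xs, ys)
print(cov, corr)
```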

Answer 49

A sample measure of the degree of a distribution's kurtosis in excess of the normal distribution's kurtosis.

Answer 50

The sum of the sample observations divided by the sample size.

Answer 51

A sample measure of the degree of asymmetry of a distribution.

Answer 52

The positive square root of the sample variance.

Answer 53

A quantity computed from or used to describe a sample.

Answer 54

The sum of squared deviations around the mean divided by the degrees of freedom.
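
A minimal sketch of this calculation (sample variance), taking the degrees of freedom to be n - 1 as is conventional for a sample; the data are made up. The standard library's statistics.variance uses the same divisor and can serve as a cross-check.

```python
def sample_variance(sample):
    """Sum of squared deviations around the mean divided by n - 1."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / (n - 1)

# Illustrative sample (assumed): the mean is 5 and the squared deviations
# sum to 32, so the variance is 32 / 7.
var = sample_variance([2, 4, 4, 4, 5, 5, 7, 9])
std = var ** 0.5  # the sample standard deviation is the positive square root
print(var, std)
```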

Answer 55

A tool for organizing scatter plots between pairs of variables, making it easy to inspect all pairwise relationships in one combined visual.

Answer 56

Not symmetrical.

Answer 57

Refers to: 1) correlation between two variables that reflects chance relationships in a particular dataset; 2) correlation induced by a calculation that mixes each of two variables with a third variable; and 3) correlation between two variables arising not from a direct relation between them but from their relation to a third variable.

Answer 58

An alternative form for presenting the frequency distribution of two categorical variables, where bars representing the sub-groups are placed on top of each other to form a single bar. Each sub-section is shown in a different color to represent the contribution of each sub-group, and the overall height of the stacked bar represents the marginal frequency for the category.

Answer 59

The positive square root of the variance; a measure of dispersion in the same units as the original data.

Answer 60

A summary measure of a sample of observations.

Answer 61

Data that are highly organized in a pre-defined manner, usually with repeating patterns.

Answer 62

A visual device for representing textual data, which consists of words extracted from a source of textual data. The size of each distinct word is proportional to the frequency with which it appears in the given text (also known as word cloud).

Answer 63

A measure of downside risk, calculated as the square root of the average of the squared deviations of observations below the target (also called target downside deviation).
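
A hedged sketch of this downside-risk measure. The card says only "average", so the divisor is an assumption: this version divides by n - 1, where n counts all observations, and treats observations at or below the target as downside. The returns and target are invented.

```python
import math

def target_semideviation(returns, target):
    """Square root of the averaged squared deviations of observations at or below the target.

    Assumption: the average uses n - 1, with n the total number of observations.
    """
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two observations")
    downside = sum((r - target) ** 2 for r in returns if r <= target)
    return math.sqrt(downside / (n - 1))

# Illustrative returns in percent (assumed): only -2 and -4 fall below the
# target of 0, giving squared deviations 4 + 16 = 20 and 20 / 4 = 5.
tsd = target_semideviation([-2.0, 1.0, 3.0, -4.0, 2.0], target=0.0)
print(tsd)
```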

Answer 64

Describes a distribution that has relatively less weight in the tails than the normal distribution (also called platykurtic).

Answer 65

A sequence of observations for a single observational unit of a specific variable collected over time and at discrete and typically equally spaced intervals of time (such as daily, weekly, monthly, annually, or quarterly).

Answer 66

Another graphical tool for displaying categorical data. It consists of a set of colored rectangles to represent distinct groups, and the area of each rectangle is proportional to the value of the corresponding group.

Answer 67

A mean computed after excluding a stated small percentage of the lowest and highest observations.
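
One way to sketch this (a trimmed mean) is to sort the data and drop the same number of observations from each end. Treating the stated percentage as a per-tail fraction is an assumption of this sketch, as are the function name and data.

```python
def trimmed_mean(data, fraction_per_tail):
    """Mean after dropping the lowest and highest fraction_per_tail of the sorted data."""
    s = sorted(data)
    n = len(s)
    k = int(n * fraction_per_tail)  # observations removed from each tail
    if 2 * k >= n:
        raise ValueError("trimming would remove every observation")
    kept = s[k:n - k]
    return sum(kept) / len(kept)

# Illustrative data (assumed) with one extreme value; trimming 10% per tail
# drops the 1 and the 100, leaving 2 through 9.
tm = trimmed_mean([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], 0.10)
print(tm)
```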

Answer 68

A distribution that has three most frequently occurring values.

Answer 69

A popular form for organizing data for processing by computers or for presenting data visually. It is composed of columns and rows to hold multiple variables and multiple observations, respectively (also called a data table).

Answer 70

A distribution with a single value that is most frequently occurring.

Answer 71

Data that do not follow any conventionally organized forms.

Answer 72

A characteristic or quantity that can be measured, counted, or categorized and that is subject to change (also called a field, an attribute, or a feature).

Answer 73

The expected value (the probability-weighted average) of squared deviations from a random variable's expected value.

Answer 74

The presentation of data in a pictorial or graphical format for the purpose of increasing understanding and for gaining insights into the data.

Answer 75

An average in which each observation is weighted by an index of its relative importance.
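
A short sketch of this average (a weighted mean). Dividing by the sum of the weights lets the weights be any non-negative numbers, not only ones that sum to 1; the values and weights are illustrative assumptions.

```python
def weighted_mean(values, weights):
    """Average in which each observation is weighted by its relative importance."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equally long")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Illustrative inputs (assumed): 0.5 * 10 + 0.3 * 20 + 0.2 * 30 is about 17.
wm = weighted_mean([10.0, 20.0, 30.0], [0.5, 0.3, 0.2])
print(wm)
```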

Answer 76

A mean computed after assigning a stated percentage of the lowest values equal to one specified low value and a stated percentage of the highest values equal to one specified high value.
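
A rough sketch of this calculation (a winsorized mean). It assumes the "specified" low and high values are the nearest observations that remain after setting aside the tails, and that the stated percentage applies to each tail; both choices, and the data, are assumptions for illustration.

```python
def winsorized_mean(data, fraction_per_tail):
    """Mean after replacing each tail with the nearest retained observation."""
    s = sorted(data)
    n = len(s)
    k = int(n * fraction_per_tail)  # observations replaced in each tail
    if 2 * k >= n:
        raise ValueError("fraction_per_tail is too large for this sample")
    low, high = s[k], s[n - k - 1]
    winsorized = [low] * k + s[k:n - k] + [high] * k
    return sum(winsorized) / n

# Illustrative data (assumed): with 10% per tail, the 1 becomes 2 and the
# 100 becomes 9, so the adjusted values sum to 55.
wz = winsorized_mean([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], 0.10)
print(wz)
```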

Answer 77

A visual device for representing textual data, which consists of words extracted from a source of textual data. The size of each distinct word is proportional to the frequency with which it appears in the given text (also known as tag cloud).

(101 cards)