Vol. 1 LM2 Data Types Flashcards
Concept
are values that represent measured or counted quantities as a number
p. 61
numerical data
OR
quantitative data
Describe
numerical data
p. 61
are values that represent measured or counted quantities as a number
Concept
are data that can be measured and can take on any numerical value in a specified range of values
continuous data
Describe
continuous data
p. 61
are data that can be measured and can take on any numerical value in a specified range of values
Concept
are numerical values that result from a counting process.
p. 61
discrete data
Describe
discrete data
p. 61
are numerical values that result from a counting process.
Concept
are categorical values that are not amenable to being organized in a logical order
p. 61
nominal data
Describe
nominal data
p. 61
are categorical values that are not amenable to being organized in a logical order
Concept
are categorical values that can be logically ordered or ranked
p. 62
ordinal data
Identify data type
Cash dividends per share paid by a public company. Note that cash dividends are a distribution paid to shareholders based on the number of shares owned.
p. 63
Cash dividends per share are continuous data since they can take on any non-negative value.
Identify data type
Credit ratings for corporate bond issues. As background, credit ratings gauge the bond issuer’s ability to meet the promised payments on the bond. Bond rating agencies typically assign bond issues to discrete categories that are in descending order of credit quality (i.e., increasing probability of non-payment or default).
p. 63
Credit ratings are ordinal data because the rating categories can be ranked by credit quality.
Identify data type
Hedge fund classification types. Note that hedge funds are investment vehicles that are relatively unconstrained in their use of debt, derivatives, and long and short investment strategies. Hedge fund classification types group hedge funds by the kind of investment strategy they pursue.
p. 63
Hedge fund classification types are nominal data. Each type groups together hedge funds with similar investment strategies. In contrast to credit ratings for bonds, however, hedge fund classification schemes do not involve a ranking. Thus, such classification schemes are not ordinal data.
Another data classification standard is based on how data are collected, and it categorizes data into three types
p. 63
- cross-sectional data
- time series data
- panel data
Concept
is a characteristic or quantity that can be measured, counted, or categorized and is subject to change.
p. 63
variable
Describe
variable
p. 63
is a characteristic or quantity that can be measured, counted, or categorized and is subject to change
Concept
are a sequence of observations for a single observational unit of a specific variable collected over time and at discrete and typically equally spaced intervals of time
p. 64
time-series data
Describe
time-series data
p. 64
are a sequence of observations for a single observational unit of a specific variable collected over time and at discrete and typically equally spaced intervals of time
Concept
are a list of observations of a specific variable from multiple observational units at a given point in time
p. 64
cross-sectional data
Describe
cross-sectional data
p. 64
are a list of observations of a specific variable from multiple observational units at a given point in time
Concept
- are a mix of time-series and cross-sectional data that are frequently used in financial analysis and modeling.
- These data consist of observations through time on one or more variables for multiple observational units
p. 64
panel data
Concept
the observational data in this data type are usually organized in a matrix format called a data table
p. 64
panel data
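A minimal sketch of how a panel relates to the other two collection types. The EPS figures and the company labels A, B, and C are made up for illustration; one column of the data table read through time is a time series, and one row is a cross-section.

```python
# Hypothetical EPS panel: outer keys are periods, inner keys are companies.
# All values are made up for illustration.
panel = {
    "2022": {"A": 1.10, "B": 0.85, "C": 2.40},
    "2023": {"A": 1.25, "B": 0.90, "C": 2.10},
    "2024": {"A": 1.40, "B": 0.70, "C": 2.65},
}

# Time series: one observational unit (company A) followed through time.
time_series_A = [panel[year]["A"] for year in sorted(panel)]

# Cross-section: many observational units at a single point in time.
cross_section_2023 = panel["2023"]

print(time_series_A)
print(cross_section_2023)
```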
Concept
are highly organized in a pre-defined manner, usually with repeating patterns
p. 64
structured data
Describe
structured data
p. 64
are highly organized in a pre-defined manner, usually with repeating patterns
Concept
typical format of this type of data is a one-dimensional array or a two-dimensional table or matrix
p. 64
structured data
Concept
are data that do not follow any conventionally organized forms, such as financial news or company filings.
p. 65
unstructured data
Describe
unstructured data
p. 65
- are data that do not follow any conventionally organized forms
- DAGs are a format for dealing with unstructured data
- JSON is a format for semi-structured data
Which of the following is most likely to be structured data?
A. Social media posts where consumers are commenting on what they think of a company’s new product.
B. Daily closing prices during the past month for all companies listed on Japan’s Nikkei 225 stock index.
C. Audio and video of a CFO explaining her company’s latest earnings announcement to securities analysts.
p. 67
B. Daily closing prices represent structured time-series data
Which of the following statements describing panel data is most accurate?
A. It is a sequence of observations for a single observational unit of a specific variable collected over time at discrete and equally spaced intervals.
B. It is a list of observations of a specific variable from multiple observational units at a given point in time.
C. It is a mix of time-series and cross-sectional data that are frequently used in financial analysis and modeling.
p. 67
C. it is a mix of time-series and cross-sectional data
Which of the following data series is least likely to be sortable by values?
A. Daily trading volumes for stocks listed on the Shanghai Stock Exchange.
B. EPS for a given year for technology companies included in the S&P 500 Index.
C. Dates of first default on bond payments for a group of bankrupt European manufacturing companies.
C. Dates are ordinal data that can be sorted by chronological order, but not by value
Which of the following best describes a time series?
A. Daily stock prices of the XYZ stock over a 60-month period.
B. Returns on four-star rated Morningstar investment funds at the end of the most recent month.
C. Stock prices for all stocks in the FTSE100 on 31 December of the most recent calendar year.
p. 67
A. a time series is a sequence of observations of a specific variable collected over time (here, daily prices over 60 months)
Concept
data available in their original format, typically not directly usable by humans or computers
p. 67
raw data
Concept
the simplest format for representing a collection of data of the same data type, which is suitable for a single variable
p. 68
one-dimensional array
ex. vectors
Concept
summarize the central tendency and spread (variation) of the data’s distribution
p. 68
descriptive statistics
Describe
descriptive statistics
p. 68
summarize the central tendency and spread (variation) of the data’s distribution
Concept
is a tabular display of data constructed either by counting the observations of a variable by distinct values or groups or by tallying the values of a numerical variable in a set of numerically ordered bins
p. 71
frequency distribution
steps
Constructing a frequency distribution of a categorical variable
p. 71
- count the number of observations for each unique value of the variable
- construct a table listing each unique value and the corresponding counts, and then sort the records
Concept
the raw frequency that is the actual number of observations counted for each unique value
p. 71
absolute frequency
Describe
absolute frequency
the raw frequency that is the actual number of observations counted for each unique value
Concept
is calculated as the absolute frequency of each unique value of the variable divided by the total number of observations
relative frequency
Describe
relative frequency
is calculated as the absolute frequency of each unique value of the variable divided by the total number of observations
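The steps for a categorical frequency distribution can be sketched as follows. This is a minimal example using Python's standard library; the sector labels are hypothetical data chosen only for illustration.

```python
from collections import Counter

# Hypothetical sector labels for ten stocks (illustrative data only).
sectors = ["Energy", "Tech", "Tech", "Health", "Energy",
           "Tech", "Health", "Tech", "Energy", "Tech"]

absolute = Counter(sectors)        # absolute frequency of each unique value
total = sum(absolute.values())     # total number of observations

# Relative frequency = absolute frequency / total number of observations.
relative = {value: count / total for value, count in absolute.items()}

# Print the table sorted by descending count.
for value, count in absolute.most_common():
    print(f"{value:<8}{count:>3}{relative[value]:>8.1%}")
```

The relative frequencies always sum to 1, which is a quick check that no observation was dropped.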
pitfalls
binning data and constructing intervals
p. 74
- if we use too few bins, we will summarize too much and may lose pertinent characteristics
- if we use too many bins, we may not summarize enough, and potentially introduce noise into the data
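To see the trade-off, the same data can be binned with different numbers of equal-width bins. This is only a sketch: the returns are made-up values, `bin_counts` is a hypothetical helper, and equal-width bins spanning the minimum to the maximum are one common convention, not the only one.

```python
# Hypothetical daily returns in percent (illustrative data only).
returns = [-4.2, -1.3, -0.8, 0.1, 0.4, 0.9, 1.5, 2.2, 3.7, 4.9]

def bin_counts(data, k):
    """Count observations in k equal-width bins spanning min(data)..max(data)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        # The maximum value goes into the last bin instead of a (k+1)-th bin.
        i = min(int((x - lo) / width), k - 1)
        counts[i] += 1
    return counts

# Few bins hide the shape of the data; many bins leave mostly 0s and 1s.
for k in (2, 5, 10):
    print(k, bin_counts(returns, k))
```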
Concept
adds up the absolute frequencies as we move from the first bin to the last bin
p. 74
cumulative absolute frequency
Describe
cumulative absolute frequency
p. 74
- adds up the absolute frequencies as we move from the first bin to the last bin
- for the last bin, the cumulative absolute frequency will equal the number of observations in the dataset
Concept
is a sequence of partial sums of the relative frequencies
p. 74
cumulative relative frequency
Describe
cumulative relative frequency
is a sequence of partial sums of the relative frequencies
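A short sketch of both cumulative measures, assuming a hypothetical set of bin counts. Dividing the cumulative counts by the number of observations gives the same result as summing the relative frequencies, while avoiding accumulated rounding error.

```python
from itertools import accumulate

# Hypothetical absolute frequencies for five ordered bins (illustrative only).
absolute = [1, 2, 3, 2, 2]
n = sum(absolute)                              # number of observations

cum_absolute = list(accumulate(absolute))      # running total of the counts
relative = [count / n for count in absolute]
# Equivalent to the partial sums of the relative frequencies.
cum_relative = [count / n for count in cum_absolute]

for row in zip(absolute, relative, cum_absolute, cum_relative):
    print(row)
```

For the last bin, the cumulative absolute frequency equals the number of observations and the cumulative relative frequency equals 1.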
Concept
is a tabular format that displays the frequency distributions of two or more categorical variables simultaneously and is used for finding patterns between the variables
p. 77
contingency table
Concept
a contingency table for two categorical variables
p. 77
two-way table
Concept
A contingency table having R levels of one variable in rows and C levels of the other variable in columns
p. 77
R x C table
name the data representation
p. 78
5 x 3 contingency table
Name the data type
p. 78
joint frequencies
Name the data type
p. 78
marginal frequencies
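A minimal sketch of a two-way table: joint frequencies are the counts in the interior cells, and marginal frequencies are the row and column totals. The fund style and risk level observations below are hypothetical.

```python
from collections import Counter

# Hypothetical (fund style, risk level) observations; illustrative data only.
funds = [("Growth", "High"), ("Growth", "High"), ("Growth", "Low"),
         ("Value", "Low"), ("Value", "Low"), ("Value", "High"),
         ("Blend", "Low"), ("Blend", "High")]

joint = Counter(funds)                            # joint frequency per cell
row_totals = Counter(style for style, _ in funds) # marginal frequencies (style)
col_totals = Counter(risk for _, risk in funds)   # marginal frequencies (risk)

rows, cols = sorted(row_totals), sorted(col_totals)
print(f"{'':<8}" + "".join(f"{c:>6}" for c in cols) + f"{'Total':>7}")
for s in rows:
    cells = "".join(f"{joint[(s, c)]:>6}" for c in cols)
    print(f"{s:<8}{cells}{row_totals[s]:>7}")
print(f"{'Total':<8}" + "".join(f"{col_totals[c]:>6}" for c in cols)
      + f"{len(funds):>7}")
```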
Name the table
p. 80
Confusion Matrix for Bond Default Prediction Model
Describe
chi-square test of independence
p. 80
- A way to test for a potential association between categorical variables
- the procedure involves constructing a contingency table
- the actual values and expected values are used to derive the chi-square test statistic
Concept
the actual values and expected values from a contingency table are used to derive this value
p. 80
chi-square test statistic
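A sketch of how the statistic is derived, assuming a hypothetical 2 x 2 table of observed counts. The expected count for each cell under independence is (row total × column total) / overall total, and the statistic sums (observed − expected)² / expected over all cells, with (R − 1)(C − 1) degrees of freedom.

```python
# Hypothetical 2 x 2 contingency table of observed counts (illustrative only).
observed = [[20, 30],
            [30, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: (row total * column total) / overall total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over all cells of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(observed, expected)
                 for o, e in zip(obs_row, exp_row))

# Degrees of freedom for an R x C table: (R - 1) * (C - 1).
dof = (len(row_totals) - 1) * (len(col_totals) - 1)

print(chi_square, dof)
```

The statistic is then compared with a critical value from the chi-square distribution with the computed degrees of freedom.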
Describe how the contingency table is used to set up a test for independence between fund style and risk level.
p. 81