CFAI BBE - Organizing, Visualizing, and Describing Data Flashcards
Identifying Data Types (I)
Identify the data type for each of the following kinds of investment-related
information:
- Number of coupon payments for a corporate bond. As background, a
corporate bond is a contractual obligation between an issuing corporation
(i.e., borrower) and bondholders (i.e., lenders) in which the issuer agrees
to pay interest—in the form of fixed coupon payments—on specified
dates, typically semi-annually, over the life of the bond (i.e., to its maturity
date) and to repay principal (i.e., the amount borrowed) at maturity.
Solution to 1
The number of coupon payments is discrete data. For example, a newly issued
5-year corporate bond paying interest semi-annually (quarterly) will make 10
(20) coupon payments during its life. The number of coupon payments can take
only countable, whole-number values; so, it is discrete.
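The counts in this solution follow from simple multiplication, sketched here
in Python. The function name coupon_count is hypothetical and chosen only for
illustration; it assumes the bond pays the same number of coupons every year.

```python
def coupon_count(years_to_maturity: int, payments_per_year: int) -> int:
    """Return the number of coupon payments over the bond's life."""
    return years_to_maturity * payments_per_year

semi_annual = coupon_count(5, 2)  # 10 payments
quarterly = coupon_count(5, 4)    # 20 payments
print(semi_annual, quarterly)
```

The result is always a whole number, which is what makes the variable
discrete rather than continuous.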
Identifying Data Types (I)
Identify the data type for each of the following kinds of investment-related
information:
- Cash dividends per share paid by a public company. Note that cash
dividends are a distribution paid to shareholders based on the number of
shares owned.
Solution to 2
Cash dividends per share are continuous data since they can take on any
non-negative value.
Identifying Data Types (I)
Identify the data type for each of the following kinds of investment-related
information:
- Credit ratings for corporate bond issues. As background, credit ratings
gauge the bond issuer’s ability to meet the promised payments on the
bond. Bond rating agencies typically assign bond issues to discrete categories
that are in descending order of credit quality (i.e., increasing probability
of non-payment or default).
Solution to 3
Credit ratings are ordinal data. A rating places a bond issue in a category, and
the categories are ordered with respect to the expected probability of default.
But arithmetic operations cannot be done on credit ratings, and the difference in
the expected probability of default between categories of highly rated bonds, for
example, is not necessarily equal to that between categories of lowly rated bonds.
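A minimal sketch of what "ordered but not arithmetic" means in practice. The
rating labels and their positions below are illustrative assumptions, not any
agency's actual scale.

```python
# Hypothetical rating scale, listed from highest to lowest credit quality.
RATING_ORDER = ["AAA", "AA", "A", "BBB", "BB", "B"]
RANK = {label: position for position, label in enumerate(RATING_ORDER)}

def is_higher_quality(rating_a: str, rating_b: str) -> bool:
    """True if rating_a sits above rating_b on the assumed scale."""
    return RANK[rating_a] < RANK[rating_b]

bonds = ["BBB", "AAA", "B", "AA"]
sorted_bonds = sorted(bonds, key=RANK.__getitem__)  # ranking is meaningful
print(sorted_bonds)
print(is_higher_quality("AA", "BBB"))
# The rank numbers only record position. RANK["BBB"] - RANK["A"] equals
# RANK["AA"] - RANK["AAA"], yet the default-probability gaps need not match.
```

Sorting and comparing are valid operations here; averaging or subtracting the
rank numbers is not.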
Identifying Data Types (I)
Identify the data type for each of the following kinds of investment-related
information:
- Hedge fund classification types. Note that hedge funds are investment
vehicles that are relatively unconstrained in their use of debt, derivatives,
and long and short investment strategies. Hedge fund classification types
group hedge funds by the kind of investment strategy they pursue.
Solution to 4
Hedge fund classification types are nominal data. Each type groups together
hedge funds with similar investment strategies. In contrast to credit ratings for
bonds, however, hedge fund classification schemes do not involve a ranking.
Thus, such classification schemes are not ordinal data.
Identifying Data Types (II)
Which of the following is most likely to be structured data?
A Social media posts where consumers are commenting on what they
think of a company’s new product.
B Daily closing prices during the past month for all companies listed on
Japan’s Nikkei 225 stock index.
C Audio and video of a CFO explaining her company’s latest earnings
announcement to securities analysts.
B is correct as daily closing prices constitute structured data. A is incorrect as
social media posts are unstructured data. C is incorrect as audio and video are
unstructured data.
Identifying Data Types (II)
Which of the following statements describing panel data is most accurate?
A It is a sequence of observations for a single observational unit of a
specific variable collected over time at discrete and equally spaced
intervals.
B It is a list of observations of a specific variable from multiple observational
units at a given point in time.
C It is a mix of time-series and cross-sectional data that are frequently
used in financial analysis and modeling.
C is correct as it most accurately describes panel data. A is incorrect as it
describes time-series data. B is incorrect as it describes cross-sectional
data.
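The three definitions can be illustrated with a small sketch. The company
names, dates, and prices are made up, and the helper names time_series and
cross_section are hypothetical.

```python
# Hypothetical panel: (company, date) -> closing price. All figures are made up.
panel = {
    ("Company A", "2023-01-31"): 10.0,
    ("Company A", "2023-02-28"): 10.5,
    ("Company B", "2023-01-31"): 20.0,
    ("Company B", "2023-02-28"): 19.0,
}

def time_series(data, unit):
    """One observational unit followed across time."""
    return {date: value for (u, date), value in data.items() if u == unit}

def cross_section(data, date):
    """Many observational units at a single point in time."""
    return {u: value for (u, d), value in data.items() if d == date}

print(time_series(panel, "Company A"))
print(cross_section(panel, "2023-01-31"))
```

Slicing the panel by unit gives a time series, and slicing it by date gives a
cross-section, which is why panel data is described as a mix of the two.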
Identifying Data Types (II)
Which of the following data series is least likely to be sortable by values?
A Daily trading volumes for stocks listed on the Shanghai Stock
Exchange.
B EPS for a given year for technology companies included in the S&P
500 Index.
C Dates of first default on bond payments for a group of bankrupt
European manufacturing companies.
C is correct as dates are ordinal data that can be sorted by chronological order but
not by value. A and B are incorrect as both daily trading volumes and earnings
per share (EPS) are numerical data, so they can be sorted by values.
Identifying Data Types (II)
Which of the following best describes a time series?
A Daily stock prices of the XYZ stock over a 60-month period.
B Returns on four-star rated Morningstar investment funds at the end of
the most recent month.
C Stock prices for all stocks in the FTSE100 on 31 December of the most
recent calendar year.
A is correct since a time series is a sequence of observations of a specific variable
(XYZ stock price) collected over time (60 months) and at discrete intervals of
time (daily). B and C are both incorrect as they are cross-sectional
data.
Evaluating Data Visuals
You have a cumulative absolute frequency distribution graph (similar to
the one in Exhibit 21) of daily returns over a five-year period for an index
of Asian equity markets.
Interpret the meaning of the slope of such a graph.
The slope of the graph of a cumulative absolute frequency distribution reflects
the change in the number of observations between two adjacent return bins. A
steep (flat) slope indicates a large (small) change in the frequency of observations
between adjacent return bins.
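A short sketch of this interpretation, using made-up bin frequencies. The
slope between two adjacent bins is the change in the cumulative count, which
equals the absolute frequency of the later bin.

```python
from itertools import accumulate

# Hypothetical absolute frequencies for five adjacent return bins.
bin_frequencies = [5, 40, 300, 45, 10]

cumulative = list(accumulate(bin_frequencies))
# Slope between adjacent bins = change in the cumulative count.
slopes = [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]

print(cumulative)  # [5, 45, 345, 390, 400]
print(slopes)      # [40, 300, 45, 10]
```

The steepest segment (300) falls where most observations sit, and the flat
segments at either end correspond to sparsely populated bins.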
Evaluating Data Visuals
You are creating a word cloud for a visual representation of text on a
company’s quarterly earnings announcements over the past three years.
The word cloud uses font size to indicate word frequency. This particular
company has experienced both quarterly profits and losses during the
period under investigation.
Describe how the word cloud might be used to convey information
besides word frequency.
Color can add an additional dimension to the information conveyed in the word
cloud. For example, red can be used for “losses” and other words conveying negative
sentiment, and green can be used for “profit” and other words indicative
of positive sentiment.
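One way to picture this is a sketch that assigns each word both a font size
and a color. The word counts, the sentiment word lists, and the size formula
are all illustrative assumptions rather than a standard method.

```python
# Hypothetical word counts and sentiment word lists, for illustration only.
word_counts = {"profit": 12, "losses": 7, "revenue": 9, "growth": 5}
NEGATIVE = {"losses", "decline"}
POSITIVE = {"profit", "growth"}

def style(word: str, count: int, max_count: int) -> dict:
    """Font size scales with frequency; color encodes assumed sentiment."""
    if word in NEGATIVE:
        color = "red"
    elif word in POSITIVE:
        color = "green"
    else:
        color = "gray"
    return {"size": round(10 + 30 * count / max_count), "color": color}

max_count = max(word_counts.values())
styles = {w: style(w, c, max_count) for w, c in word_counts.items()}
print(styles)
```

Size and color vary independently, so the cloud carries two dimensions of
information at once.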
Evaluating Data Visuals
You are examining a scatter plot of monthly stock returns, similar to the
one in Exhibit 30, for two technology companies: one is a hardware manufacturer,
and the other is a software developer. The scatter plot shows a
strong positive association between their returns.
Describe what other information the scatter plot can provide.
Besides the sign and degree of association of the stocks’ returns, the scatter
plot can provide a visual representation of whether the association is linear or
non-linear, the maximum and minimum values for the return observations, and
an indication of which observations may have extreme values (i.e., are potential
outliers).
Evaluating Data Visuals
You are reading a vertical bar chart displaying the sales of a company over
the past five years. The sales of the first four years seem nearly flat as the
corresponding bars are nearly the same height, but the bar representing
the sales of the most recent year is approximately three times as high as
the other bars.
Explain whether we can conclude that the sales of the fifth year tripled
compared to sales in the earlier years.
Typically, the heights of bars in a vertical bar chart are proportional to the values
that they represent. However, if the graph is using a truncated y-axis (i.e., one
that does not start at zero), then values are not accurately represented by the
height of bars. Therefore, we need to examine the y-axis of the bar chart before
concluding that sales in the fifth year were triple the sales of the prior years.
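The effect of a truncated axis can be checked with simple arithmetic. In this
sketch the sales figures and the axis starting point are made-up values, and
bar_height_ratio is a hypothetical helper name.

```python
def bar_height_ratio(value_a: float, value_b: float, axis_start: float) -> float:
    """Ratio of drawn bar heights when the y-axis begins at axis_start."""
    return (value_a - axis_start) / (value_b - axis_start)

# Hypothetical sales: 100 in the earlier years, 120 in the fifth year.
print(bar_height_ratio(120, 100, axis_start=0))   # 1.2
print(bar_height_ratio(120, 100, axis_start=90))  # 3.0
```

With the axis starting at 90, a 20% increase in sales is drawn as a bar three
times as tall.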
Selecting Visualization Types
A portfolio manager plans to buy several stocks traded on a small emerging
market exchange but is concerned whether the market can provide
sufficient liquidity to support her purchase order size. As the first step,
she wants to analyze the daily trading volumes of one of these stocks over
the past five years.
Explain which type of chart can best provide a quick view of trading volume
for the given period.
The five-year history of daily trading volumes contains a large amount of
numerical data. Therefore, a histogram is the best chart for grouping these data
into frequency distribution bins and for showing a quick snapshot of the shape,
center, and spread of the data’s distribution.
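The grouping step behind a histogram can be sketched as follows. The volumes
and the bin width are assumed values; a real five-year daily series would be
far longer.

```python
from collections import Counter

# Hypothetical daily trading volumes (shares), for illustration only.
volumes = [1200, 3400, 2500, 800, 4100, 2900, 3100, 1500, 2700, 3600]
bin_width = 1000  # assumed bin width

# Map each observation to the lower edge of its bin, then count per bin.
bin_counts = Counter((v // bin_width) * bin_width for v in volumes)

for lower_edge in sorted(bin_counts):
    upper_edge = lower_edge + bin_width
    print(f"{lower_edge:>5}-{upper_edge:<5} {'#' * bin_counts[lower_edge]}")
```

Each printed row is one bin, and the row lengths trace out the shape of the
distribution.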
Selecting Visualization Types
An analyst is building a model to predict stock market downturns.
According to the academic literature and his practitioner knowledge and
expertise, he has selected 10 variables as potential predictors. Before
continuing to construct the model, the analyst would like to get a sense
of how closely these variables are associated with the broad stock market
index and whether any pair of variables are associated with each other.
Describe the most appropriate visual to select for this purpose.
To inspect for a potential relationship between two variables, a scatter plot is
a good choice. But with 10 variables, plotting individual scatter plots is not an
efficient approach. Instead, utilizing a scatter plot matrix would give the analyst
a good overview in one comprehensive visual of all the pairwise associations
between the variables.
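To see why individual plots are inefficient, count the distinct pairs. This
sketch assumes the market index is examined alongside the 10 predictors, and
the variable names are hypothetical.

```python
from itertools import combinations

# Hypothetical names: the market index plus 10 candidate predictors.
variables = ["index"] + [f"predictor_{i}" for i in range(1, 11)]

pairs = list(combinations(variables, 2))
print(len(variables))  # 11
print(len(pairs))      # 55
```

Under that assumption there are 55 pairwise relationships, which a scatter
plot matrix arranges into a single grid.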
Selecting Visualization Types
Central Bank members meet regularly to assess the economy and decide
on any interest rate changes. Minutes of their meetings are published on
the Central Bank’s website. A quantitative researcher wants to analyze the
meeting minutes for use in building a model to predict future economic
growth.
Explain which type of chart is most appropriate for creating an overview
of the meeting minutes.
Since the meeting minutes consist of textual data, a word cloud would be the
most suitable tool to visualize the textual data and facilitate the researcher’s
understanding of the topic of the text as well as the sentiment, positive or negative,
it may convey.
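The word frequencies that drive a word cloud could be computed along these
lines. The text excerpt and the stop-word list are made up for illustration,
and the tokenizer is a deliberately simple assumption.

```python
import re
from collections import Counter

# Hypothetical excerpt standing in for published meeting minutes.
minutes = (
    "Members noted that inflation remained elevated. "
    "Inflation risks and growth risks were discussed."
)
STOP_WORDS = {"that", "and", "were", "the"}  # assumed, deliberately tiny list

# Lowercase the text, keep alphabetic tokens, and drop stop words.
tokens = re.findall(r"[a-z]+", minutes.lower())
frequencies = Counter(t for t in tokens if t not in STOP_WORDS)
print(frequencies.most_common(3))
```

The most frequent words would be drawn largest, giving the researcher a quick
view of the dominant topics.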