Organizing, Visualizing Data Flashcards
Numerical Data (quantitative data)
▪️ Continuous data: can take on any numerical value in a specified range of values.
▪️ Discrete data: numerical values that result from a counting process, so the data are limited to a finite (countable) number of values.
Categorical Data (qualitative data)
▪️ Nominal data: Categorical values that are not amenable to being organized in a logical order.
▪️ Ordinal data: Categorical values that can be logically ordered or ranked.
Structured Data
are highly organized in a pre-defined manner, usually with repeating patterns.
For example:
▪️ Daily closing stock price
▪️ EPS, P/E, dividend yield, ROE
Unstructured Data
are data that do not follow any conventionally organized form.
For example:
▪️ Text, social media post
▪️ Corporate regulatory filings
One-dimensional array
is the simplest format for representing a collection of data of the same data type, so it is suitable for representing a single variable.
Two-dimensional rectangular array
also called a “data table”; similar in structure to an Excel spreadsheet.
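A minimal sketch of the two array formats, using plain Python lists. The tickers, column names, and figures are made up for illustration; treating each row as one observation and each column as one variable is an assumption about the layout, not something the cards specify.

```python
# One-dimensional array: a single variable (hypothetical daily closing prices).
closing_prices = [101.2, 102.5, 101.9, 103.4]

# Two-dimensional rectangular array (data table): each row is one observation,
# each column is one variable. Tickers and figures are invented.
columns = ["ticker", "eps", "pe", "dividend_yield"]
table = [
    ["AAA", 2.10, 15.2, 0.021],
    ["BBB", 0.85, 32.7, 0.000],
    ["CCC", 4.40, 11.8, 0.035],
]

# Pulling one column out of the table gives back a one-dimensional array.
pe_index = columns.index("pe")
pe_ratios = [row[pe_index] for row in table]
print(pe_ratios)  # [15.2, 32.7, 11.8]
```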
Tree-map
It consists of a set of colored rectangles to represent distinct groups, and the area of each rectangle is proportional to the value of the corresponding group.
Heat-map
A type of graphic that organizes and summarizes data in a tabular format and represents them using a color spectrum. Heat maps are often used to display frequency distributions and to visualize relationships, such as the degree of correlation, among variables.
Trimmed mean
computed by excluding a stated small percentage of the lowest and highest values.
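A minimal sketch of a trimmed mean. It assumes the stated percentage is the total share of observations removed, split evenly between the two tails; the function name and the sample returns are hypothetical.

```python
def trimmed_mean(values, trim_pct):
    """Mean after dropping trim_pct percent of the observations in total,
    half from the low tail and half from the high tail (assumed convention)."""
    data = sorted(values)
    k = int(len(data) * trim_pct / 100 / 2)  # observations dropped per tail
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

returns = [-30, 1, 2, 3, 4, 5, 6, 7, 8, 50]
print(sum(returns) / len(returns))  # 5.6, pulled around by -30 and 50
print(trimmed_mean(returns, 20))    # 4.5, lowest and highest value dropped
```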
Winsorized mean
computed by setting a stated percentage of the lowest values equal to one specified low value, and a stated percentage of the highest values equal to one specified high value, and then taking the mean of the adjusted data.
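A minimal sketch of a winsorized mean. As a simplifying assumption, the "specified low value" is taken to be the smallest observation that is not replaced, and the "specified high value" the largest one that is not replaced; the function name and data are hypothetical.

```python
def winsorized_mean(values, pct_each_tail):
    """Mean after replacing pct_each_tail percent of the lowest values and the
    same share of the highest values with the nearest retained observation."""
    data = sorted(values)
    n = len(data)
    k = int(n * pct_each_tail / 100)     # observations replaced per tail
    low, high = data[k], data[n - k - 1]
    adjusted = [min(max(x, low), high) for x in data]
    return sum(adjusted) / n

returns = [-30, 1, 2, 3, 4, 5, 6, 7, 8, 50]
print(winsorized_mean(returns, 10))  # 4.5: -30 becomes 1 and 50 becomes 8
```

Unlike trimming, no observation is discarded, so the divisor stays at n.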
Mode
▪️ unimodal: one mode
▪️ bimodal: two modes
▪️ trimodal: three modes
Continuous data may have no mode because no value repeats. When such data are grouped into bins, however, we often find an interval with the highest frequency (the modal interval).
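A short sketch of both ideas: listing every mode of a discrete sample, and finding the modal interval of binned continuous data. The bin width, the helper name `modal_interval`, and the data are assumptions for illustration; if two bins tie, this helper reports only the first one encountered.

```python
from collections import Counter
from statistics import multimode

scores = [1, 2, 2, 3, 3, 4]
print(multimode(scores))  # [2, 3], so this sample is bimodal

def modal_interval(values, width):
    """Return (lower, upper) bounds of the equal-width bin holding the most values."""
    counts = Counter(int(v // width) for v in values)
    index, _ = counts.most_common(1)[0]
    return (index * width, (index + 1) * width)

returns = [0.4, 1.2, 1.7, 1.9, 2.3, 3.8]  # no value repeats
print(modal_interval(returns, 1.0))        # (1.0, 2.0)
```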
Percentile
▪️ quartiles: 4
▪️ deciles: 10
▪️ quintiles: 5
▪️ percentiles: 100
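L = (n + 1) × y / 100
where L: location of the y-th percentile in the data sorted in ascending order, n: number of observations, y: the percentage point of the percentile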
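A minimal sketch of the location formula in use. When L is not a whole number, the sketch interpolates linearly between the two neighbouring observations, which is a common convention but an assumption here; the function name and data are hypothetical.

```python
def percentile(values, y):
    """y-th percentile using the location L = (n + 1) * y / 100 (1-based)."""
    data = sorted(values)
    n = len(data)
    loc = (n + 1) * y / 100
    loc = min(max(loc, 1), n)   # keep the location inside the sample
    lower = int(loc)            # position of the observation at or below L
    frac = loc - lower          # fractional part used for interpolation
    if lower >= n:
        return data[-1]
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

data = [2, 4, 6, 8, 10, 12, 14, 16]   # n = 8
print(percentile(data, 25))  # L = 2.25 -> 4 + 0.25 * (6 - 4) = 4.5
print(percentile(data, 50))  # L = 4.5  -> 8 + 0.5 * (10 - 8) = 9.0
```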
Interquartile Range
IQR = Q3 - Q1
Upper / Lower fence of the Box plot
▪️ Upper fence = Q3 + 1.5 × IQR
▪️ Lower fence = Q1 - 1.5 × IQR
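A minimal sketch that computes the IQR and both fences and flags observations outside them. It relies on `statistics.quantiles` from the Python standard library with its default method; the data are made up.

```python
from statistics import quantiles

data = [2, 4, 6, 8, 10, 12, 14, 40]

q1, q2, q3 = quantiles(data, n=4)   # the three quartile cut points
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr
lower_fence = q1 - 1.5 * iqr

# Observations beyond the fences are candidates for outliers in a box plot.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(iqr, lower_fence, upper_fence, outliers)
```

Here Q1 = 4.5 and Q3 = 13.5, so the IQR is 9, the fences are -9 and 27, and only 40 falls outside them.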
Coefficient of variation
CV = s / X̄
where s: sample standard deviation, X̄: sample mean
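A minimal sketch of the coefficient of variation as the sample standard deviation divided by the sample mean. The two return series are hypothetical and chosen only to show how CV compares dispersion relative to the mean.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return stdev(values) / mean(values)

fund_a = [2, 4, 6, 8, 10]   # mean 6, wider spread
fund_b = [5, 6, 7, 8, 9]    # mean 7, narrower spread

print(round(coefficient_of_variation(fund_a), 3))
print(round(coefficient_of_variation(fund_b), 3))
```

The series with the lower CV (here `fund_b`) has less dispersion per unit of mean. The ratio is not meaningful when the mean is zero or close to zero.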