Organizing, Visualizing, and Describing Data Flashcards
Data (Definition)
a collection of numbers, characters, words, or text that represents FACTS or INFORMATION but NOT KNOWLEDGE (knowledge develops only through analysis and interpretation of those facts and information)
What are the two main types of Data?
Numerical (quantitative) and Categorical (qualitative)
What is the definition of Categorical data?
Values that describe a quality or characteristic (mutually exclusive labels or groups)
What are the two types of Categorical data?
Nominal and Ordinal
What is the definition of nominal data?
No logical order (e.g. sectors of the economy)
What is the definition of ordinal data?
Has logical order or rank (note that there is no information in the distance between groups)
What is the definition of Numerical data?
Data that represent measured or counted quantities
What are the two types of Numerical data?
Integer/Discrete - limited to a finite number of values (number of people)
Ratio/Continuous - can take on any value within a range
What does NOIR stand for and how does it relate to data types?
Nominal
Ordinal
Integer/Discrete
Ratio/Continuous
Define variable
a particular quality or characteristic (Stock price, height)
Define observation
a value of a specific variable (GM’s stock price is $53.30; Trish’s height is 5’9”)
Define cross-sectional data
multiple observations of a particular variable across different observational units at the same point in time (the stock price of 60 companies)
Define time series
multiple observations of a particular variable for the same observational unit over time, i.e. one unit and multiple observations (GM’s stock price over the last 60 months)
Define panel data set
cross-sectional and time-series combined
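The relationship between the three layouts can be sketched in Python. This is only an illustration: the tickers, months, and all prices other than GM's $53.30 are hypothetical, and the names `panel`, `cross_section`, and `time_series` are chosen for this example.

```python
# Hypothetical prices, for illustration only.
# Panel data: each entry maps (company, month) to a price.
panel = {
    ("GM", "2024-01"): 53.30, ("GM", "2024-02"): 54.10,
    ("F", "2024-01"): 12.05, ("F", "2024-02"): 12.40,
}

# Cross-sectional slice: many units, one point in time.
cross_section = {co: p for (co, month), p in panel.items() if month == "2024-01"}

# Time-series slice: one unit, many points in time.
time_series = {month: p for (co, month), p in panel.items() if co == "GM"}

print(cross_section)
print(time_series)
```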
Define structured data
Highly organized in a pre-defined manner (stock prices, returns, EPS)
Define unstructured data
no organized form (news, social media post, company filings, audio/video)
Define absolute frequency
the actual count of observations per value of the variable
Define relative frequency
Percentage of observations per value of the variable (the absolute frequency divided by the total number of observations, N)
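A minimal sketch of both frequencies in Python, assuming a small made-up list of sector labels (the data and variable names are hypothetical):

```python
from collections import Counter

# Hypothetical sample: sector labels for eight stocks (illustrative data only).
sectors = ["Tech", "Energy", "Tech", "Health", "Energy", "Tech", "Health", "Tech"]

# Absolute frequency: the actual count of observations for each value.
absolute = Counter(sectors)

# Relative frequency: absolute frequency divided by the total number of observations N.
n = len(sectors)
relative = {value: count / n for value, count in absolute.items()}

for value in absolute:
    print(f"{value}: absolute={absolute[value]}, relative={relative[value]:.1%}")
```

The relative frequencies always sum to 1 (100%), which is a quick check on the calculation.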
How to create non-overlapping bins
Sort data in ascending order
Find the range: max-min
Decide on the number of intervals (k)
Calculate the interval width by dividing the range by k (always round up)
Add the interval width to the minimum value to get the end of the first bin, then repeat for each following bin
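The steps above can be sketched in Python. This is one possible reading, not a definitive implementation: it assumes the width is rounded up to a whole number, that each bin includes its lower bound and excludes its upper bound (except the last bin, which includes both), and the helper names `make_bins` and `bin_index` are hypothetical.

```python
import math

def make_bins(data, k):
    """Return k non-overlapping (lower, upper) intervals covering the data."""
    ordered = sorted(data)                 # sort in ascending order
    low, high = ordered[0], ordered[-1]
    data_range = high - low                # range = max - min
    width = math.ceil(data_range / k)      # width = range / k, rounded up (assumed: to a whole number)
    # Add the width repeatedly, starting from the minimum value.
    return [(low + i * width, low + (i + 1) * width) for i in range(k)]

def bin_index(value, bins):
    """Index of the bin holding value; bins are [lower, upper), last bin also includes its upper bound."""
    for i, (lower, upper) in enumerate(bins):
        is_last = i == len(bins) - 1
        if lower <= value < upper or (is_last and value == upper):
            return i
    raise ValueError(f"{value} is outside the bins")

values = [-4, 12, 3, 7, -1, 9, 0, 5]       # hypothetical observations
bins = make_bins(values, k=4)
print(bins)
print([bin_index(v, bins) for v in values])
```

Rounding the width up ensures the k bins together reach at least the maximum, so no observation is left out.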
What is a Contingency table?
it’s a table that summarizes data for 2 or more categorical variables (helps visually find patterns)
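A small sketch of a contingency table for two categorical variables, using only the Python standard library. The stocks, sectors, and size labels are hypothetical.

```python
from collections import Counter

# Hypothetical observations: (sector, size) for each stock.
stocks = [
    ("Tech", "Large"), ("Tech", "Small"), ("Energy", "Large"),
    ("Tech", "Large"), ("Energy", "Small"), ("Energy", "Small"),
]

counts = Counter(stocks)  # joint frequency of each (sector, size) pair
rows = sorted({sector for sector, _ in stocks})
cols = sorted({size for _, size in stocks})

# Print the table with a row total in the last column.
print(f"{'':8}" + "".join(f"{c:>8}" for c in cols) + f"{'Total':>8}")
for r in rows:
    cells = [counts[(r, c)] for c in cols]
    print(f"{r:8}" + "".join(f"{v:>8}" for v in cells) + f"{sum(cells):>8}")
```

Each cell holds the absolute frequency of one combination of the two variables, and the cells sum to the total number of observations.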
What does a histogram or frequency polygon show?
represents the distribution of numerical data (y-axis shows frequency and x-axis shows intervals/values)
What does a bar chart show?
Represent the frequency distribution of categorical data
What does a tree map show?
a set of coloured rectangles that represent groups, with the area of each rectangle proportional to the value of its group
What is a line chart used for?
Used to visualize ordered observations
Typically used for time series data
Facilitates showing changes and underlying trends