Week 1 & 2 Flashcards
Why do we need to know data analysis?
There is a problem that needs to be solved and we need data and analytics to properly act on it
What is a population?
All entities of interest in a study
What is a sample?
A subset or portion of the population, typically chosen at random
What is a dataset?
A table of data with variables in the columns (vertical) and observations in the rows (horizontal)
What are some examples of variables?
height, gender, income
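As a rough illustration of the dataset layout, here is a small Python sketch. The values and the name `height_cm` are made up for the example; only the three variables (height, gender, income) come from the card above.

```python
# Each dict is one observation (a row); each key is a variable (a column).
# All values below are hypothetical.
dataset = [
    {"height_cm": 170, "gender": "F", "income": 52000},
    {"height_cm": 182, "gender": "M", "income": 61000},
    {"height_cm": 165, "gender": "F", "income": 48000},
]

variables = list(dataset[0].keys())               # the column names
heights = [row["height_cm"] for row in dataset]   # one variable across all observations

print("variables:", variables)
print("number of observations:", len(dataset))
print("height column:", heights)
```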
What are some data types?
Numeric vs categorical; Ordinal vs nominal
What is numeric?
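A variable on which meaningful arithmetic can be performed
What is categorical?
Any variable that is not numeric: its values are category labels, so arithmetic on them is not meaningful (even when the categories are coded as numbers)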
What is ordinal?
There is a natural ordering of categories
What is nominal?
No natural ordering
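What is a binary (dummy) variable?
A 0/1 variable that indicates whether an observation belongs to a category; a categorical variable with n different categories can be represented by n-1 such 0/1 variables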
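One way to picture the n-1 idea is the sketch below. The helper `dummy_encode`, the `seasons` data, and the choice of the alphabetically first category as the baseline are all assumptions made for illustration, not part of the course material.

```python
def dummy_encode(values, baseline=None):
    """Encode a categorical variable with n categories as n-1 columns of 0/1 values."""
    categories = sorted(set(values))
    if baseline is None:
        baseline = categories[0]  # assumption: alphabetically first category is the baseline
    kept = [c for c in categories if c != baseline]
    # One row per observation, one 0/1 column per non-baseline category.
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


seasons = ["fall", "spring", "summer", "winter", "summer"]  # hypothetical data, n = 4
columns, encoded = dummy_encode(seasons)

print(columns)  # 3 columns are enough for 4 categories
for season, row in zip(seasons, encoded):
    print(season, row)  # the baseline category shows up as the all-zeros row
```

The baseline category needs no column of its own because it is implied whenever all the other columns are 0.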
What is binning or discretizing?
Grouping the values of a numeric variable into a small number of discrete categories (bins), e.g. ages into age ranges
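A minimal sketch of binning, assuming hypothetical cut points at 18 and 65. The function name `bin_age` and the labels are illustrative only.

```python
def bin_age(age):
    """Map a numeric age to a discrete age-group label (hypothetical cut points)."""
    if age < 18:
        return "under 18"
    elif age < 65:
        return "18-64"
    else:
        return "65+"


ages = [4, 17, 18, 35, 64, 65, 90]
groups = [bin_age(a) for a in ages]
print(groups)
```

The result is an ordinal categorical variable: the bins keep a natural order, but the exact ages are no longer available.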
What are some more data types?
Discrete vs continuous; Cross sectional vs time series
What is discrete?
Count data (e.g. # of children)
What is continuous?
Continuous measurement like weight
What is cross sectional?
Data on a cross section of a population at a FIXED point in time
What is time series?
Data that are collected over time
What is an outlier?
An observation that lies outside of the norm (doesn’t mean it’s wrong)
What are missing values?
Values of a variable that are missing for one or more observations
What to do with missing values?
Ignore the affected observations, fill in the variable's average value, or estimate the missing value
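The first two options could look like this in Python. The `incomes` data is invented, and `None` is just one possible way to mark a missing value. The third option (estimating the value) depends on the data and is not shown.

```python
from statistics import mean

incomes = [52000, None, 61000, 48000, None, 59000]  # hypothetical; None marks a missing value

# Option 1: ignore the observations with a missing value.
observed = [x for x in incomes if x is not None]

# Option 2: replace each missing value with the average of the observed values.
average = mean(observed)
filled = [average if x is None else x for x in incomes]

print(len(observed), "observed values, average =", average)
print(filled)
```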
What to do with an outlier?
Run analysis and report with and without the outlier
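A small sketch of reporting with and without the outlier, using made-up salary figures in which 250 is the suspected outlier:

```python
from statistics import mean, median

salaries = [41, 43, 45, 47, 49, 250]  # hypothetical, in thousands; 250 is the suspected outlier
without_outlier = [s for s in salaries if s != 250]

# Report the same summary measures for both versions of the data.
for label, data in [("with outlier", salaries), ("without outlier", without_outlier)]:
    print(label, "| mean =", round(mean(data), 1), "| median =", median(data))
```

Showing both versions lets the reader see how much a single observation moves the mean, while the median barely changes.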
What is the most useful numerical summary measure?
Correlation
What is the most useful graph?
Scatter plot
What is the tool to compare numerical variables across two or more subpopulations?
Side-by-Side Boxplots
Tools to study relationships among numeric variables?
Scatterplot, correlation, and covariance
What is a scatterplot?
A 2D graph that plots the paired values of 2 numerical variables, often used to examine relationships (e.g. temperature and sales)
What are correlation and covariance?
Measures of the strength and direction of a LINEAR relationship between 2 numerical variables: X & Y
Note:
X & Y should be paired variables
Xi and Yi for observation i
n: Number of observations
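The notation above (Xi, Yi, n) can be turned into a short Python sketch. It assumes the sample versions of the formulas, which divide by n - 1; the temperature and sales numbers are invented.

```python
from math import sqrt


def covariance(xs, ys):
    """Sample covariance: sum of (Xi - mean of X) * (Yi - mean of Y), divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Correlation: covariance divided by the product of the two standard deviations."""
    sd_x = sqrt(covariance(xs, xs))  # the covariance of X with itself is its variance
    sd_y = sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)


temperature = [60, 65, 70, 75, 80]   # hypothetical Xi
sales = [200, 230, 260, 270, 340]    # hypothetical Yi, paired with Xi by observation

print("covariance:", covariance(temperature, sales))
print("correlation:", round(correlation(temperature, sales), 3))
```

Covariance depends on the units of X and Y, while correlation is unit-free and always falls between -1 and 1, which is why it is easier to interpret.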
What is a perfect positive correlation?
A scatterplot in which the points fall exactly on an upward-sloping straight line (Value = 1)
What is a perfect negative correlation?
A scatterplot in which the points fall exactly on a DOWNWARD-sloping straight line (Value = -1)
What is NO CORRELATION, and what is its value?
A scatterplot in which the points are spread out with no linear trend. Value = 0
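The three cases can be checked with a compact Python sketch. The data sets are constructed purely for illustration, and `correlation` is a hand-written helper rather than a library call.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation of paired values, computed from sums of deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
upward = [2 * v + 1 for v in x]      # points exactly on an upward-sloping line
downward = [10 - 3 * v for v in x]   # points exactly on a downward-sloping line
no_linear = [4, 1, 0, 1, 4]          # symmetric U shape: no linear trend

r_up = correlation(x, upward)
r_down = correlation(x, downward)
r_none = correlation(x, no_linear)
print(r_up, r_down, r_none)
```

The U-shaped example is a reminder that a correlation of 0 only rules out a linear relationship; the variables can still be related in some other way.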