Data Exploration Flashcards
1
Q
What is EDA?
A
EDA: Exploratory Data Analysis
Descriptive Statistics
Graphical
Data-driven
2
Q
What is CDA?
A
CDA: Confirmatory Data Analysis
Inferential statistics
EDA- and theory-driven
3
Q
How to describe data?
A
Describe data
- Case: a single object on which several variables are measured, e.g. a person or an email
- Variable: a property expressed as a number or a category
4
Q
Types of data
A
Types of data
- Qualitative (categorical)
  - Nominal (the number is just a symbol that identifies a quality)
  - Ordinal (rank order)
- Quantitative (continuous and discrete)
  - Interval (units are of identical size, e.g. years)
  - Ratio (distance from an absolute zero, e.g. age)
5
Q
Variables measures
A
Variables
Measure of central tendency
Mean (average value)
Median (middle value)
Mode (most frequent value)
Measure of variability
Variance (spread around the mean)
Standard deviation
Standard error of the mean (estimate)
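Worked example (a minimal sketch, not part of the original card): the measures above computed with Python's standard `statistics` module. The `scores` list and the `describe` helper are made up for illustration, and the sample (n - 1) formulas are assumed for variance and standard deviation.

```python
import math
import statistics

# Hypothetical sample, invented for illustration only.
scores = [2, 4, 4, 5, 7, 8]

def describe(values):
    """Return the central tendency and variability measures for a sample."""
    n = len(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    return {
        "mean": statistics.mean(values),          # average value
        "median": statistics.median(values),      # middle value
        "mode": statistics.mode(values),          # most frequent value
        "variance": statistics.variance(values),  # spread around the mean
        "sd": sd,                                 # square root of the variance
        "sem": sd / math.sqrt(n),                 # standard error of the mean
    }

summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The standard error of the mean is an estimate of how far the sample mean is likely to be from the population mean, which is why it shrinks as the sample size grows.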
6
Q
Outliers and errors: what are they and how do you fix them?
A
- A mistake in the collection phase (typo)?
- Data actually from outside the targeted population?
- Multiple distributions?
- Simple chance?
- Complications
What to do?
- If you find a mistake: fix or delete
- If you find an outlier: trim, winsorize, or delete
- If your distribution is skewed: transform the data
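Worked example (a sketch under stated assumptions, not part of the original card): the three remedies in plain Python. The helper names `trim`, `winsorize`, and `log_transform`, the sample `data`, and the hand-picked bounds are all hypothetical. In practice the bounds are often taken from percentiles, and a log transform is only one of several options for skewed data.

```python
import math

def trim(values, lower, upper):
    """Trimming: drop every value outside [lower, upper]."""
    return [v for v in values if lower <= v <= upper]

def winsorize(values, lower, upper):
    """Winsorizing: keep every case, but clamp extremes to the nearest bound."""
    return [min(max(v, lower), upper) for v in values]

def log_transform(values):
    """Natural log transform; assumes all values are strictly positive."""
    return [math.log(v) for v in values]

# Hypothetical sample; 40 is the suspicious value.
data = [3, 4, 5, 5, 6, 7, 40]

trimmed = trim(data, 0, 10)         # the outlier is removed
winsorized = winsorize(data, 3, 7)  # the outlier is pulled in to 7
logged = log_transform(data)        # the long right tail is compressed

print(trimmed)
print(winsorized)
print([round(v, 2) for v in logged])
```

Trimming shrinks the sample, while winsorizing keeps the sample size and only reduces the influence of the extreme value.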
7
Q
Interpret standard deviation (SD)
A
Interpreting standard deviation (SD)
- SD tells you how the scores are distributed around the mean
- A high SD (relative to the mean) indicates that the scores are spread out
- A low SD tells you that most scores are very near the mean
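Worked example (a minimal sketch, not part of the original card): two made-up samples with the same mean but very different SDs. The `relative_sd` helper is hypothetical. It divides the SD by the mean, a ratio sometimes called the coefficient of variation, to make "relative to the mean" concrete.

```python
import statistics

# Hypothetical samples with the same mean (50) but different spread.
tight = [49, 50, 50, 51, 50]
spread = [10, 30, 50, 70, 90]

def relative_sd(values):
    """Sample SD divided by the mean; assumes a non-zero mean."""
    return statistics.stdev(values) / statistics.mean(values)

for name, sample in (("tight", tight), ("spread", spread)):
    print(
        name,
        "mean =", statistics.mean(sample),
        "sd =", round(statistics.stdev(sample), 2),
        "sd/mean =", round(relative_sd(sample), 3),
    )
```

Both samples share a mean, so the mean alone cannot tell them apart. The SD shows that scores in `tight` sit close to 50 while scores in `spread` range widely around it.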