P1.F.4.2 Data Analytics - Data Mining Flashcards
Data Mining
P1.F.4.2 Data Analytics - Data Mining
- Techniques to extract information from relatively meaningless data.
- Overlap between business analytics and data analytics.
- Can surface answers to questions that have not been asked yet.
Supervised vs. Unsupervised Data Mining
Supervised
- Attempts to predict a dependent variable
Example: forecasting, such as using loan history and customer characteristics to predict charge-offs and loan performance
Unsupervised
- Looks for relationships among all variables, with no designated dependent variable
Example: clustering
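The contrast between the two approaches can be sketched in a few lines of Python. This is a minimal illustration, not a production method: the helper names `fit_line` and `two_means`, the choice of least-squares regression and 2-means clustering, and all the numbers are assumptions invented for the example. The supervised half learns from records whose outcome (the dependent variable) is already known. The unsupervised half receives no outcome at all and only groups similar values.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def two_means(values, iterations=10):
    """Tiny 1-D k-means with k=2; returns the two cluster centers."""
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        near_lo = [v for v in values if abs(v - lo) <= abs(v - hi)]
        near_hi = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not near_hi:  # all values identical: nothing to split
            break
        lo, hi = mean(near_lo), mean(near_hi)
    return lo, hi

# Supervised: the outcome (loss_pct) is known for past loans (toy data).
late_payments = [0, 1, 2, 3, 4]
loss_pct = [1.0, 3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_line(late_payments, loss_pct)
predicted = intercept + slope * 5  # forecast for a borrower with 5 late payments
print("predicted loss %:", predicted)

# Unsupervised: no outcome column, only grouping of similar balances (toy data).
balances = [100, 120, 110, 900, 950, 1000]
centers = two_means(balances)
print("cluster centers:", centers)
```

In the supervised case the analyst chooses what to predict. In the unsupervised case the groups come out of the data, and interpreting them is left to the analyst.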
Data Mining Challenges
- Data quality: poor-quality data, also known as dirty data
- Data coverage: gaps (holes) in the data
- Strain on system
- Need for judgement in distinguishing correlation from causation. Data mining can reveal correlations but cannot by itself demonstrate causation.
- Potential for skewed results that are not actionable
- Need for overcoming biases
- Data governance issues: security of personally identifiable information
- Relies on a lot of new technology
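The data quality and data coverage challenges above can be checked before any analysis begins. The sketch below is one simple way to do that, assuming records arrive as Python dictionaries. The field names, the sample rows, and the helpers `missing_counts` and `duplicate_count` are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical customer records; the values are invented for illustration.
records = [
    {"id": 1, "income": 52000, "state": "OH"},
    {"id": 2, "income": None, "state": "TX"},
    {"id": 2, "income": None, "state": "TX"},   # exact repeat of the previous row
    {"id": 3, "income": 61000, "state": ""},
]

def missing_counts(rows):
    """Coverage check: count empty or None values per field."""
    counts = Counter()
    for row in rows:
        for field, value in row.items():
            if value is None or value == "":
                counts[field] += 1
    return dict(counts)

def duplicate_count(rows):
    """Quality check: count rows that exactly repeat an earlier row."""
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)
    return duplicates

holes = missing_counts(records)
repeats = duplicate_count(records)
print("missing values per field:", holes)
print("duplicate rows:", repeats)
```

Checks like these only flag candidates. Deciding whether a gap or duplicate matters still requires the analyst's judgement.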
Basics of SQL
- Used to access relational databases, in which data is arranged in tables
- Queries can be entered as raw code or built through a graphical interface
- Retrieves information via a query
Desired info (columns): SELECT
Location (table): FROM
Filter condition (rows): WHERE
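A basic query can be tried with Python's built-in `sqlite3` module and an in-memory database. In the sketch below, SELECT names the columns to return, FROM names the table to read, and WHERE keeps only the rows that meet a condition. The `loans` table, its columns, and its rows are assumptions made up for this illustration.

```python
import sqlite3

# In-memory database with a hypothetical loans table (toy data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loans ("
    "id INTEGER PRIMARY KEY, customer TEXT, balance REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO loans (customer, balance, status) VALUES (?, ?, ?)",
    [
        ("Avery", 1200.0, "current"),
        ("Blake", 5400.0, "charged_off"),
        ("Casey", 800.0, "charged_off"),
    ],
)

# SELECT: which columns; FROM: which table; WHERE: which rows.
rows = conn.execute(
    "SELECT customer, balance "
    "FROM loans "
    "WHERE status = ? "
    "ORDER BY balance DESC",
    ("charged_off",),
).fetchall()
conn.close()

for customer, balance in rows:
    print(customer, balance)
```

The `?` placeholder passes the filter value separately from the SQL text, which is the usual way to avoid building queries by string concatenation.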
Data Mining - Iterative Process
- No promise of immediate results
- Meaningful results can be hidden
- Multiple variations of the analysis may need to be run before meaningful results emerge.
- Data often must be displayed first before it is clear whether a result is meaningful
- Insight may emerge only partially at first
- Unsupervised exploration
Data Mining - Art & Science
Science
- Repeatable procedures
- Statistical principles grounded in mathematics are employed
- Generally accepted best practices
Art
- No clear right or wrong answer
- Success depends on judgement of analyst or team
- Creativity & intuition are applied
- The analyst continually questions the veracity, selection, and coverage of the data.
Data Mining Process
- Data pull: retrieves relevant information
- Data import and manipulation: Excel, pivot tables, etc.
- Analysis: looking at relationships among data
- Visualization: making sense of the relationships
- Iteration: getting closer to insight. The analyst will often need to return to the second step (data import and manipulation) to look for a different insight.
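The five steps above can be sketched end to end in a short script. This is only an illustration under stated assumptions: the `loans` table, its rows, and the helper `charge_off_rate_by` are hypothetical, an in-memory SQLite database stands in for the real data source, a grouped summary stands in for a pivot table, and a text bar stands in for a chart.

```python
import sqlite3
from collections import defaultdict

# Step 1, data pull: retrieve the relevant rows (toy data, invented here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (region TEXT, product TEXT, charged_off INTEGER)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?)",
    [
        ("East", "auto", 0), ("East", "auto", 0),
        ("East", "card", 1), ("East", "card", 1),
        ("West", "auto", 0), ("West", "auto", 1),
        ("West", "card", 1), ("West", "card", 0),
    ],
)
rows = conn.execute("SELECT region, product, charged_off FROM loans").fetchall()
conn.close()

def charge_off_rate_by(rows, column):
    """Steps 2 and 3: pivot-table-style summary of charge-off rate per group."""
    totals, bad = defaultdict(int), defaultdict(int)
    for row in rows:
        key = row[column]
        totals[key] += 1
        bad[key] += row[2]  # charged_off is 0 or 1
    return {key: bad[key] / totals[key] for key in totals}

# Step 5, iteration: repeat the summary with a different grouping each pass.
summaries = {}
for name, column in (("region", 0), ("product", 1)):
    rates = charge_off_rate_by(rows, column)
    summaries[name] = rates
    # Step 4, visualization: a crude text bar per group.
    print(f"charge-off rate by {name}")
    for key, rate in sorted(rates.items()):
        print(f"  {key:<5} {'#' * round(rate * 10)} {rate:.0%}")
```

With this toy data the first pass, grouped by region, shows no difference between groups, while the second pass, grouped by product, shows a clear gap. That is the point of iterating: the same data can hide or reveal a relationship depending on how it is manipulated.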