P1.F.4.2 Data Analytics - Data Mining Flashcards

1
Q

Data Mining

A
  1. Techniques to extract information from relatively meaningless data.
  2. Overlap between business analytics and data analytics.
  3. Can surface answers to questions that have not been asked yet.
2
Q

Supervised vs. Unsupervised Data Mining

A

Supervised
1. Attempts to predict a dependent variable
Examples: forecasting; using loan history to predict charge-offs; relating customer characteristics to loan performance

Unsupervised
1. Looks for relationships among all variables
Example: clustering
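
A minimal sketch of the contrast, assuming made-up loan data (the numbers, variable names, and helper functions below are hypothetical, not from the deck). The supervised half fits a least-squares line to predict a dependent variable (charge-off rate) from one input. The unsupervised half runs a tiny one-dimensional k-means to group account balances with no target variable at all.

```python
# Hypothetical illustration data; not from the source deck.

def fit_line(xs, ys):
    """Supervised: ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def kmeans_1d(values, centers, rounds=10):
    """Unsupervised: group values around the nearest center (k-means)."""
    for _ in range(rounds):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Supervised: debt-to-income ratio (input) vs. charge-off rate (dependent variable).
debt_to_income = [0.1, 0.2, 0.3, 0.4]
charge_off_rate = [0.01, 0.02, 0.03, 0.04]
intercept, slope = fit_line(debt_to_income, charge_off_rate)
predicted = intercept + slope * 0.5  # forecast for a new borrower

# Unsupervised: no target, just look for natural groupings in account balances.
balances = [100, 120, 110, 900, 950, 1000]
centers, clusters = kmeans_1d(balances, centers=[0.0, 500.0])
```

The supervised function needs a known outcome column to learn from; the clustering function only needs the values themselves and a starting guess for the centers.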

3
Q

Data Mining Challenges

A
  1. Data quality: poor-quality data, also known as dirty data
  2. Data coverage: holes in data
  3. Strain on system
  4. Need for judgement in distinguishing correlation from causation. Data mining reveals correlation; it cannot by itself demonstrate causation.
  5. Potential for skewed results: findings may not be actionable
  6. Need for overcoming biases
  7. Data governance issues: security of personally identifiable information
  8. Relies on a lot of new technology
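
The first two challenges can be checked mechanically before any mining starts. A minimal sketch, assuming hypothetical customer records held as Python dictionaries (the field names and values are invented for illustration): it counts missing values per field (coverage holes) and flags repeated IDs (one common form of dirty data).

```python
from collections import Counter

# Hypothetical records; None marks a hole in the data.
records = [
    {"id": 1, "income": 52000, "region": "East"},
    {"id": 2, "income": None, "region": "West"},
    {"id": 2, "income": 61000, "region": "West"},  # duplicate id
    {"id": 3, "income": 48000, "region": None},
]


def missing_by_field(rows):
    """Count how many rows lack a value for each field."""
    counts = Counter()
    for row in rows:
        for field, value in row.items():
            if value is None:
                counts[field] += 1
    return dict(counts)


def duplicate_ids(rows, key="id"):
    """Return key values that appear more than once."""
    seen = Counter(row[key] for row in rows)
    return sorted(k for k, n in seen.items() if n > 1)


holes = missing_by_field(records)    # coverage problems
duplicates = duplicate_ids(records)  # quality problems
```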
4
Q

Basics of SQL

A
  1. Used to access relational databases: arranged in tables
  2. Can be input via raw code or graphical interface
  3. Retrieves information via query

Desired info (columns): SELECT
Location (table): FROM
Filter condition: WHERE
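
A minimal sketch of such a query, run through Python's built-in sqlite3 module against an in-memory database. The `loans` table, its columns, and its rows are hypothetical. In standard SQL, SELECT names the columns wanted, FROM names the table to read, and WHERE states the condition a row must meet; the ORDER BY clause is an extra added here only to make the result order predictable.

```python
import sqlite3

# Build a small throwaway table so the query has something to read.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (customer TEXT, balance REAL, status TEXT)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?)",
    [
        ("Avery", 1200.0, "current"),
        ("Blake", 5400.0, "charged_off"),
        ("Casey", 300.0, "charged_off"),
    ],
)

rows = conn.execute(
    "SELECT customer, balance "  # which columns to return
    "FROM loans "                # which table to read
    "WHERE status = ? "          # which rows qualify
    "ORDER BY balance DESC",
    ("charged_off",),
).fetchall()
conn.close()
```

Passing the status as a `?` parameter rather than pasting it into the query string is the usual way to keep user input out of the SQL text.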

5
Q

Data Mining - Iterative Process

A
  1. No promise of immediate results
  2. Meaningful results can be hidden
  3. Multiple variations of the analysis must be run before meaningful results emerge.
  4. Data must be displayed first to know what it shows
  5. Insight may emerge only partially
  6. Unsupervised exploration
6
Q

Data Mining - Art & Science

A

Science

  1. Repeatable procedures
  2. Statistical principles are employed, based on math
  3. Generally accepted best practices

Art

  1. No clear right or wrong answer
  2. Success depends on judgement of analyst or team
  3. Creativity & intuition are applied
  4. Analyst is always questioning the veracity, selection, and coverage of the data.
7
Q

Data Mining Process

A
  1. Data pull: retrieves relevant information
  2. Data import and manipulation: Excel, pivot tables, etc.
  3. Analysis: looking at relationships among data
  4. Visualization: making sense of the relationships
  5. Iteration: getting closer to insight. May need to return to step 2 to pursue a different insight.
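
The five steps can be sketched end to end. This is only an illustration under stated assumptions: the sales records, field names, and helper functions are hypothetical, and plain Python stands in for the Excel and pivot-table work the card mentions.

```python
from collections import defaultdict

# Step 1, data pull: hypothetical rows as they might come back from a query.
raw = [
    {"region": "East", "ad_spend": 10, "sales": 110},
    {"region": "East", "ad_spend": 20, "sales": 205},
    {"region": "West", "ad_spend": 10, "sales": 90},
    {"region": "West", "ad_spend": 30, "sales": 310},
]


def pivot_sum(rows, group_field, value_field):
    """Step 2, manipulation: total value_field per group, like a pivot table."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_field]] += row[value_field]
    return dict(totals)


def pearson(xs, ys):
    """Step 3, analysis: Pearson correlation between two columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def text_bars(totals, unit=50):
    """Step 4, visualization: a string of '#' per group, one mark per `unit`."""
    return {name: "#" * int(value // unit) for name, value in totals.items()}


sales_by_region = pivot_sum(raw, "region", "sales")
r = pearson([row["ad_spend"] for row in raw], [row["sales"] for row in raw])
bars = text_bars(sales_by_region)

# Step 5, iteration: return to step 2 with a different grouping or measure.
spend_by_region = pivot_sum(raw, "region", "ad_spend")
```

A high correlation between `ad_spend` and `sales` here would still only show that the two move together, not that one causes the other.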