Module 3 Flashcards
What are some PROs & CONs of KDD / Data Mining?
o When you use data mining you may not understand why the correlation is there
o Pros
• No need to have a hypothesis first
• Not dependent on single expert
• Can process a higher volume of information than any human being could, which enables use of a more comprehensive data set
• Enables machine aided predictions
• Can reduce data complexity prior to human analysis
o Cons
• Depends heavily on the data set used
• Noise in the data set can throw one off
• Based on historical data
• If the future context changes, then performance can drop
• The underlying basic rule may never be discovered
• More complex to understand
• Issue of mistrust: people often don’t trust the results because neither the computer nor the scientist can fully explain them
What are the key objectives to your BI solutions?
- Make information available faster
- Reduce the attention span required of the user
- Make information as accessible as possible
- Minimize the attention users have to invest
What are the primary functions of the BI Frontend?
- Hides some of the complexities (e.g., the star schema)
- Help users find what they are looking for
- Automated suggestions- What makes sense to analyze next
- Optimal visualization
- Where to investigate next
- Formatting and selling
For what could you use clustering?
o How similar are transactions
o Customer Segmentation
o Behavioral Analysis
o Batch Failure Analysis
o Impact of Customer Incentives
o Patient recruitment optimization
o Often used as part of predictive analytics- Which clusters are successful
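To make the customer segmentation use case concrete, here is a minimal k-means sketch in plain Python. K-means is only one of several clustering techniques and is not named in these notes. The customer data, the feature choice (orders per year, average basket value), and the starting centroids are all made up for illustration.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        assignments = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return assignments, centroids

# Hypothetical customers: (orders per year, average basket value)
customers = [(2, 20.0), (3, 25.0), (1, 22.0), (30, 210.0), (28, 190.0), (32, 205.0)]

# Assumption: start from one occasional and one frequent buyer as initial centroids.
assignments, centroids = kmeans(customers, centroids=[customers[0], customers[3]])
print(assignments)
```

Each customer ends up with a cluster index. A follow-up predictive step could then ask which clusters are the successful ones.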
o How do you differentiate the BI frontend and the DWH?
- Demarcation line is at the reporting layer and logical mapping.
- Logical Mapping- Translate the technical keys into a language business user will understand.
- DWH- Stores, aggregates, and brings it all together
- BI Frontend- User interface for the end user and the IT person; lets you interact with the DWH and make something meaningful out of it
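The logical mapping idea can be sketched as a simple rename layer between DWH rows and what the report shows. The column keys and business labels below are hypothetical and not taken from any real schema.

```python
# Hypothetical mapping from technical DWH column keys to business-friendly labels.
LOGICAL_MAPPING = {
    "CUST_ID": "Customer Number",
    "REV_AMT_EUR": "Revenue (EUR)",
    "ORD_DT": "Order Date",
}

def to_business_view(row, mapping=LOGICAL_MAPPING):
    """Rename technical keys; leave unmapped keys unchanged so nothing is silently dropped."""
    return {mapping.get(key, key): value for key, value in row.items()}

# A made-up row as it might come out of the DWH.
dwh_row = {
    "CUST_ID": 4711,
    "REV_AMT_EUR": 129.90,
    "ORD_DT": "2024-03-01",
    "LOAD_TS": "2024-03-02T01:00",
}
report_row = to_business_view(dwh_row)
print(report_row)
```

In a real BI tool this mapping usually lives in a semantic layer rather than in application code. The dictionary here only illustrates the translation step.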
o 3 Main types of BI Frontends
- Dashboards- Trigger action, mainly for executives and upper-level management. Use dashboards whenever you need to create a visually engaging experience, like a traffic light.
- Discovery & Analysis- Deliver engaging information to users when they need it. Track key performance indicators
- Reporting- Share information for those who want to consume it.
o What would you consider when deciding your BI strategy? (Single vs. Multiple breed approach)
- Price
* All the ones listed above in blue (important!)
What is KDD (Knowledge Discovery in Databases)?
• Find patterns or correlations in data. It’s the non-trivial process of identifying valid, novel, and potentially useful patterns in data. Data Mining is a step in the KDD process
o How is the process of gaining insight different between data mining and a traditional hypothesis driven approach?
- Traditional approach- always start with a hypothesis
* Data Mining Approach- No hypothesis in the beginning
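One way to picture the difference, as a rough sketch: the hypothesis-driven analyst tests a single relationship chosen up front, while a data mining style scan checks every pair of columns and ranks what it finds. The column names and values are invented, and Pearson correlation is just one possible measure.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

# Hypothetical data set with three columns.
data = {
    "discount_pct": [0, 5, 10, 15, 20],
    "units_sold": [100, 110, 120, 130, 140],
    "support_calls": [7, 3, 9, 4, 6],
}

# Traditional approach: start with one hypothesis ("discounts drive sales") and test it.
hypothesis_r = pearson(data["discount_pct"], data["units_sold"])

# Data mining approach: no hypothesis, scan all column pairs and rank by strength.
ranked = sorted(
    ((abs(pearson(data[a], data[b])), a, b) for a, b in combinations(data, 2)),
    reverse=True,
)
print(hypothesis_r, ranked[0])
```

The scan surfaces the strongest correlation without anyone having proposed it, but it does not explain why the correlation exists.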
What does data mining do?
- Finds groups of data records- Cluster analysis
- Finds unusual records- Anomaly Detection
- Finds dependencies- Association Rule Mining
- The resulting patterns summarize the input data and may be used in further analysis, e.g., in machine learning and predictive analytics.
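Two of the tasks above can be illustrated with a small sketch: a z-score check for unusual records and support/confidence for a single association rule. The threshold, the amounts, and the baskets are assumptions made for this example. Real tools use more robust methods.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    """Anomaly detection sketch: flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

def rule_metrics(transactions, antecedent, consequent):
    """Association rule sketch: support and confidence of the rule antecedent -> consequent."""
    n = len(transactions)
    with_antecedent = [t for t in transactions if antecedent <= t]
    with_both = [t for t in with_antecedent if consequent <= t]
    support = len(with_both) / n
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence

# Hypothetical transaction amounts with one unusual record.
amounts = [10, 12, 11, 9, 10, 11, 95]
outliers = zscore_outliers(amounts)

# Hypothetical market baskets; rule under test: bread -> butter.
baskets = [{"bread", "butter"}, {"bread", "butter", "milk"}, {"bread"}, {"milk"}]
support, confidence = rule_metrics(baskets, {"bread"}, {"butter"})
print(outliers, support, confidence)
```

Here the rule holds in half of all baskets (support) and in two of the three baskets that contain bread (confidence).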
When would you use data mining instead of the traditional approach?
• You have statistically relevant data volumes
• Data likely to contain interesting correlations (cause and effect)
• Decent data quality
• Problem is too complex to formulate a hypothesis that can be validated with a reasonable resource investment
• Cause and effect change fast (manual analysis becomes impractical)