Information Management and BI 3(c)(i) Flashcards
What is “Data Mining”?
- Examine large data sets to learn something unknown from data itself
- Extract patterns from data, model and knowledge discovery, match transactions against criteria
- Data mining begins w/ understanding of data set as a whole
- Ex: Banks use to check effectiveness of loan and credit card application decisions
- Customer relationship mgmt is frequently the goal of data mining system
- Ex: Patterns of fraud, modeling/discovery of fraud, or ID transactions that are evidence of fraud committed
Name 5 Data Mining Methodologies:
(1) Memory-Based Reasoning (MBR)
(2) Cluster detection
(3) Decision tree algorithms
(4) Market-based analysis (commonly called market basket analysis)
(5) Link analysis
What is the Memory-based Reasoning (MBR) methodology?
- A Data Mining model
- Works well for matching and fraud detection
- MBR compares a new observation against pre-classified examples (past transactions where results are known and can be accurately classified)
- A distance metric is then used to classify new observations – ID highest number of matching fields to pre-classified examples to predict outcome of new observation
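The matching step above can be sketched in Python. This is a minimal illustration, assuming the "distance metric" is simply the count of matching fields; the field names, example transactions, and the helper names `count_matches` and `classify_mbr` are all hypothetical.

```python
# Hypothetical pre-classified examples: past transactions with known outcomes.
EXAMPLES = [
    {"channel": "online", "country": "US", "card_present": False, "outcome": "fraud"},
    {"channel": "store", "country": "US", "card_present": True, "outcome": "legitimate"},
    {"channel": "online", "country": "CA", "card_present": False, "outcome": "fraud"},
]
FIELDS = ["channel", "country", "card_present"]


def count_matches(a, b, fields):
    """Count how many of the given fields have identical values in a and b."""
    return sum(1 for f in fields if a[f] == b[f])


def classify_mbr(new_obs, examples, fields):
    """Predict the outcome of the example with the most matching fields."""
    best = max(examples, key=lambda ex: count_matches(new_obs, ex, fields))
    return best["outcome"]


new_obs = {"channel": "store", "country": "CA", "card_present": True}
prediction = classify_mbr(new_obs, EXAMPLES, FIELDS)
print(prediction)
```

Here the new observation shares two fields with the second example and at most one with the others, so it inherits that example's outcome.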
What is the Cluster methodology?
- A Data Mining model
- Based on classical statistical clustering algorithms
- Avg characteristics of preclassified examples of same outcome used as measures for new observation
- Accumulated distance of attributes from new observation to body of each outcome’s attributes provides for prediction of outcome of new observation
- Attribute values usually statistically normalized (0 to 1 values) for effectiveness
- Useful for predictions (timely loan repayment)
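A rough Python sketch of the idea above: normalize attributes to the 0 to 1 range, average the pre-classified examples for each outcome, and predict the outcome whose averages are closest to the new observation. The loan data, attribute names, and the choice of summed absolute differences as the "accumulated distance" are assumptions made for illustration.

```python
# Hypothetical pre-classified loan examples.
LOANS = [
    {"income": 80000, "debt_ratio": 0.2, "outcome": "on_time"},
    {"income": 60000, "debt_ratio": 0.3, "outcome": "on_time"},
    {"income": 30000, "debt_ratio": 0.7, "outcome": "late"},
    {"income": 20000, "debt_ratio": 0.6, "outcome": "late"},
]
KEYS = ["income", "debt_ratio"]


def make_scaler(rows, keys):
    """Return a function that min-max scales a row's attributes to 0..1."""
    lo = {k: min(r[k] for r in rows) for k in keys}
    hi = {k: max(r[k] for r in rows) for k in keys}

    def scale(row):
        return {
            k: (row[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
            for k in keys
        }

    return scale


def outcome_averages(rows, keys, scale):
    """Average the scaled attributes of the examples for each outcome."""
    groups = {}
    for r in rows:
        groups.setdefault(r["outcome"], []).append(scale(r))
    return {
        outcome: {k: sum(m[k] for m in members) / len(members) for k in keys}
        for outcome, members in groups.items()
    }


def predict(new_obs, averages, keys, scale):
    """Pick the outcome with the smallest accumulated attribute distance."""
    s = scale(new_obs)
    return min(
        averages,
        key=lambda o: sum(abs(s[k] - averages[o][k]) for k in keys),
    )


scale = make_scaler(LOANS, KEYS)
averages = outcome_averages(LOANS, KEYS, scale)
prediction = predict({"income": 70000, "debt_ratio": 0.25}, averages, KEYS, scale)
print(prediction)
```

Without normalization, income (tens of thousands) would swamp debt ratio (fractions), which is why the scaling step matters.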
What are Decision Tree Algorithms?
- A Data Mining model
- Developed to auto generate set of business process rules
- Most differentiating attribute of pre-classified examples is used to build decision rule
- If 1st “branch” of decision tree not high enough in prediction power, next branch examined
- Ex: Banking loan decision
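As a small Python sketch of the first step above, the code below picks the single most differentiating attribute and turns it into a decision rule. It assumes "most differentiating" means the attribute whose values best predict the known outcome; real decision tree algorithms typically use measures such as information gain and keep splitting further. The loan data and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical pre-classified loan applications.
APPLICATIONS = [
    {"employed": True, "owns_home": True, "outcome": "approve"},
    {"employed": True, "owns_home": False, "outcome": "approve"},
    {"employed": False, "owns_home": True, "outcome": "deny"},
    {"employed": False, "owns_home": False, "outcome": "deny"},
]


def rule_for(examples, attr):
    """Build a rule mapping each attribute value to its majority outcome.

    Returns (accuracy on the examples, rule dictionary).
    """
    by_value = defaultdict(Counter)
    for ex in examples:
        by_value[ex[attr]][ex["outcome"]] += 1
    rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return correct / len(examples), rule


def best_rule(examples, attrs):
    """Choose the attribute whose rule classifies the examples best."""
    best_attr = max(attrs, key=lambda a: rule_for(examples, a)[0])
    accuracy, rule = rule_for(examples, best_attr)
    return best_attr, accuracy, rule


attr, accuracy, rule = best_rule(APPLICATIONS, ["employed", "owns_home"])
print(attr, accuracy, rule)
```

If the best first rule fell short of the required prediction power, the same selection would be repeated within each branch using the remaining attributes.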
What is “Market-Based Analysis” (commonly called market basket analysis)?
- A Data Mining model (least structured form)
- Involves “shopping basket analysis” in retail outlets and the food industry
- Intent to ID products that tend to be purchased together
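A minimal Python sketch of shopping basket analysis: count how often each pair of products appears in the same basket. The baskets and the function name `pair_counts` are made up for illustration; production tools usually go further and compute measures such as support and confidence.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
BASKETS = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) removes duplicates and gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(BASKETS)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```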
What is “Link Analysis Methodology”?
- A Data Mining model
- Maps relationships among data and useful for situations like fraud detection
- Sometimes applied in insurance industry to ID fraudulent claims
- Tools like Analyst’s Notebook, Netmap, and Watson construct links to various objects to ID associations that might go unnoticed
- Latest generation of link analysis tools provides not only graphical images of links but also some interpretation of links.
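The core idea of link analysis can be sketched in a few lines of Python: index records by shared attribute values and report any value that links more than one record. This is only an illustration with hypothetical insurance claims and field names; it does not reflect how the named commercial tools work internally.

```python
from collections import defaultdict

# Hypothetical insurance claims.
CLAIMS = [
    {"claim_id": "C1", "phone": "555-0101", "address": "12 Elm St"},
    {"claim_id": "C2", "phone": "555-0101", "address": "9 Oak Ave"},
    {"claim_id": "C3", "phone": "555-0199", "address": "9 Oak Ave"},
    {"claim_id": "C4", "phone": "555-0142", "address": "3 Pine Rd"},
]


def shared_links(records, fields):
    """Map each (field, value) shared by 2+ records to the linked claim IDs."""
    index = defaultdict(set)
    for rec in records:
        for f in fields:
            index[(f, rec[f])].add(rec["claim_id"])
    return {key: ids for key, ids in index.items() if len(ids) > 1}


links = shared_links(CLAIMS, ["phone", "address"])
for (field, value), ids in links.items():
    print(field, value, sorted(ids))
```

Claims C1 and C3 never share a value directly, yet both link to C2, which is the kind of indirect association that might otherwise go unnoticed.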
What are 3 key functions of a “Data Analysis and Reporting Database” (DARB)?
EDQ
(1) Extract data from DARB to use for analysis or reporting
- Result similar to data mart except it is user-defined, ad hoc, and on demand
(2) Data mining
(3) Querying
What is “Data Analysis”?
- Process of inspecting data w/ some goal or benchmark in mind
- Ex: Goal or benchmark is to determine whether or not it is evidence of fraudulent transactions.
Data mining and analysis are capable of examining what kind of data w/ efficiency and effectiveness?
- Data mining and analysis are capable of examining population of data, not just a sample, w/ efficiency and effectiveness
What are 2 broad types of “Data Mining Application”?
(1) Hypothesis testing
(2) Knowledge discovery (pattern discovery)
What is SQL?
- Structured Query Language
- Ability to filter data into meaningful info
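A short example of filtering with SQL, run through Python's built-in `sqlite3` module. The `transactions` table, its columns, and the 5000 threshold are hypothetical; the point is only to show a `WHERE` clause narrowing a full population of rows to those of interest.

```python
import sqlite3

# In-memory database with a hypothetical transactions table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, "A", 120.0), (2, "B", 9800.0), (3, "A", 15000.0)],
)

# Filter the full population down to large transactions, largest first.
large = conn.execute(
    "SELECT id, customer, amount FROM transactions "
    "WHERE amount > ? ORDER BY amount DESC",
    (5000,),
).fetchall()
conn.close()

print(large)
```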