Data Mining Flashcards
What is the definition of data mining?
The process of discovering new, non-trivial, potentially useful patterns in large data sets
What is the goal of data mining?
The discovery of new patterns and relationships in data
Transforming data into knowledge
Why do you need data visualization?
It's difficult to see patterns in just a table of numbers
What effect do storage capacity trends have on data?
Decreasing costs for storage capacity mean the amount of data available is massive. A business needs to find the relationships within this mountain of data in order to improve
What are some sources of data?
Internal systems, external systems, data from social networking and user-generated data, transactional data from company operations
What does a data warehouse do?
Aggregates data from all of the organization's other databases
What are some problems with operational data?
Some data isn't suitable for sophisticated data mining, values are missing or inconsistent across different records, data is too coarse (broad) or too fine (detailed), and there is too much data!
What is the curse of dimensionality and how do you solve it?
It is the curse of being overwhelmed by a massive amount of data with lots of different attributes (too many columns, not just too many rows!). Cure it by paying attention to only a couple of things at a time: restrict the data displayed in an operational data dashboard or a scorecard
What are business intelligence systems and what do they do?
They use data created by other systems and provide reporting and analysis for decision making. Pulling data from across the business is key
What is the value of BI systems?
They help to analyze the data, look for patterns, use those patterns to make business decisions, share this information with business partners, manage inventory, and design marketing and advertising strategies
What are the 4 different types of BI systems and what are their defining characteristics?
1) Reporting Systems (get data organized to be viewed)
2) Data Mining Systems (use statistics to find patterns and relationships)
3) Knowledge Management Systems (forums to share knowledge, e.g. Piazza)
4) Expert Systems (turn human knowledge into if/then decisions to make recommendations)
Details on Reporting Systems
Integrate data from multiple sources, sort/group/sum/average/compare, format into reports. GIVE THE RIGHT INFO TO THE RIGHT USER AT THE RIGHT TIME
What are data marts used for?
Each one is built to answer one specific business problem
What are data cubes used for?
They aggregate and summarize data along multiple dimensions (location, time, product) to make querying faster, especially for drill-down queries
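To make the cube idea concrete, here is a minimal sketch that precomputes totals for every combination of location, quarter, and product, so a drill-down becomes a simple lookup. The sales rows and the `build_cube` helper are invented for illustration; real cubes live in a warehouse or OLAP tool, not in a Python dictionary.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sales facts: (location, quarter, product, amount).
SALES = [
    ("Austin", "Q1", "diapers", 120.0),
    ("Austin", "Q1", "beer", 80.0),
    ("Austin", "Q2", "diapers", 100.0),
    ("Dallas", "Q1", "beer", 60.0),
    ("Dallas", "Q2", "beer", 90.0),
]

def build_cube(facts):
    """Precompute the total amount for every combination of dimensions."""
    cube = defaultdict(float)
    for *keys, amount in facts:
        for size in range(len(keys) + 1):
            for kept in combinations(range(len(keys)), size):
                # "*" marks a dimension that has been summed over (rolled up).
                cell = tuple(keys[i] if i in kept else "*" for i in range(len(keys)))
                cube[cell] += amount
    return cube

cube = build_cube(SALES)
grand_total = cube[("*", "*", "*")]         # all sales
austin_total = cube[("Austin", "*", "*")]   # drill down by location
austin_q1 = cube[("Austin", "Q1", "*")]     # drill down further by quarter
```

The work is done once when the cube is built, which is why later queries at any level of detail are fast.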
What does ETL stand for and mean?
Stands for Extract, Transform, Load. This is the process that reporting systems run every night
Flow of data in a Reporting System
(Legacy, Operational, Transactional, Application, Web services systems) -> ETL -> (Data Warehouse, Cube, Mart)
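The flow above can be sketched in a few lines. This is only an illustration under assumptions: the two source row lists, the `sales` table, and the function names are made up, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical rows from two source systems with inconsistent formats.
OPERATIONAL_ROWS = [{"cust": " Alice ", "total": "19.99"}, {"cust": "bob", "total": "5.00"}]
WEB_SERVICE_ROWS = [{"customer": "ALICE", "amount": 30.0}]

def extract():
    """Extract: gather the raw rows from each source system."""
    return OPERATIONAL_ROWS, WEB_SERVICE_ROWS

def transform(operational, web):
    """Transform: map both sources onto one schema with consistent values."""
    rows = [(r["cust"].strip().lower(), float(r["total"])) for r in operational]
    rows += [(r["customer"].strip().lower(), float(r["amount"])) for r in web]
    return rows

def load(rows, conn):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(transform(*extract()), warehouse)

# Once loaded, reporting can query one consistent table.
totals = dict(warehouse.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer"))
```

Note how the transform step is what fixes the "inconsistent across records" problem: " Alice " and "ALICE" end up as the same customer.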
Details on Data Mining Systems
Process data using statistical techniques (regression and decision tree analysis), and look for patterns and relationships to predict outcomes (market basket analysis [what people buy together], predicting donations so as to hit the target audience)
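A rough sketch of market basket analysis: count how often pairs of items show up in the same basket, then compute support and confidence for a pair. The baskets are invented, and support and confidence are the usual textbook measures rather than anything these cards define.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, one set of items per basket.
BASKETS = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
]

# How many baskets contain each item, and each (alphabetically ordered) pair.
item_counts = Counter(item for basket in BASKETS for item in basket)
pair_counts = Counter(
    pair for basket in BASKETS for pair in combinations(sorted(basket), 2)
)

def support(a, b):
    """Fraction of all baskets that contain both items."""
    return pair_counts[tuple(sorted((a, b)))] / len(BASKETS)

def confidence(a, b):
    """Of the baskets that contain a, the fraction that also contain b."""
    return pair_counts[tuple(sorted((a, b)))] / item_counts[a]
```

For example, `support("diapers", "beer")` says how common the pair is overall, while `confidence("diapers", "beer")` says how likely a diaper buyer is to also buy beer.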
Details on Knowledge Management Systems
Create value from intellectual capital, collect and share human knowledge, foster innovation, increase organizational responsiveness (Piazza-like)
Details on Expert Systems
Encapsulate expert knowledge and pass it on to new employees by producing if/then rules that improve decision making in non-experts. Examples: the Longevity Game and WebMD. These are interactive tools that you put information into and get a response back
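The if/then idea can be sketched as a list of rules checked in order. The inventory rules below are invented purely for illustration; a real expert system would encode rules gathered from actual experts.

```python
# Each rule pairs an IF condition over the facts with a THEN recommendation.
# These rules are hypothetical, not taken from any real system.
RULES = [
    (lambda f: f["stock"] == 0, "Reorder immediately"),
    (lambda f: f["stock"] < f["weekly_sales"], "Reorder this week"),
    (lambda f: f["stock"] > 10 * f["weekly_sales"], "Consider a markdown"),
]

def recommend(facts, default="No action needed"):
    """Return the recommendation of the first rule whose IF part matches."""
    for condition, advice in RULES:
        if condition(facts):
            return advice
    return default

# A non-expert supplies the facts and gets the expert's advice back.
advice = recommend({"stock": 3, "weekly_sales": 5})
```

Rule order matters here: the first matching rule wins, so the most urgent conditions go first.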
What are some examples of pattern finding and data mining?
People with bad credit scores have more wrecks; on Thursday nights people buy a lot of diapers and beer
What is RFM analysis?
Recency + Frequency + Money (spent per visit). It's a technique used to evaluate how valuable a customer is. The program divides customers into 5 groups on each of the three measures and scores them 1 to 5, with 1 being the most recent / most frequent / biggest spender
Who do you want to target with your marketing when looking at RFM analysis?
Look for someone with good (low) Frequency and Money scores but a bad (high) Recency score. You want to get that consistent big spender back in the store
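As a sketch, that targeting rule is just a filter over RFM scores. The scores below are invented, and the cutoffs (2 or better counts as "good", 4 or worse counts as "lapsed") are assumptions, not part of RFM itself.

```python
# Hypothetical RFM scores as (recency, frequency, money); 1 is best, 5 is worst.
RFM = {
    "Ann": (1, 1, 2),
    "Ben": (5, 2, 1),
    "Cal": (2, 4, 5),
    "Eli": (4, 5, 5),
}

def win_back_targets(scores, good=2, lapsed=4):
    """Frequent, high-spending customers who have not visited recently."""
    return sorted(
        name
        for name, (recency, frequency, money) in scores.items()
        if frequency <= good and money <= good and recency >= lapsed
    )

targets = win_back_targets(RFM)
```

Ann is already a recent visitor and Eli was never a big spender, so only Ben qualifies as a win-back target here.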
What are a loss leader, cross-selling, and upselling?
A loss leader is a product you are willing to take a loss on because you want to sell other goods to that customer. Cross-selling is suggesting that because you bought this, you would also like that. Upselling is moving customers to a more expensive version
What is the tricky part about pricing?
A low price might signal low confidence in the product, but don't go too crazy in the other direction either
Walmart uses predictive technology to know what people buy when hurricanes are coming, and empowers employees to do what they think is necessary
True