Data Mining Flashcards
What is the definition of data mining?
The process of discovering new, non-trivial, potentially useful patterns in large data sets
What is the goal of data mining?
The discovery of new patterns and relationships in data
Transforming data into knowledge
Why do you need data visualization?
It's difficult to see patterns in just a table of numbers
How do storage capacity trends affect data?
Decreasing storage costs mean the amount of data available is massive. You need to find the relationships within this mountain of data to improve a business
What are some sources of data?
Internal systems, external systems, social networking and user-generated data, transactional data from company operations
What does a data warehouse do?
Aggregates data from all other databases
What are some problems with operational data?
Some data isn't suitable for sophisticated data mining; values are missing or inconsistent across different records; data is too coarse (broad) or too fine (detailed); too much data!
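A minimal sketch of how missing and inconsistent values might be spotted before mining. The records, field names, and helper functions below are hypothetical and invented for illustration only.

```python
# Hypothetical operational records; field names and values are made up.
records = [
    {"customer_id": 1, "state": "WA", "amount": 120.0},
    {"customer_id": 2, "state": "Washington", "amount": None},
    {"customer_id": 3, "state": "wa", "amount": 75.5},
    {"customer_id": 4, "state": None, "amount": 40.0},
]


def count_missing(rows):
    """Count missing (None) values per field."""
    counts = {}
    for row in rows:
        for field, value in row.items():
            if value is None:
                counts[field] = counts.get(field, 0) + 1
    return counts


def distinct_spellings(rows, field):
    """Collect the distinct raw values of one field to expose inconsistent coding."""
    return {row[field] for row in rows if row[field] is not None}


missing = count_missing(records)
spellings = distinct_spellings(records, "state")

print(missing)            # fields that have gaps
print(sorted(spellings))  # the same state coded three different ways
```

Here the same state appears under three spellings and two fields have gaps, which is the kind of problem that has to be cleaned up before the data is mined.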
What is the curse of dimensionality and how do you solve it?
It is the curse of being overwhelmed by a massive amount of data with many different attributes (too many dimensions, not just too many rows). Cure it by paying attention to only a few things at a time: RESTRICT the data displayed in an operational data dashboard or a scorecard
What are business intelligence systems and what do they do?
They use data created by other systems and provide reporting and analysis for decision making. Pulling data from across the business is key
What is the value of BI systems?
They help analyze the data, look for patterns, use those patterns to make business decisions, share this information with business partners, manage inventory, and design marketing and advertising strategies
What are the 4 different types of BI systems and what are their defining characteristics?
1) Reporting Systems (organize data so it can be viewed)
2) Data Mining Systems (use statistics to find patterns and relationships)
3) Knowledge Management Systems (forums to share knowledge, e.g. Piazza)
4) Expert Systems (turn human knowledge into if/then rules to make recommendations)
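The if/then idea behind an expert system can be sketched as an ordered list of rules. This is only an illustration: the loan-screening scenario, the field names, and every threshold below are invented assumptions, not rules from any real system.

```python
# Hypothetical rules for a loan-screening example; thresholds are invented.
def recommend(applicant):
    """Apply if/then rules in order and return the first matching recommendation."""
    rules = [
        (lambda a: a["credit_score"] < 580, "decline"),
        (lambda a: a["debt_to_income"] > 0.45, "refer to a loan officer"),
        (lambda a: a["credit_score"] >= 700, "approve"),
    ]
    for condition, recommendation in rules:
        if condition(applicant):
            return recommendation
    # Default when no rule fires: hand the case to a human.
    return "refer to a loan officer"


result = recommend({"credit_score": 720, "debt_to_income": 0.30})
print(result)
```

Keeping the rules in a list, rather than in nested if statements, makes it easier for a human expert to review or reorder them.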
Details on Reporting Systems
Integrate data from multiple sources; sort, group, sum, average, and compare; format into reports. GIVE THE RIGHT INFO TO THE RIGHT USER AT THE RIGHT TIME
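The group/sum/average/sort operations listed above could look roughly like this. The sales rows and the `summarize_by` helper are hypothetical, made up for this sketch.

```python
from collections import defaultdict

# Hypothetical sales rows; regions and amounts are invented.
sales = [
    {"region": "West", "amount": 100.0},
    {"region": "East", "amount": 250.0},
    {"region": "West", "amount": 300.0},
    {"region": "East", "amount": 50.0},
    {"region": "East", "amount": 300.0},
]


def summarize_by(rows, key, value):
    """Group rows by `key`, then compute count, sum, and average of `value`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    report = []
    for name, values in groups.items():
        total = sum(values)
        report.append(
            {key: name, "count": len(values), "total": total, "average": total / len(values)}
        )
    # Sort so the largest total appears first in the report.
    return sorted(report, key=lambda r: r["total"], reverse=True)


report = summarize_by(sales, "region", "amount")
for line in report:
    print(f"{line['region']:<6} n={line['count']} total={line['total']:.2f} avg={line['average']:.2f}")
```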
What are data marts used for?
Each one answers a single business problem
What are data cubes used for?
They aggregate and summarize data along multiple dimensions (location, time, product) to make querying faster, especially for drill-down queries
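One way to picture a data cube is as a table of totals precomputed for every combination of dimensions, so that a drill-down becomes a lookup instead of a scan. The sketch below assumes a tiny invented fact table and a hypothetical `build_cube` helper; real cube engines are far more elaborate.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("location", "time", "product")

# Hypothetical fact rows; all values are invented.
facts = [
    {"location": "Seattle", "time": "2024-Q1", "product": "widget", "sales": 10},
    {"location": "Seattle", "time": "2024-Q2", "product": "widget", "sales": 15},
    {"location": "Seattle", "time": "2024-Q1", "product": "gadget", "sales": 7},
    {"location": "Boston", "time": "2024-Q1", "product": "widget", "sales": 4},
]


def build_cube(rows, dimensions, measure):
    """Precompute the sum of `measure` for every combination of dimensions."""
    cube = defaultdict(int)
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for row in rows:
                key = (dims, tuple(row[d] for d in dims))
                cube[key] += row[measure]
    return dict(cube)


cube = build_cube(facts, DIMENSIONS, "sales")

# Drill down: all sales, then one location, then one location in one quarter.
grand_total = cube[((), ())]
seattle = cube[(("location",), ("Seattle",))]
seattle_q1 = cube[(("location", "time"), ("Seattle", "2024-Q1"))]
print(grand_total, seattle, seattle_q1)
```

The trade-off is storage for speed: the totals are computed once up front, and each drill-down step afterward is a single dictionary lookup.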
What does ETL stand for and mean?
Extract, Transform, Load. This is the process that reporting systems run every night
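A minimal sketch of the three ETL steps, using Python's standard `csv` and `sqlite3` modules. The CSV export, the `orders` table, and the cleaning rules are all assumptions invented for this example; an in-memory database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical nightly export from an operational system.
RAW_CSV = """order_id,region,amount
1, west ,100.50
2,EAST,
3,West,20.00
"""


def extract(text):
    """Extract: read raw rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and normalize region names."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip records with no amount
        clean.append((int(row["order_id"]), row["region"].strip().title(), float(amount)))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the reporting database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
totals = conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region").fetchall()
print(totals)
```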