Lecture 7: Data Warehouses, Business Intelligence and Big Data Analytics Flashcards
What is a transaction processing system?
System that records data on fundamental operations occurring within the company
What is batch processing?
Data is stored in temporary storage and processed as a single unit at a specific time
What is online transaction processing?
Data is processed immediately in real time, so the current state of the system is always reflected
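The difference between the two processing modes can be illustrated with a minimal sketch. The function names and the sample transaction amounts are made up for illustration; a real system would persist the queue and run the batch on a schedule.

```python
def process_online(balance, transactions):
    """Apply each transaction immediately and record the balance after each one."""
    history = []
    for amount in transactions:
        balance += amount          # state is updated right away
        history.append(balance)    # the current state is always visible
    return history


def process_batch(balance, transactions):
    """Collect transactions in temporary storage, then apply them as a single unit."""
    queue = list(transactions)     # temporary storage; balance stays unchanged meanwhile
    return balance + sum(queue)    # one combined update at the scheduled time


transactions = [100, -30, 50]      # hypothetical deposits and withdrawals
online_history = process_online(0, transactions)
batch_result = process_batch(0, transactions)
print(online_history, batch_result)
```

Both modes end at the same balance; only online processing exposes the intermediate states as they happen.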
What are ERP, CRM and SCM systems?
Enterprise Resource Planning System: Integrates the core functions of the company into a single homogeneous system
Customer Relationship Management System: Integrates customer data to be used by various departments
Supply Chain Management System: Provides a holistic overview of the value chain, including the flow of raw materials
What are Operational Systems and Business Intelligence tools?
Operational systems: Represent the input side of databases, data warehouses and data marts
Business intelligence tools: More sophisticated analytics systems, represent the output side
What is online analytical processing?
- Transaction-level data stored in relational databases is aggregated and summarized
- Results of the analysis are stored in data cubes
- Data cubes structure results across multiple dimensions (e.g. location, product, time)
- Running queries on data cubes gives substantially quicker response times than running them on the original database
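As a rough sketch of the idea, the following pre-aggregates transaction-level rows into a small cube keyed by (region, product, month) and then answers queries from the cube instead of the raw rows. The sample data and the helper names `build_cube` and `slice_total` are assumptions made for illustration, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical transaction-level rows: (region, product, month, amount)
sales = [
    ("North", "Laptop", "2024-01", 1200.0),
    ("North", "Laptop", "2024-01", 800.0),
    ("North", "Phone", "2024-02", 500.0),
    ("South", "Laptop", "2024-01", 1000.0),
    ("South", "Phone", "2024-02", 300.0),
]


def build_cube(rows):
    """Aggregate rows once into totals per (region, product, month) cell."""
    cube = defaultdict(float)
    for region, product, month, amount in rows:
        cube[(region, product, month)] += amount
    return dict(cube)


def slice_total(cube, region=None, product=None, month=None):
    """Sum the cells matching the given dimension values; None means 'all'."""
    wanted = (region, product, month)
    return sum(
        total
        for key, total in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    )


cube = build_cube(sales)
laptop_total = slice_total(cube, product="Laptop")
north_jan = slice_total(cube, region="North", month="2024-01")
print(laptop_total, north_jan)
```

Queries touch only the summarized cells, which is why they tend to be faster than scanning every original transaction.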
What is data mining?
- Data mining refers to the use of algorithms to identify hidden patterns in large data sets
- Some basic types of patterns include: Associations, clusters and sequential relationships
What are association rules?
- Associations are certain attribute values that frequently occur together within a data set
- Association rule mining seeks to identify the most frequent affinities amongst items
- Support: the fraction of all transactions that contain a certain set of items X
- Confidence: the fraction of transactions that contain Y among those transactions that contain X
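The two measures can be computed directly from their definitions. This is a minimal sketch on made-up shopping baskets; the function names and data are illustrative, and confidence is calculated as support(X and Y together) divided by support(X), which is equivalent to the definition above.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, x, y):
    """Fraction of transactions containing X that also contain Y."""
    # Raises ZeroDivisionError if no transaction contains X.
    return support(transactions, set(x) | set(y)) / support(transactions, x)


# Hypothetical baskets
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

bread_support = support(baskets, {"bread"})                 # 3 of 4 baskets
rule_confidence = confidence(baskets, {"bread"}, {"butter"})  # 2 of the 3 bread baskets
print(bread_support, rule_confidence)
```

Here the rule "bread implies butter" holds in two of the three baskets that contain bread.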
What are the four Vs of Big Data?
- Volume
- Velocity
- Variety
- Veracity
What are neural networks?
They mimic the basic functionality of the human brain to support decision making by predicting future outcomes
What is Hadoop?
Open-source software framework used for (distributed) storage and analysis of big data sets
What are the four primary advantages of Hadoop?
- Flexibility - can handle any type of data from any source
- Scalability - runs on a single low-end PC and can be scaled out to clusters of hundreds of computers
- Cost effectiveness (open source)
- Fault tolerance (designed to avoid a single point of failure, such as one computer crashing)