Lecture 7 Flashcards
Transaction processing system (TPS)
System that records all transactions in an organisation (a.k.a. its fundamental operations) and saves them in a database.
What are the two main types of TPS?
- Batch Processing: puts transactions into temporary storage and then processes them all together (in a batch) at a specific time.
Benefit: More efficient process, as all transactions can be processed when computing resources are less busy
Disadvantage: The database does not reflect the current state of the business
- Online transaction processing: all transactions are processed immediately in real time.
Advantage: current state of the business is always reflected in the database
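The difference between the two types can be sketched in a few lines of Python. This is a minimal illustration, not a real TPS: the class names, the account/amount transaction shape and the in-memory dict standing in for the database are all assumptions made for the example.

```python
class BatchTPS:
    """Holds transactions in temporary storage until process_batch() is called."""

    def __init__(self):
        self.database = {}   # hypothetical database: account -> balance
        self.pending = []    # temporary storage for unprocessed transactions

    def record(self, account, amount):
        # Only queue the transaction; the database is not touched yet.
        self.pending.append((account, amount))

    def process_batch(self):
        # Apply everything that has been queued, then empty the queue.
        for account, amount in self.pending:
            self.database[account] = self.database.get(account, 0) + amount
        self.pending.clear()


class OnlineTPS:
    """Applies every transaction to the database immediately."""

    def __init__(self):
        self.database = {}

    def record(self, account, amount):
        self.database[account] = self.database.get(account, 0) + amount


batch, online = BatchTPS(), OnlineTPS()
for tps in (batch, online):
    tps.record("alice", 100)
    tps.record("alice", -30)

batch_before = dict(batch.database)  # still empty: nothing processed yet
online_now = dict(online.database)   # already reflects both transactions
batch.process_batch()
batch_after = dict(batch.database)   # now matches the online system
```

Until process_batch() runs, the batch database is out of date, which is the disadvantage noted above; the online database is current after every call to record().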
What are enterprise systems?
They aim at consolidating the data that is collected and processed in various departments of the company.
These systems only provide interfaces, not the actual infrastructure (all about the front end).
What is an Enterprise Resource Planning System (ERP)?
The most famous type of enterprise system; it sits at the core of the enterprise.
Integrates core functions of the company into a homogeneous system.
Focus: to allocate resources to specific departments
Smaller companies often pick smaller, more customisable systems.
Who are the leading ERP vendors, and what kind of market do they operate in?
SAP and Oracle
The market is fragmented, which means neither of them has a very high market share.
What is Customer Relationship Management (CRM)?
Integrates customer data to be used by various departments.
Interface to the end customer.
What is Supply Chain Management (SCM)?
Provides a holistic overview of the value chain and is about the inventory of the company.
What is a Data Warehouse?
It collects and stores data from several different transactional systems in an organisation.
The data is consolidated, formatted and cannot be altered once it's there –> standardisation.
Provides tools for querying, reporting, analysis which helps to make sense of the data.
What is a Data Mart?
A Data Mart contains specific (focused) data from the data warehouse, and possibly third-party external data, to help particular users solve a particular problem.
What is Business Intelligence?
It is the output side of data. Responsible for producing information and outputs from the data which can then be used to make decisions.
“Refers to tools for consolidating, analysing, and accessing data to support organisational decision-making”
What is online analytical processing (OLAP)?
Aggregates and summarises statistics/operations. The results of this analysis are stored in a data cube.
What is a data cube? Why are they useful?
It stores results of OLAP analysis. Updated every time a new transaction is made.
Running a query on a data cube enables a much quicker response time than running it on the original database, since less data needs to be analysed.
Multidimensional
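A rough sketch of the idea, assuming a hypothetical sales table with three dimensions (region, product, quarter) and revenue as the measure. The function names build_cube and roll_up are invented for this example and do not come from any OLAP product.

```python
from collections import defaultdict

# Hypothetical transactions: (region, product, quarter, revenue)
transactions = [
    ("north", "widget", "Q1", 100),
    ("north", "widget", "Q1", 50),
    ("north", "widget", "Q2", 150),
    ("north", "gadget", "Q1", 80),
    ("south", "widget", "Q1", 60),
    ("south", "gadget", "Q2", 90),
]


def build_cube(rows):
    """Pre-aggregate revenue for every (region, product, quarter) cell."""
    cube = defaultdict(int)
    for region, product, quarter, revenue in rows:
        cube[(region, product, quarter)] += revenue
    return dict(cube)


def roll_up(cube, keep):
    """Sum the cube over every dimension except those whose index is in keep."""
    result = defaultdict(int)
    for key, value in cube.items():
        result[tuple(key[i] for i in keep)] += value
    return dict(result)


cube = build_cube(transactions)
revenue_by_region = roll_up(cube, keep=(0,))
revenue_by_product_quarter = roll_up(cube, keep=(1, 2))
```

Queries such as revenue_by_region read the pre-aggregated cells instead of the raw transactions, which is where the speed-up comes from once the transaction table is much larger than the number of cells.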
What is data mining, and what are the three basic patterns?
It is the use of specific algorithms to identify hidden patterns in large sets of data.
The three basic patterns uncovered through data mining:
- Associations
- Clustering
- Sequential relationships (time series)
What are Associations?
Certain attribute values that frequently occur together within a data set.
What is a standard application of association analysis?
Market basket analysis
What is Association rule mining? And what are its two central concepts?
Seeks to identify the most common affinities among items.
- Support s(X): is the fraction of transactions that contain a certain set of items X
- Confidence c(X –> Y): the fraction of transactions that contain Y among those transactions that contain X
An association is strong when there is both high confidence and high support.
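Both measures can be computed directly from their definitions. The sketch below uses five made-up shopping baskets; the data and the helper names are assumptions for illustration only. Confidence is calculated as the number of transactions containing both X and Y divided by the number containing X, which is the same as s(X and Y together) / s(X).

```python
# Hypothetical market baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """s(X): fraction of all transactions that contain X."""
    return count(itemset, transactions) / len(transactions)


def confidence(x, y, transactions):
    """c(X -> Y): fraction of the transactions containing X that also contain Y."""
    return count(set(x) | set(y), transactions) / count(x, transactions)


s_bread = support({"bread"}, baskets)                 # 4 of 5 baskets -> 0.8
s_bread_milk = support({"bread", "milk"}, baskets)    # 3 of 5 baskets -> 0.6
c_bread_milk = confidence({"bread"}, {"milk"}, baskets)  # 3 of 4 -> 0.75
```

Here the rule bread –> milk has fairly high support and confidence, so it would count as a reasonably strong association in this toy data set.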
What is clustering?
Clustering seeks to identify natural groupings in data. The optimum number of clusters is unknown in advance and clusters require interpretation.
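One common clustering algorithm is k-means; the flashcard does not name a specific algorithm, so it is used here only as an example. The sketch below runs a plain one-dimensional k-means on made-up customer ages, with hand-picked starting centroids so the result is reproducible. Note that the number of clusters (2 or 3) has to be supplied by the analyst.

```python
def k_means_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customer ages.
ages = [18, 20, 22, 45, 47, 50, 52]

centroids2, clusters2 = k_means_1d(ages, centroids=[18, 52])      # k = 2
centroids3, clusters3 = k_means_1d(ages, centroids=[18, 45, 52])  # k = 3
```

Both runs produce valid groupings. Whether "younger vs older" (k = 2) or a three-way split (k = 3) is more useful is a matter of interpretation, which is the point made above.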
What are the four Vs of big data?
Velocity = the speed at which new data is coming in to be processed
Variety = the kind of data and variety of formats that it comes in
Volume = the amount of data that needs to be processed
Veracity = the reliability of the data
Analytics applies traditional statistical methods and AI to derive actionable insights from big data. What is an example of such a method?
Neural networks: trained using huge historical data sets on the outcome of interest and other variables.
–> black box method
What is a black box method?
A method whose inner workings are not transparent: it is extremely hard to quantify the impact of a particular variable on the outcome, i.e. the model produces an output without showing how each input contributed to it. Neural networks are usually this type of method.
What is a database management system (DBMS)?
It stores and retrieves the data that an application creates and uses.
Different enterprise systems can share a DBMS to share common data –> located between the operating system and the application level.
Improves efficiency since data is all in one place
What is Hadoop?
It is open-source software used for storage and analysis of big data sets.
Allows distributed computing, i.e. splitting the data into multiple parts to run on different machines, and then putting the results back together again.
Uses the MapReduce framework to process data
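The MapReduce idea can be illustrated with a word count in plain Python. This is a single-machine simulation of the map, shuffle and reduce steps, not the Hadoop API; the function names and the three text chunks are assumptions made for the example. In a real cluster each chunk would be mapped on a different machine.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of the input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


# The input is split into parts; here each string stands for one part.
chunks = ["big data big ideas", "data beats ideas", "big wins"]

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Because each chunk is mapped independently, the map step can run in parallel on separate machines, and only the grouped intermediate pairs need to be brought back together for the reduce step.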
What are the four advantages of Hadoop?
- Flexibility: it can handle any kind of data from any source
- Scalability: it can be run on your own personal PC or it can be scaled efficiently to work on hundreds of computers
- Cost efficiency: it is fully open source software
- Fault tolerance: designed to avoid single points of failure