Data Science Flashcards
What is a Business Intelligence (BI) system?
Business intelligence systems are information systems that assist managers and other professionals:
1) To analyse current and past activities
2) To predict future events
2 characteristics of BI systems?
1) Reporting (RFM Analysis + OLAP)
2) Data Mining
Operational DB vs Dimensional DB?
Operational DB
1) Used for structured transaction data processing
2) Current data is used
3) Data are inserted, updated and deleted by the user
Dimensional DB
1) Used for unstructured analytical data processing
2) Current and historical data are used
3) Data are loaded and updated systematically, not by the user
What is RFM Analysis?
RFM Analysis analyses and ranks customers according to purchasing patterns.
R => Recent (how recently the most recent order was placed)
F => Frequent (how often an order is made)
M => Money (dollar amount spent on orders)
Each of R, F and M is scored from 1 (Highest/Best) to 5 (Lowest/Worst)
For each score, customers are sorted into 5 groups
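A minimal Python sketch of the scoring idea, assuming each customer has already been summarised as (days since last order, number of orders, total spent). The customer data, the function names, and the choice of total spend for M are illustrative assumptions, not part of the card.

```python
# Hypothetical per-customer summaries:
# (days since last order, number of orders, total amount spent)
customers = {
    "Ann": (5, 12, 900.0),
    "Bob": (40, 3, 2500.0),
    "Cy": (200, 1, 40.0),
    "Dee": (15, 8, 300.0),
    "Eve": (90, 20, 1200.0),
}

def quintile_scores(values, higher_is_better=True):
    """Rank the keys of `values` and split them into 5 groups: 1 (best) to 5 (worst)."""
    ranked = sorted(values, key=values.get, reverse=higher_is_better)
    n = len(ranked)
    return {key: (position * 5) // n + 1 for position, key in enumerate(ranked)}

def rfm_scores(summaries):
    """Return {customer: (R, F, M)} with each score between 1 and 5."""
    recency = {c: s[0] for c, s in summaries.items()}
    frequency = {c: s[1] for c, s in summaries.items()}
    money = {c: s[2] for c, s in summaries.items()}
    r = quintile_scores(recency, higher_is_better=False)  # fewer days since last order is better
    f = quintile_scores(frequency)                        # more orders is better
    m = quintile_scores(money)                            # more money spent is better
    return {c: (r[c], f[c], m[c]) for c in summaries}

scores = rfm_scores(customers)
for customer, (r, f, m) in sorted(scores.items()):
    print(customer, r, f, m)
```

In this toy data Cy scores 5 on all three measures (least recent, least frequent, lowest spend), while Bob orders rarely but spends the most, so his M score is 1.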
What is Online Analytical Processing (OLAP)?
An OLAP report has measures and dimensions
Measure - data item of interest
Dimension - a characteristic of a measure
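As a rough illustration, the sketch below sums one measure (sales amount) over two dimensions (product and region). The rows and the name `olap_report` are made up for the example.

```python
from collections import defaultdict

# Hypothetical sales rows: (product, region, amount)
sales = [
    ("Bike", "East", 100),
    ("Bike", "West", 150),
    ("Helmet", "East", 20),
    ("Helmet", "West", 30),
    ("Bike", "East", 50),
]

def olap_report(rows):
    """Sum the measure (amount) for every combination of the two dimensions."""
    cells = defaultdict(int)
    for product, region, amount in rows:
        cells[(product, region)] += amount
    return dict(cells)

report = olap_report(sales)
for (product, region), total in sorted(report.items()):
    print(f"{product:<8}{region:<6}{total:>6}")
```

Each key of `report` is one cell of the report: a combination of dimension values, with the summed measure as its value.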
What is an OLAP Cube?
A presentation of a measure with associated dimensions:
1) An OLAP cube can have any number of axes
2) The terms OLAP cube and OLAP report are synonymous
What does OLAP allow?
OLAP allows drill down: a further division of the data into more detail
What is a Star Schema?
Data modelling technique used to map multidimensional decision support data into a relational database
Creates a near equivalent of a multidimensional database schema from an existing relational database
Four components:
1) Facts
2) Dimensions
3) Attributes
4) Attribute Hierarchy
What are Facts?
Facts contain numeric measurements (values) that represent a specific business aspect or activity
Normally stored in a fact table that is the centre of the star schema
What is a FACT table?
Fact tables contain facts that are linked through their dimensions via keys.
Metrics are facts computed or derived at runtime
What are Dimension tables?
Dimensions describe the business objects that contribute keys to the fact table
1:N relationship between dimension tables and fact tables
Dimension tables are denormalized to maximize performance
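A small star schema can be sketched with Python's built-in `sqlite3` module: one fact table in the centre, two denormalized dimension tables around it, and a metric (average unit price) derived at query time. All table names, column names, and rows here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per business object, descriptive attributes inline.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT, region TEXT);

-- Fact table: numeric measurements plus one key per dimension (1:N from each dimension).
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id INTEGER REFERENCES dim_store(store_id),
    quantity INTEGER,
    revenue REAL
);

INSERT INTO dim_product VALUES (1, 'Bike', 'Cycling'), (2, 'Helmet', 'Safety');
INSERT INTO dim_store VALUES (1, 'Boston', 'East'), (2, 'Seattle', 'West');
INSERT INTO fact_sales VALUES
    (1, 1, 2, 1000.0), (1, 2, 1, 500.0), (2, 1, 5, 200.0), (2, 2, 5, 200.0);
""")

# Facts are filtered and grouped through dimension attributes;
# avg_unit_price is a metric computed at runtime from stored facts.
star_rows = conn.execute("""
    SELECT s.region, p.category,
           SUM(f.revenue) AS revenue,
           SUM(f.revenue) / SUM(f.quantity) AS avg_unit_price
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_store AS s ON s.store_id = f.store_id
    GROUP BY s.region, p.category
    ORDER BY s.region, p.category
""").fetchall()

for row in star_rows:
    print(row)
```

Every join goes directly from the fact table to a dimension table, which is what gives the schema its star shape.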
What are Attributes in terms of Star Schema?
Used to search, filter or classify facts
Dimensions provide descriptive characteristics about the facts through their attributes
What is Attribute Hierarchy in terms of Star Schema?
The attribute hierarchy allows the end user to perform drill down and roll up searches
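One way to picture this, assuming a location dimension with the hypothetical hierarchy region > city > store: grouping by more levels drills down, grouping by fewer levels rolls up.

```python
from collections import defaultdict

# Hypothetical attribute hierarchy for a location dimension, coarsest level first.
HIERARCHY = ("region", "city", "store")

# Hypothetical fact rows: one revenue figure per store.
store_sales = [
    {"region": "East", "city": "Boston", "store": "B1", "revenue": 100},
    {"region": "East", "city": "Boston", "store": "B2", "revenue": 50},
    {"region": "East", "city": "Albany", "store": "A1", "revenue": 70},
    {"region": "West", "city": "Seattle", "store": "S1", "revenue": 80},
]

def aggregate(rows, depth):
    """Total revenue grouped by the first `depth` levels of the hierarchy."""
    levels = HIERARCHY[:depth]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[level] for level in levels)] += row["revenue"]
    return dict(totals)

by_region = aggregate(store_sales, 1)  # rolled up to the coarsest level
by_city = aggregate(store_sales, 2)    # drilled down one level, from region to city

print(by_region)
print(by_city)
```

The hierarchy is what tells the tool which attribute comes next when the user asks for more or less detail.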
What is the snowflake schema?
Logical arrangement of tables in a multidimensional database that resembles a snowflake shape
Extension of the star schema that adds additional dimension tables
Dimensions are normalized
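A hedged sketch of the difference, using `sqlite3` and hypothetical table names: the product dimension is normalized so that category lives in its own table, and a query needs one extra join to reach it.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Normalized dimension: category is split out of dim_product into its own table.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    revenue REAL
);

INSERT INTO dim_category VALUES (1, 'Cycling'), (2, 'Safety');
INSERT INTO dim_product VALUES (1, 'Bike', 1), (2, 'Helmet', 2), (3, 'Gloves', 2);
INSERT INTO fact_sales VALUES (1, 1000.0), (2, 200.0), (3, 50.0);
""")

# Fact -> product -> category: the second join is the "snowflake" branch.
snowflake_rows = db.execute("""
    SELECT c.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category
    ORDER BY c.category
""").fetchall()

print(snowflake_rows)
```

Each category name is stored once instead of being repeated on every product row, at the cost of the extra join.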
Inmon vs Kimball Architecture?
Inmon
1) Top-down approach
2) Data warehouse designed using a normalized (typically 3NF) enterprise data model that contains atomic data, which is at the lowest level of detail
Kimball
1) Bottom-up approach
2) Data warehouse is nothing but a set of data marts designed as per business processes and joined together using conformed dimensions across the business processes