Adbase H2 Midterms Flashcards
warehouse is a database designed to enable and support business intelligence (BI) activities, especially analytics.
intended to perform queries and analysis
optimized for data retrieval, not for transaction processing
centralizes and consolidates large amounts of data from multiple sources
allows organizations to derive valuable business insights from their data to improve decision-making
can be considered an organization’s “single source of truth”
data warehouse
The DW can analyze data about a particular subject or functional area.
Subjects can be products, customers, departments, regions, etc.
The functional area can be sales, marketing, finance, distribution, etc.
Focuses on the data rather than on the processes that modify the data
Subject-Oriented
The DW creates consistency among different data types from different sources.
Integrated
A student’s level in the database might be defined as “freshman”, “sophomore”, “junior”, or “senior” in the accounting department, and “FR”, “SO”, “JR”, “SR” in the computer information systems department.
Integrated
Data in DW represents the flow of data through time. It can be organized weekly, monthly, or annually, etc.
Time-variant
Once data is in a data warehouse, it is stable and does not change.
Non-Volatile
This is a databank that stocks all enterprise data and makes it manageable for reporting.
Data Warehouse Database
always implemented on the relational database management system (RDBMS) technology like SQL
Data Warehouse Database
These tools are used for performing all the conversions, summarizations, and all the changes needed to transform data into a unified format in the data warehouse. These include:
In case of missing data, populating them with defaults
Calculating summaries and derived data
Eliminating unwanted data in operational databases from loading into the data warehouse
Converting to common data names and definitions
- Extraction, Transformation, and Loading Tools (ETL)
is data about data that describes the data warehouse. It provides the source, transformation, integration, storage, usage, relationships, and history of each data element.
Metadata
contains information about the warehouse, which is used by data warehouse designers and administrators.
Technical Metadata
contains details that give end-users an easy way to understand the information stored in the data warehouse.
Business Metadata
Corporate users generally cannot work with databases directly.
- Data Warehouse Access Tools
help users produce corporate reports for analysis that can be in the form of spreadsheets, calculations, or interactive visuals.
Query and reporting tool
In such cases, custom reports are developed using application development tools when built-in graphical and analytical tools do not satisfy the analytical needs of an organization.
Application development tools
a process of discovering meaningful new correlations, patterns, and trends by mining a large amount of data. Data mining tools are used to make this process automatic.
Data mining
allow users to analyze the data using elaborate and complex multi-dimensional views.
OLAP tools
a small, single-subject data warehouse subset that provides decision support for the particular user group.
Data Marts
- Provides consistent information on various cross-functional activities. It is also supporting “blank” reporting and query.
ad-hoc
is a data-modeling technique used to map multi- dimensional decision support data into a relational database.
star schema
Star schema has two (2) common components:
Facts table
Dimension table
data that will be included in reports and used as the basis of business decisions. It contains measurement or facts to the data and foreign key to dimension table.
Facts table
are attributes that qualify and provide more information about facts. It contains dimensions of a fact and they are joined to fact table via foreign key.
Dimension table
a software tool that is used for data analysis and reporting purposes for business decisions
used by business analysts, managers, and executives. Example: In Netflix, OLAP was used for movie recommendations based on watch history.
- Online Analytical Processing (OLAP)
an operational system that manages the day-to-day transactions of an organization
used by the Database Administrator (DBA) and Database Professionals
Example: In ATM centers, OLTP is used for money withdrawals, transfers, deposits, and inquiries.
- Online Transaction Processing (OLTP)
Data is processed and viewed as part of a multi-dimensional structure.
- Multi-dimensional data analysis techniques
To deliver efficient decision support, OLAP tools must have the following:
Access to many kinds of DBMSs, flat files, and internal and external data sources
Rapid and consistent query response times
Support for very large databases because the data warehouse could easily and quickly grow to multiple terabytes in size
- Advanced Database support
permit the user to navigate the data in a way that simplifies and accelerates decision making or data analysis with easy-to-use graphical interfaces
- Easy-to-use end-user interfaces
Works directly with relational databases
Fact and dimension tables are stored as relations.
- Relational OLAP (ROLAP)
extends OLAP functionality to multi-dimensional database management systems (MDBMS)
best suited to manage, store, and analyze multi-dimensional data
- Multi-dimensional OLAP (MOLAP)
an extension of the GROUP BY clause that is used to create subtotals and grand totals for a set of columns
- ROLLUP operator
Like ROLLUP, this generates subtotals for all the combinations of grouping column s specified in the GROUP BY clause.
- CUBE operator
allows you to write a cross-tabulation, which means you can aggregate your results and rotate rows into columns
- PIVOT operator
Using the “BLANK” operator, we will display the total number of students enrolled in specific campuses and the grand total of students enrolled in all campuses.
ROLLUP
Using the “BLANK operator, we will turn the unique values/rows in the
Program column into multiple columns.
PIVOT
refers to analyzing massive amounts of data in a data warehouse or other sources to uncover hidden trends, patterns, and relationships. This explains the past and predicting the future for analysis.
Data mining
In this step, the goals of the businesses are set, and the important factors that will help in achieving the goal are discovered.
Business Understanding
This step will collect the entire data and populate the data in the tool (if using any tool).
Data Understanding
This step involves selecting the appropriate data, cleaning, constructing attributes from data, integrating data from multiple databases.
Data Preparation
Selection of the data mining technique such as decision-tree, generate test design for evaluating the selected model, building models from the dataset, and assessing the built model with experts to discuss the result is done in this step.
Modeling
This step will determine the degree to which the resulting model meets the business requirements. The model is reviewed for any mistakes or steps that should be repeated.
Evaluation
In this step, a deployment plan is made. The strategy to monitor and maintain the data mining model results to check for its usefulness is formed. Final reports are also made, and a review of the whole process is done to check any mistake and see if any step is repeated.
Deployment
used to retrieve important and relevant information about data and metadata.
Classification
used to identify data that are like each other. This process helps to understand the differences and similarities between the data.
Clustering
used to identify and analyze the relationship between variables.
Regression
used to help find the association between two or more Items. It discovers a hidden pattern in the data set.
Association Rules
used to observe data items in the dataset that do not match an expected pattern or expected behavior.
Outer detection
used to discover or identify similar patterns or trends in transaction data for a certain period.
Sequential Patterns
used to combine other data mining techniques like trends, sequential patterns, clustering, classification, etc. It analyzes past events or instances in the right sequence for predicting a future event.
Prediction
- Helps with the decision-making process
- Helps companies to get knowledge-based information
- Facilitates automated prediction of trends and behaviors as well as the automated discovery of hidden patterns
- The speedy process which makes it easy for the users to analyze a huge amount of data in less time
Benefits of data mining