Chapter 2 Flashcards
It is a collaborative effort that involves multiple teams from multiple departments in constant communication with each other.
Business Analytics Implementation
is the starting point of an implementation, as it dictates which data will actually be useful for the desired analysis.
Determining the information
What are the three major components of the system landscape?
Data sources
Enterprise data warehouse
Reporting and analysis tools
What are the three main categories of data sources in an enterprise?
ERP systems
Other databases
Flat files
It is where an enterprise's data is fed, and all reports are obtained directly from it.
ERP System
makes extensive use of Master Data to help keep track of Business Partners and Items.
ERP System
Maintenance of Master Data is usually assigned to key people, who manage the creation of new Master Data and updates to existing records.
ERP system
Due to geographical or cost constraints, a branch of the company might be physically impossible to connect to the corporate network.
Other databases
are usually Excel or delimited text files that business users create in order to make their own reports when needed.
Flat files
Delimited text files are usually either
Tab-delimited
Comma-separated value files (CSV)
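As a rough illustration, Python's standard `csv` module can read both kinds of delimited file; only the delimiter changes. The file contents and column names below are made up for the example.

```python
import csv
import io

# Hypothetical contents of two flat files a business user might keep.
csv_text = "item,qty\nWidget,3\nGadget,5\n"
tsv_text = "item\tqty\nWidget\t3\nGadget\t5\n"

def read_delimited(text, delimiter):
    """Parse delimited text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]

csv_rows = read_delimited(csv_text, ",")   # comma-separated values
tsv_rows = read_delimited(tsv_text, "\t")  # tab-delimited

# Both files yield the same records once the right delimiter is used.
print(csv_rows == tsv_rows)  # True
```

Note that every value comes back as text (`"3"`, not `3`), which is one reason flat files need cleanup before reporting.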
is needed in order to work around the limitations of these disparate data sources.
Enterprise Data Warehouse
is built in order to consolidate the disparate data sources so that only the data necessary for reporting will actually be used.
Enterprise Data Warehouse
is concerned with delivering “a single version of the truth”.
Consolidating Data
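A minimal sketch of the consolidation idea, assuming two hypothetical sources (an ERP extract and a branch database) that share a customer ID: records are merged into one table keyed by that ID, duplicates collapse into a single record, and only the fields needed for reporting are kept. The field names and the "first source wins" rule are assumptions for illustration.

```python
# Hypothetical extracts from two sources; field names are illustrative only.
erp_rows = [
    {"cust_id": "C001", "name": "Acme Corp", "internal_note": "x"},
    {"cust_id": "C002", "name": "Beta Ltd", "internal_note": "y"},
]
branch_rows = [
    {"cust_id": "C002", "name": "Beta Ltd"},
    {"cust_id": "C003", "name": "Gamma Inc"},
]

REPORTING_FIELDS = ("cust_id", "name")  # only what reports need

def consolidate(*sources):
    """Merge sources into one table keyed by cust_id, keeping reporting fields only.

    Later sources never overwrite earlier ones, so the first source
    (here, the ERP extract) is treated as the authoritative copy.
    """
    merged = {}
    for source in sources:
        for row in source:
            key = row["cust_id"]
            if key not in merged:
                merged[key] = {field: row[field] for field in REPORTING_FIELDS}
    return list(merged.values())

warehouse = consolidate(erp_rows, branch_rows)
print([r["cust_id"] for r in warehouse])  # ['C001', 'C002', 'C003']
```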
New hardware that will become the server hosting the Data Warehouse. It must be connected to the
Corporate Network
A dedicated project team from the Enterprise Side made up of
Business Users
is a tool that helps build Data Warehouses.
SAP Business Warehouse
is essentially a large database, so it is likely that technical column names are still used instead of more common, business-friendly terms.
Enterprise Data Warehouse
is set up as a sort of “translator” that displays technical terms as business terms, so that Business Users can immediately understand what the data is.
Semantic Layer
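The "translator" role can be pictured as a lookup from technical column names to business terms. This is only a toy sketch: the column names (`CUST_NM`, `PROD_CD`, `NET_AMT`) and their labels are hypothetical, and a real semantic layer also handles measures, joins, and security.

```python
# Hypothetical technical column names mapped to business-friendly terms.
SEMANTIC_LAYER = {
    "CUST_NM": "Customer Name",
    "PROD_CD": "Product Code",
    "NET_AMT": "Net Sales Amount",
}

def to_business_terms(row):
    """Relabel a row's technical column names; unmapped columns keep their name."""
    return {SEMANTIC_LAYER.get(col, col): value for col, value in row.items()}

raw = {"CUST_NM": "Acme Corp", "PROD_CD": "P-100", "NET_AMT": 250.0}
print(to_business_terms(raw))
# {'Customer Name': 'Acme Corp', 'Product Code': 'P-100', 'Net Sales Amount': 250.0}
```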
What are the three tiers of the 3-tier architecture?
Development (DEV)
Quality Assurance (QAS)
Production (PRD)
is the most critical of the three, as it contains “live data”.
PRD
It is the system used in the company's day-to-day transactions. A lot of redundancy might be required for this landscape, since the enterprise needs it in order to function properly.
PRD
Its physical hardware tends to be the most powerful of the three, and its downtime must be kept to a minimum because of its operational importance.
PRD
as its name states, is for development purposes.
DEV
When a new report needs to be created or a change in configuration needs to be made, it should be done here first.
DEV
Once the new report works or the configuration change does not result in catastrophic failure, the changes are rolled up and applied/promoted to which landscape?
QAS
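The DEV → QAS → PRD path can be pictured as an ordered sequence in which a change only moves one step forward at a time. This hypothetical model captures the ordering only; real systems move changes with dedicated transport tooling that this sketch does not represent.

```python
from enum import Enum

class Landscape(Enum):
    DEV = 1  # development: new reports and configuration changes start here
    QAS = 2  # quality assurance: changes are tested here
    PRD = 3  # production: live data

def promote(current):
    """Return the next landscape in the promotion path; PRD has no successor."""
    if current is Landscape.PRD:
        raise ValueError("PRD is the final landscape; nothing to promote to")
    return Landscape(current.value + 1)

print(promote(Landscape.DEV))  # Landscape.QAS
```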
Some enterprises have a fourth, off-premises landscape known as
Disaster Recovery (DR)
This is essentially a copy of PRD that is kept separate from the other three landscapes.
Disaster Recovery
It acts as a contingency in case PRD suffers a catastrophic failure.
Disaster Recovery
What are examples of data reliability inconsistencies?
Inconsistent Terminology
Rounding Errors and Truncation
Nulls and Zeroes
Incorrect Inputs
Outright Data Discrepancies
Inconsistent Terminology: one department might refer to an SKU as a
Product, while another calls it a Material
Consider the number of decimal places a given piece of numeric data has.
Rounding errors and truncation
This could cause final numbers to deviate from the source.
Rounding Errors
has the same effect; however, instead of rounding the number, the extra decimal places are simply dropped.
Truncation
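To see how final numbers can drift from the source, the sketch below applies Python's `decimal` module to made-up amounts: rounding each value before summing gives a different total than rounding once at the end, and truncation drops the extra places entirely.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

CENT = Decimal("0.01")  # target precision: two decimal places

# Hypothetical source amounts carrying three decimal places.
amounts = [Decimal("1.005"), Decimal("2.005"), Decimal("3.005")]

exact_total = sum(amounts)  # total computed at full source precision
rounded_once = exact_total.quantize(CENT, rounding=ROUND_HALF_UP)

# Rounding each value first, then summing.
rounded_total = sum(a.quantize(CENT, rounding=ROUND_HALF_UP) for a in amounts)

# Truncating: extra decimal places are cut off (ROUND_DOWN rounds toward zero).
truncated_total = sum(a.quantize(CENT, rounding=ROUND_DOWN) for a in amounts)

print(exact_total, rounded_once, rounded_total, truncated_total)
# 6.015 6.02 6.03 6.00
```

Three reports built from the same source could therefore show 6.02, 6.03, or 6.00 depending on where and how precision was reduced.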
Null Values represent
Nothing (the absence of a value), which is not the same as zero
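The difference matters when aggregating. In the sketch below, which uses Python's built-in `sqlite3` module and made-up sales figures, SQL's `AVG` and `COUNT(column)` skip NULLs but include zeros, so treating NULL as 0 changes the average. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
# Hypothetical data: South reported nothing (NULL), East reported 0.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 100.0), ("South", None), ("East", 0.0)],
)

# AVG and COUNT(amount) ignore the NULL row but do count the zero.
avg_amount, counted, total_rows = conn.execute(
    "SELECT AVG(amount), COUNT(amount), COUNT(*) FROM sales"
).fetchone()
print(avg_amount, counted, total_rows)  # 50.0 2 3

# Replacing NULL with 0 before averaging gives a different answer.
avg_null_as_zero = conn.execute(
    "SELECT AVG(COALESCE(amount, 0)) FROM sales"
).fetchone()[0]
print(round(avg_null_as_zero, 2))  # 33.33
```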
This is where the concept of “Garbage In, Garbage Out” is most apparent.
Incorrect Inputs
A company usually makes some tactical decisions in which its products and services are combined into promos and bundles.
Outright Data Discrepancies
is the first data model that can be fully described mathematically. All data (fields/columns) is represented in terms of tuples (rows/records), grouped into relations.
Relational Model
can be obtained from multiple tables to produce one tuple of data by JOINing tables via their keys.
Data
was initially pushed as the standard language for relational databases.
SQL
Three-table example
TXN
CUS_MAS
PROD_MAS
stores all customer information.
CUS_MAS
stores all product information.
PROD_MAS
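To illustrate how JOINs combine the three tables into one tuple of data, here is a minimal sketch using Python's built-in `sqlite3` module. The table names come from the example above; the columns (`cust_id`, `prod_id`, `qty`, and so on) and the data are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names are hypothetical; only the table names come from the example.
conn.executescript("""
CREATE TABLE CUS_MAS (cust_id TEXT PRIMARY KEY, cust_name TEXT);
CREATE TABLE PROD_MAS (prod_id TEXT PRIMARY KEY, prod_name TEXT);
CREATE TABLE TXN (
    txn_id INTEGER PRIMARY KEY,
    cust_id TEXT REFERENCES CUS_MAS(cust_id),
    prod_id TEXT REFERENCES PROD_MAS(prod_id),
    qty INTEGER
);
INSERT INTO CUS_MAS VALUES ('C001', 'Acme Corp');
INSERT INTO PROD_MAS VALUES ('P100', 'Widget');
INSERT INTO TXN VALUES (1, 'C001', 'P100', 3);
""")

# JOIN the tables on their keys to produce one tuple per transaction.
row = conn.execute("""
    SELECT t.txn_id, c.cust_name, p.prod_name, t.qty
    FROM TXN t
    JOIN CUS_MAS c ON c.cust_id = t.cust_id
    JOIN PROD_MAS p ON p.prod_id = t.prod_id
""").fetchone()
print(row)  # (1, 'Acme Corp', 'Widget', 3)
```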
is a representation of the abstract structure of domain information.
Schema or logical data model
It is often expressed as a diagram and is used as the foundation for designing database structures.
Schema
There are many different kinds of schemas, but the one most commonly used in enterprise computing is the
Star Schema
It consists of a Fact Table (usually just one) referencing any number of Dimension Tables.
Star Schema
records measurements for a specific event
Fact Table
by contrast, usually contains fewer records than a Fact Table.
Dimension Table
The data contained in Dimension Tables is sometimes referred to as
Master Data
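A tiny star schema can be sketched with Python's built-in `sqlite3`: one fact table of sales events referencing two dimension tables, then facts summed by a dimension attribute. All table names, column names, and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One fact table referencing two dimension tables (all names hypothetical).
conn.executescript("""
CREATE TABLE dim_customer (cust_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE dim_product (prod_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (
    cust_id INTEGER REFERENCES dim_customer(cust_id),
    prod_id INTEGER REFERENCES dim_product(prod_id),
    amount REAL  -- the measurement recorded for each sales event
);
INSERT INTO dim_customer VALUES (1, 'North'), (2, 'South');
INSERT INTO dim_product VALUES (10, 'Hardware'), (20, 'Software');
INSERT INTO fact_sales VALUES (1, 10, 100.0), (1, 20, 50.0), (2, 10, 75.0), (2, 10, 25.0);
""")

# Summarize the facts by a dimension attribute (customer region).
totals = conn.execute("""
    SELECT d.region, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer d ON d.cust_id = f.cust_id
    GROUP BY d.region
    ORDER BY d.region
""").fetchall()
print(totals)  # [('North', 150.0), ('South', 100.0)]
```

Even in this toy, the fact table (4 rows) outgrows each dimension table (2 rows), and the gap widens as more events are recorded.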
ensure that each row of data within the table is unique.
Keys
columns that automatically increment as more rows are populated, using some sort of algorithm
ID
Types of keys
Primary and foreign
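A sketch of primary keys, foreign keys, and auto-incrementing IDs, again using Python's built-in `sqlite3` with hypothetical `customer` and `orders` tables. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; other databases have their own rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE customer (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  -- ID assigned automatically
    name TEXT
);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id INTEGER NOT NULL REFERENCES customer(id)  -- foreign key
);
""")

cur = conn.execute("INSERT INTO customer (name) VALUES ('Acme Corp')")
acme_id = cur.lastrowid  # the key was generated, not supplied
conn.execute("INSERT INTO customer (name) VALUES ('Beta Ltd')")  # next ID

conn.execute("INSERT INTO orders (customer_id) VALUES (?)", (acme_id,))

# A foreign key pointing at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders (customer_id) VALUES (99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(acme_id, fk_rejected)  # 1 True
```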
is to maintain a separate database that records all of the day's transactions
Workaround
One of the defining features of Business Analytics Tools today is what’s called
Self-Service BI.
is made available to the market to test its viability
trial run
What does SQL stand for?
Structured Query Language
It is the most common way to store and access enterprise data, typically through some form of Structured Query Language.
Relational Model