OLTP for processing transactions process all kinds of queries (read, insert, update and delete) OLAP more complex queries, in a smaller volume, for the purpose of business intelligence or reporting generally optimized for read only and might not even support other kinds of queries

_Tables have a relationship based on keys_ Primary Key Each table has a one, unique to each row an entity that is also included in another dataset also knows as entity identification Foreign Key a column whose keys are the Primary Keys in another table an attribute that links the data to another dataset a field (or collection of fields) in one table that uniquely identifies a row of another table or the same table the foreign key is defined in a second table, but it refers to the primary key or a unique key in the first table.

Data Architecture Flashcards by Oscar Leander

Purpose of a BI Data Architecture

Purpose

Defines the data needed by the information architecture
- To provide the required analytics
- Also defines schemas, integration, transformations, storage and workflow
Guides how data is collected, integrated, enhanced, stored and delivered to business people

How well did you know this?

Not at all

Perfectly

Benefits of a BI Data Architecture

Gives better understanding of data
Helps align data with business strategies
Gives guidelines for managing information supply chains
Provides a structure for implementing data governance
Helps enforce security and privacy

How well did you know this?

Not at all

Perfectly

Requirements for DW/BI system

Must make information easy to access
Must present information consistently
Must adapt to change
Must present information in a timely way
Must be secure
Must be the authoritative and trustworthy foundation for improved decision making
The business community must accept the DW/BI system

How well did you know this?

Not at all

Perfectly

Information Supply Chains

A series of processes of data manipulations to produce BI reports or dashboards.

Examples:

Gather, transform, integrate and consolidate data into a data warehouse
Processes that take EDW data into data marts
BI tools that process the data in the marts

How well did you know this?

Not at all

Perfectly

OLTP – Online Transaction Processing

Processes business transactions
- e.g., order entry, sales, payments
- the system responds immediately to user requests
Key goals are ensuring
- fast throughput
- availability
- concurrency
- recoverability
Data usually stored in relational databases
- Systems of Record
  - keeps a record of every transaction
Not optimised for presenting information, e.g.
- How many widgets did we sell in our store today?
- How many widgets did we sell across all stores?
- What are our sales of computers per region?
- For that, you need summed or aggregated information

Some Benefits

reduced paper trails
faster, more accurate forecast for revenues and expenses

How well did you know this?

Not at all

Perfectly

OLAP – Online Analytical Processing

Deliver business intelligence to the business
Work with aggregated/summarised data
Can quickly answer high-level questions
Can use a relational database
- But we often use a multidimensional “OLAP cube”
  - Cube implies three dimensions
  - Usually have more than three

How well did you know this?

Not at all

Perfectly

OLTP vs. OLAP

OLTP

for processing transactions
process all kinds of queries (read, insert, update and delete)

OLAP

more complex queries, in a smaller volume,
for the purpose of business intelligence or reporting
generally optimized for read only and might not even support other kinds of queries

How well did you know this?

Not at all

Perfectly

Types of Data Models

Conceptual data model
- structured business view of the data required to support business processes, KPIs;
- independent of any database or physical storage
Logical data model
- the overall structure of the data required to support the business requirements
- independent of any software or data storage structures
Phsyical data model –
- operational systems, operational data store (developer view)

How well did you know this?

Not at all

Perfectly

Logical Data Model in detail

Data model most used in designing BI applications

Business rules become relationships between objects
Model is independent of DBMS or physical storage
Shows specific entities and attributes to be implemented
Defines primary keys, foreign keys, alternate keys

How well did you know this?

Not at all

Perfectly

Data Modelling workflow

These are the steps in data modelling, going from business requirements to applications and database

Implementing the physical data model requires you to understand the characteristics and constraints of the database system being used

You need in-depth knowledge of the DBMS in order to

Represent the logical data model in a database schema
Add the entities and attributes definitions needed to meet operating requirements
Configure and tune the database for performance requirements

How well did you know this?

Not at all

Perfectly

What is a database model + common types

A database model is a specification describing how a database is structured and used.

How well did you know this?

Not at all

Perfectly

Pros and Cons of

Relational E-R (entity relationship) data model

+ Very well suited to Systems of Record

+ Very good for OLTP

+ Flexible – can use joins to create virtual tables

– Not ideal for Business Intelligence

How well did you know this?

Not at all

Perfectly

Tables and Keys

Tables have a relationship based on keys

Primary Key

Each table has a one,
unique to each row
an entity that is also included in another dataset
also knows as entity identification

Foreign Key

a column whose keys are the Primary Keys in another table
an attribute that links the data to another dataset
a field (or collection of fields) in one table that uniquely identifies a row of another table or the same table
the foreign key is defined in a second table, but it refers to the primary key or a unique key in the first table.

How well did you know this?

Not at all

Perfectly

ER modelling building blocks

two entities:

a person
an email address

The person’s primary key: BusinessEntityID

Email entity attributes:

EmailAddressID
the email address itself

The email address entity’s primary key is a combination of BusinessEntityID and EmailAddressID

In between these two identities is the relationship.
A person can have one or more email accounts, but an email account is owned by one person.

The symbols on the ends of the relationship connector define the cardinality of the relationship:

one-to-many, many-to-many, one-to-zero-or-one, etc.

How well did you know this?

Not at all

Perfectly

Entity and Attribute types

Independent entities don’t need parents nor children
Dependent entities need parents to exist

two types of attributes in an entity:

key
non-key (don’t uniquely identify an attribute)

How well did you know this?

Not at all

Perfectly

E-R Models vs. Dimensional Models

Study These Flashcards

E-R models and dimensional models are both logical design techniques.

Dimensional modelling is better for Bi.

technique for making databases simple
means business users can easily understand the data
software can deliver results quickly and efficiently
optmised for business intelligence applications and data warehousing
depicts business processes throughout an enterprise and organises that data and its structure in a logical way
oriented towards needs of business users
- understandability and performance
- to enable BI reporting, query and analysis
star or snowflake E-R schemas or OLAP cubes

The dimensional model is simpler, easier to navigate, and more understandable than the ER model.

Key concepts in dimensional modelling

Study These Flashcards

Facts, Dimensions and Attributes can be organised in several ways, called schemas.

The choice of schema depends on variables such as the type of reporting the model needs to deliver, and the BI tool being used.

Define Facts

Study These Flashcards

aka. measures

A fact is a measurement of a business activity
- It is usually numeric
- e.g. sales, expenses, inventory levels
Facts measures business performance
Facts can be aggregated or derived
Is what the business people are interested in

Dimensional modelling overview

Study These Flashcards

Two main entities

facts (measures)
dimensions (context)

The key of the fact table is an aggregate key that uniquely identifies the sale.

Notice the dashed link to Tbl_Dim_Buyer, who bought the product for the store, rather than a link to a Store ID

In context, when someone buys a product we can use this model to capture information about that business process

Here we have aggregated data on sales into a fact table. Got a composite key.

Transformation of data from OLTP/SOR to BI involves building fact tables to allow business to focus on measurements of how the business is performing

two types of columns in fact tables

Study These Flashcards

keys and measures

Fact table: key column and primary key

Study These Flashcards

The key column is a group of foreign keys that point to primary keys of dimensional tables.

Primary key of a fact table is typically a multipart key consisting of the combination of foreign keys that point to the primary keys of dimensional tables that are associated with this fact table to enable business analysis

Fact table - measures

Study These Flashcards

actual measures of business activity

3 types of facts

Study These Flashcards

additive facts - can be summed up through all of the dimensions in the fact table
semi-additive facts - can be summed up for some of the dimensions in the fact table, but not the others
non-additive facts - annot be summed up for any of the dimensions present in the fact table

What is a Dimension?

Study These Flashcards

An entity that establishes the business context for the measures (facts) used by an enterprise
Define the who, what, where, why of dimensional model
Facts are numeric; dimensions are descriptive
Used to filter and analyse based on performance measures

Dimensional Modelling and Facts

A dimensional model **uses only a subset of facts** from the system of record * based on the _analytical need_, * and to _reduce complexity_.

3 types of common OLAP schemas

* Star Schema * Snowflake schema * OLAP cube

Star Schema (structure, where is it used, strength)

* the simplest style of data mart schema * fact table in the center * dimension tables surround * **dimensions are denormalized** * ****each dimension represented by a *single table* * most widely used to develop **data warehouses and dimensional data marts**. * consists of one or more fact tables referencing any number of dimension tables The star schema is an important special case of the snowflake schema, and is more effective for **handling simpler queries**.

Snowflake schema

* similar to the star schema * **dimensions are normalized into _multiple related tables_** * (whereas the star schema's dimensions are denormalized with each dimension represented by a single table) A complex snowflake shape emerges when the dimensions of a snowflake schema are elaborate, having multiple levels of relationships, and the child tables have multiple parent tables.

OLAP Cube

a multidimensional schema It is a hierarchical database that consists of one structure, a multidimensional array. a spreadsheet pivot table is an approximation of a multidimensional cube each cell of the cube is an actual measure Depending on what data someone needs to get to in the cube, the combination of the different dimensions would get them to the cell with the data they need in that multidimensional array Dimensional attributes and hierarchies are defined in the multidimensional product’s metadata. Star and snowflake schemas are traditionally stored in relational databases. **Multidimensional schemas are stored in multidimensional databases, also called OLAP cubes.** Multidimensional databases store and aggregate at various levels in the hierarchy. They enable drill up and drill down, meaning that you can go from the lowest level (the most detail), all the way to summarize data, then back down again through the hierarchy These databases are specifically designed to understand the multidimensional arrays, the hierarchies, what can be added, what’s semi-additive and what’s not additive in the cube. Just as tables, columns, constraints etc. need to be defined for a relational database, the facts, dimensions, hierarchies, and so on need to be defined in the multidimensional database’s metadata repository.

Dimensional Modelling and Business

**All reporting and analytics is geared towards this dimensional model.** Although the business people in your company may never see the actual data models you create, they will become very familiar with the reports and dashboards they help generate. Unless the models can generate clear, effective reporting and analysis, the won’t help the business view the data and use it to make informed decisions that affect operations. **Dimensional modelling is the key to generating this critical business information.**

Some OLAP cube manipulations

**Slicing** * choose a single value for one dimension and create a new cube with one fewer dimension **Dicing** * fewer dimensions **Drilling up/down** * navigate among levels of data ranging from the most summarized (up) to the most detailed (down) **Pivoting** * rotate the cube in space to see its various faces

Data Architecture Flashcards

(31 cards)