Data Architecture Flashcards

1
Q

Purpose of a BI Data Architecture

A

Purpose

  • Defines the data needed by the information architecture
    • To provide the required analytics
    • Also defines schemas, integration, transformations, storage and workflow
  • Guides how data is collected, integrated, enhanced, stored and delivered to business people
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Benefits of a BI Data Architecture

A
  • Gives better understanding of data
  • Helps align data with business strategies
  • Gives guidelines for managing information supply chains
  • Provides a structure for implementing data governance
  • Helps enforce security and privacy
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Requirements for DW/BI system

A
  • Must make information easy to access
  • Must present information consistently
  • Must adapt to change
  • Must present information in a timely way
  • Must be secure
  • Must be the authoritative and trustworthy foundation for improved decision making
  • The business community must accept the DW/BI system
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Information Supply Chains

A

A series of processes of data manipulations to produce BI reports or dashboards.

Examples:

  • Gather, transform, integrate and consolidate data into a data warehouse
  • Processes that take EDW data into data marts
  • BI tools that process the data in the marts
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

OLTP – Online Transaction Processing

A
  • Processes business transactions
    • e.g., order entry, sales, payments
    • the system responds immediately to user requests
  • Key goals are ensuring
    • fast throughput
    • availability
    • concurrency
    • recoverability
  • Data usually stored in relational databases
    • Systems of Record
      • keeps a record of every transaction
  • Not optimised for presenting information, e.g.
    • How many widgets did we sell in our store today?
    • How many widgets did we sell across all stores?
    • What are our sales of computers per region?
    • For that, you need summed or aggregated information

Some Benefits

  • reduced paper trails
  • faster, more accurate forecast for revenues and expenses
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

OLAP – Online Analytical Processing

A
  • Deliver business intelligence to the business
  • Work with aggregated/summarised data
  • Can quickly answer high-level questions
  • Can use a relational database
    • But we often use a multidimensional “OLAP cube”
      • Cube implies three dimensions
      • Usually have more than three
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

OLTP vs. OLAP

A

OLTP

  • for processing transactions
  • process all kinds of queries (read, insert, update and delete)

OLAP

  • more complex queries, in a smaller volume,
  • for the purpose of business intelligence or reporting
  • generally optimized for read only and might not even support other kinds of queries
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Types of Data Models

A
  1. Conceptual data model
    • structured business view of the data required to support business processes, KPIs;
    • independent of any database or physical storage
  2. Logical data model
    • ​​the overall structure of the data required to support the business requirements
    • independent of any software or data storage structures
  3. Phsyical data model
    • operational systems, operational data store (developer view)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Logical Data Model in detail

A

Data model most used in designing BI applications

  • Business rules become relationships between objects
  • Model is independent of DBMS or physical storage
  • Shows specific entities and attributes to be implemented
  • Defines primary keys, foreign keys, alternate keys
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Data Modelling workflow

A

These are the steps in data modelling, going from business requirements to applications and database

Implementing the physical data model requires you to understand the characteristics and constraints of the database system being used

You need in-depth knowledge of the DBMS in order to

  • Represent the logical data model in a database schema
  • Add the entities and attributes definitions needed to meet operating requirements
  • Configure and tune the database for performance requirements
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is a database model + common types

A

A database model is a specification describing how a database is structured and used.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Pros and Cons of

Relational E-R (entity relationship) data model

A

+ Very well suited to Systems of Record

+ Very good for OLTP

+ Flexible – can use joins to create virtual tables

– Not ideal for Business Intelligence

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Tables and Keys

A

Tables have a relationship based on keys

Primary Key

  • Each table has a one,
  • unique to each row
  • an entity that is also included in another dataset
  • also knows as entity identification

Foreign Key

  • a column whose keys are the Primary Keys in another table
  • an attribute that links the data to another dataset
  • a field (or collection of fields) in one table that uniquely identifies a row of another table or the same table
  • the foreign key is defined in a second table, but it refers to the primary key or a unique key in the first table.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

ER modelling building blocks

A

two entities:

  1. a person
  2. an email address

The person’s primary key: BusinessEntityID

Email entity attributes:

  • EmailAddressID
  • the email address itself

The email address entity’s primary key is a combination of BusinessEntityID and EmailAddressID

  • In between these two identities is the relationship.
  • A person can have one or more email accounts, but an email account is owned by one person.

The symbols on the ends of the relationship connector define the cardinality of the relationship:

  • one-to-many, many-to-many, one-to-zero-or-one, etc.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Entity and Attribute types

A
  • Independent entities don’t need parents nor children
  • Dependent entities need parents to exist

two types of attributes in an entity:

  • key
  • non-key (don’t uniquely identify an attribute)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

E-R Models vs. Dimensional Models

A

E-R models and dimensional models are both logical design techniques.

Dimensional modelling is better for Bi.

  • technique for making databases simple
  • means business users can easily understand the data
  • software can deliver results quickly and efficiently
  • optmised for business intelligence applications and data warehousing
  • depicts business processes throughout an enterprise and organises that data and its structure in a logical way
  • oriented towards needs of business users
    • understandability and performance
    • to enable BI reporting, query and analysis
  • star or snowflake E-R schemas or OLAP cubes

The dimensional model is simpler, easier to navigate, and more understandable than the ER model.

17
Q

Key concepts in dimensional modelling

A

Facts, Dimensions and Attributes can be organised in several ways, called schemas.

  • The choice of schema depends on variables such as the type of reporting the model needs to deliver, and the BI tool being used.
18
Q

Define Facts

A

aka. measures

  • A fact is a measurement of a business activity
    • It is usually numeric
    • e.g. sales, expenses, inventory levels
  • Facts measures business performance
  • Facts can be aggregated or derived
  • Is what the business people are interested in
19
Q

Dimensional modelling overview

A

Two main entities

  • facts (measures)
  • dimensions (context)

The key of the fact table is an aggregate key that uniquely identifies the sale.

Notice the dashed link to Tbl_Dim_Buyer, who bought the product for the store, rather than a link to a Store ID

In context, when someone buys a product we can use this model to capture information about that business process

Here we have aggregated data on sales into a fact table. Got a composite key.

Transformation of data from OLTP/SOR to BI involves building fact tables to allow business to focus on measurements of how the business is performing

20
Q

two types of columns in fact tables

A

keys and measures

21
Q

Fact table: key column and primary key

A

The key column is a group of foreign keys that point to primary keys of dimensional tables.

Primary key of a fact table is typically a multipart key consisting of the combination of foreign keys that point to the primary keys of dimensional tables that are associated with this fact table to enable business analysis

22
Q

Fact table - measures

A

actual measures of business activity

23
Q

3 types of facts

A
  • additive facts - can be summed up through all of the dimensions in the fact table
  • semi-additive facts - can be summed up for some of the dimensions in the fact table, but not the others
  • non-additive facts - annot be summed up for any of the dimensions present in the fact table
24
Q

What is a Dimension?

A
  • An entity that establishes the business context for the measures (facts) used by an enterprise
  • Define the who, what, where, why of dimensional model
  • Facts are numeric; dimensions are descriptive
  • Used to filter and analyse based on performance measures
25
Q

Dimensional Modelling and Facts

A

A dimensional model uses only a subset of facts from the system of record

  • based on the analytical need,
  • and to reduce complexity.
26
Q

3 types of common OLAP schemas

A
  • Star Schema
  • Snowflake schema
  • OLAP cube
27
Q

Star Schema (structure, where is it used, strength)

A
  • the simplest style of data mart schema
    • fact table in the center
    • dimension tables surround
  • dimensions are denormalized
    • each dimension represented by a single table
  • most widely used to develop data warehouses and dimensional data marts.
  • consists of one or more fact tables referencing any number of dimension tables

The star schema is an important special case of the snowflake schema, and is more effective for handling simpler queries.

28
Q

Snowflake schema

A
  • similar to the star schema
  • dimensions are normalized into multiple related tables
    • (whereas the star schema’s dimensions are denormalized with each dimension represented by a single table)

A complex snowflake shape emerges when the dimensions of a snowflake schema are elaborate, having multiple levels of relationships, and the child tables have multiple parent tables.

29
Q

OLAP Cube

A

a multidimensional schema

It is a hierarchical database that consists of one structure, a multidimensional array.

a spreadsheet pivot table is an approximation of a multidimensional cube

each cell of the cube is an actual measure

Depending on what data someone needs to get to in the cube, the combination of the different dimensions would get them to the cell with the data they need in that multidimensional array

Dimensional attributes and hierarchies are defined in the multidimensional product’s metadata.

Star and snowflake schemas are traditionally stored in relational databases. Multidimensional schemas are stored in multidimensional databases, also called OLAP cubes.

Multidimensional databases store and aggregate at various levels in the hierarchy. They enable drill up and drill down, meaning that you can go from the lowest level (the most detail), all the way to summarize data, then back down again through the hierarchy

These databases are specifically designed to understand the multidimensional arrays, the hierarchies, what can be added, what’s semi-additive and what’s not additive in the cube. Just as tables, columns, constraints etc. need to be defined for a relational database, the facts, dimensions, hierarchies, and so on need to be defined in the multidimensional database’s metadata repository.

30
Q

Dimensional Modelling and Business

A

All reporting and analytics is geared towards this dimensional model. Although the business people in your company may never see the actual data models you create, they will become very familiar with the reports and dashboards they help generate.

Unless the models can generate clear, effective reporting and analysis, the won’t help the business view the data and use it to make informed decisions that affect operations. Dimensional modelling is the key to generating this critical business information.

31
Q

Some OLAP cube manipulations

A

Slicing

  • choose a single value for one dimension and create a new cube with one fewer dimension

Dicing

  • fewer dimensions

Drilling up/down

  • navigate among levels of data ranging from the most summarized (up) to the most detailed (down)

Pivoting

  • rotate the cube in space to see its various faces