Data Architecture Flashcards
Purpose of a BI Data Architecture
Purpose
- Defines the data needed by the information architecture
- To provide the required analytics
- Also defines schemas, integration, transformations, storage and workflow
- Guides how data is collected, integrated, enhanced, stored and delivered to business people
Benefits of a BI Data Architecture
- Gives better understanding of data
- Helps align data with business strategies
- Gives guidelines for managing information supply chains
- Provides a structure for implementing data governance
- Helps enforce security and privacy
Requirements for DW/BI system
- Must make information easy to access
- Must present information consistently
- Must adapt to change
- Must present information in a timely way
- Must be secure
- Must be the authoritative and trustworthy foundation for improved decision making
- The business community must accept the DW/BI system
Information Supply Chains
A series of processes of data manipulations to produce BI reports or dashboards.
Examples:
- Gather, transform, integrate and consolidate data into a data warehouse
- Processes that take EDW data into data marts
- BI tools that process the data in the marts
OLTP – Online Transaction Processing
- Processes business transactions
- e.g., order entry, sales, payments
- the system responds immediately to user requests
- Key goals are ensuring
- fast throughput
- availability
- concurrency
- recoverability
- Data usually stored in relational databases
- Systems of Record
- keeps a record of every transaction
- Systems of Record
- Not optimised for presenting information, e.g.
- How many widgets did we sell in our store today?
- How many widgets did we sell across all stores?
- What are our sales of computers per region?
- For that, you need summed or aggregated information
Some Benefits
- reduced paper trails
- faster, more accurate forecast for revenues and expenses
OLAP – Online Analytical Processing
- Deliver business intelligence to the business
- Work with aggregated/summarised data
- Can quickly answer high-level questions
- Can use a relational database
- But we often use a multidimensional “OLAP cube”
- Cube implies three dimensions
- Usually have more than three
- But we often use a multidimensional “OLAP cube”
OLTP vs. OLAP
OLTP
- for processing transactions
- process all kinds of queries (read, insert, update and delete)
OLAP
- more complex queries, in a smaller volume,
- for the purpose of business intelligence or reporting
- generally optimized for read only and might not even support other kinds of queries
Types of Data Models
-
Conceptual data model
- structured business view of the data required to support business processes, KPIs;
- independent of any database or physical storage
-
Logical data model
- the overall structure of the data required to support the business requirements
- independent of any software or data storage structures
-
Phsyical data model –
- operational systems, operational data store (developer view)
Logical Data Model in detail
Data model most used in designing BI applications
- Business rules become relationships between objects
- Model is independent of DBMS or physical storage
- Shows specific entities and attributes to be implemented
- Defines primary keys, foreign keys, alternate keys
Data Modelling workflow
These are the steps in data modelling, going from business requirements to applications and database
Implementing the physical data model requires you to understand the characteristics and constraints of the database system being used
You need in-depth knowledge of the DBMS in order to
- Represent the logical data model in a database schema
- Add the entities and attributes definitions needed to meet operating requirements
- Configure and tune the database for performance requirements
What is a database model + common types
A database model is a specification describing how a database is structured and used.
Pros and Cons of
Relational E-R (entity relationship) data model
+ Very well suited to Systems of Record
+ Very good for OLTP
+ Flexible – can use joins to create virtual tables
– Not ideal for Business Intelligence
Tables and Keys
Tables have a relationship based on keys
Primary Key
- Each table has a one,
- unique to each row
- an entity that is also included in another dataset
- also knows as entity identification
Foreign Key
- a column whose keys are the Primary Keys in another table
- an attribute that links the data to another dataset
- a field (or collection of fields) in one table that uniquely identifies a row of another table or the same table
- the foreign key is defined in a second table, but it refers to the primary key or a unique key in the first table.
ER modelling building blocks
two entities:
- a person
- an email address
The person’s primary key: BusinessEntityID
Email entity attributes:
- EmailAddressID
- the email address itself
The email address entity’s primary key is a combination of BusinessEntityID and EmailAddressID
- In between these two identities is the relationship.
- A person can have one or more email accounts, but an email account is owned by one person.
The symbols on the ends of the relationship connector define the cardinality of the relationship:
- one-to-many, many-to-many, one-to-zero-or-one, etc.
Entity and Attribute types
- Independent entities don’t need parents nor children
- Dependent entities need parents to exist
two types of attributes in an entity:
- key
- non-key (don’t uniquely identify an attribute)