Data Modelling and Design Flashcards
Data model pattern
A model for a particular type of problem which outline roles and relationships
Industry data models (consensus)
a model for a particular industry area
Deliverables for data modelling
- Diagrams
- Definitions
- Issues and questions
- Lineage (where the data comes from)
via conceptual, logical, physical data models
What is a data model?
describes the inherent logical structure of the data within a given domain and by the implication the underlying structure of that domain itself.
Levels of data model
enterprise
conceptual
logical
physical
Different DBs are different at the ____ and the same everywhere else
physical level
reverse engineerog
from the physical model to the logical model
forward engineering
from the logical model to the physical model
DDL
Data definitions language - used to create and modify the structure of objects in a database
External and Internal (SPARC)
External = Logical
Internal = Physical
What does a data model represent?
- entities
- attributes
- relationships
- organisation of data, irrespective of how it might be represented on the screen.
explicit relationship
x has y, carries data on the relationship itself
implicit relationship
derived through patterns or relationships within the dataset
Concrete Business Assertion
Relationship
Enterprise model
the big picture
Conceptual Model
Agree basic concepts and rules
Logical Model
detail, physical design
Physical Model
Optimised model for specific technical environment
Supertype Entity
Contains the primary key and common attributes
Subtype Entity
Contain Specific attributes for each type
Discriminator
Attribute to determine which subtype we are talking about
Generalisations
Supertypes
Specialisations
Subtypes
What are the two types of subtype
Exclusive and Inclusive
When are the boxes, crows feets, sub types and super types modeleld?
Logical Level
Non-identifying relationship types
A child entity (many end of relationship) does not depend upon its parent to get its uniqueness.
dotted line
How to tell which entity is a child and which is a parent
child is the many end of the relationship
How to tell in practise a non-identifying relationship type
the parents PK becomes just FK in the child, not the PK.
Identifying relationship type
A child entity inherits its parents identifier, the child is dependent on its parent.
solid line
Many to Many Relationship (nonspecific)
No way to determine parent and child, Each A have multiple Bs and each B has multiple As.
What does the resolution of a many to many relationship give
An installation/installation entity
Primary Key
Unique Identifier of an entity.
Recursive relationship
Relationship of an entity onto itself
e.g., employee manged by and manager of
A recursive many to many relationship is a…
Hierarchy
Features of a PK
- Mandatory
- Unique
- Unchanging
Surrogate key
artificial unique identifier not generated from the table data.
Candidate Key
minimal super key
Super Key
Any combination of attributes that could uniquely identify an entity (may end up with duplicated atrtibutes)
Alternate key
is a secondary key within a table
Nomalisation
- removing redundant attributes, duplicate attributes, compounded attributes (e.g., address, name)
- identifying a primary key and making sure everything depends on this
Why normalise data
- improved data quality (ensure integrity, remove redundancy)
- reduction on timescales
- easier to maintain
1NF
Every non-key attribute in an entity must depend on its primary key
2NF
Each entity must have the fewest possible correct primary key attributes
3NF
Each non key element must be directly dependent upon the primary key and non upon any other non-key attributes
Unnormalized data…
contains repeating groups
Transitive dependency
an attribute that is dependent on another attribute not just the PK
Normalised designs are best practise for
OLTP
Denormalised designs are best practise for
OLAP
Views, functions and stored procedures may be used for
restricting access to certain data
Best practise for naming conventions
Class - noun to describe general purpose/intent, maintained in repository (glossary)
Prime - word/phrase to identify the applciation area/category
Modifier - adjective or noun to add additional business information
Qualifier - special modifier, often units of measure
Class Naming Convention
noun to describe general purpose/intent, maintained in repository (glossary)
Prime Naming Convention
word/phrase to identify the applciation area/category
Modifier Naming Convention
adjective or noun to add additional business information
Qualifier Naming Convention
special modifier, often units of measure
Physical Database Design Best Practises
PRISM
P - performance
R - reusability
I - Integrity
S - Security
M - Maintainability
Denormalisation
Selectively and justifiably violating normalisation rules to reduce retrieval time, potentially at the expense of additional space, insert/update time and reduced data quality.
Views
logical virtual tables, a wrapper of code, used to simplify queries, control data access and rename columns.
Horizontal Partitioning
partitioning of the table into smaller tables on the basis of rows. still logically one table, but store separately making performance and security better.
Vertical Partitioning
Dividing the table based on different columns. E.g., hide sensitive columns.
ACID test for transaction processing
A - atomicity, every transaction is either completed completely or not at all
C - consistency, every transaction takes the database from one consistent state to another, data is always integral
I - isolation, two transactions happen separately
D - Durability, once its been committed then it is permanently saved
BASE test for transaction processing acronym
Basic Availability soft state and eventual consistency
When is BASE seen
- microservices, message services
- large companies processing lots of data
What are relationship labels?
The verb phrases describing the business rules in each direction between two entities
Relationship cardinality
whether a relationship is one to many, many to many or one to one
ACID acronym
atomicity, consistency, isolation, durability
Entity type, Entity instance, Dimension, Object, Hub and Node are all examples of what?
Entity Alias
Simple, Compound, Composite & surrogate are examples of
Construction type keys
Surrogate keys are an example of
a simple key
What is the role of data analysts and designers in the Data Governance process?
They act as intermediaries between data producers and data consumers and balance the needs of both stakeholder groups
Cardinality
defines the rules of relationships e.g. how many courses a membership can grant access to
three types of data modelling (in order)
conceptual, logical, physical
conceptual data model
captures high-level data requirements and relationships
logical data model
adds more detail and normalization to the conceptual model
Physical model
provides a detailed technical solution for storing and managing data e.g. primary keys and foreign keys
Schemes to present data
relational
dimensional
object orientated
fact-based
time-based
noSQL
WHat is a scheme
Schemes are different ways to organize and represent data based on specific requirements or use cases.
Domain
possible values that you could be assigned
What three characteristics does the DMBOK say are essential to create high-quality entity definitions (act as core metadata)
clarity accuracy completeness
Unary, Binary and Ternary are examples of what?
Arity of relationship
What is a fact table
a table that contains measurements (usually additive e.g., sales)
What is a dimension table
a table combining business elements (field are description of those elements)
Normalisation occurs as a conceptual model…
becomes a logical model
grain of a dimensional model
number of rows in the fact table
If we have generalised the common attributes and relationships of a group of entities, this generalised entity is known as a?
supertype
What does relationship cardinality achieve?
Cardinality rules show how many of each entity that can participate with how many of another entity they are related to
examples of Construction type keys?
Simple, Compound, Composit
non-identifying relationship
The primary key of the parent entity becomes a foreign key in the child entity
What are the major design objectives for the relational data modelling scheme?
Having an exact representation of business data and keeping one fact in one place
description of the third normal form (3NF)?
Ensures each entity has no hidden primary keys and that each attribute depends on no attributes outside the key (“the key, the whole key and nothing but the key”)
Entity definitions are essential to the value of the data model and act as core Metadata for the business. Which three characteristics does the DMBOK say are essential to create high-quality definitions?
Clarity, Accuracy and Completeness
Unary, Binary and Ternary are examples of what?
Arity of relationships
Surrogate keys are an example of a
simple key
Deliverables of the data modelling process
Data lineage
Definitions of entities attributes and relationships
a data model diagram
a list of unresolved business questions
What are fact tables used for
Storing numeric measurements, such as counts and quantities
How does ISO 11179 connect Data Modelling to other data management disciplines?
How does ISO 11179 connect Data Modelling to other data management disciplines?