Data Modeling Flashcards

1
Q

_________ is the process of discovering, analyzing, and scoping data
requirements, and then representing and communicating these data requirements in a precise form. This process is iterative and may include a conceptual, logical, and physical.

A

Data Modeling

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are the goals of data modeling?

A

To confirm and document an understanding of different perspectives, which leads to applications that more closely align with current and future business requirements, and creates a foundation to successfully complete broad-scoped initiatives such as master data management and data governance programs.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What are the business drivers for data modeling?

A

Provide a common vocabulary around data

Capture and document explicit knowledge about an organization’s data and systems

Serve as a primary communications tool during projects

Provide the starting point for customization, integration, or even replacement of an application

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Data modeling is most frequently performed in the context of systems __________ and __________ efforts,
known as the system development lifecycle (SDLC).

A

development and maintenance

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is the main purpose of a data model?

A

To describe a business

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Within data modeling, a(n) ______ is a thing about which an organization collects information. They are also sometimes referred
to as the nouns of an organization.

A

entity

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

A ___________ captures the high-level interactions between conceptual entities, the detailed interactions between logical entities, and the constraints between physical entities.

A

relationship

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

An __________ is a property that identifies, describes, or measures an entity. The physical correspondent of an ________ in an entity is a column, field, tag, or node in a table, view, document, graph, or file.

A

attribute

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

A _________ data model is a detailed representation of data requirements, usually in support of a specific usage context, such as application requirements. These types of data models are still independent of any technology or specific implementation constraints.

A

Logical

“Detailed, may lead to a physical design”

Logical Model captures detailed data requirements
— Analyze information requirements and existing documentation
— Add associative entities
— Add attributes
— Assign domains
— Assign keys

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

A _________ data model represents a detailed technical solution, often using the logical data model as a starting point and then adapted to work within a set of hardware, software, and network tools. These types of data models are built for a particular technology.

A

physical

“Optimized for specific technical environments”

Physical Model outlines how data will be stored in enterprise systems
— Resolve logical abstractions
— Add attributes
— Add reference data objects
— Assign surrogate keys
— Denormalize for performance
— Index for performance
— Partition for performance
— Create views

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

A __________ data model captures the high-level data requirements as a collection of related concepts. It contains only the basic and critical business entities within a given realm and function, with a description of each entity and the relationships between entities.

A

Conceptual

“Agree on basic concepts and rules”

Conceptual Model may start off looking like a glossary
— Conceptual Data Model components
— Concepts = entities
— Activities = relationships
— Incorporate enterprise terminology
— Build by connecting nouns (entities) and verbs (relationships)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

____________ data model is a holistic, business-level, implementation-independent conceptual or logical data model providing a common consistent view of data across the enterprise. It is common to use the term to mean a high-level, simplified data model, but that is a question of abstraction for presentation.

A

Enterprise

“Big Picture”

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

A(n) _________ entity is one where the primary key contains only attributes that belong to that entity. Conversely, a________ entity is one where the primary key contains at least one attribute from another entity.

A

independent, dependent

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

In data modeling, a ________ is the complete set of possible values that an attribute can be assigned. It may be articulated in different ways (see points at the end of this section).

A

domain

examples
— Data type (e.g. Character(30) )
— Data format (e.g. phone number template)
— List
— Range
— Rule-based (e.g. ItemPrice > ItemCost)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

This type of domain allows all values that are between a minimum and maximum value. For example, Order Delivery Date must be between Today’s Date and three months in the future.

A

Range

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

A(n) __________ relationship is one where the primary key of the parent (the entity on the one side of the relationship) is migrated as a foreign key to the child’s primary key.

A

identifying

“The primary key of the parent becomes part of the primary key of the child.”

Engineering Notation is a solid line.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

In _____________ relationships, the primary key of the parent is migrated as a non-primary foreign key attribute to the child.

A

non-identifying

“The child does not depend on the parent to get its uniqueness”

Engineering Notation is a dotted line.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Which normal form ensures each entity has a valid primary key, and every attribute depends on the primary key; removes repeating groups, and ensures each attribute is atomic (not multi-valued)?
This form includes the resolution of many-to-many relationships with an additional entity often called an associative entity.

A

First normal form (1NF)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Which normal form ensures each entity has the minimal amount of attributes as the primary key and that every attribute depends on the complete primary key?

A

Second normal form (2NF)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

Which normal form ensures each entity has no hidden primary keys and that each attribute depends on no attributes outside the key (“the key, the whole key and nothing but the key”).

A

Third normal form (3NF)

Reaching this form equates to the model being normalized.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

In Chris Bradley Data Modeling video, the relationship between many to many entities represents what?

A

A real business concept.

An example was if you had many assets at many locations, those could be called “installations” with other properties \ attributes such as installation date etc. Another example would be many customers who have\get many products. This would be “orders”

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

_____________ is the removal of details in such a way as to broaden applicability to a wide class of situations while preserving the important properties and essential nature from concepts or subjects.

A

abstraction

An example of abstraction is the Party/Role structure, which can be used to capture how people and organizations play certain roles (e.g.,
employee and customer). Not all modelers or developers are comfortable with, or have the ability to work with abstraction. The modeler needs to weigh the cost of developing and maintaining an abstract structure versus the amount of rework required if the unabstracted structure needs to be modified in the future (Giles 2011).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

A unary (also known as a recursive or self-referencing) relationship involves only one entity. A one-to-many recursive relationship describes a _______ , whereas a many-to-many relationship describes a ________ .

Which type has at most one parent (or higher-level entity).

A

hierarchy, network or graph

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

A _______ is a generic entity type that has a relationship with one or more subtypes. A _______ is a sub-grouping of the entities in an entity type that is meaningful to the organization and that shares common attributes or relationships distinct from other subgroups.

A

supertype, subtype

Example would be a person table that has first last name. The person table would be the super type. Example subtypes would be doctor, lawyer where we then gather and store additional info pertinent to those types.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

A ________ key is the candidate key that is chosen to be the unique identifier for an entity. Even though an entity may contain more than one candidate key, only one candidate key can serve as the primary key for an entity. A(n) _______ key is a candidate key that although unique, was not chosen as the primary key.

A

primary, alternate

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

This type key is an example of a simple key. It is a unique identifier for a table. Often a counter and always system-generated without intelligence, it is an integer whose meaning is unrelated to its face value. These keys serve technical functions and should not be visible to end users of a database. What type of key is this?

A

Surrogate

27
Q

A _______ key is a type of unique key in a database formed of attributes that exist and are used in the external world outside the database

A

natural

28
Q

___________is the process of applying rules in order to organize business complexity into stable data structures. The basic goal of this process is to keep each attribute in only one place to eliminate redundancy and the inconsistencies that can result from redundancy.

A

Normalization

29
Q

The DBA should keep these design principles in mind when designing and building a database. (hint PRISM):

A

Performance and ease of use: Ensure quick and easy access to data by approved users in a usable and business-relevant form, maximizing the business value of both applications and data.

Reusability: The database structure should ensure that, where appropriate, multiple applications can use the data and that the data can serve multiple purposes (e.g., business analysis, quality
improvement, strategic planning, customer relationship management, and process improvement). Avoid coupling a database, data structure, or data object to a single application.

Integrity: The data should always have a valid business meaning and value, regardless of context, and should always reflect a valid state of the business. Enforce data integrity constraints as close to the data
as possible, and immediately detect and report violations of data integrity constraints.

Security: True and accurate data should always be immediately available to authorized users, but only to authorized users. The privacy concerns of all stakeholders, including customers, business partners, and government regulators, must be met. Enforce data security, like data integrity, as close to the data as possible, and immediately detect and report security violations.

Maintainability: Perform all data work at a cost that yields value by ensuring that the cost of creating,
storing, maintaining, using, and disposing of data does not exceed its value to the organization. Ensure
the fastest possible response to changes in business processes and new business requirements.

30
Q

In ACID Transactions, ________ indicates that all operations are performed, or none of them is, so that if one part of the transaction fails, then the entire transaction fails.

A

Atomicity

31
Q

In ACID Transactions, ________ indicates that the transaction must meet all rules defined by the system at all times and must void half completed transactions.

A

Consistency

32
Q

In ACID Transactions, ________ indicates that each transaction is independent unto itself.

A

Isolation

33
Q

In ACID Transactions, ________ indicates that once complete, the transaction cannot be undone.

A

Durability

34
Q

An alternative to ACID transactions is BaSE. What does it stand for?

A

Basically Available, Soft State, and Eventual Consistency

BASE-type systems are common in Big Data environments. Large online organizations and social media companies commonly use BASE implementations, as immediate accuracy of all data elements at all times is not necessary.

35
Q

ACID or Base?
In which system must the table and schema exist?

A

ACID

BaSE is dynamic and can adjust on the fly

36
Q

ACID or Base?
Which system supports consistency that is Strong, Eventual or None

A

BaSE

ACID “Strong Consistency is available”

37
Q

ACID or Base?
Which system type has processing focus that is Row/Column focused ?

A

ACID

BaSE is Wide-Column focused ?

38
Q

What are some data modeling schemes used to represent data?

A

Schemes to represent data: Relational, Dimensional, Object-Oriented, Fact-Based, Time-Based, NoSQL

Data Modeling Schemes: mid to large organizations usually have an app landscape with multiple schemes and models evolved over time

39
Q

True or False:
Once database architecture is complete, compare a reverse-engineered version of the Physical Model to the Logical Model to ensure requirements are met

A

True

40
Q

Canonical - used for data in motion between systems - describes structure sending and receiving services
What are two that should be used?

A

— Enterprise Service Bus (ESB)
— Enterprise Application Integration (EAI)

41
Q

Within data modelling, ______ is the numerical relationship between rows of one table and rows in another.

A

cardinality

42
Q

What are the three types of cardinality?

A

zero, one, many

43
Q

Rectangular or Round : Which box type is used for primary key entities?

A

Rectangular

44
Q

Rectangular or Round : Which box type is used for foreign key entities?

A

Round

45
Q

Construction keys: _______ is a simple counter that provides unique id within a table.

A

Surrogate

46
Q

Construction keys: _______ has 2+ attributes to uniquely id instance (e.g. phone number - area code + exchange + local number)

A

Compound

47
Q

Construction keys: _______ is a compound key + simple or compound key

A

Composite

48
Q

Function keys: __________ is any set of attributes that uniquely id an entity instance

A

Super Key

49
Q

Function keys: __________ is a minimal set of one or more attributes that id entity instance. i.e. Natural key - business key

A

Candidate key

50
Q

Function keys: __________ is a candidate key chosen as unique id

A

Primary key

51
Q

True or False:
Often primary key is business key, and alternate keys are primary keys

A

False.
Often primary key is surrogate key, and alternate keys are business keys

52
Q

In Dimensional modeling, ______ tables include these characteristics:
— Rows correspond to particular measurements and are numeric
— 90% of contents of database
— Many rows

A

fact

53
Q

In Dimensional modeling, ______ tables include these characteristics:
— Mostly textual descriptions
— Must have a unique identifier for ea row (surrogate or natural key)

A

dimension

54
Q

The ______ of the dimensional model is the finest level of detail that is implied when the fact and dimension tables are joined.

A

grain; granularity

55
Q

___________ dimensions allow facts and measures to be categorized and described in the same way across multiple facts or data marts, ensuring consistent reporting across the enterprise. These dimensions can be referenced by multiple fact tables in a data warehouse.

A

Conformed

56
Q

______ is a graphical language for modeling software

A

UML
Unified Modeling Language

57
Q

Fact-based modeling - form plausible sentences business user might use - Object-Role Modeling

A

Just a note to know

58
Q

Name some Time based models

A

Data vault -
hubs, links, satellites - normalized

59
Q

__________refers to the process of splitting a table. It is performed to facilitate archiving and to improve retrieval performance.

What are the two ways to do it?

A

Partitioning

Vertically split: To reduce query sets, create subset tables that contain subsets of columns

Horizontally split: To reduce query sets, create subset tables using the value of a column as the
differentiator.

60
Q

What are 3 reasons to denormalize?

A

— Combine data and avoid run-time joins
— Create smaller, pre-filtered copies of data to reduce table scans of large tables
— Pre-calculate and store expensive calcs

61
Q

What risks does denormalization introduce?

A

Denormalization introduces risk of errors due to duplication

62
Q

How is star schema denormalized?

A

Star schemas denormalize the data, which means adding redundant columns to some dimension tables to make querying and working with the data faster and easier.

“Collapsed”!

63
Q

__________ is the process of understanding, recording, and visualizing data as it flows from data sources to consumption.

A

Data lineage

Source to target mapping

64
Q

What are the steps to build a data model?

A

select scheme and notation
gather entities and relationships
utilize business specific terminology (i.e. from data glossary)
obtain signoff