CC6 - midterms Flashcards
is the development, execution, and supervision of plans, policies, programs, and practices that deliver, control, protect, and enhance the value of data and information assets throughout their lifecycles.
dm
Data Management
is any person who works in any aspect of data management (from technical management of data throughout its lifecycle to ensuring that data is properly utilized and leveraged) to meet strategic organizational goals.
- fill numerous roles, from the highly technical (e.g., database administrators, network administrators, programmers) to strategic business roles (e.g., Data Stewards, Data Strategists, Chief Data Officers).
dmp
Data Management Professional
are not just assets in the sense that organizations invest in them in order to derive future value. They are also vital to the day-to-day operations of most organizations. They have been called the ‘currency’, the ‘life blood’, and even the ‘new oil’ of the information economy. Whether or not an organization gets value from its analytics, it cannot even transact business without data.
di
Data and information
- In relation to information technology, it is also understood as information that has been stored in digital form (though data is not limited to information that has been digitized, and data management principles apply to data captured on paper as well as in databases). Still, because today we can capture so much information electronically, we call many things ______ that would not have been called ______ in earlier times – things like names, addresses, birthdates, what one ate for dinner on Saturday, the most recent book one purchased.
d
1. Data
- the “raw material of information”
- “data in context”
di
2. Data and Information
- An asset is an economic resource that can be owned or controlled, and that holds or produces value. Assets can be converted to money. Data is widely recognized as an enterprise asset, though understanding of what it means to manage data as an asset is still evolving.
daaoa
3. Data as an Organizational Asset
- Data management shares characteristics with other forms of asset management: it involves knowing what data an organization has and what might be accomplished with it, then determining how best to use data assets to reach organizational goals. This balance can best be struck by following a set of principles that recognize salient features of data management and guide data management practice.
dmp
4. Data Management Principles
Because data management has distinct characteristics derived from the properties of data itself, it also presents challenges in following these principles.
dmc
5. Data Management Challenges
Physical assets can be pointed to, touched, and moved around. They can be in only one place at a time. Financial assets must be accounted for on a balance sheet. However, data is different. Data is not tangible. Yet it is durable; it does not wear out, though the value of data often changes as it ages. Data is easy to copy and transport. But it is not easy to reproduce if it is lost or destroyed.
ddfoa
5.1. Data Differs from Other Assets
Value is the difference between the cost of a thing and the benefit derived from that thing. For some assets, like stock, calculating value is easy. It is the difference between what the stock cost when it was purchased and what it was sold for. But for data, these calculations are more complicated, because neither the costs nor the benefits of data are standardized.
dv
5.2. Data Valuation
Data is not tangible. Yet it is durable; it does not wear out, though the value of data often changes as it ages. Data is easy to copy and transport. But it is not easy to reproduce if it is lost or destroyed.
dq
5.3. Data Quality
Deriving value from data does not happen by accident. It requires planning in many forms. It starts with the recognition that organizations can control how they obtain and create data. If they view data as a product that they create, they will make better decisions about it throughout its lifecycle.
pfbd
5.4. Planning for Better Data
Metadata describes what data an organization has, what it represents, how it is classified, where it came from, how it moves within the organization, how it evolves through use, who can and cannot use it, and whether it is of high quality. Data is abstract. Definitions and other descriptions of context enable it to be understood. They make data, the data lifecycle, and the complex systems that contain data comprehensible.
md
5.5. Metadata and Data Management
Data management is a complex process. Data is managed in different places within an organization by teams that have responsibility for different phases of the data lifecycle. Data management requires design skills to plan for systems, highly technical skills to administer hardware and build software, data analysis skills to understand issues and problems, analytic skills to interpret data, language skills to bring consensus to definitions and models, as well as strategic thinking to see opportunities to serve customers and meet goals.
dmicf
5.6. Data Management is Cross-functional
Managing data requires understanding the scope and range of data within an organization. Data is one of the ‘horizontals’ of an organization. It moves across verticals, such as sales, marketing, and operations.
eaep
5.7. Establishing an Enterprise Perspective
Today’s organizations use data that they create internally, as well as data that they acquire from external sources. They have to account for different legal and compliance requirements
across national and industry lines.
afop
5.8. Accounting for Other Perspectives
Like other assets, data has a lifecycle. To effectively manage data assets, organizations need to understand and plan for the data lifecycle. Well-managed data is managed strategically, with a vision of how the organization will use its data.
dl
5.9. The Data Lifecycle
Managing data is made more complicated by the fact that there are different types of data that have different lifecycle management requirements. Any management system needs to classify the objects that are managed.
dtd
5.10. Different Types of Data
Data not only represents value, it also represents risk. Low quality data
(inaccurate, incomplete, or out-of-date) obviously represents risk because its information is not right. But data is also risky because it can be misunderstood and misused.
dr
5.11. Data and Risk
Data management activities are wide-ranging and require both technical and business skills. Because almost all of today’s data is stored electronically, data management tactics are strongly influenced by technology. From its inception, the concept of data management has been deeply intertwined with management of technology.
dmt
5.12. Data Management and Technology
The Leader’s Data Manifesto (2017)
recognized that an “organization’s best opportunities for organic growth lie in data.” Although most organizations recognize their data as an asset, they are far from being data-driven.
edmrlc
5.13. Effective Data Management Requires Leadership and Commitment
a set of choices and decisions that together chart a high-level course of action to achieve high-level goals. In the game of chess, a strategy is a sequenced set of moves to win by checkmate or to survive by stalemate.
A strategic plan is a high-level course of action to achieve high-level goals. Typically, a data strategy requires a supporting Data Management program strategy – a plan for maintaining and improving the quality of data, data integrity, access, and security while mitigating known and implied risks. The strategy must also address known challenges related to data management.
dms
6. Data Management Strategy
is a high-level course of action to achieve high-level goals. Typically, a data strategy requires a supporting Data Management program strategy.
sp
strategic plan
– a plan for maintaining and improving the quality of data, data integrity, access, and security while mitigating known and implied risks. The strategy must also address known challenges related to data management.
dmps
Data Management program strategy
is a formal document that outlines an organization's principles, guidelines, and framework
for managing its data, defining roles, responsibilities, and processes to ensure data quality, security, compliance, and accessibility across the entire data lifecycle, aligning with the organization’s overall strategy and goals.
dmc
A Data Management Charter
- is a document that clearly defines the boundaries and parameters of a data management project, outlining what data will be included, the processes to be implemented, the expected deliverables, and any limitations or exclusions, ensuring all stakeholders have a shared understanding of what is and is not included within the project scope.
dmss
data management scope statement
- outlines a structured plan for an organization to effectively manage its data, including key phases like data assessment, governance establishment, data quality improvement, integration, storage, and security measures, with defined timelines and responsible parties to achieve optimal data utilization for informed decision-making.
dmir
Data Management Implementation Roadmap
- Data management involves a set of interdependent functions, each with its own goals, activities, and responsibilities.
- Data management professionals must balance strategic and operational goals, business and technical requirements, risk and compliance, and various interpretations of data quality.
- Different frameworks provide different perspectives to approach data management, clarifying strategy, roadmaps, team organization, and function alignment.
dmf
DATA MANAGEMENT FRAMEWORKS
- Developed by Henderson and Venkatraman (1999).
- Focuses on the relationship between data and information within an organization.
- Information is associated with business strategy and operational use of data.
- Data is linked to IT processes that support physical data management and accessibility.
sam
1. Strategic Alignment Model
- Developed by Abcouwer, Maes, and Truijens (1997).
- Also called the 9-cell model.
- Recognizes a middle layer between business and IT that focuses on planning and architecture.
- Helps align data management strategies with an organization’s tactical and operational needs.
aim
2. The Amsterdam Information Model
expands on data management by defining Knowledge Areas that make up the scope of data management.
ddf
The DAMA-DMBOK Framework
Three key visual representations describe DAMA-DMBOK Framework:
a. The DAMA Wheel – Places Data Governance at the center, surrounded by other Knowledge Areas (Data Architecture, Data Modeling, Data Quality, etc.).
b. The Environmental Factors Hexagon – Shows how people, processes, and technology interact.
c. The Knowledge Area Context Diagram – Details data management activities and their relationships using the SIPOC (Suppliers, Inputs, Processes, Outputs, Consumers) approach.
– Places Data Governance at the center, surrounded by other Knowledge Areas (Data Architecture, Data Modeling, Data Quality, etc.).
dw
The DAMA Wheel
– Shows how people, processes, and technology interact.
efh
The Environmental Factors Hexagon
– Details data management activities and their relationships using the SIPOC (Suppliers, Inputs, Processes, Outputs, Consumers) approach.
kacd
The Knowledge Area Context Diagram
- Describes how organizations evolve in data management.
- Outlines four phases for improving data maturity:
Phase 1: The organization implements basic database capabilities through applications.
Phase 2: They address data quality challenges, focusing on Metadata and Data Architecture.
Phase 3: Establish Data Governance to structure and support data management.
Phase 4: Organizations leverage well-managed data for analytics and business intelligence.
dp
4. The DMBOK Pyramid (Aiken)
- Another variation of the DAMA framework developed by Sue Geuens.
- Highlights the dependencies between data management functions.
- Shows that Business Intelligence and Analytics rely on all other Knowledge Areas (Data Architecture, Data Quality, Data Integration, etc.).
- Positions Data Governance as essential for ensuring organizations extract value from their data.
ddmfe
5. DAMA Data Management Framework Evolved
DAMA AND THE DMBOK
- DAMA (Data Management Association International) was founded to address data management challenges.
- The DMBOK (Data Management Body of Knowledge) serves as an authoritative reference for data management professionals.
Purpose of the DMBOK: - Provides a functional framework for enterprise data management practices.
- Establishes a common vocabulary for data management concepts.
- Serves as the fundamental reference for the CDMP (Certified Data Management Professional) exam.
- was founded to address data management challenges.
d….
DAMA (Data Management Association International)
- serves as an authoritative reference for data management professionals.
d….
DMBOK (Data Management Body of Knowledge)
- describes the central role that data ethics plays in making informed, socially responsible decisions about data and its uses. Awareness of the ethics of data collection, analysis, and use should guide all data management professionals.
dhe
Data Handling Ethics
- describes the technologies and business processes that emerge as our ability to collect and analyze large and diverse data sets increases.
bdds
Big Data and Data Science
- outlines an approach to evaluating and improving an organization’s data management capabilities.
dmma
Data Management Maturity Assessment
- provide best practices and considerations for organizing data management teams and enabling successful data management practices.
dmore
Data Management Organization and Role Expectations
- describes how to plan for and successfully move through the cultural changes that are necessary to embed effective data management practices within an organization.
dmocm
Data Management and Organizational Change Management
manages computer databases. The role may include capacity planning, installation, configuration, database design, migration, performance monitoring, security, troubleshooting, as well as backup and data recovery.
da
Database administrator (DBA)
is responsible for maintaining an organization's computer networks, including hardware and software. They ensure that networks are secure, efficient, and reliable.
na
network administrator
are data governance employees who collect and maintain data for the organizations
they work for while also protecting their data assets.
ds
Data stewards
is a leader who uses data to help a company make strategic decisions. They are responsible for integrating data from various sources to create a unified view.
ds
data strategist
is a senior executive who manages a company's data strategy and use. They are responsible for ensuring that data is used effectively to support business decisions.
cdo
Chief Data Officer (CDO)
- Ensure that data is accurate and of good quality
dq
Data quality
- Protect data from unauthorized access, theft, or corruption
ds
Data security
- Manage data governance strategies, practices, and requirements
dg
Data governance
- Lead the development of a data strategy that aligns with business objectives
ds
Data strategy
- Implement data analytics into business processes
da
Data analytics
- Promote data literacy and a data-driven culture
dl
Data literacy
- Ensure compliance with data protection and privacy regulations
rc
Regulatory compliance
Some examples of basic metadata are:
- author
- date created
- date modified
- file size.
is also used for unstructured data
such as images, video, web pages, spreadsheets, etc. Web pages often include metadata in the form of meta tags.
m
Metadata
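The basic metadata examples above (date created, date modified, file size) can be read from a file with Python's standard library. A minimal sketch, assuming an ordinary filesystem: authorship is omitted because most filesystems do not track it, and `st_ctime` is a true creation time only on some platforms.

```python
import os
import datetime

def basic_metadata(path):
    """Return basic file metadata: size and timestamps.

    Author is omitted (not tracked by ordinary filesystems), and
    st_ctime is creation time only on some platforms (e.g., Windows).
    """
    st = os.stat(path)
    return {
        "file_size": st.st_size,  # in bytes
        "date_modified": datetime.datetime.fromtimestamp(st.st_mtime),
        "date_created": datetime.datetime.fromtimestamp(st.st_ctime),
    }
```

This is metadata in the card's sense: data describing the file, not the file's contents.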
: Identify key business goals
that data can support and prioritize data needs aligned with strategic initiatives.
dbo
Define Business Objectives
: Conduct a comprehensive data audit
to identify all data sources, their formats, locations, quality, and usage across the organization.
dim
Data Inventory and Mapping
: Assess data accuracy, completeness, consistency, and relevance to identify areas for improvement.
dqa
Data Quality Analysis
: Identify key stakeholders
, their data requirements, and establish communication channels.
se
Stakeholder Engagement
: Develop clear guidelines for data ownership, access control, data quality standards, retention policies, and privacy compliance.
edgp
Establish Data Governance Policies
: Assign data stewards, data owners, and data custodians
with defined accountability for data management.
dgrr
Data Governance Roles and Responsibilities
: Implement processes to monitor and improve data quality
through data cleansing, validation, and standardization.
dqmp
Data Quality Management Plan
: Determine which data sources are critical
for integration and prioritize based on business needs.
dss
Data Source Selection
: Map data elements
from different sources to a unified schema and transform data to ensure consistency.
dmt
Data Mapping and Transformation
: Select appropriate data integration tools
to extract, transform, and load (ETL) data from disparate sources.
dit
Data Integration Tools
: Choose appropriate data storage architecture
(relational, dimensional, cloud-based) to facilitate analysis and reporting.
dw/ld
Data Warehouse/Lake Design
: Implement robust data security controls
including encryption, access controls, and data masking to protect sensitive information.
dsm
Data Security Measures
: Establish a reliable data backup and disaster recovery plan
to mitigate data loss risks.
dbrs
Data Backup and Recovery Strategy
: Choose appropriate BI tools
to visualize
and analyze data
for decision-making.
bits
Business Intelligence (BI) Tool Selection
: Create customized dashboards
and reports aligned with key business metrics to provide actionable insights.
dd
Dashboard Development
: Develop data models
to enable efficient querying and analysis of data
across different dimensions.
dma
Data Modeling and Analysis
: Regularly monitor data quality
metrics to identify and address data quality issues proactively.
dqm
Data Quality Monitoring
: Track key performance indicators
(KPIs) related to data management to assess the effectiveness of implemented strategies.
pe
Performance Evaluation
: Review and update the data management strategy
as business needs evolve and new technologies emerge.
ac
Adapting to Change
: Ensure strong support from leadership
and involve key stakeholders throughout the implementation process.
oa
Organizational Alignment
: Communicate changes effectively and provide training to users to facilitate adoption of new data management practices.
cm
Change Management
: Adhere to relevant data privacy regulations
(e.g., GDPR, CCPA) when managing sensitive data.
cr
Compliance Requirements
describe the purpose of the Knowledge Area
and the fundamental principles that guide performance of activities within each Knowledge Area.
g
Goals
are the actions and tasks
required to meet the goals of the Knowledge Area. Some activities are described in terms of sub-activities, tasks, and steps.
a
Activities
- set the strategic and tactical course for meeting data management goals. Planning activities occur on a recurring basis.
pa
(P)Planning Activities
- are
organized around the system development lifecycle (SDLC)
(analysis, design, build, test, preparation, and deployment).
da
(D)Development Activities
- ensure the ongoing quality of data and the integrity, reliability, and security of systems through which data is accessed and used.
ca
(C) Control Activities
- support the
use, maintenance
, and enhancement of systems and processes through which data is accessed and used.
oa
(O)Operational Activities
are the tangible things that each Knowledge Area requires to initiate its activities
. Many activities require the same inputs. For example, many require knowledge of the Business Strategy as input.
i
Inputs
are the outputs of the activities
within the Knowledge Area, the tangible things that each function is responsible for producing. Deliverables may be ends in themselves or inputs into other activities. Several primary deliverables are created by multiple functions.
d
Deliverables
describe how individuals and teams contribute to activities within the Knowledge Area
. Roles are described conceptually, with a focus on groups of roles required in most organizations. Roles for individuals are defined in terms of skills and qualification requirements. Skills Framework for the Information Age (SFIA) was used to help align role titles. Many roles will be cross-functional.
rr
Roles and Responsibilities
are the people responsible for providing
or enabling access to inputs
for the activities.
s
Suppliers
are those that directly benefit
from the primary deliverables created by the data management activities.
c
Consumers
are the people that perform
, manage the performance of, or approve the activities
in the Knowledge Area.
p
Participants
are the applications and other technologies
that enable the goals of the Knowledge Area.
t
Tools
are the methods and procedures
used to perform activities and produce deliverables within a Knowledge Area. Techniques include common conventions, best practice recommendations, standards and protocols, and, where applicable, emerging alternative approaches.
t
Techniques
are standards for measurement or evaluation of performance, progress, quality, efficiency, or other effect. The metrics sections identify measurable facets of the work that is done within each Knowledge Area. Metrics may also measure more abstract characteristics, like improvement or value.
m
Metrics
is the process of discovering, analyzing, and scoping data requirements
, and then representing and communicating these data requirements in a precise form called the data model. Data modeling is a critical component of data management.
dm
Data modeling
is answering the question of “how”:
* How the data will be gathered
* How the data will be analyzed
* How the data requirements will be grouped depending on their subset
* After those processes, a data model can be built by communicating the data requirements.
dm
Data modeling
are critical to effective management of data. They:
Provide a common vocabulary around data
Capture and document explicit knowledge about an organization’s data and systems
Serve as a primary communications tool during projects
Provide the starting point for customization, integration, or even replacement of an application
dm
Data models
Goals and Principles
Confirming and documenting understanding of different perspectives facilitates:
Formalization: A data model documents a concise definition of data structures and relationships
. It enables assessment of how data is affected by implemented business rules, for current as-is states or desired target states.
Scope definition: A data model can help explain the boundaries for data context and implementation
of purchased application packages, projects, initiatives, or existing systems.
Knowledge retention/documentation: A data model can preserve corporate memory
regarding a system or project by capturing knowledge in an explicit form. It serves as documentation for future projects to use as the as-is version.
is most frequently performed in the context of systems development and maintenance efforts, known as the system development lifecycle (SDLC).
dm
Data modeling
is a representation of something that exists or a pattern
for something to be made. A model can contain one or more diagrams.
m
model
describes an organization’s data as the organization understands it
, or as the organization wants it to be. A data model contains a set of symbols with text labels that attempts visually to represent data requirements as communicated to the data modeler, for a specific set of data that can range in size from small, for a project, to large, for an organization.
dm
Data model
: Data used to classify and assign types
to things. For example, customers classified by market categories or business sectors; products classified by color, model, size, etc.; orders classified by whether they are open or closed.
ci
Category information
: Basic profiles of resources needed to conduct operational processes such as Product, Customer, Supplier, Facility, Organization, and Account.
ri
Resource information
: Data created while operational processes
are in progress. Examples include Customer Orders, Supplier Invoices, Cash Withdrawal, and Business Meetings.
bei
Business event information
: is often produced through point-of-sale systems
(either in stores or online).
dti
Detail transaction information
- is a thing about which an organization collects information.
- sometimes referred to as the nouns of an organization.
- can be thought of as the answer to a fundamental question – who, what, when, where, why, or how – or to a combination of these questions.
e
entity
are the occurrences or values of a particular entity
ei
Entity instances
Entity – type, instance
Entity – Jane, Employee
Entity type – Employee
Entity instance – Jane
Entity – Raine, Lecturer
Entity type – Lecturer
Entity instance – Raine
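The type/instance distinction in the examples above maps directly onto classes and objects. A minimal sketch in Python; the `Employee` and `Lecturer` classes and their single `name` field are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Employee:
    """Entity type: a 'noun' the organization collects information about."""
    name: str

@dataclass
class Lecturer:
    name: str

# Entity instances: occurrences (values) of a particular entity type.
jane = Employee(name="Jane")
raine = Lecturer(name="Raine")
```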
In _________ the term relationship is often used, in _________________ the term navigation path is often used, and in _____________ terms such as **edge** or **link** are used, for example. _______ can also vary based on level of detail. A relationship at the conceptual and logical levels is called a relationship, but a relationship at the physical level may be called by other names, such as constraint or reference, depending on the database technology.
rs ds ns ra
relational schemes
dimensional schemes
NoSQL schemes
Relationship aliases
Relationships between two entities
c dr
Cardinality is represented by the symbols that appear on both ends of a relationship line.
Data rules are specified and enforced through cardinality.
* Without cardinality, the most one can say about a relationship is that two entities are connected in some way.
The number of entities
in a relationship is the __________________ of the relationship. The most common are unary, binary, and ternary relationships
‘arity’
relationship involves only one entity. A one-to-many recursive relationship describes a hierarchy, whereas a many-to-many relationship describes a network or graph. In a hierarchy, an entity instance has at most one parent (or higher-level entity). In relational modeling, child entities are on the many side of the relationship, with parent entities on the one side of the relationship. In a network, an entity instance can have more than one parent.
u
unary (also known as a recursive or self-referencing)
An arity of two is also known as _____________. A binary relationship, the most common on a traditional data model diagram, involves two entities.
b
binary
An arity of three, known as ________, is a relationship that includes three entities. An example in fact-based modeling (object-role notation) appears in Figure 35. Here Student can register for a particular Course in a given Semester.
t
ternary
- is used in physical and sometimes logical relational data modeling schemes to represent a relationship.
- may be created implicitly when a relationship is defined between two entities, depending on the database technology or data modeling tool, and whether the two entities involved have mutual dependencies.
fk
foreign key
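The constraint behavior described above can be demonstrated with SQLite via Python's standard library. The `customer`/`customer_order` tables are hypothetical; note that SQLite enforces foreign keys only when the pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite: FK enforcement is opt-in
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer (customer_id)  -- the foreign key
    )
""")
conn.execute("INSERT INTO customer VALUES (1, 'Jane')")
conn.execute("INSERT INTO customer_order VALUES (10, 1)")  # parent exists: accepted

try:
    # No customer 99 exists, so the relationship is violated.
    conn.execute("INSERT INTO customer_order VALUES (11, 99)")
except sqlite3.IntegrityError:
    pass  # rejected by the foreign key constraint
```

The foreign key column in the child table points at the primary key of the parent, which is exactly the "relationship at the physical level" the cards describe.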
(also called a key) is a set of one or more attributes that uniquely defines an instance of an entity. This section defines types of keys by construction (simple, compound, composite, surrogate) and function (candidate, primary, alternate).
i
identifier
is one attribute that uniquely identifies an entity instance.
* Ex. Universal Product Codes (UPCs) and Vehicle Identification Numbers (VINs).
sk
simple key
- is also an example of a simple key.
- is a unique identifier for a table. Often a counter and always system-generated without intelligence, a surrogate key is an integer whose meaning is unrelated to its face value.
sk
surrogate key
is a set of two or more attributes that together uniquely identify an entity instance. Ex. Phone number (area code + exchange + local number).
ck
compound key
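The key construction types covered so far (simple, compound, surrogate) can be sketched with plain Python dictionaries, where the dictionary key plays the role of the entity identifier; the sample values are illustrative assumptions.

```python
import itertools

# Simple key: a single attribute uniquely identifies an instance (e.g., a UPC).
products_by_upc = {"036000291452": "Cereal"}

# Compound key: several attributes that identify an instance only together,
# like a phone number split into (area code, exchange, local number).
people_by_phone = {("212", "555", "0147"): "Jane"}

# Surrogate key: a system-generated counter with no business meaning.
new_id = itertools.count(1)
employees = {next(new_id): "Jane", next(new_id): "Raine"}
```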
contains one compound key and at least one other simple or compound key or non-key attribute.
ck
composite key
A __________ is any set of attributes that uniquely identify an entity instance.
sk
super key
A __________ is a minimal set of one or more attributes (i.e., a simple or compound key) that identifies the entity instance to which it belongs.
ck
candidate key
is one or more attributes that a business professional would use to retrieve a single entity instance.
bk
business key
is the candidate key that is chosen to be the unique identifier for an entity.
pk
primary key
can still be used to find specific entity instances. Often the primary key is a surrogate key and the ____________________ are business keys.
ak
alternate key
is one where the primary key contains only attributes that belong to that entity.
ie
independent entity
is one where the primary key contains at least one attribute from another entity.
de
dependent entity
: Domains that specify the standard types of data
one can have in an attribute assigned to that domain. For example, Integer, Character(30), and Date are all data type domains.
dt
Data Type
: Domains that use patterns
including templates and masks, such as are found in postal codes and phone numbers, and character limitations (alphanumeric only, alphanumeric with certain special characters allowed, etc.) to define valid values.
df
Data Format
: Domains that contain a finite set of values
. These are familiar to many people from functionality like dropdown lists.
* For example, the list domain for OrderStatusCode can restrict values to only {Open, Shipped, Closed, Returned}.
l
List
: Domains that allow all values of the same data type
that are between one or more minimum and/or maximum values. Some ranges can be open-ended
.
* For example, OrderDeliveryDate must be between OrderDate and three months in the future.
r
Range
: Domains defined by the rules
that values must comply with in order to be valid. These include rules comparing values to calculated values or other attribute values in a relation or set.
* For example, ItemPrice must be greater than ItemCost.
rb
Rule-based
The use of schemes depends in part on the database being built, as some are suited to particular technologies.
dms
Data Model Schemes
CDM, LDM, PDM
- In a CDM, you can define data items and entity attributes. In an LDM, you can only define entity attributes.
- In the CDM, the foreign attribute migration does not occur until you generate an LDM or PDM.
- In the LDM, the foreign attribute migrates immediately.
- Conceptual, logical, physical data models (PDM)
First articulated by Dr. Edward Codd in 1970, _______________ provides a systematic way to organize data so that they reflected their meaning (Codd, 1970). This approach had the additional effect of reducing redundancy in data storage.
rt
relational theory
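The redundancy-reduction effect mentioned above can be seen in a tiny sketch; the customer/order data is hypothetical:

```python
# Flat, redundant storage: the customer's city repeats on every order row,
# so an update must touch every copy (and can miss one).
flat_orders = [
    {"order_id": 10, "customer": "Jane", "city": "Boston"},
    {"order_id": 11, "customer": "Jane", "city": "Boston"},
]

# Organized relationally: each fact is stored once and referenced by key.
customers = {"Jane": {"city": "Boston"}}
orders = [
    {"order_id": 10, "customer": "Jane"},
    {"order_id": 11, "customer": "Jane"},
]

# An update now happens in exactly one place instead of on every order row.
customers["Jane"]["city"] = "Austin"
```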
The concept of _________ started from a joint research project conducted by General Mills and Dartmouth College in the 1960s. In dimensional models, data is structured to optimize the query and analysis of large amounts of data. In contrast, operational systems that support transaction processing are optimized for fast processing of individual transactions.
dm
dimensional modeling
The three main types of change are sometimes known by ORC.
- Overwrite (Type 1): The new value overwrites the old value in place.
- New Row (Type 2): The new values are written in a new row, and the old row is marked as not current.
- New Column (Type 3): Multiple instances of a value are listed in columns on the same row, and a new value means shifting the values in the series one spot down to make space at the front for the new value. The last value is discarded.
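A minimal Python sketch of the three ORC change types applied to a dimension tracking a customer's city. The field names (`city`, `prior_city`, `is_current`) are illustrative, not from any particular warehouse schema.

```python
def overwrite_type1(row, new_city):
    """Type 1 (Overwrite): the new value overwrites the old value in place."""
    row["city"] = new_city
    return row

def new_row_type2(rows, new_city):
    """Type 2 (New Row): write a new row; mark the old row as not current."""
    for r in rows:
        r["is_current"] = False
    rows.append({"city": new_city, "is_current": True})
    return rows

def new_column_type3(row, new_city):
    """Type 3 (New Column): shift the series one spot down; the new value
    takes the front spot and the last value is discarded."""
    row["prior_city"] = row["city"]  # old value moves down the series
    row["city"] = new_city           # new value at the front
    return row
```

Type 1 loses history, Type 2 preserves full history as extra rows, and Type 3 keeps a fixed-length history as extra columns.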
is the term given to normalizing the flat, single-table, dimensional structure
in a star schema into the respective component hierarchical or network structures.
Snowflaking
stands for the meaning or description of a single row of data
in a fact table; this is the most detail any row will have.
g
grain
are built with the entire organization in mind instead of just a particular project; this allows these dimensions to be shared across dimensional models, due to containing consistent terminology and values.
Conformed dimensions
use standardized definitions of terms across individual marts. Different business users may use the same term in different ways.
‘Customer additions’ may be different from ‘gross additions’ or ‘adjusted additions.’
cf
Conformed facts
- is a graphical language for modeling software.
- has a variety of notations, of which one (the class model) concerns databases.
- the class model specifies classes (entity types) and their relationship types (Blaha, 2013).
uml
Unified Modeling Language (UML)
has Operations or Methods (also called its “behavior”). Class behavior is only loosely connected to business logic because it still needs to be sequenced and timed. In ER terms, the table has stored procedures/triggers. Class Operations can be:
c
class
_________, a family of conceptual modeling languages, originated in the late 1970s. Fact-based languages view the world in terms of objects, the facts that relate or characterize those objects, and each role that each object plays in each fact.
fbm
Fact-Based Modeling
do not use attributes
, reducing the need for intuitive or expert judgment by expressing the exact relationships between objects (both entities and values).
fbm
Fact-based models
is a model-driven engineering approach
that starts with typical examples of required information or queries presented in any external formulation familiar to users, and then verbalizes these examples at the conceptual level, in terms of simple facts expressed in a controlled natural language.
orm
Object-Role Modeling (ORM)
is similar in notation and approach to ORM
. The numbers in Figure 43 are references to verbalizations of facts.
fcom
Fully Communication Oriented Modeling (FCO-IM)
are used when data values must be associated in chronological order and with specific time values
.
tbp
Time-based patterns
is a detail-oriented, time-based, and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach, encompassing the best of breed between third normal form and star schema. Data Vaults are designed specifically to meet the needs of enterprise data warehouses.
dv
Data Vault
is a technique suited for information that changes over time in both structure and content
. It provides graphical notation used for conceptual modeling similar to traditional data modeling, with extensions for working with temporal data.
am
Anchor Modeling
is a name for the category of databases
built on non-relational technology
.
n
NoSQL
Instead of taking a business subject and breaking it up into multiple relational structures, document databases frequently store the business subject in one structure called a ___________.
d
document
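A minimal sketch of the document idea: one business subject (here, a hypothetical order with its customer and line items) kept in a single nested structure rather than split across relational tables. Document databases typically store and query such structures as JSON; the field names below are illustrative only.

```python
import json

# One business subject held in a single document, not multiple tables.
order_document = {
    "order_id": 1001,
    "status": "Open",
    "customer": {"name": "A. Santos", "region": "NCR"},
    "lines": [
        {"item": "Notebook", "qty": 2, "price": 45.0},
        {"item": "Pen", "qty": 10, "price": 8.5},
    ],
}

# Round-trip through JSON, the usual storage/interchange form.
serialized = json.dumps(order_document)
restored = json.loads(serialized)
```

Everything needed to answer a question about the order travels together in one read, which is the main appeal of the document model.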
databases allow an application to store its data in only two columns (‘key’ and ‘value’), with the feature of storing both simple (e.g., dates, numbers, codes) and complex information (unformatted text, video, music, documents, photos) within the ‘value’ column.
kv
Key-value
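The two-column pattern can be sketched with Python's built-in sqlite3 module: a single table whose ‘value’ column accepts simple text as well as binary content. The keys and values below are illustrative assumptions.

```python
import sqlite3

# A key-value store as a two-column table: 'key' plus a 'value' column
# that can hold simple text or complex binary content.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value BLOB)")

# A simple value (text) and a complex value (binary photo bytes).
con.execute("INSERT INTO kv VALUES (?, ?)", ("user:42:name", "Ada"))
con.execute("INSERT INTO kv VALUES (?, ?)", ("user:42:photo", b"\x89PNG..."))

# Retrieval is always by key.
name = con.execute(
    "SELECT value FROM kv WHERE key = ?", ("user:42:name",)
).fetchone()[0]
```

All access goes through the key; the database knows nothing about the structure inside the value, which is what makes the model so flexible.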
Out of the four types of NoSQL databases, _________________ is closest to the RDBMS
. Both have a similar way of looking at data as rows and values.
co
column-oriented
A ____________ database is designed for data whose relations are well represented
as a set of nodes with an undetermined number of connections between these nodes.
g
graph
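A minimal sketch of the graph idea: nodes with an undetermined number of connections, plus a breadth-first traversal to answer a connection query. The names and edges are invented for illustration.

```python
from collections import deque

# Nodes and their outgoing edges; each node may have any number of links.
edges = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Dave"],
    "Carol": [],
    "Dave": [],
}

def connected(start, goal):
    """Breadth-first search: is there a path from start to goal?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

Queries that chase relationships (friend-of-a-friend, shortest path) are the workload graph databases optimize for, where relational joins become awkward.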
: This embodies the ‘real world’ view
of the enterprise being modeled in the database. It represents the current ‘best model’ or ‘way of doing business’
for the enterprise.
c
Conceptual
: The various users of the database management system operate on subsets
of the total enterprise model that are relevant to their particular needs. These subsets are represented as ‘external schemas’.
e
External
: The ‘machine view’ of the data
is described by the internal schema. This schema describes the stored representation of the enterprise’s information.
i
Internal
This section provides an overview of conceptual, logical, and physical data modeling.
dm
Data Model
A ____________ captures the high-level data requirements
as a collection of related concepts. It contains only the basic and critical business entities within a given realm and function, with a description of each entity and the relationships between entities.
cdm
conceptual data model
A _____________________ is a detailed representation of data requirements,
usually in support of a specific usage context, such as application requirements. Logical data models are still independent of any technology or specific implementation constraints.
* often begins as an extension of a conceptual data model
ldm
logical data model
A ____________ is in many cases a fully-attributed perspective of the dimensional conceptual data model, as illustrated in Figure 49.
dldm
dimensional logical data model
A ___________ represents a detailed technical solution
, often using the logical data model as a starting point and then adapted to work within a set of hardware, software, and network tools. Physical data models are built for a particular technology.
pdm
physical data model (PDM)
A variant of a physical scheme is a ____________, used for data in motion between systems.
This model describes the structure of data
being passed between systems as packets or messages. When sending data through web services, an Enterprise Service Bus (ESB), or through Enterprise Application Integration (EAI), the canonical model describes what data structure the sending service and any receiving services should use.
cm
Canonical Model
is a virtual table
.
provide a means to look at data from one or many tables that contain or reference the actual attributes. A standard view runs SQL to retrieve data at the point when an attribute in the view is requested. An instantiated (often called ‘materialized’) view runs at a predetermined time. Views are used to simplify queries, control data access, and rename columns, without the redundancy and loss of referential integrity due to denormalization.
v
view
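A minimal sketch of a standard (non-materialized) view using Python's sqlite3 module: the view stores no data of its own, and its SQL runs each time the view is queried. It also renames a column and restricts rows, as the card describes. Table and column names are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, status TEXT, total REAL)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Open", 100.0), (2, "Closed", 50.0), (3, "Open", 75.0)],
)

# The view is a virtual table: it renames 'id' to 'order_id' and shows
# only open orders, without duplicating any data.
con.execute(
    "CREATE VIEW open_orders AS "
    "SELECT id AS order_id, total FROM orders WHERE status = 'Open'"
)

rows = con.execute(
    "SELECT order_id, total FROM open_orders ORDER BY order_id"
).fetchall()
```

Because the view re-runs its SQL on each request, it always reflects the current base table, unlike a materialized view refreshed at a predetermined time.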
refers to the process of splitting a table. It is performed to facilitate archiving and to improve retrieval performance.
p
Partitioning
Vertically vs. Horizontally split
- Vertically split: To reduce query sets, create subset tables that contain subsets of columns. For example, split a customer table in two based on whether the fields are mostly static or mostly volatile (to improve load / index performance), or based on whether the fields are commonly or uncommonly included in queries (to improve table scan performance).
- Horizontally split: To reduce query sets, create subset tables using the value of a column as the differentiator. For example, create regional customer tables that contain only customers in a specific region.
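The horizontal split above can be sketched in a few lines: customer rows are routed into regional subset tables using the value of the `region` column as the differentiator. The data is invented for illustration.

```python
# Horizontal partitioning: route each row into a regional subset table
# based on the value of its 'region' column.
customers = [
    {"name": "A", "region": "North"},
    {"name": "B", "region": "South"},
    {"name": "C", "region": "North"},
]

partitions = {}
for row in customers:
    partitions.setdefault(row["region"], []).append(row)
```

A query for northern customers now scans only the "North" partition instead of the whole table, which is the retrieval benefit the card describes.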
- is the deliberate transformation of normalized logical data model entities into physical tables with redundant or duplicate data structures. There are several reasons to denormalize data.
- can also be used to enforce user security by segregating data into multiple views or copies of tables according to access needs. This process does introduce a risk of data errors due to duplication.
d
Denormalization
In dimensional data modeling, ____________ is called collapsing or combining. If each dimension is collapsed into a single structure, the resulting data model is called a Star Schema (see Figure 51). If the dimensions are not collapsed, the resulting data model is called a Snowflake (see Figure 49).
d
denormalization
is the process of applying rules in order to organize business complexity into stable data structures
. The basic goal of normalization is to keep each attribute in only one place to eliminate redundancy and the inconsistencies that can result from redundancy.
n
Normalization
: Ensures each entity has a valid primary key
, and every attribute depends on the primary key
; removes repeating groups, and ensures each attribute is atomic (not multi-valued). 1NF includes the resolution of many-to-many relationships with an additional entity often called an associative entity.
fnf
First normal form (1NF)
: Ensures each entity has the minimal primary key
and that every attribute depends on the complete primary key.
snf
Second normal form (2NF)
: Ensures each entity has no hidden primary keys
and that each attribute depends on no attributes outside the key
(“the key, the whole key and nothing but the key”).
tnf
Third normal form (3NF)
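A minimal sketch of what normalizing toward 3NF does: in a flat order table (invented data), customer attributes repeat on every order row; splitting them out keeps each attribute in only one place, eliminating the redundancy and the inconsistencies it can cause.

```python
# Unnormalized: customer data repeats on every order row.
flat_orders = [
    {"order_id": 1, "cust_id": 7, "cust_name": "Santos", "total": 100.0},
    {"order_id": 2, "cust_id": 7, "cust_name": "Santos", "total": 60.0},
]

# cust_name depends on cust_id, not on the order's key, so it moves to
# its own table ("nothing but the key"); orders keep only the foreign key.
customers = {r["cust_id"]: {"cust_name": r["cust_name"]} for r in flat_orders}
orders = [
    {"order_id": r["order_id"], "cust_id": r["cust_id"], "total": r["total"]}
    for r in flat_orders
]
```

After the split, a customer's name is stored exactly once, so a name change cannot leave two rows disagreeing.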
: Resolves overlapping composite candidate keys
. A candidate key is either a primary or an alternate key. ‘Composite’ means more than one (i.e., two or more attributes in an entity’s primary or alternate keys), and ‘overlapping’ means there are hidden business rules between the keys.
b/cnf
Boyce / Codd normal form (BCNF)
: Resolves all many-to-many-to-many relationships (and beyond) in pairs until they cannot be broken down into any smaller pieces.
fnf
Fourth normal form (4NF)
: Resolves inter-entity dependencies
into basic pairs, and all join dependencies use parts of primary keys.
fnf
Fifth normal form (5NF)
is the removal of details in such a way as to broaden applicability to a wide class of situations while preserving the important properties and essential nature of the concepts or subjects.
includes generalization and specialization
.
* Generalization groups the common attributes and relationships
of entities into super type entities, while specialization separates distinguishing attributes within an entity into subtype entities.
* This specialization
is usually based on attribute values
within an entity instance.
a
Abstraction
is the concept of exposing only the required essential characteristics and behavior
with respect to a context.
a
Abstraction