CC6 - Chapter 2 Flashcards

1
Q

is the process of discovering, analyzing, and scoping data requirements, and then representing and communicating these data requirements in a precise form called the data model. Data modeling is a critical component of data management.

A

Data modeling

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

is answering the question of “how”
* How the data will be gathered
* How the data will analysed
* How the data requirements will be grouped depending on their subset
* After that processes makakabuo na ng data model by communicating the data requirements.

A

Data modeling

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

are critical to effective management of data. They:

Provide a common vocabulary around data
Capture and document explicit knowledge about an organization’s data and systems
Serve as a primary communications tool during projects
Provide the starting point for customization, integration, or even replacement of an application

A

Data models

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Goals and Principles

A

Confirming and documenting understanding of different perspectives facilitates:

Formalization: A data model documents a concise definition of data structures and relationships. It enables assessment of how data is affected by implemented business rules, for current as-is states or desired target states.

Scope definition: A data model can help explain the boundaries for data context and implementation of purchased application packages, projects, initiatives, or existing systems.

Knowledge retention/documentation: A data model can preserve corporate memory regarding a system or project by capturing knowledge in an explicit form. It serves as documentation for future projects to use as the as-is version.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

is most frequently performed in the context of systems development and maintenance efforts, known as the system development lifecycle (SDLC).

A

Data modeling

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

is a representation of something that exists or a pattern for something to be made. A model can contain one or more diagrams.

A

model

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

describes an organization’s data as the organization understands it, or as the organization wants it to be. A data model contains a set of symbols with text labels that attempts visually to represent data requirements as communicated to the data modeler, for a specific set of data that can range in size from small, for a project, to large, for an organization.

A

Data model

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

: Data used to classify and assign types to things. For example, customers classified by market categories or business sectors; products classified by color, model, size, etc.; orders classified by whether they are open or closed.

A

Category information

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

: Basic profiles of resources needed conduct operational processes such as Product, Customer, Supplier, Facility, Organization, and Account.

A

Resource information

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

: Data created while operational processes are in progress. Examples include Customer Orders, Supplier Invoices, Cash Withdrawal, and Business Meetings.

A

Business event information

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

: is often produced through point-of-sale systems(either in stores or online).

A

Detail transaction information

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q
  • is a thing about which an organization collects information.
  • sometimes referred to as the nouns of an organization.
  • can be thought of as the answer to a fundamental question – who, what, when, where, why, or how – or to a combination of these questions.
A

entity

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

are the occurrences or values of a particular entity

A

Entity instances

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Entity -/ type, instance

A

Entity – Jane, Employee
Entity type – Employee
Entity instance – Jane

Entity – Raine, Lecturer
Entity type – Lecturer
Entity instance – Raine

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

In _________ the term relationship is often used, _________________ the term navigation path is often used, and in _____________ terms such as **edge or link **are used, for example._______ can also vary based on level of detail. A relationship at the conceptual and logical levels is called a relationship, but a relationship at the physical level may be called by other names, such as constraint or reference, depending on the database technology.

A

relational schemes
dimensional schemes
NoSQL schemes
Relationship aliases

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Relationships between two entities

A

Cardinality is represented by the symbols that appear on both ends of a relationship line.
Data rules are specified and enforced through cardinality.
* Without cardinality, the most one can say about a relationship is that two entities are connected in some way.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

The number of entities in a relationship is the __________________ of the relationship. The most common are unary, binary, and ternary relationships

A

‘arity’

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

relationship involves only one entity. A one-to-many recursive relationship describes a hierarchy, whereas a many-to-many relationship describes a network or graph. In a hierarchy, an entity instance has at most one parent (or higher-level entity). In relational modeling, child entities are on the many side of the relationship, with parent entities on the one side of the relationship. Ina network, an entity instance can have more than one parent.

A

unary (also known as a recursive or self-referencing)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

An arity of two is also known as _____________. A binary relationship, the most common on a traditional data model diagram, involves two entities.

A

binary

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

An arity of three, known as ________, is a relationship that includes three entities. An example in fact-based modeling (object-role notation) appears in Figure 35. Here Student can register for a particular Course in a given Semester.

A

ternary

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q
  • is used in physical and sometimes logical relational data modelling schemes to represent a relationship.
  • may be created implicitly when a relationship is defined between two entities, depending on the database technology or data modeling tool, and whether the two entities involved have mutual dependencies.
A

foreign key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

(also called a key) is a set of one or more attributes that uniquely defines an instance of an entity. This section defines types of keys by construction
(simple, compound, composite, surrogate)
and function
(candidate, primary, alternate).

A

identifier

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

is one attribute that uniquely identifies an entity instance.
* Ex. Universal Product Codes (UPCs) and Vehicle Identification Numbers(VINs).

A

simple key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q
  • is also an example of a simple key.
  • is a unique identifier for a table. Often a counter and always system-generated without intelligence, a surrogate key is an integer whose meaning is unrelated to its face value.
A

surrogate key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

is a set of two or more attributes that together uniquely identify an entity instance. Ex. Phone number (area code + exchange + local number).

A

compound key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

contains one compound key and at least one other simple or compound key or non-key attribute.

A

composite key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

A is any set of attributes that uniquely identify an entity instance.

A

super key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

A is a minimal set of one or more attributes (i.e., a simple or compound key) that identifies the entity instance to which it belongs.

A

candidate key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

is one or more attributes that a business professional would use to retrieve a single entity instance.

A

business key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

is the candidate key that is chosen to be the unique identifier for an entity.

A

primary key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
31
Q

can still be used to find specific entity instances. Often the primary key is a surrogate key and the ____________________ are business keys.

A

alternate key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
32
Q

is one where the primary key contains only attentityributes that belong to that entity.

A

independent entity

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
33
Q

is one where the primary key contains at least one attribute from another entity.

A

dependent entity

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
34
Q

: Domains that specify the standard types of data one can have in an attribute assigned to that domain. For example, Integer, Character(30), and Date are all data type domains.

35
Q

: Domains that use patterns including templates and masks, such as are found in postal codes and phone numbers, and character limitations (alphanumeric only, alphanumeric with certain special characters allowed, etc.) to define valid values.

A

Data Format

36
Q

: Domains that contain a finite set of values. These are familiar to many people from functionality like dropdown lists.
* For example, the list domain for OrderStatusCode can restrict values to only {Open, Shipped, Closed, Returned}.

37
Q

: Domains that allow all values of the same data type that are between one or more minimum and/or maximum values. Some ranges can be open-ended.
* For example, OrderDeliveryDate must be between OrderDate and three months in the future.

38
Q

: Domains defined by the rules that values must comply with in order to be valid. These include rules comparing values to calculated values or other attribute values in a relation or set.
* For example, ItemPrice must be greater than ItemCost.

A

Rule-based

39
Q

The use of schemes depends in part on the database being built, as some are suited to particular technologies

A

Data Model Schemes

40
Q

CDM, LDM, PDM

A
  • In aCDM, you can define data items and entity attributes. In aLDM, you can only define entity attributes.
  • In theCDM, the foreign attribute migration does not occur until you generate aLDMorPDM.
  • In theLDM, the foreign attribute migrates immediately.
  • Conceptual, logical, physical data models(PDM)
41
Q

First articulated by Dr. Edward Codd in 1970, _______________ provides a systematic way to organize data so that they reflected their meaning(Codd, 1970). This approach had the additional effect of reducing redundancy in data storage

A

relational theory

42
Q

The concept of _________ started from a joint research project conducted by General Mills and Dartmouth College in the 1960’s. 33 In dimensional models, data is structured to optimize the query and analysis of large amounts of data. In contrast, operational systems that support transaction processing are optimized for fast processing of individual transactions.

A

dimensional modeling

43
Q

The three main types of change are sometimes known by ORC.

A
  • Overwrite (Type 1): The new value overwrites the old value in place.
  • New Row (Type 2): The new values are written in a new row, and the old row is marked as not current.
  • New Column (Type 3): Multiple instances of a value are listed in columns on the same row, and a new value means writing the values in the series one spot down to make space at the front for the new value. The last value is discarded.
44
Q

is the term given to normalizing the flat, single-table, dimensional structure in a star schema into the respective component hierarchical or network structures.

A

Snowflaking

45
Q

stands for the meaning or description of a single row of data in a fact table; this is the most detail any row will have.

46
Q

are built with the entire organization in minD instead of just a particular project; this allows these dimensions to be shared across dimensional models, due to containing consistent terminology and values.

A

Conformed dimensions

47
Q

use standardized definitions of termS across individual marts. Different business users may use the same term in different ways.
‘Customer additions’ may be different from ‘gross additions’ or ‘adjusted additions.’

A

Conformed facts

48
Q
  • is a graphical language for modeling software.
  • has a variety of notations of which one (the class model) concerns databases.
  • class model specifies classes(entity types) and their relationship types (Blaha, 2013).
A

Unified Modeling Language (UML)

49
Q

has Operations or Methods (also called its “behavior”). Class behavior is only loosely connected to business logic because it still needs to be sequenced and timed. In ER terms, the table has stored procedures/triggers. Class Operations can be:

50
Q

, a family of conceptual modeling languages, originated in the late 1970s. Fact-based languages view the world in terms of objects, the facts that relate or characterize those objects, and each role that each object plays in each fact.

A

Fact-Based Modeling

51
Q

do not use attributes, reducing the need for intuitive or expert judgment by expressing the exact relationships between objects (both entities and values).

A

Fact-based models

52
Q

is a model-driven engineering approach that starts with typical examples of required information or queries presented in any external formulation familiar to users, and then verbalizes these examples at the conceptual level, in terms of simple facts expressed in a controlled natural language.

A

Object-Role Modeling (ORM)

53
Q

is similar in notation and approach to ORM. The numbers in Figure 43 are references to verbalizations of facts.

A

Fully Communication Oriented Modeling (FCO-IM)

54
Q

are used when data values must be associated in chronological order and with specific time values.

A

Time-based patterns

55
Q

is a detail-oriented, time-based, and uniquely linked set of normalized tables that support one or more functional areas of business. Itis a hybrid approach, encompassing the best of breed between third normal form and star schema. Data Vaults are designed specifically to meet the needs of enterprise data warehouses.

A

Data Vault

56
Q

is a technique suited for information that changes overtime in both structure and content. It provides graphical notation used for conceptual modeling similar to traditional data modeling, with extensions for working with temporal data.

A

Anchor Modeling

57
Q

is a name for the category of databases built on non-relational technology.

58
Q

Instead of taking a business subject and breaking it up into multiple relational structures, document databases frequently store the business subject in one structure called a ___________.

59
Q

databases allow an application to store its data in only two columns (‘key’ and ‘value’), with the feature of storing both simple (e.g., dates, numbers, codes) and complex information (unformatted text, video, music, documents, photos) stored within the ‘value’ column.

60
Q

Out of the four types of NoSQL databases, _________________ is closest to the RDBMS. Both have a similar way of looking at data as rows and values.

A

column-oriented

61
Q

A____________ database is designed for data whose relations are well represented as a set of nodes with an undetermined number of connections between these nodes.

62
Q

: This embodies the ‘real world’ view of the enterprise being modeled in the database. It represents the current ‘best model’ or ‘way of doing business’ for the enterprise.

A

Conceptual

63
Q

: The various users of the database management system operate on subsets of the total enterprise model that are relevant to their particular needs. These subsets are represented as ‘external schemas’.

64
Q

: The ‘machine view’ of the data is described by the internal schema. This schema describes the stored representation of the enterprise’s information

65
Q

This section provides an overview of conceptual, logical, and physical data modeling.

A

Data Model

66
Q

A ____________ captures the high-level data requirements as a collection of related concepts. It contains only the basic and critical business entities within a given realm and function, with a description of each entity and the relationships between entities.

A

conceptual data model

67
Q

A _____________________ is a detailed representation of data requirements, usually in support of a specific usage context, such as application requirements. Logical data models are still independent of any technology or specific implementation constraints.
* often begins as an extension of a conceptual data model

A

logical data model

68
Q

A is in many cases a fully-attributed perspective of the dimensional conceptual data model, as illustrated in Figure 49.

A

dimensional logical data model

69
Q

A ___________ represents a detailed technical solution, often using the logical data model as a starting point and then adapted to work within a set of hardware, software, and network tools. Physical data models are built for a particular technology.

A

physical data model (PDM)

70
Q

A variant of a physical scheme is a , used for data in motion between systems.
This model describes the structure of data being passed between systems as packets or messages. When sending data through web services, an Enterprise Service Bus (ESB), or through Enterprise Application Integration (EAI), the canonical model describes what data structure the sending service and any receiving services should use.

A

Canonical Model

71
Q

is a virtual table.
provide a means to look at data from one or many tables that contain or reference the actual attributes. A standard view runs SQL to retrieve data at the point when an attribute in the view is requested. An instantiated (often called ‘materialized’) view runs at a predetermined time. Views are used to simplify queries, control data access, and rename columns, without the redundancy and loss of referential integrity due to denormalization.

72
Q

refers to the process of splitting a table. It is performed to facilitate archiving and to improve retrieval performance

A

Partitioning

73
Q

Vertically vs. Horizontally split

A
  • Vertically split: To reduce query sets, create subset tables that contain subsets of columns.
    For example, split a customer table in two based on whether the fields are mostly static or mostly volatile (to improve load / index performance), or based on whether the fields are commonly or uncommonly included in queries (to improve table scan performance).
  • Horizontally split: To reduce query sets, create subset tables using the value of a column as the differentiator.
    For example, create regional customer tables that contain only customers in a specific region.
74
Q
  • is the deliberate transformation of normalized logical data model entities into physical tables with redundant or duplicate data structures. There are several reasons to denormalize data.
  • can also be used to enforce user security by segregating data into multiple views or copies of tables according to access needs. This process does introduce a risk of data errors due to duplication.
A

Denormalization

75
Q

In dimensional data modeling, is called collapsing or combining. If each dimension is collapsed into a single structure, the resulting data model is called a Star Schema (see Figure 51). If the dimensions are not collapsed, the resulting data model is called a Snowflake (See Figure 49).

A

denormalization

76
Q

is the process of applying rules in order to organize business complexity into stable data structures. The basic goal of normalization is to keep each attribute in only one place to eliminate redundancy and the inconsistencies that can result from redundancy.

A

Normalization

77
Q

: Ensures each entity has a valid primary key, and every attribute depends on the primary key; removes repeating groups, and ensures each attribute is atomic(not multi-valued). 1NF includes the resolution of many-to-many relationships with an additional entity often called an associative entity.

A

First normal form (1NF)

78
Q

: Ensures each entity has the minimal primary key and that every attribute depends on the complete primary key.

A

Second normal form (2NF)

79
Q

: Ensures each entity has no hidden primary keys and that each attribute depends on no attributes outside the key (“the key, the whole key and nothing but the key”).

A

Third normal form (3NF)

80
Q

: Resolves overlapping composite candidate keys.
* A candidate key is either a primary or an alternate key.
* ‘Composite’ means more than one (i.e., two or more attributes in an entity’s primary or alternate keys), and
* ‘overlapping’ means there are hidden business rules between the keys.

A

Boyce / Codd normal form (BCNF)

81
Q

: Resolves all many-to-many-to-many relationships (and beyond) in pairs until they cannot be broken down into any smaller pieces.

A

Fourth normal form (4NF)

82
Q

: Resolves inter-entity dependencies into basic pairs, and all join dependencies use parts of primary keys.

A

Fifth normal form (5NF)

83
Q

is the removal of details in such a way as to broaden applicability to a wide class of situations while preserving the important properties and essential nature from concepts or subjects.

includes generalization and specialization.
* Generalization groups the common attributes and relationships of entities into super type entities, while specialization separates distinguishing attributes within an entity into subtype entities.
* This specialization is usually based on attribute values within an entity instance.

A

Abstraction

84
Q

is the concept of exposing only the required essential characteristics and behavior with respect to a context.

A

Abstraction