Database Architecture Flashcards

1
Q

Blank describes the components of a computer system and the relationships between components

A

Architecture

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

MySQL components are organized in four blank

A

Layers

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Name the four layers of MySQL components

A

Tools

The query processor

The storage engine

The file system

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Blank interact directly with database users and administrators, and send queries to the query processor.

A

Tools

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

The blank manages connections from multiple users and compiles queries into low-level instructions for the storage engine.

A

Query processor

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

The blank, also called a storage manager, executes instructions, manages indexes, and interacts with the file system. Some storage engines support database transactions, described elsewhere in this material.

A

Storage engine

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

The blank accesses data on storage media. The file system contains both system and user data, such as log files, tables, and indexes.

A

File system

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

The Enterprise Edition includes MySQL Server and components for high-end commercial installations, such as what two things

A

Monitor
Audit

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Blank collects and displays information on CPU, memory, and index utilization, as well as queries and results. Database administrators use Enterprise Monitor to manage and tune large databases with many users.

A

Monitor

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Blank keeps track of all database changes. For each change, Audit tracks the time of change and who made the change. Audit supports government and business audit requirements for sensitive databases such as financial, medical, and defense.

A

Audit

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

The blank layer includes Connectors and APIs, Workbench, and utility programs.

A

Tools

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Blank and blank are groups of application programming interfaces, linking applications to the query processor layer.

A

Connectors and APIs

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Blank are newer and developed by Oracle, which sponsors MySQL.

A

Connectors

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Blank are older and, with the exception of the C API, developed by other organizations.

A

APIs

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Most programmers use blank but system programmers may write specialized utilities in C with the C API.

A

Connectors

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Blank is a desktop application to manage and use databases. Blank is designed for both database administrators and users.

A

Workbench

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Blank programs include approximately 30 tools, grouped in five categories.

A

Utility

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Name the five categories of utility programs

A

installation, client, administrative, developer, and miscellaneous tools.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Upgrade existing databases to a new MySQL release
Backup databases
Import data to databases
Inspect log files
Administer database servers

Are all things that the blank programs do.

A

utility

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

The blank is a particularly important utility program, commonly used by both database administrators and users. The blank displays the mysql> prompt and processes individual SQL queries interactively.

A

Command-Line Client

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

The query processor layer has two main functions. Name them.

A

manage connections and compile queries.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

A blank is a link between tools and the query processor.

A

connection

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Each connection specifies a blank, blank, blank, and blank. The connection manager creates connections and manages communications between tools and the query parser.

A

database name, server address, logon name, and password

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

Blank generates a query execution plan.

A

Query compilation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
An blank is a detailed, low-level sequence of steps that specify exactly how to process a query.
execution plan
26
The blank generates an execution plan in two steps.
query processor
27
The query processor generates an execution plan in two steps. Name them.
The query parser checks each query for syntax errors and converts valid queries to an internal representation. The query optimizer reads the internal representation, generates alternative execution plans, estimates execution times, and selects the fastest plan. Estimates are based on heuristics and statistics about data, like the number of rows in each table and the number of values in each column. These statistics are maintained in the data dictionary, described below.
28
For optimal performance, the query processor layer has a blank that stores reusable information in main memory.
cache manager
29
If data used in repeated queries does not change, the cache manager may also blank query results.
save
30
The storage engine layer has two main functions. Name them.
transaction management and data access.
31
Blank includes the concurrency system, recovery system, and lock manager. These components ensure all transactions are atomic, consistent, isolated, and durable,
Transaction management
32
The data access component communicates with the file system and translates table, column, and index reads into blank.
block addresses
33
To reduce data access time, the blank retains data blocks from the file system for possible reuse.
buffer manager
34
The data blocks are retained in an area of main memory called the blank.
buffer
35
The buffer manager is similar to the blank of the query processor layer.
cache manager
36
The buffer manager has a fixed amount of blank.
memory
37
As the database processes queries and reads blocks, an blank determines which blocks to retain and which to discard.
algorithm
38
The InnoDB buffer manager uses a least recently used or blank, which tracks the time each block was last accessed and, when space is needed, discards 'stale' blocks. If data in a block has been updated, discarded blocks are first saved on disk.
LRU algorithm
39
The database administrator can assign a different blank to each table in a database. InnoDB is the default and most commonly used storage engine.
storage engine
40
The blank layer consists of data stored on storage media and organized in files.
file system
41
The file system contains three types of data for each database. Name them
user data, log files, and a data dictionary.
42
Blank data includes tables and indexes.
User
43
Blank files contain a detailed, sequential record of each change applied to a database.
Log
44
The recovery system uses blank to restore data in the event of a transaction, system, or storage media failure.
log files
45
A blank, also known as a data dictionary, is a directory of tables, columns, keys, indexes, and other objects in a relational database.
catalog
46
All relational databases contain a blank. Query processors and storage managers use this information when queries are processed and executed.
catalog
47
MySQL uses the term blank instead of catalog
'data dictionary'
48
Data dictionary tables cannot be accessed directly with SELECT, INSERT, UPDATE, and DELETE queries. However, the table contents can be accessed indirectly. The blank query is compiled as a SELECT query against dictionary tables.
SHOW
49
Multiple computers linked by a network are often grouped in layers, called blank, and arranged in a hierarchy.
tiers
50
Prior to 1990, most software ran in a blank, consisting of a personal or corporate computer connected directly to monitors. Although computers often communicated with each other, the dependencies between applications running on different computers were limited.
single-tier architecture
51
Since 1990, complex corporate and government applications have increasingly been implemented in a blank.
multi-tier architecture
52
In a multi-tier architecture, the top tier consists of computers interacting directly with blank
end-users
53
In a multi-tier architecture, the bottom tier consists of blank managing resources like databases and email.
servers
54
In a multi-tier architecture, one or more blank execute a variety of functions, such as user authorization, business logic, and communication with other computers.
middle tiers
55
Typically, blank run on a middle tier and implement business logic.
application programs
56
Since user interaction and data are managed in the top and bottom tiers, applications are easier to write and maintain in a blank.
multi-tier architecture
57
Blank is a multi-tier architecture consisting of web browsers and web servers communicating over the internet
Web architecture
58
In a web architecture, blank on the top tier, manage user interaction.
web browsers
59
In a web architecture, blank, on a middle tier, generate web pages for display on web browsers and transmit user requests to services running on lower tiers.
Web servers
60
In a web architecture, blank run application software, process user requests, and communicate with databases and other services.
Application servers
61
In a web architecture, blank, such as database and authentication, comprise the bottom tier.
Services
62
The term blank refers to either a software or hardware layer
tier
63
Prior to 2000, most commercial software was blank, or installed and run on customer computers.
on-premise
64
Since 2000, blank have increasingly replaced on-premise software. With blank, a vendor such as Amazon, Microsoft, or Google implements computer services on lower tiers of a web architecture. For a fee, blank are made available over the internet to customers.
cloud services
65
Cloud services fall into three broad categories. Name them.
Infrastructure-as-a-service, or IaaS Platform-as-a-service, or PaaS Software-as-a-service, or SaaS
66
Blank provides computer processing, memory, and storage media, as if the customer were renting a computer. Ex: Elastic Compute Cloud, or EC2, from Amazon Web Services offers it
Infrastructure-as-a-service, or IaaS,
67
Blank provides tools and services, such as databases, application development tools, and messaging services. Ex: Azure is Microsoft's cloud services environment, offering the SQL Database service.
Platform-as-a-service, or PaaS
68
Blank provides complete applications, usually through web browsers on customer machines. Ex: Salesforce offers sales management software, and Google offers document processing applications like Docs, Sheets, and Pages.
Software-as-a-service, or SaaS
69
Usually blank are offered on virtual machines.
cloud services
70
A blank, is a software layer that emulates a complete, independent computing environment.
virtual machine, or VM
71
Multiple virtual machines can run on blank, enabling cloud providers to support many customers on the same machine.
one computer
72
A cloud database is a database offered as a blank cloud service. Most databases are now available either on-premise or as a cloud service, but cloud database use is growing rapidly.
PaaS
73
A blank is a statement or proposition, from which another statement is inferred.
premise
74
Premises refers to buildings and land occupied by a business. Thus, blank is technically correct in the context of cloud software. However on-premise is easier to say and therefore commonly used.
on-premises
75
Cloud databases have a number of compelling benefits. Name them (5)
Administration. Security. Reliability. Elasticity. Capital cost.
76
Installing, managing, upgrading, and backing up database systems is time-consuming and complex. With cloud databases, consumers delegate blank to cloud providers.
administrative activities
77
Cloud providers are large companies with extensive resources. Cloud providers can invest heavily in blank providing better than most cloud customers.
security professionals and infrastructure
78
Cloud providers provide blank computing systems with little or no down-time.
redundant
79
Many organizations struggle with daily, monthly, or seasonal fluctuations in processing workload. By averaging fluctuations over many customers, cloud providers provide blank on demand.
flexible database resources
80
Cloud providers absorb all initial, or capital, costs of computers and facilities. Blank is recovered by cloud service fees.
Capital cost
81
Cloud databases raise blank. Companies entrust data to cloud providers, which may store data on servers located in countries with different privacy regulations.
data privacy questions
82
In the United States, data privacy is governed by blank in specific areas, such as medical and financial. As a result, a European company may avoid servers located in the United States.
limited regulations
83
Data privacy is a concern primarily for blank, such as financial and medical applications. For organizations that do not manage blank, cloud databases offer convincing benefits and have been widely adopted.
sensitive data
84
A blank consists of multiple processors managed by a single operating system instance.
parallel computer
85
Parallel computers achieve faster processing speeds by processing multiple instructions blank.
concurrently
86
Parallel computers fall into three categories. Name them.
A shared memory computer A shared storage computer A shared nothing computer
87
In a blank, processors share the same memory and storage media.
shared memory computer
88
In a blank, processors share storage media only. Each processor has private memory.
shared storage computer
89
In a blank, processors share neither memory nor storage media.
shared nothing computer
90
Blank memory is optimal for parallel processing against a common data set in a single memory space.
Shared
91
Blank and blank scale to more processors, since processors do not contend for the same memory.
Shared storage and shared nothing
92
Multiple computers can communicate via a blank or blank network
local or wide area
93
A blank consists of cables extending over a small area, typically within one facility.
local area network
94
Local area networks usually use the blank protocol.
Ethernet communication
95
A blank spans multiple facilities in different geographic locations, separated by many miles.
wide area network
96
Wide area networks may communicate via cables, satellite, or telephone lines, often using blank protocols.
internet communication
97
A blank is one of a group of computers connected by either a local or wide area network.
node
98
A blank is a group of nodes connected by a local area network, managed by separate operating system instances, and coordinated by specialized cluster management software.
cluster
99
A cluster is similar to a blank. Both can execute program instructions in parallel on multiple processors. Both can share storage or share nothing. Computers in a cluster cannot share memory, however, since local area networks are too slow to support memory access.
parallel computer
100
Blank can often be decomposed into parts that run concurrently and execute faster on parallel computers or clusters.
Queries
101
Parallel and distributed databases exploit blank for faster query execution
multiple processors
102
A blank runs on a parallel computer or cluster.
parallel database
103
A blank runs on multiple computers connected by a wide area network.
distributed database
104
Both parallel and distributed databases present a blank view of data to database users and programmers. The physical location of data on storage media is visible to database administrators only.
unified
105
In a parallel database, data location has blank on query processing since local area networks are relatively fast and reliable.
limited impact
106
In a distributed database, data location is blank since wide area networks are relatively slow and unreliable. Wide area networks create technical challenges with distributed transaction
significant
107
Despite technical challenges, blank offer compelling benefits for databases with users in many locations
distributed databases
108
A blank updates data on multiple nodes of a distributed database.
distributed transaction
109
In a distributed transaction, either blank or blank must be successfully updated.
all nodes or no nodes
110
Databases commonly implement distributed transactions with a technique called blank.
two-phase commit
111
The two-phase commit has four steps. Name them.
In phase 1, a central transaction coordinator notifies all participating nodes of the required updates. Participating nodes receive the notification, store the update in a local log, and send a confirmation message to the transaction coordinator. Participating nodes do not yet commit the update to the database. Phase 2 begins when the transaction coordinator receives confirmation from all participating nodes. The transaction coordinator now instructs all nodes to commit. Participating nodes receive the commit message, commit the update to the database, and notify the transaction coordinator of success.
112
The two-phase commit must account for what two failure scenarios:
In step 2, if the transaction coordinator does not receive confirmation from all nodes within a fixed time period, the transaction coordinator instructs participating nodes to roll back the update. In step 4, if a node becomes unavailable and fails to notify the transaction coordinator of success, the transaction coordinator resends the commit message until the node responds.
113
The two-phase commit ensures updates are applied to either all nodes or no nodes. In the first failure scenario, the transaction blank, and no updates are applied. In the second failure scenario, the transaction blank, and all updates are applied.
rolls back commits
114
Two-phase commit and two-phase locking are different procedures. Two-phase commit blank and blank at the end of distributed transactions only. Two-phase locking, governs blank and blank of locks during either local or distributed transactions.
governs commit and rollback acquisition and release
115
A blank updates data on a single node of a distributed database.
local transaction
116
Distributed transactions are relatively blank, as multiple nodes must respond before the transaction commits. As a faster alternative, multiple nodes can be updated blank with local transactions
slow independently
117
In a local transaction, the blank notifies participating nodes of required updates. Nodes blank and confirm with the transaction coordinator. If a node is unavailable, the transaction coordinator blank until confirmation is received.
transaction coordinator commit immediately repeats the update message
118
Local transactions create blank, as nodes are updated at different times.
temporary inconsistency
119
The choice of local or distributed transactions depends on blank and blank requirements
performance and consistency
120
In many financial databases, all nodes must be consistent at all times. Blank are necessary.
Distributed transactions
121
In databases that log website activity, temporary inconsistency may be acceptable. Blank might be used to process updates quickly and support a high volume of web clicks.
Local transactions
122
Updates in a distributed transaction are blank, since the updates occur at the same time from the perspective of the database user.
synchronous
123
Updates in separate local transactions are blank.
asynchronous
124
Databases that use local rather than distributed transactions are called blank.
eventually consistent
125
A blank database conforms to all rules at all times.
consistent
126
In a distributed database, a blank may govern data on multiple nodes.
rule
127
In an blank database, 'live' nodes must respond to queries at all times.
available
128
A 'dead' node may be blank, but 'live' nodes must respond regardless of the state of other nodes.
unresponsive
129
A blank forms when a network error prevents nodes from communicating.
network partition
130
A blank occasionally experiences network partitions since nodes are connected by wide area networks that occasionally fail.
distributed database
131
A blank database continues to function when a network partition occurs.
partition-tolerant
132
The blank states that a distributed database cannot simultaneously be Consistent, Available, and Partition-tolerant.
CAP theorem
133
A distributed database can guarantee blank, but blank, of the CAP properties.
any two, but not all three
134
As a practical matter, most distributed databases must always function and are therefore partition-tolerant. Consequently, most distributed databases guarantee either blank or blank, but not both.
consistency or availability
135
The blank between consistency and availability is relative, not absolute.
tradeoff
136
Since wide area networks are relatively slow, the time to propagate an update from one node to another is significant. If a query accesses updated data before all nodes are updated, the database must either return blank or blank.
inconsistent data or not respond immediately
137
Rather than choose between consistency and availability, a database must choose blank to provide a consistent response.
how long to wait
138
Blank commonly means the percentage of time a database is responsive to users and programs. In the context of the CAP theorem, however, blank is the response of individual nodes rather than the entire database system.
Availability
139
In the context of networks, a blank is a subset of nodes. In the context of data storage, a blank is a subset of table data.
partition
140
A blank is a copy of an entire database, a table, or a subset of table data.
replica
141
A blank maintains two or more replicas on separate storage devices
replicated database
142
Data can be replicated in any database with multiple storage devices, such as blank and blank.
parallel and distributed databases
143
Replicated databases have several major advantages. Name three.
High availability. Fast concurrent reads. Local reads.
144
If one storage device fails, the database blank to a replica on another storage device.
routes queries
145
In general, if a database maintains N replicas, the database can survive simultaneous failure of blank
N-1 storage devices.
146
Concurrent queries can read separate replicas without blank with each other. One large query can be blank into smaller queries that read separate replicas in parallel.
interfering decomposed
147
In a distributed database, reads can be executed blank, eliminating network delays and outages.
locally
148
Replicated databases have one major disadvantage. Name it
Slow or inconsistent updates
149
Updates must be applied to all replicas on blank. If all replicas on different nodes are updated with a blank, the update is relatively slow. If replicas on different nodes are updated with blank, updates are relatively fast but replicas are temporarily inconsistent.
multiple storage devices distributed transaction local transactions
150
Blank simplifies some database administration activities but makes others more complex.
Replication
151
One replica can be blank while transactions execute against other replicas.
backed up
152
Blank can be restricted to one replica, accessible only to trusted database users. Updates are propagated to blank, accessible to a broader user group.
Updates read-only replicas
153
In a replica database, database administrators must determine how to blank updates across replicas.
propagate
154
Blank is commonly used in parallel and distributed databases, particularly when reads are frequent, updates are infrequent, and temporary inconsistency is acceptable.
Replication
155
Updating replicated data in a database running on a single node is blank.
straightforward
156
Some storage devices, called blank, manage replicas internally, without database intervention. Alternatively, the database can update all replicas within a single local transaction. Either way, synchronizing replicas does not require special database capabilities.
storage arrays
157
Updating all replicas in a blank guarantees consistency but is relatively slow and fails when any replica is unavailable.
distributed transaction
158
The blank technique designates one node as primary. All updates are first applied to the primary node in local transactions. Secondary nodes are updated after the primary node commits, with independent local transactions. If the primary node fails, the database automatically designates a new primary node to ensure continued availability.
primary/secondary
159
The blank technique applies updates to any node in a group. Prior to committing, a node broadcasts transaction information to other nodes, which look for conflicts with concurrent transactions. If any node detects a conflict, an algorithm determines which transaction commits and which rolls back. This algorithm may be simple, such as the transaction that commits first wins, or complex. If a network partition occurs and nodes cannot communicate, processing is temporarily suspended.
group replication
160
Blank update technique is eventually consistent, not partition-tolerant.
Group replication
161
Blank update technique is always consistent, not partition-tolerant
Distributed transaction
162
Blank update is eventually consistent, partition-tolerant.
Primary/Secondary
163
A blank is a directory of information describing database objects such as tables, columns, keys, and indexes
Catalog
164
Blank is necessary to process queries and access data. Each node in a distributed database can process queries and therefore requires access to the catalog.
Catalog information
165
In a distributed database, the catalog can be structured in what two ways
In a central catalog In a replicated catalog
166
In a blank, the entire catalog resides on a single node. Storing the catalog on a single node is relatively easy to manage. However, query processing at remote nodes must access the catalog via a wide area network, which may be slow or unreliable. Furthermore, query processing at all nodes interact with the blank, which may become a bottleneck.
central catalog
167
In a blank a copy of the catalog resides on each node. Most queries are fast and reliable since all catalog data is available locally. However, statements that update the catalog, such as CREATE, ALTER, and DROP, must update all replicas. Updating replicas generates increased network traffic and, if executed in a distributed transaction, fails when any replica is unavailable.
replicated catalog,
168
Since catalog updates are infrequent compared to other database queries, many distributed databases use a blank.
replicated catalog
169
To improve performance of catalog updates, many databases use a variation of the blank
primary/secondary technique
170
Updates are first applied to the replica on the blank containing the affected data object and then propagated to other replicas.
node
171
When catalog replicas are updated with blank, some replicas are momentarily out of date. If a query cannot be processed due to an out-of-date replica, the database might display an error and advise the user to resubmit the query. This rarely occurs, however, since catalog updates are infrequent and the delay between replica updates is short.
local transactions
172
Organizations use blank to conduct daily business functions.
operational data
173
Organizations use blank to understand, manage, and plan the business.
analytic data
174
Analytic data is sometimes called blank or blank.
reporting data or decision support data
175
Operational and analytic data differ in what four ways
Volatility. Detail. Scope. History.
176
Blank changes in real time as business functions are executed. Blank is updated at fixed intervals, often daily or weekly, so that reports and summaries always refer to a known time.
Operational data Analytic data
177
Most blank is detailed, reflecting individual transactions. Blank is often summarized by time period, business unit, geography, and other business dimensions.
operational data Analytic data
178
Most blank are designed for a specific business function. Consequently, operational databases supporting different business functions are often incompatible. Blank combine data from many business functions in an integrated, enterprise-wide view of data, with standard formats, data types, and keys across all tables.
operational databases Analytic databases
179
Many blank are concerned primarily with current data. Blank often track trends over time and therefore usually contain current and historic data. Ex: Operational data may include active employees only. Analytic data may include past employees and illustrate changes in total employment by month.
operational databases Analytic databases
180
Blank and blank are often maintained in separate databases with different designs.
operational and analytic data
181
Operational data is blank and blank
Volatile and detailed
182
Analytic detail is blank, blank, blank and blank
Detailed, Summary, Enterprise-wide, and Historical
183
In a blank database, a database stores each line item of an order, including item name, product code, cost, and quantity.
Detailed
184
A blank database, the database records the current project assignments for each employee. Updates are made right away when assignments change. The database does not retain prior assignments.
Volatile
185
In a blank database, each year, employees receive raises for performance and increased cost of living. The database retains salary data for every year employees have worked at the company.
historic
186
in a blank database, the database stores total count of employees working at each corporate facility.
summary
187
In a blank database, A report lists employee names alongside employee office numbers. Employee names are extracted from a human resources database. Employee office assignments are stored in a separate facilities database.
Enterprise-wide
188
Storing operational and analytic data in the same database creates what three problems:
Database design. Interference. Reference time.
189
Since blank is volatile, operational databases are typically optimized for updates, with most tables in third normal form. Third normal form minimizes redundancy but generates many tables and is not optimal for blank. Analytic queries often combine columns from many third normal form tables, resulting in complex joins that are difficult to write and slow to run.
operational data analytic queries
190
Blank often summarize large volumes of data. When executed against an operational database, analytic queries compete with blank, degrade query response time, and interfere with business operations.
Analytic queries operational queries
191
Blank usually reference a specific point in time, such as sales totals as of midnight on the last day of the month. Since operational data is volatile, results depend on the blank a query is submitted. Analytic queries against operational databases thus have an uncertain reference time and may be misleading.
Analytic queries precise time
192
A blank is a separate database optimized for analytics rather than operations.
data warehouse
193
A data warehouse consists of data extracted from blank and restructured to support analytic queries.
operational databases
194
Data is usually extracted periodically, at a fixed time, so that data in the warehouse has a known blank. Data is extracted during times of low database use to minimize impact on operational queries.
reference time
195
Data warehouses integrate data from blank for use by the entire organization.
multiple business functions
196
A blank is a data warehouse designed for a specific business area, such as sales, human resources, or product development.
data mart
197
Since data marts have blank than a data warehouse, data marts are easier to build and maintain. A data mart can be derived directly from operational databases or indirectly from a data warehouse.
smaller scope
198
Data warehouses are refreshed periodically with a five-step process. Name the steps
Extract data from operational databases Cleanse data Integrate data into a uniform structure. Restructure data Load data to the data warehouse.
199
Extract data from operational databases into a temporary database, called a blank. Since the data warehouse already contains data from the prior period, only data that has changed since the prior period is extracted.
'staging area'
200
The five-step process is commonly referred to as the blank, or ETL, process.
extract-transform-load
201
Since the ETL process is time-consuming and difficult to automate, many organizations use special software products, called blank, to minimize programming.
ETL tools
202
Blank from Informatica is a high-end ETL product intended to manage large extracts for complex organizations.
PowerCenter
203
Blank from Microsoft is designed for SQL Server data warehouses.
SQL Server Integration Services
204
Blank supports many data sources but is optimized to load Oracle database products.
Oracle Data Integrator
205
Reading data from operational systems and writing the data to a temporary database is called blank.
extraction
206
Correcting errors in operational data and converting to a standard format is called blank.
'cleansing'
207
Data in different operational systems often have incompatible or missing keys. Creating uniform primary and foreign keys is one example of blank.
data integration
208
Converting data from a design optimized for operations to a design optimized for analytics is called blank.
restructuring
209
Data warehouses are periodically blanked with cleansed, integrated, and restructured data from the temporary database.
loaded
210
To simplify analytic queries, data warehouses commonly use a blank.
dimensional design
211
A dimensional design, also called a blank, consists of fact and dimension tables:
star schema
212
A blank contains numeric data used to measure business performance, such as sales revenue or number of employees.
fact table
213
Each row in a fact table consists of numeric fact columns and foreign keys that reference blank.
dimension tables
214
A blank contains textual data that describes the fact data, such as product line, organizational unit, and geographical region.
dimension table
215
The blank of a fact table is the composite of all foreign keys referencing dimension tables.
primary key
216
The primary key of a blank is a small, meaningless integer
dimension table
217
Most data warehouses have many blanks
fact tables
218
A blank is a sequence of columns in which each column has a one-many relationship to the next column.
dimension hierarchy
219
A dimension table usually contains one or more column blank.
hierarchies
220
In some cases, several columns are at the same blank of a hierarchy
level
221
Blank usually summarize data at one level of one hierarchy from each dimension.
Analytic queries
222
For fast execution, frequently used summary data may be computed in advance and stored in a blank.
data warehouse
223
Since data warehouses track historical data, dimensional designs usually have blank and blank dimension tables.
date and time
224
Each row of the blank dimension table corresponds to a day
date
225
Each row of the blank dimension table corresponds to a minute of the day.
time
226
The time dimension contains blank rows (24 hours × 60 minutes per hour).
1,440
227
Fact tables contain blank referencing date, time, or both dimensions, to establish the time of a fact.
foreign keys
228
The blank and blank dimensions provide an elegant way to track historical data.
date and time
229
Foreign keys StartDateID and EndDateID are added to the fact table and indicate the blank of each row. Current rows have an end date in the distant future, such as December 31, 2999.
effective dates
230
f a fact changes to a new value on date X, the end date of the current row is set to X and a new row is inserted with what values
The fact column is the new value. StartDateID refers to date X. EndDateID refers to December 31, 2999. Other columns are identical to the prior row.
231
Adding start and end foreign keys to the fact table is called blank. Historical data can be tracked with other designs, but type 2 design is simple, effective, and commonly used.
type 2 design for slowly changing dimensions
232
A straightforward approach for considering the value of a BI program suggests at least what four dimensions of value?
Financial value Productivity value Trust value Risk value
233
A blank is the primary source of information that feeds the analytical processing within an organization. There are a number of different analytic applications that are driven by business needs, yet most, if not all of these applications are driven by the data that has been migrated into a data warehouse.
data warehouse
234
A data warehouse is a centralized repository of blank.
information
235
A data warehouse is organized around the relevant blank important to the organization.
subject areas
236
A data warehouse provides a platform for different blank (both human and automated) to submit queries about enterprise information.
consumers
237
A data warehouse is used for analysis and not for blank. n The data in a data warehouse is nonvolatile.
transaction processing
238
A data warehouse is the target location for blank from multiple sources, both internal and external to an enterprise.
integrating data
239
A blank is a subject-oriented data repository, similar in structure to the enterprise data warehouse, but it holds the data necessary for the decision support and BI needs of a specific department or group within the organization.
data mart
240
Blank is different from the typical operational or transaction processing systems. There are many proposed definitions of OLAP, most of which describe what OLAP is used for. The most frequently used terms are ‘‘multidimensional’’ and ‘‘slice-and-dice.’’
Online analytical processing
241
Online analytical processing tools provide a means for presenting data sourced from a data warehouse or data mart in a way that allows the data consumer to view blank across multiple dimensions. In addition, these metrics are summarized in a way that allows the data consumer blank (which means to expose greater detail) on any particular aspect of the set of facts.
comparative metrics to drill down
242
The data to be analyzed in an OLAP environment are arranged in a way that enables visibility along any of the dimensions. Usually this is described as a blank, although the organization is intended to allow the analyst to fix some set of dimensions and then see aggregates associated with the other dimensional hierarchies
cube
243
The value of an OLAP tool is derived from the ability to quickly analyze the data from blank, and so OLAP tools are designed to pre-calculate the aggregations and store them directly in the OLAP databases.
multiple points of view
244
Data integration is not just limited to extracting data sets from internal sources and loading them into a data warehouse, but focuses on effectively facilitating the blank. Data integration goes beyond ETL, data replication, and change data capture, although these remain key components of the integration fabric.
delivery of information to the right places within the appropriate time
245
A basic concept for populating a data warehouse is that data sets from multiple sources are collected and then added to a data repository from which blank can source their input data.
analytical applications
246
What is the general steps of an ETL process. 7 steps
Get the data from the source location. Map the data from its original form into a data model that is suitable for manipulation at the staging area. Validate and clean the data. Apply any transformations to the data that are required before the data sets are loaded into the repository. Map the data from its staging area model to its loading model. Move the data set to the repository. Load the data into the warehouse.
247
The first part of the ETL process is to assemble the infrastructure needed for aggregating the raw data sets and for the application of the transformation and the subsequent preparation of the data to be forwarded to the data warehouse. This is typically a combination of a hardware platform and appropriate management software that we refer to as the blank.
staging area
248
What data is to be extracted essentially relies on what the BI clients expect to see ultimately factored into their analytical applications, and will have been identified as a result of the blank.
data requirements analysis process
249
How data should be extracted may depend on the scale of the project, the number (and disparity) of data sources, and how far into the implementation the developers are. Extraction can be as simple as a blank, the use of blank that connect to different originating sources, yet can be as complex as to require blank written in a proprietary programming language.
collection of simple SQL queries adapters specially designed programs
250
Blank can be used as a way to capture the metadata of a data set
Data profiling
251
Blank includes parsing strings representing integer and numeric values and transforming them into the proper representational form for the target machine, and converting physical value representations from one platform to another (EBCDIC to ASCII being the best example).
Data type conversion
252
Blank are rules we can uncover through the profiling process can be applied, along with directed actions that can be used to correct data that is known to be incorrect and where the corrections can be automated. This component also covers data-duplicate analysis and elimination and merge/purge.
Data cleansing
253
Blank includes exploiting the discovery of table and foreign keys for representing linkage between different tables, along with the generation of alternate (i.e., artificial) keys that are independent of any systemic business rules, mapping keys from one system to another, archiving data domains and codes that are mapped into those data domains, and maintaining the metadata (including full descriptions of code values and master key-lookup tables).
Integration
254
Blank in relation to the foreign key relationships exposed through profiling or as documented through interaction with subject matter experts, this component checks that any referential integrity constraints are not violated and highlights any nonunique (supposed) key fields and any detected orphan foreign keys.
Referential integrity checking
255
Blank are any transformations based on business rules, new calculations, string manipulations, and such that need to be applied as the data moves from source to target are applied during the transformation stage. For example, a new ‘‘revenue’’ field might be constructed and populated as a function of ‘‘unit price’’ and ‘‘quantity sold.’’
Derivations
256
Blank and blank are frequently data that is in normalized form when it comes from the source system needs to be broken out into a denormalized form when dimensions are created in repository data tables. Conversely, data sourced from join extractions may be denormalized and may need to be renormalized before it is forwarded to the warehouse.
Denormalization and renormalization.
257
Blank is any aggregate information that is used for populating summaries or any cube dimensions can be performed at the staging area.
Aggregation
258
Blank is as a matter of reference for integrity checking, it is always useful to calculate some auditing information, such as row counts, table counts, column counts, and other tests, to make sure that what you have is what you wanted. In addition, some data augmentation can be done to attach provenance information, including source, time and date of extraction, and time and date of transformation.
Audit information.
259
Blank is because nulls can appear in different forms, ranging from system nulls to explicit strings representing different kinds of nulls, it is useful to have some kind of null conversion that transforms different nulls from disparate systems.
Null conversion
260
261
The blank component of ETL is centered on moving the transformed data into the data warehouse.
Loading
262
The loading component of ETL has what two critical issues
Target dependencies Refresh volume and frequency
263
The blank compares pairs of records taken from different data sets to determine if they represent the same entity and are therefore candidates for merging
Merge/purge operation
264
The more data sets that are being blank, the greater the amount of work that needs to be done for the integration to complete
Integrated
265
Blank refers to the process of discovering patterns that lead to actionable knowledge from large data sets through one or more traditional data mining techniques, such as market basket analysis and clustering
Knowledge discovery
266
Data Mining Techniques is the process of mining data can be described as a blank
virtuous cycle.
267
Blank is the task of taking a large collection of entities and dividing that collection into smaller groups of entities that exhibit some similarity
Clustering
268
Blank is the process of organizing data into predefined classes
Classification
269
Blank is a process of assigning some continuously valued numeric value to an object.
Estimation
270
The subtle difference between blank and the previous two tasks is that blank is the attempt to classify objects according to some expected future behavior.
Prediction
271
Blank is a process of evaluating relationships or associations between data elements that demonstrate some kind of affinity between objects.
Affinity grouping
272
The last of the tasks is blank, which is the process of trying to characterize what has been discovered or trying to explain the results of the data mining process.
Description
273
An blank is a database that stores data in main memory, instead of or in addition to storage media
in-memory database
274
Main memory is much faster than storage media, such as flash memory and disk drives. Consequently, in-memory databases are appropriate for blank, which require fast execution of lengthy queries.
Analytic applications
275
In-memory databases are also appropriate for applications that rapidly insert blank, such as data collection from internet devices.
High volume of data
276
In-memory databases can now store blank of data, which is adequate for many databases.
Terabytes
277
Main memory is volatile and lost when power fails or the database process crashes, so in-memory data is periodically backed up on blank. In-memory databases may also record insert, update, and delete operations in a log file on blank, which can be used to reconstruct databases in the event of a crash.
Storage media
278
Blank is an extension to SQL Server supporting in-memory tables. In-memory tables offer the same transaction and recovery options as storage media tables.
SQL Server In-Memory OLTP
279
Blank creates in-memory copies of tables. The table source data remains on storage media, grouped by rows in blocks. In-memory copies are physically organized by column, rather than by row. The memory's columnar organization is optimal for analytic queries, which often summarize large volumes of data from one or two columns.
Oracle Database In-Memory
280
MySQL assigns a specific storage engine to individual tables. Both the blank and blank storage engines support in-memory tables. MEMORY does not support transactions or recovery in the event of a failure, and consequently is appropriate for temporary tables only. NDB Cluster supports transactions, recovery, and distributed data, and is recommended for persistent data.
MEMORY and MySQL NDB Cluster
281
Blank either run under different database systems or have incompatible schema.
Heterogeneous databases
282
Databases with blank might have inconsistent primary and foreign keys, similar tables with different designs, or similar columns with different names and data types
incompatible schema
283
The coordinating software layer is called blank since the software lies between application programs and database software.
Middleware
284
A blank is a directory of participating database objects, such as tables, columns, and indexes.
A global catalog
285
A blank decomposes a federated query into queries for each participating database.
A global query processor
286
A blank converts the decomposed queries to the appropriate syntax for each participating database.
Database wrapper
287
Some products support blank or SQL/MED, an extension of the SQL standard for federated databases. SQL/MED adds constructs such as nicknames and user mappings to SQL.
SQL/Management of External Data,
288
A blank is a federated database name for a participating database object, such as tables and columns.
Nickname
289
A blank associates a federated database user with a participating database user.
User mapping
290
Give the syntax for nickname and user mapping
CREATE NICKNAME Employee FOR DB2SERVER.HRdatabase.Emp2table; CREATE USER MAPPING FOR SamSnead SERVER DB2SERVER OPTIONS (REMOTE_AUTHID 'sam.snead@gmail.com', REMOTE_PASSWORD 'X!8sflHn');
291
Although a federated database does not provide a seamless view of data, a federated database is relatively easy to build and often the only practical way to blank from existing, incompatible databases.
Combine data
292
A blank is an analytic database of raw, unprocessed data copied from multiple data sources.
Data lake
293
Data lakes share some characteristics of blank and some characteristics of blank
data warehouses and federated databases
294
Like a blank, a data lake is a separate database designed for analytic queries and consisting of data extracted from multiple source systems.
Data warehouse
295
Like a blank, data in a data lake is not cleansed, integrated, or restructured. Data is stored in the original format and structure. Depending on the data source, data may be loaded continuously rather than at fixed intervals.
Federated database
296
Data lakes often contain blank of data, such as sensor data or website clicks. Data lakes also contain blank, such as images, video, and text documents, which consume megabytes or gigabytes per data item. As a result, data lakes usually require a large amount of storage and utilize inexpensive, but relatively slow, storage media.
Large volumes Unstructured data
297
Data lakes are more suitable for blank, who are trained to work with complex, unstructured data, than for business analysts.
Data scientists