Database Architecture Flashcards by Joshua Lane

Blank describes the components of a computer system and the relationships between components

Architecture

How well did you know this?

Not at all

Perfectly

MySQL components are organized in four blank

Layers

How well did you know this?

Not at all

Perfectly

Name the four layers of MySQL components

Tools

The query processor

The storage engine

The file system

How well did you know this?

Not at all

Perfectly

Blank interact directly with database users and administrators, and send queries to the query processor.

Tools

How well did you know this?

Not at all

Perfectly

The blank manages connections from multiple users and compiles queries into low-level instructions for the storage engine.

Query processor

How well did you know this?

Not at all

Perfectly

The blank, also called a storage manager, executes instructions, manages indexes, and interacts with the file system. Some storage engines support database transactions, described elsewhere in this material.

Storage engine

How well did you know this?

Not at all

Perfectly

The blank accesses data on storage media. The file system contains both system and user data, such as log files, tables, and indexes.

File system

How well did you know this?

Not at all

Perfectly

The Enterprise Edition includes MySQL Server and components for high-end commercial installations, such as what two things

Monitor
Audit

How well did you know this?

Not at all

Perfectly

Blank collects and displays information on CPU, memory, and index utilization, as well as queries and results. Database administrators use Enterprise Monitor to manage and tune large databases with many users.

Monitor

How well did you know this?

Not at all

Perfectly

Blank keeps track of all database changes. For each change, Audit tracks the time of change and who made the change. Audit supports government and business audit requirements for sensitive databases such as financial, medical, and defense.

Audit

How well did you know this?

Not at all

Perfectly

The blank layer includes Connectors and APIs, Workbench, and utility programs.

Tools

How well did you know this?

Not at all

Perfectly

Blank and blank are groups of application programming interfaces, linking applications to the query processor layer.

Connectors and APIs

How well did you know this?

Not at all

Perfectly

Blank are newer and developed by Oracle, which sponsors MySQL.

Connectors

How well did you know this?

Not at all

Perfectly

Blank are older and, with the exception of the C API, developed by other organizations.

APIs

How well did you know this?

Not at all

Perfectly

Most programmers use blank but system programmers may write specialized utilities in C with the C API.

Connectors

How well did you know this?

Not at all

Perfectly

Blank is a desktop application to manage and use databases. Blank is designed for both database administrators and users.

Workbench

How well did you know this?

Not at all

Perfectly

Blank programs include approximately 30 tools, grouped in five categories.

Utility

How well did you know this?

Not at all

Perfectly

Name the five categories of utility programs

installation, client, administrative, developer, and miscellaneous tools.

How well did you know this?

Not at all

Perfectly

Upgrade existing databases to a new MySQL release
Backup databases
Import data to databases
Inspect log files
Administer database servers

Are all things that the blank programs do.

utility

How well did you know this?

Not at all

Perfectly

The blank is a particularly important utility program, commonly used by both database administrators and users. The blank displays the mysql> prompt and processes individual SQL queries interactively.

Command-Line Client

How well did you know this?

Not at all

Perfectly

The query processor layer has two main functions. Name them.

manage connections and compile queries.

How well did you know this?

Not at all

Perfectly

A blank is a link between tools and the query processor.

connection

How well did you know this?

Not at all

Perfectly

Each connection specifies a blank, blank, blank, and blank. The connection manager creates connections and manages communications between tools and the query parser.

database name, server address, logon name, and password

How well did you know this?

Not at all

Perfectly

Blank generates a query execution plan.

Query compilation

How well did you know this?

Not at all

Perfectly

An blank is a detailed, low-level sequence of steps that specify exactly how to process a query.

execution plan

The blank generates an execution plan in two steps.

query processor

The query processor generates an execution plan in two steps. Name them.

The query parser checks each query for syntax errors and converts valid queries to an internal representation. The query optimizer reads the internal representation, generates alternative execution plans, estimates execution times, and selects the fastest plan. Estimates are based on heuristics and statistics about data, like the number of rows in each table and the number of values in each column. These statistics are maintained in the data dictionary, described below.

For optimal performance, the query processor layer has a blank that stores reusable information in main memory.

cache manager

If data used in repeated queries does not change, the cache manager may also blank query results.

save

The storage engine layer has two main functions. Name them.

transaction management and data access.

Blank includes the concurrency system, recovery system, and lock manager. These components ensure all transactions are atomic, consistent, isolated, and durable,

Transaction management

The data access component communicates with the file system and translates table, column, and index reads into blank.

block addresses

To reduce data access time, the blank retains data blocks from the file system for possible reuse.

buffer manager

The data blocks are retained in an area of main memory called the blank.

buffer

The buffer manager is similar to the blank of the query processor layer.

cache manager

The buffer manager has a fixed amount of blank.

memory

As the database processes queries and reads blocks, an blank determines which blocks to retain and which to discard.

algorithm

The InnoDB buffer manager uses a least recently used or blank, which tracks the time each block was last accessed and, when space is needed, discards 'stale' blocks. If data in a block has been updated, discarded blocks are first saved on disk.

LRU algorithm

The database administrator can assign a different blank to each table in a database. InnoDB is the default and most commonly used storage engine.

storage engine

The blank layer consists of data stored on storage media and organized in files.

file system

The file system contains three types of data for each database. Name them

user data, log files, and a data dictionary.

Blank data includes tables and indexes.

User

Blank files contain a detailed, sequential record of each change applied to a database.

Log

The recovery system uses blank to restore data in the event of a transaction, system, or storage media failure.

log files

A blank, also known as a data dictionary, is a directory of tables, columns, keys, indexes, and other objects in a relational database.

catalog

All relational databases contain a blank. Query processors and storage managers use this information when queries are processed and executed.

catalog

MySQL uses the term blank instead of catalog

'data dictionary'

Data dictionary tables cannot be accessed directly with SELECT, INSERT, UPDATE, and DELETE queries. However, the table contents can be accessed indirectly. The blank query is compiled as a SELECT query against dictionary tables.

SHOW

Multiple computers linked by a network are often grouped in layers, called blank, and arranged in a hierarchy.

tiers

Prior to 1990, most software ran in a blank, consisting of a personal or corporate computer connected directly to monitors. Although computers often communicated with each other, the dependencies between applications running on different computers were limited.

single-tier architecture

Since 1990, complex corporate and government applications have increasingly been implemented in a blank.

multi-tier architecture

In a multi-tier architecture, the top tier consists of computers interacting directly with blank

end-users

In a multi-tier architecture, the bottom tier consists of blank managing resources like databases and email.

servers

In a multi-tier architecture, one or more blank execute a variety of functions, such as user authorization, business logic, and communication with other computers.

middle tiers

Typically, blank run on a middle tier and implement business logic.

application programs

Since user interaction and data are managed in the top and bottom tiers, applications are easier to write and maintain in a blank.

multi-tier architecture

Blank is a multi-tier architecture consisting of web browsers and web servers communicating over the internet

Web architecture

In a web architecture, blank on the top tier, manage user interaction.

web browsers

In a web architecture, blank, on a middle tier, generate web pages for display on web browsers and transmit user requests to services running on lower tiers.

Web servers

In a web architecture, blank run application software, process user requests, and communicate with databases and other services.

Application servers

In a web architecture, blank, such as database and authentication, comprise the bottom tier.

Services

The term blank refers to either a software or hardware layer

tier

Prior to 2000, most commercial software was blank, or installed and run on customer computers.

on-premise

Since 2000, blank have increasingly replaced on-premise software. With blank, a vendor such as Amazon, Microsoft, or Google implements computer services on lower tiers of a web architecture. For a fee, blank are made available over the internet to customers.

cloud services

Cloud services fall into three broad categories. Name them.

Infrastructure-as-a-service, or IaaS Platform-as-a-service, or PaaS Software-as-a-service, or SaaS

Blank provides computer processing, memory, and storage media, as if the customer were renting a computer. Ex: Elastic Compute Cloud, or EC2, from Amazon Web Services offers it

Infrastructure-as-a-service, or IaaS,

Blank provides tools and services, such as databases, application development tools, and messaging services. Ex: Azure is Microsoft's cloud services environment, offering the SQL Database service.

Platform-as-a-service, or PaaS

Blank provides complete applications, usually through web browsers on customer machines. Ex: Salesforce offers sales management software, and Google offers document processing applications like Docs, Sheets, and Pages.

Software-as-a-service, or SaaS

Usually blank are offered on virtual machines.

cloud services

A blank, is a software layer that emulates a complete, independent computing environment.

virtual machine, or VM

Multiple virtual machines can run on blank, enabling cloud providers to support many customers on the same machine.

one computer

A cloud database is a database offered as a blank cloud service. Most databases are now available either on-premise or as a cloud service, but cloud database use is growing rapidly.

PaaS

A blank is a statement or proposition, from which another statement is inferred.

premise

Premises refers to buildings and land occupied by a business. Thus, blank is technically correct in the context of cloud software. However on-premise is easier to say and therefore commonly used.

on-premises

Cloud databases have a number of compelling benefits. Name them (5)

Administration. Security. Reliability. Elasticity. Capital cost.

Installing, managing, upgrading, and backing up database systems is time-consuming and complex. With cloud databases, consumers delegate blank to cloud providers.

administrative activities

Cloud providers are large companies with extensive resources. Cloud providers can invest heavily in blank providing better than most cloud customers.

security professionals and infrastructure

Cloud providers provide blank computing systems with little or no down-time.

redundant

Many organizations struggle with daily, monthly, or seasonal fluctuations in processing workload. By averaging fluctuations over many customers, cloud providers provide blank on demand.

flexible database resources

Cloud providers absorb all initial, or capital, costs of computers and facilities. Blank is recovered by cloud service fees.

Capital cost

Cloud databases raise blank. Companies entrust data to cloud providers, which may store data on servers located in countries with different privacy regulations.

data privacy questions

In the United States, data privacy is governed by blank in specific areas, such as medical and financial. As a result, a European company may avoid servers located in the United States.

limited regulations

Data privacy is a concern primarily for blank, such as financial and medical applications. For organizations that do not manage blank, cloud databases offer convincing benefits and have been widely adopted.

sensitive data

A blank consists of multiple processors managed by a single operating system instance.

parallel computer

Parallel computers achieve faster processing speeds by processing multiple instructions blank.

concurrently

Parallel computers fall into three categories. Name them.

A shared memory computer A shared storage computer A shared nothing computer

In a blank, processors share the same memory and storage media.

shared memory computer

In a blank, processors share storage media only. Each processor has private memory.

shared storage computer

In a blank, processors share neither memory nor storage media.

shared nothing computer

Blank memory is optimal for parallel processing against a common data set in a single memory space.

Shared

Blank and blank scale to more processors, since processors do not contend for the same memory.

Shared storage and shared nothing

Multiple computers can communicate via a blank or blank network

local or wide area

A blank consists of cables extending over a small area, typically within one facility.

local area network

Local area networks usually use the blank protocol.

Ethernet communication

A blank spans multiple facilities in different geographic locations, separated by many miles.

wide area network

Wide area networks may communicate via cables, satellite, or telephone lines, often using blank protocols.

internet communication

A blank is one of a group of computers connected by either a local or wide area network.

node

A blank is a group of nodes connected by a local area network, managed by separate operating system instances, and coordinated by specialized cluster management software.

cluster

A cluster is similar to a blank. Both can execute program instructions in parallel on multiple processors. Both can share storage or share nothing. Computers in a cluster cannot share memory, however, since local area networks are too slow to support memory access.

parallel computer

Blank can often be decomposed into parts that run concurrently and execute faster on parallel computers or clusters.

Queries

Parallel and distributed databases exploit blank for faster query execution

multiple processors

A blank runs on a parallel computer or cluster.

parallel database

A blank runs on multiple computers connected by a wide area network.

distributed database

Both parallel and distributed databases present a blank view of data to database users and programmers. The physical location of data on storage media is visible to database administrators only.

unified

In a parallel database, data location has blank on query processing since local area networks are relatively fast and reliable.

limited impact

In a distributed database, data location is blank since wide area networks are relatively slow and unreliable. Wide area networks create technical challenges with distributed transaction

significant

Despite technical challenges, blank offer compelling benefits for databases with users in many locations

distributed databases

A blank updates data on multiple nodes of a distributed database.

distributed transaction

In a distributed transaction, either blank or blank must be successfully updated.

all nodes or no nodes

Databases commonly implement distributed transactions with a technique called blank.

two-phase commit

The two-phase commit has four steps. Name them.

In phase 1, a central transaction coordinator notifies all participating nodes of the required updates. Participating nodes receive the notification, store the update in a local log, and send a confirmation message to the transaction coordinator. Participating nodes do not yet commit the update to the database. Phase 2 begins when the transaction coordinator receives confirmation from all participating nodes. The transaction coordinator now instructs all nodes to commit. Participating nodes receive the commit message, commit the update to the database, and notify the transaction coordinator of success.

The two-phase commit must account for what two failure scenarios:

In step 2, if the transaction coordinator does not receive confirmation from all nodes within a fixed time period, the transaction coordinator instructs participating nodes to roll back the update. In step 4, if a node becomes unavailable and fails to notify the transaction coordinator of success, the transaction coordinator resends the commit message until the node responds.

The two-phase commit ensures updates are applied to either all nodes or no nodes. In the first failure scenario, the transaction blank, and no updates are applied. In the second failure scenario, the transaction blank, and all updates are applied.

rolls back commits

Two-phase commit and two-phase locking are different procedures. Two-phase commit blank and blank at the end of distributed transactions only. Two-phase locking, governs blank and blank of locks during either local or distributed transactions.

governs commit and rollback acquisition and release

A blank updates data on a single node of a distributed database.

local transaction

Distributed transactions are relatively blank, as multiple nodes must respond before the transaction commits. As a faster alternative, multiple nodes can be updated blank with local transactions

slow independently

In a local transaction, the blank notifies participating nodes of required updates. Nodes blank and confirm with the transaction coordinator. If a node is unavailable, the transaction coordinator blank until confirmation is received.

transaction coordinator commit immediately repeats the update message

Local transactions create blank, as nodes are updated at different times.

temporary inconsistency

The choice of local or distributed transactions depends on blank and blank requirements

performance and consistency

In many financial databases, all nodes must be consistent at all times. Blank are necessary.

Distributed transactions

In databases that log website activity, temporary inconsistency may be acceptable. Blank might be used to process updates quickly and support a high volume of web clicks.

Local transactions

Updates in a distributed transaction are blank, since the updates occur at the same time from the perspective of the database user.

synchronous

Updates in separate local transactions are blank.

asynchronous

Databases that use local rather than distributed transactions are called blank.

eventually consistent

A blank database conforms to all rules at all times.

consistent

In a distributed database, a blank may govern data on multiple nodes.

rule

In an blank database, 'live' nodes must respond to queries at all times.

available

A 'dead' node may be blank, but 'live' nodes must respond regardless of the state of other nodes.

unresponsive

A blank forms when a network error prevents nodes from communicating.

network partition

A blank occasionally experiences network partitions since nodes are connected by wide area networks that occasionally fail.

distributed database

A blank database continues to function when a network partition occurs.

partition-tolerant

The blank states that a distributed database cannot simultaneously be Consistent, Available, and Partition-tolerant.

CAP theorem

A distributed database can guarantee blank, but blank, of the CAP properties.

any two, but not all three

As a practical matter, most distributed databases must always function and are therefore partition-tolerant. Consequently, most distributed databases guarantee either blank or blank, but not both.

consistency or availability

The blank between consistency and availability is relative, not absolute.

tradeoff

Since wide area networks are relatively slow, the time to propagate an update from one node to another is significant. If a query accesses updated data before all nodes are updated, the database must either return blank or blank.

inconsistent data or not respond immediately

Rather than choose between consistency and availability, a database must choose blank to provide a consistent response.

how long to wait

Blank commonly means the percentage of time a database is responsive to users and programs. In the context of the CAP theorem, however, blank is the response of individual nodes rather than the entire database system.

Availability

In the context of networks, a blank is a subset of nodes. In the context of data storage, a blank is a subset of table data.

partition

A blank is a copy of an entire database, a table, or a subset of table data.

replica

A blank maintains two or more replicas on separate storage devices

replicated database

Data can be replicated in any database with multiple storage devices, such as blank and blank.

parallel and distributed databases

Replicated databases have several major advantages. Name three.

High availability. Fast concurrent reads. Local reads.

If one storage device fails, the database blank to a replica on another storage device.

routes queries

In general, if a database maintains N replicas, the database can survive simultaneous failure of blank

N-1 storage devices.

Concurrent queries can read separate replicas without blank with each other. One large query can be blank into smaller queries that read separate replicas in parallel.

interfering decomposed

In a distributed database, reads can be executed blank, eliminating network delays and outages.

locally

Replicated databases have one major disadvantage. Name it

Slow or inconsistent updates

Updates must be applied to all replicas on blank. If all replicas on different nodes are updated with a blank, the update is relatively slow. If replicas on different nodes are updated with blank, updates are relatively fast but replicas are temporarily inconsistent.

multiple storage devices distributed transaction local transactions

Blank simplifies some database administration activities but makes others more complex.

Replication

One replica can be blank while transactions execute against other replicas.

backed up

Blank can be restricted to one replica, accessible only to trusted database users. Updates are propagated to blank, accessible to a broader user group.

Updates read-only replicas

In a replica database, database administrators must determine how to blank updates across replicas.

propagate

Blank is commonly used in parallel and distributed databases, particularly when reads are frequent, updates are infrequent, and temporary inconsistency is acceptable.

Replication

Updating replicated data in a database running on a single node is blank.

straightforward

Some storage devices, called blank, manage replicas internally, without database intervention. Alternatively, the database can update all replicas within a single local transaction. Either way, synchronizing replicas does not require special database capabilities.

storage arrays

Updating all replicas in a blank guarantees consistency but is relatively slow and fails when any replica is unavailable.

distributed transaction

The blank technique designates one node as primary. All updates are first applied to the primary node in local transactions. Secondary nodes are updated after the primary node commits, with independent local transactions. If the primary node fails, the database automatically designates a new primary node to ensure continued availability.

primary/secondary

The blank technique applies updates to any node in a group. Prior to committing, a node broadcasts transaction information to other nodes, which look for conflicts with concurrent transactions. If any node detects a conflict, an algorithm determines which transaction commits and which rolls back. This algorithm may be simple, such as the transaction that commits first wins, or complex. If a network partition occurs and nodes cannot communicate, processing is temporarily suspended.

group replication

Blank update technique is eventually consistent, not partition-tolerant.

Group replication

Blank update technique is always consistent, not partition-tolerant

Distributed transaction

Blank update is eventually consistent, partition-tolerant.

Primary/Secondary

A blank is a directory of information describing database objects such as tables, columns, keys, and indexes

Catalog

Blank is necessary to process queries and access data. Each node in a distributed database can process queries and therefore requires access to the catalog.

Catalog information

In a distributed database, the catalog can be structured in what two ways

In a central catalog In a replicated catalog

In a blank, the entire catalog resides on a single node. Storing the catalog on a single node is relatively easy to manage. However, query processing at remote nodes must access the catalog via a wide area network, which may be slow or unreliable. Furthermore, query processing at all nodes interact with the blank, which may become a bottleneck.

central catalog

In a blank a copy of the catalog resides on each node. Most queries are fast and reliable since all catalog data is available locally. However, statements that update the catalog, such as CREATE, ALTER, and DROP, must update all replicas. Updating replicas generates increased network traffic and, if executed in a distributed transaction, fails when any replica is unavailable.

replicated catalog,

Since catalog updates are infrequent compared to other database queries, many distributed databases use a blank.

replicated catalog

To improve performance of catalog updates, many databases use a variation of the blank

primary/secondary technique

Updates are first applied to the replica on the blank containing the affected data object and then propagated to other replicas.

node

When catalog replicas are updated with blank, some replicas are momentarily out of date. If a query cannot be processed due to an out-of-date replica, the database might display an error and advise the user to resubmit the query. This rarely occurs, however, since catalog updates are infrequent and the delay between replica updates is short.

local transactions

Organizations use blank to conduct daily business functions.

operational data

Organizations use blank to understand, manage, and plan the business.

analytic data

Analytic data is sometimes called blank or blank.

reporting data or decision support data

Operational and analytic data differ in what four ways

Volatility. Detail. Scope. History.

Blank changes in real time as business functions are executed. Blank is updated at fixed intervals, often daily or weekly, so that reports and summaries always refer to a known time.

Operational data Analytic data

Most blank is detailed, reflecting individual transactions. Blank is often summarized by time period, business unit, geography, and other business dimensions.

operational data Analytic data

Most blank are designed for a specific business function. Consequently, operational databases supporting different business functions are often incompatible. Blank combine data from many business functions in an integrated, enterprise-wide view of data, with standard formats, data types, and keys across all tables.

operational databases Analytic databases

Many blank are concerned primarily with current data. Blank often track trends over time and therefore usually contain current and historic data. Ex: Operational data may include active employees only. Analytic data may include past employees and illustrate changes in total employment by month.

operational databases Analytic databases

Blank and blank are often maintained in separate databases with different designs.

operational and analytic data

Operational data is blank and blank

Volatile and detailed

Analytic detail is blank, blank, blank and blank

Detailed, Summary, Enterprise-wide, and Historical

In a blank database, a database stores each line item of an order, including item name, product code, cost, and quantity.

Detailed

A blank database, the database records the current project assignments for each employee. Updates are made right away when assignments change. The database does not retain prior assignments.

Volatile

In a blank database, each year, employees receive raises for performance and increased cost of living. The database retains salary data for every year employees have worked at the company.

historic

in a blank database, the database stores total count of employees working at each corporate facility.

summary

In a blank database, A report lists employee names alongside employee office numbers. Employee names are extracted from a human resources database. Employee office assignments are stored in a separate facilities database.

Enterprise-wide

Storing operational and analytic data in the same database creates what three problems:

Database design. Interference. Reference time.

Since blank is volatile, operational databases are typically optimized for updates, with most tables in third normal form. Third normal form minimizes redundancy but generates many tables and is not optimal for blank. Analytic queries often combine columns from many third normal form tables, resulting in complex joins that are difficult to write and slow to run.

operational data analytic queries

Blank often summarize large volumes of data. When executed against an operational database, analytic queries compete with blank, degrade query response time, and interfere with business operations.

Analytic queries operational queries

Blank usually reference a specific point in time, such as sales totals as of midnight on the last day of the month. Since operational data is volatile, results depend on the blank a query is submitted. Analytic queries against operational databases thus have an uncertain reference time and may be misleading.

Analytic queries precise time

A blank is a separate database optimized for analytics rather than operations.

data warehouse

A data warehouse consists of data extracted from blank and restructured to support analytic queries.

operational databases

Data is usually extracted periodically, at a fixed time, so that data in the warehouse has a known blank. Data is extracted during times of low database use to minimize impact on operational queries.

reference time

Data warehouses integrate data from blank for use by the entire organization.

multiple business functions

A blank is a data warehouse designed for a specific business area, such as sales, human resources, or product development.

data mart

Since data marts have blank than a data warehouse, data marts are easier to build and maintain. A data mart can be derived directly from operational databases or indirectly from a data warehouse.

smaller scope

Data warehouses are refreshed periodically with a five-step process. Name the steps

Extract data from operational databases Cleanse data Integrate data into a uniform structure. Restructure data Load data to the data warehouse.

Extract data from operational databases into a temporary database, called a blank. Since the data warehouse already contains data from the prior period, only data that has changed since the prior period is extracted.

'staging area'

The five-step process is commonly referred to as the blank, or ETL, process.

extract-transform-load

Since the ETL process is time-consuming and difficult to automate, many organizations use special software products, called blank, to minimize programming.

ETL tools

Blank from Informatica is a high-end ETL product intended to manage large extracts for complex organizations.

PowerCenter

Blank from Microsoft is designed for SQL Server data warehouses.

SQL Server Integration Services

Blank supports many data sources but is optimized to load Oracle database products.

Oracle Data Integrator

Reading data from operational systems and writing the data to a temporary database is called blank.

extraction

Correcting errors in operational data and converting to a standard format is called blank.

'cleansing'

Data in different operational systems often have incompatible or missing keys. Creating uniform primary and foreign keys is one example of blank.

data integration

Converting data from a design optimized for operations to a design optimized for analytics is called blank.

restructuring

Data warehouses are periodically blanked with cleansed, integrated, and restructured data from the temporary database.

loaded

To simplify analytic queries, data warehouses commonly use a blank.

dimensional design

A dimensional design, also called a blank, consists of fact and dimension tables:

star schema

A blank contains numeric data used to measure business performance, such as sales revenue or number of employees.

fact table

Each row in a fact table consists of numeric fact columns and foreign keys that reference blank.

dimension tables

A blank contains textual data that describes the fact data, such as product line, organizational unit, and geographical region.

dimension table

The blank of a fact table is the composite of all foreign keys referencing dimension tables.

primary key

The primary key of a blank is a small, meaningless integer

dimension table

Most data warehouses have many blanks

fact tables

A blank is a sequence of columns in which each column has a one-many relationship to the next column.

dimension hierarchy

A dimension table usually contains one or more column blank.

hierarchies

In some cases, several columns are at the same blank of a hierarchy

level

Blank usually summarize data at one level of one hierarchy from each dimension.

Analytic queries

For fast execution, frequently used summary data may be computed in advance and stored in a blank.

data warehouse

Since data warehouses track historical data, dimensional designs usually have blank and blank dimension tables.

date and time

Each row of the blank dimension table corresponds to a day

date

Each row of the blank dimension table corresponds to a minute of the day.

time

The time dimension contains blank rows (24 hours × 60 minutes per hour).

1,440

Fact tables contain blank referencing date, time, or both dimensions, to establish the time of a fact.

foreign keys

The blank and blank dimensions provide an elegant way to track historical data.

date and time

Foreign keys StartDateID and EndDateID are added to the fact table and indicate the blank of each row. Current rows have an end date in the distant future, such as December 31, 2999.

effective dates

f a fact changes to a new value on date X, the end date of the current row is set to X and a new row is inserted with what values

The fact column is the new value. StartDateID refers to date X. EndDateID refers to December 31, 2999. Other columns are identical to the prior row.

Adding start and end foreign keys to the fact table is called blank. Historical data can be tracked with other designs, but type 2 design is simple, effective, and commonly used.

type 2 design for slowly changing dimensions

A straightforward approach for considering the value of a BI program suggests at least what four dimensions of value?

Financial value Productivity value Trust value Risk value

A blank is the primary source of information that feeds the analytical processing within an organization. There are a number of different analytic applications that are driven by business needs, yet most, if not all of these applications are driven by the data that has been migrated into a data warehouse.

data warehouse

A data warehouse is a centralized repository of blank.

information

A data warehouse is organized around the relevant blank important to the organization.

subject areas

A data warehouse provides a platform for different blank (both human and automated) to submit queries about enterprise information.

consumers

A data warehouse is used for analysis and not for blank. n The data in a data warehouse is nonvolatile.

transaction processing

A data warehouse is the target location for blank from multiple sources, both internal and external to an enterprise.

integrating data

A blank is a subject-oriented data repository, similar in structure to the enterprise data warehouse, but it holds the data necessary for the decision support and BI needs of a specific department or group within the organization.

data mart

Blank is different from the typical operational or transaction processing systems. There are many proposed definitions of OLAP, most of which describe what OLAP is used for. The most frequently used terms are ‘‘multidimensional’’ and ‘‘slice-and-dice.’’

Online analytical processing

Online analytical processing tools provide a means for presenting data sourced from a data warehouse or data mart in a way that allows the data consumer to view blank across multiple dimensions. In addition, these metrics are summarized in a way that allows the data consumer blank (which means to expose greater detail) on any particular aspect of the set of facts.

comparative metrics to drill down

The data to be analyzed in an OLAP environment are arranged in a way that enables visibility along any of the dimensions. Usually this is described as a blank, although the organization is intended to allow the analyst to fix some set of dimensions and then see aggregates associated with the other dimensional hierarchies

cube

The value of an OLAP tool is derived from the ability to quickly analyze the data from blank, and so OLAP tools are designed to pre-calculate the aggregations and store them directly in the OLAP databases.

multiple points of view

Data integration is not just limited to extracting data sets from internal sources and loading them into a data warehouse, but focuses on effectively facilitating the blank. Data integration goes beyond ETL, data replication, and change data capture, although these remain key components of the integration fabric.

delivery of information to the right places within the appropriate time

A basic concept for populating a data warehouse is that data sets from multiple sources are collected and then added to a data repository from which blank can source their input data.

analytical applications

What is the general steps of an ETL process. 7 steps

Get the data from the source location. Map the data from its original form into a data model that is suitable for manipulation at the staging area. Validate and clean the data. Apply any transformations to the data that are required before the data sets are loaded into the repository. Map the data from its staging area model to its loading model. Move the data set to the repository. Load the data into the warehouse.

The first part of the ETL process is to assemble the infrastructure needed for aggregating the raw data sets and for the application of the transformation and the subsequent preparation of the data to be forwarded to the data warehouse. This is typically a combination of a hardware platform and appropriate management software that we refer to as the blank.

staging area

What data is to be extracted essentially relies on what the BI clients expect to see ultimately factored into their analytical applications, and will have been identified as a result of the blank.

data requirements analysis process

How data should be extracted may depend on the scale of the project, the number (and disparity) of data sources, and how far into the implementation the developers are. Extraction can be as simple as a blank, the use of blank that connect to different originating sources, yet can be as complex as to require blank written in a proprietary programming language.

collection of simple SQL queries adapters specially designed programs

Blank can be used as a way to capture the metadata of a data set

Data profiling

Blank includes parsing strings representing integer and numeric values and transforming them into the proper representational form for the target machine, and converting physical value representations from one platform to another (EBCDIC to ASCII being the best example).

Data type conversion

Blank are rules we can uncover through the profiling process can be applied, along with directed actions that can be used to correct data that is known to be incorrect and where the corrections can be automated. This component also covers data-duplicate analysis and elimination and merge/purge.

Data cleansing

Blank includes exploiting the discovery of table and foreign keys for representing linkage between different tables, along with the generation of alternate (i.e., artificial) keys that are independent of any systemic business rules, mapping keys from one system to another, archiving data domains and codes that are mapped into those data domains, and maintaining the metadata (including full descriptions of code values and master key-lookup tables).

Integration

Blank in relation to the foreign key relationships exposed through profiling or as documented through interaction with subject matter experts, this component checks that any referential integrity constraints are not violated and highlights any nonunique (supposed) key fields and any detected orphan foreign keys.

Referential integrity checking

Blank are any transformations based on business rules, new calculations, string manipulations, and such that need to be applied as the data moves from source to target are applied during the transformation stage. For example, a new ‘‘revenue’’ field might be constructed and populated as a function of ‘‘unit price’’ and ‘‘quantity sold.’’

Derivations

Blank and blank are frequently data that is in normalized form when it comes from the source system needs to be broken out into a denormalized form when dimensions are created in repository data tables. Conversely, data sourced from join extractions may be denormalized and may need to be renormalized before it is forwarded to the warehouse.

Denormalization and renormalization.

Blank is any aggregate information that is used for populating summaries or any cube dimensions can be performed at the staging area.

Aggregation

Blank is as a matter of reference for integrity checking, it is always useful to calculate some auditing information, such as row counts, table counts, column counts, and other tests, to make sure that what you have is what you wanted. In addition, some data augmentation can be done to attach provenance information, including source, time and date of extraction, and time and date of transformation.

Audit information.

Blank is because nulls can appear in different forms, ranging from system nulls to explicit strings representing different kinds of nulls, it is useful to have some kind of null conversion that transforms different nulls from disparate systems.

Null conversion

The blank component of ETL is centered on moving the transformed data into the data warehouse.

The loading component of ETL has what two critical issues

Target dependencies Refresh volume and frequency

The blank compares pairs of records taken from different data sets to determine if they represent the same entity and are therefore candidates for merging

Merge/purge operation

The more data sets that are being blank, the greater the amount of work that needs to be done for the integration to complete

Integrated

Blank refers to the process of discovering patterns that lead to actionable knowledge from large data sets through one or more traditional data mining techniques, such as market basket analysis and clustering

Knowledge discovery

Data Mining Techniques is the process of mining data can be described as a blank

virtuous cycle.

Blank is the task of taking a large collection of entities and dividing that collection into smaller groups of entities that exhibit some similarity

Clustering

Blank is the process of organizing data into predefined classes

Classification

Blank is a process of assigning some continuously valued numeric value to an object.

Estimation

The subtle difference between blank and the previous two tasks is that blank is the attempt to classify objects according to some expected future behavior.

Prediction

Blank is a process of evaluating relationships or associations between data elements that demonstrate some kind of affinity between objects.

Affinity grouping

The last of the tasks is blank, which is the process of trying to characterize what has been discovered or trying to explain the results of the data mining process.

Description

An blank is a database that stores data in main memory, instead of or in addition to storage media

in-memory database

Main memory is much faster than storage media, such as flash memory and disk drives. Consequently, in-memory databases are appropriate for blank, which require fast execution of lengthy queries.

Analytic applications

In-memory databases are also appropriate for applications that rapidly insert blank, such as data collection from internet devices.

High volume of data

In-memory databases can now store blank of data, which is adequate for many databases.

Terabytes

Main memory is volatile and lost when power fails or the database process crashes, so in-memory data is periodically backed up on blank. In-memory databases may also record insert, update, and delete operations in a log file on blank, which can be used to reconstruct databases in the event of a crash.

Storage media

Blank is an extension to SQL Server supporting in-memory tables. In-memory tables offer the same transaction and recovery options as storage media tables.

SQL Server In-Memory OLTP

Blank creates in-memory copies of tables. The table source data remains on storage media, grouped by rows in blocks. In-memory copies are physically organized by column, rather than by row. The memory's columnar organization is optimal for analytic queries, which often summarize large volumes of data from one or two columns.

Oracle Database In-Memory

MySQL assigns a specific storage engine to individual tables. Both the blank and blank storage engines support in-memory tables. MEMORY does not support transactions or recovery in the event of a failure, and consequently is appropriate for temporary tables only. NDB Cluster supports transactions, recovery, and distributed data, and is recommended for persistent data.

MEMORY and MySQL NDB Cluster

Blank either run under different database systems or have incompatible schema.

Heterogeneous databases

Databases with blank might have inconsistent primary and foreign keys, similar tables with different designs, or similar columns with different names and data types

incompatible schema

The coordinating software layer is called blank since the software lies between application programs and database software.

Middleware

A blank is a directory of participating database objects, such as tables, columns, and indexes.

A global catalog

A blank decomposes a federated query into queries for each participating database.

A global query processor

A blank converts the decomposed queries to the appropriate syntax for each participating database.

Database wrapper

Some products support blank or SQL/MED, an extension of the SQL standard for federated databases. SQL/MED adds constructs such as nicknames and user mappings to SQL.

SQL/Management of External Data,

A blank is a federated database name for a participating database object, such as tables and columns.

Nickname

A blank associates a federated database user with a participating database user.

User mapping

Give the syntax for nickname and user mapping

CREATE NICKNAME Employee FOR DB2SERVER.HRdatabase.Emp2table; CREATE USER MAPPING FOR SamSnead SERVER DB2SERVER OPTIONS (REMOTE_AUTHID 'sam.snead@gmail.com', REMOTE_PASSWORD 'X!8sflHn');

Although a federated database does not provide a seamless view of data, a federated database is relatively easy to build and often the only practical way to blank from existing, incompatible databases.

Combine data

A blank is an analytic database of raw, unprocessed data copied from multiple data sources.

Data lake

Data lakes share some characteristics of blank and some characteristics of blank

data warehouses and federated databases

Like a blank, a data lake is a separate database designed for analytic queries and consisting of data extracted from multiple source systems.

Data warehouse

Like a blank, data in a data lake is not cleansed, integrated, or restructured. Data is stored in the original format and structure. Depending on the data source, data may be loaded continuously rather than at fixed intervals.

Federated database

Data lakes often contain blank of data, such as sensor data or website clicks. Data lakes also contain blank, such as images, video, and text documents, which consume megabytes or gigabytes per data item. As a result, data lakes usually require a large amount of storage and utilize inexpensive, but relatively slow, storage media.

Large volumes Unstructured data

Data lakes are more suitable for blank, who are trained to work with complex, unstructured data, than for business analysts.

Data scientists