CH1 Flashcards

Question

transient data

Answer 1

With transient data, changes to existing records are written over previous records, thus destroying the previous data content

Answer 2

Periodic data are never physically altered or deleted once they have been added to the store. Create a second column called the Action column: C = current, U = update (you are updating values, so you are storing the older rows as history), D = deleted data.
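
The contrast between transient and periodic data can be sketched in a few lines of Python. This is a minimal illustration, not a real storage engine: the record layout, the `action` field, and the function names are assumptions made for the example, and the action codes simply follow the card above.

```python
# Transient store: one slot per key, so an update overwrites the old values.
transient = {}

def transient_write(key, values):
    transient[key] = values  # previous content is lost

# Periodic store: append-only; every change adds a row tagged with an action code.
periodic = []

def periodic_write(key, values, action):
    # action codes as on the card: "C", "U" (update), "D" (deleted)
    periodic.append({"key": key, "values": values, "action": action})

transient_write(1, {"status": "open"})
transient_write(1, {"status": "closed"})   # "open" is gone

periodic_write(1, {"status": "open"}, "C")
periodic_write(1, {"status": "closed"}, "U")  # older row is still stored
periodic_write(1, None, "D")                  # deletion is recorded, not performed

history = [row["action"] for row in periodic if row["key"] == 1]
```

After running this, `transient` holds only the latest values for key 1, while `periodic` still holds all three rows.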

Answer 3

Relational databases consist of tables. Tables are flat and one-dimensional. Cross-tabs can be 2-dimensional. How to represent multi-dimensional tables? That is the job of data warehousing/OLAP.

Answer 4

Is a preferred technique for presenting analytic data. Contains the same information as a normalized model. Delivers data that’s understandable to the user. Delivers fast query performance.

Answer 5

Star schemas: dimensional modeling implemented in a relational DBMS (ROLAP). OLAP (online analytical processing): dimensional modeling in a multidimensional DB environment. OLAP cubes deliver superior query performance because of pre-calculations, indexing strategies, and other optimizations.

Answer 6

Relational OLAP model. Star schema: dimensional modeling implemented in a relational DBMS. Similar to an RDBMS.

Answer 7

Multiple dimension tables joined to one fact table in a relational DBMS.

Answer 8

Dimensions are the subjects: who, what, when, why.

Answer 9

collect data about subjects(dimensions)

Answer 10

Who, what, where, when, and how of a measurement. Example: total amount of sales, qty, etc.

Answer 11

characterize in fact or dimension tables.

Answer 12

Measurement rows in a fact table must be at the same grain. When creating the table, decide at what level of detail you want to save the data: weekly, monthly, or individual transactions. The rows have to match.

Answer 13

Dimension tables often have many columns (attributes). Each dimension table contains data for one dimension. Dimension tables often represent hierarchical relationships: products roll up into brands and then into categories. Each dimension is defined by a single primary key (PK). The PK serves as the basis of referential integrity with the given fact table to which it is joined.

Answer 14

The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques. OLAP provides advanced query capabilities to the warehouse that standard SQL cannot. Complex queries that need to aggregate data can take hours to run. End users cannot be expected to issue SQL statements.

Answer 15

A multidimensional structure consisting of “Data Cubes”

Answer 16

Sides of a cube

Answer 17

Facts in a fact table

Answer 18

Projection of the cube

Answer 19

A cube need not be in the shape of a cube at all. It can have as many dimensions as necessary. Each dimension can have as many “members” as necessary.

Answer 20

measures from fact table

Answer 21

The Fact Table is the table that provides the data for the elements of the Cube. There can be only one Fact Table per Cube!

Answer 22

A projection of the cube. An aggregation collapses dimensions (summation). Similar to the SQL GROUP BY clause.

Answer 23

sum, count, min, max
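
To make the aggregation cards concrete, here is a minimal Python sketch of an aggregation as a projection that collapses dimensions, much like GROUP BY with SUM. The fact rows, the dimension names, and the `aggregate` helper are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, month, sales)
facts = [
    ("pen", "east", "jan", 10),
    ("pen", "west", "jan", 20),
    ("ink", "east", "jan", 5),
    ("pen", "east", "feb", 15),
]

DIMENSIONS = ("product", "region", "month")

def aggregate(rows, keep):
    """Sum sales over every dimension not listed in `keep` (like GROUP BY keep)."""
    totals = defaultdict(int)
    for row in rows:
        dims = dict(zip(DIMENSIONS, row[:3]))
        totals[tuple(dims[name] for name in keep)] += row[3]
    return dict(totals)

# Collapse region and month: one total per product.
by_product = aggregate(facts, keep=("product",))
# Collapse region only: one total per (product, month).
by_product_month = aggregate(facts, keep=("product", "month"))
```

Swapping the `+=` for a running count, minimum, or maximum gives the other aggregate functions listed on the card.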

Answer 24

Come up with a 2-D view of the data by filtering (fixing) a dimension. Like a WHERE clause: slice based on a condition. Example: slice between males and females to compare.

Answer 25

come up with a small cube (sub-cube) by selecting a subset of all dimensions
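
Slice and dice, as described on the two cards above, can be sketched on a tiny cube stored as a Python dictionary. The cube contents and the helper names `slice_cube` and `dice_cube` are assumptions for this example, not part of any OLAP product.

```python
# Hypothetical cube cells keyed by (product, region, gender) -> units sold.
cube = {
    ("pen", "east", "M"): 4,
    ("pen", "east", "F"): 6,
    ("pen", "west", "M"): 3,
    ("ink", "east", "F"): 2,
    ("ink", "west", "M"): 1,
}

def slice_cube(cells, axis, value):
    """Fix one dimension to a single value and drop it, leaving a 2-D view."""
    return {
        key[:axis] + key[axis + 1:]: units
        for key, units in cells.items()
        if key[axis] == value
    }

def dice_cube(cells, allowed):
    """Keep cells whose member on each listed axis is in the allowed set."""
    return {
        key: units
        for key, units in cells.items()
        if all(key[axis] in members for axis, members in allowed.items())
    }

female_view = slice_cube(cube, axis=2, value="F")       # 2-D: (product, region)
sub_cube = dice_cube(cube, {0: {"pen"}, 1: {"east"}})   # still 3-D, but smaller
```

The slice loses a dimension because it is fixed to one value; the dice keeps all three dimensions but restricts their members.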

Answer 26

going from summary to more detailed views.

Answer 27

going from detailed views to a summary view.
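
Drill-down and roll-up are the same summarization run at different levels of a hierarchy. The sketch below assumes a hypothetical year, month, day hierarchy and made-up sales rows; `summarize` is an illustrative helper, not a standard function.

```python
from collections import defaultdict

# Hypothetical daily sales facts: (year, month, day, amount)
sales = [
    (2024, 1, 5, 100),
    (2024, 1, 20, 50),
    (2024, 2, 3, 70),
    (2025, 1, 9, 30),
]

def summarize(rows, depth):
    """Sum amounts keyed by the first `depth` levels of the date hierarchy."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[:depth]] += row[3]
    return dict(totals)

by_year = summarize(sales, depth=1)    # rolled up: one total per year
by_month = summarize(sales, depth=2)   # drilled down: one total per (year, month)
```

Going from `by_year` to `by_month` is a drill-down; going the other way is a roll-up.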

Answer 28

to rotate the cube across a dimension to see various faces.
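
For a 2-D view, rotating amounts to swapping the row and column dimensions. A minimal sketch, assuming the view is held as a nested dictionary (the data and the `pivot` helper are hypothetical):

```python
# Hypothetical 2-D view: rows are products, columns are regions.
view = {
    "pen": {"east": 10, "west": 20},
    "ink": {"east": 5, "west": 0},
}

def pivot(table):
    """Swap the row and column dimensions of a nested-dict cross-tab."""
    rotated = {}
    for row_key, columns in table.items():
        for col_key, value in columns.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated

by_region = pivot(view)  # rows are now regions, columns are products
```

Pivoting twice returns the original view, since no cell values change, only their orientation.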

Answer 29

Information. This asset is almost always used for two purposes: operational record keeping and analytical decision making. Simply speaking, the operational systems are where you put the data in, and the DW/BI system is where you get the data out.

Answer 30

The operational systems are optimized to process transactions quickly. These systems almost always deal with one transaction record at a time. They predictably perform the same operational tasks over and over, executing the organization’s business processes. Given this execution focus, operational systems typically do not maintain history, but rather update data to reflect the most current state.

Answer 31

Users of a DW/BI system, on the other hand, watch the wheels of the organization turn to evaluate performance. They count the new orders and compare them with last week’s orders, and ask why the new customers signed up, and what the customers complained about. They worry about whether operational processes are working correctly. Although they need detailed data to support their constantly changing questions, DW/BI users almost never deal with one transaction at a time. These systems are optimized for high-performance queries as users’ questions often require that hundreds or hundreds of thousands of transactions be searched and compressed into an answer set. To further complicate matters, users of a DW/BI system typically demand that historical context be preserved to accurately evaluate the organization’s performance over time

Answer 32

The DW/BI system must make information easily accessible. The contents of the DW/BI system must be understandable. The DW/BI system must present information consistently. The data in the DW/BI system must be credible. Data must be carefully assembled from a variety of sources, cleansed, quality assured, and released only when it is fit for user consumption. The DW/BI system must adapt to change. User needs, business conditions, data, and technology are all subject to change. The DW/BI system must present information in a timely way. The DW/BI system must be a secure bastion that protects the information assets. The DW/BI system must serve as the authoritative and trustworthy foundation for improved decision making. The business community must accept the DW/BI system to deem it successful.

Answer 33

■ Deliver data that’s understandable to the business users.
■ Deliver fast query performance.

Answer 34

Both 3NF and dimensional models can be represented in ERDs because both consist of joined relational tables; the key difference between 3NF and dimensional models is the degree of normalization. Because both model types can be presented as ERDs, we refrain from referring to 3NF models as ER models; instead, we call them normalized models to minimize confusion.

Answer 35

A dimensional model contains the same information as a normalized model, but packages the data in a format that delivers user understandability, query performance, and resilience to change.

Answer 36

Dimensional models implemented in relational database management systems are referred to as star schemas because of their resemblance to a star-like structure. Dimensional models implemented in multidimensional database environments are referred to as online analytical processing (OLAP) cubes, as illustrated in Figure 1-1.

Answer 37

When data is loaded into an OLAP cube, it is stored and indexed using formats and techniques that are designed for dimensional data. Performance aggregations or precalculated summary tables are often created and managed by the OLAP cube engine.

Answer 38

OLAP cubes also provide more analytically robust functions that exceed those available with SQL. The downside is that you pay a load performance price for these capabilities, especially with large data sets.

Answer 39

a business measure

Answer 40

You should strive to store the low-level measurement data resulting from a business process in a single dimensional model. Because measurement data is overwhelmingly the largest set of data, it should not be replicated in multiple places for multiple organizational functions around the enterprise. Allowing business users from multiple organizations to access a single centralized repository for each set of measurement data ensures the use of consistent data throughout the enterprise

Answer 41

Each row in a fact table corresponds to a measurement event. The data on each row is at a specific level of detail, referred to as the grain, such as one row per product on a sales transaction.

Answer 42

The idea that a measurement event in the physical world has a one-to-one relationship to a single row in the corresponding fact table is a bedrock principle for dimensional modeling. Everything else builds from this foundation.

Answer 43

The most useful facts are numeric and additive, such as dollar sales amount. Additivity is crucial because BI applications rarely retrieve a single fact table row. Rather, they bring back hundreds, thousands, or even millions of fact rows at a time, and the most useful thing to do with so many rows is to add them up.

Answer 44

Semi-additive facts, such as account balances, cannot be summed across the time dimension. Non-additive facts, such as unit prices, can never be added.
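
The difference between additive and semi-additive facts shows up as soon as you try to sum across time. The sketch below uses made-up account snapshots; the row layout and variable names are assumptions for illustration only.

```python
# Hypothetical daily snapshot rows: (day, account, balance, deposits_that_day)
snapshots = [
    ("mon", "A", 100, 100),
    ("mon", "B", 50, 50),
    ("tue", "A", 130, 30),
    ("tue", "B", 40, 0),
]

# Additive fact: deposits can be summed across every dimension, including time.
total_deposits = sum(row[3] for row in snapshots)

# Semi-additive fact: balances can be summed across accounts for a single day...
tuesday_total_balance = sum(row[2] for row in snapshots if row[0] == "tue")

# ...but summing balances across days double-counts money, so average instead.
days = sorted({row[0] for row in snapshots})
daily_totals = [sum(r[2] for r in snapshots if r[0] == d) for d in days]
average_daily_balance = sum(daily_totals) / len(daily_totals)
```

Averaging is one common way to combine a semi-additive fact across time; taking the period-end value is another.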

Answer 45

Transaction, periodic snapshot, and accumulating snapshot. Transaction grain fact tables are the most common.

Answer 46

All fact tables have two or more foreign keys (refer to the FK notation in Figure 1-2) that connect to the dimension tables’ primary keys.

Answer 47

The fact table generally has its own primary key composed of a subset of the foreign keys. This key is often called a composite key. Every table that has a composite key is a fact table. Fact tables express many-to-many relationships. All others are dimension tables.

Answer 48

They describe the “who, what, where, when, how, and why” associated with the event. Dimension tables tend to have fewer rows than fact tables, but can be wide with many large text columns. Each dimension is defined by a single primary key (refer to the PK notation in Figure 1-3), which serves as the basis for referential integrity with any given fact table to which it is joined.

Answer 49

Dimension attributes serve as the primary source of query constraints, groupings, and report labels. In a query or report request, attributes are identified as the by words. For example, when a user wants to see dollar sales by brand, brand must be available as a dimension attribute.

Answer 50

Attributes should consist of real words rather than cryptic abbreviations. You should strive to minimize the use of codes in dimension tables by replacing them with more verbose textual attributes. In many ways, the data warehouse is only as good as the dimension attributes; the analytic power of the DW/BI environment is directly proportional to the quality and depth of the dimension attributes.

Answer 51

You often make the decision by asking whether the column is a measurement that takes on lots of values and participates in calculations (making it a fact) or is a discretely valued description that is more or less constant and participates in constraints and row labels (making it a dimensional attribute). For example, the standard cost for a product seems like a constant attribute of the product but may be changed so often that you decide it is more like a measured fact.

Answer 52

Dimension tables often represent hierarchical relationships. For example, products roll up into brands and then into categories. For each row in the product dimension, you should store the associated brand and category description. The hierarchical descriptive information is stored redundantly in the spirit of ease of use and query performance. You should resist the perhaps habitual urge to normalize data by storing only the brand code in the product dimension and creating a separate brand lookup table, and likewise for the category description in a separate category lookup table.

Answer 53

This normalization is called snowflaking. Instead of third normal form, dimension tables typically are highly denormalized with flattened many-to-one relationships within a single dimension table. Because dimension tables typically are geometrically smaller than fact tables, improving storage efficiency by normalizing or snowflaking has virtually no impact on the overall database size. You should almost always trade off dimension table space for simplicity and accessibility.
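
The trade-off between a flattened dimension and a snowflaked one can be seen with a small SQLite example run from Python. All table and column names are hypothetical, and for brevity the snowflaked version keeps brand and category in a single lookup table rather than two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked: brand and category live in a separate lookup table.
CREATE TABLE brand_lookup (brand_code INTEGER PRIMARY KEY, brand TEXT, category TEXT);
CREATE TABLE product_snow (product_key INTEGER PRIMARY KEY, name TEXT, brand_code INTEGER);
-- Flattened: brand and category are repeated on every product row.
CREATE TABLE product_flat (product_key INTEGER PRIMARY KEY, name TEXT, brand TEXT, category TEXT);

INSERT INTO brand_lookup VALUES (1, 'Acme', 'Pens'), (2, 'Zenith', 'Ink');
INSERT INTO product_snow VALUES (10, 'Blue pen', 1), (11, 'Red pen', 1), (12, 'Black ink', 2);
INSERT INTO product_flat VALUES (10, 'Blue pen', 'Acme', 'Pens'),
                                (11, 'Red pen', 'Acme', 'Pens'),
                                (12, 'Black ink', 'Zenith', 'Ink');
""")

# Flattened dimension: the category is available without any join.
flat = conn.execute(
    "SELECT category, COUNT(*) FROM product_flat GROUP BY category ORDER BY category"
).fetchall()

# Snowflaked dimension: the same question needs an extra join.
snow = conn.execute(
    "SELECT b.category, COUNT(*) FROM product_snow p "
    "JOIN brand_lookup b ON b.brand_code = p.brand_code "
    "GROUP BY b.category ORDER BY b.category"
).fetchall()
```

Both queries return the same counts; the flattened table pays with a few repeated strings and gains a simpler query.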

Answer 54

This book illustrates repeatedly that the most granular or atomic data has the most dimensionality. Atomic data that has not been aggregated is the most expressive data; this atomic data should be the foundation for every fact table design to withstand business users’ ad hoc attacks in which they pose unexpected queries. With dimensional models, you can add completely new dimensions to the schema as long as a single value of that dimension is defined for each existing fact row. Likewise, you can add new facts to the fact table, assuming that the level of detail is consistent with the existing fact table. You can supplement preexisting dimension tables with new, unanticipated attributes. In each case, existing tables can be changed in place either by simply adding new data rows in the table or by executing an SQL ALTER TABLE command.
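
Changing tables in place with ALTER TABLE can be tried out in SQLite from Python. This is only a sketch: the tables, the new `package_type` attribute, the new `sales_units` fact, and the default values are all assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE sales_fact (product_key INTEGER, sales_dollars REAL);
INSERT INTO product_dim VALUES (1, 'Acme');
INSERT INTO sales_fact VALUES (1, 10.0);

-- New, unanticipated dimension attribute.
ALTER TABLE product_dim ADD COLUMN package_type TEXT DEFAULT 'Unknown';
-- New fact at the same grain as the existing rows.
ALTER TABLE sales_fact ADD COLUMN sales_units INTEGER DEFAULT 0;
""")

dim_row = conn.execute("SELECT * FROM product_dim").fetchone()
fact_row = conn.execute("SELECT * FROM sales_fact").fetchone()
```

Existing rows pick up the default values, so queries written against the old columns keep working unchanged.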

Answer 55

If you study this code snippet line-by-line, the first two lines under the SELECT statement identify the dimension attributes in the report, followed by the aggregated metric from the fact table. The FROM clause identifies all the tables involved in the query. The first two lines in the WHERE clause declare the report’s filter, and the remainder declare the joins between the dimension and fact tables. Finally, the GROUP BY clause establishes the aggregation within the report.
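
The snippet this card describes is not part of the card itself. As a stand-in, here is a minimal star-join query with the same layout, run against an in-memory SQLite database from Python. The table names, column names, and sample rows are hypothetical, and the ORDER BY clause is an addition that only makes the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, district_name TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, month_name TEXT, year INTEGER);
CREATE TABLE sales_fact (
    date_key      INTEGER REFERENCES date_dim(date_key),
    store_key     INTEGER REFERENCES store_dim(store_key),
    product_key   INTEGER REFERENCES product_dim(product_key),
    sales_dollars REAL,
    PRIMARY KEY (date_key, store_key, product_key)
);
INSERT INTO store_dim   VALUES (1, 'North'), (2, 'South');
INSERT INTO product_dim VALUES (1, 'Acme'), (2, 'Zenith');
INSERT INTO date_dim    VALUES (1, 'January', 2013), (2, 'February', 2013);
INSERT INTO sales_fact  VALUES (1, 1, 1, 10.0), (1, 1, 2, 5.0),
                               (1, 2, 1, 7.0),  (2, 1, 1, 99.0);
""")

query = """
SELECT
    store_dim.district_name,
    product_dim.brand,
    SUM(sales_fact.sales_dollars) AS sales_dollars
FROM store_dim, product_dim, date_dim, sales_fact
WHERE date_dim.month_name = 'January'
  AND date_dim.year = 2013
  AND store_dim.store_key = sales_fact.store_key
  AND product_dim.product_key = sales_fact.product_key
  AND date_dim.date_key = sales_fact.date_key
GROUP BY store_dim.district_name, product_dim.brand
ORDER BY store_dim.district_name, product_dim.brand
"""
report = conn.execute(query).fetchall()
```

The February fact row is filtered out by the first two WHERE conditions, and the remaining rows are summed per district and brand.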

Answer 56

These are the operational systems of record that capture the business’s transactions. Think of the source systems as outside the data warehouse because presumably you have little or no control over the content and format of the data in these operational systems. The main priorities of the source systems are processing performance and availability. Operational queries against source systems are narrow, one-record-at-a-time queries that are part of the normal transaction flow and severely restricted in their demands on the operational system.

Answer 57

The ETL system of the DW/BI environment consists of a work area, instantiated data structures, and a set of processes. The ETL system is everything between the operational source systems and the DW/BI presentation area.

Answer 58

1. Extraction is the first step in the process of getting data into the data warehouse environment. Extracting means reading and understanding the source data and copying the data needed into the ETL system for further manipulation. At this point, the data belongs to the data warehouse.
2. After the data is extracted to the ETL system, there are numerous potential transformations, such as cleansing the data (correcting misspellings, resolving domain conflicts, dealing with missing elements, or parsing into standard formats), combining data from multiple sources, and de-duplicating data. The ETL system adds value to the data with these cleansing and conforming tasks by changing the data and enhancing it.
3. The final step of the ETL process is the physical structuring and loading of data into the presentation area’s target dimensional models. Because the primary mission of the ETL system is to hand off the dimension and fact tables in the delivery step, these subsystems are critical.
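
The three steps above can be sketched as three small Python functions. This is a toy pipeline under stated assumptions: the source rows, the `STATE_CODES` standardization table, and the function names are all made up for illustration, and a real ETL system does far more.

```python
# Hypothetical source rows as an operational system might export them.
source_rows = [
    {"customer": " alice ", "state": "calif.", "amount": "10.50"},
    {"customer": "Bob", "state": "CA", "amount": "4"},
    {"customer": " alice ", "state": "calif.", "amount": "10.50"},  # duplicate
]

STATE_CODES = {"calif.": "CA", "ca": "CA"}  # assumed standardization table

def extract(rows):
    """Copy the needed source data into the ETL work area."""
    return [dict(row) for row in rows]

def transform(rows):
    """Cleanse, standardize, and de-duplicate the extracted rows."""
    seen, cleaned = set(), []
    for row in rows:
        record = (
            row["customer"].strip().title(),
            STATE_CODES.get(row["state"].lower(), row["state"]),
            float(row["amount"]),
        )
        if record not in seen:
            seen.add(record)
            cleaned.append(record)
    return cleaned

def load(records, target):
    """Append the conformed records to the presentation-area table."""
    target.extend(records)
    return target

presentation_table = load(transform(extract(source_rows)), [])
```

Extraction copies rather than references the source rows, which mirrors the point that the data belongs to the warehouse once extracted.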

(83 cards)