Final Review Sheet Flashcards

1
Q

index

A

An index is a table with only two attributes: a key (identifier) attribute and a pointer attribute that tells you which block of data holds the matching record.
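
A minimal sketch of the idea, assuming a tiny in-memory "data file" made of blocks. The names `blocks`, `index`, and `lookup` are invented for illustration and do not come from any particular database.

```python
# Hypothetical data file: a list of blocks, each holding a few (key, record) rows.
blocks = [
    [(101, "Ann"), (102, "Bob")],    # block 0
    [(205, "Cara"), (207, "Dev")],   # block 1
    [(310, "Eli")],                  # block 2
]

# The index has exactly two "attributes": the key and a pointer (block number).
index = {key: block_no
         for block_no, block in enumerate(blocks)
         for key, _ in block}

def lookup(key):
    """Use the index to read only the one block that holds the key."""
    block_no = index.get(key)
    if block_no is None:
        return None
    for k, record in blocks[block_no]:
        if k == key:
            return record
    return None
```

Without the index, finding key 207 would mean scanning every block; with it, only block 1 is read.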

2
Q

B+tree index

A

It follows the pattern of a tree: pointers at the top level lead to the next level, and so on down, until you reach the lowest level, which points to the data itself.
Used for attributes with high cardinality; it is geared more towards transactional systems.

3
Q

block

A

The block is the logical unit of data transferred between disk and memory (typically a few kilobytes, e.g. 4 KB).

4
Q

bitmap index

A

A collection of bit vectors containing only 1s and 0s, with one bit per row for each distinct value of the indexed attribute.

It supports fast Boolean operations (AND, OR, NOT) and is used in data warehouses.
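
A small sketch of a bitmap index, assuming Python integers stand in for the bit vectors. The table contents and the helper names `build_bitmap_index` and `matching_rows` are made up for illustration.

```python
def build_bitmap_index(rows, column):
    """Return {value: bit vector}, with bit i set when row i has that value."""
    bitmaps = {}
    for i, row in enumerate(rows):
        value = row[column]
        bitmaps[value] = bitmaps.get(value, 0) | (1 << i)
    return bitmaps

def matching_rows(bits):
    """Decode a bit vector back into a list of row positions."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

rows = [
    {"region": "East", "status": "open"},
    {"region": "West", "status": "closed"},
    {"region": "East", "status": "closed"},
    {"region": "East", "status": "open"},
]
region = build_bitmap_index(rows, "region")
status = build_bitmap_index(rows, "status")

# Boolean queries become bitwise operations on the vectors.
east_and_open = region["East"] & status["open"]
west_or_open = region["West"] | status["open"]
```

Here `matching_rows(east_and_open)` gives rows 0 and 3, found without touching the rows themselves.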

5
Q

Data Staging Steps

A
Extraction
Data cleansing
Data Integration
Transformation
Loading
Maintenance
6
Q

Extraction

A

Take the data from its source, e.g. operational systems, flat files, or the web.

7
Q

Data Staging

A

Data is placed in a staging area, away from the original system, where it can be cleansed.

8
Q

Data Cleansing: Missing Values

A
  1. Exclude the record
  2. Exclude the attribute/field
  3. Replace with a global constant
  4. Replace with the attribute mean
  5. Replace with the most probable value
  6. Correct manually
  7. Apply a specific algorithm
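
A few of these options can be sketched in plain Python. The sample `records` and the function names are assumptions made for illustration, and "most probable value" is approximated here by the most frequent observed value.

```python
from collections import Counter
from statistics import mean

records = [
    {"age": 30, "city": "Austin"},
    {"age": None, "city": "Austin"},
    {"age": 50, "city": None},
    {"age": 40, "city": "Dallas"},
]

def drop_incomplete(rows):
    """Exclude any record that has a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def fill_constant(rows, field, constant):
    """Replace missing values with a global constant."""
    return [{**r, field: constant if r[field] is None else r[field]} for r in rows]

def fill_mean(rows, field):
    """Replace missing values with the mean of the observed values."""
    avg = mean(r[field] for r in rows if r[field] is not None)
    return [{**r, field: avg if r[field] is None else r[field]} for r in rows]

def fill_most_common(rows, field):
    """Replace missing values with the most frequent observed value."""
    counts = Counter(r[field] for r in rows if r[field] is not None)
    mode = counts.most_common(1)[0][0]
    return [{**r, field: mode if r[field] is None else r[field]} for r in rows]
```

Each function returns a new list, so the original records stay available for comparison or manual correction.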
9
Q

Data Integration

A

Data from different sources, in different formats, needs to be integrated into one data warehouse.

Problems: different attributes that share the same name, or the same attribute under different names.
1. Get the source and target schemas
2. Integrate the source schemas and map them to the data warehouse schema, with help from tools and the business
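
The naming problems above can be illustrated with a simple field mapping. The two source systems, their field names, and the `to_warehouse` helper are hypothetical; real mappings would come from the schemas and from the business.

```python
# Hypothetical mappings: source field -> warehouse field.
# "cust_id" and "customer_no" are the same attribute under different names;
# "name" is the same name used for two different attributes.
CRM_MAPPING = {"cust_id": "customer_id", "name": "customer_name"}
BILLING_MAPPING = {"customer_no": "customer_id", "name": "product_name"}

def to_warehouse(row, mapping):
    """Rename a source row's fields to the warehouse schema, dropping unmapped ones."""
    return {target: row[source] for source, target in mapping.items() if source in row}

crm_row = {"cust_id": 7, "name": "Acme Corp"}
billing_row = {"customer_no": 7, "name": "Widget", "internal_flag": "x"}

crm_out = to_warehouse(crm_row, CRM_MAPPING)
billing_out = to_warehouse(billing_row, BILLING_MAPPING)
```

After mapping, both rows share `customer_id`, and the two meanings of `name` no longer collide.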

10
Q

Transformation

A

Prepare data for loading into the data warehouse

  1. Change data formats
  2. Create derived attributes and tables
  3. Aggregate: build aggregate fact tables
  4. Create warehouse keys
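
The four steps can be sketched on a handful of rows. The `sales` data, the date format, and the `warehouse_key` helper are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime

sales = [
    {"store": "S1", "date": "03/15/2024", "qty": 2, "unit_price": 5.0},
    {"store": "S1", "date": "03/15/2024", "qty": 1, "unit_price": 5.0},
    {"store": "S2", "date": "03/16/2024", "qty": 4, "unit_price": 2.5},
]

# Step 4. Warehouse keys: integers assigned independently of the source keys.
store_keys = {}

def warehouse_key(natural_key):
    return store_keys.setdefault(natural_key, len(store_keys) + 1)

transformed = []
for row in sales:
    transformed.append({
        "store_key": warehouse_key(row["store"]),
        # Step 1. Change data format: MM/DD/YYYY text to an ISO date string.
        "date": datetime.strptime(row["date"], "%m/%d/%Y").date().isoformat(),
        # Step 2. Derived attribute: revenue is computed, not stored in the source.
        "revenue": row["qty"] * row["unit_price"],
    })

# Step 3. Aggregate fact table: total revenue per store and day.
daily_revenue = defaultdict(float)
for row in transformed:
    daily_revenue[(row["store_key"], row["date"])] += row["revenue"]
```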
11
Q

Loading

A

Load the cleansed, integrated, and transformed data into the data warehouse.

12
Q

Data integration tools

A
Pentaho
Oracle
SQL
Commercial: lower maintenance costs, easier to use, metadata is useful by default
Scripting/SQL: more productive, specific to your needs, but more difficult to maintain
13
Q

DW Maintenance

A

DW refreshment is the process of keeping data in the DW consistent and current with data in the source systems.
It can follow a user-driven policy, where the DW is refreshed only on request,
or a warehouse-driven policy, where it is refreshed on a schedule.

14
Q

Trends in Data Warehousing

A

Online forums are used to predict product defects (example: Honda)
Social networks are used to predict stock behavior (example: stocktwitter)
Marketing trends are predicted from who you are in contact with on your phone

15
Q

mapreduce

A

A programming model invented at Google. It runs on thousands of commodity PCs connected by Ethernet.
Map: reads an input and produces key-value pairs. All pairs with the same key are grouped and then passed to reduce. Multiple computers do this work in parallel.
Reduce: receives a group of pairs and merges (aggregates) them.
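
A single-machine sketch of the model, using word counting as the example. This only imitates the map, group, and reduce steps in one process; it is not how Hadoop or Google's system is implemented, and the function names are chosen for illustration.

```python
from collections import defaultdict

def map_phase(document):
    """Map: read one input and emit (key, value) pairs, here (word, 1)."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: merge (aggregate) the values for one key."""
    return key, sum(values)

documents = ["the cat sat", "the dog sat", "the end"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster each document would be mapped on a different machine, and each key's group would be reduced on whichever machine receives it.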

16
Q

hadoop

A

open source software for reliable, scalable, distributed computing

17
Q

cloud computing

A

Data warehouse services, including but not limited to ETL, data management, reporting, dashboards, and access control, are offered via the internet.

18
Q

OLAP

A

Online analytical processing
Places key performance indicators into context
Measures are pre-aggregated and precalculated
It is organized in the form of a cube and allows data to be modeled and viewed in multiple dimensions
Typical operations include:
roll-up (drill-up): summarizes data by climbing up a concept hierarchy or by reducing dimensions, for example aggregating all of the cities in one state into one value
drill-down: moves down a concept hierarchy or adds dimensions to be more specific
slice: creates a slice of the cube by choosing a single value for one of the dimensions
dice: creates a subcube by choosing two or more values for one or more of the dimensions
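
These operations can be sketched over a small list of fact rows. The `facts` data and the helper names `roll_up`, `slice_cube`, and `dice` are assumptions for illustration; a real OLAP engine would work on pre-aggregated cubes rather than scanning rows.

```python
from collections import defaultdict

# Hypothetical fact rows: dimensions (state, city, year) and one measure (sales).
facts = [
    {"state": "TX", "city": "Austin", "year": 2023, "sales": 100},
    {"state": "TX", "city": "Dallas", "year": 2023, "sales": 150},
    {"state": "TX", "city": "Austin", "year": 2024, "sales": 120},
    {"state": "CA", "city": "Fresno", "year": 2024, "sales": 80},
]

def roll_up(rows, dims, measure="sales"):
    """Summarize the measure over the kept dimensions (e.g. city -> state)."""
    total = defaultdict(int)
    for r in rows:
        total[tuple(r[d] for d in dims)] += r[measure]
    return dict(total)

def slice_cube(rows, dim, value):
    """Slice: fix a single value for one dimension."""
    return [r for r in rows if r[dim] == value]

def dice(rows, criteria):
    """Dice: keep rows whose dimensions fall within the allowed value sets."""
    return [r for r in rows if all(r[d] in allowed for d, allowed in criteria.items())]

by_state_year = roll_up(facts, ["state", "year"])   # cities rolled up into states
only_2024 = slice_cube(facts, "year", 2024)
tx_cities = dice(facts, {"city": {"Austin", "Dallas"}, "year": {2023, 2024}})
```

Drill-down is the reverse of the roll-up shown here: calling `roll_up(facts, ["state", "city", "year"])` adds the city dimension back.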