10/9 Class Flashcards

1
Q

difference between a data warehouse and a database

A

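the way the data is organized: a database is organized for day-to-day transaction processing (OLTP), while a data warehouse is organized for analysis and reporting (OLAP) over integrated, historical data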

2
Q

step 1

A

decide on the data format of each attribute:
data type
length
NULL or NOT NULL
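
A minimal sketch of step 1, assuming SQLite through Python's standard sqlite3 module rather than the class's DBMS. The table and column names are hypothetical. Each column states a data type, a length or precision, and NULL / NOT NULL. Note that SQLite enforces NOT NULL but does not enforce VARCHAR lengths or DECIMAL precision, so only the NULL rule is demonstrated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: every column declares a data type, a length or
# precision where relevant, and whether NULL is allowed.
conn.execute("""
    CREATE TABLE product (
        product_key INTEGER      NOT NULL PRIMARY KEY,
        name        VARCHAR(40)  NOT NULL,
        unit_price  DECIMAL(7,2)
    )
""")

conn.execute("INSERT INTO product VALUES (1, 'Widget', 19.99)")
conn.execute("INSERT INTO product VALUES (2, 'Gadget', NULL)")  # unit_price may be NULL

rejected = False
try:
    # name is declared NOT NULL, so SQLite refuses this row
    conn.execute("INSERT INTO product VALUES (3, NULL, 5.00)")
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
print(rejected, row_count)  # True 2
```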

3
Q

char(size)

A

size is the number of characters to store; CHAR is fixed length, so shorter values are padded with spaces

4
Q

varchar(size)

A

size is the maximum number of characters to store; VARCHAR is a variable-length string
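
To contrast the CHAR and VARCHAR cards, here is a small emulation in Python. `store_char` and `store_varchar` are hypothetical helpers, not a DBMS API. They assume a too-long value is rejected with an error, which is what Oracle does; other systems may behave differently.

```python
def store_char(value: str, size: int) -> str:
    """Emulate CHAR(size): fixed length, shorter values padded with spaces."""
    if len(value) > size:
        raise ValueError(f"value longer than CHAR({size})")
    return value.ljust(size)


def store_varchar(value: str, size: int) -> str:
    """Emulate VARCHAR(size): up to size characters, stored as-is (no padding)."""
    if len(value) > size:
        raise ValueError(f"value longer than VARCHAR({size})")
    return value


print(repr(store_char("abc", 5)))     # 'abc  '
print(repr(store_varchar("abc", 5)))  # 'abc'
```

CHAR always uses `size` characters of storage, while VARCHAR uses only as many as the value needs.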

5
Q

integer

A

stores whole numbers (no fractional part) within a fixed range of values

6
Q

decimal(p,s)

A

p is the precision (total number of digits) and s is the scale (number of digits after the decimal point)

ex: DECIMAL(7,2) is a number with up to 5 digits before the decimal point and 2 digits after it, e.g. 12345.67
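
The DECIMAL(p,s) rule can be checked with Python's `decimal` module. This sketch assumes the DBMS rounds extra fractional digits to the scale, as Oracle's NUMBER(p,s) does, and then rejects values with more than p - s digits before the decimal point. `fits_decimal` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP


def fits_decimal(value: str, p: int, s: int) -> bool:
    """True if value fits DECIMAL(p, s) after rounding to s fractional digits."""
    # Round to the scale first, e.g. s = 2 -> quantize to Decimal('0.01')
    d = Decimal(value).quantize(Decimal(1).scaleb(-s), rounding=ROUND_HALF_UP)
    integer_part = abs(int(d))
    int_digits = len(str(integer_part)) if integer_part else 0
    # Digits allowed before the decimal point = precision - scale
    return int_digits <= p - s


print(fits_decimal("12345.67", 7, 2))   # True  (5 digits before the point)
print(fits_decimal("123456.7", 7, 2))   # False (6 digits before the point)
print(fits_decimal("99999.999", 7, 2))  # False (rounds up to 100000.00)
```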

7
Q

Step 2

A

decide which attributes to index

8
Q

date time

A

stores year, month, day, hours, minutes, and seconds

9
Q

disk

A

disk access is the slowest part of using an index, so index structures aim to minimize the number of disk blocks read

10
Q

logical unit of data transfer (block)

A

a block is about 4 KB; transferring one block from disk takes about 0.01 seconds
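
With these two numbers, the cost of a full table scan is simple arithmetic. The sketch below assumes 4 KB = 4096 bytes and a hypothetical table of 50,000 records of 200 bytes each, packed end to end (real systems do not split records across blocks, so this slightly underestimates). Times are kept in integer milliseconds to avoid floating-point rounding.

```python
BLOCK_BYTES = 4 * 1024  # assumption: 4 KB = 4096 bytes
BLOCK_READ_MS = 10      # 0.01 s per block transfer, from the card


def blocks_needed(num_records: int, record_bytes: int) -> int:
    """Blocks needed to hold the table, rounding up to whole blocks."""
    total_bytes = num_records * record_bytes
    return -(-total_bytes // BLOCK_BYTES)  # ceiling division on integers


def full_scan_ms(num_records: int, record_bytes: int) -> int:
    """Time to read every block of the table once."""
    return blocks_needed(num_records, record_bytes) * BLOCK_READ_MS


print(blocks_needed(50_000, 200))  # 2442
print(full_scan_ms(50_000, 200))   # 24420 (about 24 s)
```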

11
Q

index table

A

has only two attributes:

key: the unique key value; pointer: the block in which the record with that key is stored
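
A minimal in-memory sketch of an index table, assuming a hypothetical data file of (product_key, name) records stored three per block. Each index entry has exactly the two attributes from the card, (key, block pointer), and is kept sorted so it can be binary-searched; a lookup then reads only the single block the pointer names.

```python
import bisect

RECORDS_PER_BLOCK = 3  # hypothetical block capacity

# Hypothetical data file: (product_key, name) records grouped into blocks
records = [(key, f"product {key}") for key in range(100, 110)]
blocks = [records[i:i + RECORDS_PER_BLOCK]
          for i in range(0, len(records), RECORDS_PER_BLOCK)]

# Index table: (key, pointer = block number), sorted by key
index_table = sorted((key, block_no)
                     for block_no, block in enumerate(blocks)
                     for key, _ in block)


def lookup(key):
    """Binary-search the index, then read only the block its pointer names."""
    pos = bisect.bisect_left(index_table, (key,))
    if pos < len(index_table) and index_table[pos][0] == key:
        block_no = index_table[pos][1]
        for record_key, name in blocks[block_no]:  # one block read
            if record_key == key:
                return name
    return None


print(index_table[:4])  # [(100, 0), (101, 0), (102, 0), (103, 1)]
print(lookup(104))      # product 104
print(lookup(999))      # None
```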

12
Q

B+ tree index

A

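a multilevel index: the index is itself indexed, forming a balanced tree whose leaf level holds every key with a pointer to the block of its record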
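
To make "multilevel" concrete, here is a sketch of a static, bulk-loaded B+ tree style index. It is read-only; real B+ trees also split and merge nodes on insert and delete, which this omits. Leaves hold the keys, and each higher level holds the first key of each node below it, grouped `fanout` per node. The keys and fanout are hypothetical.

```python
import bisect


def build_levels(sorted_keys, fanout):
    """Bulk-load a static multilevel index; returns levels, root first."""
    level = [sorted_keys[i:i + fanout] for i in range(0, len(sorted_keys), fanout)]
    levels = [level]
    while len(level) > 1:
        # Each upper entry is the first key of one node in the level below
        separators = [node[0] for node in level]
        level = [separators[i:i + fanout] for i in range(0, len(separators), fanout)]
        levels.append(level)
    return levels[::-1]


def search(levels, fanout, key):
    """Walk from the root to one leaf; return (found, nodes_visited)."""
    node_index = 0
    for depth, level in enumerate(levels):
        node = level[node_index]
        if depth == len(levels) - 1:  # leaf level
            return key in node, depth + 1
        # Follow the child whose first key is the largest one <= key
        pos = max(bisect.bisect_right(node, key) - 1, 0)
        # Children of node j are nodes j*fanout ... j*fanout + fanout - 1 below
        node_index = node_index * fanout + pos


keys = list(range(0, 1000, 2))  # 500 hypothetical keys
levels = build_levels(keys, fanout=10)
print(len(levels))              # 3
print(search(levels, 10, 348))  # (True, 3)
print(search(levels, 10, 349))  # (False, 3)
```

Each lookup touches one node per level, so with nodes stored one per block the cost is the tree height in block reads.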

13
Q

height of the tree

A

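log_2(n) for n keys, where the base 2 means each node points to two blocks (a fanout of 2); with a fanout of f, the height is about log_f(n), rounded up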
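
The height formula can be computed exactly with integers: the height is the smallest h such that f^h >= n, which equals ceil(log_f(n)). `tree_height` is a hypothetical helper, and the key count and fanouts below are illustrative.

```python
def tree_height(n: int, fanout: int) -> int:
    """Smallest h with fanout**h >= n, i.e. ceil(log_fanout(n)) for n >= 1."""
    height, reach = 0, 1
    while reach < n:
        reach *= fanout
        height += 1
    return height


print(tree_height(1_000_000, 2))    # 20
print(tree_height(1_000_000, 100))  # 3
```

This is why a larger fanout matters: if a 4 KB block holds, say, 100 keys, a million keys need only 3 levels instead of 20.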

14
Q

processing time by access structure

A

4 min: no index
12 s: simple index
0.21 s: B+ tree index
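
Dividing each measured time by the 0.01 s per-block cost from the block card gives a rough count of block reads per access structure. This assumes all of the time is spent reading blocks, which is a simplification. Times are converted to integer milliseconds to keep the arithmetic exact.

```python
BLOCK_READ_MS = 10  # 0.01 s per block read, from the block card

# Measured times from the card, in milliseconds
measured_ms = {
    "no index": 4 * 60 * 1000,  # 4 min
    "simple index": 12 * 1000,  # 12 s
    "B+ tree": 210,             # 0.21 s
}

# Assumes every millisecond is spent on block reads
block_reads = {name: ms // BLOCK_READ_MS for name, ms in measured_ms.items()}

for name, reads in block_reads.items():
    print(f"{name}: about {reads} block reads")
# no index: about 24000 block reads
# simple index: about 1200 block reads
# B+ tree: about 21 block reads
```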

15
Q

Oracle command for creating a B+ tree index

A

CREATE INDEX pidindex ON product(product_key);
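
The same CREATE INDEX statement can be tried locally with SQLite through Python's sqlite3 module, assuming a hypothetical product table. SQLite, like Oracle's default index type, stores indexes as B-trees. `EXPLAIN QUERY PLAN` shows whether the planner uses the index for a key lookup; the exact wording of its output varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product_key INTEGER, name TEXT)")
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [(k, f"product {k}") for k in range(1000)])

# Same statement as the card
conn.execute("CREATE INDEX pidindex ON product(product_key)")

# Ask SQLite how it would run a lookup by key
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM product WHERE product_key = 42"
).fetchall()
detail = " ".join(row[-1] for row in plan)
print(detail)  # should mention pidindex
```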

16
Q

bitmap index

A

a collection of bit vectors, one for each possible value of attribute A. The vector for value v has a 1 in position i if the i-th record has value v for A, and a 0 otherwise
bitmap indexes are more common in data warehouses than in transactional databases
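
A minimal sketch of the definition above, using Python integers as bit vectors (bit i = record i). The table and its values are hypothetical. Conditions on several attributes become bitwise AND / OR of the vectors, which is part of why bitmap indexes suit analytical queries on low-cardinality attributes.

```python
def build_bitmap_index(values):
    """Map each distinct value v to a bit vector: bit i is 1 if record i has v."""
    index = {}
    for i, v in enumerate(values):
        index[v] = index.get(v, 0) | (1 << i)
    return index


def matching_records(bits, num_records):
    """Positions of the 1 bits, i.e. the records that satisfy the condition."""
    return [i for i in range(num_records) if bits >> i & 1]


# Hypothetical table with two low-cardinality attributes
region = ["east", "west", "east", "north", "west"]
status = ["open", "open", "closed", "open", "closed"]
region_idx = build_bitmap_index(region)
status_idx = build_bitmap_index(status)

# WHERE region = 'east' AND status = 'open'  ->  bitwise AND
hits = region_idx["east"] & status_idx["open"]
print(matching_records(hits, len(region)))  # [0]

# WHERE region = 'east' OR region = 'north'  ->  bitwise OR
either = region_idx["east"] | region_idx["north"]
print(matching_records(either, len(region)))  # [0, 2, 3]
```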

17
Q

aggregation

A

precomputed aggregates (summary tables) can be an alternative or a complement to indexes on tables
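
A small sketch of aggregation as an alternative to an index, assuming SQLite via Python's sqlite3 and hypothetical `sales` / `sales_by_product` tables. The totals are computed once into a summary table, so later queries read one summary row instead of scanning every sale. The summary must be refreshed when the underlying data changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_key INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 10.0), (1, 5.0), (2, 7.5), (2, 2.5), (3, 4.0)])

# Precompute the aggregate once
conn.execute("""
    CREATE TABLE sales_by_product AS
    SELECT product_key, SUM(amount) AS total
    FROM sales
    GROUP BY product_key
""")

# Later queries read the small summary table instead of all sales rows
total = conn.execute(
    "SELECT total FROM sales_by_product WHERE product_key = 2"
).fetchone()[0]
print(total)  # 10.0
```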

18
Q

data cleaning (missing values)

A

  1. exclude the record
  2. exclude the attribute/field
  3. replace with a global constant
  4. replace with the attribute mean
  5. replace with the most probable value
  6. correct manually
  7. apply a specific algorithm
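
Strategies 1 to 4 from the list are easy to sketch in plain Python. The records and the `age` attribute are hypothetical, with `None` marking a missing value. Each helper returns new rows and leaves the input unchanged.

```python
from statistics import mean

# Hypothetical records; None marks a missing age
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 28},
    {"id": 4, "age": 40},
]


def exclude_records(rows, attr):
    """1. Drop every record whose attr is missing."""
    return [r for r in rows if r[attr] is not None]


def exclude_attribute(rows, attr):
    """2. Drop the attribute from every record."""
    return [{k: v for k, v in r.items() if k != attr} for r in rows]


def fill_constant(rows, attr, constant):
    """3. Replace missing values with a global constant."""
    return [{**r, attr: constant if r[attr] is None else r[attr]} for r in rows]


def fill_mean(rows, attr):
    """4. Replace missing values with the mean of the known values."""
    m = mean(r[attr] for r in rows if r[attr] is not None)
    return fill_constant(rows, attr, m)


print(len(exclude_records(records, "age")))   # 3
print(exclude_attribute(records, "age")[1])   # {'id': 2}
print(fill_constant(records, "age", -1)[1])   # {'id': 2, 'age': -1}
print(fill_mean(records, "age")[1])           # {'id': 2, 'age': 34}
```
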
19
Q

imputation

A

estimating a missing value from the other data, for example with regression
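
A sketch of regression-based imputation with hypothetical data: fit an ordinary least-squares line y = a + b*x on the complete rows, then predict the missing y from its x. The helper `fit_line` is hand-written here rather than taken from a library.

```python
# Hypothetical data: years of experience (x), salary in $1000s (y); one y missing
rows = [(1, 30.0), (2, 35.0), (3, 40.0), (4, None), (5, 50.0)]


def fit_line(points):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Fit only on rows where y is known, then fill the gaps with predictions
complete = [(x, y) for x, y in rows if y is not None]
a, b = fit_line(complete)
imputed = [(x, a + b * x if y is None else y) for x, y in rows]
print(imputed[3])  # (4, 45.0)
```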

20
Q

data integration: common challenges

A

the same attribute under different names

different attributes under the same name

21
Q

data integration process

A

  1. get the source and target schemas
  2. integrate the source schemas and map them to the data warehouse schema, with help from business domain experts, source-system technical experts, and ETL tools
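
A minimal sketch of the mapping step, with hypothetical source systems and column names. The per-source mapping resolves both challenges from the previous card: `cust_id` and `customer_no` are the same attribute under different names, while `date` means order date in one source and ship date in the other. In practice this mapping would be agreed with the experts and implemented in an ETL tool.

```python
# Hypothetical rows from two source systems
crm_rows = [{"cust_id": 7, "date": "2024-10-01"}]           # "date" = order date
shipping_rows = [{"customer_no": 7, "date": "2024-10-03"}]  # "date" = ship date

# Source column -> warehouse column, one mapping per source schema
MAPPINGS = {
    "crm": {"cust_id": "customer_key", "date": "order_date"},
    "shipping": {"customer_no": "customer_key", "date": "ship_date"},
}


def to_target(source, row):
    """Rename a source row's columns to the warehouse schema."""
    mapping = MAPPINGS[source]
    return {mapping[col]: value for col, value in row.items()}


print(to_target("crm", crm_rows[0]))
# {'customer_key': 7, 'order_date': '2024-10-01'}
print(to_target("shipping", shipping_rows[0]))
# {'customer_key': 7, 'ship_date': '2024-10-03'}
```
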
22
Q

maintenance

A

DW refreshment: the process of keeping the data in the DW consistent and current with the data in the source systems
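
One common way to refresh is incremental, timestamp-based loading: copy only the source rows changed since the last refresh. This is one technique among several, swapped in here for illustration, not necessarily the one from class. The table, the `updated_at` column, and `refresh` are hypothetical, and this simple version does not handle rows deleted from the source.

```python
# Hypothetical source table; updated_at is a logical timestamp
source = [
    {"id": 1, "price": 10, "updated_at": 100},
    {"id": 2, "price": 20, "updated_at": 150},
]
warehouse = {}  # id -> copy of the source row


def refresh(source_rows, dw, last_refresh):
    """Copy rows changed after last_refresh into dw; return the new high-water mark."""
    high_water = last_refresh
    for row in source_rows:
        if row["updated_at"] > last_refresh:
            dw[row["id"]] = dict(row)
            high_water = max(high_water, row["updated_at"])
    return high_water


mark = refresh(source, warehouse, 0)        # initial load copies both rows
source[0].update(price=12, updated_at=200)  # price changes in the source
mark = refresh(source, warehouse, mark)     # only row 1 is copied again
print(mark, warehouse[1]["price"])          # 200 12
```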