Chapter 5 Flashcards

1
Q

What is the purpose of Physical Database design?

A

To translate the logical description of data into the technical specifications for storing and retrieving data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is the goal of Physical Database Design?

A

To create a design for storing data that will provide adequate performance and ensure database integrity, security, and recoverability.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Physical Design Process visualized

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What are the regulations and standards that impact physical design decisions?

A
  • Sarbanes-Oxley Act (SOX) - Protect investors by improving accuracy and reliability
  • Committee of Sponsoring Organizations (COSO) of the Treadway Commission
  • IT Infrastructure Library (ITIL)
  • Control Objectives for Information and Related Technology (COBIT)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is a field?

A

Smallest unit of application data recognized by system software

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What do you have to consider when designing a field?

A
  • Choose data type
  • Coding, compression, encryption
  • Control data intregrity
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Different Data Types, Picture

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is a default value?

A

An assumed value if no explicit value

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is range control?

A

The allowable value limitations (constraints or validation rules)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is null value control?

A

Allowing or prohibiting empty fields

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is referential integrity?

A

Range control (and null value allowances) for foreign-key to primary-key match-ups

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

How do you handle missing data?

A
  • Substitue an estimate of the missing value (e.g., using a formula)
  • Construct a report listing missing values
  • In programs, ignore missing data unless the value is significant (sensitivity testing)

*Triggers can be used to perform these operations*

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is denormalization?

A

Transforming normalized relations into non-normalized physical record specifications

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What are the benefits of denormalization?

A
  • Can improve performance (speed) by reducing number of table lookups (i.e. reduce number of necessar join queries)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What are the costs of denormalizations, due to data duplication?

A
  • Wasted storage space
  • Data integrity/consistency threats
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What are the common denormalization opportunities?

A
  • One-to-one relationships
  • Many-to-many relationship with non-key attributes (associative entity)
  • Reference data (1:N relationship where 1-side has data not used in any other relationship)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

One-to-one denormalization example

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Many-to-Many denorm example

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Reference data denorm example

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What are some risks associated with denormalization?

A
  • Increase chance of errors and inconsistencies
  • Reintroduce anomalies
  • Fore reprogramming when business rules change
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

What other methods can be used to improve the performance of joins?

A
  • Organization of tables in the database (file organization and clustering)
  • Proper query design and optimization
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

What is horizontal partitioning?

A

Distributing the rows of a logical relation into several seperate tables

23
Q

What are the three types of horizontal partitioning?

A
  • Key Range Partitioning
  • Hash Partitioning
  • Compositie Partitioning
24
Q

When can you use horizontal partitioning?

A

Useful for situations where different users need access to diffrent rows

25
Q

What is Vertical Partitioning?

A

It’s distributing the columns of a logical relation into several seperate physical tables.

26
Q

When would you use Vertical partitioning?

A

Useful for situations where different users need access to different columns

27
Q

When veritcal partitioning what much be repeated in each file?

A

Primary key

28
Q

What are the advantages of partitioning?

A
  • Efficiency: Records used together are grouped together
  • Logical optimization: Each partition can be optimized for performance
  • Security: data not relevant to users are segregated
  • Recovery and uptime: Smaller files take less time to back up
  • Load balancing: Partitions stored on different disks, reduces contention
29
Q

What are disadvantages of partitioning?

A
  • Inconsistent access speed: Slow retrievals across partitions
  • Complexity: Non-transparent partitioning
  • Extra space or update time: Duplicate data; access from multiple partitions
30
Q

What is range partitioning?

A
  • Partitions defined by range of field values
  • Could result in unbalanced distribution of rows
  • Like-valued fields share partitions
31
Q

What is hash partitioning?

A
  • Partitions defined via hash functions
  • Will guarentee balanced distribution of rows
  • Partition could contain widely varying valued fields
32
Q

What is list partitioning?

A
  • Based on predefined lists of values for the partitioning key
33
Q

What is composite partitioning?

A
  • Combination of the other approaches
34
Q

When designing physical database files, what is a physical file?

A

A named portion of secondary memory allocated for the purpose of storing physical records.

35
Q

What part of a physical file is Tablespace?

A
  • Named logical storage unit in which data from multiple tables/views/objects can be stored
36
Q

What are the tablespace components?

A
  • Segment - a table, index, or partition
  • Extent - contiguous section of disk space
  • Data block - smallest unit of storage
37
Q

Physical storage example

A
38
Q

What are file organizations?

A

Technique for physically arranging records of a file on secondary storage.

39
Q

What are the factors for selecting file organization?

A
  • Fast data, retrieval and throughput
  • Efficient storage space utilization
  • Protection from failure and data loss
  • Minimizing need for reorganization
  • Accommodating growth
  • Security from unauthorized use
40
Q

What are the types of file organizations?

A
  • Heap - no particular order
  • Sequential
  • Indexed
  • Hashed
41
Q

What is an indexed file organization?

A

Storage of records sequentially or nonsequentially with an index that allows software to locate individual records

42
Q

What is an index?

A

A table or other data structure used to determine in a file the location of records that satisfy some condition

43
Q

What keys are automatically indexed?

A

Primary keys

44
Q

What are the other fields or combinations of fields that can also be indexed called?

A

Secondary keys (or nonunique keys)

45
Q

The comparitive features of different file organizations

A
46
Q

What are clustering files?

A

In some relational DBMSs, related records from different tables can be stored together in the same disk area.

47
Q

What is useful for improving the performance of join operations?

A

Clustering files

48
Q

How are primary keys stored in clustering files?

A

Primary key records of the main table are stored adjacent to associated foreign key records of the dependent table.

49
Q

What is a unique (primary) index?

A

Typically done for primary keys, but could also apply to other unique fields.

50
Q

What is a nonunique (secondary) index?

A
  • Done for fields that are often used to group individual entities (e.g. zip code, product)
51
Q

What are the rules for using indexes?

A
  1. Use on larger tables
  2. Index the primary key of each table
  3. Index search fields (fields frequently in WHERE clause)
  4. Fields in SQL ORDER BY and GROUP BY commands
  5. When there are >100 values but not when there are <30 values
52
Q

Rules for using Indexes cont.

A
  1. Avoid use of indexes for fields with long values; perhaps compress values first
  2. If key to index is used to determine location of record, use surrogate (like sequence number) to allow even spread in storage area
  3. DBMS may have limit on number of indexes per table and number of bytes per indexed field(s)
  4. Be careful of indexing attributes with null values; many DBMSs will not recognize null vaues in an index search
53
Q

What is possible when working in multiprocessor systems?

A

Parallel query processing

54
Q

What allows for query writers to preempt the automated optimization?

A

Overriding automatic query optimization