Physical Access Flashcards

Question 1

Q

Physical access to data respect to DBMS architecture and general description of physical access structures and access method manager

Answer

A

Data may be stored on disk in different formats to provide efficient query execution, depending on the needs.

Physical access structures describe how data is stored on disk.

Access method manager: transforms an access plan generated by the optimizer into a sequence of physical access requests to database disk pages using access methods.

Question 2

Q

Access method

Answer

A

Sofware module specialized for a single physical data structure.

It provides primitives for reading and writing data.

Selects the appriopirate blocks of a file to be loaded in memory

Requests them to the buffer manger

Knows the organization of data into a page, so, can find specific tuples and values inside a page.

Question 3

Q

Organization of disk page

Answer

A

Different for different access methods

Divided in:

space available for data
Space reserved for access method control information
Space reserved for file systems control information

Note that:

Tuples may have varying sizes (varchar, null values)
A single tuple may span accross several pages (BLOB, CLOB)

Question 4

Q

Physical structures in relational systems

Answer

A

Physical structures define how data is stored on disk to provide efficient query execution.

Physical data storage:

Sequential structures
Hash structures

Indexing to increase efficiency

Tree structures (B-tree, B+-tree)
Unclustered Hash index
Bitmap index

Question 5

Q

Sequential structures

Answer

A

Tuples are storend in a given sequential order.

Different types of structures implement different ordering criteria.

Structure types:

Heap file (entry sequenced)
- Insert is typically an append.
- All space inside a block is completely exploited before starting a new block.
- Delete or update may cause wasted space.
  - updated tuple may not fit if new values have larger size.
- efficient sequential reading.
- Popular choice jointly used with secondary indices to support search and sort operations
Ordered sequential structure
- Order depends on the value of a key (sort key), that may contain more than one attribute
- Appropriate for sort and group by operations on the sort key
- Search and join on the sort key
- Problem: preserving sort order when inserting new tuples
- Solution: leaving percentage of free space in each block and dynamically re(sorting) in main memory tuples.
- Alternative solution: Overflow file containing tuples which do not fit into the correct block
- Typically used with B+-tree clustered (primary) indices and by DBMS to store intermediate query results.

Question 6

Q

Tree structures

Answer

A

Provide direct access to data based on the value of a key field (can include more attributes).

Does not contrain the physical position of tuples.

Most widespread in relational DBMS.

General characteristics:

One root node
Many intermediate nodes
Nodes have large fan-out
Leaf node provide access to data
- clustered
- unclustered

Question 7

Q

B-Tree and B⁺-Tree

Answer

A

B stands for balanced

Leaves are all at the same distance from root
Access time is constant, regardless of the search value

B-Tree

Data pages are reached only through key values by visiting the tree

B⁺-Tree

Provides a link structure allowing sequentail access in the sort order of key values

Clustered

Constraints the physical position of tuples in a given leaf node
- position may be modified by splitting the node when it is full
Also called key sequenced
Typically used for primary key indexing

Unclustered

Leaf contains physical pointers to actual data, but not the data itself.
- position of tuples in a file is unconstrained
Used for secondary indices.

Pros:

Very efficient for range queries
Appropriate for sequantial scan in the order of the key field (clustered, otherwise not guaranteed)

Cons:

Insertions may require a split of a leaf
- possibly also of intermediate nodes
- computationally intensive
Deletions may requier merging uncrowded nodes and rebalancing

Question 8

Q

Bitmap index

Answer

A

Guarantees direct and efficient access to daa based on the value of a key field. (based on the bit matrix)

The bit matrix references data rows by means of RIDs (Row IDentifiers).

Actual data is stored on separate structure

The matrix has a column for each different value of the indexed attribute, and one row for each tuple. The position (i, j) of the matrix is 1 if tuple i takes value j, 0 otherwise.

Pros:

Efficient for boolean expressions
Appropriate for attributes with limited domain cardinality

Cons:

Note feasible to use for continuos attributes
Required space grows sigificantly with domain cardinality.

Question 9

Q

Hash Structure (clustered and unclustered)

Answer

A

It guarantees direct and efficient access to data based on the value of a key field (one or more attributes)
Suppose the hash structure has B blocks
- the hash functions returns a value between 0 and B - 1
- Blocks should never be completely filled
  - To allow new data insertion

Pros:

Very effiecient for equality queries.
No sorting of disk blocks is required.

Cons:

Inefficient for range queries
Collision may occur

Unclustered hash index:

Actual data is stored in a separate structure
Position of tuples is not constrained by a block

Physical Access Flashcards

(9 cards)