Physical Access Flashcards

You may prefer our related Brainscape-certified flashcards:
1
Q

Physical access to data respect to DBMS architecture and general description of physical access structures and access method manager

A

Data may be stored on disk in different formats to provide efficient query execution, depending on the needs.

Physical access structures describe how data is stored on disk.

Access method manager: transforms an access plan generated by the optimizer into a sequence of physical access requests to database disk pages using access methods.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Access method

A

Sofware module specialized for a single physical data structure.

It provides primitives for reading and writing data.

Selects the appriopirate blocks of a file to be loaded in memory

Requests them to the buffer manger

Knows the organization of data into a page, so, can find specific tuples and values inside a page.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Organization of disk page

A

Different for different access methods

Divided in:

  • space available for data
  • Space reserved for access method control information
  • Space reserved for file systems control information

Note that:

  • Tuples may have varying sizes (varchar, null values)
  • A single tuple may span accross several pages (BLOB, CLOB)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Physical structures in relational systems

A

Physical structures define how data is stored on disk to provide efficient query execution.

Physical data storage:

  • Sequential structures
  • Hash structures

Indexing to increase efficiency

  • Tree structures (B-tree, B+-tree)
  • Unclustered Hash index
  • Bitmap index
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Sequential structures

A

Tuples are storend in a given sequential order.

Different types of structures implement different ordering criteria.

Structure types:

  • Heap file (entry sequenced)
    • Insert is typically an append.
    • All space inside a block is completely exploited before starting a new block.
    • Delete or update may cause wasted space.
      • updated tuple may not fit if new values have larger size.
    • efficient sequential reading.
    • Popular choice jointly used with secondary indices to support search and sort operations
  • Ordered sequential structure
    • Order depends on the value of a key (sort key), that may contain more than one attribute
    • Appropriate for sort and group by operations on the sort key
    • Search and join on the sort key
    • Problem: preserving sort order when inserting new tuples
    • Solution: leaving percentage of free space in each block and dynamically re(sorting) in main memory tuples.
    • Alternative solution: Overflow file containing tuples which do not fit into the correct block
    • Typically used with B+-tree clustered (primary) indices and by DBMS to store intermediate query results.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Tree structures

A

Provide direct access to data based on the value of a key field (can include more attributes).

Does not contrain the physical position of tuples.

Most widespread in relational DBMS.

General characteristics:

  • One root node
  • Many intermediate nodes
  • Nodes have large fan-out
  • Leaf node provide access to data
    • clustered
    • unclustered
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

B-Tree and B+-Tree

A

B stands for balanced

  • Leaves are all at the same distance from root
  • Access time is constant, regardless of the search value

B-Tree

  • Data pages are reached only through key values by visiting the tree

B+-Tree

  • Provides a link structure allowing sequentail access in the sort order of key values

Clustered

  • Constraints the physical position of tuples in a given leaf node
    • position may be modified by splitting the node when it is full
  • Also called key sequenced
  • Typically used for primary key indexing

Unclustered

  • Leaf contains physical pointers to actual data, but not the data itself.
    • position of tuples in a file is unconstrained
  • Used for secondary indices.

Pros:

  • Very efficient for range queries
  • Appropriate for sequantial scan in the order of the key field (clustered, otherwise not guaranteed)

Cons:

  • Insertions may require a split of a leaf
    • possibly also of intermediate nodes
    • computationally intensive
  • Deletions may requier merging uncrowded nodes and rebalancing
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Bitmap index

A

Guarantees direct and efficient access to daa based on the value of a key field. (based on the bit matrix)

The bit matrix references data rows by means of RIDs (Row IDentifiers).

  • Actual data is stored on separate structure

The matrix has a column for each different value of the indexed attribute, and one row for each tuple. The position (i, j) of the matrix is 1 if tuple i takes value j, 0 otherwise.

Pros:

  • Efficient for boolean expressions
  • Appropriate for attributes with limited domain cardinality

Cons:

  • Note feasible to use for continuos attributes
  • Required space grows sigificantly with domain cardinality.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Hash Structure (clustered and unclustered)

A
  • It guarantees direct and efficient access to data based on the value of a key field (one or more attributes)
  • Suppose the hash structure has B blocks
    • the hash functions returns a value between 0 and B - 1
    • Blocks should never be completely filled
      • To allow new data insertion

Pros:

  • Very effiecient for equality queries.
  • No sorting of disk blocks is required.

Cons:

  • Inefficient for range queries
  • Collision may occur

Unclustered hash index:

  • Actual data is stored in a separate structure
  • Position of tuples is not constrained by a block
How well did you know this?
1
Not at all
2
3
4
5
Perfectly