L11: Indexing (2019) Flashcards

Question 1

Q

Input/Output Operations:

Costs of Searching over Sorted Records:

Simple Scan
Binary Search

Answer

A

Simple Scan:

Cost = O(N)

Binary Search:

Cost = O(lgN)

Question 2

Q

DBMS Buffer:

Basic Idea

Answer

A

Database has it’s own I/O Buffer.

Separate from OS Buffer.

The Buffer is handled by a Buffer Manager

Motivation:

DB knows more about access patterns
Recovery and Logging require the ability to flush to disk

Question 3

Q

DBMS Buffer:

Buffer Manager

Answer

A

Buffer Manager

Handles supporting operations for the DB’s I/O Buffer
Primarily handles and executes the Replacement Policy
- Finds which pages to flush/release when the buffer fails
A DBMS typically implements its own buffer management routines, separate from the OS

Question 4

Q

Simplified File System:

Basic Components

Answer

A

Files consist of pages, pages consist of blocks

Block

The basic “chunk” of memory on the disk

Page

A fixed size array of memory
One or more disk Blocks
Interface:
- Write to an entry( called a Slot), or set to “None”

File

Variable Length list of Pages
Interface:
- create/open/close; next_page(); etc…

Question 5

Q

Why are Sort Algorithms

important in a Database?

Answer

A

Data requested in Sorted Order is very common in Databases
Quick Sort is awful for large amounts of data
Sorting is useful for eliminating duplicate copies in a collection of records
Sorting is the first step in Bulk Loading, and B+ Tree Index
Sort-Merge Join algorithm involves Sorting

Question 6

Q

Sorting Big Files:

Basic Process

Answer

A

Split into chunks small enough to sort in memory
- Chunks are called Runs
Merge pairs(or groups) of runs using the External Merge Algorithm
Keep merging the resulting runs until left with one sorted file
- Each time merging is called a Pass

Question 7

Q

External Merge Sort Algorithm:

Example:

3 Buffer Pages

6 Page File

Answer

A

Steps:

Split the file ( 2 sets of 3 pages)
Sort Externally
Load First page, f₁ , in buffer (Run # 1), and sort
Repeat for Second page, f₂ , in buffer (Run #2)
Run External Merge Sort Algorithm to put original file back together

Question 8

Q

External Merge Sorting Algorithm:

Calculating I/O Cost

Answer

A

Two Main Cost Factors:

*assume 1 read(R), 1 Write(W) per page

Cost of Loading, C_loading

C_loading = (#runs)*(#pages)*(1R + 1W) = 2RP

Cost of Merging, C_merging

C_merging = (#runs)*(#pages)*(1R + 1W) = 2RP

Total I/O Cost = C_loading + C_Merging

Question 9

Q

General I/O Cost of

Splitting N-page file

into N single page runs

and sorting

Answer

A

First Pass:

Merge N/2 pairs of runs, each with length of 1 page

Second Pass:

Merge N/4 Pairs of runs, each with length of 2 pages

Generalized:

For N pages, we perform lgN passes.
Also perform 1 pass for initial split and sort.
Each pass involves reading/writing:
- 2N I/O operations

Total I/O Cost = 2N*(lgN + 1)

Question 10

Q

Benefits of

Using Indexes

Answer

A

Much faster searching than simple scan or binary search
Faster insertions by avoiding shifting operations
Search quickly among multiple attributes
Indexes are critical across all DBMS types

Question 11

Q

Index:

Overview

Answer

A

Index:

A data structure mapping search keys to sets of rows in a database table.

An index on a file speeds up selections on Search Key fields
- A Search Key can be any subset of fields
- Completely separate from Keys of a relation
Provides efficient lookup and retrieval by search key values
An index can store either the full rows it points to
- Called a Primary Index
Or pointers to those rows
- Called a Secondary Index

Question 12

Q

Major Operations

on an Index

Answer

A

Search
- Quickly find all records which meet some conditions on the Search Key attribute
Insert/Remove entries
- Including Bulk Load and Bulk Delete

Question 13

Q

What does the term

“Covering” mean

wrt an Index?

Answer

A

If an Index contains all the needed attributes for a specific query,

the Index is said to Cover, or be Covering, for that query.

Basically, the query can be answered by using the Index alone.

Question 14

Q

High Level Categories

of

Index Types

(Data Structures)

Answer

A

B-Trees (B+ Trees)
- Very good for range queries, sorted data
- Old databases used B-trees
- Modern databases use B+ trees
Hash Tables
- Variants of this basic structure to deal with I/O
- Called Linear or Extensible Hashing

*Both these data structures are “IO aware”

Question 15

Q

Buffer Operations

Answer

A

Read( Page )
- Read a page from disk
- Place in buffer if not already in buffer
Flush( Page )
- Evict page from the buffer
- write to disk
Release( Page )
- Evict page from the buffer without writing to disk

Question 16

Q

Disk vs Main Memory(RAM)

Answer

Study These Flashcards

A

Speed
- Disk is slow:
  - Sequential block access
    - Blocks read one at a time
  - Reads/Writes are expensive
- Main Memory is fast
  - Random Access, byte addressable
  - 10x faster for sequential access
  - 100,000x faster for Random Access
Data Safety
- Disk storage is durable
  - Once on disk, data is safe
- Storage in RAM is Volatile
  - Data is lost if a crash or power loss occurs
Cost
- Large capacity Disks are cheap
- RAM is expensive

Question 17

Q

Buffer:

Overview

Answer

Study These Flashcards

A

Buffer:

A region of physical memory used to store Temporary Data

More specifically,

Region in Main Memory used to store Intermediate Data between the disk and processes

Key Idea:

Reading/Writing to the disk is slow, so data is cached for reading/writing in the Buffer

L11: Indexing (2019) Flashcards

(17 cards)