L11: Indexing (2019) Flashcards

1
Q

Input/Output Operations:

Costs of Searching over Sorted Records:

  • Simple Scan
  • Binary Search
A

Simple Scan:

Cost = O(N)

Binary Search:

Cost = O(lgN)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

DBMS Buffer:

Basic Idea

A

Database has it’s own I/O Buffer.

Separate from OS Buffer.

The Buffer is handled by a Buffer Manager

Motivation:

  • DB knows more about access patterns
  • Recovery and Logging require the ability to flush to disk
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

DBMS Buffer:

Buffer Manager

A

Buffer Manager

  • Handles supporting operations for the DB’s I/O Buffer
  • Primarily handles and executes the Replacement Policy
    • Finds which pages to flush/release when the buffer fails
  • A DBMS typically implements its own buffer management routines, separate from the OS
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Simplified File System:

Basic Components

A

Files consist of pages, pages consist of blocks

Block

The basic “chunk” of memory on the disk

Page

  • A fixed size array of memory
  • One or more disk Blocks
  • Interface:
    • Write to an entry( called a Slot), or set to “None”

File

  • Variable Length list of Pages
  • Interface:
    • create/open/close; next_page(); etc…
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Why are Sort Algorithms

important in a Database?

A
  • Data requested in Sorted Order is very common in Databases
  • Quick Sort is awful for large amounts of data
  • Sorting is useful for eliminating duplicate copies in a collection of records
  • Sorting is the first step in Bulk Loading, and B+ Tree Index
  • Sort-Merge Join algorithm involves Sorting
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Sorting Big Files:

Basic Process

A
  1. Split into chunks small enough to sort in memory
    • Chunks are called Runs
  2. Merge pairs(or groups) of runs using the External Merge Algorithm
  3. Keep merging the resulting runs until left with one sorted file
    • Each time merging is called a Pass
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

External Merge Sort Algorithm:

Example:

3 Buffer Pages

6 Page File

A

Steps:

  1. Split the file ( 2 sets of 3 pages)
  2. Sort Externally
  3. Load First page, f1 , in buffer (Run # 1), and sort
  4. Repeat for Second page, f2 , in buffer (Run #2)
  5. Run External Merge Sort Algorithm to put original file back together
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

External Merge Sorting Algorithm:

Calculating I/O Cost

A

Two Main Cost Factors:

*assume 1 read(R), 1 Write(W) per page

Cost of Loading, Cloading

Cloading = (#runs)*(#pages)*(1R + 1W) = 2RP

Cost of Merging, Cmerging

Cmerging = (#runs)*(#pages)*(1R + 1W) = 2RP

Total I/O Cost = Cloading + CMerging

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

General I/O Cost of

Splitting N-page file

into N single page runs

and sorting

A

First Pass:

Merge N/2 pairs of runs, each with length of 1 page

Second Pass:

Merge N/4 Pairs of runs, each with length of 2 pages

Generalized:

  • For N pages, we perform lgN passes.
  • Also perform 1 pass for initial split and sort.
  • Each pass involves reading/writing:
    • 2N I/O operations

Total I/O Cost = 2N*(lgN + 1)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Benefits of

Using Indexes

A
  • Much faster searching than simple scan or binary search
  • Faster insertions by avoiding shifting operations
  • Search quickly among multiple attributes
  • Indexes are critical across all DBMS types
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Index:

Overview

A

Index:

A data structure mapping search keys to sets of rows in a database table.

  • An index on a file speeds up selections on Search Key fields
    • A Search Key can be any subset of fields
    • Completely separate from Keys of a relation
  • Provides efficient lookup and retrieval by search key values
  • An index can store either the full rows it points to
    • Called a Primary Index
  • Or pointers to those rows
    • Called a Secondary Index
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Major Operations

on an Index

A
  • Search
    • Quickly find all records which meet some conditions on the Search Key attribute
  • Insert/Remove entries
    • Including Bulk Load and Bulk Delete
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What does the term

“Covering” mean

wrt an Index?

A

If an Index contains all the needed attributes for a specific query,

the Index is said to Cover, or be Covering, for that query.

Basically, the query can be answered by using the Index alone.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

High Level Categories

of

Index Types

(Data Structures)

A
  • B-Trees (B+ Trees)
    • Very good for range queries, sorted data
    • Old databases used B-trees
    • Modern databases use B+ trees
  • Hash Tables
    • Variants of this basic structure to deal with I/O
    • Called Linear or Extensible Hashing

*Both these data structures are “IO aware

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Buffer Operations

A
  • Read( Page )
    • Read a page from disk
    • Place in buffer if not already in buffer
  • Flush( Page )
    • Evict page from the buffer
    • write to disk
  • Release( Page )
    • Evict page from the buffer without writing to disk
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Disk vs Main Memory(RAM)

A
  • Speed
    • Disk is slow:
      • Sequential block access
        • Blocks read one at a time
      • Reads/Writes are expensive
    • Main Memory is fast
      • Random Access, byte addressable
      • 10x faster for sequential access
      • 100,000x faster for Random Access
  • Data Safety
    • Disk storage is durable
      • Once on disk, data is safe
    • Storage in RAM is Volatile
      • Data is lost if a crash or power loss occurs
  • Cost
    • Large capacity Disks are cheap
    • RAM is expensive
17
Q

Buffer:

Overview

A

Buffer:

A region of physical memory used to store Temporary Data

More specifically,

Region in Main Memory used to store Intermediate Data between the disk and processes

Key Idea:

Reading/Writing to the disk is slow, so data is cached for reading/writing in the Buffer