Chapter 5 Data Storage Flashcards

Question

What is a JOIN query?

Answer 1

Combines rows from two or more tables based on *conditions*.

Answer 2

Simple hash function using *division remainder*. ## Footnote [📖 Hash Fx?](https://g.co/gemini/share/684d25b157fb)

Answer 3

Inserts that quickly *determine row location* ## Footnote via hash key.

Answer 4

Person managing database structures and performance. ## Footnote Aka root account

Answer 5

Method of organizing data **across storage**.

Answer 6

Process of assigning **additional storage** blocks.

Answer 7

A column shared by all interleaved tables. ## Footnote [Interleaved](https://share.evernote.com/note/37353648-96ef-ef41-af1f-3e8e3cf44573)

Answer 8

*Arrangement of data* in database tables.

Answer 9

Index with *entries for every* table _row_. ## Footnote [📖](https://share.evernote.com/note/9a765c9b-7d94-960b-5805-b1c3e916cc94)

Answer 10

Index with entries for every table _block_. ## Footnote [📖](https://share.evernote.com/note/9a765c9b-7d94-960b-5805-b1c3e916cc94)

Answer 11

**Pointer** to a specific data location.

Answer 12

Time taken to read a block from disk.

Answer 13

Efficiency of retrieving data from tables.

Answer 14

Uses bits to represent **data presence** in rows.

Answer 15

Logical index that requires *additional read*. ## Footnote Slower than physical.

Answer 16

Often retained in memory *to speed access*.

Answer 17

Data structure using hash functions for indexing. ## Footnote Indexing: assigns rows to buckets

Answer 18

Indexes stored in the *same tablespace* as tables.

Answer 19

Simultaneous modifications to multiple tables.

Answer 20

Scattering of **data blocks** across storage.

Answer 21

Fast storage media refers to storage hardware that *can be accessed quickly*, improving performance for frequently used data.

Answer 22

512 bytes ## Footnote Magnetic memory typically uses 512-byte sectors for data storage.

Answer 23

Approximately 2000 sectors ## Footnote One megabyte is 1,000,000 bytes; dividing by 512 bytes per sector gives about 2000 sectors. [📖](https://g.co/gemini/share/3046018fc922)
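The arithmetic in this footnote can be checked with a short sketch. The function name `sectors_needed` is hypothetical, and one megabyte is taken as 1,000,000 bytes, as the footnote does.

```python
def sectors_needed(num_bytes: int, sector_size: int) -> int:
    """Whole sectors needed to hold num_bytes (integer ceiling division)."""
    return -(-num_bytes // sector_size)


ONE_MEGABYTE = 1_000_000  # decimal megabyte, as in the footnote

print(sectors_needed(ONE_MEGABYTE, 512))  # 1954, which rounds to about 2000
```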

Answer 24

- Older standard: 512 bytes per sector - Newer standard: 4096 bytes (4KB) per sector

Answer 25

250 sectors ## Footnote [📖](https://g.co/gemini/share/eb145a226a47)

Answer 26

8 kilobytes ## Footnote A minimum of one eight-kilobyte block **must be transferred into memory**, despite reading four kilobytes from flash memory.

Answer 27

2 kilobytes ## Footnote Flash memory page size is typically two kilobytes.

Answer 28

1. One block 2. Even though two pages (4 kilobytes total) are read, only one block of eight kilobytes is transferred. ## Footnote [📖](https://g.co/gemini/share/2e49ea477829)
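A minimal sketch of why two pages still cost one full block, assuming the 2-kilobyte page and 8-kilobyte block sizes used on these cards. The function name and the extension to larger reads are illustrative assumptions.

```python
PAGE_SIZE = 2 * 1024   # flash page size used on the cards
BLOCK_SIZE = 8 * 1024  # database block size used on the cards


def blocks_transferred(pages_read: int) -> int:
    """Blocks moved into memory; a transfer is never smaller than one block."""
    bytes_read = pages_read * PAGE_SIZE
    return max(1, -(-bytes_read // BLOCK_SIZE))


print(blocks_transferred(2))  # 1
```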

Answer 29

**Data transfers** to the database occur in blocks, not individual bytes.

Answer 30

Transactional applications.

Answer 31

A storage method *where an entire row* is stored within one block.

Answer 32

When *row size is small* relative to block size.

Answer 33

* Improved query performance * Less wasted storage

Answer 34

Wasted space is insignificant.

Answer 35

Each *row contains a link* to the large column stored separately.

Answer 36

Analytic applications.

Answer 37

A storage method where *each block stores values* for a single column only.

Answer 38

* Faster data access * Better data compression ## Footnote Also use the same data type

Answer 39

Because all values have the same data type.

Answer 40

1. Column-oriented storage is _bad_ for *transactions needing full row access*. 2. It requires accessing multiple blocks to retrieve a single row. 3. This is because *data is stored by column*, not by row.
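The cost of rebuilding one row from column-oriented storage can be sketched as follows. The sample data, the one-block-per-column layout, and the function name are assumptions made for illustration only.

```python
rows = [(1, "Ann", 30), (2, "Bob", 25), (3, "Cy", 41)]
columns = ("id", "name", "age")

# Column-oriented layout: one block per column, each holding only that column's values.
column_blocks = {name: [row[i] for row in rows] for i, name in enumerate(columns)}


def fetch_row(position: int):
    """Rebuild a full row; every column block has to be touched once."""
    values = []
    blocks_touched = 0
    for name in columns:
        values.append(column_blocks[name][position])
        blocks_touched += 1
    return tuple(values), blocks_touched


print(fetch_row(1))  # ((2, 'Bob', 25), 3)
```

With a row-oriented layout the same row would sit in a single block, which is why transactional workloads favor it.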

Answer 41

* PostgreSQL * Vertica

Answer 42

1. NoSQL databases are built for *managing large amounts of unstructured data*, emphasizing high scalability and availability. 2. Transactional applications, on the other hand, require strong consistency and ACID properties, which are not typically prioritized in NoSQL databases. ## Footnote (atomicity, consistency, isolation, durability)

Answer 43

1. Mean a technique for organizing data on storage media. 2. Sometimes these terms mean a type of NoSQL database, commonly called *wide column database*.

Answer 44

Row-oriented storage

Answer 45

* Heap table * Sorted table * Hash table * Table cluster

Answer 46

A table where no order is imposed on rows ## Footnote [📖](https://share.evernote.com/note/9223d2c9-49d5-6ab7-ea38-ff63351a4fc7)

Answer 47

Maintains a **list** of blocks and the **address** of the *first available* space for inserts ## Footnote [📖](https://lite.evernote.com/note/9223d2c9-49d5-6ab7-ea38-ff63351a4fc7)

Answer 48

The database allocates a new block for inserts

Answer 49

The *space occupied by the row* is marked as free ## Footnote [📖](https://share.evernote.com/note/9223d2c9-49d5-6ab7-ea38-ff63351a4fc7)

Answer 50

As a linked list ## Footnote [📖](https://share.evernote.com/note/998024b0-3288-6d85-354e-cfce28814768)

Answer 51

Insert operations ## Footnote [📖](https://lite.evernote.com/note/9223d2c9-49d5-6ab7-ea38-ff63351a4fc7)

Answer 52

Database administrators can override the default structure to optimize performance for specific queries.

Answer 53

A heap table is a table structure with no specific order for rows, where *rows are stored in the order they are inserted*.

Answer 54

1. The database maintains a list of blocks and the address of the first available space for inserts. 2. If blocks are full, a new block is allocated.

Answer 55

The *free space* left behind by a deleted row is recorded and managed using a separate **linked list data structure**.
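A minimal sketch of a heap table that tracks free space as a linked list. The class names, the slot-level granularity, and the choice to reuse the most recently freed slot first are all assumptions; real databases manage free space per block and may reuse it in a different order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FreeSlot:
    next_free: Optional[int]  # link to the next free slot, or None at the end


class HeapTable:
    def __init__(self):
        self.slots = []         # rows in insert order, plus FreeSlot markers
        self.first_free = None  # head of the free-space linked list

    def insert(self, row) -> int:
        if self.first_free is None:
            self.slots.append(row)  # no free space: extend storage
            return len(self.slots) - 1
        slot = self.first_free
        self.first_free = self.slots[slot].next_free  # advance the list head
        self.slots[slot] = row
        return slot

    def delete(self, slot: int) -> None:
        # Mark the space as free and link it in front of the current head.
        self.slots[slot] = FreeSlot(next_free=self.first_free)
        self.first_free = slot


table = HeapTable()
for name in ("a", "b", "c"):
    table.insert(name)
table.delete(1)
print(table.insert("d"))  # 1
```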

Answer 56

Heap tables **optimize insert operations** and are particularly fast for bulk loads, but are *not optimal for queries that require rows in a specific order*.

Answer 57

A sorted table organizes rows based on a **designated sort column**, typically the primary key.

Answer 58

1. Rows are assigned to **blocks** according to *the value of the sort column*. 2. The database maintains **pointers** to efficiently manage order during inserts.

Answer 59

Sorted tables are optimal for **read queries** that *utilize the sort column*, such as `JOIN`s and `SELECT` *with ranges*.

Answer 60

A hash table assigns rows to **buckets** based on a hash key, *usually the primary key*, using a hash function.

Answer 61

Rows are distributed across buckets using a hash function, where *each bucket contains a block or group of blocks*.

Answer 62

Hash tables are *optimal for inserting and deleting individual rows quickly*, but are inefficient for range-based queries.

Answer 63

A table cluster interleaves rows from related tables **based on a cluster key**, typically involving a primary key and *a corresponding foreign key*. ## Footnote [📖](https://lite.evernote.com/note/9223d2c9-49d5-6ab7-ea38-ff63351a4fc7)

Answer 64

Table clusters optimize queries that join on the cluster key by *ensuring that related rows are stored physically close together*. ## Footnote [📖](https://lite.evernote.com/note/9223d2c9-49d5-6ab7-ea38-ff63351a4fc7)

Answer 65

Table clusters are not optimal for queries that join on non-cluster keys or for *reading multiple rows from a single table*. ## Footnote [📖](https://lite.evernote.com/note/9223d2c9-49d5-6ab7-ea38-ff63351a4fc7)

Answer 66

* Heap Table * Sorted Table * Hash Table * Table Cluster

Answer 67

read-heavy

Answer 68

1. When you query a **heap table**, the database system may *return rows in _any order_ it finds them*. 2. This order might be based on the physical storage location or the order in which rows were inserted, but it's not predictable or reliable.

Answer 69

When a block is full, *it splits to create space* for new inserts.

Answer 70

A type of database table structure that allows for **dynamic storage of data** with free space management.

Answer 71

The first inserted row is placed at the location *pointed to by the free space pointer*, which is then reset to point to free space B. ## Footnote [Order 📖](https://g.co/gemini/share/e5abf32979c8)

Answer 72

The free space pointer points to free space C. ## Footnote [Order 📖](https://g.co/gemini/share/e5abf32979c8)

Answer 73

It is reset to the space at the end of the block. (Right side) ## Footnote [🤔](https://g.co/gemini/share/d3f36d793314)

Answer 74

To track available space for new row inserts.

Answer 75

* Free space A * Free space B * Free space C * New block

Answer 76

It allows for continued storage of data when the current block becomes full.

Answer 77

The fourth insert goes to the beginning of the new block.

Answer 78

1. Convert the hash key by *interpreting the key's bits* as an integer value.
2. Divide the *integer* by the *number of buckets*.
3. Interpret the division remainder as the *bucket number*.
4. Convert the bucket number to the physical *address of the block* containing the row.
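The four steps above can be sketched in a few lines. The function names are hypothetical, and the address calculation in step 4 assumes buckets are laid out contiguously as fixed-size blocks, which the cards do not specify.

```python
def bucket_number(hash_key: bytes, num_buckets: int) -> int:
    as_integer = int.from_bytes(hash_key, byteorder="big")  # step 1: bits as an integer
    return as_integer % num_buckets                         # steps 2-3: division remainder


def block_address(bucket: int, base_address: int, block_size: int) -> int:
    # Step 4, assuming contiguous fixed-size blocks starting at base_address.
    return base_address + bucket * block_size


bucket = bucket_number(b"\x00\x0a", 4)  # key bits interpreted as the integer 10
print(bucket)                           # 2
print(block_address(bucket, 0, 8192))   # 16384
```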

Answer 79

A scheme for organizing rows *in blocks* on storage media.

Answer 80

* Heap table * Sorted table * Hash table * Table cluster

Answer 81

Databases assign a default structure to all tables.

Answer 82

Yes, to optimize performance for specific queries.

Answer 83

No order is imposed on rows; a list of blocks and first available space *for inserts is maintained*.

Answer 84

They are particularly fast for **bulk load** of many rows since rows are stored in load order.

Answer 85

No, they are not optimal for such queries as rows are scattered randomly.

Answer 86

To compute the bucket *containing the row* from the hash key. ## Footnote [📖](https://share.evernote.com/note/90eb6b03-e448-8090-ff3f-bb88abf8f9a4)

Answer 87

It determines the bucket number *from the hash key*.

Answer 88

Buckets that contain *long chains of linked blocks* due to fixed hash function allocation. ## Footnote [📖 Fixed](https://share.evernote.com/note/90eb6b03-e448-8090-ff3f-bb88abf8f9a4)

Answer 89

They *automatically allocate more blocks* and distribute rows across all buckets. ## Footnote [📖](https://share.evernote.com/note/90eb6b03-e448-8090-ff3f-bb88abf8f9a4)

Answer 90

* Inserts and deletes of individual rows * Selecting a single row *with specified hash key* value

Answer 91

They are slow on queries that select many rows with a *range of values*.

Answer 92

A sort column identified by the database designer.

Answer 93

Usually the primary key, but can be a non-key column or group of columns.

Answer 94

Maintaining *correct sort order* can be slow during inserts and updates.

Answer 95

The block splits in two, moving *half the rows* to a new block.
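A minimal sketch of a block split in a sorted table, assuming each block is a sorted list of sort-column values and `capacity` is the maximum number of rows per block. The function name and data layout are illustrative assumptions.

```python
import bisect


def insert_sorted(blocks, key, capacity):
    """Insert key into a list of sorted blocks, splitting a full block in two."""
    # Pick the first block whose largest key is >= key, else the last block.
    idx = next((i for i, b in enumerate(blocks) if b and key <= b[-1]),
               len(blocks) - 1)
    block = blocks[idx]
    if len(block) >= capacity:
        half = len(block) // 2
        new_block = block[half:]  # move half the rows to a new block
        del block[half:]
        blocks.insert(idx + 1, new_block)
        if key > block[-1]:
            block = new_block
    bisect.insort(block, key)


blocks = [[10, 20, 30, 40]]
insert_sorted(blocks, 25, capacity=4)
print(blocks)  # [[10, 20], [25, 30, 40]]
```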

Answer 96

Multi-tables.

Answer 97

Rows of two or more tables are *interleaved in the same storage area* based on a cluster key.
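Interleaving can be pictured with a small sketch. The two sample tables, the tagged tuples, and the function name are assumptions for illustration; the cluster key here is the department id.

```python
departments = [(10, "Sales"), (20, "IT")]                    # (dept_id, name)
employees = [(1, "Ann", 10), (2, "Bob", 20), (3, "Cy", 10)]  # (emp_id, name, dept_id)


def interleave(parents, children):
    """Store each parent row followed by the child rows sharing its cluster key."""
    storage = []
    for parent in sorted(parents):
        cluster_key = parent[0]
        storage.append(("department", parent))
        storage.extend(("employee", child)
                       for child in children if child[2] == cluster_key)
    return storage


for entry in interleave(departments, employees):
    print(entry)
```

A join on the department id then reads neighboring entries, while a scan of employees alone has to skip over department rows.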

Answer 98

It determines the order in which rows are interleaved.

Answer 99

They are optimal *when joining interleaved tables* on the cluster key.

Answer 100

1. Retrieving many rows from a table within the cluster, when *they are not related by the cluster key*, can be inefficient. 2. Joining tables using columns that aren't the primary *key of the cluster* can be slow. 3. Changing the value of the cluster key in a row requires moving the row to a different physical location in the storage, which can be expensive.

Answer 101

Number of rows `/` rows *per table block*

Answer 102

1. Number of index blocks 2. `+` referenced table blocks

Answer 103

1. log base 2 (number of index blocks) 2. `+` referenced table blocks ## Footnote This means you calculate the log base 2 of the number of index blocks

Answer 104

1. log base *fan-out* (number of rows), rounded up 2. `+` referenced table blocks ## Footnote This means you calculate the log base of the number of rows +..

Answer 105

1. log base *fan-out* (number of rows / rows per table block), rounded up 2. `+` referenced table block
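The block-count formulas on the last few cards can be written as small functions. The card questions are not shown here, so the function names are guesses at what each formula measures; rounding the base-2 logarithm up is also an assumption. The logarithm is computed with integers to avoid floating-point rounding errors.

```python
def levels(entries: int, base: int) -> int:
    """Smallest L with base**L >= entries, i.e. log base `base`, rounded up."""
    count, reach = 0, 1
    while reach < entries:
        reach *= base
        count += 1
    return count


def table_scan_blocks(rows: int, rows_per_block: int) -> int:
    return -(-rows // rows_per_block)  # number of rows / rows per table block


def index_scan_blocks(index_blocks: int, table_blocks: int) -> int:
    return index_blocks + table_blocks


def binary_search_blocks(index_blocks: int, table_blocks: int) -> int:
    return levels(index_blocks, 2) + table_blocks


def dense_multilevel_blocks(rows: int, fan_out: int, table_blocks: int) -> int:
    return levels(rows, fan_out) + table_blocks


def sparse_multilevel_blocks(rows: int, rows_per_block: int, fan_out: int) -> int:
    return levels(table_scan_blocks(rows, rows_per_block), fan_out) + 1


print(table_scan_blocks(1_000_000, 100))              # 10000
print(dense_multilevel_blocks(1_000_000, 100, 1))     # 4
print(sparse_multilevel_blocks(1_000_000, 100, 100))  # 3
```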

Answer 106

No, hash tables cannot have a primary index ## Footnote Rows of a hash table are not stored in sort order.

Answer 107

Secondary indexes ## Footnote Secondary indexes can be any structure, including hash.

Answer 108

As a sparse multi-level index ## Footnote In principle, a primary index can be structured as a hash index.

Answer 109

Applying a hash function to a hash key ## Footnote Hash keys do not store table block pointers.

Answer 110

Hash indexes are typically implemented as single-level indexes ## Footnote This means that the hash function *directly maps the key to an index position* within a hash table, without requiring any intermediate lookups.

Answer 111

Yes, a hash index can be sparse. ## Footnote meaning it doesn't include entries for all documents in a collection

Answer 112

An index entry in a bucket.

Answer 113

No, hash indexes are never sparse.

Answer 114

hash index.

Answer 115

1. Apply the hash function to the column value to compute a bucket number.
2. Read the index blocks for the bucket number.
3. Find the index entry for the column value and read the table block pointer.
4. Read the table block containing the row.
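The lookup steps above can be sketched with in-memory dictionaries standing in for index and table blocks. The sample data, the modulo hash function, and all names are assumptions for illustration.

```python
NUM_BUCKETS = 4


def hash_bucket(value: int) -> int:
    return value % NUM_BUCKETS


# Table blocks: block pointer -> rows of (id, name).
table_blocks = {0: [(17, "Ann")], 1: [(42, "Bob"), (8, "Cy")]}

# Hash index: bucket number -> entries of (column value, table block pointer).
index_buckets = {b: [] for b in range(NUM_BUCKETS)}
for pointer, rows in table_blocks.items():
    for key, _ in rows:
        index_buckets[hash_bucket(key)].append((key, pointer))


def lookup(value: int):
    bucket = hash_bucket(value)           # step 1: compute the bucket number
    entries = index_buckets[bucket]       # step 2: read the bucket's index entries
    for key, pointer in entries:          # step 3: find the entry and its pointer
        if key == value:
            block = table_blocks[pointer]  # step 4: read the table block
            return next(row for row in block if row[0] == value)
    return None


print(lookup(42))  # (42, 'Bob')
```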

Answer 116

1. Determine the index column corresponding to the table value.
2. Read the index column and find index rows that are set to 'one'.
3. Determine table rows corresponding to the index rows.
4. Determine pointers to blocks containing the table rows.
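A minimal bitmap index sketch following the steps above. The sample column values, the fixed number of rows per block, and the mapping from row number to block by integer division are assumptions.

```python
# Value of the indexed column for table rows 0..5.
colors = ["red", "blue", "red", "green", "blue", "red"]
ROWS_PER_BLOCK = 2  # assumed fixed, so row number // 2 gives the block

# One index column (a list of bits) per distinct value.
bitmap = {value: [1 if c == value else 0 for c in colors]
          for value in sorted(set(colors))}


def blocks_for(value: str):
    bits = bitmap[value]                                       # steps 1-2
    row_numbers = [i for i, bit in enumerate(bits) if bit == 1]  # step 3
    return sorted({n // ROWS_PER_BLOCK for n in row_numbers})     # step 4


print(bitmap["red"])      # [1, 0, 1, 0, 0, 1]
print(blocks_for("red"))  # [0, 1, 2]
```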

Answer 117

1. Look up the column value in the logical index to find the primary key value.
2. Look up the primary key value in the primary index to find the table block pointer.
3. Read the table block containing the row.
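The extra read that a logical index requires shows up clearly in a sketch. The dictionaries, sample rows, and function name below are hypothetical stand-ins for index and table blocks.

```python
# Logical index: column value -> primary key value (not a block pointer).
logical_index = {"ann@example.com": 101, "bob@example.com": 102}

# Primary index: primary key value -> table block pointer.
primary_index = {101: 0, 102: 1}

# Table blocks: block pointer -> rows of (id, email).
table_blocks = {0: [(101, "ann@example.com")], 1: [(102, "bob@example.com")]}


def lookup_by_email(email: str):
    primary_key = logical_index[email]    # step 1: logical index lookup
    pointer = primary_index[primary_key]  # step 2: the additional read
    block = table_blocks[pointer]         # step 3: read the table block
    return next(row for row in block if row[0] == primary_key)


print(lookup_by_email("bob@example.com"))  # (102, 'bob@example.com')
```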

Answer 118

1. Specify a function on the column value to transform stored values.
2. Use the transformed values in the index to process queries instead of the original column values.
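A function index can be sketched by indexing a transformed value. Lowercasing names for case-insensitive search is an assumed example; the data and function name are hypothetical.

```python
names = {1: "Ann", 2: "BOB", 3: "ann"}  # row id -> stored column value

# Build the index on the transformed value (here: lowercase) instead of the raw value.
function_index = {}
for row_id, name in names.items():
    function_index.setdefault(name.lower(), []).append(row_id)


def find_case_insensitive(name: str):
    # Apply the same function to the search value before probing the index.
    return function_index.get(name.lower(), [])


print(find_case_insensitive("ANN"))  # [1, 3]
```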

Answer 119

A partition is a subset of table data

Answer 120

Associates each partition with a range of partition expression values using VALUES LESS THAN keywords
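Range partitioning can be sketched with a sorted list of upper bounds, each playing the role of one VALUES LESS THAN clause. The bounds, the partition names, and the catch-all last partition are assumptions for illustration.

```python
import bisect

UPPER_BOUNDS = [1990, 2000, 2010]       # each acts like VALUES LESS THAN (bound)
NAMES = ["p0", "p1", "p2", "p_rest"]    # p_rest catches everything above the last bound


def range_partition(expression_value: int) -> str:
    # bisect_right counts the bounds <= value, so a value equal to a bound
    # falls into the next partition, matching a strict "less than" test.
    return NAMES[bisect.bisect_right(UPPER_BOUNDS, expression_value)]


print(range_partition(1985))  # p0
print(range_partition(1990))  # p1
print(range_partition(2015))  # p_rest
```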

Answer 121

The highest range

Answer 122

Each partition is explicitly named by the database administrator

Answer 123

Associates each partition with an explicit list of partition expression values using the VALUES IN keywords

Answer 124

Hash partition

Answer 125

(partition expression value) modulo N
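The modulo rule can be tried on a run of consecutive values to see how rows spread across partitions. The function name and the sample values are assumptions.

```python
from collections import Counter


def hash_partition(expression_value: int, num_partitions: int) -> int:
    return expression_value % num_partitions  # (partition expression value) modulo N


assignments = Counter(hash_partition(v, 4) for v in range(100, 120))
print(sorted(assignments.items()))  # [(0, 5), (1, 5), (2, 5), (3, 5)]
```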

Answer 126

Similar to a hash partition, but the partition expression is determined automatically by the database

Answer 127

Automatically create one tablespace for each table

Answer 128

Improves query performance

Answer 129

The associated file is deleted and storage is released

Answer 130

Queries that scan tables on heavily fragmented files are slow

Answer 131

Minimizes fragmentation and optimizes table scans

Answer 132

By reducing the amount of data accessed by INSERT, UPDATE, DELETE, and SELECT statements

Answer 133

Horizontal partitioning is a subset of table rows; vertical partitioning is a subset of table columns

Answer 134

Horizontal partitioning

Answer 135

Table indexes are also partitioned

Answer 136

A partitioned table may not contain foreign keys and foreign keys may not refer to a partitioned table ## Footnote This restriction limits the usability of partitions in MySQL.

Answer 137

All partition columns must appear in all unique columns, including the primary key, of the partitioned table ## Footnote This requirement can complicate table design and limits flexibility.

Answer 138

Due to the restrictions and requirements, partitions have limited value in MySQL.

Answer 139

InnoDB and NDB ## Footnote MyISAM does not provide native partitioning support.

Answer 140

Insights into the partitioned tables and their structures ## Footnote This table helps users understand how their partitioning is set up.

Answer 141

False ## Footnote MyISAM does not provide native partitioning support.

Answer 142

User awareness ## Footnote This helps users understand the constraints they may encounter.

Answer 143

Tables, columns, and keys ## Footnote Logical design is an essential part of database schema design.

Answer 144

Indexes, table structures, and partitions ## Footnote Physical design focuses on how data is stored and accessed.

Answer 145

It affects query performance but never affects query results ## Footnote This distinction is crucial for database design.

Answer 146

Translates instructions generated by a query processor into low-level commands that access data on storage media ## Footnote Storage engines manage how data is stored and retrieved.

Answer 147

InnoDB ## Footnote InnoDB is widely used for its transaction support and referential integrity features.

Answer 148

* Full support for transaction management * Foreign keys * Referential integrity * Locking ## Footnote These features make InnoDB suitable for many applications requiring data integrity.

Answer 149

Limited transaction management and locking capabilities ## Footnote MyISAM is often used in scenarios with fewer data updates.

Answer 150

Analytic applications with limited data updates ## Footnote MyISAM's design is better suited for read-heavy workloads.

Answer 151

Stores all data in main memory ## Footnote MEMORY is used for fast access with databases small enough to fit in main memory.

Answer 152

* Heap * Sorted * Hash * Cluster ## Footnote Oracle's flexibility in table structures is beneficial for various applications.

Answer 153

* Heap * Sorted ## Footnote InnoDB's support is limited compared to other databases.

Answer 154

B+tree indexes ## Footnote B+tree indexes are commonly used for efficient data retrieval.

Answer 155

* B+tree * Hash ## Footnote The MEMORY engine's support for hash indexes allows for faster lookups in certain scenarios.

Answer 156

The type of the query being executed, such as SIMPLE, PRIMARY, or SUBQUERY. ## Footnote - SIMPLE: A straightforward query without nesting or unions. - PRIMARY: The main or outer SELECT in a nested query. - SUBQUERY: An inner SELECT inside a nested query.

Answer 157

The name of the table being described in that row of the EXPLAIN result. ## Footnote This helps identify which table's data is being analyzed in the query.

Answer 158

The join type being used in the query, such as const, range, eq_ref, or ALL. ## Footnote - const: The table has at most one matching row. - range: Only rows within a given range of index values are retrieved. - eq_ref: One row from this table is read for each combination of rows from the preceding tables. - ALL: Every row from the table is scanned.

Answer 159

Indexes that could potentially be used to speed up the query.

Answer 160

The specific index that the database has decided to use for the query. ## Footnote If it's NULL, no index was used and a full table scan was performed.

Answer 161

Constants or expressions compared with the selected index.

Answer 162

An estimate of how many rows will be read from the table to execute the query.

Answer 163

The estimated percentage of rows that qualify based on the query conditions.

(194 cards)