Chapter 5 Data Storage Flashcards
What is transfer rate?
Speed of data reading or writing.
What is cost in terms of storage media?
Price per gigabyte of storage media.
What does capacity refer to?
Amount of data a media can store.
What is non-volatile memory?
Memory retained without power.
What are sectors?
Data groups on magnetic disks, traditionally 512 bytes.
Also hard-disk drive (HDD)
What are pages in flash memory?
Flash aka RAM
Data groups in flash memory, 2-16 kilobytes.
What are blocks in databases?
Uniform size for data transfer in databases.
What is a storage controller?
The storage controller needs to be able to convert between blocks and the appropriate unit of storage for the type of storage device being used.
Usually between blocks and sectors.
What is block size?
Uniform size specified by database administrator.
What are transactional applications?
Access few rows per query, prefer smaller blocks.
What is the standard block size in storage systems?
4 Kilobytes.
What is bulk load?
Fast insertion of multiple rows into a table.
What is an SQL query?
Command to extract or manipulate data.
What is a primary key?
Unique identifier for each row in a table.
What is a NoSQL database?
Non-relational database for unstructured data.
What is a free space pointer?
Reference to available storage in a table.
What is a free space linked list?
- The primary purpose of a free space linked list is to manage free memory blocks efficiently.
- When memory is allocated to an application, this data structure helps keep track of what portions of memory are available for future allocations.
What is an insert operation?
Adding data to the first available space.
What is an update operation?
Modifying existing row data in a table.
What is a JOIN query?
Combines rows from two or more tables based on conditions.
What are efficient inserts?
Inserts that quickly determines row location
via hash key.
Who is a database administrator?
Person managing database structures and performance.
Aka root account
What is data distribution?
Method of organizing data across storage.
What is block allocation?
Process of assigning additional storage blocks.
What is table structure?
Arrangement of data in database tables.
What is an index entry?
Data Address: Data value
Pointer to a specific data location.
What is block read time?
Time taken to read a block from disk.
What is query performance?
Efficiency of retrieving data from tables.
What is a bitmap index?
Uses bits to represent data presence in rows.
What is read query performance?
Logical index that requires additional read.
Slower than physical.
What is a hash table?
Data structure using hash functions for indexing.
Indexing: assigns rows to buckets
What is index storage?
Indexes stored in the same tablespace as tables.
What are concurrent updates?
Simultaneous modifications to multiple tables.
What is fragmentation?
Scattering of data blocks across storage.
What is fast storage media?
Fast storage media refers to storage hardware that can be accessed quickly, improving performance for frequently used data.
What is the size of sectors used in magnetic memory?
512 bytes
Magnetic memory typically uses 512-byte sectors for data storage.
How many sectors are required to store one megabyte in magnetic memory?
Approximately 2000 sectors
One megabyte requires 1,000,000 bytes, which when divided by 512 bytes/sector results in about 2000 sectors. 📖
What is the size of sectors used in newer storage systems?
- Older standard: 512 bytes per sector
- Newer standard: 4096 bytes (4KB) per sector
How many sectors are required to store one megabyte in newer systems with 4KB sectors?
250 sectors
What is the minimum block size that must be transferred into memory from flash memory?
8 kilobytes
A minimum of one eight-kilobyte block must be transferred into memory, despite reading four kilobytes from flash memory.
What is the page size of flash memory?
2 kilobytes
Flash memory page size is typically two kilobytes.
If a user runs a query that reads two pages of flash memory, how many blocks are transferred to main memory?
- One block
- Even though two pages (4 kilobytes total) are read, only one block of eight kilobytes is transferred.
What is transferred to the database in blocks.
Data transfers to the database occurs in blocks, not individual bytes.
What type of applications are most relational databases optimized for?
Transactional applications.
What is row-oriented storage?
A storage method where an entire row is stored within one block.
When does row-oriented storage perform best?
When row size is small relative to block size.
What are the two reasons row-oriented storage performs best with small row sizes?
- Improved query performance
- Less wasted storage
What happens to storage when row size is small relative to block size?
Wasted space is insignificant.
What is a common approach for tables containing very large columns?
Each row contains a link to the large column stored separately.
What types of applications are some newer relational databases optimized for?
Analytic applications.
What is column-oriented storage?
A storage method where each block stores values for a single column only.
What are the benefits of column-oriented storage for analytic applications?
- Faster data access
- Better data compression
Also use the same data type
Why is data compression more effective in column-oriented storage?
Because all values have the same data type.
What is a disadvantage of column-oriented storage for transactional applications?
column-oriented storage is often associated with NoSQL databases
- Column-oriented storage is bad for transactions needing full row access.
- It requires accessing multiple blocks to retrieve a single row.
- This is because data is stored by column, not by row.
Which two relational databases support column-oriented storage.
- PostgreSQL
- Vertica
Why are NoSQL databases not optimized for transactional applications.
- NoSQL databases are built for managing large amounts of unstructured data, emphasizing high scalability and availability.
- Transactional applications, on the other hand, require strong consistency and ACID properties, which are not typically prioritized in NoSQL databases.
(atomicity, consistency, isolation, durability)
The term column-oriented & columnar represent _ ?
- Mean a technique for organizing data on storage media.
- Sometimes these terms mean a type of NoSQL database, commonly called wide column database.
What type of storage performs better than column-oriented storage for most transactional databases?
Row-oriented storage
What are the four alternative table structures supported by databases?
- Heap table
- Sorted table
- Hash table
- Table cluster
How does a database manage inserts in a heap table?
Maintains a list of blocks and the address of the first available space for inserts
What happens when all blocks in a heap table are full?
The database allocates a new block for inserts
What occurs when a row is deleted from a heap table?
The space occupied by the row is marked as free
- If each table in a database can have a different structure, and databases assign a default structure to all tables.
- How can the default structure of a table in a database be modified?
Database administrators can override the default structure to optimize performance for specific queries.
What is a heap table?
A heap table is a table structure with no specific order for rows, where rows are stored in the order they are inserted.
How does a database manage insertions in a heap table?
- The database maintains a list of blocks and the address of the first available space for inserts.
- If blocks are full, a new block is allocated.
What happens to space when a row is deleted in a heap table?
The available free space defined as a space from the deleted row; is recorded and managed using a separate linked list data structure.
What are the performance characteristics of heap tables?
Heap tables optimize insert operations and are particularly fast for bulk loads, but are not optimal for queries that require rows in a specific order.
What is a sorted table?
A sorted table organizes rows based on a designated sort column, typically the primary key.
How are rows managed in a sorted table?
- Rows are assigned to blocks according to the value of the sort column.
- The databases maintain pointers to efficiently manage order during inserts.
What are the advantages of using sorted tables?
Sorted tables are optimal for read queries that utilize the sort column, such as JOIN
s and SELECT
with ranges.
How does a hash table assign rows?
A hash table assigns rows to buckets based on a hash key, usually the primary key, using a hash function.
Describe the storage mechanism of a hash table.
Rows are distributed across buckets using a hash function, where each bucket contains a block or group of blocks.
What are the performance benefits of hash tables?
Hash tables are optimal for inserting and deleting individual rows quickly, but are inefficient for range-based queries.
What is a table cluster?
A table cluster interleaves rows from related tables based on a cluster key, typically involving a primary key and a corresponding foreign key.
How does a table cluster improve query performance?
Table clusters optimize queries that join on the cluster key by ensuring that related rows are stored physically close together.
What are the limitations of table clusters?
Table clusters are not optimal for queries that join on non-cluster keys or for reading multiple rows from a single table.
What are the different types of table structures in databases?
- Heap Table
- Sorted Table
- Hash Table
- Table Cluster
A sorted table is best for _______ queries.
read-heavy
Which table structure are useless for queries that require specific row ordering.
- When you query a heap table, the database system may return rows in any order it finds them.
- This order might be based on the physical storage location or the order in which rows were inserted, but it’s not predictable or reliable.
What happens when a sorted table’s blocks fill up?
When a block is full, it splits to create space for new inserts.
What is a heap table?
A type of database table structure that allows for dynamic storage of data with free space management.
What happens when the first row is inserted into a heap table?
The first inserted row is placed at the location pointed to by the free space pointer, which is then reset to point to free space B.
After the second insert in a heap table, where does the free space pointer point?
The free space pointer points to free space C.
What happens to the free space pointer after the second insert?
It is reset to the space at the end of the block. (Right side)
What is the purpose of the free space pointer in a heap table?
To track available space for new row inserts.
List the order of free spaces used during the insertions in the heap table.
- Free space A
- Free space B
- Free space C
- New block
What is the significance of allocating a new block in a heap table?
It allows for continued storage of data when the current block becomes full.
What occurs after the fourth insert in a heap table?
The fourth insert goes to the beginning of the new block.
What is the modulo function 4 steps?
- Convert the hash key by interpreting the key’s bits as an integer value.
- Divide the integer by the number of buckets.
- Interpret the division remainder as the bucket number.
- Convert the bucket number to the physical address of the block containing the row.
What is a table structure in a database?
A scheme for organizing rows in blocks on storage media.
What are the four alternative table structures supported by databases?
- Heap table
- Sorted table
- Hash table
- Table cluster
What is the default behavior of databases regarding table structures?
Databases assign a default structure to all tables.
Can database administrators modify the default table structure?
Yes, to optimize performance for specific queries.
Describe the organization of rows in a heap table.
No order is imposed on rows; a list of blocks and first available space for inserts is maintained.
How do heap tables optimize insert operations?
They are particularly fast for bulk load of many rows since rows are stored in load order.
Are heap tables optimal for reading rows in a specific order?
No, they are not optimal for such queries as rows are scattered randomly.
What is the purpose of a hash function in a hash table?
To compute the bucket, containing the row from the hash key.
What does the modulo function do in hash tables?
It determines the bucket number from the hash key.
What are deep buckets in hash tables?
Buckets that contain long chains of linked blocks due to fixed hash function allocation.
What is the advantage of using dynamic hash functions?
They automatically allocate more blocks and distribute rows across all buckets.
When are hash tables optimal?
- Inserts and deletes of individual rows
- Selecting a single row with specified hash key value
Why are hash tables slow for certain queries?
They are slow on queries that select many rows with a range of values.
What determines the physical row order in a sorted table?
A sort column identified by the database designer.
What is the typical use of the sort column in sorted tables?
Usually the primary key, but can be a non-key column or group of columns.
What is a potential downside of maintaining sorted tables?
Maintaining correct sort order can be slow during inserts and updates.
What happens when a block in a sorted table is full?
The block splits in two, moving half the rows to a new block.
What are table clusters also known as?
Multi-tables.
How are rows stored in table clusters?
Rows of two or more tables are interleaved in the same storage area based on a cluster key.
What is the function of the cluster key in table clusters?
It determines the order in which rows are interleaved.
Why are table clusters optimal for certain queries?
They are optimal when joining interleaved tables on the cluster key.
What are the performance issues with table clusters for other queries?
- Join on columns other than cluster key
- Read multiple rows of a single table
- Update cluster key
What is the steps for Calculating Storage Metrics in Database Management
What is the steps for Calculating Storage Metrics in Database Management
Fill in the blanks
What is the formula for a table scan?
Number of rows / rows per table block
How do you calculate a single-level index scan?
Number of index blocks plus referenced table blocks
What is the formula for a single-level index binary search?
log base 2 (number of index blocks) plus referenced table blocks
How is a multi-level dense index search calculated?
log base fan-out (number of rows), rounded up, plus referenced table blocks
What is the formula for a multi-level sparse index search?
log base fan-out (number of rows / rows per table block), rounded up, plus referenced table block
Can a hash table have a primary index?
No, hash tables cannot have a primary index
Rows of a hash table are not stored in sort order.
What type of indexes can a hash table have?
Secondary indexes
Secondary indexes can be any structure, including hash.
How is a primary index usually structured?
As a sparse multi-level index
In principle, a primary index can be structured as a hash index.
What determines the location of a row containing a hash key value?
Applying a hash function to a hash key
Hash keys do not store table block pointers.
Are hash keys implemented as single- or multi-level indexes?
No, hash keys are not implemented as single- or multi-level indexes.
Can a hash index be sparse?
Yes, a hash index can be sparse.
What does each column value appear as in a hash index?
An index entry in a bucket.
Are hash indexes ever sparse?
No, hash indexes are never sparse.
Fill in the blank: A hash table can have a _______ index.
hash index.
- What are the steps to locate rows containing a table value for the following index?
- Hash Index
- Apply the hash function to the column value to compute a bucket number
- Read the index blocks for the bucket number.
- Find the index entry for the column value and read the table block pointer.
- Read the table block containing the row.
- What are the steps to locate rows containing a table value for the following index?
- Bitmap Index
- Determine the index column coresponding to the table value.
- Read the index column and find index rows that are set to ‘one’.
- Determine table rows corresponding to the index rows.
- Determine pointers to blocks containing the table rows.
- What are the steps to locate rows containing a table value for the following index?
- Logical Index
- Look up the column value in the logical index to find the primary key value.
- Look up the primary key value in the primary index to find the table block pointer.
- Read the table block containing the row.
- What are the steps to locate rows containing a table value for the following index?
- Function Index
- Specify a function on the column value to transform stored values.
- Use the transformed values in the index to process queries instead of the original column values.
What is a partition in the context of database tables?
A partition is a subset of table data
How does a range partition assign rows to partitions?
Associates each partition with a range of partition expression values using VALUES LESS THAN keywords
What keyword represents the highest column value in a range partition?
MAXVALUE
What does VALUES LESS THAN MAXVALUE specify in a range partition?
The highest range
How are partitions named in a range partition?
Each partition is explicitly named by the database administrator
What is a list partition?
Associates each partition with an explicit list of partition expression values using the VALUES IN keywords
What type of partition requires a partition expression with positive integer values?
Hash partition
How is the partition number computed in a hash partition?
(partition expression value) modulo N
What is a key partition in MySQL?
Similar to a hash partition, but the partition expression is determined automatically by the database
What is the default behavior for tablespaces in most databases?
Automatically create one tablespace for each table
What is one benefit of storing frequently accessed tables in separate tablespaces?
Improves query performance
What happens to the associated file when a table is dropped?
The associated file is deleted and storage is released
What is the impact of fragmentation on query performance?
Queries that scan tables on heavily fragmented files are slow
What is the advantage of storing one table per file?
Minimizes fragmentation and optimizes table scans
How do partitions improve query performance?
By reducing the amount of data accessed by INSERT, UPDATE, DELETE, and SELECT statements
What is the difference between horizontal and vertical partitioning?
Horizontal partitioning is a subset of table rows; vertical partitioning is a subset of table columns
Which type of partitioning do MySQL and most relational databases use?
Horizontal partitioning
What happens to table indexes when a table is partitioned?
Table indexes are also partitioned
Fill in the blank: Each partition contains index entries only for rows in the _______.
partition
What are the restrictions on partitioned tables in MySQL?
A partitioned table may not contain foreign keys and foreign keys may not refer to a partitioned table
This restriction limits the usability of partitions in MySQL.
What must all partition columns do in MySQL?
All partition columns must appear in all unique columns, including the primary key, of the partitioned table
This requirement can complicate table design and limits flexibility.
Why do partitions lack value in MySQL?
Due to the restrictions and requirements, partitions have limited value in MySQL.
Which storage engines support partitioning in MySQL?
InnoDB and NDB
MyISAM does not provide native partitioning support.
What does the PARTITIONS table in the INFORMATION_SCHEMA database provide?
Insights into the partitioned tables and their structures
This table helps users understand how their partitioning is set up.
True or False: MyISAM supports native partitioning in MySQL.
False
MyISAM does not provide native partitioning support.
What is the purpose of documenting known restrictions and limitations in MySQL 8.0 regarding partitioning?
User awareness
This helps users understand the constraints they may encounter.
Fill in the blank: Partitioning is supported only by _______ and NDB storage engines.
InnoDB
What does logical design specify?
Tables, columns, and keys
Logical design is an essential part of database schema design.
What does physical design specify?
Indexes, table structures, and partitions
Physical design focuses on how data is stored and accessed.
How does physical design affect query performance?
It affects query performance but never affects query results
This distinction is crucial for database design.
What is the role of a storage engine?
Translates instructions generated by a query processor into low-level commands that access data on storage media
Storage engines manage how data is stored and retrieved.
What is the default storage engine installed with MySQL?
InnoDB
InnoDB is widely used for its transaction support and referential integrity features.
What are the key features of InnoDB?
- Full support for transaction management
- Foreign keys
- Referential integrity
- Locking
These features make InnoDB suitable for many applications requiring data integrity.
What are the transaction management capabilities of MyISAM?
Limited transaction management and locking capabilities
MyISAM is often used in scenarios with fewer data updates.
What type of applications is MyISAM commonly used for?
Analytic applications with limited data updates
MyISAM’s design is better suited for read-heavy workloads.
What does the MEMORY storage engine do?
Stores all data in main memory
MEMORY is used for fast access with databases small enough to fit in main memory.
Which table structures does Oracle Database support?
- Heap
- Sorted
- Hash
- Cluster
Oracle’s flexibility in table structures is beneficial for various applications.
What table structures does MySQL with InnoDB support?
- Heap
- Sorted
InnoDB’s support is limited compared to other databases.
What index types does MySQL with InnoDB or MyISAM support?
B+tree indexes
B+tree indexes are commonly used for efficient data retrieval.
What index types does MySQL with MEMORY support?
- B+tree
- Hash
The MEMORY engine’s support for hash indexes allows for faster lookups in certain scenarios.
What does the column ‘select_type’ indicate in the EXPLAIN statement output?
The type of the query being executed, such as SIMPLE, PRIMARY, or SUBQUERY.
- SIMPLE: A straightforward query without nesting or unions.
- PRIMARY: The main or outer SELECT in a nested query.
- SUBQUERY: An inner SELECT inside a nested query.
What information does the ‘table’ column provide in the EXPLAIN output?
The name of the table being described in that row of the EXPLAIN result.
This helps identify which table’s data is being analyzed in the query.
What does the ‘type’ column describe in the EXPLAIN statement?
The join type being used in the query, such as const, range, eq_ref, or ALL.
- const: The table has a row that matches at most one condition.
- range: The query uses a constant to filter rows.
- eq_ref: One row from this table is selected for each matching row of another table.
- ALL: Every row from the table is scanned.
What does ‘possible_keys’ represent in the EXPLAIN output?
Indexes that could potentially be used to speed up the query.
What does the ‘key’ column indicate in the output of the EXPLAIN statement?
The specific index that the database has decided to use for the query.
If it’s NULL, no index was used and a full table scan was performed.
What information does the ‘ref’ column provide in the EXPLAIN output?
Constants or expressions compared with the selected index.
What does the ‘rows’ column estimate in the context of an EXPLAIN query?
How many rows will be read from the table to execute the query.
What does the ‘filtered’ column estimate in the EXPLAIN output?
The number of rows that qualify based on the query conditions.