CIS275 - Chapter 5: Data Storage Flashcards
the time required to access the first byte in a read or write operation.
Access time
the speed at which data is read or written, following initial access.
Transfer rate
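
A rough way to see how the two terms above combine is to estimate total time as access time plus bytes divided by transfer rate. The sketch below assumes a hypothetical device with a 5 ms access time and a 100 MB/s transfer rate; the function name and figures are illustrative and do not come from the chapter.

```python
def transfer_time_seconds(num_bytes, access_time_s, transfer_rate_bytes_per_s):
    """Estimated time = initial access time + bytes / transfer rate."""
    return access_time_s + num_bytes / transfer_rate_bytes_per_s

# Hypothetical device: 5 ms access time, 100 MB/s transfer rate.
small_read = transfer_time_seconds(4_096, 0.005, 100_000_000)
large_read = transfer_time_seconds(100_000_000, 0.005, 100_000_000)

print(f"4 KB read:   {small_read:.6f} s")   # dominated by access time
print(f"100 MB read: {large_read:.6f} s")   # dominated by transfer rate
```

Under these assumptions, small reads are dominated by access time and large reads by transfer rate.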
memory that is lost when disconnected from power.
Volatile memory
memory that is retained without power.
Non-volatile memory
the primary memory used when computer programs execute.
Main memory, also called random-access memory (RAM)
Main memory is fast, expensive, and has limited capacity.

_____ is less expensive and higher capacity than main memory.
Flash memory, also called solid-state drive (SSD)
Writes to flash memory are much slower than reads, and both are much slower than main memory writes and reads.

Memory used to store large amounts of data.
Magnetic disk, also called hard-disk drive (HDD)
Magnetic disk is slower, less expensive, and higher capacity than flash memory.



Magnetic disk groups data in _____.
sectors
Traditionally 512 bytes per sector, but 4 kilobytes with newer disk formats.

Flash memory groups data in _____.
pages
Usually between 2 kilobytes and 16 kilobytes per page.

Databases and file systems use a uniform size, called a _____, when transferring data between main memory and storage media.

block
Block size is independent of storage media.
Database systems typically support block sizes ranging from 2 kilobytes to 64 kilobytes. Smaller block sizes are usually better for transactional applications, which access a few rows per query. Larger block sizes are usually better for analytic applications, which access many rows per query.



To minimize block transfers, relational databases usually store an entire row within one block, which is called _____.

row-oriented storage

Row-oriented storage performs best when row size is small relative to block size, for two reasons:
Improved query performance. When row size is small relative to block size, each block contains many rows. Queries that read and write multiple rows transfer fewer blocks, resulting in better performance.
Less wasted storage. Row-oriented storage wastes a few bytes per block, since rows do not usually fit evenly into the available space. The wasted space is less than the row size. If row size is small relative to block size, this wasted space is insignificant.
- The row size is small relative to the block size, so many rows are transferred per block.
- Only 16 bytes go unused, so wasted space is insignificant.
- The row size is large relative to block size, so fewer rows are transferred per block.
- One kilobyte goes unused, so wasted space is significant.
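
The arithmetic behind these two cases can be sketched as follows. The 4-kilobyte block and the row sizes of 120 and 1,536 bytes are assumptions chosen so that the unused space matches the 16 bytes and one kilobyte mentioned above; the original figures may have used different values.

```python
def rows_and_waste(block_size, row_size):
    """Return (whole rows that fit in one block, bytes left unused)."""
    rows = block_size // row_size
    wasted = block_size - rows * row_size
    return rows, wasted

BLOCK_SIZE = 4096  # assumed block size in bytes

small_rows = rows_and_waste(BLOCK_SIZE, 120)    # assumed small row size
large_rows = rows_and_waste(BLOCK_SIZE, 1536)   # assumed large row size

print(small_rows)  # (34, 16): many rows per block, 16 bytes unused
print(large_rows)  # (2, 1024): two rows per block, one kilobyte unused
```

In both cases the wasted space is less than one row, so it only matters when rows are large relative to the block.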

- The large column allows only two rows to be transferred per block and wastes significant space.
- Large columns are replaced by a link and are stored in a separate area.
- More rows fit per block, and less space is wasted. Queries that do not access the large column are faster.
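
A minimal sketch of why replacing a large column with a link helps, assuming a 4-kilobyte block, 100 bytes of small columns, a 1,400-byte large column, and an 8-byte link. All of these sizes are hypothetical.

```python
import math

BLOCK_SIZE = 4096      # assumed block size in bytes
SMALL_COLUMNS = 100    # assumed combined size of the small columns
LARGE_COLUMN = 1400    # assumed size of the large column
LINK_SIZE = 8          # assumed size of the link that replaces it

def blocks_to_scan(num_rows, row_size):
    """Blocks read when a query scans num_rows rows of the given size."""
    rows_per_block = BLOCK_SIZE // row_size
    return math.ceil(num_rows / rows_per_block)

# The query reads only the small columns of 1,000 rows.
inline_blocks = blocks_to_scan(1000, SMALL_COLUMNS + LARGE_COLUMN)
linked_blocks = blocks_to_scan(1000, SMALL_COLUMNS + LINK_SIZE)

print(inline_blocks)  # 500: only two rows fit per block
print(linked_blocks)  # 28: 37 rows fit per block
```

The large values still occupy space in the separate area, but queries that never follow the link do not pay to read them.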



In _____, each block stores values for a single column only.

column-oriented storage, also called columnar storage
Column-oriented storage benefits analytic applications in several ways:
Faster data access. More column values are transferred per block, reducing time to access storage media.
Better data compression. Databases often apply data compression algorithms when storing data. Data compression is usually more effective when all values have the same data type. As a result, more values are stored per block, which reduces storage and access time.
- With column-oriented storage, a 4 kilobyte block contains roughly 500 8-byte column values.
- Computing the average of all incomes reads fewer blocks than row-oriented storage.
- Different columns occupy separate blocks.
- Selecting multiple columns reads multiple blocks and is slower than row-oriented storage.
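
The trade-off in these points can be estimated with simple block counts. The sketch below assumes a 4-kilobyte block, 100,000 rows, 16 columns of 8 bytes each (128-byte rows), and no compression; the numbers and function names are illustrative only.

```python
import math

BLOCK_SIZE = 4096   # assumed block size in bytes
VALUE_SIZE = 8      # assumed size of each column value
NUM_COLUMNS = 16    # assumed number of columns per row
NUM_ROWS = 100_000  # assumed table size

def blocks_for_column_scan_row_oriented():
    """Scanning one column still reads every block of whole rows."""
    rows_per_block = BLOCK_SIZE // (VALUE_SIZE * NUM_COLUMNS)
    return math.ceil(NUM_ROWS / rows_per_block)

def blocks_for_column_scan_columnar():
    """Scanning one column reads only that column's blocks."""
    values_per_block = BLOCK_SIZE // VALUE_SIZE   # 512, roughly 500
    return math.ceil(NUM_ROWS / values_per_block)

average_row_oriented = blocks_for_column_scan_row_oriented()
average_columnar = blocks_for_column_scan_columnar()

# Fetching every column of a single row:
single_row_row_oriented = 1          # the whole row sits in one block
single_row_columnar = NUM_COLUMNS    # one block per column

print(average_row_oriented, average_columnar)        # 3125 196
print(single_row_row_oriented, single_row_columnar)  # 1 16
```

Under these assumptions, the single-column aggregate favors columnar storage, while fetching a full row favors row-oriented storage.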



a scheme for organizing rows in blocks on storage media.
table structure
Databases commonly support four alternative table structures:
- Heap table
- Sorted table
- Hash table
- Table cluster
In a _____, no order is imposed on rows.

heap table
- In a heap table, the database maintains a pointer to free space, which indicates the location of the next insert.
- When a row is inserted, the pointer moves to the next available space.
- When a row is deleted, the pointer moves to the deleted space. The deleted space is linked to free space at the end of the table.
- As more rows are deleted, free space is linked together in a list.
- Inserts go to the first available space in the list.
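
The free-space behavior described above can be sketched with an in-memory model. The class below is a hypothetical illustration, not how any particular database implements heap tables: slots stand in for row positions, and a deque stands in for the linked list of free space.

```python
from collections import deque

class HeapTable:
    """Toy heap table: rows are unordered, and deleted slots are reused."""

    def __init__(self):
        self.slots = []           # row storage; None marks a deleted slot
        self.free_list = deque()  # deleted slots, most recently freed first

    def insert(self, row):
        if self.free_list:
            slot = self.free_list.popleft()  # first available space in the list
            self.slots[slot] = row
        else:
            slot = len(self.slots)           # free space at the end of the table
            self.slots.append(row)
        return slot

    def delete(self, slot):
        self.slots[slot] = None
        self.free_list.appendleft(slot)      # pointer moves to the deleted space

table = HeapTable()
for name in ("a", "b", "c"):
    table.insert(name)
table.delete(1)
table.delete(0)
first = table.insert("d")   # reuses slot 0, the most recently deleted
second = table.insert("e")  # reuses slot 1
third = table.insert("f")   # list is empty, so the row goes to the end
print(first, second, third)  # 0 1 3
```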

In a _____, the database designer identifies a _____ that determines physical row order.

sorted table
sort column
- In a sorted table, rows are sorted on a sort column.
- Inserting a new row requires moving all subsequent rows, which is inefficient.
- To avoid inefficient inserts, the sort column order is maintained with links, producing a linked list.
- The sort order is maintained during an insert by changing two links rather than moving rows.
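
A minimal sketch of a sorted table maintained with links, assuming rows are appended in arrival order and each row stores the index of its successor in sort order. The class and field layout are hypothetical.

```python
class SortedTable:
    """Toy sorted table: physical order is arrival order, links give sort order."""

    def __init__(self):
        self.rows = []    # each entry is [sort_value, data, next_index]
        self.head = None  # index of the first row in sort order

    def insert(self, sort_value, data):
        new_index = len(self.rows)
        prev, cur = None, self.head
        # Walk the links to find where the new row belongs.
        while cur is not None and self.rows[cur][0] <= sort_value:
            prev, cur = cur, self.rows[cur][2]
        self.rows.append([sort_value, data, cur])  # link 1: new row -> successor
        if prev is None:
            self.head = new_index
        else:
            self.rows[prev][2] = new_index         # link 2: predecessor -> new row
        return new_index

    def scan(self):
        """Yield (sort_value, data) in sort order by following links."""
        cur = self.head
        while cur is not None:
            sort_value, data, cur = self.rows[cur]
            yield sort_value, data

sorted_table = SortedTable()
for value in (30, 10, 20):
    sorted_table.insert(value, f"row {value}")

physical_order = [row[0] for row in sorted_table.rows]
logical_order = [value for value, _ in sorted_table.scan()]
print(physical_order)  # [30, 10, 20]
print(logical_order)   # [10, 20, 30]
```

No existing row moves on insert; only links change, although finding the insert position in this sketch still requires walking the list.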



In a _____, rows are assigned to buckets.
hash table
A _____ is a block or group of blocks containing rows.
bucket
Initially, each bucket has one block. As a table grows, some buckets eventually fill up with rows, and the database allocates additional blocks. New blocks are linked to the initial block, and the bucket becomes a chain of linked blocks.
The bucket containing each row is determined by a hash function and a hash key.
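
The bucket and chaining behavior can be sketched as follows. The modulo hash function, the four buckets, and the two-row block capacity are assumptions made for illustration; real databases use larger blocks and their own hash functions.

```python
NUM_BUCKETS = 4      # assumed number of buckets
BLOCK_CAPACITY = 2   # assumed rows per block, kept tiny to show chaining

def hash_function(hash_key):
    """Illustrative hash function: hash key modulo the number of buckets."""
    return hash_key % NUM_BUCKETS

class HashTable:
    def __init__(self):
        # Initially each bucket is a chain containing one empty block.
        self.buckets = [[[]] for _ in range(NUM_BUCKETS)]

    def insert(self, hash_key, row):
        chain = self.buckets[hash_function(hash_key)]
        if len(chain[-1]) == BLOCK_CAPACITY:
            chain.append([])  # bucket is full: link a new block to the chain
        chain[-1].append((hash_key, row))

    def find(self, hash_key):
        # Only the blocks of one bucket are searched.
        for block in self.buckets[hash_function(hash_key)]:
            for key, row in block:
                if key == hash_key:
                    return row
        return None

hash_table = HashTable()
for key in (1, 5, 9, 2):
    hash_table.insert(key, f"row {key}")

print(len(hash_table.buckets[1]))  # 2: keys 1, 5, 9 overflow into a second block
print(hash_table.find(9))          # row 9
print(hash_table.find(7))          # None
```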
