Data Storage Flashcards
Speed is measured as blank and blank
access time and transfer rate
Blank is the time required to access the first byte in a read or write operation.
Access time
Blank is the speed at which data is read or written, following initial access.
Transfer rate
Blank typically ranges from pennies to dollars per gigabyte of memory, depending on the media type.
Cost
In principle, any media type can store any amount of data. In practice, blank is limited by cost.
capacity
Blank is memory that is lost when disconnected from power.
Volatile memory
Blank is retained without power.
Non-volatile memory
Computer media vary on what four important dimensions:
Speed.
Cost.
Capacity.
Volatility.
Three types of media are most important for database management:
Name them
Main memory, also called random-access memory (RAM)
Flash memory, also called solid-state drive (SSD)
Magnetic disk, also called hard-disk drive (HDD),
Blank, also called random-access memory (RAM), is the primary memory used when computer programs execute. Blank is fast, expensive, and has limited capacity.
Main Memory
Blank, also called solid-state drive (SSD), is less expensive and higher capacity than main memory. Writes to blank are much slower than reads, and both are much slower than main memory writes and reads.
Flash memory
Blank, also called hard-disk drive (HDD), is used to store large amounts of data. Blank is slower, less expensive, and higher capacity than flash memory.
Magnetic Disk
Main memory is blank
volatile
Flash memory and magnetic disk are blank and therefore considered blank
non-volatile
storage media
Databases store data permanently on blank and transfer data to and from blank during program execution.
storage media
main memory
Main memory has an access time of blank microseconds, transfer rate of blank, cost per gb of blank, and is blank
.01 to .1
>10
>1
Volatile
Flash memory has an access time or blank microseconds , transfer rate of blank gb/sec, cost per gb of blank, and is blank
20 to 100
.5 to 3
around .25
Non-volatile
Magnetic disk has an access time or blank microseconds , transfer rate of blank gb/sec, cost per gb of blank, and is blank
5,000 to 10,000
.05 to .2
around .02
Non-volatile
What type of memory - Reading one gigabyte takes about one second.
Flash memory
what type of memory - Used to store petabytes (millions of gigabytes) of user data in the cloud.
Magnetic disk
What type of memory - Upgrading from 16 to 32 gigabytes costs an extra $400 for an Apple laptop computer.
Main memory
Magnetic disk groups data in black, traditionally 512 bytes per sector but 4 kilobytes with newer disk formats.
sectors
Flash memory groups data in blank, usually between 2 kilobytes and 16 kilobytes per page.
pages
Databases and file systems use a uniform size, called a blank, when transferring data between main memory and storage media
block
Blank is independent of storage media.
Block size
Storage media are managed by blank, which convert between blocks and sectors or pages. This conversion is internal to the storage device, so the database is unaware of page and sector sizes.
controllers
Blank must be uniform within a database but, in most database systems, can be specified by the database administrator
Block size
Database systems typically support block sizes ranging from blank to blank
2 kilobytes to 64 kilobytes.
Smaller block sizes are usually better for blank, which access a few rows per query.
transactional applications
Larger block sizes are usually better for blank, which access many rows per query.
analytic applications
Accessing storage media is relatively slow. Since data is transferred to and from storage media in blocks, databases attempt to minimize the number of blocks required for blank.
common queries
Most relational databases are optimized for blank, which often read and write individual rows.
transactional applications
To minimize block transfers, relational databases usually store an entire row within one block, which is called blank.
row-oriented storage
Row-oriented storage performs best when row size is small relative to block size, for what two reasons:
Improved query performance.
Less wasted storage.
When row size is small relative to block size, each block contains many blank. Queries that read and write multiple rows transfer fewer blocks, resulting in better performance.
rows
Row-oriented storage wastes blank per block, since rows do not usually fit evenly into the available space. The wasted space is less than the row size. If row size is small relative to block size, this wasted space is insignificant.
a few bytes
Database administrators might specify a blank for databases containing larger rows.
larger block size
Sometimes a table contains a very large column, such as 1 megabyte documents or 10 megabyte images. For tables with large columns, each row usually contains a blank to the large column, which is stored in a different area.
link
In blank, also called columnar storage, each block stores values for a single column only.
column-oriented storage
Column-oriented storage benefits analytic applications in what two ways
Faster data access.
Better data compression.
More column values are transferred per block, reducing time to blank.
access storage media
Databases often apply blank when storing data.
data compression algorithms
Data compression is usually more effective when all values have blank. As a result, more values are stored per block, which reduces storage and access time.
same data type
With column-oriented storage, reading or writing an entire row requires blank. Consequently, column-oriented storage is a poor design for most transactional applications.
accessing multiple blocks
PostgreSQL and Vertica are examples of relational databases that support blank. Many NoSQL databases, described elsewhere in this material, are optimized for analytic applications and use column-oriented storage.
column-oriented storage
A blank is a scheme for organizing rows in blocks on storage media.
table structure
Databases commonly support four alternative table structures. Name them
Heap table
Sorted table
Hash table
Table cluster
Each table in a database can have a blank. Databases assign a default structure to all tables. Database administrators can override the default structure to optimize performance for specific queries.
Different structure
In a heap table, no order is imposed on blank.
rows
The database maintains a list of blocks assigned to the table, along with the address of the first available space for inserts. If all blocks are full, the database allocates a blank and inserts rows in it.
new block
When a row is deleted, the space occupied by the row is marked as blank. Typically, blank space is tracked as a linked list, as in the animation below. Inserts are stored in the first space in the list, and the head of the list is set to the next space.
free
Heap tables optimize blank operations.
insert
Heap tables are particularly fast for blank of many rows, since rows are stored in load order.
bulk load
Heap tables are not optimal for queries that read rows in a blank, such as a range of primary key values, since rows are scattered randomly across storage media.
specific order
In a sorted table, the database designer identifies a blank that determines physical row order
sort column
The sort column is usually the blank but can be a non-key column or group of columns.
primary key
Rows are assigned to blocks according to the blank of the sort column. Each block contains all rows with values in a given range. Within each block, rows are located in order of sort column values.
value
Blank are optimal for queries that read data in order of the sort column, such as:
JOIN on the sort column
SELECT with range of sort column values in the WHERE clause
SELECT with ORDER BY the sort column
Sorted tables
Maintaining correct sort order of rows within each block can be slow. When a new row is inserted or when the sort column of an existing row is updated, free space may not be available in the correct location. To maintain the correct order efficiently, databases maintain pointers to the next row within each block, as in the animation below. With this technique, inserts and updates change blank rather than move entire rows.
two address values
When an attempt is made to insert a row into a full block, the block splits in two. The database moves half the rows from the blank to a new block, creating space for the insert.
initial block
Sorted tables are optimized for blank at the expense of insert and update operations.
read queries
In a blank, rows are assigned to buckets
hash table
A blank is a block or group of blocks containing rows.
bucket
Initially, each bucket has one block. As a table grows, some buckets eventually fill up with rows, and the database allocates additional blocks. New blocks are blank to the initial block, and the bucket becomes a chain of linked blocks.
linked
The bucket containing each row is determined by a blank function and a blank key.
hash
The blank is a column or group of columns, usually the primary key.
hash key
The blank computes the bucket containing the row from the hash key.
hash function
Blank are designed to scramble row locations and evenly distribute rows across blocks
Hash functions
The blank is a simple hash function with four steps:
modulo function
Name the four steps of the modulo function
1) Convert the hash key by interpreting the key’s bits as an integer value.
2) Divide the integer by the number of buckets.
3) Interpret the division remainder as the bucket number.
4) Convert the bucket number to the physical address of the block containing the row.
As tables grow, a blank allocates more rows to each bucket, creating deep buckets consisting of long chains of linked blocks.
fixed hash function
Blank are inefficient since a query may read several blocks to access a single row. To avoid deep buckets, databases may use dynamic hash functions
Deep buckets
A blank automatically allocates more blocks to the table, creates additional buckets, and distributes rows across all buckets. With more buckets, fewer rows are assigned to each bucket and, on average, buckets contain fewer linked blocks.
dynamic hash function
Blank are optimal for inserts and deletes of individual rows, since row location is quickly determined from the hash key.
Hash tables
Hash tables are optimal for selecting a single row when the hash key value is specified in the blank. Hash tables are slow on queries that select many rows with a range of values, since rows are randomly distributed across many blocks.
WHERE clause
Blank, also called multi-tables, interleave rows of two or more tables in the same storage area.
Table clusters
Table clusters have a blank, a column that is available in all interleaved tables.
cluster key
The cluster key determines the order in which rows are interleaved. Rows with the same cluster key value are stored together. Usually the cluster key is the blank of one table and the corresponding blank of another,
primary key
foreign key
Table clusters are optimal when blank interleaved tables on the cluster key, since physical row location is the same as output order.
joining
Table clusters perform poorly for many other queries. Name them.
Join on columns other than cluster key.
Read multiple rows of a single table.
Update cluster key.
A blank is a file containing column values, along with pointers to rows containing the column value. The pointer identifies the block containing the row.
single-level index
In some indexes, the pointer also identifies the blank of the row within the block
exact location
Indexes are created by database designers with the blank command.
CREATE INDEX
Single-level indexes are normally sorted on the blank.
column value
A sorted index is not the same as an index on a sorted blank.
table
If an indexed column is blank, the index has one entry for each column value.
unique
If an indexed column is not unique, the index may have blank for some column values, or one entry for each column value, followed by multiple pointers.
multiple entries
An index is usually defined on a single column, but an index can be defined on blank.
multiple columns
In a blank, each index entry is a composite of values from all indexed columns. In all other respects, blank behave exactly like indexes on a single column.
multi-column index
To execute a SELECT query, the database can perform a blank scan or an blank scan
table
index
A blank is a database operation that reads table blocks directly, without accessing an index.
Table scan
An blank is a database operation that reads index blocks sequentially, in order to locate the needed table blocks.
index scan
Blank, also called filter factor or selectivity, is the percentage of table rows selected by a query.
Hit ratio
When a SELECT query is executed, the database examines the WHERE clause and estimates hit ratio. If hit ratio is high, the database performs a blank.
table scan
When a SELECT query is executed, the database examines the WHERE clause and estimates hit ratio. If hit ratio is low, the query needs only a few table blocks, so a table scan would be inefficient. Instead, the database does what four things:
Looks for an indexed column in the WHERE clause.
Scans the index.
Finds values that match the WHERE clause.
Reads the corresponding table blocks.
If the WHERE clause does not contain an blank, the database must perform a table scan.
indexed column
Since a column value and pointer occupy less space than an entire row, an index requires fewer blocks than a table. Consequently, index scans are blank than table scans.
much faster
In some cases, indexes are small enough to reside in blank, and index scan time is insignificant. When hit ratio is low, additional time to read the table blocks containing selected rows is insignificant.
main memory
If a single-level index is sorted, each value can be located with a blank.
binary search
In a binary search, the database repeatedly splits the index blank until it finds the entry containing the search value.
in two
How does the binary search work? Four steps
The database first compares the search value to an entry in the middle of the index.
If the search value is less than the entry value, the search value is in the first half of the index. If not, the search value is in the second half.
The database now compares the search value to the entry in the middle of the selected half, to narrow the search to one quarter of the index.
The database continues in this manner until it finds the index block containing the search value.
For an index with N blocks, a binary search reads blank, rounded up to the nearest integer
log2 N blocks
Indexes on a sorted table may be blank or blank
primary or secondary
A blank, also called a clustering index, is an index on a sort column.
primary index
A blank, also called a nonclustering index, is an index that is not on the sort column.
secondary index
A sorted table can have only one sort column, and therefore only one blank.
primary index
Usually, the blank is on the primary key column(s).
primary index