Database Concept Flashcards
Design a relational database
two sub-systems: storage (file system) and management software
management software:
- storage manager (block read/write)
- cache
- SQL parser
- log system (for undo/redo)
- authorization management (roles)
- disaster recovery system
- indexing (fast reads)
- lock management (for concurrency)
why use an index
Indexes speed up data retrieval and query processing by giving swift access to the matching table rows, so the engine does not have to scan all of the table's data to find the requested records.
what column(s) should be indexed
- The column (or combination of columns) has a high degree of uniqueness (high selectivity).
- You frequently need to look up a certain value, or a range of values, in the column.
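A minimal sketch (table and column names are hypothetical): an index on a high-selectivity, frequently searched column pays off; one on a low-selectivity column usually does not.

-- users.email is nearly unique and often searched: a good candidate
CREATE INDEX idx_users_email ON users (email);
-- users.gender has only a few distinct values: usually a poor candidate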
index’s data structure
B+ tree
other candidates: binary search tree, B-tree, hash table
dense index and sparse index
Of MySQL's two common storage engines, MyISAM uses only sparse indexes, while InnoDB has exactly one dense (clustered) index: if a primary key is defined, it is the dense index; if not, the first unique index whose columns are all NOT NULL is chosen; if neither exists, InnoDB generates a hidden primary key for us.
In a dense index, there is an index record for every search key value in the database. This makes searching faster but requires more space to store the index records themselves. Each index record contains a search key value and a pointer to the actual record on disk.
In a sparse index, index records are not created for every search key. An index record here contains a search key and a pointer to the data on disk. To search for a record, we first locate the nearest index record and follow it to the actual location of the data. If the data we are looking for is not where the index entry points directly, the system does a sequential scan from there until the desired data is found.

Why not use a binary search tree to store the index
Each node has only two children, so a lookup touches O(log2 N) nodes, and each node visit can cost a separate disk I/O. For millions of rows that is dozens of random reads, which is too much, and can even be slower than a sequential full-table scan.
B+ tree as index’s data structure
- a non-leaf node's number of children equals its number of keys (in MySQL's B+ tree variant)
- a non-leaf node's child pointer P[i] points to the sub-tree whose keys fall in [K[i], K[i+1])
- non-leaf nodes store only keys (the index); all data records live in leaf nodes
- every leaf node has a pointer to the next leaf node

Why use B+ tree as the DS for index?
- low I/O cost: internal nodes contain only keys, so one bulk page read brings in many keys, which reduces the total number of reads
- search time is stable (every lookup descends the same tree height)
- better for range and full-table scans (just walk the linked leaves)
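A rough back-of-envelope shows why the I/O cost is low. Assuming InnoDB's default 16 KB page, roughly 1200 child pointers per internal node, and about 100 records per leaf page (typical, purely illustrative figures):
- a 3-level B+ tree holds about 1200 × 1200 × 100 ≈ 144 million rows, reachable in at most 3 page reads (often fewer, since the top levels stay cached)
- a binary search tree over the same data would be about log2(144,000,000) ≈ 27 levels deep, i.e. up to 27 random reads per lookup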
Why not use hash index for database?
- lookup time complexity is actually better (O(1) on average)
- however, it supports only '=' lookups, not range queries
- it does not preserve key order, so ordered results need an extra sort every time
- queries on a prefix of a composite key are awkward for hash-based methods, since the whole key is hashed together
- hash collisions cause unnecessary extra row lookups
- heavy hash collisions degrade time efficiency
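To see the range limitation concretely, a hedged illustration using MySQL's MEMORY engine, which supports true hash indexes (table and column names are made up):

CREATE TABLE lookup_demo (
  id INT,
  val VARCHAR(20),
  KEY idx_id (id) USING HASH
) ENGINE = MEMORY;

SELECT * FROM lookup_demo WHERE id = 42;              -- can use the hash index
SELECT * FROM lookup_demo WHERE id BETWEEN 10 AND 50; -- cannot: hash keeps no order, falls back to a scan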
Bitmap for index?
yes, Oracle uses it.
not good for high concurrency; thus it is an option for OLAP, but not for OLTP
How to optimize a slow query
- identify slow queries through the slow query log
- use a tool such as EXPLAIN / EXPLAIN ANALYZE to analyze the query
- revise the query so that it exploits the indexes better
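For the first step, a minimal sketch of turning on the slow query log at runtime (the 1-second threshold is illustrative):

SET GLOBAL slow_query_log = 'ON';    -- start logging slow queries
SET GLOBAL long_query_time = 1;      -- log statements slower than 1 second
SHOW VARIABLES LIKE '%slow_query%';  -- confirm log state and file location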
Is using the primary key always faster than using other indexes?
No.
For example, for count(id) MySQL will often not use the id (primary key) index to aggregate; it will choose a sparse (secondary) index instead, because a sparse index stores only the key and a pointer in its nodes, so each batch read brings in more records, which in turn speeds up the query.
Script commands related to tuning/optimizing queries
- show variables like '%query%' // find slow_query related variables
- show status like '%slow_queries%' // the number of slow queries executed within this session
- explain <query> // if type = ALL or index: needs improvement | if Extra = Using filesort or Using temporary: the index can't be used
If the above steps reveal a slow query, rewrite it to filter on an indexed field, or alter the table to index the field being filtered on.
Sometimes MySQL will choose a non-primary-key index on its own to achieve better performance.
If you want to force the query to use a particular index:
select count(id) from table1 force index(primary)
Composite Index
Like a single-column index, a composite index is a data structure of records sorted on something. But unlike a single-column index, that something is not one field but a concatenation of multiple fields.
Key-related SQL commands
PRIMARY KEY (`id`),
UNIQUE KEY `account` (`account`),
KEY `index_area_title` (`area`, `title`),
KEY `idx_name` (`name`)
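In context, these clauses would sit inside a CREATE TABLE statement; a sketch with assumed column definitions:

CREATE TABLE demo (
  id INT NOT NULL AUTO_INCREMENT,
  account VARCHAR(32) NOT NULL,
  name VARCHAR(64),
  area VARCHAR(32),
  title VARCHAR(128),
  PRIMARY KEY (`id`),
  UNIQUE KEY `account` (`account`),
  KEY `index_area_title` (`area`, `title`),
  KEY `idx_name` (`name`)
) ENGINE = InnoDB;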
Left-prefix rule of composite index
The left-prefix rule says a query can search an index efficiently if it provides search terms (such as values in a WHERE clause) that match the leading columns of the index, in left-to-right order.
MySQL matches the index columns from left to right and stops at the first range operator (>, <, BETWEEN, LIKE).
key(a,b,c,d): WHERE a=1 AND b=2 AND c>5 AND d=4 uses the index only up to c; d=4 cannot use it
key(a,b,d,c): WHERE a=1 AND b=2 AND c>5 AND d=4 can use the index for all four predicates
(MySQL moves the equality d=4 ahead of the range c>5 to fit the index)
predicates using IN and = may appear in any order
example:
table: KEY `mycomkey` (A,B)
query 1: select * from table1 where A=1 and B=2
query 2: select * from table1 where A=1
query 3: select * from table1 where B=2
Queries 1 and 2 will use the index mycomkey; query 3 will do a full table scan, because of the left-prefix rule: B=2 without A will not trigger the index.
The order of A and B inside the WHERE clause does not matter; MySQL reorders them to fit the index.
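EXPLAIN confirms which queries hit the index; a sketch (exact output varies with data volume and MySQL version):

-- assuming: CREATE TABLE table1 (A INT, B INT, KEY mycomkey (A, B));
EXPLAIN SELECT * FROM table1 WHERE A = 1 AND B = 2;  -- key: mycomkey
EXPLAIN SELECT * FROM table1 WHERE A = 1;            -- key: mycomkey (left prefix)
EXPLAIN SELECT * FROM table1 WHERE B = 2;            -- key: NULL (full table scan)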
are more indexes always better
no.
- Small tables should not be indexed: the overhead of building and storing the index exceeds the efficiency gain over a full table scan.
- Updating the table also updates its indexes, so more indexes mean more cost in maintaining the index system.
- More indexes mean more storage cost.
What is the difference between MyISAM's and InnoDB's internal locking
MyISAM's default is table-level locking; it does not support row-level locking.
InnoDB's default is row-level locking, and it also supports table-level locking.
read lock
A READ lock has the following features:
- A READ lock for a table can be acquired by multiple sessions at the same time. In addition, other sessions can read data from the table without acquiring the lock.
- The session that holds the READ lock can only read from the table, not write to it. Other sessions cannot write to the table until the READ lock is released; their write operations wait until then.
- If the session is terminated, either normally or abnormally, MySQL will release all the locks implicitly. This feature is also relevant for the WRITE lock.
a read is normally a shared lock, but you can force an exclusive lock with FOR UPDATE:
select * from table1 where id between 1 and 300 for update
write lock
A WRITE lock has the following features:
- Only the session that holds the WRITE lock can read and write data from the table.
- Other sessions can neither read from nor write to the table until the WRITE lock is released.
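A minimal sketch of taking and releasing explicit table locks (table name assumed):

LOCK TABLES table1 READ;   -- this and other sessions may read; all writes block
-- ... read-only queries ...
UNLOCK TABLES;

LOCK TABLES table1 WRITE;  -- only this session may read or write table1
-- ... read/write queries ...
UNLOCK TABLES;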
script to add a shared lock
select … lock in share mode
What cases is MyISAM suitable for
- frequent full-table count(*) (MyISAM stores the exact row count)
- read-intensive, write-light workloads
- no transactions needed
What cases is InnoDB suitable for
- read- and write-intensive workloads
- high reliability requirements; transactions are supported
different types of lock
- granularity: row-level, table-level, page-level
- shared lock vs exclusive lock
- implicit (automatic) lock vs explicit lock
- DML lock vs DDL lock
- optimistic lock vs pessimistic lock
the 4 features of a transaction
ACID
- Atomicity: all done or nothing done
- Consistency: integrity constraints hold
- Isolation: concurrent transactions should not interfere with each other
- Durability: after commit, the transaction's result is kept effective (backed by the redo log file)
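Atomicity in practice, as a minimal sketch (the account table and amounts are made up):

START TRANSACTION;
UPDATE account SET balance = balance - 100 WHERE id = 1;
UPDATE account SET balance = balance + 100 WHERE id = 2;
-- if anything fails in between, ROLLBACK undoes both updates (undo log)
COMMIT;  -- once committed, the result survives a crash (redo log)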
Concurrent transaction problem 1
lost update: one transaction's update overwrites another's
all major databases already avoid it through locking
Concurrent transaction problem 2
dirty read
one transaction reads another transaction's uncommitted data
READ UNCOMMITTED is the lowest isolation level
to avoid dirty reads: set session transaction isolation level read committed (Oracle's default)
Concurrent transaction problem 3
non-repeatable read
reading the same row several times within one transaction gives inconsistent results
happens when one transaction is reading while another transaction is updating the value
solution:
set session transaction isolation level repeatable read (InnoDB's default)
Concurrent transaction problem 4
phantom reads
MySQL's REPEATABLE READ avoids this (conceptually, only SERIALIZABLE can avoid it)
By default, InnoDB operates in REPEATABLE READ transaction isolation level and with the innodb_locks_unsafe_for_binlog system variable disabled. In this case, InnoDB uses next-key locks for searches and index scans, which prevents phantom rows
A phantom read occurs when, in the course of a transaction, new rows are added to (or removed from) the set of records being read by another transaction.
This can occur when range locks are not acquired on performing a SELECT … WHERE operation. The phantom read is a special case of the non-repeatable read: Transaction 1 repeats a ranged SELECT … WHERE query and, between the two executions, Transaction 2 INSERTs new rows (in the target table) which fulfil that WHERE clause.
In SERIALIZABLE isolation, the first query would lock all records in the queried range (say, age 10 to 30), so Transaction 2's insert would block until Transaction 1 committed. In standard REPEATABLE READ (without InnoDB's next-key locks), the range would not be locked, allowing the row to be inserted, and the second execution of the range query would include the new row in its results.
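A two-session sketch of the anomaly under READ COMMITTED (or standard RR without next-key locking); the person table is hypothetical:

-- Session 1:
START TRANSACTION;
SELECT COUNT(*) FROM person WHERE age BETWEEN 10 AND 30;  -- returns N

-- Session 2, meanwhile:
INSERT INTO person (age) VALUES (20);  -- commits (autocommit)

-- Session 1 again:
SELECT COUNT(*) FROM person WHERE age BETWEEN 10 AND 30;  -- returns N+1: a phantom row
COMMIT;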
snapshot read
a non-locking, non-blocking read, used when the isolation level is below SERIALIZABLE
i.e. a plain SELECT statement
current read
reads the newest committed version, and takes a lock:
select … lock in share mode //shared lock
select … for update //exclusive lock
insert //exclusive lock
update //exclusive lock
delete //exclusive lock
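Under InnoDB's REPEATABLE READ the two kinds of read are observably different; a sketch with a hypothetical table t1 holding the row (id=1, val='old'):

-- Session 1:
START TRANSACTION;
SELECT val FROM t1 WHERE id = 1;             -- snapshot read: 'old'

-- Session 2, meanwhile:
UPDATE t1 SET val = 'new' WHERE id = 1;      -- commits

-- Session 1 again:
SELECT val FROM t1 WHERE id = 1;             -- still 'old' (same snapshot)
SELECT val FROM t1 WHERE id = 1 FOR UPDATE;  -- current read: 'new', takes an exclusive lock
COMMIT;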
Under RC (read committed) and RR (repeatable read), how is snapshot read implemented
- 3 hidden columns: transaction id (DB_TRX_ID), rollback pointer (DB_ROLL_PTR), and row id (DB_ROW_ID, auto-incremented when a new row is inserted)
- undo log: holds the old versions of the data
insert undo log (discarded after commit)
update undo log (for update and delete)
- read view: taken at read time, it records the then-active transactions; a row version is visible only if its transaction committed before the view was created (the most stable version for now, i.e. the snapshot)
MVCC
Multiversion concurrency control
Is RR's snapshot read a real MVCC
not a real MVCC
what is the real mechanism by which RR avoids phantom reads
next-key lock = record lock + gap lock
- If the WHERE condition matches rows exactly (the query uses a unique index), only a record lock is taken; other sessions' concurrent inserts are still possible (no gap lock).
- If the WHERE condition matches partially or not at all, a gap lock is added as well.
Example: index values (6, 9, 11); executing "delete from t1 where id = 9" with id in a non-unique index
locks the record 9 and the gaps around it (a next-key lock), so nothing can be inserted between 6 and 11
- If "delete from t1 where id = 9" is run and id is not indexed at all, no index can be used, so InnoDB locks every record and every gap in the table (effectively a table-wide gap lock)
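A sketch of the gap lock in action, using the id values from the example above (id carries a non-unique index; table name assumed):

-- t1 contains rows with id values 6, 9, 11; non-unique index on id

-- Session 1:
START TRANSACTION;
DELETE FROM t1 WHERE id = 9;      -- next-key lock: record 9 plus the gaps around it

-- Session 2 (blocks until session 1 finishes):
INSERT INTO t1 (id) VALUES (10);  -- falls inside the locked gap between 9 and 11

-- Session 1:
COMMIT;                           -- now session 2's insert proceeds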