Database Concept Flashcards

1
Q

Design a relational database

A

two sub-system: storage(file system) and management software

management software:

storage manager (block read in/write)

cache

SQL parser

log system (for undo/redo)

authorization management (roles)

Disaster recover system

indexing (fast read)

lock management (for concurrence)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

why use index

A

Indexes are created to speed up the data retrieval and the query processing operations from a database table or view, by providing swift access to the database table rows, without the need to scan all the table’s data, in order to retrieve the requested data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

what column(s) can be index

A

The column or columns have a high degree of uniqueness

You frequently need to look for a certain value or range of values for the column.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

index’s data structure

A

B+ tree

other candidates: binary search tree, B tree, Hash table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

dense index and sparse index

A

For MySQL’s two storage engines, myisam use sparse index, innodb has one and only one dense index: if primary key is defined, it is the dense index; if not, the first non-empty index will be chosen to be the dense index; if neither, innodb will generate a hidden primary key for us.

In the dense index, there is an index record for every search key value in the database. This makes searching faster but requires more space to store index records itself. Index records contain search key value and a pointer to the actual record on the disk.

In the sparse index, index records are not created for every search key. An index record here contains a search key and an actual pointer to the data on the disk. To search a record, we first proceed by index record and reach the actual location of the data. If the data we are looking for is not where we directly reach by following the index, then the system starts sequential search until the desired data is found.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Why not use a binary search tree to store index

A

Since each node will only contain two children, the average lookup time will be O(logN). That means O(logN) times of I/O, which is too much – even slower than a full table scan.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

B+ tree as index’s data structure

A
  1. non-leaf node’s number of child = number of keywords
  2. non-leaf node’s child pointer P[i] -> sub-tree with keywords[K[i], K[i+1])
  3. the non-leaf node only store index, data records are all in leaf nodes
  4. all leaf nodes have a pointer to next leaf nodes
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Why use B+ tree as the DS for index?

A
  1. low I/O cost, internal nodes contains only indexes, a bulk read in contains more indexes which in turn reduce the total times of read
  2. search time cost is stable
  3. better for full-table scan (scan all leafs)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Why not use hash index for database?

A
  1. time complexity is better
  2. However, it only supports ‘=’, not range query
  3. does not keep the order, thus need constant sorting operation
  4. query by composition key is very complex for hash-based method
  5. hash collision will cause unnecessary table scan
  6. large hash collision will reduce the time efficiency
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Bitmap for index?

A

yes, oracle is using it.

not good for high concurrency, thus, it is an option for OLAP, but not for OLTP

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

How to optimize slow query

A

identify slow query through slow query log

use EXPLAIN ANALYZE e.g. tool to analyze the query

revise query to let it exploit more with indexes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Is using primary key always faster than using other indexes?

A

No.

For example, when count(id), it probably will not use id(primary key) to aggregate, it will choose a sparse index, because a sparse index store only key and value in its nodes, it could read in more records in one batch each time/ eliminate more records this way, which in turn will speed up the query.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Script command related to tuning/optimalize query

A
  1. show variables like ‘%query%’ // find slow_query related var
  2. show status like ‘%slow_queries%’ // show the number of slow queries executed within this session
  3. explain ########query content########## //explain analyze // type=all or index: need to improve | extra=Using filesort or temporary : can’t use index

If the above 3 showed slow query, change the script to use other field to query or alter the table to add the current field as index

sometimes Mysql will choose a non-primary key to query to achieve better performance

If you want to force the query to use other index:

select count(id) from table1 force index(primary)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Composite Index

A

Like a single index, a composite index is also a data structure of records sorted on something. But unlike a single index, that something is not a field, but a concatenation of multiple fields.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Key related SQL command

A

PRIMARY KEY(‘id’),

UNIQUE KEY ‘account’ (‘account’),

KEY ‘index_area_title’ (‘area’,’title’),

KEY ‘idx_name’(‘name’)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Left-prefix rule of composite index

A

The left-prefix rule says a query can search an index efficiently if it provides search terms (such as values in a WHERE clause) that match the leading columns of the index, in left-to-right order.

mySQL will continue its matching from left to right until range operator(>, <, between, like)

key(a,b,c,d): where a=1.b=2.c>5,d=4 will not use index

key(a,b,d,c) where a=1.b=2.c>5,d=4 will use index

d=4 will be move upper stream by MySQL to fit the index

‘in’ and ‘=’ could be not in order

example:

table: KEY ‘mycomkey’ (A,B)

query 1: select * from table1 where A=1,b=2

query 2: select * from table where A=1

query 3: select * from table where B=2

query1 and 2 will use index mycomkey, query 3 will do whole table scan. because the left prefix rule, B=2 without A will not provoke the index

A and B’s order in the WHERE clause is not a matter, MySQL will change the order for you to fit the index

17
Q

is index the more the better

A

no.

  1. The small table should not use an index, because the overhead of establishing and store the index system will exceed the efficiency gain compared to a whole table scan
  2. Update table will update the index, more index meaning more cost in maintaining the index system
  3. More index means more spatial cost
18
Q

What is the difference between MyISAM’s and InnoDB’s internal Locking

A

MyISAM’s default is table-level locking, not support row-level locking;

InnoDB’s default is row-level locking, and support table-level locking

19
Q

read lock

A

A READ lock has the following features:

  1. A READ lock for a table can be acquired by multiple sessions at the same time. In addition, other sessions can read data from the table without acquiring the lock.
  2. The session that holds the READ lock can only read data from the table, but cannot write. And other sessions cannot write data to the table until the READ lock is released. The write operations from another session will be put into the waiting states until the READ lock is released.
  3. If the session is terminated, either normally or abnormally, MySQL will release all the locks implicitly. This feature is also relevant for the WRITE lock.

read lock is generally shared lock but you can force mutex lock like:

select * from table1 where id between 1 and 300 _for update_

20
Q

write lock

A

A WRITE lock has the following features:

  1. The only session that holds the lock of a table can read and write data from the table.
  2. Other sessions cannot read data from and write data to the table until the WRITE lock is released.
21
Q

script to add a shared lock

A

lock in shared mode

22
Q

MyISAM is suitable for what case

A
  • frequent full table count(*)
  • read-intensive, write-low-intensive
  • no transaction
23
Q

InnoDB is suitable for what case

A
  • read and write-intensive
  • high requirement for reliability, support transaction
24
Q

different type of lock

A

level: row-level, table-level, page-level

shared lock and mutex lock

auto-lock and explicit lock

DML lock DDL lock

Optimistic lock pessimistic lock

25
Q

transaction 4 features

A

ACID

  • Atomic:all done or do nothing
  • Consistency:integrity constraint
  • Isolation:concurrent transaction should not interfere with each other
  • Durability:after commit, the result of the transaction should be kept effective。 redo log file
26
Q

Concurrent transaction problem 1

update loss - one update overlap another

A

all major database are already avoid it through lock

27
Q

Concurrent transaction problem 2

dirty read

one transaction read other transaction’s uncommited data

A

read uncommited is the lowest isolation level

to avoid the dirty read, set session transaction isolation level read committed (oracle default)

28
Q

Concurrent transaction problem 3

non-repeatable read

multiple read results not consistent

A

when one transaction is reading and another transaction is updating the value;

solution:

set session transaction isolation level repeatable read (innodb default)

29
Q

Concurrent transaction problem 4

phantom reads

mySQL “repeatable read” avoid this(conceptually only serializable could avoid it )

By default, InnoDB operates in REPEATABLE READ transaction isolation level and with the innodb_locks_unsafe_for_binlog system variable disabled. In this case, InnoDB uses next-key locks for searches and index scans, which prevents phantom rows

A

A phantom read occurs when, in the course of a transaction, new rows are added or removed by another transaction to the records being read.

This can occur when range locks are not acquired on performing a SELECT … WHERE operation. The phantom reads anomaly is a special case of Non-repeatable reads when Transaction 1 repeats a ranged SELECT … WHERE query and, between both operations, Transaction 2 creates (i.e. INSERT) new rows (in the target table) which fulfil that WHERE clause.

In the SERIALIZABLE isolation mode, Query 1 would result in all records with age in the range 10 to 30 being locked, thus Query 2 would block until the first transaction was committed. In REPEATABLE READ mode, the range would not be locked, allowing the record to be inserted and the second execution of Query 1 to include the new row in its results.

30
Q

snapshot read

A

non-lock, unblocking read, when is not serializable

简单的select操作

31
Q

current read

A

read the newest version, and add a lock

select … lock in share mode //shared lock

select … for update //mutex lock

insert //mutex lock

update //mutex lock

delete //mutex lock

32
Q

Under RC(read committed), RR(repeatable read) mode, how snapshot read be implemented

A
  • 3 hidden column: transaction id, rollback pointer, row id(auto increase when new row inserted)
  • undo log: the old version of data

insert undo log (discard after commit)

update undo log (update or delete)

  • read view: find the closest transaction id prev to current active transactions (most stable for now / snapshot)
33
Q

MVCC

A

Multiversion concurrency control

34
Q

Is RR‘ snapshot read a real MVCC

A

not a real MVCC

35
Q

what is the real mechanism for RR‘s avoiding phantom read

A

next key lock = record lock + gap lock

  1. Where conditions are all meet (query will use unique index), then use record lock. Then other concurrent insert commands will still be effective(no gap lock).
  2. Where conditions are partially meet or no-meet, then add Gap Lock

Example: id (6, 9, 11), when “delete from t1 where id = 9)

(6, 11] range will be GAP locked

  • if delete from t1 where id =9, if id is not unique index, no index can be used, then it is a table gap lock, all gap is locked