Database Concept Flashcards
Design a relational database
two sub-system: storage(file system) and management software
management software:
storage manager (block read in/write)
cache
SQL parser
log system (for undo/redo)
authorization management (roles)
Disaster recover system
indexing (fast read)
lock management (for concurrence)
why use index
Indexes are created to speed up the data retrieval and the query processing operations from a database table or view, by providing swift access to the database table rows, without the need to scan all the table’s data, in order to retrieve the requested data.
what column(s) can be index
The column or columns have a high degree of uniqueness
You frequently need to look for a certain value or range of values for the column.
index’s data structure
B+ tree
other candidates: binary search tree, B tree, Hash table
dense index and sparse index
For MySQL’s two storage engines, myisam use sparse index, innodb has one and only one dense index: if primary key is defined, it is the dense index; if not, the first non-empty index will be chosen to be the dense index; if neither, innodb will generate a hidden primary key for us.
In the dense index, there is an index record for every search key value in the database. This makes searching faster but requires more space to store index records itself. Index records contain search key value and a pointer to the actual record on the disk.
In the sparse index, index records are not created for every search key. An index record here contains a search key and an actual pointer to the data on the disk. To search a record, we first proceed by index record and reach the actual location of the data. If the data we are looking for is not where we directly reach by following the index, then the system starts sequential search until the desired data is found.
Why not use a binary search tree to store index
Since each node will only contain two children, the average lookup time will be O(logN). That means O(logN) times of I/O, which is too much – even slower than a full table scan.
B+ tree as index’s data structure
- non-leaf node’s number of child = number of keywords
- non-leaf node’s child pointer P[i] -> sub-tree with keywords[K[i], K[i+1])
- the non-leaf node only store index, data records are all in leaf nodes
- all leaf nodes have a pointer to next leaf nodes
Why use B+ tree as the DS for index?
- low I/O cost, internal nodes contains only indexes, a bulk read in contains more indexes which in turn reduce the total times of read
- search time cost is stable
- better for full-table scan (scan all leafs)
Why not use hash index for database?
- time complexity is better
- However, it only supports ‘=’, not range query
- does not keep the order, thus need constant sorting operation
- query by composition key is very complex for hash-based method
- hash collision will cause unnecessary table scan
- large hash collision will reduce the time efficiency
Bitmap for index?
yes, oracle is using it.
not good for high concurrency, thus, it is an option for OLAP, but not for OLTP
How to optimize slow query
identify slow query through slow query log
use EXPLAIN ANALYZE e.g. tool to analyze the query
revise query to let it exploit more with indexes
Is using primary key always faster than using other indexes?
No.
For example, when count(id), it probably will not use id(primary key) to aggregate, it will choose a sparse index, because a sparse index store only key and value in its nodes, it could read in more records in one batch each time/ eliminate more records this way, which in turn will speed up the query.
Script command related to tuning/optimalize query
- show variables like ‘%query%’ // find slow_query related var
- show status like ‘%slow_queries%’ // show the number of slow queries executed within this session
- explain ########query content########## //explain analyze // type=all or index: need to improve | extra=Using filesort or temporary : can’t use index
If the above 3 showed slow query, change the script to use other field to query or alter the table to add the current field as index
sometimes Mysql will choose a non-primary key to query to achieve better performance
If you want to force the query to use other index:
select count(id) from table1 force index(primary)
Composite Index
Like a single index, a composite index is also a data structure of records sorted on something. But unlike a single index, that something is not a field, but a concatenation of multiple fields.
Key related SQL command
PRIMARY KEY(‘id’),
UNIQUE KEY ‘account’ (‘account’),
KEY ‘index_area_title’ (‘area’,’title’),
KEY ‘idx_name’(‘name’)