SQL вероятно Flashcards
In which cases is there an advantage to using a heap instead of a clustered index?
First of all, I think, in case when we insert a lot of unordered items into existing table. This case heap will work faster than clustered index, because it doesn’t need to keep certain order.
Second case is if information in our table is not expected to be somehow ordered or indexed. This case heap will be better because this case our operations will be a little bit faster because we won’t need to update index when we insert or update or delete something.
In what order will SQL return records if a clustered index is built on the table?
In an arbitrary order. By default SQL does not guarantee any order of selected entries if there are no explicit ORDER BY statement used.
What is the difference between columns included in an index and columns that are part of the index key?
If a column is specified as included (INCLUDE), its values will only be stored at the leaf level of the index.
If column is part of index key, then its value will be on each index node.
What is the lock escalation mechanism?
Lock escalation mechanism is a process used in databases in order to optimize memory use in lock buffer. The idea is to replace multiple rows locks with smaller number of larger locks, blocking entire sections of rows or whole table, in order to reduce amount of separate small locks and save memory.
PostgreSQL doesn’t have lock escalation mechanism.
Should the clustered index match the primary key?
No, but by default, a unique index is created for the primary key, and it is clustered.
If there is a non-clustered index on a table, what data is stored at the leaf level to access the clustered key or heap?
On leaf level of non-clustered index we have some kind of pointer to eact row.
If we have clustered index on our table, then this is cluster key value.
Otherwise, this is RowID, consisting of FileID, PageID and SlotID
How to measure index efficiency?
Ну, во-первых, можно использовать те же EXPLAIN или EXPLAIN ANALYZE, чтобы посмотреть план запроса и насколько использование индекса уменьшает время работы запроса
Во-вторых, можно попробовать проанализировать размер индекса и его используемость. Ну то есть, например, если у нас какой-нибудь новостной портал, то наверное нам нет смысла хранить индекс по всей таблице, потому что всё равно чаще всего при запросах хотят получить статьи за последнее время, скажем, за последние месяца два, три максимум.
В этом случае мы можем не строить индекс на всю таблицу, а только на ту часть записей, которые были созданы относительно недавно.
Ну и наконец, можно посмотреть статистику базы данных и узнать как часто вообще используется наш индекс.
То есть если у нас существует индекс, но он нигде не используется, или используется редко, то возможно, будет лучше его вообще убрать.
How to find indexes that are not used in database?
Using Dynamic Management Views
We just won’t see any statistics about searches for this index.
What is MVCC?
MVCC is a Multi Version Conccurrency Control in PostgreSQL.
This is mechanism that allows for making parallel transactions without locking.
If you are performing read requests in transaction with repeatable read isolation level, then during transaction all entries that you can read will be accessible in same state they were on moment when transaction started.
What is difference between TRUNCATE TABLE and DELETE FROM?
DELETE FROM operation deletes all rows from table, but it treats rows separately.
Deletion of these rows will be logged, so you can rollback all changes
Also this is possible to use WHERE statement in order to delete only certain entries
Finally, DELETE FROM is relatively slow operation, comparing to TRUNCATE TABLE
In case of TRUNCATE TABLE, we work with table pages, not with separate rows. TRUNCATE TABLE just deletes whole pages, with most minor logging possible.
So, this operation is faster than DELETE FROM, but it is not possible to use it on tables with foreign keys.
How recursive queries work in PostgreSQL?
We can use recursive queries in Common Table Expression in PostgreSQL.
Their syntax is similar to regular CTE, but has some differences.
First of all, it starts as WITH RECURSIVE, not just WITH
Also, recursive query consist of two parts with UNION ALL between them
First part is not recursive, and second part is recursive.
Recursive part must ask data from table we declare inside this CTE, and must return some rows. On each loop it receive rows it returned on previous one.
On the moment when this recursive part did not return any rows, recursive loop stops
After that all returned rows are collected into one table.
What are normal forms in table and what normal forms exist?
Normal form is a certain set of requirements, that table must meet. There are six normal forms that exist, but now I can remember only three of them.
First form demand is that table columns must contain no composite data, and there must be no duplicates among table rows.
Second normal form demand is that each entry in table must have field, that can be used as Primary Key. Primary key must have no duplicates among fields in same column.
Third normal form demand is that entries in table must contain no fields which are not primary keys and depend on other non-primary key fields.
What is parameter sniffing in SQL?
Parameter sniffing is kind of bug in process of query execution, that can appear because of query plan caching.
So, for example, let’s imagine that we have table of products, and create custom function with one parameter, which is product description string.
And after that we make few requests using this function.
After first request, PostgreSQL will save query plan that was used for this function execution, based on first request function parameter value.
And usually this is not a problem, but let’s imagine that we have 1000 rows in our table, and 900 of them have same description, and this description is value of parameter that we used on first function use.
This case best PostgreSQL will plan will be to scan all table, instead of using any indexes, because this is best strategy this case.
But this strategy can be kept for other function calls to, even in cases when this is not effective.
There are few ways how we can avoid parameter sniffing.
First is to dynamically recompile query on each function call using execute statement
Second is to save input parameters into local variables and then use these local variables in our queries
Third is to remember about parameter sniffing existence and just create few similar functions optimized for exact use cases
Finally, we can manually enable or disable certain strategies that SQL can use to build the query
What are main lock types in PostgreSQL?
First is exclusive lock. This case any access from other transactions to locked rows will be restricted. Usually used for modification or deletion operations.
Second is lock for update. This case other transactions will be able to read rows, but not modify or delete or put their own exclusive or for update locks on them.
Third is shared lock. This lock makes it possible for other transactions to read rows, but not modify them, and also prevents any lock with higher restrictions to be put on these rows.
What are intent locks?
Intent locks are locks that put on table, what are used to simplify locks collision mechanism, and they just show that another transaction already got lock of certain level on some rows of the table.
These locks are used for optimization purposes, for example in case when certain transaction wants to create lock for whole table. Without intent locks that would be necessary to check every single row in search for locks, and that would require much more resources than just checking one intentional lock assigned to table itself.
There are few types of intentional locks.
These are Intentional Shared lock and Intentional Exclusive lock