SQL вероятно Flashcards by Ilja Kuznetsov

In which cases is there an advantage to using a heap instead of a clustered index?

First of all, I think, in case when we insert a lot of unordered items into existing table. This case heap will work faster than clustered index, because it doesn’t need to keep certain order.

Second case is if information in our table is not expected to be somehow ordered or indexed. This case heap will be better because this case our operations will be a little bit faster because we won’t need to update index when we insert or update or delete something.

How well did you know this?

Not at all

Perfectly

In what order will SQL return records if a clustered index is built on the table?

In an arbitrary order. By default SQL does not guarantee any order of selected entries if there are no explicit ORDER BY statement used.

How well did you know this?

Not at all

Perfectly

What is the difference between columns included in an index and columns that are part of the index key?

If a column is specified as included (INCLUDE), its values will only be stored at the leaf level of the index.
If column is part of index key, then its value will be on each index node.

How well did you know this?

Not at all

Perfectly

What is the lock escalation mechanism?

Lock escalation mechanism is a process used in databases in order to optimize memory use in lock buffer. The idea is to replace multiple rows locks with smaller number of larger locks, blocking entire sections of rows or whole table, in order to reduce amount of separate small locks and save memory.

PostgreSQL doesn’t have lock escalation mechanism.

How well did you know this?

Not at all

Perfectly

Should the clustered index match the primary key?

No, but by default, a unique index is created for the primary key, and it is clustered.

How well did you know this?

Not at all

Perfectly

If there is a non-clustered index on a table, what data is stored at the leaf level to access the clustered key or heap?

On leaf level of non-clustered index we have some kind of pointer to eact row.
If we have clustered index on our table, then this is cluster key value.
Otherwise, this is RowID, consisting of FileID, PageID and SlotID

How well did you know this?

Not at all

Perfectly

How to measure index efficiency?

Ну, во-первых, можно использовать те же EXPLAIN или EXPLAIN ANALYZE, чтобы посмотреть план запроса и насколько использование индекса уменьшает время работы запроса

Во-вторых, можно попробовать проанализировать размер индекса и его используемость. Ну то есть, например, если у нас какой-нибудь новостной портал, то наверное нам нет смысла хранить индекс по всей таблице, потому что всё равно чаще всего при запросах хотят получить статьи за последнее время, скажем, за последние месяца два, три максимум.
В этом случае мы можем не строить индекс на всю таблицу, а только на ту часть записей, которые были созданы относительно недавно.

Ну и наконец, можно посмотреть статистику базы данных и узнать как часто вообще используется наш индекс.
То есть если у нас существует индекс, но он нигде не используется, или используется редко, то возможно, будет лучше его вообще убрать.

How well did you know this?

Not at all

Perfectly

How to find indexes that are not used in database?

Using Dynamic Management Views
We just won’t see any statistics about searches for this index.

How well did you know this?

Not at all

Perfectly

What is MVCC?

MVCC is a Multi Version Conccurrency Control in PostgreSQL.
This is mechanism that allows for making parallel transactions without locking.
If you are performing read requests in transaction with repeatable read isolation level, then during transaction all entries that you can read will be accessible in same state they were on moment when transaction started.

How well did you know this?

Not at all

Perfectly

What is difference between TRUNCATE TABLE and DELETE FROM?

DELETE FROM operation deletes all rows from table, but it treats rows separately.
Deletion of these rows will be logged, so you can rollback all changes
Also this is possible to use WHERE statement in order to delete only certain entries
Finally, DELETE FROM is relatively slow operation, comparing to TRUNCATE TABLE

In case of TRUNCATE TABLE, we work with table pages, not with separate rows. TRUNCATE TABLE just deletes whole pages, with most minor logging possible.
So, this operation is faster than DELETE FROM, but it is not possible to use it on tables with foreign keys.

How well did you know this?

Not at all

Perfectly

How recursive queries work in PostgreSQL?

We can use recursive queries in Common Table Expression in PostgreSQL.
Their syntax is similar to regular CTE, but has some differences.
First of all, it starts as WITH RECURSIVE, not just WITH
Also, recursive query consist of two parts with UNION ALL between them
First part is not recursive, and second part is recursive.
Recursive part must ask data from table we declare inside this CTE, and must return some rows. On each loop it receive rows it returned on previous one.
On the moment when this recursive part did not return any rows, recursive loop stops
After that all returned rows are collected into one table.

How well did you know this?

Not at all

Perfectly

What are normal forms in table and what normal forms exist?

Normal form is a certain set of requirements, that table must meet. There are six normal forms that exist, but now I can remember only three of them.

First form demand is that table columns must contain no composite data, and there must be no duplicates among table rows.

Second normal form demand is that each entry in table must have field, that can be used as Primary Key. Primary key must have no duplicates among fields in same column.

Third normal form demand is that entries in table must contain no fields which are not primary keys and depend on other non-primary key fields.

How well did you know this?

Not at all

Perfectly

What is parameter sniffing in SQL?

Parameter sniffing is kind of bug in process of query execution, that can appear because of query plan caching.

So, for example, let’s imagine that we have table of products, and create custom function with one parameter, which is product description string.
And after that we make few requests using this function.
After first request, PostgreSQL will save query plan that was used for this function execution, based on first request function parameter value.
And usually this is not a problem, but let’s imagine that we have 1000 rows in our table, and 900 of them have same description, and this description is value of parameter that we used on first function use.
This case best PostgreSQL will plan will be to scan all table, instead of using any indexes, because this is best strategy this case.
But this strategy can be kept for other function calls to, even in cases when this is not effective.

There are few ways how we can avoid parameter sniffing.
First is to dynamically recompile query on each function call using execute statement
Second is to save input parameters into local variables and then use these local variables in our queries
Third is to remember about parameter sniffing existence and just create few similar functions optimized for exact use cases
Finally, we can manually enable or disable certain strategies that SQL can use to build the query

How well did you know this?

Not at all

Perfectly

What are main lock types in PostgreSQL?

First is exclusive lock. This case any access from other transactions to locked rows will be restricted. Usually used for modification or deletion operations.

Second is lock for update. This case other transactions will be able to read rows, but not modify or delete or put their own exclusive or for update locks on them.

Third is shared lock. This lock makes it possible for other transactions to read rows, but not modify them, and also prevents any lock with higher restrictions to be put on these rows.

How well did you know this?

Not at all

Perfectly

What are intent locks?

Intent locks are locks that put on table, what are used to simplify locks collision mechanism, and they just show that another transaction already got lock of certain level on some rows of the table.

These locks are used for optimization purposes, for example in case when certain transaction wants to create lock for whole table. Without intent locks that would be necessary to check every single row in search for locks, and that would require much more resources than just checking one intentional lock assigned to table itself.

There are few types of intentional locks.
These are Intentional Shared lock and Intentional Exclusive lock

How well did you know this?

Not at all

Perfectly

What is difference between inline and multistatement functions?

Study These Flashcards

The difference is in how query optimizer works with them.

In case of inline functions whole function consist of single select statement and it doesn’t have it’s own execution plan. Query optimizer will work with inline function same way like it would work with its inner select statement.

In case of multistatement functions can have more than one operation, can have variables and conditions, and they are optimized on level of function itself, but not on level of whole query.
This potentially can lead to lower performance, because query optimizer can see only limited scope, when optimizing this function.
For example, if function have subqueries inside, these subqueries will be called on each function call, and they won’t be optimized in any way, even if that could be possible to do, if these queries were optimized on whole query level.
Also, function execution plans themselfves can be cached, that can lead to so called parameter sniffing.
Parameter sniffing is optimization bug, happening when function was called few times with different parameters over short period of time or with same parameters, but with long period of time between calls.
The reason of parameter sniffing is caching of first created execution plan and consequences is slow work in case if execution plan is not effective for other parameter values or because table data has already changed a lot.
As simplest example, let’s imagine that we have a function taking user name as parameter and then searching for all users with this name in table, where we have index on user name column. At first call this name is Allan, and on second this name is Alexander. And lets imagine that in out table there are 1000 users and 900 of them have name Allan.
After first execution optimizer will try to analyze the situation and save most efficient plan, that could be with this parameter value. And because in this case with this parameter value request cardinality was low, it will avoid using index and prefer full table scan. And later, even despite it could use index to quickly filter out all users with name Alexander, it will use this cached plan and use index scan instead of index seek.

What is a correlated subquery in SQL, and how does it differ from a regular subquery?

Study These Flashcards

A correlated subquery is a subquery that references a column from the outer query. It is executed once for each row in the outer query, which can make it slower than a regular subquery. A regular subquery is executed once, and the result is used in the outer query.

What is a subselect in SQL, and how is it used?

Study These Flashcards

A subselect is a query used within another query to retrieve data from a table based on certain criteria. It is often used to filter or join data from multiple tables.

What is a full-text search in SQL, and how is it implemented?

Study These Flashcards

A full-text search is a search that allows you to search for text within the contents of a column, such as a document or an email. It is implemented using a full-text index, a special index type that allows for fast text searches. I don’t remember exactly but this should be GIN or GiST index.

What is OLTP?

Study These Flashcards

OLTP is online transactional processing, this is system purpose of that is to process multiple relatively small transaction in real time. OLTP is usually processed using relational databases.

What is OLAP?

Study These Flashcards

OLAP is online analythical processing. The purpose of OLAP system is to relatively quickly perform operations and analythical queries over large amounts of data. OLAP is usually processed using columnar databases.

What character manipulation functions do you know?

Study These Flashcards

LENGTH, TRIM, SUBSTRING, POSITION, UPPER, LOWER, LEFT, RIGHT

What are differences between DDL, DML and DCL?

Study These Flashcards

DDL is Data Definition Language, this is part of SQL purpose of what is define and alter structure of tables themselves. Typical commands are CREATE, TRUNCATE, ALTER or DROP.

DML is Data Manipulation Language, its purpose is to manipulate tables content. Typical commands are SELECT, DELETE, UPDATE or INSERT.

DCL is Data Control Language, and this part of SQL is used to control database access, but that’s all I can remember.

What are constraints and what types of them exist?

Study These Flashcards

Constraints are some kind of restrictions or rules that certain field in the row should follow.
There are different types of constraints. These are UNIQUE constraint, that must ensure that field value is unique among same column, NOT NULL constraint, that demands field not to be null, also there is PRIMARY KEY constraint, what combines both of them and also can be used to connect two tables together if in another table is a field that implements FOREIGN KEY constraint, referenced to this table.
There are also DEFAULT constraint, which defines default value for newly created row, if no value was provided for this column, and CHECK constraint, what determines certain more specific rule that field must match.

What is relational database?

Relational database is a database that is based on relational data model. In this model, data is stored in tables, and relations between data units are determined by use of keys. So, basic data unit in relational database is entry, entry is a row inside table. Fields of entry are defined by table columns. Entries from different tables can be connected through declaration of this relation in table definition through use of primary and foreign keys. Relational databases advantages are that this is quite easy and convenient to add new tables or columns in this system, and also that through use of table restrictions data can be kept consistent.

How do you understand, that your changes (adding index, changing the query) changed query time?

Well, there are three steps you can do in order to do this. First of all, use EXPLAIN command in order to check if execution plan changed, and if changed, then is current execution plan really potentially more effective than previous one. Second step, is to use EXPLAIN ANALYZE on mock data in staging environment to see if these changes really changed execution time. This is not 100% reliable method, but usually mock data are outdated anonymized data from prod, so this can predict behavior in prod with quite high chance. Finally, after you pushed you changes into prod, before closing the release you can check Grafana or other monitoring instrument to see, what are real metrict of this or that query in prod.

В каких случаях может быть необходимо послать запрос к мастер ноде в реплицированной системе бд?

Когда нужно сразу после обновления данных получить новые данные, не дожидаясь синхронизации.

Каким образом изменить на проде таблицу с большим количеством данных?

При миграции, мы создаём в etcd две переменные Одна это булевый флаг который отвечает за выбор поля для работы в основном коде программы - старое или новое Вторая переменная хранит в себе id, до которого мы точно всё обновили в джобе Изменяю логику работы в основном коде программы так, чтобы при записи или создании пока включён фича флаг 1 у нас шла запись в оба поля Джоба: пока не дошёл до конца и фича флаг 1 true: 1. Выбираю блок n от x (n * 1000) до x ((n + 1) * 1000), начиная с значения сохранённого в ETCD 2. Выбираю для этого блока все незалоченные записи 3. Id залоченных записываю в массив 4. Обновляю незалоченные 5. Пробую обновить записи из массива, пропуская залоченные когда дошёл до конца, но есть необновлённые записи и фича флаг 1 true: 1. пробую обновить незалоченные записи 2. Жду timeout если вдруг фича флаг 1 false, снимаю триггеры и заканчиваю работу Пишу в логи что закончил Слушаю обновление фича флага 2 Если сменился на true, то удаляю триггеры (а логика работы в основном приложении переходит на работу с новым полем) 1. Делаю миграцию через джобу, обновляя изменениями с гуся 2. Если всё ок и ничего не упало, запускаю джобу, запускающую крону 3. Жду пока запустится крона 4. Смотрю, жду пока отработает скрипт, смотрю пока по логам отработает 5. Дёргаем фича флаг, чтобы работало с новой версией таблицы 6. Если всё не ок, то вырубаем фича флаг. Чиним то что сломалось, вырубаем джобы, запущенные на предыдущем этапе, и повторяем с пункта 2. 7. Если всё ок, то закрываем релиз и вторым релизом накатываем изменения, убирающие переходные штуки.

SQL вероятно Flashcards

(28 cards)