SQL важное Flashcards

1
Q

How to create table in SQL?

A

This can be done using CREATE TABLE command. After CREATE TABLE words you must provide new table name, and after than declare table structure or use keyword AS in order to create new table using SELECT request.
Also, this is possible to write TEMPORARY keyword before TABLE keyword in order to create temporary table.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
1
Q

How to delete table in SQL?

A

Using DROP TABLE command. After this command you have to provide table name.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

How to change SQL table structure?

A

This is possible to change table structure using ALTER TABLE command.
In this command you can add, drop and alter existing columns, foreign keys and constraints.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is difference between regular tables and temporary tables in SQL?

A

Regular tables are permanent, but temporary tables exist only inside certain database session, and will be deleted when session ends.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What are constraints in SQL?

A

Constraints in SQL are certain rules what we can apply to certain columns in order to prevent unwanted behaviour. For example, these constraints can be check, unique, default or foreign key. Each of them adds certain behaviour scenarios what will be used in certain conditions. For example, using CHECK constaint we can add conditions what certain value should meet before being inserted into table, or using DEFAULT we can provide certain default value what will be inserted into certain column if value for this column was not provided.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is primary key?

A

Primary key is unique id which can be used to identify certain entry.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is foreign key?

A

Foreign key is a value which references to certain entry in another table. This can be used to logically connect few tables.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is difference between local temporary and global temporary SQL tables.

A

Local tables exist during single session and accessible only inside this session.
Global tables exist while exist any session that uses it, and accessible from all sessions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is table normalization?

A

Table normalization is a process of splitting table into multiple subtables, keeping same logic. This sometimes can be necessary in order to simplify or optimize table structure.

For example, let’s imagine that we have table named Product. Table Product has few columns, which are id, name and category. Name and category are just text fields. The problem this case is that first of all, we have to write category in each entry, and waste more memory. Also, your categories can have mismatch, even if they should mean same things. For example, some categories can be written in only lower case, without capital letter, or something else. So that would be a problem to process this data.
In order to solve this problem, we can move categories into separate table, and instead of providing category name in product item, we can provide only category id. So, this will simplify working process.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What are JOINs in SQL?

A

JOIN is a SQL language mechanism, what allows us to connect two tables together inside query, and then access this resulting table during request

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

How INNER JOIN works?

A

In case of inner join in resulting table will remain only these entries from original tables for what we have matching values in both tables.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

How LEFT JOIN works?

A

Lets imagine that we have two tables, A and B.
In case of LEFT JOIN of table B on table A, in resulting table we will have all data for entires from table A, but we will have data from table B only on these entries, where our join condition matches

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

How RIGHT JOIN works?

A

Lets imagine that we have two tables, A and B.
In case of RIGHT JOIN of table B on table A, in resulting table we will have all data for entires from table B, but we will have data from table A only on these entries, where our join condition matches

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

How FULL JOIN works?

A

Lets imagine that we have two tables, A and B.
In case of FULL JOIN of table B on table A, in resulting table we will have all data for entires from tables A and B, but for each entry in resulting table we will have full data only for these entries where our join condition matches. In other case, on certain entries we can have no information about part belonging to second table.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

How SELF JOIN works?

A

SELF JOIN is a name for tables joining case, when we connect certain table with itself. This can be used in case when table references to itself
For example, imagine that we have table of employees, and for each employee we have id, name, and this employee’s boss name. So, if we want to get employee id, name, and boss name, we can use SELF JOIN.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

How CROSS JOIN works and where it can be used?

A

CROSS JOIN is a specific kind of table joining, when we join each row of first table with each row of second table.

About where it can be used…
Okay, just as simplest example, lets imagine that we are selling T-shirts, and we have tables with T-shirts sizes and, for example, materials they can be made of. And if we want to have table, where we store each possible combination of T-shirt size and material. So for this purpose we can use CROSS JOIN

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What is SQL index and how it works?

A

Index in SQL is a special data structure, what can be used in order to reduce queries time when you make request to certain columns in table.

By default, we have no indexes on colunms, except of id columns, and if you for want to select only certain entries from table, and make request like select name from person where age equals 20, then SQL will have to go through all entries, and compare each of them with your condition.
This is okay if your table is small, but what if it contains something like million, or ten millions entries? This case, of course, your request will be quite slow, because you have to do quite a lot of work to compare all these values to find only matching.

So, indexes are structures what SQL can use in order to optimize this process. Usually index is just a pre-computed binary tree or hash based on values of certain columns of our table, and this tree is used to reduce number of steps we have to do to find required entries.

But when you are trying to optimize something using indexes, you always have to remember, that indexes is not something magical, that just boosts performance with no side effects, but they have direct impact on other aspects of your database.
First of all, indexes are blocks of additional data about your table, and they have to be stored somewhere. So, if you have indexes, then you need more disk space than without them.
Also, indexes are based on the idea that all entries of tables are included in them, so every time you insert new entries, or deleting or updating old entries, indexes have to be updated too. This, of course, makes these operations slower, and sometimes if work with table where you have a lot of operations, what update it, but much smaller number reading requests, then best optimization could be to delete certain index.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

How do multicolumn indexes work in SQL?

A

Multicolumn indexes are optimized for operations what filter entries first by first index column, then by second etc.
But this is important to remember that for best performance we have to filter entries using all these columns that are used for index.
So, for example, if we have table with columns A, B and C, and then create multicolumn index by A, B, C, then best performance we will get in case of queries filtering entries by all of these columns.

If we have queries, filtering only by two or one of them, then performance can be lower.
For example, if we filter only by A, performance will be almost similar to single-column index by A.
If we filter only by B, or by B and C, then performance will be significantly lower, because this case database will have to read more data from index before filtering.
And worst performance we will get if we try to filter only by column C. But, this still can be better than completely without index.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What is selectivity in SQL?

A

Selectivity in SQL is relation of number of distinct values inside column in a table to number of all entries in table.
Index selectivity varies in border from zero to one, where zero means that we have only one distinct value among entries, and selectivity of one means that every value in our table is unique.
The higher is table index selectivity, the better it is, because this case database works more effectively when using this index. Otherwise, if selectivity is low, for example, zero, or 0.5, then this can be better to completely avoid using index while performing request, because this case time what will be needed to get required information from index will be bigger, than time required to just directly scan all entries.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

What is cluster index in SQL?

A

Cluster index in SQL is a specific approach to storing data. This case, we store both index and entries themselves inside one structure, instead of storing only identifiers and accessing entries by reference using them.
There can be only one cluster index in table, because this is not possible to store same entry in two different places in two different cluster indexes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What is covering index in SQL?

A

Covering index is index, what already has all information, required for performing certain request, so there are no need to access table itself this case.
But, covering indexes usually can be used only with very simple requests, or in cases when index was created in order to optimize one certain request.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

How to optimize SQL query?

A

First of all, we can use indexes.
If we have a lot of select queries, and even better if we have a lot of similar select queries, then we can create index or indexes for columns that are used in them, and through this significantly reduce time required for filtering entries.

Second thing we can do is to analyze our requests. Are they optimal? Don’t we do something like recreation of non-materialized table on every iteration of subquery?
Or maybe we select too many columns inside subqueries, and don’t use them later? Or maybe our joins are ineffective and we use left join instead of inner joins?

Third, we should check, do we need all indexes we have in our table. If we have too many indexes that are rarely used, then sometimes this can be better to get rid of them, because they make database slower when inserting, updating or deleting rows.

Finally, maybe we do some expensive calculations in our queries, that could be avoided? For example, we need to select entries created during certain year, and we compare entry year with aim year using YEAR function. This case we can improve performance by using BETWEEN statement, or at least LIKE, directly comparing date strings without additional calculations.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

What value types exist in SQL? (PostgresSQL)

A
  • Integers, which are integer, bigint, bigserial or smallint
  • Floats, like real or double
  • Fixed length strings
  • Variable length strings
  • Dates and timestamps
  • UUID
  • Boolean

And some other more specific types

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

What are differences between PostgreSQL and other databases?

A

Я бы сказал, что PostgreSQL - это более громоздкое, но более универсальное и надёжное решение для больших систем.

Например, в PostgreSQL есть очень большое количество поддерживаемых типов. Это не только дефолтные строки, числа или ещё что-то, но и JSON, XML, геометрические данные, да ещё и свои типы данных тоже можно добавлять.

Кроме того, PostgreSQL также сильно большее количество поддерживаемых индексов, чем в том же MySQL.

Кроме пользовательских типов, в PostgreSQL можно также добавлять свои процедуры и функции.

Ещё, насколько я помню, PostgreSQL лучше работает с параллельными запросами, и в целом лучше оптимизирует запросы, но деталей я сейчас не вспомню.

Well, first of all, expandability. That means that this is possible to add your own data types or functions in Postgres

Third is big number of supported data types, including for example special data types for working with arrays, json, xml or geometry types

Also, PostgreSQL is quite scalable and support replication and sharding.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

What is replication?

A

Replication is one of mechanisms of databases scaling.
In case of replication, we copy our database, or part of our database to another place, and then use for certain purposes.

So, what are different replication scenarios?

First of all, this is so called Master-Slave or Primary-Secondary replication. In this case, we have one primary database, for serving insert, update, and delete operations, and few secondary databases, used only for reading data. This can be useful in case if we have to handle a huge amount of read requests, so this case these requests can be distributed between few nodes.
Second variant is Master-Master or Primary-Primary replication. In this case, all of nodes of our database network can be responsible for handling both read and write requests, but possible miscommunication and mistakes caused by outdated data stored on them makes this approach not so effective, and usually used only for cases, when we need to have another database node in reserve to be able to use it if our main node will fail by any reason.

Now, what about what these purposes can be?

First of all, increase of data accessibility. Even if one or few nodes will fail, the system still will be able to server requests, so reliability increases.
Second reason is database load reduction. If we have too many requests, this can be difficult to manage all of them on single server. So, if we use replication, we can distribute this load among multiple servers.
Next is geodistribution. Maybe, some of our users are quite far away from main node. This case, we can use additional nodes to store copy of our data nearer to them, leading to faster response time.
Finally, replication can be used in order to protect from data loss. Let’s imagine, that we had a fire in our datacenter. This case, if we had one node, all our date is probably lost. But if we have few nodes, we can use them to backup our data.

Also, talking about replication, replication can be synchronous and asynchronous.
In synchronous replication, when our primary database receive write request, it will mark transaction complete only after it will also update all secondary databases too.
In case of asynchronous replication, we don’t wait until secondary nodes update. This can make operations faster, but can lead to temporary data inconsistancy.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

What is sharding?

A

Sharding is one of database horizontal scaling techniques, based on data splitting.
The idea of sharding is to divide all this amount of data that we have in our database into smaller pieces, or shards, by on certain criteria, and distribute them between multiple servers, on order to reduce database overloading.

Sharding can be useful for optimization purposes, because this case we can run some of our queries in parallel on each of shards, and through this make it faster, but this is also important to remember, that queries, built without taking the fact that table is sharded into account, can become significantly slower in case, if they involve big number of shards.
Generally speaking, the less is the number of shards we need to serve each request, the better it is. Ideal situation is when we need to access only one shard to get required information.

As simple example of my words, lets imagine that we have news portal, storing its articles in different shards. Depending on the criteria we used to split the data, its performance can be totally different.
If we are not very clever, and splitted our enties based on article length, then probably we will have serious problems with performance, because every time we want to show visitor latest news, we have to go over absolutely all shards, because we have no idea in what shards these enties are placed. And the opposite situation, if our sharding criteria is article creation date, then our system will be quite fast, because we will need to access only one or few shards, where latest entries are placed.

So, this is very important to understand, what criteria we will use for sharding. For news this can be date when they were created, for some other data, like user orders on marketplace, this can be user identifier.
But if model is complex, then there could be more than one sharding key.
For example, lets imagine we have a marketplace. So, we could need to shard our information by user ids, which is important for customers, and by order ids, which is important for logistic purposes. This case we inevitably will need to duplicate at least part of our data, and spend more disk space, but this improves performance.

Well, now a few words about data distribution between shards. There are two ways we can do this - using fixed distribution, and dynamic distribution.
In first case we use splitting function, what rely only on sharding key, like article creation date, or user id.
For example, it can use simple modulo division or hash function. This is fast and simple distribution way, but this can cause problem in case if some of our shards can have different popularity, because this case we won’t be able to balance the load between them. Also, if we would want to change number of shards in future, or just change distribution algorithm, that will be difficult task to do, because this case we will have to update quite a lot of data and re-distribute them among new shards.

Another approach is dynamic distribution. This case information about exact place of our entry is stored in separate table, so every time we want to get certain entry, we go to this table and ask it, where it is located.
This, of course, generally reduces performance, but makes it possible to dynamically move items between shards, balancing the load on each of them. Also, we can distribute load between shards unevenly - this can be useful if our servers have different performance.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

What is partitioning?

A

Partitioning is a mechanism of splitting big amounts of data into smaller pieces. In partitioning, information from one or few tables, or from whole database is distributed among few servers.

There are few types of partitioning.
First one is vertical. This case we split our data based on their domain. For example, in case of online store, we can store information about users on one server, about orders on another, about products on third.
But this approach also has limits, because if you have too many entries in one table, you will anyway have to use horizontal partitioning.
About horizontal partitioning. In this case we split our table into shards by certain criteria, and save entries from different shards on different servers.

27
Q

What are the main aggregate functions you know?

A

MIN, MAX, AVG, SUM, COUNT

28
Q

What is difference between different types of joins, like merge join or hash join?

A

Well, the difference is in algorithm we use to join entries from different tables.

In case of merge join, for example, we sort tables by their join keys and then we compare them row by row and if keys are similar, we join entries

In case of nested loop join we go over all entries from first table and for each of them we check for possible joins in second table

In case of hash join we create hash table using join key from one table and then use it to find matches with entries from other table

29
Q

What are the purposes of database normalization?

A

Normalization is needed in order to eliminate redundancy and duplication of data in the database.

30
Q

What is identity field in table?

A

Identity field is a field, value of what is generated automatically and incrementally following certain rules.
For example, if we declare certain integer field with IDENTITY(1, 1) then we will have field, which initial value will be 1, but for each next entry inserted in table its value will be increased by 1. So, for first one it is 1, for second is 2, for third is 3 etc.

Also, what is important, if you use TRUNCATE TABLE statement on table, then all identity fields counters will reset.

31
Q

What query isolation types exist in PostgreSQL?

A
  • Read Commited, allows for reading changes commited by other transactions. Non-repeatable reads and phantom reads are possible.
  • Repeatable Read, allows by reading changes commited by another transactions, but locks rows we read from changing by other transactions. This case we can avoid non-repeatable read, but still have chance of phantom read.
  • Serializable, guarantees protection agains any change of rows we interact with or can interact with during our transaction. Usually blocks whole table from being accessible for other transactions.
    Is completely safe, but makes system slow and not possible to perform multiple transactions in parallel
32
Q

What is non-repeatable read in SQL?

A

Non-repeatable read is a situation when during single transaction same row is selected twice, but its columns values received first time are different from their values second time.

33
Q

What is phantom read in SQL?

A

Phantom read is a situation when during single transactions we make same select query, but set of rows we receive in first case and in second are different.

34
Q

What are default primary key restrictions in SQL?

A

Unique, not null

35
Q

How entries storing can be organized in table?

A

Using cluster index or heap.

In case of heap entries are not organized and stored inside blocks called pages.
In case of cluster index we use balanced binary tree index as structure base, but the difference between cluster index and default index is that in case of cluster index entries are stored together with index inside index nodes.

36
Q

What index types in PostgreSQL you know?

A

In PostgreSQL are many types of indexes:
- Binary tree index
- Hash index
- GiST - Generalized Search Tree index
- GIN - Generalized Inverted index

First two are most frequently used for indexing absolutely different data, while two last are more specific.
So, GiST is used for data with non-standart types, like geometry data, for example.
GIN is used for full-text search, or search in arrays or binary json structures

37
Q

How transactions work in SQL?

A

Transaction is an SQL feature, that implements mechanisms that allow multiple queries work together like logically atomic operations.

So, during transactions, you can use multiple queries, and if something goes wrong during their execution, then all changes that were made by your transaction will be deleted.

Also, inside transactions you can use similar mechanism, implemented through use of SAVEPOINT and ROLLBACK TO statements.
Using savepoint you can kind of save your progress that you have for this moment, and using ROLLBACK TO you can later return to table state that it had for the moment of reaching savepoint.

38
Q

What is view in PostgreSQL?

A

View are some kind of virtual tables, created on defined query basis.
Two types of views exist. First is regular, which is just a wrapper over query we wrote for this view. Second is materialized. In case of materialized view its data is stored into separate table and not recalculated each time we make a request.

The difference between views and regular tables is that views are computed values, what means that they can be read, but not changed. If you want to change something, you must go directly to the root table.
Second thing, is that this is important to remember, that any queries with non-materialized views are slower than similar queries that get data from real tables.
Finally, cause views are computed, if you change the tables they are based on, they can break.

39
Q

What is function in PostgreSQL?

A

The function in PostgreSQL is… is a function
So, you set up function name, its input parameters with their names, its return type
After that you write function body query and this query language.
After that functions can be used same way like any other function in SQL

This is also possible to take as parameters or return complex structures like tables or arrays, and take variable number of parameters.

40
Q

What is procedure in PostgreSQL?

A

Procedure is PostgreSQL is some kind of function, stored in the database, that can be reused.
Procedure also has input parameters, but unlike function, it can’t return any values, at least directly, and also should be called through using CALL statement.

Also, procedure can’t be called inside query

Another big difference is that procedure can control transaction flow.
If procedure was not called as part of existing transaction, it creates new transaction.
If procedure was called inside existing transaction, it get some limitations, but still can control some aspects of transaction flow. For example, it won’t be able to use COMMIT or ROLLBACK statements, but will be able to use SAVEPOINT and ROLLBACK TO statements.

Also, as I already said, unlike functions, procedures can’t directly return any values, but there is a feature that can let us avoid this limitation.
We can declare one of function parameters using OUT or INOUT keyword before its name, and this case this parameter will be provided by reference, and through changing it inside procedure, it will also change outside of it.

41
Q

What is deadlock?

A

Deadlock is a situation, when two or more transaction have circular lock dependency and prevent each other from doing anything because all of them are waiting for another to finish and remove its locks to continue working.

As simplest example, let’s imagine that we have two transactions being executed same time

First transactions starts like:
BEGIN;
UPDATE table_1 SET column1 = ‘value1’ WHERE id = 3

Second transaction same time be like:
BEGIN
UPDATE table_2 SET column1 = ‘value1’ WHERE id = 3

And both of them did these operations during same period of time
After that, first transaction tries to:
UPDATE table_2 SET column1 = ‘value2’ WHERE id = 3

And second transaction be like:
Update table_1 SET column1 = ‘value2’ WHERE id = 3

And again, they do this at same time
Finally, both transactions put lock on tables 1 and 2 on same rows, and they are waiting for each other to finish, but this is not possible because they are in circular dependency.

Finally, these transaction will finish anyway if they have timeout, or if PostgreSQL will notice them and force them to finish.

Deadlocks are bad things because they waste time.
Basically, there are three things we can do to avoid deadlocks.
First is to make transactions lock resources in same order
Second is to make transactions as short and quick as possible
Third is to execute transactions with higher isolation level.

42
Q

What is filtered index?

A

Filtered index is an index that was created using WHERE statement with certain condition.
This index is created only for these entries that meet this condition and covers only part of table rows.

Filtered indWex can be useful in case when we don’t need to optimize whole database, and need only part of rows to be optimized. This case we can save some disk space without loses in performance.
For example, if we have news portal, then we can have index covering only these articles that were created during last year, and ignore earlier articles, because they are anyway rarely read.

43
Q

What is a primary key in SQL?

A

A primary key is a unique identifier for a record in a table. Field that is used as primary key has some restrictions. It must be unique, so no duplicates allowed, and also it must be not null.

44
Q

What is a foreign key in SQL?

A

Foreign key in SQL is a key that refer to field with primary key of another table.
Usually it is used to connect two tables together and show that there is a dependency between them.
Foreign key can have different behavior for cases when this primary key in another table is updated or entry containing primary key was deleted. There are three options:
First one is to delete or correspondingly update this dependent entry.
Second is to set foreign key to NULL.
Last option is to restrict changes. This case it will forbid parent table entry to be deleted or updated.

45
Q

What is normalisation in SQL?

A

Normalisation is the process of organising data in a database to reduce redundancy and improve convenience of working with data. It involves dividing larger tables into smaller tables and creating relationships between them.

46
Q

What is a trigger in SQL?

A

A trigger is a special type of stored procedure automatically executed in response to certain database events. It is used to enforce business rules and to provide additional functionality.

47
Q

What is a subquery in SQL?

A

A subquery is a query that is embedded within another query.

48
Q

What is a cursor in SQL?

A

A cursor is a database object used to retrieve and manipulate data row by row. It is used when you need to perform complex operations that cannot be done with a single SQL statement.

49
Q

What is the difference between a unique constraint and a primary key in SQL?

A

A unique constraint ensures that each record in a table is unique, but it does not provide a unique identifier for the record. A primary key provides a unique identifier for the record and ensures each record is unique.

50
Q

What is the difference between aggregate and scalar functions in SQL?

A

An aggregate function operates on a group of values and returns a single value, such as the sum or average of a column. A scalar function operates on a single value and returns a single value, such as the length or substring of a string.

51
Q

What are UNION, MINUS and INTERSECT commands?

A

The UNION operator is used to combine the results of two tables while also removing duplicate entries.
The MINUS operator is used to return rows from the first query but not from the second query.
The INTERSECT operator is used to combine the results of both queries into a single row.
Before running either of the above SQL statements, certain requirements must be satisfied –
Within the clause, each SELECT query must have the same number of columns.
The data types in the columns must also be comparable.
In each SELECT statement, the columns must be in the same order.

52
Q

What does COALESCE command do?

A

COALESCE command takes variable number of arguments, and returns first not null argument.
This can be used instead of IFNULL, used in MYSql

53
Q

What relation types can exist in SQL?

A

One-to-One, when one entry in one table is related to one entry in enother table.
For example, relation between employee and employee direct manager.

One-to-Many, when one entry in one table is related to many entries in another table.
For example, relation between person and person telephone numbers, because one person can have multiple telephone numbers.

Many-to-Many, when for each entry from one table we have multiple related entries in second table, and vice versa. This case for connecting two tables is used third table, storing only connections between them.
This can be, for example, case with students and courses. Each student can have multiple courses assigned, and each course can have multiple students.

54
Q

What is the difference between WHERE and HAVING?

A

WHERE is used to filter separate entries before grouping
HAVING is used to filter groups of entries after grouping them, so this is possible to use aggregate functions inside HAVING block.

55
Q

What is ACID?

A

ACID is a set of properties that an ideal transaction should have.
First is Atomicity. So, transaction is logically atomic operation, so it is completed successfully and this case affect the database contents, or it is not completed and this case all changes made inside this transaction are reverted.
Second in Consistency. So, transaction moves database from one consistent state, where all databse rules and restrictions are met, to another consistent state.
Third is isolation. Transaction results are not visible by other transaction before transaction has successfully finished, and also, parallel transaction never affect each other.
Final is durability. If transaction was completed, its changes must be saved even in case of system failure, and also, if transaction was not completed, then it should not affect database state even in case of system failure.

But, as I already said, these are properties of an ideal transaction, which never existed. In real world, some of these restrictions are sometimes violated. For example, isolation. In many databases, including Postgres, parallel transaction affect each other. They can block each other, and under some isolation levels, except of highest one, they can affect each other behavior, because one transaction can read data changed by another.

56
Q

What is trigger in SQL?

A

Trigger is a stored procedure, that is automatically called on certain event happened on certain table

57
Q

What is SQL injection?

A

SQL injection is a database hacking method, when hacker put SQL code into input data, and through this can perform queries inside database.
SQL injection can lead to data loss or steal or corrpution.

In order to avoid SQL injection this is important to sanitize the data you insert into your query. You can do this manually, or, what is better, use library or framework that you use to work with database to build query through use of prepared statements templates.

Also, you can use data validation both on server and client side to ensure that any data that you receive has only expected types.

Finally, everywhere where this is possible you must avoid using so called dynamic SQL, or in other words, situation, when you dynamically build query through concatenating SQL strings with input data.

58
Q

What is NULL in SQL?

A

Null in SQL is a special value, that means no data or unknown value.

There are few things that is important to remember about NULL.
First of all, NULL is not equal to empty string or zero, it is separate value.
Second, NULL can’t be compared with other values through equality operator. If we use it, we can get incorrect results. For example, equality like NULL equals NULL will return false as a result, which of course is not true. In order to check if value is NULL or not null, we must use IS NULL or IS NOT NULL statements.
Finally, we must remember, that all operations with NULL return NULL.

59
Q

How to use CASE operator in SQL?

A

We can use CASE operator to work with conditions.
Its logic is a bit different from that we expect from it based on our experience from imperative languages like Golang, JavaScript or Python.
In SQL, CASE operator works more like multiple “else if” statement.

So, conditions are written inside CASE block, which starts with CASE operator and ends with END operator.
Inside this block we define our conditions through WHEN and THEN statements, and we also can define certain default case that will be fulfilled if no other condition will be met through ELSE statement.

60
Q

What are window functions in SQL?

A

Window functions are functions that perform calculations over a set of rows, or in other words, window, associated with the current row and return a result for each row without grouping the data.

Inside our window we can use aggregate functions like SUM, AVG or COUNT, ranking functions like ROW_NUMBER, RANK or DENSE_RANK and shift functions like LAG, LEAD or FIRST_VALUE, for example.

61
Q

What scan types do you know?

A

Sequental scan
During sequental scan database reads all rows from table and check do they match condition or not.
Sequental scan is usually used when we don’t have index on column that we use in our where block or we expect that our query will get most of rows.

Index scan
During index scan database reads existing index in order to understand, what rows should be received from table, and then reads them from table, accessing them by row id.
It is used when we have index and number of rows we want to get is relatively small.

Index only scan
Index only scan is similar to index scan, but database won’t go to the table itself and will read all required information from index.
This is possible to use if all columns we interact with are included in index or are part of it.

Bitmap scan
Bitmap scan is used when we need to get a lot of rows, but not a lot enough to use sequental scan.
Bitmap scan uses index in order to understand, in what pages of table required rows can be stored, and later sequentally reads them in order to filter out required rows.

62
Q

Что такое PGPool и вообще для чего нужен пул при взаимодействии с базой данных?

A

PGPool это штука, которая позволяет подключаться к базе данных с использование пула предсозданных соединений, а не создавать каждый раз для этого новые.

При этом количество соединений регулируется как на стороне самой базы данных (общее максимальное количество подключений), так и в библиотеках для работы с пулами подключений отдельных сервисов (максимальное количество подключений от этого сервиса)

Причины:
1. Создание нового соединения требует много ресурсов.
2. Базы данных не могут эффективно обслуживать больше определённого количества соединений
3. Для оптимизации процессов все соединения создаются заранее и помещаются в пул соединений, откуда запросы к базе данных их могут брать.

63
Q

Как вы работали с задачами, связанными с базой данных?
Допустим, нужно оптимизировать запрос. Как вы локально это делали?

A

Локально запускали в Docker базу данных.
С помощью make команд обновляли её данными с stg, потом локально работали.

64
Q

Как вы работали с шардированием в ваших сервисах?

A

У нас была настроена инфраструктура для шардирования, использовалась корпоративная либа, но по факту у нас был только один шард.

А использовать либу и шардирование было обязательно, чтобы можно было в случае необходимости по-быстрому расшардировать сервис.