SQL важное Flashcards
How to create table in SQL?
This can be done using CREATE TABLE command. After CREATE TABLE words you must provide new table name, and after than declare table structure or use keyword AS in order to create new table using SELECT request.
Also, this is possible to write TEMPORARY keyword before TABLE keyword in order to create temporary table.
How to delete table in SQL?
Using DROP TABLE command. After this command you have to provide table name.
How to change SQL table structure?
This is possible to change table structure using ALTER TABLE command.
In this command you can add, drop and alter existing columns, foreign keys and constraints.
What is difference between regular tables and temporary tables in SQL?
Regular tables are permanent, but temporary tables exist only inside certain database session, and will be deleted when session ends.
What are constraints in SQL?
Constraints in SQL are certain rules what we can apply to certain columns in order to prevent unwanted behaviour. For example, these constraints can be check, unique, default or foreign key. Each of them adds certain behaviour scenarios what will be used in certain conditions. For example, using CHECK constaint we can add conditions what certain value should meet before being inserted into table, or using DEFAULT we can provide certain default value what will be inserted into certain column if value for this column was not provided.
What is primary key?
Primary key is unique id which can be used to identify certain entry.
What is foreign key?
Foreign key is a value which references to certain entry in another table. This can be used to logically connect few tables.
What is difference between local temporary and global temporary SQL tables.
Local tables exist during single session and accessible only inside this session.
Global tables exist while exist any session that uses it, and accessible from all sessions.
What is table normalization?
Table normalization is a process of splitting table into multiple subtables, keeping same logic. This sometimes can be necessary in order to simplify or optimize table structure.
For example, let’s imagine that we have table named Product. Table Product has few columns, which are id, name and category. Name and category are just text fields. The problem this case is that first of all, we have to write category in each entry, and waste more memory. Also, your categories can have mismatch, even if they should mean same things. For example, some categories can be written in only lower case, without capital letter, or something else. So that would be a problem to process this data.
In order to solve this problem, we can move categories into separate table, and instead of providing category name in product item, we can provide only category id. So, this will simplify working process.
What are JOINs in SQL?
JOIN is a SQL language mechanism, what allows us to connect two tables together inside query, and then access this resulting table during request
How INNER JOIN works?
In case of inner join in resulting table will remain only these entries from original tables for what we have matching values in both tables.
How LEFT JOIN works?
Lets imagine that we have two tables, A and B.
In case of LEFT JOIN of table B on table A, in resulting table we will have all data for entires from table A, but we will have data from table B only on these entries, where our join condition matches
How RIGHT JOIN works?
Lets imagine that we have two tables, A and B.
In case of RIGHT JOIN of table B on table A, in resulting table we will have all data for entires from table B, but we will have data from table A only on these entries, where our join condition matches
How FULL JOIN works?
Lets imagine that we have two tables, A and B.
In case of FULL JOIN of table B on table A, in resulting table we will have all data for entires from tables A and B, but for each entry in resulting table we will have full data only for these entries where our join condition matches. In other case, on certain entries we can have no information about part belonging to second table.
How SELF JOIN works?
SELF JOIN is a name for tables joining case, when we connect certain table with itself. This can be used in case when table references to itself
For example, imagine that we have table of employees, and for each employee we have id, name, and this employee’s boss name. So, if we want to get employee id, name, and boss name, we can use SELF JOIN.
How CROSS JOIN works and where it can be used?
CROSS JOIN is a specific kind of table joining, when we join each row of first table with each row of second table.
About where it can be used…
Okay, just as simplest example, lets imagine that we are selling T-shirts, and we have tables with T-shirts sizes and, for example, materials they can be made of. And if we want to have table, where we store each possible combination of T-shirt size and material. So for this purpose we can use CROSS JOIN
What is SQL index and how it works?
Index in SQL is a special data structure, what can be used in order to reduce queries time when you make request to certain columns in table.
By default, we have no indexes on colunms, except of id columns, and if you for want to select only certain entries from table, and make request like select name from person where age equals 20, then SQL will have to go through all entries, and compare each of them with your condition.
This is okay if your table is small, but what if it contains something like million, or ten millions entries? This case, of course, your request will be quite slow, because you have to do quite a lot of work to compare all these values to find only matching.
So, indexes are structures what SQL can use in order to optimize this process. Usually index is just a pre-computed binary tree or hash based on values of certain columns of our table, and this tree is used to reduce number of steps we have to do to find required entries.
But when you are trying to optimize something using indexes, you always have to remember, that indexes is not something magical, that just boosts performance with no side effects, but they have direct impact on other aspects of your database.
First of all, indexes are blocks of additional data about your table, and they have to be stored somewhere. So, if you have indexes, then you need more disk space than without them.
Also, indexes are based on the idea that all entries of tables are included in them, so every time you insert new entries, or deleting or updating old entries, indexes have to be updated too. This, of course, makes these operations slower, and sometimes if work with table where you have a lot of operations, what update it, but much smaller number reading requests, then best optimization could be to delete certain index.
How do multicolumn indexes work in SQL?
Multicolumn indexes are optimized for operations what filter entries first by first index column, then by second etc.
But this is important to remember that for best performance we have to filter entries using all these columns that are used for index.
So, for example, if we have table with columns A, B and C, and then create multicolumn index by A, B, C, then best performance we will get in case of queries filtering entries by all of these columns.
If we have queries, filtering only by two or one of them, then performance can be lower.
For example, if we filter only by A, performance will be almost similar to single-column index by A.
If we filter only by B, or by B and C, then performance will be significantly lower, because this case database will have to read more data from index before filtering.
And worst performance we will get if we try to filter only by column C. But, this still can be better than completely without index.
What is selectivity in SQL?
Selectivity in SQL is relation of number of distinct values inside column in a table to number of all entries in table.
Index selectivity varies in border from zero to one, where zero means that we have only one distinct value among entries, and selectivity of one means that every value in our table is unique.
The higher is table index selectivity, the better it is, because this case database works more effectively when using this index. Otherwise, if selectivity is low, for example, zero, or 0.5, then this can be better to completely avoid using index while performing request, because this case time what will be needed to get required information from index will be bigger, than time required to just directly scan all entries.
What is cluster index in SQL?
Cluster index in SQL is a specific approach to storing data. This case, we store both index and entries themselves inside one structure, instead of storing only identifiers and accessing entries by reference using them.
There can be only one cluster index in table, because this is not possible to store same entry in two different places in two different cluster indexes
What is covering index in SQL?
Covering index is index, what already has all information, required for performing certain request, so there are no need to access table itself this case.
But, covering indexes usually can be used only with very simple requests, or in cases when index was created in order to optimize one certain request.
How to optimize SQL query?
First of all, we can use indexes.
If we have a lot of select queries, and even better if we have a lot of similar select queries, then we can create index or indexes for columns that are used in them, and through this significantly reduce time required for filtering entries.
Second thing we can do is to analyze our requests. Are they optimal? Don’t we do something like recreation of non-materialized table on every iteration of subquery?
Or maybe we select too many columns inside subqueries, and don’t use them later? Or maybe our joins are ineffective and we use left join instead of inner joins?
Third, we should check, do we need all indexes we have in our table. If we have too many indexes that are rarely used, then sometimes this can be better to get rid of them, because they make database slower when inserting, updating or deleting rows.
Finally, maybe we do some expensive calculations in our queries, that could be avoided? For example, we need to select entries created during certain year, and we compare entry year with aim year using YEAR function. This case we can improve performance by using BETWEEN statement, or at least LIKE, directly comparing date strings without additional calculations.
What value types exist in SQL? (PostgresSQL)
- Integers, which are integer, bigint, bigserial or smallint
- Floats, like real or double
- Fixed length strings
- Variable length strings
- Dates and timestamps
- UUID
- Boolean
And some other more specific types
What are differences between PostgreSQL and other databases?
Well, first of all, expandability. That means that this is possible to add your own data types or functions in Postgres
Third is big number of supported data types, including for example special data types for working with arrays, json, xml or geometry types
Also, PostgreSQL is quite scalable and support replication and sharding.
What is replication?
Replication is one of mechanisms of databases scaling.
In case of replication, we copy our database, or part of our database to another place, and then use for certain purposes.
So, what are different replication scenarios?
First of all, this is so called Master-Slave or Primary-Secondary replication. In this case, we have one primary database, for serving insert, update, and delete operations, and few secondary databases, used only for reading data. This can be useful in case if we have to handle a huge amount of read requests, so this case these requests can be distributed between few nodes.
Second variant is Master-Master or Primary-Primary replication. In this case, all of nodes of our database network can be responsible for handling both read and write requests, but possible miscommunication and mistakes caused by outdated data stored on them makes this approach not so effective, and usually used only for cases, when we need to have another database node in reserve to be able to use it if our main node will fail by any reason.
Now, what about what these purposes can be?
First of all, increase of data accessibility. Even if one or few nodes will fail, the system still will be able to server requests, so reliability increases.
Second reason is database load reduction. If we have too many requests, this can be difficult to manage all of them on single server. So, if we use replication, we can distribute this load among multiple servers.
Next is geodistribution. Maybe, some of our users are quite far away from main node. This case, we can use additional nodes to store copy of our data nearer to them, leading to faster response time.
Finally, replication can be used in order to protect from data loss. Let’s imagine, that we had a fire in our datacenter. This case, if we had one node, all our date is probably lost. But if we have few nodes, we can use them to backup our data.
Also, talking about replication, replication can be synchronous and asynchronous.
In synchronous replication, when our primary database receive write request, it will mark transaction complete only after it will also update all secondary databases too.
In case of asynchronous replication, we don’t wait until secondary nodes update. This can make operations faster, but can lead to temporary data inconsistancy.