Backend Flashcards
What is the difference between a monolith and microservices?
A monolith is a development approach where all code lives inside one project: backend, frontend, and every other part of the application share the same codebase.
Microservice architecture means that we split the application into smaller logical parts, each responsible only for a narrow piece of functionality, and treat each of them as a separate project.
Both monolith and microservice architectures have their advantages and disadvantages.
The main advantages of microservices are:
First of all, microservices can be developed by smaller, separate teams. That potentially means higher developer productivity, because teams need less time to communicate and plan their work, and a new team member does not have to deeply learn how the whole application works, only how their microservice works.
Second, microservices are independent from each other. They can be developed, scaled, and delivered separately. Also, if one microservice crashes, that does not cause problems in other services, unless they directly depend on each other.
Third, microservices can be developed using different technologies and programming languages. That means you can use the language most suitable for each task instead of being limited to one more-or-less universal language.
To sum up, microservices give you flexibility, scalability, and convenience in most cases.
On the other hand, a monolith also has certain advantages over microservice architecture.
First of all, a monolith is easier to manage and monitor. You have one service, not a hundred, and everything is in one place.
Second is data consistency. When your data is distributed among multiple microservices, keeping all of it consistent at every moment in time is hard and can lead to bugs; a monolith with a single database avoids this problem.
Third is performance and data transfer. In a monolith, when you need to access another part of the application, you just call the required functions or methods. In a microservice architecture, you must send a request over the network, which is much slower and reduces the performance of the whole system.
Finally, testing can become more difficult in certain aspects. For example, it is much harder to test the interaction between application parts in microservices than in a monolith.
To sum up, I would say a monolith is a good solution for small, low-load applications, because in that case it makes development faster and more reliable, while microservices are a perfect fit for large companies with high load, because they provide better scalability and convenience.
What are first class functions?
First-class functions are functions that can be treated as values: they can be passed into another function as a parameter, or returned from a function as its return value.
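A minimal Python sketch of both directions; the function names here are made up for the example:

```python
# First-class functions: functions can be stored in variables,
# passed as arguments, and returned from other functions.

def shout(text):
    return text.upper() + "!"

def whisper(text):
    return text.lower() + "..."

def apply_twice(func, value):
    # Receives a function as a parameter and calls it twice.
    return func(func(value))

# A function stored in a variable, used like any other value.
greeting = shout
print(greeting("hello"))                 # HELLO!
print(apply_twice(whisper, "HEY"))       # hey......

def make_formatter(prefix):
    # Returns a function as its return value.
    def formatter(text):
        return prefix + text
    return formatter

log = make_formatter("[log] ")
print(log("started"))                    # [log] started
```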
What is Docker and what is it used for?
Docker is a tool for creating and deploying containers.
A container is an isolated and portable software package containing all the components required to run an application in any environment.
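As an illustration, a minimal Dockerfile for a hypothetical Python service; the file names and start command are assumptions for the sketch:

```dockerfile
# Base image that provides the language runtime.
FROM python:3.12-slim

WORKDIR /app

# Install dependencies first so this layer is cached between builds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code into the image.
COPY . .

# The command the container runs on start.
CMD ["python", "app.py"]
```

Building this produces an image that runs the same way on any machine with a container runtime, which is exactly the portability described above.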
What is Kafka and what it is used for?
Kafka is a message broker used to store and distribute messages.
The idea of Kafka is to handle messages sent by producers, store them for a long time in the order they were sent, make them accessible to multiple consumers, and do all of this with high performance and scalability.
So, in what situations could we need Kafka?
I would say a typical use case is when we have multiple services that have to communicate asynchronously and be sure that no important information is lost.
For example, imagine we have two services, A and B. Service A sends a request to service B with certain information and wants to receive a response containing the results of processing that information.
There are two potential problems here.
First, information can be lost. Say service A sends a request, service B receives it, but then server B unexpectedly shuts down and loses the information. If we did not store this information on service A, and we have no mechanism to resend the request after service B comes back up, we are very likely to lose everything related to that request.
With Kafka, this is not a problem: after server B restarts, it simply reads all the messages from Kafka that it has not yet successfully handled and restarts the process, so no information is lost.
The second problem is resource consumption. Of course, we can handle the request asynchronously, so while server B is processing it, the thread on server A can do something else, but this still consumes processing power and memory, which can be quite noticeable if request handling takes a long time.
With Kafka, service A can just send a message containing all the required information and forget about it. Service B later reads the message, does the work, and sends another message with the result, which service A can read when convenient. This way we increase the performance of both services.
We also need Kafka when we have multiple consumers consuming the same information.
For example, imagine we have a service that handles employees, and an employee resigns. We need to send information about that to the service that works with documents, the service that works with salaries, and the service that manages equipment issued to employees.
Without Kafka, you would have to send three separate HTTP requests, one to each of these services. That does not sound very time-saving, does it?
With Kafka, you can send one message and forget about it, and that message can be read by multiple consumers later.
At my current workplace, our team uses Kafka mainly to notify other services about events related to employee status and employee documents: whether a new employee was hired or fired, which documents an employee signed and which they refused to sign, and so on.
What is Kafka topic?
A Kafka topic is a kind of channel dedicated to messages of a certain type with a certain structure.
Topics exist to simplify message handling.
Without them, if we had one common channel, we would have to manually detect each message's structure, origin, and purpose on the consumer side, which is not convenient.
With topics, consumers can be sure about the purpose and structure of each message they read.
What is the difference between the delivery guarantee types?
- At most once
- At least once
With at-most-once delivery, a message is delivered once, or not at all if any problem occurs on the way.
This is the fastest delivery variant, because we do not have to persist messages to disk to guarantee delivery, and we do not need any additional mechanisms to confirm they were delivered.
It can be used when we can afford to lose messages.
For example, web analytics collection or low-importance logs.
At-least-once delivery guarantees that a message is delivered one or more times.
In this case the message is stored on disk, and when it is sent to Kafka or read from Kafka, Kafka ensures each step succeeded; for example, it tells the producer if uploading the message failed.
However, there is a chance of message duplication, so we need to implement mechanisms handling it on the consumer side.
What does it mean for a Kafka consumer to be idempotent?
Idempotence of a consumer means that the consumer, through its internal mechanisms, implements protection against message duplication: processing the same message twice leaves the system in the same state as processing it once.
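A minimal sketch of the idea in plain Python, without a real Kafka client; the message shape, ID field, and in-memory set are assumptions (in production the processed-ID set would live in durable storage):

```python
# Idempotent consumer sketch: each message carries a unique ID, and the
# consumer remembers which IDs it has already processed, so a redelivered
# duplicate (possible under at-least-once delivery) is skipped.

processed_ids = set()
balance = 0

def handle_message(message):
    global balance
    if message["id"] in processed_ids:
        return  # duplicate delivery, already applied once
    balance += message["amount"]
    processed_ids.add(message["id"])

# The same message delivered twice changes the state only once.
handle_message({"id": "tx-1", "amount": 100})
handle_message({"id": "tx-1", "amount": 100})  # duplicate, ignored
handle_message({"id": "tx-2", "amount": 50})
print(balance)  # 150
```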
What is the difference between classical databases and columnar databases?
The difference is that classical SQL databases store information in rows: the table is split into separate items, where each item is a kind of object consisting of the values of the table's columns.
In columnar databases, by contrast, data is stored by columns.
Practically, this means that when we query a regular database, it has to scan the whole row before it can extract the values of the columns we need. A columnar database does not have this problem: it reads only the columns we need and ignores the rest of the data.
The second major difference is the usage scenarios. A regular database fits situations with frequent inserts, updates, and deletes, where we need access to all or most of the fields in a row. A typical example is a request returning a certain user from the database with all of its fields.
A columnar database is typically used to select thousands or even millions of items containing only a few columns. Also, we usually do not make small requests or inserts; we operate on large amounts of data.
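A toy Python illustration of the two layouts; the table name and data are made up for the example:

```python
# Row-oriented layout: each record kept together
# (good for "fetch one user with all fields").
rows = [
    {"id": 1, "name": "Alice", "salary": 100},
    {"id": 2, "name": "Bob",   "salary": 200},
    {"id": 3, "name": "Carol", "salary": 300},
]

# Column-oriented layout: each column kept together
# (good for analytics over one column, since the other
# columns are never touched).
columns = {
    "id":     [1, 2, 3],
    "name":   ["Alice", "Bob", "Carol"],
    "salary": [100, 200, 300],
}

# Analytical query "average salary": the row layout walks whole
# records, while the columnar layout reads only the salary column.
avg_from_rows = sum(r["salary"] for r in rows) / len(rows)
avg_from_columns = sum(columns["salary"]) / len(columns["salary"])
print(avg_from_rows, avg_from_columns)  # 200.0 200.0
```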
What is Kubernetes?
Kubernetes is software for container orchestration: it automates container administration and monitoring, and the deployment and scaling of applications inside containers.
Kubernetes helps keep an application always accessible, more suitable for operation under high load, and easily recoverable.
Kubernetes operates a group of interconnected servers. This group is called a cluster, and each server inside the group is called a node.
Nodes are split into two types: master nodes and worker nodes.
The master node is responsible for control and task distribution between worker nodes. It consists of a few main components:
- The API server, responsible for communication between the master and workers
- The Controller Manager, responsible for observing and controlling the current cluster state, moving the system from its current state to the desired state through pod management
- The Scheduler, which decides which containers should be placed on which nodes depending on factors like current node load or potential node performance
- The etcd storage, a simple key-value database that stores information about the cluster, such as configuration data, node statuses, container statuses, etc.
A worker node also consists of a few components:
- The so-called kubelet, which communicates with the master node and receives instructions about what should run on this exact node and how
- The container runtime, responsible for container images, starting and stopping containers, and resource management
- The kube-proxy, responsible for communication and internal network balancing
The main mechanism Kubernetes uses to reach its goals is pods. Pods are lightweight wrappers over containers; by distributing them between nodes, adding or removing pods, and recovering containers inside pods in case of problems, Kubernetes shares the load between nodes and makes the whole system reliable and recoverable.
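As an illustration of the desired-state idea, a minimal Deployment manifest; the app name, image, and replica count are made-up values for the sketch:

```yaml
# Minimal Deployment: asks Kubernetes to keep 3 identical pods running.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service            # hypothetical name
spec:
  replicas: 3                 # desired state; the Controller Manager enforces it
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: my-service
          image: registry.example.com/my-service:1.0   # hypothetical image
          ports:
            - containerPort: 8080
```

If a pod dies, Kubernetes notices the gap between the current state (2 pods) and the desired state (3 pods) and starts a replacement automatically.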
What is HTTP protocol?
HTTP is a text protocol that operates on the request-response principle.
An HTTP request consists of a start line (including a method, resource identifier, and protocol version), headers, and a message body.
An HTTP response is similar in structure to an HTTP request, but the start line contains a status code instead of a method and URI.
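A sketch of what this looks like on the wire, first the request and then the response; the host, path, and body are made up for the example:

```http
GET /users/42 HTTP/1.1
Host: api.example.com
Accept: application/json

HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 27

{"id": 42, "name": "Alice"}
```

The start line of the request holds the method, resource identifier, and protocol version; the response start line holds the version and a status code instead, followed in both cases by headers, an empty line, and an optional body.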
What are the main HTTP methods?
GET, POST, PUT, DELETE
What is the difference between REST and RPC?
REST is an architectural style for developing networked applications whose operation is based on interaction with resources. REST uses a set of constraints and patterns to keep the system simple, reliable, and scalable.
Each resource is represented by an API endpoint.
Each request is independent; servers don’t store client state.
The final principle is code on demand: the server can send executable code to the client, and the client runs it. This gives flexibility, because we can freely change the code on the server side and it automatically affects the final behavior on the client side.
REST is best for standard, scalable APIs with predictable CRUD operations.
RPC (Remote Procedure Call) is a call-based model where the client executes remote functions as if they were local.
It is focused on function execution, not on CRUD operations.
RPC also supports more formats than REST typically uses: JSON, XML, and binary formats, as in gRPC.
RPC is optimized for high-performance needs.
RPC is best for microservices or high-performance systems focused on actions rather than resource management.
What are cookies?
Cookies are small pieces of data with a limited lifetime that are sent to the server through special HTTP headers. Both the server and the client can read and write cookies, and it is possible to make a cookie inaccessible to client-side scripts (the HttpOnly flag), which is useful for storing authorization tokens. Cookies are usually used to store small pieces of information about, for example, the authorization session of an application, such as an access token.
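A sketch of the headers involved; the cookie name and values are made up for the example:

```http
HTTP/1.1 200 OK
Set-Cookie: session_id=abc123; HttpOnly; Secure; Max-Age=3600

GET /profile HTTP/1.1
Host: api.example.com
Cookie: session_id=abc123
```

The server sets the cookie with `Set-Cookie`, and the browser sends it back on every later request via the `Cookie` header; `HttpOnly` hides it from client-side JavaScript, `Secure` restricts it to HTTPS, and `Max-Age` limits its lifetime.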
What is a closure, and how/why would you use one?
A closure is a function that remembers the variables of the lexical environment in which it was created and can access them later.
I would use a closure to create a callback, or, in more specific cases, to create a function that returns another function acting as a counter, returning an increased value each time it is called.
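The counter described above, as a minimal Python sketch:

```python
# A closure: the inner function remembers the count variable from the
# scope where it was created, even after make_counter has returned.

def make_counter():
    count = 0
    def counter():
        nonlocal count  # modify the captured variable, not a new local one
        count += 1
        return count
    return counter

counter = make_counter()
print(counter())  # 1
print(counter())  # 2

# Each call to make_counter creates an independent environment.
other = make_counter()
print(other())    # 1
```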
Can you give an example of a curry function and why this syntax offers an advantage?
A curry function is a higher-order function that takes its arguments one by one; at each step except the last, it returns another function.
Why does this syntax offer an advantage? In some cases we need an intermediate function with some arguments already partially applied, which we can then reuse later in the code without passing the same parameters again and again.
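A hand-rolled curried function in Python; the URL-building example and names are made up to show the partial-application benefit:

```python
# Curried function: each call takes one argument and returns another
# function, until the last call produces the final result.

def make_url(scheme):
    def with_host(host):
        def with_path(path):
            return f"{scheme}://{host}/{path}"
        return with_path
    return with_host

print(make_url("https")("api.example.com")("users"))
# https://api.example.com/users

# The advantage: an intermediate function with some arguments already
# applied can be reused without repeating the same parameters.
api = make_url("https")("api.example.com")
print(api("users"))   # https://api.example.com/users
print(api("orders"))  # https://api.example.com/orders
```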