Backend Flashcards
What is the difference between a monolith and microservices?
A monolith is a development approach where all code lives inside one project: backend, frontend, and every other part of the application share the same codebase.
Microservice architecture means that we split the application into smaller logical parts, each responsible only for a narrow piece of functionality, and treat each of them as a separate project.
Both monolith and microservice architectures have their advantages and disadvantages.
The main advantages of microservices are:
First of all, microservices can be developed by smaller, separate teams. That potentially means higher developer productivity, because teams need less time to communicate and plan their work, and a new team member does not have to deeply learn how the whole application works, only how their microservice works.
Second, microservices are independent from each other. They can be developed, scaled, and delivered separately. Also, if one microservice crashes, that does not cause problems in other services, unless they directly depend on each other.
Third, microservices can be developed using different technologies and programming languages. That means you can use the language most suitable for each task instead of being limited to one more-or-less universal language.
To sum up, microservices give you flexibility, scalability, and convenience in most cases.
On the other hand, a monolith also has certain advantages over microservice architecture.
First of all, a monolith is easier to manage and monitor. You have one service, not a hundred, and everything is in one place.
Second is data consistency. When your data is distributed among multiple microservices, keeping all of it consistent at every moment in time is hard and can lead to bugs; a monolith with a single database avoids this problem.
Third is performance and data transfer. In a monolith, when you need to access another part of the application, you just call the required functions or methods. In a microservice architecture, you must send a request over the network, which is much slower and reduces the performance of the whole system.
Finally, testing can become more difficult in certain aspects. For example, it is much harder to test the interaction between application parts in microservices than in a monolith.
To sum up, I would say a monolith is a good solution for small, low-load applications, because in that case it makes development faster and more reliable, while microservices are a perfect fit for large companies with high load, because they provide better scalability and convenience.
What are first class functions?
First-class functions are functions that can be treated as values: they can be passed into another function as a parameter, or returned from a function as its return value.
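A minimal Python sketch of both directions; the function names here are made up for the example:

```python
# First-class functions: functions can be stored in variables,
# passed as arguments, and returned from other functions.

def shout(text):
    return text.upper() + "!"

def whisper(text):
    return text.lower() + "..."

def apply_twice(func, value):
    # Receives a function as a parameter and calls it twice.
    return func(func(value))

# A function stored in a variable, used like any other value.
greeting = shout
print(greeting("hello"))                 # HELLO!
print(apply_twice(whisper, "HEY"))       # hey......

def make_formatter(prefix):
    # Returns a function as its return value.
    def formatter(text):
        return prefix + text
    return formatter

log = make_formatter("[log] ")
print(log("started"))                    # [log] started
```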
What is Docker and what is it used for?
Docker is a tool for creating and deploying containers.
A container is an isolated and portable software package containing all the components required to run an application in any environment.
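As an illustration, a minimal Dockerfile for a hypothetical Python service; the file names and start command are assumptions for the sketch:

```dockerfile
# Base image that provides the language runtime.
FROM python:3.12-slim

WORKDIR /app

# Install dependencies first so this layer is cached between builds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code into the image.
COPY . .

# The command the container runs on start.
CMD ["python", "app.py"]
```

Building this produces an image that runs the same way on any machine with a container runtime, which is exactly the portability described above.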
What is Kafka and what it is used for?
Kafka is a message broker used to store and distribute messages.
The idea of Kafka is to handle messages sent by producers, store them for a long time in the order they were sent, make them accessible to multiple consumers, and do all of this with high performance and scalability.
So, in what situations could we need Kafka?
I would say a typical use case is when we have multiple services that have to communicate asynchronously and be sure that no important information is lost.
For example, imagine we have two services, A and B. Service A sends a request to service B with certain information and wants to receive a response containing the results of processing that information.
There are two potential problems here.
First, information can be lost. Say service A sends a request, service B receives it, but then server B unexpectedly shuts down and loses the information. If we did not store this information on service A, and we have no mechanism to resend the request after service B comes back up, we are very likely to lose everything related to that request.
With Kafka, this is not a problem: after server B restarts, it simply reads all the messages from Kafka that it has not yet successfully handled and restarts the process, so no information is lost.
The second problem is resource consumption. Of course, we can handle the request asynchronously, so while server B is processing it, the thread on server A can do something else, but this still consumes processing power and memory, which can be quite noticeable if request handling takes a long time.
With Kafka, service A can just send a message containing all the required information and forget about it. Service B later reads the message, does the work, and sends another message with the result, which service A can read when convenient. This way we increase the performance of both services.
We also need Kafka when we have multiple consumers consuming the same information.
For example, imagine we have a service that handles employees, and an employee resigns. We need to send information about that to the service that works with documents, the service that works with salaries, and the service that manages equipment issued to employees.
Without Kafka, you would have to send three separate HTTP requests, one to each of these services. That does not sound very time-saving, does it?
With Kafka, you can send one message and forget about it, and that message can be read by multiple consumers later.
At my current workplace, our team uses Kafka mainly to notify other services about events related to employee status and employee documents: whether a new employee was hired or fired, which documents an employee signed and which they refused to sign, and so on.
What is Kafka topic?
A Kafka topic is a kind of channel dedicated to messages of a certain type with a certain structure.
Topics exist to simplify message handling.
Without them, if we had one common channel, we would have to manually detect each message's structure, origin, and purpose on the consumer side, which is not convenient.
With topics, consumers can be sure about the purpose and structure of each message they read.
What is the difference between the delivery guarantee types?
- At most once
- At least once
With at-most-once delivery, a message is delivered once, or not at all if any problem occurs on the way.
This is the fastest delivery variant, because we do not have to persist messages to disk to guarantee delivery, and we do not need any additional mechanisms to confirm they were delivered.
It can be used when we can afford to lose messages.
For example, web analytics collection or low-importance logs.
At-least-once delivery guarantees that a message is delivered one or more times.
In this case the message is stored on disk, and when it is sent to Kafka or read from Kafka, Kafka ensures each step succeeded; for example, it tells the producer if uploading the message failed.
However, there is a chance of message duplication, so we need to implement mechanisms handling it on the consumer side.
What does it mean for a Kafka consumer to be idempotent?
Idempotence of a consumer means that the consumer, through its internal mechanisms, implements protection against message duplication: processing the same message twice leaves the system in the same state as processing it once.
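A minimal sketch of the idea in plain Python, without a real Kafka client; the message shape, ID field, and in-memory set are assumptions (in production the processed-ID set would live in durable storage):

```python
# Idempotent consumer sketch: each message carries a unique ID, and the
# consumer remembers which IDs it has already processed, so a redelivered
# duplicate (possible under at-least-once delivery) is skipped.

processed_ids = set()
balance = 0

def handle_message(message):
    global balance
    if message["id"] in processed_ids:
        return  # duplicate delivery, already applied once
    balance += message["amount"]
    processed_ids.add(message["id"])

# The same message delivered twice changes the state only once.
handle_message({"id": "tx-1", "amount": 100})
handle_message({"id": "tx-1", "amount": 100})  # duplicate, ignored
handle_message({"id": "tx-2", "amount": 50})
print(balance)  # 150
```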
What is the difference between classical databases and columnar databases?
The difference is that classical SQL databases store information in rows: the table is split into separate items, where each item is a kind of object consisting of the values of the table's columns.
In columnar databases, by contrast, data is stored by columns.
Practically, this means that when we query a regular database, it has to scan the whole row before it can extract the values of the columns we need. A columnar database does not have this problem: it reads only the columns we need and ignores the rest of the data.
The second major difference is the usage scenarios. A regular database fits situations with frequent inserts, updates, and deletes, where we need access to all or most of the fields in a row. A typical example is a request returning a certain user from the database with all of its fields.
A columnar database is typically used to select thousands or even millions of items containing only a few columns. Also, we usually do not make small requests or inserts; we operate on large amounts of data.
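A toy Python illustration of the two layouts; the table name and data are made up for the example:

```python
# Row-oriented layout: each record kept together
# (good for "fetch one user with all fields").
rows = [
    {"id": 1, "name": "Alice", "salary": 100},
    {"id": 2, "name": "Bob",   "salary": 200},
    {"id": 3, "name": "Carol", "salary": 300},
]

# Column-oriented layout: each column kept together
# (good for analytics over one column, since the other
# columns are never touched).
columns = {
    "id":     [1, 2, 3],
    "name":   ["Alice", "Bob", "Carol"],
    "salary": [100, 200, 300],
}

# Analytical query "average salary": the row layout walks whole
# records, while the columnar layout reads only the salary column.
avg_from_rows = sum(r["salary"] for r in rows) / len(rows)
avg_from_columns = sum(columns["salary"]) / len(columns["salary"])
print(avg_from_rows, avg_from_columns)  # 200.0 200.0
```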
What is Kubernetes?
Kubernetes is software for container orchestration: it automates container administration and monitoring, and the deployment and scaling of applications inside containers.
Kubernetes helps keep an application always accessible, more suitable for operation under high load, and easily recoverable.
Kubernetes operates a group of interconnected servers. This group is called a cluster, and each server inside the group is called a node.
Nodes are split into two types: master nodes and worker nodes.
The master node is responsible for control and task distribution between worker nodes. It consists of a few main components:
- The API server, responsible for communication between the master and workers
- The Controller Manager, responsible for observing and controlling the current cluster state, moving the system from its current state to the desired state through pod management
- The Scheduler, which decides which containers should be placed on which nodes depending on factors like current node load or potential node performance
- The etcd storage, a simple key-value database that stores information about the cluster, such as configuration data, node statuses, container statuses, etc.
A worker node also consists of a few components:
- The so-called kubelet, which communicates with the master node and receives instructions about what should run on this exact node and how
- The container runtime, responsible for container images, starting and stopping containers, and resource management
- The kube-proxy, responsible for communication and internal network balancing
The main mechanism Kubernetes uses to reach its goals is pods. Pods are lightweight wrappers over containers; by distributing them between nodes, adding or removing pods, and recovering containers inside pods in case of problems, Kubernetes shares the load between nodes and makes the whole system reliable and recoverable.
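As an illustration of the desired-state idea, a minimal Deployment manifest; the app name, image, and replica count are made-up values for the sketch:

```yaml
# Minimal Deployment: asks Kubernetes to keep 3 identical pods running.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service            # hypothetical name
spec:
  replicas: 3                 # desired state; the Controller Manager enforces it
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: my-service
          image: registry.example.com/my-service:1.0   # hypothetical image
          ports:
            - containerPort: 8080
```

If a pod dies, Kubernetes notices the gap between the current state (2 pods) and the desired state (3 pods) and starts a replacement automatically.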
What is HTTP protocol?
HTTP is a text protocol that operates on the request-response principle.
An HTTP request consists of a start line (including a method, resource identifier, and protocol version), headers, and a message body.
An HTTP response is similar in structure to an HTTP request, but the start line contains a status code instead of a method and URI.
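A sketch of what this looks like on the wire, first the request and then the response; the host, path, and body are made up for the example:

```http
GET /users/42 HTTP/1.1
Host: api.example.com
Accept: application/json

HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 27

{"id": 42, "name": "Alice"}
```

The start line of the request holds the method, resource identifier, and protocol version; the response start line holds the version and a status code instead, followed in both cases by headers, an empty line, and an optional body.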
What are the main HTTP methods?
GET, POST, PUT, DELETE
What is the difference between REST and RPC?
REST is an architectural style for developing networked applications whose operation is based on interaction with resources. REST uses a set of constraints and patterns to keep the system simple, reliable, and scalable.
Each resource is represented by an API endpoint.
Each request is independent; servers don’t store client state.
The final principle is code on demand: the server can send executable code to the client, and the client runs it. This gives flexibility, because we can freely change the code on the server side and it automatically affects the final behavior on the client side.
REST is best for standard, scalable APIs with predictable CRUD operations.
RPC (Remote Procedure Call) is a call-based model where the client executes remote functions as if they were local.
It is focused on function execution, not on CRUD operations.
RPC also supports more formats than REST typically uses: JSON, XML, and binary formats, as in gRPC.
RPC is optimized for high-performance needs.
RPC is best for microservices or high-performance systems focused on actions rather than resource management.
What are cookies?
Cookies are small pieces of data with a limited lifetime that are sent to the server through special HTTP headers. Both the server and the client can read and write cookies, and it is possible to make a cookie inaccessible to client-side scripts (the HttpOnly flag), which is useful for storing authorization tokens. Cookies are usually used to store small pieces of information about, for example, the authorization session of an application, such as an access token.
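A sketch of the headers involved; the cookie name and values are made up for the example:

```http
HTTP/1.1 200 OK
Set-Cookie: session_id=abc123; HttpOnly; Secure; Max-Age=3600

GET /profile HTTP/1.1
Host: api.example.com
Cookie: session_id=abc123
```

The server sets the cookie with `Set-Cookie`, and the browser sends it back on every later request via the `Cookie` header; `HttpOnly` hides it from client-side JavaScript, `Secure` restricts it to HTTPS, and `Max-Age` limits its lifetime.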
What is a closure, and how/why would you use one?
A closure is a function that remembers the variables of the lexical environment in which it was created and can access them later.
I would use a closure to create a callback, or, in more specific cases, to create a function that returns another function acting as a counter, returning an increased value each time it is called.
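The counter described above, as a minimal Python sketch:

```python
# A closure: the inner function remembers the count variable from the
# scope where it was created, even after make_counter has returned.

def make_counter():
    count = 0
    def counter():
        nonlocal count  # modify the captured variable, not a new local one
        count += 1
        return count
    return counter

counter = make_counter()
print(counter())  # 1
print(counter())  # 2

# Each call to make_counter creates an independent environment.
other = make_counter()
print(other())    # 1
```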
Can you give an example of a curry function and why this syntax offers an advantage?
A curry function is a higher-order function that takes its arguments one by one; at each step except the last, it returns another function.
Why does this syntax offer an advantage? In some cases we need an intermediate function with some arguments already partially applied, which we can then reuse later in the code without passing the same parameters again and again.
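A hand-rolled curried function in Python; the URL-building example and names are made up to show the partial-application benefit:

```python
# Curried function: each call takes one argument and returns another
# function, until the last call produces the final result.

def make_url(scheme):
    def with_host(host):
        def with_path(path):
            return f"{scheme}://{host}/{path}"
        return with_path
    return with_host

print(make_url("https")("api.example.com")("users"))
# https://api.example.com/users

# The advantage: an intermediate function with some arguments already
# applied can be reused without repeating the same parameters.
api = make_url("https")("api.example.com")
print(api("users"))   # https://api.example.com/users
print(api("orders"))  # https://api.example.com/orders
```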