Highload Application Flashcards
What is Kafka?
Kafka is an open-source distributed event streaming platform for publishing, storing, reading and analysing streaming data.
Roughly comparable to a message queue built on Redis, but messages are written to a durable, replicated log on disk.
What is Memcached?
Memcached is an open-source, high-performance, distributed memory caching system intended to speed up dynamic web applications by reducing database load. It is an in-memory key-value store for strings, objects, etc. that result from database calls, API calls, or page rendering.
(A caching tool.)
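How a cache of this kind reduces database load can be sketched with the cache-aside pattern. The sketch below does not use Memcached itself: `TTLCache` is an in-process stand-in, and `FakeDatabase`, `get_user` and the 60-second TTL are hypothetical names and values chosen for illustration.

```python
import time


class TTLCache:
    """In-process stand-in for a Memcached-style key-value cache (assumption)."""

    def __init__(self):
        self._store = {}  # key -> (value, expiry time)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)


class FakeDatabase:
    """Hypothetical slow data source; counts how often it is queried."""

    def __init__(self):
        self.calls = 0

    def load_user(self, user_id):
        self.calls += 1
        return {"id": user_id, "name": f"user-{user_id}"}


def get_user(cache, db, user_id, ttl_seconds=60):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    user = db.load_user(user_id)
    cache.set(key, user, ttl_seconds)
    return user
```

Calling `get_user` twice with the same id queries the database only once; the second call is served from memory until the entry expires.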
What is ElasticSearch?
Elasticsearch is a distributed, open-source, near-real-time full-text search and analytics engine.
What is Solr?
Solr is a scalable, ready-to-deploy search/storage engine optimized for searching large volumes of text-centric data.
What is Reliability?
The system should continue to work correctly (performing the correct function at the desired level of performance) even in the face of adversity (hardware or software faults, and even human error)
What is Maintainability?
Over time, many different people will work on the system (engineering and operations, both maintaining current behavior and adapting the system to new use cases), and they should all be able to work on it productively
What kinds of errors can break reliability?
- Hardware faults (database server failure, power outage, etc.)
- Software errors (infinite recursion, cascading failures)
- Human error (accidentally deleting something important)
What examples of load parameters for scalability do you know?
- Number of requests to the web server per second
- Number of database read/write requests per second
- Number of active users in a chat
What is Hadoop?
Hadoop is an open-source software framework for distributed storage and processing of huge amounts of any kind of data across clusters of machines.
What is MapReduce?
MapReduce is a programming model and a processing module in the Apache Hadoop open-source ecosystem. It is used to write scalable applications that process large amounts of data in parallel on large clusters of commodity servers.
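The map, shuffle and reduce steps of the model can be illustrated with a single-process word count. This is only a sketch of the idea, not the Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are made up for this example, and a real cluster would run the map and reduce calls on different machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    mapped = []
    for document in documents:
        # On a cluster, each document (or block) would be mapped in parallel.
        mapped.extend(map_phase(document))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

For example, `word_count(["a b a", "b c"])` counts `a` twice, `b` twice and `c` once.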
What is a rolling upgrade?
A rolling upgrade is a software version upgrade performed without noticeable downtime or other disruption of service. (Behind a load balancer, the servers are upgraded one at a time while the others keep serving traffic.)
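The one-server-at-a-time procedure can be sketched as a small simulation. Everything here is an assumption made for illustration: servers are plain dictionaries, `in_rotation` stands for membership in the load balancer pool, and `health_check` is a caller-supplied function; a real deployment would call its own load balancer and deployment tooling instead.

```python
def rolling_upgrade(servers, new_version, health_check):
    """Upgrade servers one by one; stop and roll back on the first failure.

    servers: list of dicts with keys "name", "version", "in_rotation".
    health_check: function(server) -> bool, run after each upgrade.
    Returns True if every server was upgraded, False otherwise.
    """
    for server in servers:
        # Take only this server out of the load balancer pool.
        server["in_rotation"] = False
        old_version = server["version"]
        server["version"] = new_version

        if not health_check(server):
            # Restore the old version and abort; later servers stay untouched.
            server["version"] = old_version
            server["in_rotation"] = True
            return False

        # Healthy: put it back before touching the next server.
        server["in_rotation"] = True
    return True
```

At any moment at most one server is out of rotation, which is why users should see no downtime as long as the remaining servers can carry the load.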
What is Shared-nothing architecture?
Shared Nothing Architecture (SNA) is a distributed computing architecture that consists of multiple separate nodes that don’t share resources. The nodes are independent and self-sufficient: each has its own memory, storage, and independent input/output interfaces. In such a system, the data set or workload is split into smaller parts that are distributed across the nodes.
What is replication?
Replication is the continuous copying of data changes from one database (publisher) to another database (subscriber).
What is database table partitioning (partitioning/sharding)?
Partitioning is the process of dividing very large tables into multiple smaller parts. By splitting a large table into smaller, individual tables, queries that access only a fraction of the data can run faster because there is less data to scan.
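One common way to decide which part a row belongs to is hashing its key; range and list partitioning are other options. The sketch below assumes hash partitioning, and the helper names `partition_for` and `partition_rows` are hypothetical.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a string key to a partition index with a stable hash.

    hashlib is used instead of the built-in hash() because the latter
    is randomized per process for strings.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_rows(rows, key_field, num_partitions):
    """Split a list of row dicts into num_partitions smaller lists."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        index = partition_for(str(row[key_field]), num_partitions)
        partitions[index].append(row)
    return partitions
```

A lookup by key then needs to scan only the one partition that `partition_for` returns, rather than the whole table.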
What replication strategies do you know?
- single-leader (one main node accepts writes and sends the changes to the others)
- multi-leader (several main nodes accept writes and send the changes to the others)
- leaderless (there is no main node; writes are sent to all nodes directly)
What are the differences between synchronous, asynchronous and semi-synchronous replication?
- synchronous replication waits until all child nodes have received the update and only then reports success
- asynchronous replication reports success without waiting for the child nodes
- semi-synchronous replication works synchronously with one node and asynchronously with the others
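The three modes differ only in how many nodes must have the change before the write is acknowledged. A minimal in-memory sketch, assuming hypothetical `Leader` and `Follower` classes and a `flush` method that stands in for background delivery:

```python
class Follower:
    def __init__(self, name):
        self.name = name
        self.log = []

    def apply(self, record):
        self.log.append(record)


class Leader:
    def __init__(self, followers, mode):
        if mode not in ("sync", "semi-sync", "async"):
            raise ValueError(f"unknown mode: {mode}")
        self.followers = followers
        self.mode = mode
        self.log = []
        self.pending = []  # (follower, record) pairs not yet delivered

    def write(self, record):
        self.log.append(record)
        # Number of followers that must apply the record before we return.
        if self.mode == "sync":
            wait_for = len(self.followers)
        elif self.mode == "semi-sync":
            wait_for = min(1, len(self.followers))
        else:
            wait_for = 0
        for index, follower in enumerate(self.followers):
            if index < wait_for:
                follower.apply(record)  # delivered before acknowledging
            else:
                self.pending.append((follower, record))  # delivered later
        return "ok"

    def flush(self):
        """Stand-in for background replication catching up."""
        for follower, record in self.pending:
            follower.apply(record)
        self.pending.clear()
```

With `mode="semi-sync"` and two followers, the first follower has the record as soon as `write` returns, while the second one receives it only after `flush` runs.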