(F) Distributed Databases Flashcards
What are the core concepts of distributed databases?
A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network. A distributed database management system (DDBMS) is the software that manages a DDB and makes the distribution transparent to users.
The Two Architectures of Distributed Databases
The architecture of a distributed database can vary based on the network structure and the database’s functional requirements:
- Client-server: A traditional model in which one or more client nodes request data from a central server node, which manages database access and queries.
- Peer-to-peer (P2P): Each node in the network acts as both a client and a server. Every node can contribute resources such as storage and processing power, and can communicate and exchange data directly with other nodes without the need for a central server.
What are the challenges of distribution in distributed databases?
Distributing a database across different sites introduces several challenges:
- Performance: The geographical spread of nodes introduces network latency, which increases transaction and query response times.
- Complexity in Management: Each node added to a distributed system makes the database harder to manage and calls for more sophisticated approaches to transaction processing and query optimization.
What are the protocols in Distributed databases?
Protocols in distributed databases are crucial for managing how data is accessed and maintained across different sites:
Replication protocols: These keep the copies of replicated data consistent across nodes.
- Eager replication: Every update is immediately propagated to all nodes holding a copy, within the same transaction. The ACID properties apply to the updates of all copies (Read-One/Write-All): a read can use any single copy, but a write must update every copy. Because every copy is up to date, the data is readily available on multiple nodes, which increases redundancy and read availability.
- Lazy replication: Updates are first applied to one node and then propagated to the others. This can lead to temporary inconsistencies but improves performance because propagation is asynchronous.
- Centralized: Only one copy (the master) can be updated; all other copies (slaves) are updated to reflect the changes made to the master.
- Distributed: Any site in the network that holds a copy of a data item can update its value.
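A minimal Python sketch of eager replication under the Read-One/Write-All rule described above. The `Replica` class, `write_all`, `read_one`, and `ReplicationError` are hypothetical names chosen for illustration, and each site is modeled as an in-memory dictionary with an `up` flag rather than a real networked database. It shows why a write needs every copy while a read needs only one reachable copy.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One site holding a full copy of the data (hypothetical model)."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)


class ReplicationError(Exception):
    pass


def write_all(replicas, key, value):
    """Eager ROWA write: succeeds only if every copy can be updated."""
    down = [r.name for r in replicas if not r.up]
    if down:
        # Abort before touching any copy, so no replica diverges.
        raise ReplicationError(f"write aborted, unreachable: {down}")
    for r in replicas:
        r.data[key] = value


def read_one(replicas, key):
    """ROWA read: any single reachable copy is up to date."""
    for r in replicas:
        if r.up:
            return r.data.get(key)
    raise ReplicationError("no replica reachable")


sites = [Replica("A"), Replica("B"), Replica("C")]
write_all(sites, "x", 1)
sites[0].up = False           # site A fails
print(read_one(sites, "x"))   # 1: site B still serves the read
try:
    write_all(sites, "x", 2)  # a write cannot reach site A
except ReplicationError as err:
    print(err)                # write aborted, unreachable: ['A']
```

Aborting the write before touching any copy is what keeps the copies identical in this sketch; a real DDBMS needs an atomic commit protocol to get the same guarantee across a network.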
Synchronization types in distributed databases
Synchronous (Eager) synchronization
- Eager sync: Changes to the database are propagated immediately to all nodes. Eager sync has the following characteristics:
1. Guarantees strong consistency
2. Updates are atomic across all nodes
3. Can result in higher transaction latency, because the transaction has to update all sites (longer execution time)
Asynchronous (Lazy) Synchronization
- Lazy sync: Updates are first applied to one node, and the changes are then propagated to the other nodes at a later time. Lazy sync has the following characteristics:
1. A transaction is always local (good response time)
2. Can lead to temporary inconsistencies between nodes until the update is propagated
3. Suitable for environments where eventual consistency is acceptable
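The difference between the two synchronization types can be sketched in Python as follows. This is a hypothetical in-memory model: `Node`, `EagerCluster`, and `LazyCluster` are illustrative names, the `pending` queue stands in for whatever mechanism a real system uses to propagate updates, and failures and atomic commit are not modeled.

```python
from collections import deque


class Node:
    """One site holding a copy of the data (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class EagerCluster:
    """Eager sync: write() returns only after every node has the update."""

    def __init__(self, nodes):
        self.nodes = nodes

    def write(self, origin, key, value):
        # origin is unused: the update reaches all nodes before returning
        for node in self.nodes:
            node.data[key] = value


class LazyCluster:
    """Lazy sync: commit at the origin node, propagate to the others later."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.pending = deque()  # committed updates not yet propagated

    def write(self, origin, key, value):
        origin.data[key] = value  # local commit, so the response is fast
        for node in self.nodes:
            if node is not origin:
                self.pending.append((node, key, value))

    def propagate(self):
        """Deliver queued updates, e.g. from a background job."""
        while self.pending:
            node, key, value = self.pending.popleft()
            node.data[key] = value


a, b = Node("A"), Node("B")
lazy = LazyCluster([a, b])
lazy.write(a, "x", 1)
print(a.data.get("x"), b.data.get("x"))  # 1 None: temporary inconsistency
lazy.propagate()
print(b.data.get("x"))                   # 1: the copies converge

c, d = Node("C"), Node("D")
EagerCluster([c, d]).write(c, "x", 1)
print(c.data.get("x"), d.data.get("x"))  # 1 1: consistent at commit
```

Between `write` and `propagate`, node B has no value for `x`, which is the temporary inconsistency of lazy sync; the eager version never exposes that state, at the cost of touching every node inside the write.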
Replication styles
Eager replication: updates are propagated immediately within the transaction's scope, ensuring no inconsistencies and that each copy is up to date. However, this can lead to longer transaction times and potentially lower availability, because all sites must be updated together.
Lazy replication: updates are applied to one copy first and propagated to the others after the transaction commits, which allows for better response times. However, this can cause temporary inconsistencies, and there is no guarantee that every copy reflects the latest update at any given moment.
Centralized replication: Only one master copy can be updated, and all other slave copies reflect these changes. However, this can create a high load on the master and potentially stale data at the slave sites.
Distributed replication: Changes can be initiated at any site owning a copy. However, this requires synchronization across copies to maintain consistency.
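The centralized and distributed styles can be contrasted with a small Python sketch. `PrimaryCopy` and `UpdateAnywhere` are hypothetical names, and the last-writer-wins rule (the update with the larger `(timestamp, site)` pair wins) is only one possible way to reconcile conflicting updates, chosen here as an assumption for simplicity.

```python
class PrimaryCopy:
    """Centralized replication: only the master copy accepts updates."""

    def __init__(self, slave_names):
        self.master = {}
        self.slaves = {name: {} for name in slave_names}

    def update(self, site, key, value):
        if site != "master":
            raise PermissionError(f"{site} is a read-only slave copy")
        self.master[key] = value
        for copy in self.slaves.values():  # slaves reflect the master
            copy[key] = value


class UpdateAnywhere:
    """Distributed replication: every site may update its own copy.

    Conflicts are resolved by last-writer-wins on (timestamp, site),
    an illustrative policy, not the only option.
    """

    def __init__(self, site_names):
        self.copies = {name: {} for name in site_names}

    def update(self, site, key, value, ts):
        self.copies[site][key] = (ts, site, value)

    def synchronize(self):
        # Find the newest version of each key across all sites...
        merged = {}
        for copy in self.copies.values():
            for key, version in copy.items():
                if key not in merged or version[:2] > merged[key][:2]:
                    merged[key] = version
        # ...and install it everywhere.
        for copy in self.copies.values():
            copy.update(merged)

    def read(self, site, key):
        return self.copies[site][key][2]


pc = PrimaryCopy(["s1", "s2"])
pc.update("master", "x", 1)
try:
    pc.update("s1", "x", 2)
except PermissionError as err:
    print(err)              # s1 is a read-only slave copy

ua = UpdateAnywhere(["A", "B"])
ua.update("A", "x", "from A", ts=1)
ua.update("B", "x", "from B", ts=2)
ua.synchronize()
print(ua.read("A", "x"))    # from B
```

With last-writer-wins, site A's concurrent update is silently discarded, which illustrates why update-anywhere replication needs an explicit synchronization and conflict-resolution policy, while the primary-copy style avoids conflicts by funneling every write through the master.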
Horizontal and vertical scalability in distributed database systems
Horizontal scalability (scale-out): This involves adding more servers to the system to handle increased load. It scales the system by distributing the load across multiple machines. This form of scalability is often easier and more cost-effective, as it allows for the addition of standard, off-the-shelf hardware and software components.
Vertical scalability (scale-up): This involves increasing the capacity of a single server, such as adding more CPUs, memory, or storage to handle more transactions or data. Vertical scaling can eventually hit limits due to the maximum capacity of a single machine's hardware and the increasing cost of high-end hardware.
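Horizontal scaling only helps if the load is actually spread over the added servers. One common way to do that is hash partitioning, sketched below in Python under the assumption that each record is identified by a string key; `node_for` and the server names `db1` to `db3` are hypothetical.

```python
import hashlib
from collections import Counter


def node_for(key: str, nodes: list[str]) -> str:
    """Pick the server responsible for a key using stable hash partitioning."""
    digest = hashlib.sha256(key.encode()).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


keys = [f"customer:{i}" for i in range(10_000)]
two_servers = ["db1", "db2"]
three_servers = ["db1", "db2", "db3"]

two = Counter(node_for(k, two_servers) for k in keys)
three = Counter(node_for(k, three_servers) for k in keys)
print(two)    # roughly 5,000 keys per server
print(three)  # roughly 3,333 keys per server after scaling out

# Keys whose owner changes when the third server is added
moved = sum(node_for(k, two_servers) != node_for(k, three_servers) for k in keys)
print(moved / len(keys))  # close to 2/3
```

Plain modulo placement is the simplest choice but not the cheapest to grow: going from two to three servers changes the owner of roughly two thirds of the keys, which is why schemes such as consistent hashing are often used instead.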
Advantages & Disadvantages of distributed databases
Advantage: Distributed databases spread data across different locations, which makes systems more reliable and faster and allows them to grow more easily. This distribution balances the load, so the system can handle more users and transactions, and it avoids a single point of failure.
Disadvantage: It can be complex to set up and manage, and it’s tough to keep all the data consistent across various locations. This complexity often requires sophisticated synchronization and replication techniques to maintain data accuracy and integrity.