ADBMS - unit 3 & 4: intro to DDBMS & Arch Flashcards
What is a distributed database system?
A distributed database is a database that runs and stores data across multiple computers, as opposed to doing everything on a single machine.
What is a node or instance?
Typically, distributed database systems operate on two or more interconnected servers on a computer network. Each location where a version of the database is running is often called an instance or a node.
How do instances run in a centralized versus a distributed database?
A distributed database, for example, might have instances running in New York, Ohio, and California. Or it might have instances running on three separate machines in New York. A traditional single-instance database, in contrast, only runs in a single location on a single machine.
What is distributed in a DDBMS?
- program logic
- functions
- data
- control
Terms used synonymously with DDBMS
distributed data processing
multiprocessors / multi computers
satellite processing
backend processing
dedicated / special purpose computers
timeshared systems
functionally modular systems
peer to peer systems
What is a DDB system?
- A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network
What is DDBMS software?
A DDBMS is the software that manages the DDB and provides an access mechanism that makes the distribution transparent to the users
DDBMS = database technology + network communication
What is not a DDBMS?
a timesharing computer system
a loosely or tightly coupled multiprocessor system
a database system which resides at one of the nodes of a network of computers; this is a centralized database on a network node
Distributed DBMS promises
transparent management of distributed, fragmented, and replicated data
improved reliability / availability through distributed transactions
improved performance
easier and more economical system expansion
What is the meaning of “promises of distributed databases”?
The promises of distributed databases are the advantages of distributed databases
What is the first promise of a distributed database?
The first promise or advantage of a distributed database is
1. transparency of data, fragmentation, and replication
What is transparency? Explain transparency of data, fragmentation, and replication
- Transparency refers to the separation of the higher-level semantics of a system from lower-level implementation issues
- A DDBMS hides all the added complexities of distribution, allowing users to think that they are working with a single centralized system
Example:
An engineering firm has offices in Boston, Mumbai, Paris, and Delhi
- They run projects at each of these sites and maintain a database of their employees, projects, and related data
- Let us assume that the database is relational and stores this information in the following two relations
EMP(ENO, ENAME, TITLE)
PROJ(PNO, PNAME, BUDGET)
- A third relation stores salary information
SAL(TITLE, AMT)
- A fourth relation records which employees are assigned to which projects, with their responsibility and duration
ASG(ENO, PNO, RESP, DUR)
If we want to find the names and salaries of employees who worked on a project for more than 12 months, the query we would write is:
SELECT ENAME, AMT
FROM EMP, ASG, SAL
WHERE ASG.DUR > 12
AND EMP.ENO = ASG.ENO
AND SAL.TITLE = EMP.TITLE
Depending on the query, the system has to search the databases at different sites (Boston, Paris, etc.)
To process queries quickly, we partition each of the relations and store each partition at a different site
This is known as fragmentation
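The partitioning idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows, the employee names, and the extra "site" value used to decide where a row is stored are all hypothetical and do not come from the notes.

```python
from collections import defaultdict

# Hypothetical rows for EMP(ENO, ENAME, TITLE). The fourth value (site) is an
# assumption added only to decide where each row is stored.
EMP = [
    ("E1", "Employee A", "Engineer", "Boston"),
    ("E2", "Employee B", "Analyst", "Mumbai"),
    ("E3", "Employee C", "Engineer", "Paris"),
    ("E4", "Employee D", "Manager", "Delhi"),
    ("E5", "Employee E", "Analyst", "Boston"),
]


def fragment_by_site(rows):
    """Horizontal fragmentation: split the rows into one fragment per site."""
    fragments = defaultdict(list)
    for eno, ename, title, site in rows:
        fragments[site].append((eno, ename, title))
    return dict(fragments)


def reconstruct(fragments):
    """Rebuild the whole relation as the union of all fragments."""
    return sorted(row for fragment in fragments.values() for row in fragment)


fragments = fragment_by_site(EMP)
```

Each site then holds only its own fragment, and the union of the fragments gives back the original relation, so no rows are lost by partitioning.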
- data independence (DI)
- DI is a fundamental form of transparency
- It is the capacity to change the database schema at one level of a database system without affecting the schema at the next higher level
2 types
- logical DI
- physical DI
Logical DI (LDI) is the immunity of user applications to changes in the logical structure (schema) of the database
Physical DI (PDI) deals with hiding the details of the storage structure from user applications
- network transparency / distribution transparency
- Besides the data, the user should also be shielded from the operational details of the network
- It allows a user to access a resource (application program or data) without needing to know whether the resource is located on the local machine or on a remote machine
- replication transparency
- Replication transparency ensures that the replication of databases is hidden from the users
- It enables users to query a table as if only a single copy of the table exists
- fragmentation transparency
- Each database relation is divided into smaller fragments and each fragment is treated as a separate database object, while users still query the relation as if it were not fragmented
- Fragmentation is done for reasons of performance, availability, and reliability
So, to provide easy and efficient access to the DBMS, we need full transparency
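Replication transparency can be illustrated with a small Python sketch. It is a simplified model, not how any particular DDBMS works: the class name ReplicatedTable, its methods, and the site names are assumptions made for illustration. The caller reads and writes by key and never names a site or a copy.

```python
class ReplicatedTable:
    """Hypothetical table that keeps one full copy per site and hides that fact."""

    def __init__(self, sites):
        self._copies = {site: {} for site in sites}  # one copy of the table per site
        self._down = set()                           # sites currently unreachable

    def mark_down(self, site):
        """Simulate a site failure."""
        self._down.add(site)

    def put(self, key, value):
        """Write to every copy (simplification: ignores writes to failed sites)."""
        for copy in self._copies.values():
            copy[key] = value

    def get(self, key):
        """Read from the first reachable copy; the caller never picks a site."""
        for site, copy in self._copies.items():
            if site not in self._down:
                return copy[key]
        raise RuntimeError("no replica available")


sal = ReplicatedTable(["Boston", "Paris", "Delhi"])
sal.put("Engineer", 40000)
sal.mark_down("Boston")
amount = sal.get("Engineer")  # still answered, from another copy
```

The user code looks the same as it would for a single-copy table, which is the point of replication transparency.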
What is the second promise or advantage of a distributed database?
reliability through distributed transactions
Explain reliability through distributed transactions
- Distributed DBMSs are designed to improve reliability: they have replicated components and thereby eliminate single points of failure
- So the failure of a single site or communication link does not bring down the entire system
- In a distributed system, proper care is taken so that, although the failed part is unavailable, users may still be permitted to access other parts of the distributed database
- This proper care comes in the form of support for distributed transactions
- A transaction is a basic unit of consistent and reliable computing, consisting of a sequence of database operations executed as an atomic action
Example: a transaction based on the engineering firm
- Assume that there is an application that raises the salaries of all the employees by 10%
- If the system fails in the middle of this transaction, we would like the DBMS to be able to determine, upon recovery, where it left off and continue with its operation
- Distributed transactions execute at a number of sites, at which they access the local database
- The facility provided here is that transactions are carried through without interruption even when parts of the system fail
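The atomicity of the salary update can be sketched with Python's built-in sqlite3 module. This is a single-site illustration only (SQLite is not a distributed DBMS); the sample titles and amounts, the helper names, and the fail_after parameter that simulates a crash are all hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory SAL(TITLE, AMT) table with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SAL (TITLE TEXT PRIMARY KEY, AMT INTEGER)")
    conn.executemany(
        "INSERT INTO SAL VALUES (?, ?)",
        [("Engineer", 40000), ("Analyst", 34000), ("Manager", 50000)],
    )
    conn.commit()
    return conn


def raise_salaries(conn, percent, fail_after=None):
    """Raise every salary by `percent` as one atomic transaction.

    fail_after is a hypothetical hook: it simulates a failure after that
    many rows have already been updated.
    """
    titles = [row[0] for row in conn.execute("SELECT TITLE FROM SAL ORDER BY TITLE")]
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            for done, title in enumerate(titles):
                if fail_after is not None and done == fail_after:
                    raise RuntimeError("simulated system failure")
                conn.execute(
                    "UPDATE SAL SET AMT = AMT * (100 + ?) / 100 WHERE TITLE = ?",
                    (percent, title),
                )
        return True
    except RuntimeError:
        return False


def amounts(conn):
    """Return the current salaries as a {title: amount} dictionary."""
    return dict(conn.execute("SELECT TITLE, AMT FROM SAL"))
```

If the simulated failure strikes after some rows were updated, the rollback undoes those partial updates, so the table is never left with only some salaries raised.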
What is the third promise or advantage of distributed databases?
improved performance
- A distributed DBMS fragments the conceptual database so that data is stored close to where it is used; this is also called data localization
Advantages
1. Since each site handles only a portion of the database, contention for CPU and I/O services is not as severe
2. Localization reduces remote access delays
- The inherent parallelism of distributed systems can be exploited
- There is inter-query and intra-query parallelism
- Inter-query parallelism means executing multiple queries at the same time
- Intra-query parallelism is achieved by breaking up a single query into a number of subqueries, each of which is executed at a different site, accessing a different part of the distributed database
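Intra-query parallelism can be sketched as follows. This is a minimal illustration in which threads stand in for sites; the ASG rows, the site assignment, and the function names are assumptions, not part of the notes. One query (assignments longer than a given duration) is split into one subquery per fragment and the partial results are merged.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fragments of ASG(ENO, PNO, RESP, DUR), one per site.
ASG_FRAGMENTS = {
    "Boston": [("E1", "P1", "Manager", 12), ("E2", "P1", "Analyst", 24)],
    "Paris": [("E3", "P3", "Consultant", 10), ("E4", "P2", "Engineer", 48)],
    "Mumbai": [("E5", "P2", "Manager", 24)],
}


def subquery(fragment, min_dur):
    """Subquery run at one site: keep rows of the local fragment with DUR > min_dur."""
    return [row for row in fragment if row[3] > min_dur]


def parallel_query(fragments, min_dur):
    """Run one subquery per fragment in parallel and merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        partials = pool.map(lambda frag: subquery(frag, min_dur), fragments.values())
        return sorted(row for part in partials for row in part)


long_assignments = parallel_query(ASG_FRAGMENTS, 12)
```

Each subquery touches only its own fragment, so the work of a single query is spread across sites instead of being done in one place.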
What is the fourth promise or advantage of a distributed database?
- easier system expansion
In a distributed environment it is much easier to accommodate increasing database sizes
In general, expansion can be handled by adding processing and storage power to the network
- This also depends on the overhead of distribution
- One aspect of easier system expansion is economics
- It normally costs much less to put together a system of smaller computers with the equivalent power of a single big machine
What are some problem areas?
complexity
data replication
overall cost
security issue
integrity control
lacking standards
Explain the complex nature of a DDBMS
Distributed databases are networks of many computers located at different sites, and they provide a high level of performance, availability, and reliability
Therefore the nature of a distributed database is more complex than that of a centralized database
We also need complex and advanced software to manage distributed databases
It must also keep replicated data under control (no unmanaged duplication), which adds even more complexity
Explain overall cost in detail
Costs such as maintenance, procurement, hardware, network/communication, and labor costs add up to the overall cost and make a distributed DBMS costlier than a normal (centralized) DBMS
Explain security issues of distributed databases
- Along with avoiding unmanaged data redundancy/duplication, the security of the data as well as of the network is a prime concern
- The network can be attacked for data theft and misuse
Explain integrity control
- In a vast distributed database system, maintaining data consistency is important
- All changes made to data at one site must be reflected at all the other sites
- Enforcing the integrity of data in a distributed DBMS has a high communication and processing cost