Distributed databases Flashcards
What is the motivation for distributed databases?
- optimises queries
- Say you make a query about sales in Manchester, you don’t need to specify I want the table for Manchester then I want to make this query on it, give me the output of that one
- Instead you’re saying I want the output for here and the DBSM will then get all of this information and pass you the output for the query you requested.
What does each site store data about in distributed databases?
- Each site stores only data primarily relevant to it
- Distributed DBMS provide access to data at all sites. One site needs to be able to access information from another site.
When are distributed databases needed?
More general: large organisations/companies
◦ …with different branches or offices or sub-companies
◦ …or simply so large that one computer can’t handle all the request/transactions you want to do, so distributed databases can handle this.
Why are distributed databases useful for providing access to large datasets to many users?
◦ Distribute data over several computers –don’t have to be identical in software or hardware
◦ Computers could be at geographically different physical locations (but also in the same place) depending on what system you’re dealing with.
What are the advantages of using distributed databases?
◦ Balance workload & network traffic, handle multiple queries simultaneously, twice as many operations executed by having two sites.
◦ Easier to extend capacity or scale to higher number of users, just plug in more hardware.
◦ If there’s a physical damage to one site the other locations remain undamaged, for example a fire occurs.
What is the formal definition of Distributed Databases
◦ Collection of multiple logically interrelated databases that are distributed over a computer network.
What does DDBMS stand for?
- a distributed database management system
what does a graph representing a DDBMS contain?
-different sites are known as nodes - they correspond to where the database is stored.
- lines connecting them represent network links. In general you may not have network links between every pair.
What are the advantages of DDBMS?
- Performance improvements (quicker querying as each DB is handling less users)
- Scalability (Easier extension of the system capacity/performance)
- Resilience (Data can be replicated at geographically separate sites. Catastrophic failures don’t affect the entire system)
What are the advantages of DDBMSs?
- You need more computer hardware.
- It’s more complicated
- It doesn’t scale perfectly.
What are the two methods of transparency in DDBMS?
Fragmentation and replication
what is transparency in DDBMS?
Keeps data hidden from people accessing the database
What is fragmentation?
- Taking a database and splitting it into different parts to store it at different nodes#
- two types horizontal and vertical fragmentation
Describe horizontal fragmentation
- Take some rows out and then you store these rows at different sites
- so you might have one bunch of rows stored on one side and another bunch of rows at another site
- The sets don’t have to be disjoint, but it’s fine either way.
How do you get the original table back after doing horizontal fragmentation?
- You take the union of these fragments
Describe vertical fragmentation
- Start with a table and we cut different columns out and store them at different sites.
- With this method the tables should typically have some overlap because the method for restoring the original table is by taking the natural join of all these columns.
- A good/common method for vertical fragmentation restoration is that all the sites have an overlap of one specific key attribute.
How is the database transparent due to using fragmentation?
The user doesn’t actually see all these fragmentations, they just see the full relation when they query R because the DBMS puts all the fragmentations back together
Which methods for transparency are the most commonly used?
Fragmentation and replication transparency
What is typical with fragmentation so a DBMS can put all the fragments back together?
Typically tuples are stored at a particular site according to a common value of a specific attribute
- so all rows that have a as their type may be stored at the site in Liverpool
What is special about fragmentation?
- You can mix vertical and horizontal fragmentation
- You can start with vertical then mix in horizontal, or the other way around and do it multiple times
- it’s most common to only do a few steps of this because whenever you do a vertical fragmentation you have fewer columns, so you can only split it so many times.
Why does redundancy improve resilience?
- If we keep multiple copies of the same item, also known as application transparency
- This improves resilience in case of failures
Example: we have a distributed db: we run a query about sales in manchester, but if the manchester branch fails we can’t run the query: - If we have other sites keep copies of the fragments stored at Manchester, this allows us to answer queries involving data from there.
- So it allows us to answer the queries of data from systems that have failed by storing copies
Why does redundancy improve efficiency?
Example: we have a query about suppliers, if other sites keep copies of data about the suppliers, then we may be able to execute it faster by getting different parts of the query from different sites.
- Allows stores to answer queries involving suppliers without establishing a connection to the central office