BASE DE DATOS 2 Flashcards
APRENDER
Fundamental, Additional, Union compatibility
Relational Algebra Operators
Selection, Projection, Union, Set difference, Cartesian product
Fundamental
Intersection, 0-join, Natural join, Semijoin, Division
Additional
Produces a horizontal subset of the operand relation
(figura que parece una o con colita arriba) F(R)={t |tÎR and F(t) is true}
Selection
Produces a vertical slice of a relation
(figura que parece pi)A1,…,An
(R)={t[A1,…, An] | tÎR}
Projection
Similar to set unión
R U S={t | t Î R or t Î S}
Union
General Form
R – S = {t |t Î R and t Ï S}
Set Difference
R × S = {t [A1,…,Ak1
, Ak1+1,…,Ak1+k2
] | t[A1,…,Ak1
] Î R and
t[Ak1+1,…,Ak1+k2
] Î S}
Cartesian (Cross) Product
R (la figura de interseccion que parece una: n)S = {t | t Î R and t Î S}
= R – (R – S)
Intersection
is derivate of Cartesian product. There are various forms of join. The primary classification is between inner join and outer joi
0-Join
is derivate of Cartesian product. There are various forms of join. The primary classification is between inner join and outer joi
0-Join
The above example is a special case of 0-join which is called the equi-join.
Equi-join
It is a equi-join of two relations R and S over an attribute (or attributes) common to both R and S and projecting out one copy of those attributes
Natural join
Inner join requires the joined tuples from two operand relations to satisfy the join predicate. In contrast, in outer join tuples exist in the result relation regardless
Outer-Join
The tuples from the left operand relation are always in the result.
Left outer join
The tuples from the right operand relation are always in the result.
Right outer join.
Tuples from both relations are always in the result.
Full outer join
defined over the set of attributes A, by relation S, defined over the set of attributes B, is the subset of tuples of R that participate in the join of R with S
Semijoin
is that it decreases the number of tuples that need to be handled to form the join. It is important because it usually results in a decreased number of secondary storage access by making better use of memory.
The advantage of semijoin
Specify the properties that the result should hold Tuple relational calculus
Domain relational calculus
Relational Calculus
An interconnected collection of autonomous computers that are capable of exchanging information among themselves.
Computer Network
Irregular, Bus, Star, Ring, Mesh
Types of Networks
Point-to-point (unicast), Broadcast (multi-point)
Communication Schemes
One or more (direct or indirect) links between each pair of nodes
Communication always between two nodes
Receiver and sender are identified by their addresses included in the message header Message may follow one of many links between the sender and receiver using switching or routing
Point-to-point (unicast)
Messages are transmitted over a shared channel and received by all the nodes
Each node checks the address and if it not the intended recipient, ignores
Multi-cast: special case
Broadcast (multi-point)
Twisted pair
Coaxial
Fiber optic cable
Satellite
Microwave
Wireless
Communication Alternatives
Software that ensures error-free, reliable and efficient communication between hosts
Communication Protocols
It has been suggested that the organization of distributed systems can be investigated along three orthogonal dimensions
Dimensions of the Problem
each application and its data execute at one site, and there is not communication with any other program.
No sharing
all the programs are replicated at all sites, but data files are not. Accordingly, user requests are handled at the site where they originate, and the necessary data files are moved around the network.
Data sharing
both data and programs may be shared, meaning that a program at a given site can request a service from another program at second site.
Data-plus program sharing
The user requests do not change over time (difficult to find in real apps)
Static
The user request change over time more commonly in DDBS
Dynamic
is that designers do not have any information about how users will access the database.
One possibility
is that designers have complete information, where the access pattern can reasonably be predicted and do not deviate signifiable from these predictions.
The second possibility
is that designers have partial information, where there are deviations from the predictions.
The third possibility
There are two approaches for developing any database, the top-down method and the bottom-up method. While these approaches appear radically different, they share the common goal of uniting a system by describing all of the interaction between the processes.
Distribution Design
mostly in designing systems from scratch
mostly in homogeneous systems
Top-down
when the databases already exist at a number of sites
Bottom-up
method starts from the general and moves to the specific. In other words, you start with a general idea of what is needed for the system and then work your way down to the more specific details of how the system will interact.
The top-down design
Defines the environment of the system and elicits both the data and processing needs of all potential database users. Also specifies where the final system is expected to stand with respect to the objectives of a distributed DBMS.
Requirement analysis.
Defines the environment of the system and elicits both the data and processing needs of all potential database users. Also specifies where the final system is expected to stand with respect to the objectives of a distributed DBMS.
Requirement analysis.
This activity deals with defining the interfaces for end users.
View design
This is the process by which the enterprise is examined to determine entity types and relationships among these entities.
Conceptual design
it is quite common to divide them into sub-relations, called fragments, which are then distributed
Rather than distributing relations
consists of two steps: fragmentation and allocation
the distribution design activity
the important issue is the appropriate unit of distribution to fragmentation. A relation per se (table), is not a suitable unit because if an application have views defined on a given relation, this should reside all at different sites
With respect to fragmentation
two alternatives can be followed with the entire relation being the unit of distribution. Either the relation is not replicated and is stored at only one site, or it is replicated at all or some of the sites where the application reside.
For the above
the issue is one of finding alternative ways of dividing a table into smaller ones. There are two alternatives for this: dividing it horizontally or dividing it vertically.
Relation instances are essentially tables
each being treated as a unit, permits a number of transactions to execute concurrently. In addition, the fragmentation of relations typically results in the parallel execution of a single query by dividing it into a set of subqueries that operates on fragments.
the decomposition of a relation into fragments
is an important decision that affects the performance of query execution.
The extend to which the database should be fragmented
that is, not to fragment at all, to the other extreme, to fragment to the level individual tuples (in the case of horizontal fragmentation) or to the level of individual attributes (in the case of vertical fragmentation).
The degree of fragmentation goes from one extreme
We will enforce the following three rules during fragmentation, which, together ensure that the database does not undergo semantic change during fragmentation.
Distribution Design Issues
Decomposition of relation R into fragments R1, R2, …, Rn is complete if and only if each data item in R can also be found in some Ri
Completeness
If relation R is decomposed into fragments R1, R2, …, Rn, then there should exist some relational operator ∇ such that
Reconstruction
If relation R is decomposed into fragments R1, R2, …, Rn, and data item di is in Rj , then di should not be in any other fragment Rk (k ≠ j ).
Disjointness
partitioned : each fragment resides at only one site
Non-replicated
fully replicated : each fragment at each site partially replicated : each fragment at some of the sites
Replicated
If read-only queries / update queries «_space;1,replication is advantageous, otherwise replication may cause problems
Rule of thumb
One aspect of distribution design is that too many factors contribute to an optional design. The logical organization of the database, the location of the applications, the access characteristics of the application to the database, and the properties of the computer system at each site all have an influence on distribution decisions.
Information requirements
(HF)
Horizontal Fragmentation
(PHF) It is performed using predicates that are defined on the original relation
Primary Horizontal Fragmentation
(DHF) It is the partitioning of a relation that results from predicates being defined on another relation.
Derived Horizontal Fragmentation
(VF)
Vertical Fragmentation
(HF)
Hybrid Fragmentation
The number of tuples of the relation that would be accessed by a user query which is specified according to a given minterm predicate mi
minterm selectivities: sel(mi )
The frequency with which a user application qi accesses data.
Access frequency for a minterm predicate can also be defined.
access frequencies: acc(qi )
Since Pr’ is complete and minimal, the selection predicates are complete
Completeness
If relation R is fragmented into FR = {R1,R2,…,Rr}
Reconstruction
Minterm predicates that form the basis of fragmentation should be mutually exclusive.
Disjointness
Defined on a member relation of a link according to a selection operation specified on its owner
Derived Horizontal Fragmentation
design methodology, physical clustering
Has been studied within the centralized context
Two approaches :
grouping => attributes to fragments
splitting => relation to fragments
More difficult than horizontal, because more alternatives exist
grouping
Overlapping fragments
splitting
Non-overlapping fragments