DataBases Flashcards
Definition of a database
an organised collection of
related data
Definition of DBMS
Database Management System (DBMS):
software that manages & controls access to
the database
What is a database application?
Database Application: program that interacts
with the database at some point in its
execution
What is a database system?
Database System: a collection of programs that interact with the database
Define a file based & database system and the difference between them
File-based system: A collection of application programs that perform services for the end-users. Each program defines and manages its own data.
* Database system: A shared collection of logically related data and its description, designed to meet the information needs of an organisation.
What are advantages and disadvantages of a file-based approach?
Advantages:
* User control:
― End user has total control of the stored data
― Can modify application or data at will
* Applications and data file can be optimised for particular
tasks
Disadvantages:
* Data Duplication
* Data Dependence
* Incompatible file formats
* Proliferation of application programs (APPS)
Define data duplication
A record is duplicated in > 1 files
― A change in details requires updates to all relevant files
* Integrity of records may be compromised if the files are not all updated
― i.e. Which file holds the correct information?
Define data dependence and the two types
Definition:
Data dependence: Refers to the relationship between database operations based on their reliance on each other’s results.
Types:
RAW Dependence (Read-after-Write):
Operation depends on the result of a preceding write operation.
Example: Retrieving data after it has been updated.
WAR Dependence (Write-after-Read):
Write operation depends on the result of a preceding read operation.
Example: Updating data based on a prior retrieval.
What is the proliferation of apps and name a problem that occurs with it
In time, users might want to get more information from their data files than what they originally anticipated when they initially wrote the programs
* To do so they would need to add more application programs to handle the new queries
* This leads to a proliferation of files and application programs which each user has to handle
This can be addressed via the use of DBMS
Name 5 advantages of using the DBMS
- Control of data redundancy
- Data Consistency
- Sharing of Data
- Improved data integrity
- Improved Security
- Enforcement of Standards
- Economy of Scale
- Balance of conflicting requirements
- Improved data accessibility and responsiveness
- Increased Productivity
- Improved maintenance through data independence
- Increased concurrency
- Backup and Recovery services
explain how the DBMS helps with the economy of scale
- Combining the entire organisation’s data into one database and
creating a set of applications that work on this one source
results in cost savings
― For instance, only one information system department and one set of computer systems in the organisation will be responsible for the
database as opposed to separate information systems in the different departments
― This is much cheaper than having many separate small departmental information systems units
explain how the DBMS helps with improved maintenance through data
independence
- A DBMS separates data management from the application
programs
― Hence if changes are made to the underlying data structure, the
DBMS can handle this without changing application programs - The DBMS will provide the necessary mapping between
application program and data stored on the database (using
system catalogue)
What are some disadvantages of DBMS (6 total)
- Complexity
― DBMSs are complex pieces of software and hence require those
operating and using them to fully understand them if they are to be used to the best advantage - Size
― The complexity and breadth of functionality of the DBMS makes it a large piece of software that requires a large amount of memory to run efficiently - Cost of DBMS
― DBMS cost varies depending on whether it is for a single user or a big organisation. Maintenance and operations costs also need to be factored in. For large organisations with lots of data, hardware costs (e.g. hard disks, high performance processors, memory) are
considerable - Cost of conversion
― When converting from a file-based system to a DBMS, there are data conversion costs, hardware and software procurement costs, as well as costs of training and recruiting personnel - Performance
― The DBMS is written for general use. Hence some applications may not run as fast as customised file-based applications - Greater impact of failure
― Centralisation increases system vulnerability. Failure of the DBMS may result in everyone being affected within an organisation
What are the three levels present in three level database architecture(ANSI-SPARC)
External level
Conceptual level
Internal level
Describe the External level of databases
It deals with how data is presented to and accessed by end users or applications.
Users at this level define their individual views of the database, specifying what data they need and how they want it to be displayed.
Multiple external views can exist, tailored to the specific needs and requirements of different user groups or applications.
Changes to the external level, such as modifications to views or access patterns, do not affect the underlying database structure.
Describe the conceptual level
The conceptual level represents the abstract, logical structure of the entire database.
It defines the relationships among data elements, the constraints, and the integrity rules that must be maintained.
The schema at this level provides a comprehensive, unified view of the entire database, independent of any specific user’s perspective.
Modifications at the conceptual level, such as changes to the data model or structure, impact all external views that rely on it.
The conceptual level acts as a bridge between the external and internal levels, providing a logical representation that abstracts the physical storage details.
Describe the Internal level
The internal level is the lowest layer, dealing with the physical storage and retrieval of data.
It is concerned with how data is stored, indexed, and processed at the machine level.
The internal schema defines the storage structures, access paths, and techniques used for efficient data retrieval.
Changes at this level, such as modifications to indexing or storage methods, do not affect the external or conceptual levels.
The internal level provides a level of abstraction over the physical storage details, allowing changes in storage technology without impacting the higher levels.
Define data abstraction
Hide storage details and present the users with a conceptual view of the database
Data abstraction in databases refers to the process of simplifying complex details and presenting a high-level view of data to users and applications. It involves creating abstract representations of data that hide the implementation details and complexities of the underlying database structure
Define data independence
Data independence in databases refers to the ability to modify the database schema or organization without affecting the applications that use the data.
What is the difference between physical and logical data independence
Physical data independence hides how data is stored, allowing changes to storage structure without affecting users or applications while Logical data independence shields users from changes in the database structure, permitting modifications without disrupting user interactions.
what is a database schema
Database schema: the description of the structure of the
database
Database schema: the description of the structure of the
database
What is a conceptual data model?
Identifies the high level data structure
* Independent of DBMS, application programs, physical considerations.
What is a Logical data model
- Describes the data in terms of data structures such as
― Records (nodes) and sets (edges) - e.g., Hierarchical model, network model
― Relational tables and columns - E.g., Relational model
― Object-oriented classes - E.g., Object-oriented model, object-relational model
- Independent of a particular DBMS product and storage
technology.
What is a physical data model
- Describe how data is stored in the computer, representing
record structures, record orderings, and access paths. - Highly dependent on the target DBMS
Define a relationship and what a degree of relationship means
Relationship (type): meaningful associations among
two or more entities.
* Degree of Relationship: The number of participating
entity types in a relationship e.g Binary if 2, ternary if three and Quaternary if 4
What is a recursive relationship?
Degree one: recursive
― A relationship between occurrences of the same entity type
― E.g., AcademicStaff (professor) supervises AcademicStaff (lecturer).
Define a candidate Key and a primary key
Candidate Key: The minimal number of attribute(s) whose
value(s) uniquely identify each entity occurrence
― A candidate key can not contain a null
Primary Key: The candidate key that is selected to uniquely identify
each occurrence of an entity type
* Principles of choosing the primary key:
― Attribute length
― Minimal number of attributes required
― The future certainty of uniqueness
What is a strong/weak entity type?
Weak Entity Type: existence-dependent on some other
entity
― Each entity occurrence cannot be uniquely identified using only the attributes associated with that entity type
strong is not that ig
Define multiplicity
Multiplicity: the number of possible occurrences of an
entity type that may relate to a single occurrence of an
associated entity type through a particular relationship
e.g that 1..* stuff
Representation of Multiplicity
Constraint
Meaning
0..1 Zero or one entity occurrence
1..1 (or just 1) Exactly one entity occurrence
0..* (or just ) Zero or many entity occurrences
1.. One or many entity occurrences
3..7 Minimum of 3 up to a maximum of 7entity
occurrences
1, 5, 10-12 1 or 5 or 10 to 12 entity occurrences
What is cardinality and Participation in multiplicity ?
Cardinality: the maximum values for the multiplicity ranges
on either side of the relationship.
― One-to-one (1:1)
― One-to-many (1:)
― Many-to-many (:*)
Participation: the minimum values for the multiplicity
ranges on either side of the relationship
― Optional participation (if 0)
― Mandatory participation (if 1 or more)
Which side of the relationship does participation appear on?
The participation of entities appears on the opposite side of the relationship - google if u confused it’s kinda wierd
What is a fan trap and when may it occur?
A fan trap is a type of join path between three tables when a “one-to-many” join links a table which is in turn linked by another “one-to-many” join. The fanning out effect of “one-to-many” joins can cause incorrect results to be returned when a query includes objects based on both tables.
What is a chasm trap and when may it occur?
A chasm trap is a type of join path between three tables when two “many-to-one” joins converge on a single table, and there is no context in place that separates the converging join paths.
What are 9 steps of conceptual database design
- [Step 1] Identify entity types
- [Step 2] Identify relationship types
- [Step 3] Identify and associate attributes with entity and
relationship types - [Step 4] Determine primary keys for entity types
- [Step 5] Identify the multiplicity constraints
- [Step 6] Consider use of enhanced modeling (optional)
- [Step 7] Check model for redundancy, traps
- [Step 8] Validate conceptual model against user
transactions - [Step 9] Review conceptual data model with user
Explain specialization in terms of Entities
Specialisation: The process of maximizing the differences between members of an entity by identifying their distinguishing characteristics. e.g making more entities from one(Manager, SalesPersonnel and Secretary from Staff)
Explain generalization in terms of Entities
Generalisation: The process of minimizing the differences between entities by identifying their common characteristics.
What are some properties of relations
- A relation has a name that is distinct from all other relation
names in the database - Each cell of the relation contains exactly one atomic (single) value
- Each attribute has a distinct name
- The values of an attribute are all from the same domain
- Each tuple is distinct – there are no duplicate tuples
- The order of attributes has no significance
- The order of tuples has no significance
What is a relation schema and a relation instance
Relation schema: a named relation defined by the relation name
(table name) and attributes.
― 𝑅(𝐴1, 𝐴2, … , 𝐴𝑛)
Student (stdID, stdEmail, stdFstName, stdSecName, gender, programCode)
Primary Key: stdID, Alternate Key: stdEmail
Foreign Key: programCode references Program(programCode)
- Relation instance/state: a set of tuples from a relation schema.
― r(R)
^This is like an actual database table
Name the three Integrity constraints put on attribute values
― Entity integrity
― Referential integrity
― Domain constrain
Define Entity Integrity
― Uniqueness: No two tuples can have identical values for candidate keys.
― Not-Null: No attribute of a candidate key can be a NULL.
* Null: A value of an attribute is currently unknown or not applicable
for this tuple.
Define referential integrity
Referential integrity: If a foreign key exists in a relation,
― either the foreign key value matches a primary key value in its home relation
― or the foreign key value is NULL
Define Domain constraints
Domain Constraints: enables the RDBMS to carry out checks
on data errors and to determine the range of operations
that can be carried out on the domain
― E.g., if the attribute Age is specified to be in the range16-30, any number outside would give an error
― E.g., If the attribute Name is to hold a set of characters, then any
multiplication or division operations on the attribute values are illegal
Define views
View: a subset of base relation(s)
Base relation: A named relation
corresponding to an entity in the conceptual
schema, whose tuples are physically stored
in the database
Name some features of views
Virtual/derived relation. It does not necessarily exist in the
database, but can be produced upon request.
* Dynamic. The changes made to the base relation(s) that affect the
view are immediately reflected in the view.
* Generated by applying appropriate relational operations (e.g., 𝜎,
𝜋, x, ⋈, ….)
What is the purpose of views in databases
- Security mechanism
- Customize user’s need
- Simplify complex operations on the base relations
How are composite attributes represented in relation schemas
For composite attributes, include only the constituent simple attributes.
So for example staffName which is a composite attribute containing staffFName and staffLName would be ignored and the relation schema would look like this:
AcademicStaff(staffID, staffFName,staffLname, academicPosition, gender,
DOB)
What is Normalisation and what is the purpose of it?
Normalization: A technique for producing a set of relations with
desired properties.
― remove redundancy
― remove potential for insertion, modification, deletion anomalies
If A and B are attributes of relation R, when is B functionally dependent on A? Out of A and B which one is the determinant
if each value of A is associated with exactly one value of B, this is denoted A -> B
In this case A is the determinant
If A -> B and B -> C where is the transitive dependency
C is transitively dependent on A via B … A -> C
Given A -> B what would make this a partial functional dependency?
if by removing some attribute(s) from A, the dependency still
holds.
Given A -> B what would make this a full functional dependency
B is not functionally dependent on any subset of A.
What should be done to a database to put it into first normal form?
Repeating groups must be removed, a primary key must exist and duplicate rows must be removed. Additionaly there must be no order dependency
What should be done to a database to put it into second normal form?
All partial dependencies must be removed
Define a partial dependency
Partial Dependency occurs when a non-prime attribute is functionally dependent on part of a candidate key.
What should be done to a database to put it into third normal form
All transitive dependencies must be removed
Define a transitive dependency
A transitive dependency refers to some non-prime attribute other than the candidate key that depends on another non-prime attribute that is dependent entirely on the candidate key.
^this is hard to remember but it’s not that hard to actually do
Show_ID Telecast_ID Telecast_Type CD_Cost ($)
F08 S09 Thriller 50
F03 S05 Romantic 30
F05 S09 Comedy 20
In this case Telecast_Type needs to be removed cos it depends on telecast ID which in turn depends on Show ID
What needs to occur for a database to be in Boyce-Codd Normal Form(BCNF)
Every determinant must be a candidate key
or
Every non-prime attribute must be functionally dependent on the entire primary key
What are the Unary operations that are represented by these symbols 𝜋, 𝜎, 𝜌
― Projection (𝜋 )
― Selection (𝜎)
― Rename (𝜌)
What are the set operations that are represented by these symbols
∪, ∩, −, X
― Union (∪)
― Intersection (∩)
― Set difference (−)
― Cartesian Product (X)
What are the Binary operations represented by these symbols
⋈, (⋉, ⋊), ÷
― Join (⋈)
― Outer Join (⋉, ⋊)
― Division (÷)
How is the projection operator used?𝜋
𝝅<𝒂𝒕𝒕𝒓𝒊𝒃𝒖𝒕𝒆>(𝑹): Produce a new relation that has only some of
𝑅‘s columns</𝒂𝒕𝒕𝒓𝒊𝒃𝒖𝒕𝒆>
e.g
𝝅ID, Name, Salary(Lecturer)
Removes duplicate tuples
how is the selection operator used? 𝜎
𝝈<𝒔𝒆𝒍𝒆𝒄𝒕>(𝑹): Produce a new relation that contains only
those tuples of R that satisfy the condition.</𝒔𝒆𝒍𝒆𝒄𝒕>
e.g
𝝈𝒅𝒆𝒑𝒕𝑰𝑫=′𝟎𝟐′∧ 𝒂𝒈𝒆>𝟓𝟎(𝑳𝒆𝒄𝒕𝒖𝒓𝒆𝒓) or 𝝈barName=’Sues’(Sells)
How is the rename operator used? 𝜌
𝝆𝑺 𝑬 : Rename the expression E to S
* 𝝆𝑺(𝒂𝟏,𝒂𝟐,…,𝒂𝒏) 𝑬 : Rename the relation E to S,
and also re-names the attributes as 𝑎1, 𝑎2, … , 𝑎𝑛.
e.g
Q: List all the professors, rename the output relation as Professor:
𝜌𝑃𝑟𝑜𝑓𝑒𝑠𝑠𝑜𝑟(𝜎𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛=′𝑝𝑟𝑜𝑓𝑒𝑠𝑠𝑜𝑟′(𝐿𝑒𝑐𝑡𝑢𝑟𝑒𝑟))
How is the union operator used? ∪
Will take the union of both tables (R U S ), both tables must have the same relation schema for this to work
How is the intersection operator used? ∩
Will take the intersections of both tables (R ∩ S ), both tables must have the same relation schema for this to work
How is the set difference operator used? -
Will take the difference (R - S), records that exist in both R and S are remove from the R table and then that’s outputted. Both tables must have the same relation schema for this to work
𝑅 − 𝑆 ≠ 𝑆 − 𝑅
How is the Cartesian product operator used?
This one is kinda weird
* 𝑹 × 𝑺: Produces a relation that is the
concatenation of every tuple of relation 𝑅 with
every tuple of relation 𝑆.
―If both 𝑅 and 𝑆 have the same attribute 𝐴, then use: 𝑅. 𝐴
and 𝑆. 𝐴
Basically just stick S on the back of R and if they have the same attribute then call it R.attribute and S.attribute. Like for example Bar.barName and Sells.barName
What are all the join operations
― Theta join (Θ-join)
― Equijoin
― Natural join
― Outer join
Explain how theta (Θ) join works
- 𝑹 ⋈<𝒋𝒐𝒊𝒏> 𝑺: Defines a relation that contains
all combinations of tuples from 𝑅 and 𝑆 that satisfy the
join condition</𝒋𝒐𝒊𝒏> - 𝑅 ⋈<𝑗𝑜𝑖𝑛> 𝑆 = 𝜎<𝑗𝑜𝑖𝑛>(𝑅 × 𝑆)</𝑗𝑜𝑖𝑛></𝑗𝑜𝑖𝑛>
Explain how Equijoin works
- Equijoin: Produces all the combinations of tuples
from that satisfy a join condition with only
equality comparisons.
―Equijoin is a special case of theta-join.
Explain how natural join works
- 𝑹 ⋈ 𝑺: an equijoin of the two relations R and S
over all common attributes.
What is SQL used for
- SQL is a standard language for interacting with a relational
database
What does Data Manipulation Language do and what are the statements used in it?
It retrieves and updates statements:
-Select
-Insert
-Update
-Delete
What does Data Definition Language do and what are the statements used in it?
Define DB structure & control access:
― CREATE
― ALTER
― DROP
― GRANT
― REVOKE
What are the 5 basic select conditions?
- Comparison: =, <> (!=), <, <=, >=, >, OR, AND, NOT
- Range: BETWEEN … AND, NOT BETWEEN … AND
- Set membership: IN, NOT IN
- Pattern match: LIKE, NOT LIKE
- IS NULL, IS NOT NULL
What are the 5 Aggregate functions?>
- COUNT
- MIN
- MAX
- SUM
- AVG
What does a Select - Between statement look like?
SELECT *
FROM Products
WHERE Price BETWEEN 10 AND 15;
-Note you can add a not in front of between if needed
What does a Select - In statement look like
SELECT *
FROM Customers
WHERE Country IN (‘Germany’, ‘France’, ‘UK’);
-Note you can add a not before IN for the opposite result
What does a Select - Like statement look like?
SELECT *
FROM Customers
WHERE CustomerName LIKE ‘a%’;
What are the two types of wildcards in a select Like statement and what do they do?
% – represents zero, one, or multiple characters
_ – represents a single character