Lecture 10 - Intro To DBs Flashcards
What is DBMS
- Definition: Software that interacts with the user, applications, and the database itself to capture and analyze data.
- Functions: Data storage, retrieval, and updating; ensuring data integrity and security; managing concurrency and transactions.
Advantages of Using a DBMS
Provides efficient, reliable, convenient and safe multi-user storage of and access to massive amounts of persistent data.
- massive: terabytes
- persistent
- safe: hardware, software, power
- multi-user: concurrency control
- convenient: physical date independence; high level query languages (declarative)
- efficient: thousands of queries / updates per second
- reliable
Data Models
- Defines the logical structure of a database, including tables, relationships, a set of records, XML, graph
Data Definition Language (DDL)
- used to define and set up the schema of a database. It includes commands like
CREATE
,ALTER
, andDROP
to create or modify database structures such as tables, indexes, and constraints.
Data Manipulation or Query Language (DDL)
- used to query and manipulate data within the database. It includes commands like
SELECT
,INSERT
,UPDATE
, andDELETE
to retrieve and modify data.
Normalisation
Process of organizing data to minimize redundancy.
ACID Properties
-
ACID Properties: Ensures reliable processing of database transactions:
- Atomicity: Transactions are all-or-nothing.
- Consistency: Transactions bring the database from one valid state to another.
- Isolation: Transactions do not interfere with each other.
- Durability: Once a transaction is committed, it remains so.
DBMS Implementer
Develops and maintains the database management system software.
Database Designer
Designs the the database schema and structure to meet organisational needs
Database Application Developer
Builds applications that interact with and operate on the database for data manipulation and retrieval.
Database Administrator (DBA)
Manages the operation, performance, and security of the database. Loads data and keeps the database running smoothly.
Relations (Tables) - RM
- A relation is a table in the database that contains data about a specific entity, such as customers, orders, or products.
- Each relation consists of a set of rows (also called tuples) and columns (also called attributes).
Attributes (Columns) - RM
- Attributes are the columns in a table that define the properties or characteristics of the entity. For example, in a “Customers” table, attributes might include
CustomerID
,Name
, andEmail
. - Each attribute has a specific data type, such as integer, string, or date, which determines the kind of data it can hold.
Tuples (Rows) - RM
- Tuples are the individual records or rows in a relation. Each tuple represents a single instance of the entity described by the table.
- For example, a tuple in the “Customers” table might contain the data for one specific customer.
Type - RM
- The type of an attribute defines the nature of the data it can store. Common types include
INTEGER
,VARCHAR
,DATE
, andBOOLEAN
. - The type is crucial for ensuring that the data in each attribute is consistent and valid.
Schema - RM
- The schema of a relation defines its structure, including the names of the attributes, their types, and any constraints.
- The database schema is a blueprint for how the data is organized and includes all the tables and their relationships within the database.
Instance - RM
- An instance of a relation is the actual data contained in the table at a specific point in time. It’s the current set of tuples in the relation.
- While the schema remains constant, the instance can change as data is added, updated, or deleted.
Null - RM
- Null is a special marker in the rlational model used to represent missing or unknown data.
- A null value in an attribute indicates that the value is either not applicable or not known.
- e.g. GPA > 3.5 or GPA ≤ 3.5
Key - RM
-
Primary Key: A unique identifier for each tuple in a relation. It ensures that no two rows have the same primary key value. For example,
CustomerID
could be the primary key in the “Customers” table. -
Foreign Key: An attribute in one relation that links to the primary key of another relation, establishing a relationship between the two tables. For example, an
OrderID
in an “Orders” table might refer to aCustomerID
in a “Customers” table. - Candidate Key: An attribute, or a set of attributes, that can uniquely identify a tuple in a relation. A relation can have multiple candidate keys, but one of them is chosen as the primary key.
Steps in Creating and Using a Relational DB
-
Design Schema; Create Using DDL:
- Design Schema: Define the structure of the database by specifying the tables, columns, data types, and relationships between tables.
-
DDL (Data Definition Language): Use DDL commands like
CREATE
,ALTER
, andDROP
to create and modify the schema in the database.
-
“Bulk Load” Initial Data:
- After the schema is defined, load the initial set of data into the database. This is often done in bulk using SQL commands like
INSERT
or tools that support batch loading.
- After the schema is defined, load the initial set of data into the database. This is often done in bulk using SQL commands like
-
Repeat: Execute Queries and Modifications:
-
Execute Queries: Run SQL queries to retrieve and manipulate data. This includes
SELECT
statements for data retrieval. -
Modifications: Continuously update, delete, or insert new data into the database as needed. Use DML (Data Manipulation Language) commands like
INSERT
,UPDATE
, andDELETE
for these operations. - Repeat Process: As the database is used, the cycle of querying and modifying data is repeated.
-
Execute Queries: Run SQL queries to retrieve and manipulate data. This includes
Ad-Hoc Queries
- Definition: These are specific, often one-time queries written in a high-level language to retrieve or analyze data based on particular conditions.
-
Examples:
- All students with GPA > 3.7 applying to Stanford and MIT only: A query to find high-performing students applying exclusively to top universities.
- All engineering departments in CA with < 500 applicants: A query to identify engineering programs in California with relatively low application numbers.
- College with highest average accept rate over last 5 years: A query to determine which college has had the most successful acceptance rate recently.
Queries Return Relations
- Compositional: Queries are compositional, meaning the result of one query can be used as the basis for another query. This is a fundamental property of relational databases.
- Closed: The concept of “closed” means that the result of a query is itself a relation (a table), which can be further queried or manipulated.
SQL (Structured Query Language)
- Actual/Implemented: SQL is the practical implementation of the concepts of relational algebra. It is used to interact with databases, perform queries, and manage data.
-
Example:```sql
SELECT Student.ID
FROM Student, Apply
WHERE Student.ID = Apply.ID
AND GPA > 3.7 AND college = ‘Stanford’;```- This SQL query retrieves the
ID
of students who have a GPA greater than 3.7 and are applying to Stanford.
- This SQL query retrieves the