Exam 1 Flashcards
Database
Interrelated collection of data
Why databases
Easy for software engineers to manage datasets.
They are efficient and robust
DBMS
Application that facilities inserting, deleting, updating, and querying data within a database.
Why DBMS
Users don’t need to know how data is physically stored. They only need to see simple views of the data.
Keeps data in a good state
Efficient
DML
Data manipulation language
Either declarative or procedural
SQL = declarative
relational algebra = procedural
DDL
Data Definition Language
provides the ability to define the structure (schema) of data
Identify data types, provide data constraints, specify referential integrity, make assertions, define auth levels
SQL
Structured Query Language
select
filters tuples from a relation
σ (predicate) Relation
project
filters columns from a relation
π (attr1, attr2, …) Relation
join
Creates pairs from like attributes
Relation ⨝ (attr) Relation
Union
found in either
relation U relation
query1 union query2
add ALL to keep all duplicates
query1 UNION ALL query2
Intersection
Found in both
Relation ∩ Relation
query1 intersect query2
Set Difference
provides set based filtering
A - B =
set that contains those elements of A that are NOT in B
query1 except query2
And
this sign ^
AND in sql
Or
V
schema
defines how records can be described by identifying attributes and their primary keys
ex) Course_offering(ID, name, credits)
Relation Instance
Specific instance of such a table
A singular view of it
Attribute Domain
Describes the possible range of values an attribute can take
ex) student year {fr, so, jr, sr}
Super Key
Set of attributes that uniquely identify tuples within a relation.
No 2 tuples in the relation may have the same value for the key
Primary Key
Key chosen by the DB designer as the principal means of uniquely identifying tuples within a relation.
No 2 tuples in the relation may have the same value for the key
Can’t have 2 students whose ids are the same
Foreign Key
A relation may include the primary key of another relation. Foreign keys are used to reference the other relation
May have 0, 1, or Many
Functional Dependencies - what
An understanding that you can determine the values for one set of attributes from another
ex) an employees SSN will allow us to determine their salary
Functional Dependencies - why
Whenever a change is made, we must check all the functional dependencies that exist to determine if we should force the modification to fail.
Must preserve the functional dependencies at all cost!
TLDR
-Data integrity (avoid anomalies)
-Normalization (condensing tables into smaller, more efficient tables)
-efficient database design (avoid redundancy)
Query Types
Select, Insert, Delete, Update
select
Basic Select Query Structure
SELECT ___ FROM ___ WHERE ___
Insert
Insert into tableName (attr1, attr2, attr3) values (“value1”, “value2”, value3);
Delete
DELETE FROM tableName WHERE attr1 = value1 or attr2 > value2;
DELETE FROM tableName; will delete all rows in tableName
Update
UPDATE tableName SET attr1=value1 WHERE attr2>value2
without a where clause all records are updated
UPDATE employees SET salary=salary*1.04
give every employee a 4% raise
where
Identify any desired restrictions
group by & having
group by allows you to preform aggregation on groups of tuples that have similar values
ex)
select dept_name, count(*) as numStudents
from student group by dept_name
having numStudents > 2;
predicates cam be applied to the results of group by using HAVING
aggregation
avg()
min()
max()
sum()
count()
order by
ASC or DESC
ex)
select name, tot_cred FROM student
where dept_name = “Comp. Sci.”
order by tot_cred ASC;
Limit
add LIMIT (int) at the end of a query to limit the number of rows returned
From
Identify the tables the data comes from
SQL Injection - Risk?
bad things happen to good people
“bad” or malicious input is executed as SQL
leads to unauthorized access or manipulation of data/database
ex)
select userID from users where username = “ “ and password = “ “;
username = “admin”
password = ???” or “1”=”1
SQL Injection - Fix?
Input sanitization
prepared statements
Pattern Matching
% matches any substring
“%science” matches “computer science”
_ matches any character
“Comp. _ _ _.” matches “Comp. Sci.” and “Comp. Eng.”
Must use LIKE operator
select * from course where title LIKE “Intro%”;
Joins
FROM Relation1, Relation2, Relation…
without specifying a predicate or columns to join/match on the result is a cross product
FROM instructor join teaches on instructor.id = teaches.id
FROM instructor NATURAL JOIN teaches
Entities
A thing or object that is distinguishable from other things or objects
Described by a set of attributes
Relationships
An association among several entities
Degrees of relationships:
Binary and Tertiary
Tertiary exist but are more rare
Cardinality of Relationships
one to one
one to many
many to one
many to many
participation constraints
Either total or Partial
Total participation
Entity must participate in the relationship
ex) Students must be advised by an instructor
double lines
Partial participation
Entity may participate in the relationship
ex) Instructors may advise students
single line
attribute types
simple and composite
single valued & multivalued
derived attributes can be determined from other attributes
What is the eviction policy for most buffer managers/pools?
Least Commonly
Used
Evict the page in the buffer pool that was used the longest ago
What is a buffer manager? (Chapter 13)
A buffer manager implements the logic that determines which blocks should be added or evicted from the buffer pool.
What is a buffer? (Chapter 13)
A space we allocate in main memory to keep copies of often fetched blooks.
What are the three parts of a slotted page structure? (Chapter 13)
(Variable length record)
Block headers, free space, and records.
What is Variable Length Record? (Chapter 13)
A length record where offsets are stored at the beginning of a record, and values at the end. An optional null bitmap can be used to represent null values.
What is Fixed Length Record? (Chapter 13)
All records take up the same space on storage. We know how many fit in a block, gives us random access, and faster lookups.
What are blocks and pages? (Chapter 13)
Data storage structures
data read/written from/to disk are stored in blocks
data read/written from/to RAM are in the form of pages
Block size is specified by the file system
page size is specified by the OS
both typically around 4KB
What is RAID 1? (Chapter 12)
Mirroring
Data is copied onto 2 disks, making it so if one of the disks fails, you can still access your data.
for data redundancy
cuts your amount of storage in half
What is RAID 0? (Chapter 12)
striping
Data is split up into two disks, alternating which disk information is put in.
for better performance
still get access to your total amount of storage
What does RAID do? (Chapter 12)
Splits up data/files into different disks, usually 2.
What does RAID stand for? (Chapter 12)
Redundant Array of Inexpensive Disks.
In determining access time for magnetic disks, how long does rotational latency take on average? (Chapter 12)
1/2 the worst-case rotational latency.
Usually 4-11 milliseconds.
In determining access time for magnetic disks, how long does seek time take on average? (Chapter 12)
1/2 the worst-case seek time.
Usually 5-10 milliseconds.
In determining access time for magnetic disks, what happens before anything else? (Chapter 12)
The spin up
access time =
seek time + rotational latency
What are sectors in magnetic disks? (Chapter 12)
The smallest unit of data that can be read or written. Size is usually 512 bytes.
How many circular tracks are platters divided into? (Chapter 12)
50k - 100k
How do magnetic disks read/write data? (Chapter 12)
Using a read/write head that sits very close to the platter
reads/writes magnetically encoded information.
Only one head per platter that is mounted on a common arm
What is the storage hierarchy, from the top down? (Chapter 12)
Cache, main memory, flash memory, magnetic disk, optical disks, magnetic tapes.
Fastest/Most expensive at top
Slowest/Cheap at the bottom
What are the 4 classifications of physical data storage? (Chapter 12)
Speed, cost per byte, reliability, and volatility
What does DDL look like?
CREATE TABLE student (
name varchar(100),
id numeric(6,0),
dept_name varchar(30),
tot_cred numeric(3,0).
primary key (tot_cred),
foreign key (dept_name) references department(department_name)
);
What should you produce schema for based on an ER diagram? (Chapter 6)
Produce one based on each entity set
strong relation (many-to-many) set, and weak entity/relation set.
Attributes generally respond to columns in a relation model. Remember your primary keys as well!
Strong entity sets reduces to a table whose columns correspond with the same attributes
Many to many relationship sets are represented by a table with attributes for the primary keys of the participating entity sets
What does schema for the ER model look like? (Chapter 6)
Using the university database:
Student(ID, name, dept_name, tot_cred)
Pretend the ID is underlined.
What are the two main benefits of DBMSs? (Chapter 1)
Physical data independence and data abstraction.
What is an Entity-Relationship model? (Chapter 6)
A form of the relational model that utilizes geometrical representations of entities and relationships.
What are the three basic concepts in the ER data model? (Chapter 6)
Entity sets, relationship sets, and attributes.
What are entity sets represented as? (Chapter 6)
Rectangles that contain a list of attributes. Primary keys are underlined!
What are relationship sets represented as? (Chapter 6)
Diamonds with lines connecting two entity sets.
What is one-to-one cardinality represented as? (Chapter 6)
Two arrows on either side of the diamond.
What is one-to-many/many-to-one represented as? (Chapter 6)
One arrow and one undirected line.
What is many-to-many represented as? (Chapter 6)
Two undirected lines.
Free Lists
use the header of a block to refer to the next empty/deleted record space
a empty/deleted space points to the next empty space
first option of variable length records
offsets are stored at the beginning of the record
values come last
0000 null bitmap can be used to represent null values for any attribute
Data transfer rate for mangnetic disks
25 - 100MB
Lower for inner tracks