ch7 Flashcards
Explain the different meanings of NULL with example. Why does SQL not allow to compare an attribute value to NULL using = and <> operators?
different meanings of NULL:
(* SQL doesn’t distinguish btwn these 3 meanings because it’s often not possible to determine which of the 3 meanings applies )
- Unknown value:
* AKA value unknown
* value exists but is not known, or it is not known whether or not the value exists
* * ex1: A person’s DOB is not known, so it is represented by NULL in the DB.
* * ex2: A person’s home phone is represented by NULL in the DB because we don’t know if they even have a home phone. - Unavailable or withheld value:
* AKA value not available
* value exists but it is purposely withheld
* * ex: A person has a home phone but does not want it to be listed, so it is withheld and represented as NULL in the DB. - Not applicable attribute:
* AKA value not applicable
* the attribute does not apply to this tuple or is undefined for this tuple
* * ex: An attribute LastCollegeDegree is NULL in the DB for a person who does not have any college degrees.
In addition to the IN operator, which comparison operators can be used to compare a single value v to a set or multiset V?
IN: compares a value v with a set or multiset of values V and evaluates to TRUE if v is one of the elements in V.
ex:
SELECT last_name, first_name, ssn
FROM Employee
WHERE p_no IN (
SELECT num FROM Project WHERE department=’Math’);
In addition to the IN operator, a number of other comparison operators can be used to compare a single value v (typically an attribute name) to a set or multiset v (typically a nested query).
The = ANY (or = SOME) operator returns True if the value v is equal to some value in the set V and is hence equivalent to IN.
The two keywords ANY and SOME have the same effect.
Other operators that can be combined with ANY and SOME include: >, >=, .
ex: #get the names of employees in New York whose salary is larger than at least one employee’s salary in San Francisco
SELECT last_name, first_name
FROM employee
WHERE
current_city=’New York’
AND salary > ANY (
SELECT salary FROM employee WHERE current_city=’San Francisco’);
The keyword ALL can also be combined with each of these attributes.
ex: #get the names of employees in New York whose salary is larger than all of the employee[‘]s[’] in San Francisco salaries
SELECT last_name, first_name
FROM employee
WHERE
current_city=’New York’
AND salary > ALL (
SELECT salary FROM employee WHERE current_city=’San Francisco’);
[nested queries: complete SELECT-FROM-WHERE blocks within another SQL query
- outer query=”the other query”
- can also appear in the SELECT clause or FROM clause or WHERE clause or other clauses of other query]
[If nested query returns a single attribute AND a single tuple, the query result will be a single (AKA scalar) value.]
In such cases, it is permissible to use = instead of IN for the comparison operator.
[In general, the nested query will return a table, which is a multiset of tuples.]
What is the use of the EXISTS function in SQL?
EXISTS(Q) is used to check whether the result of a nested query is empty (contains no tuples) or not.
- value is TRUE if the nested query result Q contains at least one tuple
- value is FALSE if nested query result Q contains no tuples
- typically used in conjunction with a correlated nested query
NOT EXISTS(Q):
- value is TRUE if there are no tuples in the result of nested query Q
- value is FALSE otherwise [if there is at least one tuple in the result of nested query Q]
[EXISTS, NOT EXISTS, and UNIQUE are boolean functions in SQL; can be used in a WHERE clause condition]
[UNIQUE(Q): can be used to test whether the result of a nested query is a set (no duplicates) or if it is a multiset (duplicates exist)
- returns TRUE if there are no duplicate tuples in the result of query Q;
- otherwise returns FALSE]
[Discuss how each of the following constructs is used in SQL, and discuss the various options for each construct. Specify what each construct is useful for.]
a. Nested queries
nested queries: complete SELECT-FROM-WHERE blocks within another SQL query
- outer query=”the other query”
- can also appear in the SELECT clause or FROM clause or WHERE clause or other clauses of other query
USEFUL FOR:
Some queries require that existing tuples in the database be fetched and then used in a comparison condition.
In general, the nested query will return a table, which is a multiset of tuples.
can have several levels of nested queries
- can face ambiguity among attr names if there are any dups among attr names in relations in the FROM clause of outer query and in relations in the FROM clause of the nested query
- rule: a reference to an unqualified attr refers to the relation declared in the innermost nested query
- generally advisable to create aliases (AKA tuple variables) for ALL THE TABLES REFERENCED IN AN SQL QUERY to avoid potential errors and ambiguities
HOW ITS USED, OPTIONS: can use operators: IN [[COMP OP]] SOME [[COMP OP]] ANY (same as SOME) [[COMP OP]] ALL {COMP OPs: =, >, >=, } = *if the nested query returns a single attribute AND a single tuple; query result is a single(AKA scalar) value*
ex: SELECT E.first_name, E.last_name, E.ssn, P.num FROM Employee E WHERE E.p_no IN ( SELECT P.num FROM Project P WHERE P.department='math');
[Discuss how each of the following constructs is used in SQL, and discuss the various options for each construct. Specify what each construct is useful for.]
b. Joined tables and outer joins
USEFUL FOR:
The concept of a joined table (AKA joined relation) was incorporated into SQL to permit users to specify a table resulting from a join operation in the FROM clause of a query. This construct may be easier to comprehend than mixing together all the select and join conditions in the WHERE clause.
HOW ITS USED, OPTIONS:
- JOIN (AKA INNER JOIN): result only includes tuples from the relations that satisfy the join condition; default
- NATURAL JOIN: no join condition is specified; an implicit EQUIJOIN for each pair of attributes with the same name in both relations is created
- may need to rename attrs with AS if the names don’t match
- OUTER JOIN: result includes all tuples from one or both of the relations, even if they don’t satisfy the join condition
- LEFT OUTER JOIN, AKA LEFT JOIN
- ** some systems [Oracle]: +=
- RIGHT OUTER JOIN, AKA RIGHT JOIN
- ** some systems [Oracle]: =+
- FULL OUTER JOIN, AKA FULL JOIN
- ** some systems [Oracle]: +=+
- CROSS JOIN: CARTESIAN PRODUCT
- can nest join ops: one of the tables in the join may be a joined table itself
- multiway join: a join of 3 or more tables
ex:
SELECT Fname, Lname, Address
FROM (EMPLOYEE JOIN DEPARTMENT ON Dno=Dnumber)
WHERE Dname=’Research’);
[Discuss how each of the following constructs is used in SQL, and discuss the various options for each construct. Specify what each construct is useful for.]
c. Aggregate functions and grouping
AGGREGATE FUNCTIONS
AGGREGATE FUNCTIONS
USEFUL FOR:
- Aggregate functions are used to summarize information from multiple tuples into a single-tuple summary
- Grouping is used to create subgroups of tuples before summarization.
- Grouping and aggregation are required in many DB apps
HOW ITS USED, OPTIONS:
- built-in aggregate functions:
- COUNT: returns the number of tuples or values as specified in a query
- SUM: can be applied to a set or multiset of numeric values to return the sum of the values
- MIN: can be applied to a set or multiset of numeric values to return the minimum value; can also be used among non-numeric domains if the domain values have a total ordering [for any 2 values in the domain it can be determined that one occurs before the other in the defined order] among one another
- MAX: can be applied to a set or multiset of numeric values to return the maximum value; can also be used among non-numeric domains if the domain values have a total ordering [for any 2 values in the domain it can be determined that one occurs before the other in the defined order] among one another
- AVG: can be applied to a set or multiset of numeric values to return the average of the values
- SQL also has aggregate functions that can be applied to a collection of Boolean values:
- SOME: returns TRUE if at least 1 element in the collection is TRUE
- ALL: returns TRUE if all elements in the collection are TRUE
- can be used in SELECT or HAVING clause
- in general: NULL values are discarded when aggregate functions are applied to a particular column
- “The only exception is for COUNT(*) because tuples instead of values are counted.”
- general rule: when an aggregate function is applied to a collection of values, NULLs are removed from the collection before the calculation; if the collection becomes empty because all values are NULL, the aggregate function will return NULL (except for COUNT, which returns 0)
- return a single-row summary of the whole relation or a subset of tuples
- hence produces a table with a single row or a single value
- count the number of rows in the result of a query: COUNT(*)
- count values in a column:
- COUNT(DISTINCT Salary): dupes eliminated
- COUNT(Salary): dupes not eliminated
- both cases: tuples with Salary==NULL are not counted
employee: ssn, email, salary (1111223333, 'user1@gt.edu', 50000), (2222334444, 'user2@gt.edu', 55000), (3333445555, 'user3@gt.edu', 55000), (4294967295, 'user4@gt.edu', 60000), (1111224444, 'user5@gt.edu', NULL), (1111225555, 'user6@gt.edu', 50000), (1111226666, 'user7@gt.edu', NULL), (1111227777, 'user8@gt.edu', 50000), (1111228888, 'user9@gt.edu', 50000)
* ex1: #get the total number of employees SELECT COUNT(*) FROM employee; #9
* ex2: #get a count of ALL salary values in employee SELECT COUNT(salary) FROM employee; #7
* ex3: #get a count of the distinct salary values in employee SELECT COUNT(DISTINCT salary) FROM employee; #3
ex4: #get the the average of all employees' salaries, the sum of all employees' salaries, the min salary, the max salary SELECT AVG(salary), SUM(salary), MIN(salary), MAX(salary) FROM employee; #52857.1429, 370000, 50000, 60000
[Discuss how each of the following constructs is used in SQL, and discuss the various options for each construct. Specify what each construct is useful for.]
d. Triggers
USEFUL FOR:
- can be used to specify automatic actions that the DB system will perform when certain events and conditions occur
- [this type of functionality is generally referred to as active databases)
- in many cases it is convenient to specify the type of action to be taken when certain events occur and when certain conditions are satisfied
- ex: specify a condition that if violated causes some user to be informed of the violation
- [useful for] monitoring the DB
HOW ITS USED, OPTIONS:
CREATE TRIGGER
BEFORE{or}AFTER
WHEN
ex: #creates a trigger called SALARY_VIOLATION; before inserting a new EMPLOYEE record or updating an EMPLOYEE’s SALARY or updating an EMPLOYEE’s SUPERVISOR, if the employee’s new salary is higher than the employee’s supervisor’s salary, then invoke the sproc INFORM_SUPERVISOR
CREATE TRIGGER SALARY_VIOLATION
BEFORE INSERT OR UPDATE OF SALARY, SUPERVISOR_SSN ON EMPLOYEE
FOR EACH ROW
WHEN (NEW.SALARY > (SELECT SALARY FROM EMPLOYEE WHERE SSN=NEW.SUPERVISOR_SSN) ) INFORM_SUPERVISOR(NEW.Supervisor_ssn, NEW.Ssn);
[Discuss how each of the following constructs is used in SQL, and discuss the various options for each construct. Specify what each construct is useful for.]
e. Assertions and how they differ from triggers
USEFUL FOR:
* can be used to specify additional types of constraints that are outside the scope of the built-in relational model constraints (primary and unique keys, entity integrity, and referential integrity)
VS TRIGGERS:
- ASSERTIONS specify additional constraints outside scope of relational model
- TRIGGERS specify actions to take when certain conditions occur
HOW ITS USED, OPTIONS:
CREATE ASSERTION: can specify general constraints via declarative assertions
* CREATE ASSERTION
CHECK ();
** assertion condition must hold true on every DB state in order for the assertion to be satisfied
** any WHERE clause condition can be used
** many constraints can be specified using the EXISTS and NOT EXISTS style of SQL conditions
* DBMS is responsible for making sure that condition is not violated
* basic technique for writing assertions:
1. specify a query that selects any tuples that violate the desired condition
2. put the query inside a NOT EXISTS clause
** the assertion will specify that the result of this query must be empty so that the condition will always be true
CHECK clause and constraint condition can also be used to specify constraints on individual attrs and domains and on individual tuples
CREATE ASSERTION vs individual domain constraints vs tuple constaints:
- CHECK clause on individual attrs, domains, and tuples are checked only when tuples are inserted or updated in a specific table
- should be used only when the constraint can only be violated by insertion or updating of tuples
- CREATE ASSERTION is checked ???? for every op??? no idea, stupid book.
- should be used only when it is not possible to use CHECK on attrs, domains, or tuples
ex: #the salary of an employee must not be greater than the salary of the manager of the department that the employee works for
CREATE ASSERTION SALARY_CONSTRAINT
CHECK (NOT EXISTS (
SELECT *
FROM EMPLOYEE E, EMPLOYEE M, DEPARTMENT D
WHERE E.Salary>M.Salary AND E.Dno=D.Dnumber AND D.Mgr_ssn=M.Ssn));
[Discuss how each of the following constructs is used in SQL, and discuss the various options for each construct. Specify what each construct is useful for.]
f. The SQL WITH clause
USEFUL FOR:
- allows a user to define a table that will only be used in a particular query
- once the query is executed, the temp table (that was declared with WITH) is discarded
- can usually be written using other SQL constructs
HOW ITS USED, OPTIONS:
WITH AS ()
>;
ex: WITH BIGDEPTS (Dno) AS ( SELECT Dno FROM EMPLOYEE GROUP BY Dno HAVING COUNT(*) > 5) SELECT Dno, COUNT(*) FROM EMPLOYEE WHERE Salary > 40000 AND Dno IN BIGDEPTS GROUP BY Dno;
same as: SELECT Dno, COUNT(*) FROM EMPLOYEE WHERE Salary > 40000 AND Dno IN (SELECT Dno FROM EMPLOYEE GROUP BY Dno HAVING COUNT(*) > 5) GROUP BY Dno;
[Discuss how each of the following constructs is used in SQL, and discuss the various options for each construct. Specify what each construct is useful for.]
g. SQL CASE construct
USEFUL FOR:
- can be used when a value can be different based on certain conditions
- can be used in any part of a SQL query where a value is expected, including when querying/inserting/updating tuples
- can also be used when inserting tuples that can have different attributes being NULL depending on the type of record being inserted into a table
- ex: when a specialization is mapped into a single table
- ex: or when a union type is mapped into relations
HOW ITS USED, OPTIONS: CASE WHEN THEN WHEN THEN WHEN THEN ELSE ;
ex: UPDATE EMPLOYEE SET Salary = CASE WHEN Dno=5 THEN Salary + 2000 WHEN Dno=4 THEN Salary + 1500 WHEN Dno=1 THEN Salary + 3000 ELSE Salary + 0;
[Discuss how each of the following constructs is used in SQL, and discuss the various options for each construct. Specify what each construct is useful for.]
h. Views and their updatability
view (AKA virtual table): a single table that is derived from other tables
- a view does not necessarily exist in physical form
- limits possible update ops that can be applied to views but doesnt limit querying a view
USEFUL FOR:
- when we need to reference a table frequently, even though it may not exist physically
- when we need refer to the result of 3 joins; can define a view that is specified as the result of the 3 joins, then issue queries on the view as single-table retrievals rather than retrievals involving 2 joins on 3 tables
- one of the main advantages of a view is to simplify the specification of certain queries
- can also be used as a security auth mechanism
UPDATABILITY:
- a view is supposed to be always up-to-date
- if we modify the tuples in the base tables on which the view is defined, the view must automatically reflect these changes;
- the view does not have to be realized/materialized/physically stored at the time of view definition, but rather at the time when we query the view
- ** responsibility of DBMS to make sure that a view is kept up-to-date
- two main strategies for keepign a view up-to-date:
- query modification: involves modifying or transforming the view query into a query on the underlying bas tables
- ** inefficient for views defined via complex queries that are time consuming to executite, especially if multipel view queries are going to be applied to the same view within a short period of time
- view materialization: involves physically creating or storing a temp or permenant view table when the view is first queried or created and keeping that table on the assumption that other queries in the view will follow
- ** needs an efficient strategy to auto update the view table when the base tables are updated:
- *** incremental update: let hte DBMS determine when new tuples must be inserted/deleted/modified in a materialized view table when a DB update is applied to one of the defining tables
- *** immediate update: update view as soon as base tables are changed
- *** lazy update: update the view when needed by a view query
- *** periodic update: update the view periodically (so in this case a view query may get a result that is not really up to date)
HOW ITS USED, OPTIONS:
CREATE VIEW ()
AS ;
ex: CREATE VIEW DEPT_INFO(Dept_name, No_of_emps, Total_sal) AS SELECT Dname, COUNT(*), SUM(Salary) FROM DEPARTMENT, EMPLOYEE WHERE Dnumber=Dno GROUP BY Dname;
[Discuss how each of the following constructs is used in SQL, and discuss the various options for each construct. Specify what each construct is useful for.]
i Schema change commands
USEFUL FOR:
- adding or dropping tables, attributes, constraints, and other schema elements
- can be done while DB is operational and doesnt require recompilation of the DB schema
- certain checks can be done by DBMS to ensure that changes do not effect the rest of the DB and make it inconsistent
HOW ITS USED, OPTIONS:
- DROP: drop (delete) named schema elements
- 2 drop behavior options:
- ** CASCADE: remove named element and all of its contained elements (ex: remove schema and all of its tables, domains, etc)
- ** RESTRICT: named element is dropped only if it has no elements in it; otherwise the DROP command is not executed
- DROP TABLE deletes all the records in the table and removes the table definition from the catalog
- ** if you just want to delete all the records from the table use DELETE instead
- ALTER: changes the definiton of a base table or of other named schema elements
- ALTER TABLE possible actions: add or drop a column, change a column definition, addor drop table constraints
- ** to drop a column, need to choose CASCADE or RESTRICT for the drop behavior
- *** CASCADE: all constraints and views that ref the column are dropped auotmatically from the schema along with the column
- *** RESTRICT: col is dropped only if there arent any views or constraints or other schema elements that reference it
- ** alter column definition by adding/dropping default clause, or adding/dropping a named constraint
ex1: DROP SCHEMA COMPANY CASCADE;
ex2: DROP TABLE DEPENDENT CASCADE;
ex3: ALTER TABLE COMPANY.EMPLOYEE ADD COLUMN Job VARCHAR(12);
ex4: ALTER TABLE COMPANY.EMPLOYEE DROP COLUMN Address CASCADE;
ex5: ALTER TABLE COMPANY.DEPARTMENT ALTER COLUMN Mgr_ssn DROP DEFAULT;
ALTER TABLE COMPANY.DEPARTMENT ALTER COLUMN Mgr_ssn SET DEFAULT ‘3333445555’;
ex6: ALTER ATBLE COMPANY.EMPLOYEE DROP CONSTRAINT EMPSUPERFK CASCADE;
[Explain the different meanings of NULL with example.] Why does SQL not allow to compare an attribute value to NULL using = and <> operators?
SQL allows queries that check whether an attribute value is NULL.
Rather than using = or <> to compare an attribute value to NULL, SQL uses the comparison operators IS or IS NOT.
This is because SQL considers each NULL value as being distinct from every other NULL value, so equality comparison is not appropriate.
It follows that when a join condition is specified, tuples with NULL values for the join attributes are not included in the result (unless it is an OUTER JOIN).
ex: SELECT Fname, Lname FROM EMPLOYEE WHERE Super_ssn IS NULL;
correlated queries
correlated queries: whenever a condition in the WHERE clause of a nested query references some attribute of a relation declared in the outer query, the two queries are said to be correlated.
* we can understand a correlated query better by considering that the nested query is evaluated once for each tuple (or combination of tuples) in the other query.
ex: #Retrieve the name of each employee who has a dependent with the same first name and is the same sex as the employee.
SELECT E.Fname, E.Lname
FROM EMPLOYEE AS E
WHERE E.Ssn IN (
SELECT D.Essn
FROM DEPENDENT AS D
WHERE E.Fname = D.Dependent_name
AND E.Sex=D.Sex);
- In general a query written with nested SELECT-FROM-WHERE blocks and using the = or IN comparison operators can always be expressed as a single block query.
ex (can rewrite above query as):
SELECT E.Fname, E.Lname
FROM EMPLOYEE AS E, DEPENDENT AS D
WHERE E.Ssn=D.Essn AND E.Sex=D.Sex AND E.Fname=D.Dependent_name;
[Discuss how each of the following constructs is used in SQL, and discuss the various options for each construct. Specify what each construct is useful for.]
c. Aggregate functions and grouping
GROUPING
GROUPING
USEFUL FOR:
- Grouping is used to create subgroups of tuples before summarization.
- Grouping and aggregation are required in many DB apps
- In many cases we want to apply the aggregate functions to subgroups of tuples in a relation, where the subgroups are based on some attribute values. (GROUP BY)
- Sometimes we want to retrieve the values of agg functions only for groups that satisfy certain conditions. (HAVING)
GROUP BY
- need to partition the relation into nonoverlapping subsets (AKA groups) of tuples
- each group has the same value of some attribute(s) (AKA grouping attributes)
- can then apply the aggregate function to each group independently to produce summary info about each group
- if NULLs exist in grouping attr: a separate group is created for the tuples with NULL
HAVING
- only the groups that satisfy the condition are retrieved in the result
- conditions in WHERE clause: limit the tuples to which functions are applied
- vs conditions in HAVNG clause: limits whole groups
HOW ITS USED, OPTIONS:
- GROUP BY: specifies the grouping attributes
- grouping attributes should also appear in the SELECT clause (so that the value of the agg function for those tuples appears along with how they were grouped that way)
- can use a join condition in GROUP BY:
- ** grouping and agg fns are applied after joining the 2 relations
- HAVING: provides a condition on the summary info regarding the group of tuples associated with each value of the grouping attribute
- appears in conjunction with GROUP BY
- only the groups that satisfy the condition are retrieved in the result of the query
employee: ssn, email, salary, dept (1111223333, 'user1@gt.edu', 50000, 3), (2222334444, 'user2@gt.edu', 55000, 1), (3333445555, 'user3@gt.edu', 55000, 3), (4294967295, 'user4@gt.edu', 60000, 3), (1111224444, 'user5@gt.edu', NULL, 3), (1111225555, 'user6@gt.edu', 50000, 2), (1111226666, 'user7@gt.edu', NULL, NULL), (1111227777, 'user8@gt.edu', 50000, 2), (1111228888, 'user9@gt.edu', 50000, NULL)
department: num, d_name (1, 'human resources'), (2, 'research'), (3, 'engineering'), (4, 'marketing'), (5, 'sales')
* ex1: #get the number of employees in each department SELECT E.dept, COUNT(*) FROM employee E GROUP BY E.dept; #null, 2 #1, 1 #2, 2 #3, 4
* ex2: #get the average salary for employees in each department SELECT D.num, D.d_name, AVG(E.salary) FROM employee E INNER JOIN department D ON E.dept=D.num GROUP BY D.num; #1, human resources, 55000.0000 #2, research, 50000.0000 #3, engineering, 55000.0000
* ex3: #get the average salary for employees in each department that has at least 2 employees SELECT D.num, D.d_name, AVG(E.salary) FROM employee E INNER JOIN department D ON E.dept=D.num GROUP BY D.num HAVING COUNT(*) >= 2; #2, research, 50000.0000 #3, engineering, 55000.0000
* ex4: #get the average salary for employees whose salary is higher than 50000 in each department SELECT D.num, D.d_name, AVG(E.salary) FROM employee E INNER JOIN department D ON E.dept=D.num WHERE E.salary > 50000 GROUP BY D.num; #1, human resources, 55000.0000 #3, engineering, 57500.0000
* ex5: #get the average salary for employees whose salary is higher than 50000 in each department that has at least 2 employees SELECT D.num, D.d_name, AVG(E.salary) FROM employee E INNER JOIN department D ON E.dept=D.num WHERE E.salary > 50000 GROUP BY D.num HAVING COUNT(*) >= 2; #3, engineering, 57500.0000