SQL - CL2 Flashcards
Creating, modifying, removing database objects
CREATE TABLE statement
Even though we can make some generalizations about the database
table creation process, internal table implementations and CREATE
TABLE statement clauses differ from vendor to vendor significantly.
For example, Oracle’s CREATE TABLE syntax diagram is about fifteen
pages long; DB2’s takes roughly seven pages; MS SQL Server has
the shortest definition — only one-and-a-half pages.
We are going to concentrate on the most common clauses as
described in the SQL99 standards with emphasis on vendor
implementation differences.
SQL99 offers the following CREATE TABLE syntax:
CREATE [{GLOBAL | LOCAL} TEMPORARY] TABLE <table_name> (
    <column_name> <data_type>
        [DEFAULT <default_value>]
        [<column_constraint> [[,] <column_constraint>] [,…]]
        [COLLATE <collation_name>],…
    [<table_constraint>,…]
) [ON COMMIT {DELETE | PRESERVE} ROWS]
Column definitions
The table has to have one or more column definitions, which consist of
the column name and the data type.
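For illustration, here is a minimal sketch of a table with a few column
definitions in the spirit of the ACME samples; the table and column
names are invented for this example:
CREATE TABLE contact_log (
    contact_id_n   INTEGER NOT NULL,
    contact_name_s VARCHAR(50),
    contact_date_d DATE DEFAULT CURRENT_DATE
);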
CREATE INDEX statement
The CREATE INDEX statement differs slightly for different
implementations. The somewhat simplified syntax is below.
CREATE [UNIQUE | BITMAP] INDEX [<qualifier>.]<index_name>
ON [<qualifier>.]<table_name>
({<column_name> | <expression>} [ASC | DESC],…)
[<other_index_clauses>];
This statement creates a unique index IDX_CUST_NAME on
the CUST_NAME_S column of the CUSTOMER table with column
values stored in descending order:
CREATE UNIQUE INDEX idx_cust_name ON customer (cust_name_s DESC)
CREATE VIEW statement
This section explains the CREATE VIEW statement for different
RDBMS implementations.
Here is the SQL99 syntax for a CREATE VIEW statement:
CREATE VIEW <view_name> [(<column_name>,…)]
AS <select_statement>
[WITH [CASCADED | LOCAL] CHECK OPTION]
Creating complex views
We already mentioned that you can create a view based on practically
any SELECT statement (with some insignificant limitations). The
SELECT statement itself is one of the most difficult SQL topics and will
be covered in detail in later chapters. The examples below illustrate
the main concepts used when creating a complex view.
Join view with GROUP BY clause and aggregate function
V_CUSTOMER_TOTALS displays the total calculated order price
grouped by the CUSTOMER_NAME and then by ORDER_NUMBER
fields:
CREATE VIEW v_customer_totals
    ( customer_name, order_number, total_price ) AS
SELECT customer.cust_name_s,
       order_header.ordhdr_nbr_s,
       SUM(product.prod_price_n * order_line.ordline_ordqty_n)
  FROM customer, order_header, order_line, product
 WHERE customer.cust_id_n = order_header.ordhdr_custid_fn
   AND order_header.ordhdr_id_n = order_line.ordline_ordhdrid_fn
   AND product.prod_id_n = order_line.ordline_prodid_fn
   AND order_line.ordline_ordqty_n IS NOT NULL
 GROUP BY customer.cust_name_s, order_header.ordhdr_nbr_s
View based on another view example
The V_CUSTOMER_TOTALS_OVER_15000 view displays the same
data as its underlying view V_CUSTOMER_TOTALS but only for
orders with a total price over $15,000:
CREATE VIEW v_customer_totals_over_15000 AS
SELECT *
  FROM v_customer_totals
 WHERE total_price > 15000
View with UNION example
The V_CONTACT_LIST view displays the combined list of customers
and salesmen with their phone numbers and contact types (customer
or salesman):
CREATE VIEW v_contact_list
    ( name, phone_number, contact_type ) AS
SELECT cust_name_s, phone_phonenum_s, 'customer'
  FROM customer, phone
 WHERE cust_id_n = phone_custid_fn
   AND phone_type_s = 'PHONE'
UNION
SELECT salesman_name_s, phone_phonenum_s, 'salesperson'
  FROM salesman, phone
 WHERE salesman_id_n = phone_salesmanid_fn
   AND phone_type_s = 'PHONE'
View with subquery
V_WILE_BESS_ORDERS displays orders for customer WILE BESS
COMPANY:
CREATE VIEW v_wile_bess_orders
    ( order_number, order_date ) AS
SELECT ordhdr_nbr_s,
       ordhdr_orderdate_d
  FROM order_header
 WHERE ordhdr_custid_fn IN
       ( SELECT cust_id_n
           FROM customer
          WHERE cust_name_s = 'WILE BESS COMPANY' )
Schemas
A schema is a logical database object holder. SQL99 defines a
schema as a named group of related objects. Creating schemas can
be useful when objects have circular references, that is, when we
need to create two tables each with a foreign key referencing the
other table. Different implementations treat schemas in slightly
different ways.
CREATE SCHEMA statement
SQL99 states that the CREATE SCHEMA statement creates a group
of objects that somehow logically relate to each other; the group has a
name that is called schema name. Also, you can grant privileges on
these objects within the CREATE SCHEMA statement. The syntax is
CREATE SCHEMA {<schema_name> |
               AUTHORIZATION <authorization_id> |
               <schema_name> AUTHORIZATION <authorization_id>}
    [<schema_element>,…]
    [<privilege_definition>,…]
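For illustration only, a single CREATE SCHEMA statement can create
the schema, an object it owns, and a privilege on that object; the
schema, table, and user names below are invented:
CREATE SCHEMA inventory AUTHORIZATION acme_admin
    CREATE TABLE warehouse (
        warehouse_id_n   INTEGER NOT NULL,
        warehouse_name_s VARCHAR(50)
    )
    GRANT SELECT ON warehouse TO PUBLIC;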
ALTER TABLE statement
The SQL99 ALTER TABLE syntax is:
ALTER TABLE <table_name>
{[ADD [COLUMN] <column_definition>] |
 [ALTER [COLUMN] <column_name> {SET DEFAULT <default_value> | DROP DEFAULT}] |
 [DROP [COLUMN] <column_name> RESTRICT | CASCADE] |
 [ADD <table_constraint>] |
 [DROP CONSTRAINT <constraint_name> RESTRICT | CASCADE]};
Basically, this code defines the syntax for adding or deleting a table
column or a column's default value, as well as for adding or dropping a
table constraint.
Adding columns
Often you will need to add a column to an existing table. For example,
our ACME database can easily handle situations when a customer
has more than one phone number. But now imagine we have a new
set of business rules saying that if a customer or a salesman has
more than one phone, we need a way to know which number is
primary and what kind of phone it is (business, mobile, home, etc.).
So, we want to add two new columns to PHONE:
PHONE_PRIMARY_S with a default value of Y and
PHONE_CATEGORY_S. We also want to specify the range of valid
values for PHONE_PRIMARY_S (Y and N). The task is quite simple to
implement:
ALTER TABLE phone ADD (
    phone_primary_s  CHAR(1) DEFAULT 'Y'
        CHECK (phone_primary_s IN ('Y', 'N')),
    phone_category_s CHAR(15)
);
As you can see from this example, Oracle allows you to use one
ALTER TABLE statement to add multiple columns at the same time.
You can drop multiple columns within one ALTER TABLE statement
as well.
Modifying existing columns
You can modify columns in a couple of different ways. You can
change the column’s data type; increase or decrease the size of a
CHARACTER or a RAW column (or precision of a NUMERIC column).
You can also specify a new default for an existing column and/or add
a NOT NULL constraint to it.
As usual, certain restrictions apply. You can decrease the size or
precision of a column only if it’s empty, that is, if it contains all NULLs.
(You can increase the size and precision with no limitations, though.)
You also can't add a NOT NULL constraint to a column that contains
NULLs. (Again, you can do it the other way around.) You can change
a column’s data type to a compatible data type only (for example,
CHAR can be changed to VARCHAR) if the column is not empty; no
limitations exist for empty columns.
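For example, Oracle-style sketches of such changes to the PHONE
columns added earlier might look like this; the MODIFY keyword and
exact syntax vary by vendor, and the new size and default are
arbitrary:
ALTER TABLE phone MODIFY (phone_category_s VARCHAR2(20));
ALTER TABLE phone MODIFY (phone_primary_s DEFAULT 'N');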
Removing table columns
Now imagine the business rules have changed again, and the
columns you added and modified in previous examples are no longer
needed. The following statement removes the PHONE_PRIMARY_S and
PHONE_CATEGORY_S columns from the PHONE table:
ALTER TABLE phone DROP (phone_primary_s, phone_category_s);
Modifying constraints
The MODIFY CONSTRAINT clause enables you to change an existing
column or table constraint's status from INITIALLY IMMEDIATE to
INITIALLY DEFERRED (and back), assuming the constraint was created
with the DEFERRABLE keyword. Building on the Chapter 4 examples, it
would probably make sense to issue the following command after the
data load is complete:
ALTER TABLE salesman MODIFY CONSTRAINT chk_salesstatus
INITIALLY IMMEDIATE;
The constraint's behavior is now indistinguishable from that of a
nondeferrable constraint; that is, the constraint is checked after each
DML statement. When you want to perform another data load, the
following statement changes the constraint’s behavior again: ALTER
TABLE SALESMAN MODIFY CONSTRAINT chk_salesstatus
INITIALLY DEFERRED;
Creating new constraints
As with new columns, you might need to create new constraints on
the existing table. The situations are usually similar to those when you
might need new table columns — for example, when implementing
new business rules or amending bad initial designs. For example, you
might have realized the salesman’s code in the SALESMAN table
must be unique. The following statement accomplishes the task:
ALTER TABLE salesman ADD CONSTRAINT uk_salesmancode
UNIQUE (salesman_code_s);
And here are examples of implementing referential integrity
constraints on tables created in Chapter 4: ALTER TABLE customer
ADD CONSTRAINT fk_cust_payterms FOREIGN KEY
(cust_paytermsid_fn) REFERENCES payment_terms
(payterms_id_n);
ALTER TABLE customer ADD CONSTRAINT fk_cust_salesman
FOREIGN KEY (cust_salesmanid_fn)
REFERENCES salesman (salesman_id_n);
Removing constraints
ALTER TABLE enables you to remove column or table constraints.
For example, to remove the unique constraint you just created, use
ALTER TABLE SALESMAN DROP CONSTRAINT uk_salesmancode;
Disabling and enabling constraints
In some situations you might want to temporarily disable a constraint,
for example during a data load, and enable it again afterwards:
ALTER TABLE salesman DISABLE CONSTRAINT chk_salesstatus;
-- some data load activity here
ALTER TABLE salesman ENABLE CONSTRAINT chk_salesstatus;
Renaming a table
You can change the name of a table using a RENAME TO clause. For
example, if you (or your boss) decided that the name SALESMAN is not
politically correct, you can change it to SALESPERSON using this
command:
ALTER TABLE salesman RENAME TO salesperson;
Please keep in mind that any programs (including your PL/SQL
procedures, functions, packages, and triggers) that refer to the renamed
table will no longer be valid after this change. You would have to find
and change all references to the SALESMAN table in your code and
recompile the programs.
Suppose you've found out that's too much of an effort and decided to
change the name back. No problem:
ALTER TABLE salesperson RENAME TO salesman;
DROP TABLE statement
DROP TABLE [<qualifier>.]<table_name> [CASCADE CONSTRAINTS];
ALTER INDEX statement
As usual, we are not going to concentrate on the details of the ALTER
INDEX statement; we refer you to the Oracle documentation for more
information and illustrate only a couple of clauses that are of interest
in the course of this book:
ALTER INDEX [<qualifier>.]<index_name>
{[RENAME TO <new_index_name>] |
 [REBUILD TABLESPACE <tablespace_name>]};
Renaming indexes
The RENAME clause can be useful if your indexes were created with
system-generated names (that is, if you did not name them explicitly
on object creation). For instance, suppose you created table SALESMAN
with the following statement:
CREATE TABLE salesman (
    salesman_id_n     NUMBER PRIMARY KEY,
    salesman_code_s   VARCHAR2(2) UNIQUE,
    salesman_name_s   VARCHAR2(50) NOT NULL,
    salesman_status_s CHAR(1) DEFAULT 'Y',
    CONSTRAINT chk_salesstatus CHECK (salesman_status_s IN ('N', 'Y'))
);
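With that table in place, Oracle generates names such as SYS_C002345
for the indexes that support the PRIMARY KEY and UNIQUE constraints.
A sketch of renaming one of them; the system-generated name below is
made up:
ALTER INDEX sys_c002345 RENAME TO idx_salesman_id;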
Rebuilding indexes into a different tablespace
Another frequent mistake when creating Oracle indexes is to create
them in the wrong tablespace. This happens even more often than with
tables because Oracle does not provide a default index tablespace
option. Specifically, even if a user has an assigned default data
tablespace (DATA01, for example), all physical objects (including
indexes) are created in that tablespace by default unless another
tablespace is specified explicitly. That is not always the desirable
behavior, especially in a production environment where data and index
tablespaces are usually located on separate physical devices (hard
disks); creating indexes in the wrong tablespace can significantly
degrade performance.
The ALTER INDEX command can be used to fix the problem:
ALTER INDEX IDX_SALESMAN_ID
REBUILD TABLESPACE INDEX01;
DROP INDEX statement
Similar to the DROP TABLE statement, DROP INDEX releases the
allocated space and removes the index definition from the database
information schema. You cannot drop indexes created to implement
PRIMARY KEY or UNIQUE constraints using DROP INDEX; the ALTER
TABLE … DROP CONSTRAINT statement would have to be used
instead.
The syntax is
DROP INDEX [<qualifier>.]<index_name>;
ALTER VIEW statement
The two main uses of ALTER VIEW in Oracle are either to recompile
a view invalidated because of underlying table(s) DDL changes or to
add, modify, or drop view constraints. The syntax is
ALTER VIEW [<qualifier>.]<view_name>
{[COMPILE] |
 [ADD CONSTRAINT <constraint_definition>] |
 [MODIFY CONSTRAINT <constraint_name>] |
 [DROP {CONSTRAINT <constraint_name> | PRIMARY KEY}]};
DROP VIEW statement
The DROP VIEW statement removes the view definition from the system
catalog. The dependent objects become invalid.
The syntax is
DROP VIEW view_name [CASCADE CONSTRAINTS];
For example
DROP VIEW v_phone_number;
The optional CASCADE CONSTRAINTS clause removes all referential
integrity constraints that refer to primary and unique keys in the view
to be dropped. See Oracle’s DROP TABLE statement described
previously.
Aggregations (ORDER BY, GROUP BY, HAVING, SUM, COUNT, AVG, etc)
Nested subqueries
A subquery can itself host another subquery; these are called nested
subqueries. There is no theoretical limit on the nesting level — i.e., on
how many times there can be a query within a query — though
some vendors limit it. A subquery is an expensive way (in terms of
computer resources) to find information and should be used
judiciously.
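For instance, a nested subquery against the ACME tables might look
like the following sketch (two levels of nesting; the quantity threshold
is arbitrary):
SELECT cust_name_s
  FROM customer
 WHERE cust_id_n IN
       (SELECT ordhdr_custid_fn
          FROM order_header
         WHERE ordhdr_id_n IN
               (SELECT ordline_ordhdrid_fn
                  FROM order_line
                 WHERE ordline_ordqty_n > 750));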
GROUP BY and HAVING Clauses:
Summarizing Results
Grouping records in the resultset based on some criteria could
provide a valuable insight into data that has accumulated in the table.
For example, you would like to see the final resultset of your orders
(where there could be one or more order items per order) not in the
random order they were entered in, but rather in groups of items that
belong to the same order:
SELECT ordline_ordhdrid_fn,
ordline_ordqty_n AS QTY_PER_ITEM
FROM order_line
GROUP BY ordline_ordhdrid_fn,
ordline_ordqty_n;
Note the repeating values (groups) in the ORDLINE_ORDHDRID_FN
field, representing the order header ID for which there could be one or
more order items and for which there are different quantities. This
information becomes more concise with the use of aggregate
functions that can sum the quantities for the order or calculate the
average, and so on (see the example with the SUM function later in
this section).
The GROUP BY clause is mostly (but not always) used in conjunction
with aggregate functions, which are introduced in Chapter 10. The
aggregate functions return a single value as a result of an operation
conducted on a set of values. The set is grouped to provide a series
of sets for use with the aggregate functions.
The HAVING clause used exclusively with the GROUP BY clause
provides a means of additional selectivity. Imagine that you need to
select not all records in your GROUP BY query but only those whose
grouped value is greater than 750. Adding an additional criterion to the
WHERE clause would not help, as the value by which we could limit
the records is calculated using GROUP BY and is unavailable outside
it before the query has completed execution. The
HAVING clause used within the GROUP BY clause allows us to add
this additional criterion to the results of the GROUP BY operation. For
example, to display orders with a total quantity greater than 750, the
following query could be used:
SELECT ordline_ordhdrid_fn,
SUM(ordline_ordqty_n) TOT_QTY_PER_ORDER
FROM
order_line
GROUP BY ordline_ordhdrid_fn
HAVING SUM(ordline_ordqty_n) > 750
ORDER BY Clause: Sorting Query Output
By default, a query returns the results matching the criteria unsorted —
i.e., in the order they were found in the table. To produce sorted output —
alphabetically or numerically — you would use the ORDER BY clause.
The functionality of this clause is identical across all "big three"
databases.
The following query sorts the output by the customer name
alphabetically in ascending order:
SQL> SELECT cust_name_s,
cust_alias_s,
cust_status_s
FROM
customer
ORDER BY cust_name_s;
The results can be sorted in either ascending or descending order.
To sort in descending order, you must specify the keyword DESC after
the column name; to sort in ascending order you may use the ASC
keyword (or omit it altogether, as is done in the query above, since
ascending is the default sort order).
It is possible to sort by more than one column at the same time,
though results might not be as obvious. The precedence of the
columns in the ORDER BY clause is of importance here: First results
are sorted by the values of the first column, then — within the order
established by the first column — the results will be sorted by the
second column values.
It is even possible to specify different orders — ascending or
descending for the different columns listed in the ORDER BY clause.
The following example orders the records from the PRODUCT table
first by the price in ascending order, then by the net weight — in
descending order for each price value:
SELECT prod_id_n,
prod_price_n,
prod_netwght_n
FROM product
ORDER BY prod_price_n ASC,
prod_netwght_n DESC
Combining the results of multiple queries (union, except, intersect, minus, subqueries)
Combining the Results of Multiple Queries
It is possible to produce a single result combining the results of two or
more queries. The combined resultset might be a simple aggregation
of all records from the queries; or some operation related to the theory
of sets (see Appendix L) could be performed before the final resultset
was returned.
The SQL99 standard supports UNION, INTERSECT, and EXCEPT
clauses that could be used to combine the results of two or more
queries.
UNION
The following query returns all the records containing some
information about customers that do not yet have an assigned
salesman:
SELECT phone_custid_fn OWNER_ID,
       'CUSTOMER PHONE' PHONE_TYPE,
       phone_phonenum_s
  FROM phone
 WHERE phone_type_s = 'PHONE'
   AND phone_salesmanid_fn IS NULL
This query returns a total of 37 records. Now, assume that you also
would like to include in the resultset the list of salesmen’s phones who
do not have a customer assigned to them yet. Here is the query to
find these salesmen; it returns six records:
SELECT phone_salesmanid_fn,
       'SALESMAN PHONE',
       phone_phonenum_s
  FROM phone
 WHERE phone_type_s = 'PHONE'
   AND phone_custid_fn IS NULL
To combine these records into a single resultset, you would use the
UNION operator:
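The sketch below simply composes the two queries shown above; note
that the ORDER BY at the end applies to the combined resultset:
SELECT phone_custid_fn OWNER_ID,
       'CUSTOMER PHONE' PHONE_TYPE,
       phone_phonenum_s
  FROM phone
 WHERE phone_type_s = 'PHONE'
   AND phone_salesmanid_fn IS NULL
UNION
SELECT phone_salesmanid_fn,
       'SALESMAN PHONE',
       phone_phonenum_s
  FROM phone
 WHERE phone_type_s = 'PHONE'
   AND phone_custid_fn IS NULL
ORDER BY 1;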
Now you have a full list that includes all records from the query about
customers, combined with the results brought by the query about
salesmen. You may visualize this as two resultsets glued together. All
queries in an SQL statement containing a UNION operator must have
an equal number of expressions in their lists. In addition, these
expressions (which could be columns, literals, results of functions,
etc.) must be of compatible data types: For example, if the expression
evaluates to a character string in one query, it cannot be a number in
the second query that is joined to the first by the UNION operator.
The results of UNION could be ordered (as we can see in the UNION
query above) but the ORDER BY clause could be used only with the
final resultset — that is, it can refer to the result of the UNION, not to
particular queries used in it.
If the queries potentially could bring duplicate records, you may want
to filter the duplicates, or, conversely, make sure that they all are
present. By default, the UNION operator excludes duplicate records;
specifying UNION ALL makes sure that your final resultset has all the
records returned by all the queries participating in the UNION.
INTERSECT
The INTERSECT operator is used to evaluate results returned by two
queries but includes only the records produced by the first query that
have matching ones in the second. This operator is implemented in
Oracle and IBM DB2 UDB but not in Microsoft SQL Server 2000, which
uses the EXISTS operator for this purpose.
Consider the query that selects customer IDs (field CUST_ID_N) from
the CUSTOMER table of the ACME database and intersects them with
results returned by a second query, producing a resultset of
customer’s IDs who placed an order:
SELECT cust_id_n
FROM customer
INTERSECT
SELECT ordhdr_custid_fn
FROM order_header
EXCEPT (MINUS)
When combining the results of two or more queries into a single
resultset, you may want to exclude some records from the first query
based on what was returned by the second. The EXCEPT keyword is
implemented in IBM DB2 UDB only, whereas Oracle uses the MINUS
keyword and Microsoft SQL Server 2000 uses EXISTS for the same
purpose.
The EXCEPT (MINUS) result contains all the records returned by the
first query minus those returned by the second:
SELECT cust_id_n
FROM customer
MINUS
SELECT ordhdr_custid_fn
FROM order_header;
Sessions, transactions, locks
In an ideal world, a database is accessed by one and only one user,
changes are made and saved in a proper order, and malicious
intrusion is an unheard of concept. Unfortunately, it doesn’t happen
like that in the real world. Frequently data must be shared; it might
come from different sources, sometimes at unusual times; and rarely,
if ever, would a database server be accessed by one user at a time.
Relational databases were designed to work in a multiuser
environment. When more than one user accesses the same set of
data, a completely different set of problems appears: What data
should be visible to the users? Which modification should take
precedence? What is the guarantee that the data changes will not be
lost during execution of a lengthy database procedure? The answer to
these and many other problems comes in terms of sessions,
transactions, and locks.
The transaction is a solution to potential data consistency problems
(discussed in detail later in the chapter), while locks deal with data
concurrency problems. The session represents the context in which
— along with some other things — transactions and locks take place.
Sessions
Whatever happens in terms of communication between an RDBMS
server and a user accessing it happens in the context of a session. In
a multiuser environment, one of the primary concerns is data integrity.
When a client application establishes a connection to an RDBMS
server, it is said that it opens a session. The session becomes this
application’s private communication channel. The user of the
application may change some preferences within the session (for
example, default language or default date format); these settings
would affect only the session environment and remain valid only for
the duration of the session. The details of the implementation and
default behavior of the sessions might differ among the RDBMS, but
these basic principles always remain the same.
By now, you ought to be acquainted with at least one of the tools
provided by Oracle, IBM, or Microsoft to access their respective
databases. Each RDBMS package is a resource intensive piece of
software, and in general it is recommended not to install all of them
onto the same machine. Once you’ve installed your RDBMS of
choice, you could run multiple instances of Oracle’s SQL Plus to
access Oracle 9i RDBMS, Microsoft’s OSQL (if you’ve selected MS
SQL Server 2000), or IBM’s Command Line Processor for IBM DB2
UDB from the same computer where your RDBMS is installed, and
each instance will open its own session, which would be isolated from
every other session established to the RDBMS server.
The SQL standard specifies a number of parameters that could be set
in a session (listed in Table 7-1). None of these is implemented
directly by the RDBMS, though some elements made it into
proprietary syntax, preserving the spirit if not the letter of the standard.
SET CONNECTION: If more than one connection is opened by a user
to an RDBMS, this statement allows that user to switch between the
connections.
SET CATALOG: Defines the default catalog for the session.
SET CONSTRAINTS MODE: Changes the constraints mode between
DEFERRED and IMMEDIATE.
SET DESCRIPTOR: Stores values in the descriptor area.
SET NAMES: Defines the default character set for the SQL statements.
SET SCHEMA: Sets the schema to be used as a default qualifier for all
unqualified objects.
SET SESSION AUTHORIZATION: Sets the authorization ID for the
session; no other IDs can be used.
SET TIME ZONE: Sets the default time zone for the session.
The changes made with an ALTER SESSION statement are valid for
the duration of the session. To make changes permanent, the ALTER
SYSTEM statement should be used.
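For example, an Oracle-style session-level change of the default date
format might look like this; the format mask is just an illustration:
ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD';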
Transactions
A transaction is one of the mechanisms provided within SQL to
enforce database integrity and maintain data consistency. The details
of implementation differ among the RDBMS vendors, though the
SQL92/99 spirit is generally preserved.
What is a transaction?
A transaction complements the concept of the session with additional
granularity — it divides every operation that occurs within the session
into logical units of work. In this way, database operations — those
involving data and structure modifications — are performed step-by-
step and can be rolled back at any time, or committed if every step is
successful. The idea of the transaction is to provide a mechanism for
ensuring that a multistep operation is performed as a single unit. If
any of the steps involved in a transaction fails, the whole transaction
is rolled back. If all steps have been completed successfully, the
transaction can be either committed (to save all the changes into a
database) or rolled back to undo all the changes.
The SQL standard defined transactions from the very beginning and
enhanced the concept during subsequent iterations. According to the
standard, a transaction is started automatically by RDBMS and
continues until COMMIT or ROLLBACK statements are issued; the
details were left for the vendors to implement.
A transaction must pass the ACID test:
- Atomicity. Either all the changes are made or none.
- Consistency. All the data involved in an operation must be
left in a consistent state upon completion or rollback of the
transaction; database integrity cannot be compromised.
- Isolation. One transaction should not be aware of the
modifications made to the data by any other transaction
unless that transaction was committed to the database. Different
isolation levels can be set to modify this default behavior.
- Durability. The results of a transaction that has been
successfully committed to the database remain there.
Transactions COMMIT and ROLLBACK
The COMMIT statement ends the current transaction and makes all
changes made to the data during transaction permanent. The syntax
is virtually identical for all three RDBMS vendors, as well as for the
SQL99 standard, and is very straightforward: COMMIT [WORK]
The keyword WORK is not required, though it might be added for
clarity; a simple COMMIT is usually all that is required.
An extended form with optional vendor clauses looks like this:
COMMIT [WORK]
[COMMENT <comment_text>]
[FORCE <transaction_id> [, <system_change_number>]];
When changes made to the data in the database need to be
"undone," ROLLBACK should be used. It may be issued any time
before the transaction is committed and results in the rollback of all
changes made since the transaction started.
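A minimal sketch of the mechanism, using the ACME PRODUCT table
(the product ID and price change are invented):
UPDATE product
   SET prod_price_n = prod_price_n * 1.05
 WHERE prod_id_n = 990;
-- if the change is correct, make it permanent
COMMIT;
-- otherwise, undo everything since the transaction started
-- ROLLBACK;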
Transaction isolation levels
There are different transaction isolation levels. Isolation levels refer to the
ability of the transaction to see the world (data) outside its own scope,
i.e., data modified by any other transaction. The SQL99 standard
isolation levels are listed in Table 7-5.
READ UNCOMMITTED: The lowest of all isolation levels, permitting
dirty reads (i.e., able to see uncommitted data). No locks are issued,
none honored.
READ COMMITTED: Specifies that shared locks are held while data is
being read. No dirty reads (containing uncommitted data) are
permitted, though phantom reads (when the row number changes
between reads) may occur.
REPEATABLE READ: No changes are allowed to the data selected by
a query (it is locked for updates, deletes, etc.), but phantom rows may
appear.
SERIALIZABLE: The highest level of transaction isolation; places a
lock on the whole dataset; no modifications from outside are allowed
until the end of the transaction.
The isolation level, along with other transaction characteristics, is set
with the SET TRANSACTION statement:
SET TRANSACTION {[READ ONLY | READ WRITE] |
[ISOLATION LEVEL {SERIALIZABLE | READ COMMITTED}] |
[USE ROLLBACK SEGMENT <rollback_segment>]}
[NAME <transaction_name>];
Understanding Locks
Concurrency is one of the major concerns in a multiuser environment.
When multiple sessions write or read data to and from shared
resources, a database might lose its integrity. To prevent this from
happening, every RDBMS worth its salt implements concurrency
control mechanisms. In the case of RDBMS servers, concurrency
is managed through various locking mechanisms. All three leading
RDBMS vendors have implemented sophisticated mechanisms for
concurrency management.
Oracle has probably the most evolved and complex locking scheme. It
follows the rule that reading and writing processes cannot block each
other, even if working on the same (or a close) set of data. Each
session receives a read-consistent image of the data. Thus, even if
some other process has begun modifying data in the set but did not
commit the changes, every subsequent session will be able to read
the data just as it was before; once the changes are committed in the
first session, every other session is able to see it. The locks are
acquired only when the changes are being committed to the
database. Oracle automatically selects the least restrictive lock. A user
can choose to lock a resource (a table, for example) manually. In this
case, other users still might be able to access the data, depending on
the type of lock deployed.
IBM DB2 UDB and Microsoft SQL Server 2000 both employ locks that
can enable a reader to block a writer and vice versa. The problem of
concurrent access to the data is somewhat alleviated by the
granularity of the locking — table, page, row, and so on. There are
locks acquired by read-only queries, DDL statements, DML queries,
and so on. There are different lock types for each scenario, which
we’re going to discuss in more detail.
Most of the time, a user does not have to worry about locking, as the
RDBMS automatically selects the most appropriate lock (or locks) for a
particular operation; only if this programmed logic fails should you
attempt to specify locks manually, using SQL statements.
Locking modes
There are two broad categories of concurrency — optimistic and
pessimistic. The names are self-explanatory. Transactions with
optimistic concurrency work on the assumption that resource conflicts
— when more than one transaction works on the same set of data —
are unlikely (though possible). Optimistic transactions check for
potential conflicts when committing changes to a database and
conflicts are resolved by resubmitting data. Pessimistic transactions
expect conflicts from the very beginning and lock all resources they
intend to use. Usually an RDBMS employs both optimistic and pessimistic
transactions, and users can instruct their transactions to use either.
Note
Locking granularity has a significant effect on system
performance. Row-level locking increases concurrency (i.e.,
does not block other transactions from accessing a table)
but usually incurs overhead costs of administration. A full
table lock is much less expensive in terms of system
resources but comes at the price of concurrency. This is
something to keep in mind when designing database
applications.
Locks are used to implement pessimistic transactions, and each
RDBMS has its own levels of locking, though there are some
similarities. In general, there are either share locks or exclusive locks,
which refer to the way a resource (e.g., a table) is being used.
For example, the following statement locks table CUSTOMER of the
ACME database in exclusive mode:
LOCK TABLE customer IN
EXCLUSIVE MODE;
The transaction that issues this statement will attempt to lock the table
for its exclusive use, subject to the restrictions specified in Table 7-6.
If any other process holds a lock on the table, the transaction is put in
a queue, and the lock is granted in the order in which the requests were
received. The lock will be in place for the duration of the transaction (i.e., until
COMMIT is executed). A deadlock situation might occur (see next
paragraph) if the transaction that already holds a lock on the table
attempts to acquire a lock on a resource that the second transaction
has a lock on. The clause NOWAIT instructs a transaction to move on
if a table it tries to lock is already locked.
LOCK TABLE customer IN
EXCLUSIVE MODE NOWAIT;
If the lock command is issued for a view, Oracle will attempt to lock
the base tables for the view. Certain types of operations require
locking. Oracle will allow you to perform DDL operations on a table
only if that table can be locked.
Dealing with deadlocks
The classic deadlock situation arises when two (or more) sessions are
waiting to acquire a lock on a shared resource, and none of them can
proceed because a second session also has a lock on some other
resource that is required by the first session. Imagine a situation, in
which Session 1 holds resource A, while trying to access resource B;
at the same time Session 2 holds resource B while trying to access
resource A.
Usually the RDBMS resolves situations like this automatically by killing
one of the processes and rolling back all the changes it may have
made.
Oracle implements a sophisticated mechanism enforcing the rule
“reader and writer processes cannot block each other.” The idea
behind this rule is to present each process with a consistent image of
data without noncommitted changes. Nevertheless, deadlocks do
occur in Oracle and usually are resolved by the RDBMS itself; in
some rare cases, manual resolution — choosing the deadlock "victim"
process — is required. The most common deadlock types are ORA-
00060 (enqueue deadlocks) and ORA-04020 (library cache
deadlocks). It is possible to specify the NOWAIT clause or set up
session timeouts to avoid deadlocks; other techniques involve
explicit locking and the use of isolation levels within the transaction. A
deadlock may also be resolved manually through Oracle’s interfaces.
IBM DB2 runs a background process, called Deadlock Detector, to
find and resolve the deadlock situation. The session chosen as a
deadlock victim is rolled back, and a special error is generated
(SQLCODE -901, SQLSTATE 40001). The read-only process is a prime
candidate for the deadlock victim, and beyond that, DB2 employs
“least cost” criteria to select the session to be killed. If deadlocks ever
become a problem, IBM recommends using system monitoring tools
to collect information about the deadlock situations and either
optimize the system or redesign any applications involved.
Microsoft SQL Server 2000 employs a proprietary algorithm for
detecting deadlocks and resolves them in a way similar to that
implemented by Oracle or DB2 UDB: deadlocks are resolved
automatically or manually through the Enterprise Manager Console. It
is possible to volunteer a session to become a deadlock victim by
setting the DEADLOCK_PRIORITY parameter within that session (see
paragraph about sessions earlier in the chapter).
SET DEADLOCK_PRIORITY LOW
Another way of dealing with the situation would be setting
LOCK_TIMEOUT for the session. Setting the timeout means that the
session will wait for a locked resource no longer than the
specified interval. Once the waiting time expires, SQL Server
returns an error and the transaction is rolled back. The resolution of
the situation will be similar to that for every other RDBMS: handle the
situation in which an error indicating a deadlock situation is returned
(Error 1205 for SQL Server, SQLSTATE 40001) by re-running the
transaction, redesigning the application to decrease or eliminate the
deadlock possibility, and so on.
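For example, in Transact-SQL the timeout is specified in milliseconds;
the value below is arbitrary:
SET LOCK_TIMEOUT 10000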
Isolation levels understanding
The Bad Stuff
Before I can explain what’s different about the isolation levels we have to understand what it is that can go wrong with concurrent transactions (more than one client viewing or modifying the data at the same time).
Dirty Reads
A dirty read is when the current transaction reads a row written by another uncommitted transaction that’s currently in-flight.
Basically, if the database is not able to keep track of who is changing the data (by keeping multiple versions of the same row with different visibilities), then rows may be read even though they should not yet be visible to the other transaction.
Non-repeatable Reads
A non-repeatable read occurs when the current transaction reads the same data again but this time it is different, because another transaction has been committed during the life of the current transaction.
Basically, the database does not keep track of what the transaction has already seen, so each time the data is read (such as by multiple SELECT statements in the same transaction) the same visibility check is done on those rows, but some rows may have changed in the meantime.
Phantom Reads
A phantom read happens when the current transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction.
This can be thought of as a special type of non-repeatable read, but it is important to distinguish it as a different case because the possibility of it occurring depends on the isolation levels explained next.
The distinction is that the original rows are re-read correctly (even if they have been changed) but a new row (or rows) has been inserted into the range it previously selected. So it hasn't broken the re-read rule, but the data returned is still different.
The SQL Standard
The SQL Standard defines four isolation levels which are described by what type of errors they are allowed to permit. It does not specify how the implementations themselves work.
In most databases MVCC is used, sometimes with complex rules built on top to handle the higher isolation levels. However, there are other mechanisms; SQLite3 uses a separate journal, for example.
- Read uncommitted permits dirty reads, non-repeatable reads and phantom reads.
- Read committed permits non-repeatable reads and phantom reads.
- Repeatable read permits only phantom reads.
- Serializable does not permit any read errors.
Read Uncommitted
"Read uncommitted" is the loosest transaction isolation and is in fact not even implemented in some databases because it is too loose to be useful. A transaction in this mode is susceptible to all of the read errors mentioned above since it is not required to even check whether a row is committed or not.
If the database itself were not able to keep multiple versions of a row but you still wanted the virtual container of a transaction to be able to perform a ROLLBACK, this might be your only choice.
Read Committed
"Read committed" is the most common transaction isolation and usually the default, as it has the best balance between locking and performance. It will never read versions of rows that are currently uncommitted; however, it is still susceptible to other transactions changing data between statements.
Serializable
In a perfect world (at least for a database) we would like only one client to be connected to the database at any given time. This would guarantee there are no side effects between concurrent clients, since there are none.
This is "Serializable", the strictest mode, and it works exactly like this — clients perform transactions in serial.
In practice a database could not serve just one client at a time, so some fancy internal algorithms are used to make sure that clients are not overlapping in any way. This is tricky, but it means clients can work with the database concurrently yet have the same guarantees as if the transactions were performed one after another.
Depending on the database system, it may roll back a transaction at the time of the conflict or at the end of the transaction if it thinks the transaction isolation has been compromised. Applications that need this level of isolation should be built carefully so that they can always retry the transaction on failure.
Implementing stored procedures, user-defined functions, triggers
As was mentioned before, SQL is a nonprocedural programming
language by definition. That means it lacks procedural constructs,
specifically, the ability to perform operations in optionally named
hierarchical logical blocks that can accept and return values, perform
iterations, execute conditional statements, and so on.
Note
Most programming languages are procedural. A procedural
program is a list of step-by-step instructions that tell the
computer what to do; for example, repeatedly read user’s
input, multiply it by some predefined constant, and store the
result in a database table. A procedural program can also
have error handling — for example, if the value is
nonnumeric, do not try to multiply it but rather display an
error message; or multiply the input by different values
depending on how large the input was. Also, instead of
storing the result into a table a procedural program can pass
it to a different program (subprogram) that in its order might
perform more calculations and pass the results to yet
another programming module, and so on.
The first RDBMS implementations did not have procedural language
capabilities — all procedural database processing was done using
embedded programming. All major procedural languages that were
popular back then (C, COBOL, Pascal, etc.) had (and still have)
special extensions (precompilers) that allow the programmer to
embed SQL statements directly into programming language code. The
job of the precompiler is to translate SQL into appropriate language
constructs that can later be compiled into binary code.
However, as relational databases became increasingly
sophisticated and more internal control was delegated to the RDBMS,
the idea arose to store procedural programming modules inside the
RDBMS in compiled (binary) form. The problem was that ANSI/ISO
standards specified no (or almost no) guidelines for such elements. As
a result, each vendor implemented its own version of internal RDBMS
procedural modules.
SQL99 added persistent stored routines (PSR) and triggers to the SQL
standard when all major RDBMS vendors already had their own
procedural languages. Oracle's procedural SQL extension is called
PL/SQL; MS SQL Server uses Transact-SQL; and DB2 UDB
introduced its own version that does not seem to have an official
name, but is sometimes referred to as SQL PL or IBM SQL.
Even though the basic syntax elements of these three languages are
similar, the advanced features differ significantly. It is impossible to
cover all three syntaxes in depth in the scope of this book, so we are
going to explain the common basic features and encourage you to
refer to vendor-specific documentation for more details.
The two main forms of RDBMS procedural routines are stored
procedures and triggers, which embody two different procedural
programming approaches — linear and event-driven, respectively.
A user-defined function can be envisioned as a special case of a
stored procedure, and a module is just a number of stored procedures
and functions bundled together.
Procedural Extension Uses and Benefits
Stored procedures, user-defined functions, and triggers can be used
in many different ways and for many different reasons. The main
categories include performance improvement, network traffic
reduction, database security, and code reusability.
Performance and network traffic
Stored routines can be used to improve application performance.
Since they are compiled code stored inside the RDBMS, they generally
execute faster than uncompiled SQL statements (which normally have
to be compiled each time a user or program calls them).
Network traffic can also be significantly reduced because there is no
need to send SQL statements over the network; they are already
compiled and stored on the server. Each individual statement is probably not large
enough to improve the overall network performance, but, in a large
system with thousands of users and tons of SQL statements, it can
make a difference.
Database security
Stored procedures, functions, and triggers can be used for database
security purposes.
A stored procedure (or function) is a separate database object with its
own database privileges. That means you can design a stored
procedure in such way that it would, for example, update only certain
columns; or insert rows with NULL values for columns that a user who
executes the procedure has no permission to update — s/he would
only need the privilege to execute that particular stored procedure.
Triggers are even more useful for security implementation. For
example, they can be designed in such way that certain actions
performed by users on certain objects are written to special database
tables or OS files. These records can be reviewed later.
Code reusability
Another important thing about stored routines is code reusability —
once compiled, a stored procedure or user-defined function can be
used over and over again by multiple users (or applications), saving
time on retyping large SQL statements and reducing the probability of
human errors. Also, when a persistent module needs to be changed,
the change won’t affect the client programs that access it, as long as
all the calling parameters remain the same.
Stored procedures
As we mentioned before, stored procedures are linear or sequential
programs. The syntax varies from implementation to implementation,
but some common features can be emphasized. Stored procedures
can accept parameters and allow local variable declarations; they are
structured and allow the use of submodules; also, they allow repeated
and conditional statement execution.
CREATE PROCEDURE syntax
The CREATE PROCEDURE syntax is different among RDBMS
implementations. In its simplest, SQL99-style form the statement
looks like this:
CREATE PROCEDURE <procedure_name>
    ([<parameter_name> <data_type>,…])
    <procedure_body>
Creating a simple stored procedure
Our stored procedure implements the following business logic. New
products are often entered into the PRODUCT table of the ACME
sample database. The problem is the clerks sometimes enter
duplicate product names using different combinations of
uppercase and lowercase letters. For example, SPRUCE LUMBER
30X40X50, spruce lumber 30x40x50 and Spruce Lumber 30X40X50 are
supposed to be a single entry, but could be entered as three separate
entities with different primary keys, referred to by foreign keys from other
tables, causing data integrity problems.
The procedure SP_PRODUCTADD adds a row to the PRODUCT table
if the product with the given product description does not exist or
updates the existing record with new values. It accepts ten
parameters, one for each column in the PRODUCT table, then checks
whether a product with such a description (regardless of letter case)
already exists, and then performs the appropriate action (INSERT or
UPDATE) based on the result.
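The full procedure is vendor specific and too long to reproduce here;
the following is a shortened PL/SQL-style sketch of the idea with only
three of the ten parameters, and the description column name
PROD_DESCRIPTION_S is assumed rather than taken from the text:
CREATE OR REPLACE PROCEDURE sp_productadd
   ( p_prod_id    IN NUMBER,
     p_prod_descr IN VARCHAR2,
     p_prod_price IN NUMBER )
IS
   v_cnt NUMBER;
BEGIN
   -- check whether a product with this description already exists,
   -- ignoring the letter case
   SELECT COUNT(*) INTO v_cnt
     FROM product
    WHERE UPPER(prod_description_s) = UPPER(p_prod_descr);

   IF v_cnt = 0 THEN
      INSERT INTO product (prod_id_n, prod_description_s, prod_price_n)
      VALUES (p_prod_id, p_prod_descr, p_prod_price);
   ELSE
      UPDATE product
         SET prod_price_n = p_prod_price
       WHERE UPPER(prod_description_s) = UPPER(p_prod_descr);
   END IF;
END;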
Removing a stored procedure
The basic syntax to remove a stored procedure is identical for all
three databases: DROP PROCEDURE [<qualifier>.]<procedure_name>
User-Defined Functions
User-defined functions combine the advantages of stored procedures
with the capabilities of SQL predefined functions. They can accept
parameters, perform specific calculations based on data retrieved by
one or more SELECT statements, and return results directly to the
calling SQL statement.
CREATE FUNCTION syntax
The CREATE FUNCTION syntax is different for different
implementations. In its simplest, SQL99-style form the statement
looks like this:
CREATE FUNCTION <function_name>
    ([<parameter_name> <data_type>,…])
    RETURNS <data_type>
    <function_body>
Creating a simple function
Functions can be very useful in many situations. For example,
imagine you need to extract order totals for customers with state sales
tax added to the total price. The sales tax rate is different in different
states; for example, in California it is 7.25 percent at this writing. In
addition, in the ACME database the total price for orders is not stored
anywhere (normalization tradeoffs) but has to be calculated
dynamically by adding prices for all order items multiplied by item
quantities. The latter is done in the ACME view
V_CUSTOMER_TOTALS (see Chapter 4); user-defined function
UF_ORDERTAX takes two parameters, tax factor and order number,
and returns the total order amount multiplied by that factor.
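A PL/SQL-style sketch of such a function, relying on the
V_CUSTOMER_TOTALS view created earlier (the parameter names
are invented):
CREATE OR REPLACE FUNCTION uf_ordertax
   ( p_factor IN NUMBER,
     p_ordnum IN VARCHAR2 )
RETURN NUMBER
IS
   v_total NUMBER;
BEGIN
   -- total order amount multiplied by the supplied tax factor
   SELECT total_price * p_factor
     INTO v_total
     FROM v_customer_totals
    WHERE order_number = p_ordnum;
   RETURN v_total;
END;
Once created, the function could be called directly from a SELECT
statement, for example by passing 1.0725 as the factor to add the
7.25 percent California tax.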
Removing a user-defined function
The basic syntax to remove a user-defined function is identical for all
three databases:
DROP FUNCTION [<qualifier>.]<function_name>
Again, as in the case of stored procedures, Transact-SQL allows
you to drop multiple functions within a single DROP FUNCTION
statement.
Triggers
A trigger is a special type of stored procedure that fires off
automatically whenever a special event in the database occurs. For
example, a trigger can be invoked when a row is inserted into a
specified table or when certain table columns are being updated.
CREATE TRIGGER syntax
The syntax differs among RDBMS products, none of which complies
with the SQL99 standards.
SQL99 only mandates triggers that automatically execute on DML
events; here is the syntax to create a trigger:
CREATE TRIGGER <trigger_name>
{BEFORE | AFTER} {INSERT | UPDATE | DELETE}
ON <table_name>
[FOR EACH ROW]
<triggered_SQL_statement>
BEFORE and AFTER keywords specify whether the trigger fires off
before or after the DML event actually takes place. In other words, if a
particular trigger is designed to execute whenever a row is inserted
into table A and the BEFORE keyword is specified, the trigger will fire
off just before the RDBMS tries to perform the insert; if the trigger
produced an error, the insert will never take place. The AFTER
triggers execute upon the triggering DML statement completion; if an
error occurs, the transaction will be rolled back.
Triggers with the optional FOR EACH ROW clause are sometimes
referred to as row triggers. A row trigger executes once for each row
affected by a DML statement; for example, if an UPDATE statement
modifies column values in ten rows, the trigger will fire off ten times. If
the clause is omitted, the trigger executes only once no matter how
many rows are affected (a table trigger).
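For illustration, a PL/SQL-style sketch of a row trigger that forces a
single letter case on product descriptions before they are stored; the
trigger name and the PROD_DESCRIPTION_S column are assumed,
as in the stored procedure sketch earlier:
CREATE OR REPLACE TRIGGER trb_product_descr
BEFORE INSERT OR UPDATE ON product
FOR EACH ROW
BEGIN
   -- store the description in uppercase to avoid duplicate spellings
   :NEW.prod_description_s := UPPER(:NEW.prod_description_s);
END;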
Removing a trigger
This syntax removes an existing trigger definition from the system
catalog: DROP TRIGGER [<qualifier>.]<trigger_name>
The syntax is identical for all three databases. Transact-SQL allows
you to drop multiple triggers in the same way as when dropping
stored procedures, user-defined functions, or any other database
objects.
Cursors
A cursor is a special programming construct that allows you to create a
named working area and access its stored information. The main
advantage of cursors is the ability to work with individual rows one by
one rather than with the record set as a whole. For example, all DML
statements work with record sets, so you could change "all or
nothing". If your update statement is changing hundreds of thousands
of rows, a single row that violates a column constraint could cause the
whole statement to fail and roll back all the changes, resulting in
serious time losses. Fortunately, cursors are able to handle such
situations by working with each row individually. Logic that combines
cursors, loops, and conditional statements could generate a warning,
store the unsuccessful row's information in a special table, and
continue processing. Also, cursors give you flexibility on commits and
rollbacks (you can commit after each row, after ten rows, or after
every five hundred rows), which sometimes can be very useful to save
system memory space; you can employ conditional logic and perform
calculations on certain values before they are used in your DML
statements; using cursors, you are able to update multiple tables with
the same values, and much more.
Different RDBMS vendors implement cursors in different ways. Both
syntax and functionality vary, which makes it difficult to talk about
some generic cursor. SQL99 standards require a cursor to be
scrollable, that is, you should be able to move back and forth from one
record in the record set to another, but until recently only a few
RDBMS vendors (notably MS SQL Server) implemented such
functionality. The main reason is that a scrollable cursor is a huge
resource waste and not every system can afford it. It is a known fact
that many MS SQL Server developers are explicitly warned against
using cursors unless it is absolutely necessary, whereas for PL/SQL
programmers cursor use is an integral part of their everyday work.
In spite of all the differences, all cursor implementations have some
common features. In general, the main operations you can perform on
a cursor are DECLARE, OPEN, FETCH, and CLOSE.
DECLARE
DECLARE associates a cursor name with a certain SELECT statement
and defines a memory structure to hold the appropriate columns (which
could be of different data types). The simplified SQL99 syntax is
DECLARE <cursor_name> [SENSITIVE | INSENSITIVE] [SCROLL] CURSOR
FOR <select_statement>
[FOR {READ ONLY | UPDATE [OF <column_name>,…]}]
INSENSITIVE means that a temporary copy of the data based on the
<select_statement> is made, so the cursor fetches are not sensitive to
any DML changes of the underlying tables that may happen afterwards;
the default is SENSITIVE, which means every subsequent fetch uses
the actual data at that particular moment in time.
SCROLL specifies the cursor as scrollable; that is, additional FETCH
options (discussed later) are available.
READ ONLY prevents a cursor from changing data while UPDATE
specifies updatable columns within the cursor.
The same syntax is valid in Transact-SQL (even though it has many
additional advanced cursor features not mentioned in this book); the
Oracle PL/SQL cursor declaration is quite different:
CURSOR <cursor_name> [(<parameter> <data_type>,…)]
    [RETURN <return_type>]
IS <select_statement>;
As you can see, the keyword DECLARE is omitted; and the IS
keyword is used instead of FOR; also PL/SQL cursors are not
scrollable, but can optionally take parameters and return values.
DB2 cursors also do not allow scrolling; other than that the syntax is
similar to the SQL99 one:
DECLARE <cursor_name> CURSOR
FOR <select_statement>
OPEN
The OPEN statement executes the underlying query and identifies the
result set consisting of all rows that meet the conditions specified on
the cursor declaration. The basic syntax is consistent between all
three RDBMS:
OPEN <cursor_name>
FETCH
FETCH retrieves the current row (for nonscrollable cursors) or a
specific row (for scrollable cursors), parses the row, and puts the
column values into predefined set of variables. The Transact-SQL
syntax is
FETCH [NEXT | PRIOR | FIRST | LAST | ABSOLUTE n | RELATIVE n]
FROM <cursor_name>
INTO <variable_name>,…
The PL/SQL and DB2 syntax for nonscrollable cursors is
FETCH <cursor_name>
INTO <variable_name>,…
The cursor rows of nonscrollable cursors are usually fetched in a loop
until the last row is processed; the scrollable cursors can be used as a
part of a more complicated processing logic.
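For illustration, a PL/SQL-style sketch of a nonscrollable cursor
processed in a loop (the 10 percent price change is arbitrary):
DECLARE
   CURSOR c_products IS
      SELECT prod_id_n, prod_price_n FROM product;
   v_id    product.prod_id_n%TYPE;
   v_price product.prod_price_n%TYPE;
BEGIN
   OPEN c_products;
   LOOP
      FETCH c_products INTO v_id, v_price;
      EXIT WHEN c_products%NOTFOUND;
      -- process one row at a time, e.g. apply a price change
      UPDATE product
         SET prod_price_n = v_price * 1.1
       WHERE prod_id_n = v_id;
   END LOOP;
   CLOSE c_products;
END;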
CLOSE
The CLOSE statement deallocates memory, releases locks, and
makes the cursor’s result set undefined. The syntax is
CLOSE <cursor_name>
The cursor can be reopened with the OPEN statement.