Midterm Flashcards
Data
Human-made symbols invented for tracking and communication purposes
Metadata
Data descriptions specifying data content, relationships, and type; skeleton of a database.
Database
A collection of organized data (electronic or paper format). A set of records. Database is a part of a database system.
Database Management System (DBMS)
A software package for creating and managing databases and other elements of a database system.
DBMS Product (Database Product)
A certain brand of DBMS; DBMS produced by a software vendor (e.g., MS Access, Sybase, Informix).
Database System (DBMS Application, Application)
An implementation of a DBMS product that includes one or more databases, code, logic and user interface, and is designed to support organizational information needs. The data infrastructure of organizations upon which specific-purpose information systems are built. A DB system with an output designed according to particular business needs can be thought of as an information system.
Record
A logical unit of organizing data; a set of data describing an instance of a thing that a database system keeps a track of. In the object oriented methodology, a record usually tracks an object.
DBMS Components
Database engine (Data & Metadata management, Security Management), Query processor (Retrieval), Report writer (Input/Output; User Interface), Application generator (System Development tools).
CRUD
Create, Read, Update, Delete
File Programming Approach (also called Flat File Systems, File/File Processing Systems)
Data Files and program files (code) that work on data files, or data and code in the same file.
Hierarchical database
An older DBMS type that organizes data in hierarchies that can be rapidly searched from top to bottom, e.g., Customer – Order – OrderItem.
Network database
An older DBMS type that expanded the hierarchical database by supporting multiple connections between entities. A network database is characterized by the requirement that all connections had to be supported by an index.
Relational database
The most popular type of DBMS. All data is stored in tables (sometimes called relations). Tables are logically connected by the data they hold (e.g., through the key values). Relational databases should be designed through data normalization.
Object-Oriented Databases
The goal is to define
objects that can be reused in many programs—thus saving time and reducing errors.
Object-Relational DB Systems
- More frequent than pure object-oriented systems.
- Architecture:
- -Database is relational
- -Objects are created in main memory according to class diagram and business rules, and populated by data from the relational databases.
- -System operations, then, performed by objects.
Database Engine Components
data dictionary, concurrency and lock manager, recovery manager, disk space manager
Data-Program Dependence
File systems; tight coupling between data and code—programs working on data
Data-Program Independence
Database approach; decoupling of code, data, and metadata
Class Diagram
A static model of classes and their relationships (classes & relationships) and write business rules.
Entity
The aspect of reality recorded in the system. (Example: Customer) (DEFINITION: An item in the real world that we wish to identify and track.)
Class
Description of an entity, which includes its attributes (properties) and behavior (methods, procedures). (Example: Customer, includes attributes and behavior) (DEFINITION: A descriptor for a set of objects with similar structure, behavior, and relationships. That is, a class is the model description of the business entity. A business model might have an Employee class, where one specific employee is an object in that class.)
Object (Class Instance)
One instance of a class with specific data; a record in a database. (Example: 12257, Joe Jones… (a specific customer)) (DEFINITION: An instance or particular example of a class. For example, in an Employee class, one individual employee would be an object. In a relational environment, a class is stored as a table, while an individual row in the table contains data for one object.)
Property (Attribute)
a data property of a class; an aspects of an entity. (Example: CustID, Name…) (DEFINITION: A feature or characteristic of an entity. An attribute will become a column in a data table. Employee attributes might include name, address, date hired, and phone.)
Method (Procedure, Behavior)
Procedures (methods, processes, functions) performed by the class. (Example: AddCustomer, DeleteCustomer… CRUD). (DEFINITION: A function or operation that a class can perform. For example, a Customer class would generally have an AddNew method that is called whenever a new customer object is added to the database.)
Degree of Association
Association Degree refers to the number of classes involved in an association: 1) Class with itself - unary: rare. 2) Between 2 classes - binary (previous slide); most frequent and most important!) 3) Between 3 or more classes - N-ary; rare. (A Association is a connection between classes or between entities).
Association Multiplicity (Cardinality)
- One-to-many (1:M) (Example: Customer-Order)
- Many-to-many (M:N) (Example: Order-Item)
- One-to-one (1:1) (Example:Customer-BillingAddress)
(A Multiplicity refers to the number of objects participating in a relationship)
Minimum Multiplicity
1 or 1…1
Maximum Multiplicity
- or 0…*
N-ary Association
An association among three or more classes. It is drawn as a diamond on a UML class diagram. The term comes from extending English terms: unary means one, binary means two, ternary means four; so N-ary means many.
Multiplicity Question
One object on your side of the association can be associated with how many objects on the opposite side of the association? The switch side and ask the same question.
Or, how many objects on your side of the association can be associated with one object on the opposite side of the association? The switch side and ask the same question.
Generalization
A relationship among classes that begins with a generic class. More detailed classes are derived from it and inherit the properties and methods of the higher-level classes.
Aggregation
The generic name for several SQL functions that operate across the selected rows. Common examples include SUM, COUNT, and AVERAGE.
Composition
A relationship in which an object is composed of a collection of other objects. For example, a bicycle is built from components. In UML, it is indicated with a small filled diamond on the association end.
Inheritance
In object-oriented design, the ability to define new classes that are derived from higher-level classes. New classes inherit all prior properties and methods, so the programmer only needs to define new properties and methods.
Polymorphism
In a class hierarchy each new class inherits methods from the prior classes. Through polymorphism you can override those definitions and assign a new method (with the same name) to the new class.
Computed Attribute
Denotes computed values. Written with a slash (/). Common example in business DB system: Sub = Item’s Sale Price * Quantity Sold
Event
Something that arises during database or form operations. Events are named and developers can write code that is executed when a specific event is triggered. (Example: Change in organization: Item is sold) Look at Trigger.
Trigger
An event that causes a procedure to be executed. For example, clicking a button can be a trigger, as can a change in a data value. (Example: Change in organization: Decrease inventory count) Look at Event.
Data Domain (Type)
Format of data (e.g., text vs. number) and the range of values. (Look at Data Types (Domains) CH. 2 Part 2 Slide 8)
Data normalization
The process of putting data into the format of relational databases or organizing data into correctly designed tables.
Table
A collection of data for one class or entity. It consists of columns for each attribute and a row of data for each specific entity or object.
Relational Database
A collection of tables (relations). Tables store atomic data. The most popular type of DBMS. All data is stored in tables (sometimes called relations). Tables are logically connected by the data they hold (e.g., through the key values). Relational databases should be designed through data normalization.
Primary Key Column (Primary Key, Key, PK)
A column or set of columns that identify a particular row in a table.
Surrogate Key
Internally generated keys used to identify objects. They are often better keys than keys created by external sources because they are easier to guarantee uniqueness. See AutoNumber.
Foreign Key (FK)
A column in one table that is a primary key in a second table. It does not need to be a key in the first table. For example, in an Order table, CustomerID is a foreign key because it is a primary key in the Customer table.
First Normal Form (1 NF)
A table is in 1NF when there are no repeating groups within it. Each cell can contain only one value. For example, how may items can be placed in one Order table? The items repeat, so they must be split into a separate table.
Second Normal Form (2 NF)
A table is in 2NF if every nonkey column depends on the entire key (not just part of it). This issue arises only if there is a concatenated key (with multiple columns).
Third Normal Form (3 NF)
A table is in third normal form (3NF) if each nonkey column depends on the whole key and nothing but the key.
Insertion Anomaly
Problems that arise when you try to insert data into a table that is not in third normal form. For example, if you find yourself repeatedly entering the same data (e.g., a customer’s address), the table probably needs to be redefined.
Deletion Anomaly
Problems that arise when you delete data from a table that is not in third normal form. For example, if all customer data is stored with each order, when you delete an order, you could lose all associated customer data.
Non-key Column (Non-key)
Other Attributes that are not keys are called non-keys. A non-key depends on key
Concatenated (Combined, Composed) Key
A compound key is a key that consists of 2 or more attributes that uniquely identify an entity occurrence
Dependency (Dependence) among columns
An attribute depends on another attribute if the change of its value is caused by a change of the other attribute.
The key column must be sufficient for determining values of the non-key columns.
Shorthand notation for classes & tables
Customer (TABLE NAME) (CustomerID (PRIMARY KEY IS UNDERLINED), LastName, FirstName, Address, City, State, ZipPostalCode, TelephoneNumber(NON-KEY COLUMNS))
Rules of thumb for normalization:
Break every many-to-many relationship into two, with the goal of getting to one-to-many relationships. Make sure the relationship between PK and FK is 1:M.
If a table has repeating sections, there is huge redundancy, different classes are mixed together, and all anomalies occur. Split the table, so that classes are clearly differentiated.
1NF: A table is in 1NF if it does not have repeating sections.
If a table has a combined key, non-key columns may depend on just a part of the primary key, and so there is partial functional dependency. Split the table so that in new tables non-keys depend on the entire key.
2NF: A table is in 2NF if it is in 1NF and non-key columns depend on the entire key.
If a non-key depends on another non-key, there is transitive dependency. Split the table so that in new tables each non-key depends on the key and nothing but the key.
3NF: A table is in 3NF if it is in 2NF and all non-keys depend on the key only.
Pointer
A logical or physical address of piece of data. Address of data, designation of data location.
Sequential File
Sequential files are the simplest method of storing data. Each row is stored in a predefined order. As long as the data is retrieved in the order specified, access is fast and storage space is used efficiently. The real problems arise when data is added or when users need to retrieve data in several different sequences.
Linked List
To solve the insert problem, indexes are generally based on linked lists instead of sequential lists. A linked list is a technique that splits data even further than a sequential index. With a linked list, any index element can be stored separately. A pointer is then used to link to the next index item. In this example each row of data is stored separately. Then an index is created that is keyed on LastName. However, each element of the index is stored separately. An index element consists of three parts: the key value, a pointer to the associated data element, and a pointer to the next index element.
B+ Tree
It is clear that binary searches are efficient, but how does that help with linked lists on indexes. First, recognize that indexes are sorted, so it should be possible to use a similar approach. Second, think about the list for a few minutes, and you can see that it can be reorganized. Instead of trying to store it sequentially, grab the middle entry (the starting point for any search), and build a tree structure. In many ways, a tree is just a more complex way of storing a linked list. Instead of linear, it contains multiple links.
Hashing Method (Hash Table)
A hash table (also hash map) is a data structure used to implement an associative array, a structure that can map keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the correct value can be found.
Binary Search
A search technique for sorted data. Start at the middle of the data. If the search value is greater than the middle value, split the following data in half. Keep reducing by half until the value is found.
Data Clustering
Grouping related data together to improve retrieval.
- Data should be close to each other on one disk.
- Preferably within the same disk page or cylinder.
- Minimize disk reads and seeks.
- Example: cluster each invoice with the matching order.
File Partitioning
File partitioning may be useful for any of the following reasons:
- Data volumes are too large to be managed within a single file;
- Separate files need a consolidated view as well as an individual view;
- Data archiving is needed with an efficient, high-performance recall capability.
Indexed Sequential Access Method (ISAM Index)
A data storage method that relies on an index to search and retrieve data faster than a pure sequential search.
List of File Types and Access Methods
Indexed Sequential Access Method (ISAM) & Sequential File, Linked List Index, B+ Tree index & Hash Index.
List of File Types and Access Methods (Similarities & Differences)
B+-Tree is best overall, Hashing is good for high-speed random access, & sequential/ISAM is good if entire larger tables often used.
Operators
The forms of implementing constraints in queries; they can be relational operators (=, >, <, etc.), Boolean operators, and others (e.g., BETWEEN…AND, LIKE).
DeMorgan’s law
An algebraic law that states: To negate a condition that contains an AND or an OR connector, you negate each of the two clauses and switch the connector. An AND becomes an OR and vice versa.
Boolean Operator
Are simple words (AND, OR, NOT or AND NOT) used as conjunctions to combine or exclude keywords in a search, resulting in more focused and productive results. This should save time and effort by eliminating in appropriate hits that must be scanned before discarding.
Boolean Operator AND
Requires both terms to be in each item returned. If one term is contained in the document and the other is not, the item is not included in the resulting list. (Narrows the search) Example: A search on stock market AND trading includes results contains: stock market trading; trading on the stock market; and trading on the late afternoon stock market.
Boolean Operator NOT
(Dependent upon the coding of the database’s search engine) – the first term is searched, then any records containing the term after the operators are subtracted from the results. (Be careful with use, as the attempt to narrow the search may be too exclusive and eliminate good records). If you need to search the word not, that can usually be done by placing double quotes («_space;») around it. Example: A search on Mexico AND NOT city includes results contains: New Mexico; the nation of Mexico; US-Mexico trade; but does not return Mexico City or This city’s trade relationships with Mexico.
Boolean Operator OR
Either term (or both) will be in the returned document. (Broadens the search) Example: A search on ecology OR pollution includes results contains: documents containing the world ecology (but not pollution) and other documents containing the word pollution (but not ecology) as well as documents with ecology and pollution in either order or number of uses.
Query
To extract meaningful data from database, which help us answer business questions. Querying databases is where business value form a DB system is drawn.
Null value
A missing (or currently unassigned) value. A blank cell, missing data
Structured Query Language (SQL)
A standardized database language, used for data retrieval (queries), data definition, and data manipulation.
Aggregate functions
The generic name for several SQL functions that operate across the selected rows. Examples: Sum, Avg, Min, Max, Count, StDev, Var.
Truth table using AND, OR, and NOT
Check slide 13 in Chapter 4.
A(T) - B(T) - A AND B (T) - A OR B (T)
A(T) - B(F) - A AND B (F) - A OR B (T)
A(F) - B(T) - A AND B (F) - A OR B (T)
A(F) - B(F) - A AND B (F) - A OR B (F)
A(T) - B(-) - A AND B (NOT A) - A OR B (F)
A(F) - B(-) - A AND B (NOT A) - A OR B (T)
Structure of SQL Query run on one table (order of command lines)
SELECT, FROM, INNER JOIN, WHERE, GROUP BY, HAVING, ORDER BY.
SQL statement: Someone From Ireland Would Grow Horseradish and Onion.
SELECT, FROM, INNER JOIN, WHERE, GROUP BY, HAVING and ORDER BY.
Inner or Equi-Join
A SQL equality join condition. Rows from two tables are joined if the columns match exactly. Equi-join is the most common join condition. Rows that have no match in the other table are not displayed.
LEFT JOIN
An outer join that includes all of the rows from the “left” table, even if there are no matching rows in the “right” table. The missing values are indicated by Nulls. See right join and inner join. Left and right are defined by the order the tables are listed; left is first.
RIGHT JOIN
An outer join that includes all the rows from the “right” table, even if there are no matching rows in the “left” table. The missing values are indicated by Nulls. See right join and inner join. Left and right are defined by the order the tables are listed; left is first.
Nested Query or Subquery
Using a second query to retrieve additional data within the main query. For example, to retrieve all sales where price was greater than the average, the WHERE clause could use a subquery to compute the average price.
UNION
An SQL clause to combine rows from two SELECT statements. Both queries must have the same number of columns with the same domains. Most systems also support INTERSECT and EXCEPT (or SUBTRACT) operators.
IN clauses in SQL
A SQL WHERE clause operator typically used with subqueries. It returns a match if the selected item matches one of the items in the list. For example, WHERE ItemID IN (115, 235, 536) returns a match for any of the items specified. Typically, another SELECT statement is inserted in the parentheses.
NOT IN clauses in SQL
The SQL negation operator. Used in the WHERE clause to reverse the truth value of a statement. Which animals have not been sold? Start with list of all animals. Subtract out list of those that were sold.
SELECT Merchandise.ItemID, Merchandise.Description, Merchandise.ListPrice
FROM Merchandise
WHERE (((Merchandise.ListPrice)>(Select Avg(ListPrice) FROM Merchandise)))
ORDER BY Merchandise.ListPrice DESC;
The query will output item IDs, decriptions and list price for the merchandise items with a list price greater than the average list price of all merchandise items
SELECT Animal.Gender, Avg([SaleDate]-[OrderDate])
FROM Sale INNER JOIN (AnimalOrder INNER JOIN
((Animal INNER JOIN AnimalOrderItem ON
Animal.AnimalID=AnimalOrderItem.AnimalID) INNER JOIN
SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID) ON
AnimalOrder.OrderID = AnimalOrderItem.OrderID) ON
Sale.SaleID = SaleAnimal.SaleID
WHERE Animal.Category=”Cat”
GROUP BY Animal.Gender;
The query will output the gender and the average time between ordering and selling cats.
SELECT Sale.CustomerID, Customer.LastName, Customer.FirstName, Sum([SalePrice][Quantity]) AS [Monthly Total]
FROM (Customer INNER JOIN Sale ON Customer.CustomerID = Sale.CustomerID)
INNER JOIN SaleItem ON Sale.SaleID = SaleItem.SaleID
WHERE Month(Sale.SaleDate)=5
GROUP BY Sale.CustomerID, Customer.LastName, Customer.FirstName
ORDER BY Sum([SalePrice][Quantity]) Desc;
The query will output customers’ IDs, last names, first names, and their total expenditures in May and sort the output by the total expenditure in decreasing order.
SELECT DISTINCT Employee.EmployeeID, Employee.LastName,
Employee.FirstName
FROM Employee INNER JOIN Sale ON Employee.EmployeeID = Sale.EmployeeID
WHERE (((Sale.SaleDate) Between #6/15/2007# And #6/30/2007#));
The query will output IDs, last names and first names for those employees that sold something between June 15-30, 2007 (or in the last two weeks of June 2007).
SELECT Animal.AnimalID, Animal.Name, Animal.Category, [DateOrder]-[OrderBorn]
AS Days, AnimalOrder.OrderDate
FROM AnimalOrder INNER JOIN (Animal INNER JOIN AnimalOrderItem
ON Animal.AnimalID = AnimalOrderItem.AnimalID)
ON AnimalOrder.OrderID = AnimalOrderItem.OrderID
WHERE ( ( ([OrderDate]-[DateBorn]) Between 0 And 30) );
The query will output IDs, names, category and the period between the birth and order date for those animals that were orderd up to 30 days after thier birth (or that were less than a month 30 old when ordered).
SELECT DISTINCT Employee.LastName
FROM Employee INNER JOIN (Sale INNER JOIN SaleAnimal
ON Sale.SaleID = SaleAnimal.SaleID) ON Employee.EmployeeID = Sale.EmployeeID
WHERE (((Sale.SaleDate) Between #2/1/2007# And #2/28/2007#))
AND Employee.EmployeeID NOT IN
(SELECT Sale.EmployeeID
FROM Sale INNER JOIN SaleItem ON Sale.SaleID = SaleItem.SaleID
WHERE (((Sale.SaleDate) Between #2/1/2007# And #2/28/2007#)));
The query will output last names of the employees that sold animals but no merchandise in February 2007?