4 - 8 Flashcards
WHAT IS RELATIONAL ALGEBRA (RA)
Relational algebra is a low-level language that is being used to explain basic relational operations and principles. It uses a collection of operators to compose the queries. Every operator in the algebra accepts either one or two relation instances as arguments and output a resultant relation instance.
Thus operators can be composed easily to construct complex queries. Each relational algebra query describes a step-by-step procedure for computing the desired answer, based on the order in which the operators are applied in the query. A sequence of relational algebraic operations forms a relational algebraic expression. The result of the relational algebraic expression is also a relation.
By defining the WHAT and HOW data are required Relational Algebra is defined as a Procedural DML
RA OPERATORS (most common)
The relational operators have the property of closure; that is, the use of RA operators on existing relations (tables) produces new relations. Numerous operators have been defined. Some operators are fundamental, while others are convenient but can be derived using the fundamental operators.
Most common used operators are:
σ SELECT (or RESTRICT) - Sigma
π PROJECT - Pi
∪ UNION
∩ INTERSECTION
- DIFFERENCE
x PRODUCT (Cartesian Product)
⋈ ⟕ ⟖ JOINs - Theta
τ ORDER (SORT) - Tau
∪ UNION / ∩ INTERSECTION
Union Is a binary operation denoted by ∪ symbol. UNION is intended to bring together all of the facts from its arguments, however, the relational union operator is intentionally not as general as the union operator in mathematics. We cannot allow for an example that shows union of a binary and a ternary relation, because the result of such union is not a relation.
Intersection Is a binary operation denoted by ∩ symbol. Intersection is not considered a fundamental operation because it can be easily expressed using a pair of set difference operators.
Example:
π <a1, a2>(R) ⋃ π <a3, a4>(R)
π <student_name, course_id>(STUDENT) ⋃ π <student_name, course_id>(STUDENT)
π <a1, a2>(R) ∩ π <a3, a4>(R)
π <module_name>(MODULE) ∩ (π <module_name>(MODULE) ⨝ π <course_id>(STUDENT))</course_id></module_name></module_name>
DATA TYPES
Defines the kind of values that can be used or stored. Also, used in programming languages and database systems to determine the operations that can be applied to such data.
Data type choices can affect database performance
Poor data type choices can have a significant impact on the database design and performance
Right-sizing can result in tremendous storage savings, which can lead to faster database performance
Data Types in PostgreSQL
Numeric
Monetary
Character
Binary
Date/Time
Boolean
Enumerated
Geometric
Network Address
BIT String
Text Search
UUID
XML
JSON
Arrays
Composite
Range
Domain
Object Identifier
Pseudo
NUMERIC TYPES
Each parent category (e.g. Numeric) has several data types associated with the category and each one of the specific data type is suitable for a specific usage that could improve performance and save space.
Data type choices can affect database performance
Poor data type choices can have a significant impact on the database design and performance
Right-sizing can result in tremendous storage savings, which can lead to faster database performance
USING NULL VALUES
NULL values are problematic in the relational model. A NULL is the absence of any data value, and it is never allowed in any part of the primary key. As a general rule, nulls should be avoided as much as reasonably possible. An abundance of nulls is often a sign of a poor design. Also, nulls should be avoided in the database because their meaning is not always identifiable.
For example, a null could represent any of the following:
An unknown attribute value
A known, but missing, attribute value
A “not applicable” condition
QUERY PROCESSING & OPTIMISATION
Query processing requires that the DBMS identify and execute a strategy for retrieving the results of the query. The query determines what data is to be found, but does not define the method by which the data manager searches the database. Therefore Query Optimization is necessary to determine the optimal alternative to process a query.
Query Processing is a procedure of converting a query written in high level language into a correct and efficient execution plan expressed in low level language, which is used for data manipulation. In query processing, the first phase is transformation in which parser first checks the syntax of query and also checks the relations and attributes used in the query that are defined in the database.
Query processor is responsible for generating execution plan.
Execution Plan Query processing is a stepwise process. Before retrieving or updating data in database, a query goes through a series of query compilation steps. These steps are known as execution plan.
DB OPTIMISATION
One of the main functions of a database system is to provide timely answers to end users. End users interact with the DBMS through the use of queries to generate information, using the following sequence:
The end-user (client-end) application generates a query.
The query is sent to the DBMS (server end).
The DBMS (server end) executes the query.
The DBMS sends the resulting data set to the end-user (client-end) application.
INDEXES
An INDEX is a logical data structure that improves the speed of data retrieval operations on a database table by providing rapid random lookups and efficient access of ordered records.
Indexes are also used to define a primary-key or unique index which will guarantee that no other columns have the same values.
Not all indexes are created equal, each INDEX can have a different algorithm for different data types and different data types and queries.
Types of Indexes in PostgreSQL
B-Tree – most common for general purposes; It will be used by default unless is explicit required for another type.
BRIN – Block Range INDEX, designed specific for very large tables (cannot be used for PKs)
GiST – Generalized Search Tree, optimised for full text search, spatial data, scientific data, unstructured and hierarchical data
GIN – Generalized Inverted INDEX, ideal for full text search and binary json
SP-GiST - Space-Partitioning Trees Generalized Search Tree, can be used in the same situations as GiST but can be faster for certain kinds of data distribution
hash - were popular prior to the advent of GiST and GIN and they are still used in legacy systems and in specific cases can perform better than B-Tree
B-Tree-GiST/B-Tree-GIN - Hybrid or combined indexes than allows single column index with two indexes (they are as extension)
What is a VIEW?
A view is a virtual table whose contents are defined by a query. Like a table, a view consists of a set of named columns and rows of data. Unless indexed, a view does not exist as a stored set of data values in a database. The rows and columns of data come from tables referenced in the query defining the view and are produced dynamically when the view is referenced.
Source: Microsoft
In other words: A VIEW is nothing more than a query permanently stored in the database.
TYPES OF VIEWS
Standard Views
In PostgreSQL a VIEW can be created with CREATE OR REPLACE VIEW statement.
Materialized Views
They cache the result of a complex and expensive query and allow the refresh of those result periodically. The materialized views are useful in many cases that require fast data access therefore they are often used in data warehouses and business intelligence applications.
Recursive Views
In general recursive VIEWS (queries) come in handy when working with self-referential data or graph/tree-like data structures.
SUBQUERIEs
A Subquery or Inner query or a Nested query is a query within another SQL query and embedded within the WHERE clause.
A subquery is used to return data that will be used in the main query as a condition to further restrict the data to be retrieved and subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE statements along with the operators like =, <, >, >=, <=, IN, BETWEEN, etc.
A subquery is always enclosed within parentheses, and it is usually executed prior to the containing statement. Like any query, a subquery returns a result set that may consist of:
A single row with a single column
Multiple rows with a single column
Multiple rows having multiple columns
TRIGGERs
A trigger is procedural SQL code that is automatically invoked by the RDBMS upon the occurrence of a given data manipulation event. It is useful to remember that:
A trigger is invoked before or after a data row is inserted, updated, or deleted.
A trigger is associated with a database table.
Each database table may have one or more triggers.
A trigger is executed as part of the transaction that triggered it.
Triggers are critical to proper database operation and management. For example:
Triggers can be used to enforce constraints that cannot be enforced at the DBMS design and implementation levels.
Triggers add functionality by automating critical actions and providing appropriate warnings and suggestions for remedial action.
Triggers can be used to update table values, insert records in tables, and call other stored procedures.
DB SECURITY - OVERVIEW
The goal of database security is the protection of data against threats such as accidental or intentional loss, misuse or corruption by unauthorized users. The DBA is responsible for the overall security of the database system. Therefore, the DBA must identify the most serious threats to the database and accordingly develop overall policies, procedures and appropriate controls to protect the databases.
They are three main types of security violation:
Unauthorized modification of data in database.
Unauthorized deletion of data in database.
Unauthorized reading of data in database.
Database security is a complex and challenging endeavour that involves all aspects of information security technologies and practices. It’s also naturally at odds with database usability.
The more accessible and usable the database, the more vulnerable it is to security threats; the more invulnerable the database is to threats, the more difficult it is to access and use.
SECURITY AREAS
To protect the database from these unauthorized access or security violations, the security measures must be taken at following levels:
Database system : Different access rights to different users can be given so that they can use data which they really want. Now, the database is responsible to maintain these restrictions and security; Exploitation of database software vulnerabilities (SQL/NoSQL injection attacks)
Human factor: Authorization must be given carefully and apply correct constraints for data to reduce human errors.
Operating system: Operating System must be secured to unauthorized access. Only a secured database system cannot maintain security.
Network : Systems or Computers are connected through LAN’s, internet etc. and data can be shared through these networks. So, security within the network software is important.
Physical security: Servers and computer systems must be secured physically from looting, fire, failure of systems like disasters.
AUTHORIZATION (ACCESS RIGHTS)
Security can be maintained by giving different authorities to different users according to their work. Different authorizations are :
Read access : It allows only reading of data. User cannot modify, delete, alter or update the data.
Update access : It allows only updating of data. User cannot delete data.
Insert access : It allows addition of new data. User cannot modify and delete the existing data.
Delete access : It allows deletion of data. User cannot modify the existing data.
Index access : It allows the creation and deletion of indices.
Alteration access : It allows the addition or deletion of attributes in a relation.
Resource access : It allows the creation of new relations.
Drop access : It allows the deletion of relations.
You can give people access using the GRANT statement and remove the access using the REVOKE statement. When you control the use of the SELECT statement, the database tool will control who can view a specific database object like a column, view or table.
When you control the use of the INSERT command, you can determine who can enter rows into a table. When you restrict the use of the UPDATE command, you only allow some people to modify the data in the table. The same can be said about the DELETE statement.
ACCESS PRIVILEGES
By default, PostgreSQL users—also known as roles with the login option—can access the public schema. The default PostgreSQL authentication policy allows users to access all databases from the localhost using peer authentication on a Linux system. Users can create database objects-tables, views, functions, and so on in the public schema of any database that they can access by default.
Creating login roles
psql -U valentinadamescu -c ‘CREATE ROLE test_user LOGIN;’;
Creating database
psql -U valentinadamescu -c ‘CREATE DATABASE test;’;
Creating a table
psql -U valentinadamescu -d test -c’CREATE TABLE test_permissions(id serial , name text);’
DATABASE HANDLING FOR MONETIZATION PURPOSES
All big organizations use databases today to further expand their businesses and generate even more revenue for themselves.
Organizations make use of databases to store important user data, then stored data is analysed through time to consider the ongoing business trends that the company can use to predict their most profitable moves for the future. This is a basic technique used by all existing businesses, from Facebook to large departmental stores.
POLICIES, PROCEDURES AND STANDARDS
A successful data administration strategy requires the continuous enforcement of policies, procedures, and standards for correct data creation, usage, and distribution within the database. The DBA must define, document, and communicate the following before they can be enforced:
Policies are general statements of direction or action that communicate and support DBA goals.
Standards describe the minimum requirements of a given DBA activity; they are more detailed and specific than policies. In effect, standards are rules that evaluate the quality of the activity. For example, standards define the structure of application programs and the naming conventions programmers must use.
Procedures are written instructions that describe a series of steps to be followed during the performance of a given activity. Procedures must be developed within existing working conditions and they must support and enhance the work environment.
DATABASE HANDLING FOR RESEARCH PURPOSES
Research today is extremely dependent upon database systems. The point is to store enough data that can be processed in the future to derive important scientific conclusions.
Suppose our research is in the domains of Physics or Computer Science. In that case, we may require data on a lot of calculations in a particular domain, which could be used to understand the computers’ universe better.
If the research is in the domain of subsequently psychology or biology, we may need a record of human behaviour to certain stimuli or different contexts to better facilitate the research.
SQL FOR COMPLEX QUERIES
SQL provides a powerful declarative query language. Writing queries in SQL is usually much easier than coding the same queries in a general-purpose programming language. However, a database programmer should provide a easy access to a general-purpose programming languages for at least two reasons:
Not all queries can be expressed in SQL, since SQL does not provide the full ex-pressive power of a general-purpose language. That is, there exist queries that can be expressed in a language such as C, Java, JavaScript, PHP or Python that cannot be expressed in SQL. To write such queries, we can embed SQL within a more powerful language.
Nondeclarative actions — such as printing a report, interacting with a user, or sending the results of a query to a graphical user interface — cannot be done from within SQL. Applications usually have several components, and querying or updating data are only one component; other components are written in general purpose programming languages. For an integrated application, there must be a means to combine SQL with a general-purpose programming language.
SQL FOR COMPLEX QUERIES APPROACH
Dynamic SQL:
A general-purpose program can connect to and communicate with a database server using a collection of functions (for procedural languages) or methods (for object-oriented languages). Dynamic SQL allows the program to construct an SQL query as a character string at runtime, submit the query, and then retrieve the result into program variables a tuple at a time. The dynamic SQL component of SQL allows programs to construct and submit SQL queries at runtime.
Embedded SQL:
Like dynamic SQL, embedded SQL provides a means by which a program can interact with a database server. However, under embedded SQL, the SQL statements are identified at compile time using a pre-processor, which translates requests expressed in embedded SQL into function calls. At runtime, these function calls connect to the database using an API that provides dynamic SQL facilities but may be specific to the database that is being used.
Key Differences:
Query Construction:
Dynamic SQL: SQL is constructed dynamically at runtime.
Embedded SQL: SQL is written and processed at compile time.
Flexibility:
Dynamic SQL: More flexible, as queries can be modified at runtime.
Embedded SQL: Less flexible, as queries are static once the program is compiled.
Processing:
Dynamic SQL: The SQL query is a string that is parsed and executed at runtime.
Embedded SQL: SQL is pre-processed by a pre-processor, converting it into function calls that are executed at runtime.
FUNCTIONS vs PROCEDURES
A procedure is a database object similar to a function however, there are some key differences:
Procedures are defined with the CREATE PROCEDURE command, not CREATE FUNCTION.
Procedures do not return a function value; hence CREATE PROCEDURE lacks a RETURNS clause. However, procedures can instead return data to their callers via output parameters.
While a function is called as part of a query or DML command, a procedure is called in isolation using the CALL command.
A procedure can commit or roll back transactions during its execution (then automatically beginning a new transaction), so long as the invoking CALL command is not part of an explicit transaction block. A function cannot do that.
Certain function attributes, such as strictness, don’t apply to procedures. Those attributes control how the function is used in a query, which isn’t relevant to procedures. .
LANGUAGE CONSTRUCTS FOR PROCEDURES AND FUNCTIONS
SQL supports language constructs that give it much of the functionality of a general-purpose programming language through the Persistent Storage Module (PSM). These constructs enable the creation of procedures and functions within the database, enhancing SQL’s ability to handle complex logic.
Procedures:
A procedure is a collection of SQL statements that perform a specific task.
Procedures are invoked using the CALL statement.
They can have input, output, or input/output parameters to pass data.
Functions:
A function is a set of SQL statements that return a single value.
Functions are used directly in SQL queries and can be called like built-in functions (e.g., SUM() or AVG()).
They accept input parameters and return a single value.
Control Flow:
SQL provides control flow constructs like conditionals (IF, CASE), loops (WHILE, FOR), and error handling (EXCEPTION) to manage logic flow.
Transactions:
Procedures and functions can execute multiple SQL statements within a single transaction, using constructs like BEGIN ATOMIC … END to ensure atomicity, where all changes are executed together or not at all.
TABLE REPLICATION
SQL provides a way to rapidly create a new table based on selected columns and rows of an existing table using a subquery. The inner query is always executed first by the RDBMS. In this case, the new table copies the attribute names, data characteristics, and rows of the original table as retrieved by the subquery.
CREATE TABLE new_table AS
(SELECT
col1, col2, col3
FROM
existing_table
WHERE
condition);
CHANGING DATA TYPE OF A COLUMN
If the column to be changed already contains data, you can make changes in the column’s characteristics if those changes do not alter the data type.
ALTER TABLE table_name
ALTER COLUMN column_name1 [DATA] TYPE new_data_type,
ALTER COLUMN column_name2 [DATA] TYPE new_data_type;
/*
–Example of changing the data type of
- custmer_id from SERIAL to INT
- customer_name from VARCHAR(50) to VARCHAR(100)
*/
ALTER TABLE customer
ALTER COLUMN customer_id TYPE INT
ALTER COLUMN customer_name TYPE VARCHAR(100);
ADDING/DELETE A COLUMN
When adding a column, be careful not to include the NOT NULL clause for the new column. Doing so will cause an error message; if you add a new column to a table that already has rows, the existing rows will default to a value of null for the new column. Therefore, it is not possible to add the NOT NULL clause for this new column. (Of course, you can add the NOT NULL clause to the table structure after all the data for the new column has been entered and the column no longer contains nulls.)
ALTER TABLE customers
ADD COLUMN fax VARCHAR (15),
ADD COLUMN business_phone VARCHAR (15);
– Delete a column
ALTER TABLE customers
DROP COLUMN fax;